Chroma1-HD Character Transfer with Flux.2 Dev
This workflow gives multi-modal capabilities to open-source image models: it combines a text-to-image stage (Comfy's official Chroma1-HD workflow) with an image-to-image stage (Comfy's official Flux.2 Dev workflow).
Link to workflow: https://huggingface.co/ussaaron/workflows/blob/main/chroma_flux_character_transfer.json
This workflow is the result of a lot of experimentation to solve one problem: using an image reference for a consistent character kneecaps the creativity of an image model. For example, if I want to create a cool cinematic shot with a specific style, including an image reference pushes the model's style output into a pretty narrow lane. The final image usually inherits most of the stylistic elements of the character image, and that's not ideal.
I selected these models because, after a lot of testing, I found each to be the best at its modality: Chroma1-HD is the best open-source model for style flexibility and professional photography, and Flux.2 Dev is the best open-source model for facial fidelity and character consistency.
However, simply combining the two models is not enough for consistent character transfer. I also structured the prompts on both sides of the workflow in a specific way to ensure cohesion from end to end. The full prompts are included in the workflow for you to check out.
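The actual graph is the ComfyUI JSON linked above, but the overall shape is a simple chain: a Chroma1-HD text-to-image pass that sets the style and scene, followed by a Flux.2 Dev pass that brings in the character reference. Here is a rough diffusers-style sketch of that chain; the repo IDs, the example prompts, and the multi-image conditioning argument in the second stage are my assumptions, not taken from the workflow:

```python
# Conceptual two-stage sketch of the workflow shape -- NOT the linked ComfyUI graph.
# Repo IDs, prompts, and the Flux.2 conditioning arguments are assumptions; the real
# prompts and settings live in chroma_flux_character_transfer.json.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Stage 1: Chroma1-HD text-to-image sets the style, scene, and composition.
chroma = DiffusionPipeline.from_pretrained(
    "lodestones/Chroma1-HD", torch_dtype=torch.bfloat16  # assumed repo id
).to("cuda")
scene = chroma(
    prompt="1980s grindhouse still: a blonde woman wandering a post-apocalyptic New York City",
).images[0]

# Stage 2: Flux.2 Dev transfers the character into that scene, keeping the
# Chroma style while swapping in the reference identity.
flux = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev", torch_dtype=torch.bfloat16  # assumed repo id
).to("cuda")
character_ref = load_image("crystal_sparkle_composite.png")  # hypothetical filename for your character sheet
result = flux(
    prompt="replace the woman with the referenced character, keep the style and scene unchanged",
    image=[scene, character_ref],  # assumed interface for passing both the scene and the reference
).images[0]
result.save("character_transfer.png")
```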
And here's how it went.
This is my character reference for Crystal Sparkle, a Sora character. I made a 1980s-style model composite of her with an '80s hairstyle (make sure your character's hairstyle is consistent with the era in your Chroma image).
Model composite for Crystal Sparkle
This is the output of the Chroma prompt: a blonde woman wandering through a post-apocalyptic New York City, inspired by 1980s grindhouse and sci-fi B-movies.
Chroma1-HD Text-to-image output
This is the Flux.2 Dev output after completing the character transfer for Crystal Sparkle.
Flux.2 Dev Image-to-image output
The final result is exactly what I wanted: the Chroma1-HD style, grain, and grunge elements were retained, and Crystal was cleanly added into the shot. This example is just one of thousands of possibilities that are now available with Chroma1-HD.
Note: The settings in this workflow are tuned for people who want professional photography output. All of the settings can be dialed back as needed, and there are a few optional LoRAs that can be removed.
Let me know if you have any questions. Cheers!
https://redd.it/1tbdj5o
@rStableDiffusion
"Masked Generative Transformer Is What You Need for Image Editing"
https://github.com/weichow23/EditMGT
https://redd.it/1tbahnp
@rStableDiffusion
Anima Question
Loving the Anima model with various LoRAs, etc., but sometimes running it without LoRAs produces some interesting styles.
Is there any way to extract the style when it comes from the model's "brain", or do I just post it and hope someone knows?
Cheers.
https://redd.it/1tbhzl5
@rStableDiffusion
Cel animation outpainting: Avatar: The Last Airbender 4:3 -> 16:9 with no crop
https://redd.it/1tbjinj
@rStableDiffusion
I built a local GUI + AI builder for creating ComfyUI custom node packs
I've been working on ComfyUI Node Builder, a local app for building custom ComfyUI nodes without hand-writing all the boilerplate every time.
The demo shows:
1. user describes a node idea
2. AI creates the node contract and Python
3. dependencies/files are updated
4. the pack is deployed and tested in ComfyUI
It is open source and runs locally. The AI Builder can create nodes, edit generated files, explain validation errors, run checks, and request a deploy only when deploy permission is enabled.
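For context on what "boilerplate" means here, this is roughly the minimal skeleton every ComfyUI custom node pack repeats: a node class with INPUT_TYPES/RETURN_TYPES/FUNCTION plus the registration mappings. The example node below is made up for illustration and is not part of, or generated by, Node Builder:

```python
# Minimal ComfyUI custom node skeleton (hypothetical example node, not from Node Builder).
# Drop into custom_nodes/<pack_name>/__init__.py and restart ComfyUI to load it.

class BrightnessOffset:
    """Adds a constant offset to an IMAGE tensor."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "offset": ("FLOAT", {"default": 0.1, "min": -1.0, "max": 1.0, "step": 0.01}),
            }
        }

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "apply"
    CATEGORY = "image/adjust"

    def apply(self, image, offset):
        # ComfyUI IMAGE tensors are float batches in [0, 1]; clamp after the shift.
        return (image.add(offset).clamp(0.0, 1.0),)


# Registration tables ComfyUI scans when loading a custom node pack.
NODE_CLASS_MAPPINGS = {"BrightnessOffset": BrightnessOffset}
NODE_DISPLAY_NAME_MAPPINGS = {"BrightnessOffset": "Brightness Offset (example)"}
```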
GitHub:
https://github.com/caoool/comfyui-node-canvas
Landing page:
https://caoool.github.io/comfyui-node-canvas/
Node ideas and feedback:
https://github.com/caoool/comfyui-node-canvas/issues/2
I'd especially like feedback from people who build custom nodes: what node authoring workflow should this support next?
https://redd.it/1tbk8zv
@rStableDiffusion
OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
https://github.com/zghhui/OmniNFT
https://redd.it/1tbmfzm
@rStableDiffusion
LTX 2.3 INT8 Benchmarks (2x Faster on Ampere)
Saw some interest in INT8 for LTX 2.3 after my last post, so here are the resources.
Quick warning: INT8 acceleration is specifically effective for Ampere GPUs (e.g., RTX 3080 Ti). If you're already rocking an RTX 5090, you can safely ignore this.
The setup is easy—only the model loading part of the workflow changes. Everything else stays the same.
https://preview.redd.it/p1kqwomsgu0h1.png?width=931&format=png&auto=webp&s=626a72c691107d452a492acb4e1f3c169c7490e1
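For anyone curious what the INT8 part means in plain PyTorch terms, here is a generic sketch (not the custom node linked below, and assuming a recent torchao): replace the transformer's nn.Linear layers with INT8 dynamic-activation / INT8-weight versions so the matmuls run on Ampere's INT8 tensor cores.

```python
# Generic INT8 quantization sketch with torchao -- not the linked custom node,
# just an illustration of the kind of INT8 path that gives the speedup on Ampere.
import torch
from torch import nn
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

# Stand-in for the LTX video transformer: any module built from nn.Linear layers
# is handled the same way; in practice you would pass the loaded model instead.
model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.GELU(),
    nn.Linear(4096, 4096),
).to("cuda", torch.bfloat16)

# Swap each nn.Linear for an INT8 dynamic-activation / INT8-weight version.
# On Ampere (SM 8.0/8.6) these matmuls use the INT8 tensor cores, which is
# where the gain over the BF16 baseline comes from.
quantize_(model, int8_dynamic_activation_int8_weight())

x = torch.randn(2, 4096, device="cuda", dtype=torch.bfloat16)
with torch.no_grad():
    out = model(x)
print(out.shape)
```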
Performance Gain:
Stock: 118.77s
INT8: 66.45s
Result: ~2x speedup 🚀
Links:
weights & ComfyUI workflow
custom node
https://redd.it/1tbqxb5
@rStableDiffusion
LTX 2.3 I2V messing up the text details, anyone facing the same?
https://redd.it/1tbpd7h
@rStableDiffusion
LTX 2.3 adding unwanted subtitles in generated videos even when not mentioned in prompt
https://redd.it/1tbrsf7
@rStableDiffusion
Scenema Audio: Zero-shot expressive voice cloning and speech generation
https://redd.it/1tbzgi3
@rStableDiffusion
ComfyUI Pixaroma Nodes: New Load Image, Notify & Utility Nodes (Ep17)
https://www.youtube.com/watch?v=dXH7Qx9pzyc
https://redd.it/1tc2fuz
@rStableDiffusion
LTX 2.3 video generation notes after testing H100, RTX 5090, A100, L40, FP8, BF16, and CPU offload
This community helped me a lot on my last post, so here's my contribution back. If you're looking to generate LTX 2.3 videos, these notes might save you a few hundred dollars on wasted cloud rentals.
H100:
- 5s distilled FP8, 704x1280, 121f: 48s
- 5s distilled no-quant, 704x1280, 121f: 45s
- 5s HQ/no-quant, 704x1280, 121f, 20 steps: 121s
- 20s HQ/no-quant, 704x1280, 481f, 20 steps: 321s
- 20s HQ/no-quant, 704x1280, 481f, 28 steps: 380-390s
RTX 5090:
- 5s distilled FP8, 704x1280, 121f: 43s
- 5s HQ FP8, 704x1280, 121f, 20 steps: 151s
- 20s distilled FP8, 704x1280, 481f: failed/OOM after 55s
- 20s distilled FP8, 576x1024, 481f: 104s
- 20s distilled, no quantization, CPU offload, 704x1280, 481f: 299s
A100:
- 5s image-conditioned, 704x1280: 401-425s
- 20s HQ/no-quant, 704x1280, 481f, 20 steps, serverless render step: 608s
- 20s HQ/no-quant, 704x1280, 481f, 20 steps, serverless remote total: 713s
- 20s HQ/no-quant, 704x1280, 481f, 20 steps, serverless local wall time: 797s
L40:
(I left a note about this in the lessons paragraph below.)
- 5s distilled, no quantization, CPU offload, 704x1280, 121f: 1199s
- 5s distilled FP8, 704x1280, 121f: 197s
- 20s distilled FP8, 704x1280, 481f, max batch 4: failed/OOM after 189s
- 20s distilled FP8 low-memory, 704x1280, 481f, max batch 1: 365s
- 20s distilled FP8 low-memory, 704x1280, 481f, repeated runs: 433-453s
Some lessons:
- For some reason, the A100 output was worse than the H100 output for the exact same setup. I generated around 20 videos on each GPU from the same cloud host, and the A100 output was always worse: the A100 scenes were less realistic than the H100 ones.
- I did not like the 5090 results with distilled + FP8. Distilled with offloading to CPU RAM is better.
- The L40 cloud I rented could generate 20s 704x1280 clips, but only with a lower-memory FP8 setup for some reason. I am guessing the cloud rental device was not in the best state.
- For spoken words, try to target around 45-52 words per 20 seconds (see the quick helper sketched after this list).
- Avoid ending on important words; the model sometimes cuts off the final syllable. A short final sentence helps.
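Since that word budget is easy to misjudge when writing dialogue, here is a tiny sketch that scales the 45-52 words-per-20-seconds guideline above to an arbitrary clip length (the range comes from these notes; the helper itself is plain arithmetic):

```python
def spoken_word_budget(clip_seconds: float) -> tuple[int, int]:
    """Rough spoken-word range for a clip, scaled from ~45-52 words per 20 s."""
    low, high = 45, 52
    return round(low * clip_seconds / 20), round(high * clip_seconds / 20)

def check_dialogue(dialogue: str, clip_seconds: float) -> str:
    words = len(dialogue.split())
    low, high = spoken_word_budget(clip_seconds)
    if words < low:
        return f"{words} words is under the ~{low}-{high} target for {clip_seconds:.0f}s"
    if words > high:
        return f"{words} words is over the ~{low}-{high} target for {clip_seconds:.0f}s"
    return f"{words} words fits the ~{low}-{high} target for {clip_seconds:.0f}s"

# Example: a 20-second clip with 48 words of dialogue lands inside the target range.
print(check_dialogue(" ".join(["word"] * 48), 20))
```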
I am still exploring this, so feel free to let me know if there's anything additional I can do. Happy to contribute to the community if you're looking for any generated samples or examples.
https://redd.it/1tc5s73
@rStableDiffusion