r/StableDiffusion

Is it possible to add audio to a WAN video with LTX?

I prefer WAN over LTX. It would be nice to add audio to WAN.

https://redd.it/1ti5pvo
@rStableDiffusion

7 views01:40

r/StableDiffusion

Vibecoded a SPEED sampler for Anima in ComfyUI

https://redd.it/1tiff8k
@rStableDiffusion

5 views08:40

r/StableDiffusion

Released a Safe Chunked Image Blend node for ComfyUI — explicit CUDA resize/blend instead of hidden full-batch CPU resizing

I put together a small ComfyUI custom node called Safe Chunked Image Blend.

The short version: it is a replacement-style image blend node for cases where large image/video tensors get unstable, slow, or freeze when a blend node silently resizes one input to match the other.

GitHub:

https://github.com/xmarre/ComfyUI-Safe-Chunked-Image-Blend

Also available via ComfyUI-Manager.

The issue I ran into was with large upscaled image/video workflows. The standard blend path follows the device of the incoming image tensor. If the images arrive as CPU float32 tensors, the resize and blend happen on CPU. If the two inputs are different sizes, the resize can happen inside the blend node without being very obvious.

That can turn into a bad path like:

image1 = (2, 5464, 3800, 3)
image2 = (2, 2732, 1900, 3)

hidden resize:
image2 -> 5464x3800
then blend

For big batched tensors, especially in video/upscale workflows, that becomes one large CPU resize/blend operation, which can freeze or wedge ComfyUI setups running under WSL.
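To put rough numbers on the shapes above, here is a minimal sketch of the memory arithmetic, assuming float32 (4 bytes per element) as the post describes. The helper name `tensor_bytes` is made up for illustration and is not part of the node.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape (float32 by default)."""
    return prod(shape) * bytes_per_element


MIB = 1024 * 1024

image1 = (2, 5464, 3800, 3)   # target size
image2 = (2, 2732, 1900, 3)   # smaller input that gets resized

full_batch = tensor_bytes(image1)       # image2 after the hidden resize is this size too
small_batch = tensor_bytes(image2)
one_frame = tensor_bytes(image1[1:])    # a single frame at the target size

print(f"image1 batch:      {full_batch / MIB:.1f} MiB")
print(f"image2 batch:      {small_batch / MIB:.1f} MiB")
print(f"one target frame:  {one_frame / MIB:.1f} MiB")
```

Even this two-frame batch is close to half a gigabyte per full-size tensor, and the resized copy of image2 plus the blended output each add another tensor of the same size. Working one frame at a time halves the size of each temporary here, and the saving grows with the batch length.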

This node makes that behavior explicit.

What it does:

CPU input tensors
-> move one chunk/frame to CUDA if requested
-> resize the mismatched input explicitly
-> blend
-> copy the finished chunk back to CPU float32

Main features:

* explicit `compute_device`: `cuda`, `cpu`, `image1`, or `image2`
* explicit resize policy:
  * error if sizes mismatch
  * resize image2 to image1
  * resize image1 to image2
* chunked processing by batch/frame
* output preallocation instead of concatenating large temporary chunks
* optional CUDA sync per chunk for easier debugging
* detailed logs showing shape, dtype, device, resize step, blend step, and output copy
* includes an Image Pair Shape Probe helper node for checking both input tensor shapes/devices
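The flow described above (move one chunk to the compute device, resize explicitly, blend, copy back into a preallocated CPU float32 output) can be sketched in PyTorch roughly as follows. This is not the node's actual code: the function name `chunked_blend`, the `blend_factor` parameter, and the plain linear blend are assumptions for illustration, and the sketch only covers the "resize image2 to image1" policy with bilinear interpolation.

```python
import torch
import torch.nn.functional as F


def chunked_blend(image1, image2, blend_factor=0.5, chunk_size=1, compute_device="cuda"):
    """Blend two (B, H, W, C) image batches chunk by chunk (illustrative sketch)."""
    if image1.shape[0] != image2.shape[0]:
        raise ValueError("batch sizes differ")

    # Fall back to CPU if CUDA was requested but is not available.
    if compute_device == "cuda" and not torch.cuda.is_available():
        compute_device = "cpu"
    device = torch.device(compute_device)

    # Preallocate the output once instead of concatenating per-chunk results.
    out = torch.empty(image1.shape, dtype=torch.float32, device="cpu")
    target_hw = tuple(image1.shape[1:3])

    for start in range(0, image1.shape[0], chunk_size):
        stop = min(start + chunk_size, image1.shape[0])
        a = image1[start:stop].to(device=device, dtype=torch.float32)
        b = image2[start:stop].to(device=device, dtype=torch.float32)

        if tuple(b.shape[1:3]) != target_hw:
            # F.interpolate expects (B, C, H, W), so permute in and back out.
            b = F.interpolate(
                b.permute(0, 3, 1, 2), size=target_hw,
                mode="bilinear", align_corners=False,
            ).permute(0, 2, 3, 1)

        blended = a * (1.0 - blend_factor) + b * blend_factor
        # copy_ handles the device-to-CPU transfer into the preallocated slice.
        out[start:stop].copy_(blended)

    return out
```

With `chunk_size=1`, only one frame of each input and one blended frame live on the compute device at a time, which is the point of the chunked design.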

Recommended starting settings for large upscale/video blends:

resize_policy = resize_image2_to_image1
resize_method = bilinear
chunk_size = 1
compute_device = cuda
output_cpu_float32 = true
synchronize_each_chunk = true
empty_cuda_cache_each_chunk = false
log_progress = true

I would start with bilinear first. Bicubic is heavier, and I would only switch to it after confirming the workflow is stable.

https://redd.it/1tig51k
@rStableDiffusion

5 views10:40

r/StableDiffusion

ComfyUI HiDream text->image and image-edit templates - multiple reference image facility. Discuss please.

A recent ComfyUI update has included the two new HiDream templates mentioned in the title. I should welcome responses to the following questions.

1. The general pros and cons of HiDream.
2. Use of multiple reference images. How best to organise? How many? How to integrate with textual instructions?
3. Is the use of multiple reference images implemented for other visual AI models?

https://redd.it/1tihygu
@rStableDiffusion

7 views12:40

r/StableDiffusion

How to achieve this style where the face is anime but the body is a realistic 3D render?

https://redd.it/1tiksdz
@rStableDiffusion

6 views13:40

r/StableDiffusion

LTX Color Shifting

reference image

I've had a problem with color shifting more or less since I started using the ID LoRA node with LTX 2.3. I don't think the node itself is the cause, but every generation since then has been iffy. At first the colors shifted as the video progressed. The shift became less perceptible after I reduced the reference image size and raised the weight in "LTXImgToVideoInplace" at the upscale stage to values above one, but the results are still iffy. The problem always happens at the upscale stage, regardless of which upscaler I use. Here are examples of how it is supposed to look and how it looks now.

working example

color shift example

https://redd.it/1tijjkf
@rStableDiffusion

5 views14:40

r/StableDiffusion

Worth Upgrading just GPU or entire System needs upgrade?

Hello,
I've read a bit about different GPUs and RAM requirements and seen some conflicting advice. I think my system needs a full upgrade, but I want confirmation before overspending.

Right now I have a Ryzen 3600 CPU, an AMD RX 5700 GPU (8GB), and 32GB RAM (4x 8GB). The motherboard is an MSI Gaming Plus B450, so the slot is PCIe 3.0, and the PSU is a 650W Corsair RMx.

The idea was to get a 5060 Ti 16GB or 5070 Ti 16GB. From what I've read, AMD and Intel cards aren't worth bothering with if you want something that works out of the box on Windows with less tinkering.

I also have access to my wife's PC: AMD 5600 CPU, 5060 8GB GPU, 16GB RAM, and a B550M Pro-VDH motherboard with PCIe 4.0.

So is it worth getting a 16GB GPU for either system with 32GB RAM, or do I also need 64GB RAM? Or would it be better to get a newer AM5 system with 64-128GB RAM and a 16GB card?

A used 3090 here is around 800eur, refurb 900+eur
64GB ram DDR4 - 500eur
5060TI 16GB - 600eur
5070TI 16GB - 1000eur
5080 - 1.5k
5090 - 3.6k and up :D

I'd like to do image and video gen, TTS, and consistent characters (multiple images with the same character, like a comic) :) And try new stuff.

A new system would cost me about 3k with a 5070 Ti, 64GB RAM, a new PSU that can support two GPUs, and a Taichi motherboard (I want to try local LLMs later too).

For now, I'd like to see if I can get by with the existing system and whether it's even worth trying, or whether I need to save up a bit and get a completely new system.

Thanks for answers and help :)

https://redd.it/1tiljco
@rStableDiffusion

7 views15:40

r/StableDiffusion

Announcing the release of Stable Audio 3!

Taken straight from the HarmonAI discord server.

We're excited to announce the launch of Stable Audio 3, our new family of text-to-audio models for music and sound effects, including new *open-weights models*! We're releasing three models today on Hugging Face, along with a GitHub repo tailored to Stable Audio 3 inference and LoRA fine-tuning.

* Stable Audio 3 Small Music: https://huggingface.co/stabilityai/stable-audio-3-small-music
* Stable Audio 3 Small SFX: https://huggingface.co/stabilityai/stable-audio-3-small-sfx
* Stable Audio 3 Medium: https://huggingface.co/stabilityai/stable-audio-3-medium

Stable Audio 3 GitHub: https://github.com/Stability-AI/stable-audio-3

The Medium model generates music and sound effects with lengths up to **six minutes and twenty seconds**, inferencing in a matter of seconds on NVIDIA GPUs. The Small models make music and sound effects (respectively) with lengths up to **two minutes**, and can be optimized to run efficiently on CPUs.

These models are licensed under our Stability AI Community License, meaning they're totally free for personal and creative use. We don't claim any royalties or ownership on the model outputs; they're yours to do with as you please.

We've also published two academic papers on this model as well as the new SAME autoencoder architecture the models are based on.

Stable Audio 3 paper: https://arxiv.org/abs/2605.17991
SAME paper: https://arxiv.org/abs/2605.18613
Blog post: https://stability.ai/news-updates/meet-stable-audio-3-the-model-family-built-for-artistic-experimentation-with-open-weight-models

We're so excited to share this release with you, and we can't wait to see what you make with it!

https://redd.it/1tiq820
@rStableDiffusion

7 views16:40

r/StableDiffusion

How do you figure out which samplers to use?

I usually just use whatever the example workflows give me, but there are so many to choose from. Will reading up on the model help inform which sampler to use? Are things like skipping steps and 2-step samplers just trial and error, or is there a method to the madness?

https://redd.it/1tiofkw
@rStableDiffusion

5 views18:40

r/StableDiffusion

Extreme realism with Klein 9B distilled 2 loras together

https://redd.it/1tiwruj
@rStableDiffusion

4 views20:40
