Local I2V finally feels less like image wiggle and more like shot direction with LTX Director

https://redd.it/1thuq4k
@rStableDiffusion
Kijai just uploaded LTX2.3 OmniNFT RL-LoRA for better video and audio!

Reposting this from Twitter (wildminder):

"LTX2.3 OmniNFT RL-LoRA generates high-quality video/audio + visuals and sound are perfectly synchronized, no laggy or mismatched audio.

- realistic Lip-Sync

- action-matched sound

- reduces synchronization errors by 52%

really nice output"

https://reddit.com/link/1thxd1p/video/cygvtd81a52h1/player

Reddit keeps blocking my posts (removed by filters), so I'm editing the links to see if this post will work (just remove the spaces, sorry):

Project page: zghhui . github . io/OmniNFT/

Kijai HF repo: huggingface . co/Kijai/LTX2.3_comfy/tree/main

https://redd.it/1thxd1p
@rStableDiffusion
building a shared hair library for SD prompts - who's down to help

hair is probably the most inconsistent thing I generate and I reckon a lot of you feel the same. prompts like "wolf cut" or "space buns" work sometimes and totally miss other times depending on the checkpoint, lighting, face angle, even sampler settings and CFG. there's no universal hairstyle taxonomy baked into SD prompts the way there is for art styles or character archetypes - there are some community prompt packs floating around but nothing really structured or tested across models.

so I want to build something actually useful: a shared hair library. basically a structured list of hairstyle prompt terms, what models they tend to work on, what breaks them, and practical notes on ControlNet, IP-Adapter, or reference image approaches for the trickier ones. not just a name dump - actual tested prompts with context on what conditions they need to land properly. things like aspect ratio, whether you need a LoRA to reinforce the shape, whether regional prompting helps when you're fighting bleed from the rest of the composition.

worth noting: for anything beyond simple styles, prompts alone usually aren't enough. the most reliable workflows I've seen lean on LoRAs for specific cuts, ControlNet for structure, or IP-Adapter/reference-only modes for style transfer. it would be good to document what combination actually works per style rather than pretending a single tag is going to do the job.

anyone already doing something like this or have a system that works for you? and when a hairstyle prompt just isn't cooperating, what's your fallback - reference images, inpainting, a hair-specific LoRA, something else?
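One possible shape for a library entry, as a minimal sketch. The class name `HairEntry`, every field name, and the JSON layout are assumptions made for illustration, not an agreed format, and the bracketed values are placeholders rather than tested results.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class HairEntry:
    """One hairstyle term plus the conditions it was tested under (hypothetical schema)."""
    term: str                                                 # prompt term, e.g. "wolf cut"
    works_on: List[str] = field(default_factory=list)         # checkpoints where it landed
    breaks_with: List[str] = field(default_factory=list)      # conditions that break it
    needs_lora: bool = False                                  # LoRA needed to hold the shape?
    control_methods: List[str] = field(default_factory=list)  # e.g. ControlNet, IP-Adapter
    aspect_ratio: str = ""                                    # ratio it was tested at
    notes: str = ""                                           # free-form practical notes


def to_json(entries: List[HairEntry]) -> str:
    """Serialize entries so the library can live in a shared text file."""
    return json.dumps([asdict(e) for e in entries], indent=2)


if __name__ == "__main__":
    example = HairEntry(
        term="wolf cut",
        works_on=["<checkpoint name>"],       # placeholder, not a test result
        breaks_with=["<failure condition>"],  # placeholder, not a test result
        control_methods=["ControlNet"],
    )
    print(to_json([example]))
```

Keeping the failure conditions next to the term is the point of the structure: a tag only counts as tested if the entry also says where it stops working.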

https://redd.it/1thz0ma
@rStableDiffusion
RL lora for LTX2.3. It greatly increases coherence and quality while reducing artifacts.

https://redd.it/1ti3jar
@rStableDiffusion
Is it possible to add audio to a WAN video with LTX?

I prefer WAN over LTX. It would be nice to add audio to WAN.

https://redd.it/1ti5pvo
@rStableDiffusion
Released a Safe Chunked Image Blend node for ComfyUI — explicit CUDA resize/blend instead of hidden full-batch CPU resizing

I put together a small ComfyUI custom node called Safe Chunked Image Blend.

The short version: it is a replacement-style image blend node for cases where large image/video tensors get unstable, slow, or freeze when a blend node silently resizes one input to match the other.

GitHub:

https://github.com/xmarre/ComfyUI-Safe-Chunked-Image-Blend

Also available via ComfyUI-Manager.

The issue I ran into was with large upscaled image/video workflows. The standard blend path follows the device of the incoming image tensor. If the images arrive as CPU float32 tensors, the resize and blend happen on CPU. If the two inputs are different sizes, the resize can happen inside the blend node without being very obvious.

That can turn into a bad path like:

image1 = (2, 5464, 3800, 3)
image2 = (2, 2732, 1900, 3)

hidden resize:
image2 -> 5464x3800
then blend

For big batched tensors, especially in video/upscale workflows, that becomes a large CPU resize/blend operation, which can freeze or wedge WSL ComfyUI setups.
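To put rough numbers on that, here is a small sketch of the memory arithmetic for the shapes above, assuming float32 (4 bytes per element) and a batch, height, width, channels layout. The helper names `tensor_bytes` and `mib` are made up for this example.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: tensors are float32, as in the CPU case described above


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Raw storage needed for a dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


def mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 ** 2)


image1 = (2, 5464, 3800, 3)
image2 = (2, 2732, 1900, 3)
# After the hidden resize, image2 takes image1's height and width.
image2_resized = (image2[0], image1[1], image1[2], image2[3])

if __name__ == "__main__":
    for name, shape in [
        ("image1", image1),
        ("image2", image2),
        ("image2 resized", image2_resized),
    ]:
        print(f"{name:15s} {shape} -> {mib(tensor_bytes(shape)):.1f} MiB")
```

With these shapes, each full-size tensor is about 475 MiB for only two frames, and the resized copy of image2 is four times larger than its input. Longer batches scale linearly from there.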

This node makes that behavior explicit.

What it does:

CPU input tensors
-> move one chunk/frame to CUDA if requested
-> resize the mismatched input explicitly
-> blend
-> copy the finished chunk back to CPU float32

Main features:

- explicit `compute_device`: `cuda`, `cpu`, `image1`, or `image2`
- explicit resize policy:
  - error if sizes mismatch
  - resize image2 to image1
  - resize image1 to image2
- chunked processing by batch/frame
- output preallocation instead of concatenating large temporary chunks
- optional CUDA sync per chunk for easier debugging
- detailed logs showing shape, dtype, device, resize step, blend step, and output copy
- includes an Image Pair Shape Probe helper node for checking both input tensor shapes/devices
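The chunking, resize-policy, and preallocation ideas from the list above can be sketched in plain Python. This is not the node's code: the real node works on tensors and can move each chunk to CUDA, while this sketch uses nested lists and a nearest-neighbour resize to stay dependency-free. The names `chunk_ranges`, `resize_nearest`, `blend_chunked`, and the policy strings are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Frame = List[List[float]]  # one H x W single-channel frame, standing in for a tensor slice


def chunk_ranges(batch: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) pairs that cover range(batch) in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    for start in range(0, batch, chunk_size):
        yield start, min(start + chunk_size, batch)


def resize_nearest(frame: Frame, height: int, width: int) -> Frame:
    """Nearest-neighbour resize; chosen only to keep the sketch short."""
    src_h, src_w = len(frame), len(frame[0])
    return [
        [frame[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def blend_chunked(image1: List[Frame], image2: List[Frame], factor: float,
                  chunk_size: int = 1, resize_policy: str = "error") -> List[Frame]:
    """Linear blend of two frame batches, processed chunk_size frames at a time."""
    if len(image1) != len(image2):
        raise ValueError("batch sizes differ")
    # Preallocate the output once instead of concatenating per-chunk results.
    out: List[Optional[Frame]] = [None] * len(image1)
    for start, end in chunk_ranges(len(image1), chunk_size):
        for i in range(start, end):
            a, b = image1[i], image2[i]
            if (len(a), len(a[0])) != (len(b), len(b[0])):
                # The resize is an explicit, named decision rather than a hidden step.
                if resize_policy == "resize_image2_to_image1":
                    b = resize_nearest(b, len(a), len(a[0]))
                elif resize_policy == "resize_image1_to_image2":
                    a = resize_nearest(a, len(b), len(b[0]))
                else:
                    raise ValueError(f"size mismatch at frame {i}")
            out[i] = [
                [pa * (1.0 - factor) + pb * factor for pa, pb in zip(row_a, row_b)]
                for row_a, row_b in zip(a, b)
            ]
    return out  # every slot has been filled by the loop above
```

Because only one chunk is resized and blended at a time, peak working memory depends on `chunk_size` rather than on the full batch length.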

Recommended starting settings for large upscale/video blends:

resize_policy = resize_image2_to_image1
resize_method = bilinear
chunk_size = 1
compute_device = cuda
output_cpu_float32 = true
synchronize_each_chunk = true
empty_cuda_cache_each_chunk = false
log_progress = true

I would start with bilinear first. Bicubic is heavier, and I would only switch to it after confirming the workflow is stable.

https://redd.it/1tig51k
@rStableDiffusion
ComfyUI HiDream text->image and image-edit templates - multiple reference image facility. Discuss please.

A recent ComfyUI update has included the two new HiDream templates mentioned in the title. I should welcome responses to the following questions.

1. The general pros and cons of HiDream.
2. Use of multiple reference images. How best to organise? How many? How to integrate with textual instructions?
3. Is the use of multiple reference images implemented for other visual AI models?

https://redd.it/1tihygu
@rStableDiffusion
LTX Color Shifting

reference image

I've had a color-shifting problem more or less since I started using the ID LoRA node with LTX 2.3. I don't think the node itself is the cause, but every generation since then has been iffy. At first the colors changed as the video progressed. The shift has become less perceptible since I reduced the reference image size and raised the weight in "LTXImgToVideoInplace" at the upscale stage to values above 1, but the results are still iffy. The problem always appears at the upscale stage, regardless of which upscaler I use. Here are some examples of how it is supposed to look and how it looks now.

working example

color shift example



https://redd.it/1tijjkf
@rStableDiffusion
Worth upgrading just the GPU, or does the entire system need an upgrade?

Hello,
I've read a bit about different GPUs and RAM requirements and found some conflicting advice. I think my system needs a full upgrade, but I want confirmation before overspending.

Right now I have a Ryzen 3600 CPU, an AMD R5700 8GB GPU, and 32GB RAM (4x 8GB). The motherboard is an MSI Gaming Plus B450, so the slot is PCIe 3.0, and the PSU is a 650W Corsair RMx.


The idea was to maybe get a 5060 Ti 16GB or 5070 Ti 16GB (from what I've read, don't bother with AMD or Intel if you want something that works out of the box on Windows with less tinkering).

I also have access to my wife's PC: AMD 5600 CPU, 5060 8GB GPU, 16GB RAM, and a B550M Pro-VDH motherboard with PCIe 4.0.

So is it worth getting a 16GB GPU for either system with 32GB RAM? Or do I also need 64GB RAM? Or is it better to get a newer AM5 system with 64-128GB RAM and a 16GB card?

A used 3090 here is around 800eur, refurb 900+eur
64GB ram DDR4 - 500eur
5060TI 16GB - 600eur
5070TI 16GB - 1000eur
5080 - 1.5k
5090 - 3.6k and up :D


I'd like to do image and video generation and TTS, and make consistent characters (multiple images of the same character, comics, etc.) :) And try new stuff.

A new system would cost me about 3k with a 5070 Ti, 64GB RAM, a new PSU that can support two GPUs, and a Taichi motherboard (I also want to try local LLMs later).

But for now, I'd like to know whether I can get by with the existing system and whether it's even worth trying, or whether I should save up a bit and get a completely new system.



Thanks for answers and help :)



https://redd.it/1tiljco
@rStableDiffusion