Local I2V finally feels less like image wiggle and more like shot direction with LTX Director

https://redd.it/1thuq4k
@rStableDiffusion
Kijai just uploaded LTX2.3 OmniNFT RL-LoRA for better video and audio!

Reposting this from Twitter (wildminder):

"LTX2.3 OmniNFT RL-LoRA generates high-quality video/audio + visuals and sound are perfectly synchronized, no laggy or mismatched audio.

- realistic Lip-Sync
- action-matched sound
- reduces synchronization errors by 52%

really nice output"

https://reddit.com/link/1thxd1p/video/cygvtd81a52h1/player

Reddit keeps blocking my posts (removed by filters), so I'm editing the links to see if this post will work (just remove the spaces, sorry):

Project page: zghhui . github . io/OmniNFT/

Kijai HF repo: huggingface . co/Kijai/LTX2.3_comfy/tree/main

https://redd.it/1thxd1p
@rStableDiffusion
building a shared hair library for SD prompts - who's down to help?

hair is probably the most inconsistent thing I generate, and I reckon a lot of you feel the same. prompts like "wolf cut" or "space buns" work sometimes and totally miss other times, depending on the checkpoint, lighting, face angle, even sampler settings and CFG. there's no universal hairstyle taxonomy baked into SD prompts the way there is for art styles or character archetypes. there are some community prompt packs floating around, but nothing really structured or tested across models.

so I want to build something actually useful: a shared hair library. basically a structured list of hairstyle prompt terms, which models they tend to work on, what breaks them, and practical notes on ControlNet, IP-Adapter, or reference-image approaches for the trickier ones. not just a name dump: actual tested prompts with context on the conditions they need to land properly. things like aspect ratio, whether you need a LoRA to reinforce the shape, and whether regional prompting helps when you're fighting bleed from the rest of the composition.

worth noting: for anything beyond simple styles, prompts alone usually aren't enough. most reliable workflows I've seen lean on LoRAs for specific cuts, ControlNet for structure, or IP-Adapter/reference-only modes for style transfer. it would be good to document which combination actually works per style rather than pretending a single tag is going to do the job.

anyone already doing something like this, or have a system that works for you? and when a hairstyle prompt just isn't cooperating, what's your fallback: reference images, inpainting, a hair-specific LoRA, something else?
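The library could be stored as structured data (e.g. JSON) rather than a wiki page. In that case, one entry might record the fields listed above. The sketch below is a minimal example of that idea. `HairEntry`, `entries_for`, every field name, and the sample values (`checkpoint_a`, the placeholder notes) are all hypothetical. None of them come from an existing format, and none reflect real test results.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class HairEntry:
    """One entry in a hypothetical shared hairstyle prompt library."""
    name: str                                           # hairstyle name, e.g. "wolf cut"
    prompt: str                                         # the tested prompt fragment
    works_on: List[str] = field(default_factory=list)   # checkpoints where it landed
    breaks_on: List[str] = field(default_factory=list)  # checkpoints where it missed
    failure_notes: str = ""                             # what breaks it: lighting, angle, sampler, CFG...
    aspect_ratio: Optional[str] = None                  # aspect ratio it was tested at, if it matters
    needs_lora: bool = False                            # does it need a hair LoRA to hold the shape?
    controlnet: Optional[str] = None                    # ControlNet type that helped, if any
    reference_image: bool = False                       # needed IP-Adapter / reference-only?
    regional_prompting: bool = False                    # did regional prompting reduce bleed?


def entries_for(library: List[HairEntry], checkpoint: str) -> List[HairEntry]:
    """Return the entries someone reported as working on the given checkpoint."""
    return [entry for entry in library if checkpoint in entry.works_on]


# Placeholder data only: checkpoint names and notes are not real results.
library = [
    HairEntry(name="wolf cut", prompt="wolf cut", works_on=["checkpoint_a"],
              failure_notes="placeholder: describe what broke it"),
    HairEntry(name="space buns", prompt="space buns", works_on=["checkpoint_b"]),
]

# Serialize to JSON so the library can be shared and diffed as a plain file.
print(json.dumps([asdict(entry) for entry in library], indent=2))
```

Storing the data as a flat dataclass keeps each entry easy to serialize. It also lets people filter by checkpoint or by whether a LoRA/ControlNet was needed.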

https://redd.it/1thz0ma
@rStableDiffusion
RL LoRA for LTX2.3. It greatly increases coherence and quality while reducing artifacts.

https://redd.it/1ti3jar
@rStableDiffusion
Is it possible to add audio to a WAN video with LTX?

I prefer WAN over LTX. It would be nice to add audio to WAN.

https://redd.it/1ti5pvo
@rStableDiffusion
Released a Safe Chunked Image Blend node for ComfyUI — explicit CUDA resize/blend instead of hidden full-batch CPU resizing

I put together a small ComfyUI custom node called Safe Chunked Image Blend.

The short version: it is a replacement-style image blend node for cases where large image/video tensors get unstable, slow, or freeze when a blend node silently resizes one input to match the other.

GitHub:

https://github.com/xmarre/ComfyUI-Safe-Chunked-Image-Blend

Also available via ComfyUI-Manager.

The issue I ran into was with large upscaled image/video workflows. The standard blend path follows the device of the incoming image tensor. If the images arrive as CPU float32 tensors, the resize and blend happen on CPU. If the two inputs are different sizes, the resize can happen inside the blend node without being very obvious.

That can turn into a bad path like:

image1 = (2, 5464, 3800, 3)
image2 = (2, 2732, 1900, 3)

hidden resize:
image2 -> 5464x3800
then blend

For big batched tensors, especially in video/upscale workflows, that can be a large CPU resize/blend operation, which can freeze or wedge ComfyUI setups running under WSL.
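To see why this gets heavy, here is a back-of-the-envelope size calculation for the shapes above. It assumes ComfyUI's usual (batch, height, width, channels) float32 layout, at 4 bytes per element. The "floor" figure is a rough lower bound under one assumption: the hidden path materializes both inputs, a full-size resized copy of image2, and a full-size output. It ignores any extra temporaries the resize itself allocates, so treat it as an estimate, not a measurement of the actual blend node.

```python
MIB = 1024 ** 2  # bytes per mebibyte


def tensor_bytes(shape, bytes_per_element=4):
    """Bytes for a dense tensor of this shape (float32 = 4 bytes per element)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# ComfyUI IMAGE tensors are (batch, height, width, channels).
image1 = (2, 5464, 3800, 3)
image2 = (2, 2732, 1900, 3)
one_frame = (1, 5464, 3800, 3)  # one full-size frame, as processed with chunk_size = 1

# Assumed lower bound for the hidden full-batch path: image2, plus three
# image1-sized tensors (image1, the resized image2, and the blended output).
full_batch_floor = tensor_bytes(image2) + 3 * tensor_bytes(image1)

for label, nbytes in [
    ("image1", tensor_bytes(image1)),
    ("image2", tensor_bytes(image2)),
    ("full-batch resize + blend (floor)", full_batch_floor),
    ("one full-size frame", tensor_bytes(one_frame)),
]:
    print(f"{label}: {nbytes / MIB:.1f} MiB")
```

By this arithmetic, image1 alone is about 475 MiB and image2 about 119 MiB. The full-batch floor comes out at roughly 1.5 GiB. A single full-size frame is about 238 MiB, which is why per-frame chunking caps the working set.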

This node makes that behavior explicit.

What it does:

CPU input tensors
-> move one chunk/frame to CUDA if requested
-> resize the mismatched input explicitly
-> blend
-> copy the finished chunk back to CPU float32
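The steps above can be sketched in PyTorch roughly like this. It is a minimal illustration under stated assumptions, not the node's actual source. Here is what the sketch assumes:

- The function name `chunked_blend` and its parameters are hypothetical.
- It always resizes image2 to image1 (the "resize image2 to image1" policy).
- Both inputs must have the same batch size and channel count.
- It does only a plain linear mix, `image1 * (1 - f) + image2 * f`, rather than every blend mode.

```python
import torch
import torch.nn.functional as F


def chunked_blend(image1, image2, blend_factor=0.5, chunk_size=1,
                  compute_device="cuda", resize_method="bilinear",
                  synchronize_each_chunk=False):
    """Hypothetical sketch: blend two (B, H, W, C) image tensors chunk by chunk.

    image2 is explicitly resized to image1's height/width on the compute
    device, mixed linearly with image1, and each finished chunk is copied
    into a preallocated CPU float32 output tensor.
    """
    if compute_device == "cuda" and not torch.cuda.is_available():
        compute_device = "cpu"  # fallback so the sketch also runs without a GPU
    device = torch.device(compute_device)

    batch, height, width, channels = image1.shape
    if image2.shape[0] != batch or image2.shape[3] != channels:
        raise ValueError("sketch assumes equal batch size and channel count")

    # Allocate the full output once instead of concatenating per-chunk results.
    out = torch.empty((batch, height, width, channels), dtype=torch.float32)

    # align_corners is only accepted by the interpolating modes.
    resize_kwargs = {"align_corners": False} if resize_method in ("bilinear", "bicubic") else {}

    for start in range(0, batch, chunk_size):
        end = min(start + chunk_size, batch)
        a = image1[start:end].to(device, dtype=torch.float32)
        b = image2[start:end].to(device, dtype=torch.float32)

        if b.shape[1:3] != a.shape[1:3]:
            # F.interpolate expects (N, C, H, W), so permute around the resize.
            b = F.interpolate(b.permute(0, 3, 1, 2), size=(height, width),
                              mode=resize_method, **resize_kwargs)
            b = b.permute(0, 2, 3, 1).clamp(0.0, 1.0)  # bicubic can overshoot 0..1

        blended = a * (1.0 - blend_factor) + b * blend_factor
        if synchronize_each_chunk and device.type == "cuda":
            torch.cuda.synchronize()  # surface CUDA errors at the chunk that caused them
        out[start:end].copy_(blended)  # copy the finished chunk back to CPU float32

    return out
```

With `chunk_size=1`, only one frame's worth of tensors (the frame, its resized partner, and the blend) lives on the compute device at a time. The full-size result accumulates in CPU memory.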

Main features:

- explicit `compute_device`: `cuda`, `cpu`, `image1`, or `image2`
- explicit resize policy:
  - error if sizes mismatch
  - resize image2 to image1
  - resize image1 to image2
- chunked processing by batch/frame
- output preallocation instead of concatenating large temporary chunks
- optional CUDA sync per chunk for easier debugging
- detailed logs showing shape, dtype, device, resize step, blend step, and output copy
- includes an Image Pair Shape Probe helper node for checking both input tensor shapes/devices
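The two "explicit" options above boil down to small decisions made before any tensor work. The sketch below shows one way to express them in plain Python. It is hypothetical: the helper names and the policy strings (`"error"`, `"resize_image2_to_image1"`, `"resize_image1_to_image2"`) are illustrative assumptions, not the node's real identifiers. Devices are plain strings here to keep it dependency-free.

```python
def resolve_compute_device(compute_device, image1_device, image2_device):
    """Map the compute_device option to a concrete device (hypothetical helper)."""
    if compute_device == "image1":
        return image1_device  # follow wherever image1 already lives
    if compute_device == "image2":
        return image2_device  # follow wherever image2 already lives
    if compute_device in ("cuda", "cpu"):
        return compute_device
    raise ValueError(f"unknown compute_device: {compute_device!r}")


def resize_plan(policy, shape1, shape2):
    """Decide which input to resize, given (B, H, W, C) shapes.

    Returns None when the spatial sizes already match, otherwise a tuple of
    (input to resize, target (height, width)).
    """
    hw1, hw2 = tuple(shape1[1:3]), tuple(shape2[1:3])
    if hw1 == hw2:
        return None
    if policy == "error":
        raise ValueError(f"size mismatch: image1 {hw1} vs image2 {hw2}")
    if policy == "resize_image2_to_image1":
        return ("image2", hw1)
    if policy == "resize_image1_to_image2":
        return ("image1", hw2)
    raise ValueError(f"unknown resize policy: {policy!r}")


print(resize_plan("resize_image2_to_image1", (2, 5464, 3800, 3), (2, 2732, 1900, 3)))
print(resolve_compute_device("image1", "cpu", "cuda:0"))
```

Making the mismatch case raise by default is what turns a silent full-batch resize into a visible decision.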

Recommended starting settings for large upscale/video blends:

resize_policy = resize_image2_to_image1
resize_method = bilinear
chunk_size = 1
compute_device = cuda
output_cpu_float32 = true
synchronize_each_chunk = true
empty_cuda_cache_each_chunk = false
log_progress = true

I would start with bilinear first. Bicubic is heavier, and I would only switch to it after confirming the workflow is stable.

https://redd.it/1tig51k
@rStableDiffusion
ComfyUI HiDream text->image and image-edit templates - multiple reference image facility. Discuss please.

A recent ComfyUI update added the two new HiDream templates mentioned in the title. I would welcome responses to the following questions.

1. The general pros and cons of HiDream.
2. Use of multiple reference images. How best to organise? How many? How to integrate with textual instructions?
3. Do other visual AI models support multiple reference images?

https://redd.it/1tihygu
@rStableDiffusion