Local I2V finally feels less like image wiggle and more like shot direction with LTX Director
https://redd.it/1thuq4k
@rStableDiffusion
Kijai just uploaded LTX2.3 OmniNFT RL-LoRA for better video and audio!
Reposting this from Twitter (wildminder):
"LTX2.3 OmniNFT RL-LoRA generates high-quality video/audio + visuals and sound are perfectly synchronized, no laggy or mismatched audio.
- realistic Lip-Sync
- action-matched sound
- reduces synchronization errors by 52%
really nice output"
https://reddit.com/link/1thxd1p/video/cygvtd81a52h1/player
Reddit keeps blocking my posts (removed by filters), so I'm editing the links to see if this post will work (just remove the spaces, sorry):
Project page: zghhui . github . io/OmniNFT/
Kijai HF repo: huggingface . co/Kijai/LTX2.3_comfy/tree/main
https://redd.it/1thxd1p
@rStableDiffusion
building a shared hair library for SD prompts - who's down to help
hair is probably the most inconsistent thing I generate and I reckon a lot of you feel the same. prompts like "wolf cut" or "space buns" work sometimes and totally miss other times depending on the checkpoint, lighting, face angle, even sampler settings and CFG.
there's no universal hairstyle taxonomy baked into SD prompts the way there is for art styles or character archetypes. there are some community prompt packs floating around, but nothing really structured or tested across models.
so I want to build something actually useful: a shared hair library. basically a structured list of hairstyle prompt terms, what models they tend to work on, what breaks them, and practical notes on ControlNet, IP-Adapter, or reference image approaches for the trickier ones. not just a name dump: actual tested prompts with context on what conditions they need to land properly. things like aspect ratio, whether you need a LoRA to reinforce the shape, whether regional prompting helps when you're fighting bleed from the rest of the composition.
worth noting: for anything beyond simple styles, prompts alone usually aren't enough. the most reliable workflows I've seen lean on LoRAs for specific cuts, ControlNet for structure, or IP-Adapter/reference-only modes for style transfer. would be good to document what combination actually works per style rather than pretending a single tag is going to do the job.
anyone already doing something like this or have a system that works for you? and when a hairstyle prompt just isn't cooperating, what's your fallback: reference images, inpainting, a hair-specific LoRA, something else?
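One possible shape for a library entry, as a minimal Python sketch. The field names and the `entries_for` helper are hypothetical, and the sample values (checkpoint names, prompt strings, notes) are placeholders, not tested results.

```python
from dataclasses import dataclass, field


@dataclass
class HairEntry:
    """One hairstyle record; every field name here is an illustrative assumption."""
    style: str                                        # human-readable name, e.g. "wolf cut"
    prompt: str                                       # the prompt text that was tested
    works_on: list = field(default_factory=list)      # checkpoints where it tends to land
    breaks_when: list = field(default_factory=list)   # conditions that make it miss
    aids: list = field(default_factory=list)          # LoRA / ControlNet / IP-Adapter etc.
    notes: str = ""                                   # aspect ratio, regional prompting, ...


def entries_for(library, checkpoint):
    """Return the entries recorded as working on the given checkpoint."""
    return [entry for entry in library if checkpoint in entry.works_on]


# Placeholder data only: these are not real test results.
library = [
    HairEntry(
        style="wolf cut",
        prompt="wolf cut, layered shaggy hair",
        works_on=["checkpoint-A"],
        breaks_when=["extreme face angle"],
        aids=["hair-specific LoRA"],
        notes="placeholder note",
    ),
    HairEntry(
        style="space buns",
        prompt="space buns, twin buns",
        works_on=["checkpoint-B"],
        aids=["ControlNet", "IP-Adapter"],
    ),
]

for entry in entries_for(library, "checkpoint-A"):
    print(entry.style, "->", entry.aids)
```

Keeping `works_on`, `breaks_when`, and `aids` as separate lists would make it easy to record which combination works per style, which is the part a plain tag list can't capture.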
https://redd.it/1thz0ma
@rStableDiffusion
RL lora for LTX2.3. It greatly increases coherence and quality while reducing artifacts.
https://redd.it/1ti3jar
@rStableDiffusion
Is it possible to add audio to a WAN video with LTX?
I prefer WAN over LTX. It would be nice to add audio to WAN.
https://redd.it/1ti5pvo
@rStableDiffusion
Released a Safe Chunked Image Blend node for ComfyUI — explicit CUDA resize/blend instead of hidden full-batch CPU resizing
I put together a small ComfyUI custom node called Safe Chunked Image Blend.
The short version: it is a replacement-style image blend node for cases where large image/video tensors get unstable, slow, or freeze when a blend node silently resizes one input to match the other.
GitHub:
https://github.com/xmarre/ComfyUI-Safe-Chunked-Image-Blend
Also available via ComfyUI-Manager.
The issue I ran into was with large upscaled image/video workflows. The standard blend path follows the device of the incoming image tensor. If the images arrive as CPU float32 tensors, the resize and blend happen on CPU. If the two inputs are different sizes, the resize can happen inside the blend node without being very obvious.
That can turn into a bad path like:
image1 = (2, 5464, 3800, 3)
image2 = (2, 2732, 1900, 3)
hidden resize:
image2 -> 5464x3800
then blend
For big batched tensors, especially in video/upscale workflows, that can be a large CPU resize/blend operation, which can freeze or wedge ComfyUI setups running under WSL.
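To give a feel for the sizes involved, here is a rough calculation for the two example shapes, assuming dense float32 tensors at 4 bytes per element. The `tensor_bytes` helper is made up for this illustration, and real peak memory depends on how many temporaries the resize and blend allocate.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Bytes used by a dense tensor; 4 bytes per element corresponds to float32."""
    return prod(shape) * bytes_per_element


image1 = (2, 5464, 3800, 3)  # batch, height, width, channels
image2 = (2, 2732, 1900, 3)

# image2 resized to image1's height and width, keeping its own batch and channels.
image2_resized = (image2[0], image1[1], image1[2], image2[3])

MIB = 1024 ** 2
size1 = tensor_bytes(image1)
size2 = tensor_bytes(image2)
size2_resized = tensor_bytes(image2_resized)
per_frame = tensor_bytes(image1[1:])  # one frame at the larger resolution

print(f"image1:           {size1 / MIB:.1f} MiB")
print(f"image2:           {size2 / MIB:.1f} MiB")
print(f"image2 resized:   {size2_resized / MIB:.1f} MiB")
print(f"one large frame:  {per_frame / MIB:.1f} MiB")
```

The resized copy of image2 is as large as image1, so a full-batch path holds at least image1, the resized image2, and the output at once, while a per-frame path only needs frame-sized temporaries on top of the output.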
This node makes that behavior explicit.
What it does:
CPU input tensors
-> move one chunk/frame to CUDA if requested
-> resize the mismatched input explicitly
-> blend
-> copy the finished chunk back to CPU float32
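The flow above can be sketched on the CPU with NumPy arrays standing in for the torch tensors ComfyUI passes around. This is not the node's code: the function names, the policy strings, and the linear-mix blend are assumptions for illustration, nearest-neighbour sampling replaces bilinear to keep the sketch short, and the per-chunk move to CUDA is only marked by a comment.

```python
import numpy as np


def resize_nearest(frames, height, width):
    """Nearest-neighbour resize of a (B, H, W, C) array (stand-in for bilinear)."""
    rows = np.arange(height) * frames.shape[1] // height
    cols = np.arange(width) * frames.shape[2] // width
    return frames[:, rows[:, None], cols[None, :], :]


def chunked_blend(image1, image2, blend_factor=0.5, chunk_size=1,
                  resize_policy="error"):
    """Blend two (B, H, W, C) batches chunk by chunk into a preallocated output."""
    policies = ("error", "resize_image2_to_image1", "resize_image1_to_image2")
    if resize_policy not in policies:
        raise ValueError(f"unknown resize_policy: {resize_policy!r}")
    if image1.shape[0] != image2.shape[0] or image1.shape[3] != image2.shape[3]:
        raise ValueError("batch size or channel count differs")
    if image1.shape[1:3] != image2.shape[1:3] and resize_policy == "error":
        raise ValueError(f"size mismatch: {image1.shape} vs {image2.shape}")

    # Pick the target size explicitly instead of resizing silently.
    target = image2 if resize_policy == "resize_image1_to_image2" else image1
    out_h, out_w = target.shape[1:3]
    batch, channels = image1.shape[0], image1.shape[3]

    # Preallocate the full output once; chunks are written in place, so no
    # list of temporary chunks has to be concatenated at the end.
    out = np.empty((batch, out_h, out_w, channels), dtype=np.float32)

    for start in range(0, batch, chunk_size):
        stop = min(start + chunk_size, batch)
        # In a torch version, this is where the chunk would move to the GPU.
        a = image1[start:stop].astype(np.float32)
        b = image2[start:stop].astype(np.float32)
        if a.shape[1:3] != (out_h, out_w):
            a = resize_nearest(a, out_h, out_w)
        if b.shape[1:3] != (out_h, out_w):
            b = resize_nearest(b, out_h, out_w)
        # Simple linear mix; writing into `out` plays the role of the copy back.
        out[start:stop] = a * (1.0 - blend_factor) + b * blend_factor
    return out


big = np.ones((2, 4, 6, 3), dtype=np.float32)
small = np.zeros((2, 2, 3, 3), dtype=np.float32)
result = chunked_blend(big, small, blend_factor=0.25,
                       resize_policy="resize_image2_to_image1")
print(result.shape, result.dtype)
```

With `chunk_size=1`, only one frame of each input is converted and resized at a time, so temporaries stay frame-sized no matter how long the batch is. In a torch version the resize step would typically go through `torch.nn.functional.interpolate` on a channels-first view of the chunk.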
Main features:
explicit `compute_device`: `cuda`, `cpu`, `image1`, or `image2`
explicit resize policy:
error if sizes mismatch
resize image2 to image1
resize image1 to image2
chunked processing by batch/frame
output preallocation instead of concatenating large temporary chunks
optional CUDA sync per chunk for easier debugging
detailed logs showing shape, dtype, device, resize step, blend step, and output copy
includes an Image Pair Shape Probe helper node for checking both input tensor shapes/devices
Recommended starting settings for large upscale/video blends:
resize_policy = resize_image2_to_image1
resize_method = bilinear
chunk_size = 1
compute_device = cuda
output_cpu_float32 = true
synchronize_each_chunk = true
empty_cuda_cache_each_chunk = false
log_progress = true
I would start with bilinear. Bicubic is heavier, and I would only switch to it after confirming the workflow is stable.
https://redd.it/1tig51k
@rStableDiffusion
ComfyUI HiDream text->image and image-edit templates - multiple reference image facility. Discuss please.
A recent ComfyUI update has included the two new HiDream templates mentioned in the title. I should welcome responses to the following questions.
1. The general pros and cons of HiDream.
2. Use of multiple reference images. How best to organise? How many? How to integrate with textual instructions?
3. Is the use of multiple reference images implemented for other visual AI models?
https://redd.it/1tihygu
@rStableDiffusion
How to achieve this style where the face is anime but the body is a realistic 3D render?
https://redd.it/1tiksdz
@rStableDiffusion