Training an LTX 2.3 I2V LoRA

I have searched Reddit and asked 4 AIs, and I got widely different information about the subject.

I want to create a series of LoRAs capturing certain human motions, and I would like to know from you guys with experience:

What is the minimum acceptable number of video clips in a dataset?
What length and frame rate are your clips?
Do you use 1:1 or 16:9 ratio clips, and at what resolution?

Bonus question:

Do you also add still images from the same dataset videos?

I'm looking for some basic settings, just enough to get me going with my first training, and I am thinking of getting an H100 on Runpod to do the job.

Thanks!

https://redd.it/1ta0in5
@rStableDiffusion
HiDream-Studio v.01 has been released! It is fast and powerful and open-sourced on GitHub | Easy Install
https://redd.it/1ta2aek
@rStableDiffusion
LipDub (Beta): new open-source lipsync IC-LoRA

Today we're releasing a beta of LipDub, a new open-source lipsync capability built on LTX.

LipDub is an IC-LoRA adapter that takes an existing video and replaces the dialogue by regenerating speech and lip motion together in a single pass. Give it a source video and a text prompt with your new dialogue, and it preserves everything except the lip region: the speaker's appearance, vocal identity, tone, and delivery.

**This beta includes:**

* 1080p Full HD output
* Up to 8-second clips
* Single-speaker support
* Validated languages: English, French, Spanish, German, and Russian.

**What you can do with it:**

* Dub into another language
* Rephrase or replace dialogue in the original language
* Talking-head generation workflows

**Links:**

* **HuggingFace**: [https://huggingface.co/Lightricks/LTX-2.3-22b-IC-LoRA-LipDub](https://huggingface.co/Lightricks/LTX-2.3-22b-IC-LoRA-LipDub)
* **ComfyUI workflow**: [https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example\_workflows/2.3/LTX-2.3\_ICLoRA\_Lipdub\_Two\_Stage\_Distilled.json](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/2.3/LTX-2.3_ICLoRA_Lipdub_Two_Stage_Distilled.json)
* **Python pipeline**: [https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-pipelines/src/ltx\_pipelines/lipdub.py](https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-pipelines/src/ltx_pipelines/lipdub.py)
* **Documentation**: [https://docs.ltx.video/open-source-model/usage-guides/lip-dub-beta](https://docs.ltx.video/open-source-model/usage-guides/lip-dub-beta)
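
If you just want the weights on disk before wiring up one of the workflows above, a minimal sketch using `huggingface_hub` (assuming it is installed; the repo id is the one linked above) looks like this:

```python
# Minimal sketch: download the LipDub IC-LoRA weights locally with huggingface_hub.
# Assumes `pip install huggingface_hub`; the repo id comes from the HuggingFace link above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Lightricks/LTX-2.3-22b-IC-LoRA-LipDub")
print("LipDub weights downloaded to:", local_dir)
```

From there, load the adapter through the linked ComfyUI workflow or the `lipdub.py` pipeline; the documentation link covers the actual loading API.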

This is an early open-source beta release. We're putting it in the community's hands before the API ships. Please explore it, break it, build with it, and let us know what you find.

LipDub is grounded in our research paper, [*Video Dubbing via Joint Audio-Visual Diffusion*](https://justdubit.github.io/), from researchers at Lightricks and Tel Aviv University, which goes into why joint audio-visual generation outperforms modular pipelines.

https://redd.it/1ta66f1
@rStableDiffusion
SmartAttentionDispatcher — ComfyUI node that patches model attention with SageAttention

# 1. What is it and why

A node that replaces PyTorch SDPA with SageAttention kernels (SA2 / SA3) without restarting ComfyUI and without launch flags. Automatically detects GPU architecture, installed libraries, and available kernels. Shows active mode, GPU tier, SA2/SA3 availability, and model architecture in the node status panel after each run.

Inspired by Kijai's node, SmartAttentionDispatcher extends it with additional capabilities: specific kernel selection, dynamic combine mode, and support for models that import attention locally (ErnieImage, Qwen, ACE-Step).
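
As a rough illustration of the kind of detection involved (not the node's actual code), checking the GPU compute capability and which SageAttention packages are importable can look like the sketch below; the `sageattn3` package name is the one mentioned in the modes section further down.

```python
# Illustrative sketch only, not the node's implementation:
# probe the GPU generation and which SageAttention packages can be imported.
import importlib.util
import torch

def detect_backends():
    major, minor = torch.cuda.get_device_capability(0)               # e.g. (12, 0) on RTX 50xx
    has_sa2 = importlib.util.find_spec("sageattention") is not None  # SageAttention 2.x
    has_sa3 = importlib.util.find_spec("sageattn3") is not None      # separate SA3 package
    return {
        "sm": f"{major}.{minor}",
        "sa2": has_sa2,
        "sa3": has_sa3 and major >= 10,  # SA3 targets Blackwell (RTX 50xx report SM 12.x)
    }

print(detect_backends())
```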

https://preview.redd.it/5b7moef2th0h1.png?width=804&format=png&auto=webp&s=2c68bfffbd5d9b070532ad3d96634b28a77edb05

Recommended launch flag: --fast

⚠️ Do not use --use-sage-attention together with this node — it conflicts with the patching mechanism.

# 2. Model patching specifics

Most DiT models (Flux, SD3.5, Z-Image, LTX, Wan) are patched through the standard ComfyUI transformer_options mechanism. However, some models import optimized_attention locally at module load time — a regular patch does not reach them. For these models the node additionally scans sys.modules and patches all found references. Confirmed for ErnieImage, Qwen-Image/Edit, and ACE-Step.
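
To make that concrete, the sketch below shows the general idea of scanning `sys.modules` and rebinding module-level references to `optimized_attention`. It is only an illustration of that mechanism, and omits the signature handling, fallbacks, and restore logic a real patcher needs.

```python
# Simplified illustration of the sys.modules patching idea (not the node's actual code).
# Modules that imported `optimized_attention` at load time hold their own reference,
# so patching only the defining module never reaches them; rebinding each
# module-level reference does.
import sys

def patch_local_attention_refs(replacement, name="optimized_attention"):
    patched = []
    for mod_name, module in list(sys.modules.items()):
        if module is None:
            continue
        try:
            fn = getattr(module, name, None)
        except Exception:
            continue  # some modules raise on attribute access
        if callable(fn) and fn is not replacement:
            setattr(module, name, replacement)  # rebind the local reference
            patched.append(mod_name)
    return patched
```

Here `replacement` would be a wrapper that calls a SageAttention kernel while keeping the same call signature ComfyUI's attention helper expects.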

SDXL (UNet architecture) is also supported via SA2, though speed gain is minimal — sequences are too short for SA to provide advantage.

⚠️ Qwen 2512 in SA3 mode produces results that do not match the prompt — unstable FP4 math at long sequences (seq > 7000). SA2 on Qwen works correctly.

# 3. Modes

When `sdpa=False` and all other modes are disabled, the node changes nothing and standard PyTorch SDPA is used. When `sdpa=True`, SDPA is also used, but all other node settings are forcibly ignored.

* **SA2**: SageAttention2 on all steps. Kernels: `auto`, `fp16`, `fp8`, `fp8++`, `triton`. `auto` selects the best kernel for your GPU automatically.
* **SA3**: SageAttention3 on all steps. Blackwell only (RTX 50xx), CUDA 12.8+, separate sageattn3 package. Requires Python 3.10+.
* **Combine** (dynamic mode): switches between SA2 and SA3 depending on the diffusion step. The first and last steps use SA2 (or SDPA if SA2 is also disabled); the middle steps use SA3. Displayed in the node as `SA2-SA3-SA2` or `SDPA-SA3-SDPA` (see the sketch below).
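
The combine schedule itself is easy to picture; a sketch of the step-to-backend mapping described above (illustrative only, the real node hooks this into the sampler) is:

```python
# Illustrative sketch of the combine mode's schedule (not the node's actual code):
# first and last diffusion steps use SA2 (or SDPA when SA2 is disabled), middle steps use SA3.
def backend_for_step(step, total_steps, sa2_enabled=True):
    if step == 0 or step == total_steps - 1:
        return "SA2" if sa2_enabled else "SDPA"
    return "SA3"

# A 30-step run produces the SA2-SA3-SA2 pattern the node displays.
schedule = [backend_for_step(s, 30) for s in range(30)]
print(schedule[0], schedule[15], schedule[-1])  # SA2 SA3 SA2
```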

How to connect in a workflow: place the node directly before KSampler, after model loading, after applying LoRAs, and after any nodes that shift or modify the model. Input `model` → output `model`. The node detects the architecture and applies the patch automatically.

# 4. Tested models

|Model|SA2|SA3|Patch|Notes|
|:-|:-|:-|:-|:-|
|SDXL 1.0|✅|—|transformer\_options|SA3 not tested on UNet, minimal gain|
|SD3.5|✅|✅|transformer\_options|cross-attn layers auto-fallback to SDPA|
|Flux.1 dev (Kontext, Krea)|✅|✅|transformer\_options|—|
|Flux.2 dev (Klein)|✅|✅|transformer\_options|—|
|Z-Image turbo|✅|✅|transformer\_options|—|
|Qwen-Image 2512 / Edit 2511|✅|⚠️|sys.modules|SA3 unstable at long sequences|
|ERNIE-Image turbo|✅|✅|sys.modules|—|
|LTX 2.3 (dev, distilled)|✅|✅|transformer\_options|—|
|Wan2.2|✅|⚠️|transformer\_options|SA3 OOM at 1280x720 on 16GB VRAM|
|HunyuanVideo 1.5|✅|—|transformer\_options|not fully tested|
|ACE-Step 1.5|—|—|sys.modules|may work, not tested|

# 5. Image generation benchmark

Model: `flux-2-klein-base-9b-fp8` \+ `qwen_3_8b_fp8mixed` text encoder
Settings: 896×1152, 30 steps, dpmpp\_2m\_sde, cfg=5
GPU: RTX 5060 Ti 16GB | PyTorch 2.11.0+cu130 | Python 3.14.4 | SM 12.0 Blackwell

Why this model — 9GB fits entirely in VRAM, attention is the real bottleneck, clean results without RAM/VRAM swap overhead.

18 images split into rows:

Row SDPA

https://preview.redd.it/si9nwf08th0h1.png?width=896&format=png&auto=webp&s=1a12c88246dced527d48353c25d6740102aa9ef4

Row SA2: fp8, fp8++

https://preview.redd.it/2pocu859th0h1.jpg?width=1822&format=pjpg&auto=webp&s=ce642ac994a89f96a6ba301e8cc73a239aaf1f83

Row SA3: standard, per_block_mean

https://preview.redd.it/396ct36ath0h1.jpg?width=1822&format=pjpg&auto=webp&s=fb49bd85b2632e5a2c83de438f84a7914c691717

Row combine: SA2-SA3-SA2 and SDPA-SA3-SDPA with different kernel combinations

https://preview.redd.it/d8ct5gbbth0h1.jpg?width=2728&format=pjpg&auto=webp&s=ea0f499a320b1becf511efe4c715c4c2a8ada066

https://preview.redd.it/8el7yqbhth0h1.jpg?width=2728&format=pjpg&auto=webp&s=7d1509d4a573c02be7284506cb2cab00fa60d572

Row without node: --fast, --use-sage-attention, --fast --use-sage-attention

https://preview.redd.it/qnwccz7kth0h1.jpg?width=2728&format=pjpg&auto=webp&s=c1a0650562757c14f1a7b914a32923bb7f39a641

https://preview.redd.it/b8rrp37lth0h1.jpg?width=3634&format=pjpg&auto=webp&s=1527b8f451167cfb9feb7890f657fe48a06c54b2

|Mode|Flags|s/it|Total|vs SDPA|
|:-|:-|:-|:-|:-|
|SDPA (baseline)|vanilla|2.42|73.70s|0.0%|
|SA2 fp8|vanilla|2.22|67.48s|\+8.3%|
|SA2 fp8++|vanilla|2.20|66.81s|\+9.1%|
|SA3 standard|vanilla|2.22|67.50s|\+8.3%|
|SA3 per_block_mean|vanilla|2.20|67.00s|\+9.1%|
|SDPA-SA3-SDPA standard|vanilla|2.24|68.36s|\+7.4%|
|SDPA-SA3-SDPA per_block_mean|vanilla|2.24|68.26s|\+7.4%|
|SA2-SA3-SA2 fp8 + standard|vanilla|2.24|68.10s|\+7.4%|
|SA2-SA3-SA2 fp8 + per_block_mean|vanilla|2.24|68.06s|\+7.4%|
|SA2-SA3-SA2 fp8++ + standard|vanilla|2.23|67.74s|\+7.9%|
|SA2-SA3-SA2 fp8++ + per_block_mean|vanilla|2.24|68.03s|\+7.4%|
|SA2 fp8|\--fast --force-channels-last --fp16-intermediates|2.13|64.87s|\+12.0%|
|SA2 fp8++|\--fast --force-channels-last --fp16-intermediates|2.13|64.93s|\+12.0%|
|SA3 standard|\--fast --force-channels-last --fp16-intermediates|2.17|66.26s|\+10.3%|
|SDPA|\--fast|2.39|72.55s|\+1.2%|
|\--use-sage-attention|vanilla|2.11|64.43s|\+12.8%|
|\--use-sage-attention|\--fast|2.08|63.45s|\+14.0%|
|\--use-sage-attention|\--fast --force-channels-last --fp16-intermediates|2.08|63.48s|\+14.0%|

⚠️ --force-channels-last causes crashes with Wan. --fp16-intermediates breaks audio in LTX video+audio pipelines. For universal use, only --fast is recommended.
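
The "vs SDPA" column appears to follow from the per-iteration times; under that assumption, the SA2 fp8 figure can be reproduced as follows:

```python
# Reproducing the "vs SDPA" gain from the s/it values (assumed basis for the column).
baseline = 2.42   # SDPA baseline, s/it
sa2_fp8 = 2.22    # SA2 fp8, s/it
gain = (baseline - sa2_fp8) / baseline
print(f"{gain:.1%}")  # 8.3%
```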

# 6. Video models benchmark

|Model|Resolution|SDPA s/it|SA2 fp8++ s/it|Gain|Notes|
|:-|:-|:-|:-|:-|:-|
|ltx-2.3-22b-distilled bf16|1280x720|Ph1: 12.83 / Ph2: 63.75|Ph1: 11.07 / Ph2: 46.89|\+14% / +26%|—|
|Wan2.2 (VAE from Wan2.1)|960x544|Ph1: 126.82 / Ph2: 126.08|Ph1: 60.28 / Ph2: 58.81|\+52% / +53%|—|
|Wan2.2 (VAE from Wan2.1)|1280x720|—|—|—|SA3 per_block_mean OOM (740MB), requires >16GB VRAM + 64GB RAM|
|HunyuanVideo 1.5|1280x720|184s/it|73s/it|\+60%|stopped — unrealistic time for 5s video on 16GB|

# 7. Links

GitHub: https://github.com/Rogala/ComfyUI-rogala
All nodes available via ComfyUI Manager.

Google Drive with test images, videos, workflow and LogicIfElse node:
https://drive.google.com/drive/folders/17jy3g_FTlM09YfM-Fwh5KWNIlvX0UCyc?usp=sharing

LogicIfElse — helper node for conditional model or parameter selection in workflow, not yet in the main repository as it is still being refined.

Built with the assistance of Claude.



https://redd.it/1ta0ewm
@rStableDiffusion
Built an open-source one-prompt-to-cinematic-reel pipeline on a single GPU — FLUX.2 [klein] for character keyframes, Wan2.2-I2V for animation, vision critic with auto-retry, music + 9-language narration in the same pipeline

https://redd.it/1ta7aq9
@rStableDiffusion