The ComfyUI Assets Manager just got a massive update (Thanks to your feedback!) 🚀

https://redd.it/1sdcqgu
@rStableDiffusion
Made a 4-minute video from a single 53-word prompt with my new video pipeline tool, which goes from a simple or complex single prompt to a full video. I haven't fully tested the maximum length my context window allows, but it's a revolutionary product on consumer hardware. RTX 4090 laptop

https://redd.it/1sdn1ga
Testing LTX-Video 2.3 — 11 Models, PainterLTXV2 Workflow

# System Environment

|ComfyUI|v0.18.5 (7782171a)|
|:-|:-|
|GPU|NVIDIA RTX 5060 Ti (15.93 GB VRAM, Driver 595.79, CUDA 13.2)|
|CPU|Intel Core i3-12100F 12th Gen (4C/8T)|
|RAM|63.84 GB|
|Python|3.14.3|
|Torch|2.11.0+cu130|
|Triton|3.6.0.post26|
|Sage-Attn 2|2.2.0|

# Models Tested

**From** [**Lightricks**](https://huggingface.co/collections/Lightricks/ltx-23)

|Model|Size (GB)|
|:-|:-|
|ltx-2.3-22b-dev.safetensors|43.0|
|ltx-2.3-22b-dev-fp8.safetensors|27.1|
|ltx-2.3-22b-dev-nvfp4.safetensors|20.2|
|ltx-2.3-22b-distilled.safetensors|43.0|
|ltx-2.3-22b-distilled-fp8.safetensors|27.5|

**From** [**Kijai**](https://huggingface.co/Kijai/LTX2.3_comfy)

|Model|Size (GB)|
|:-|:-|
|ltx-2.3-22b-dev\_transformer\_only\_fp8\_scaled.safetensors|21.9|
|ltx-2-3-22b-dev\_transformer\_only\_fp8\_input\_scaled.safetensors|23.3|
|ltx-2.3-22b-distilled\_transformer\_only\_fp8\_scaled.safetensors|21.9|
|ltx-2.3-22b-distilled\_transformer\_only\_fp8\_input\_scaled\_v3.safetensors|23.3|

**From** [**unsloth**](https://huggingface.co/unsloth/LTX-2.3-GGUF)

|Model|Size (GB)|
|:-|:-|
|ltx-2.3-22b-dev-Q8\_0.gguf|21.2|
|ltx-2.3-22b-distilled-Q8\_0.gguf|21.2|

# Additional Components

**Text Encoders**

**From** [**Comfy-Org**](https://huggingface.co/Comfy-Org/ltx-2/tree/main/split_files/text_encoders)

|File|Size (GB)|
|:-|:-|
|gemma\_3\_12B\_it\_fpmixed.safetensors|12.8|

**From** [**Kijai**](https://huggingface.co/Kijai/LTX2.3_comfy) **and** [**unsloth**](https://huggingface.co/unsloth/LTX-2.3-GGUF)

|File|Size (GB)|
|:-|:-|
|ltx-2.3\_text\_projection\_bf16.safetensors|2.2|
|ltx-2.3-22b-dev\_embeddings\_connectors.safetensors|2.2|
|ltx-2.3-22b-distilled\_embeddings\_connectors.safetensors|2.2|

**LoRAs**

**From** [**Lightricks**](https://huggingface.co/collections/Lightricks/ltx-23) **and** [**Comfy-Org**](https://huggingface.co/Comfy-Org/ltx-2)

|File|Size (GB)|Weight used|
|:-|:-|:-|
|ltx-2.3-22b-distilled-lora-384.safetensors|7.1|0.6 (dev models only)|
|ltx-2.3-id-lora-celebvhq-3k.safetensors|1.1|0.3 (all models)|

**VAE**

**From** [**Lightricks**](https://huggingface.co/collections/Lightricks/ltx-23) **/** [**Comfy-Org**](https://huggingface.co/Comfy-Org/ltx-2)

|File|Size (GB)|
|:-|:-|
|LTX23\_audio\_vae\_bf16.safetensors|0.3|
|LTX23\_video\_vae\_bf16.safetensors|1.4|

**From** [**Kijai**](https://huggingface.co/Kijai/LTX2.3_comfy) **and** [**unsloth**](https://huggingface.co/unsloth/LTX-2.3-GGUF)

|File|Size (GB)|
|:-|:-|
|ltx-2.3-22b-dev\_audio\_vae.safetensors|0.3|
|ltx-2.3-22b-dev\_video\_vae.safetensors|1.4|
|ltx-2.3-22b-distilled\_audio\_vae.safetensors|0.3|
|ltx-2.3-22b-distilled\_video\_vae.safetensors|1.4|

**Latent Upscale**

**From** [**Lightricks**](https://huggingface.co/collections/Lightricks/ltx-23)

|File|Size (GB)|
|:-|:-|
|ltx-2.3-spatial-upscaler-x2-1.1.safetensors|0.9|

# Workflow

The official workflows from [ComfyUI/Lightricks](https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/example_workflows/2.3), [RuneXX](https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main), and unsloth (GGUF) all felt too bloated and unclear to work with comfortably. **But maybe I just didn't fully grasp the power of their parameters and the range of possibilities they offer.** I ended up basing everything on [princepainter's ComfyUI-PainterLTXV2](https://github.com/princepainter/ComfyUI-PainterLTXV2) — his combined dual KSampler node is great, and he has solid WAN-2.2 workflows too.

I haven't managed to get truly clean results yet, but I'm getting closer. Still not sure how others are pulling off such high-quality outputs.

Below is an example workflow for Dev models — kept as simple and readable as possible.

https://preview.redd.it/f8qx4rup3gtg1.png?width=1503&format=png&auto=webp&s=e35fb2346b79dd65a966a764fe406e4ae0c5f2c2
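If you prefer driving a workflow like this from a script instead of the graph UI, ComfyUI accepts an API-format workflow JSON via `POST /prompt` on its local server. A minimal stdlib-only sketch, assuming a default instance on `127.0.0.1:8188`; the node ID and inputs below are placeholders, not the actual workflow shown above (export the real graph with "Save (API Format)"):

```python
import json
import urllib.request

def build_prompt_payload(workflow: dict, client_id: str = "bench") -> bytes:
    """Wrap an API-format workflow graph in the JSON body that POST /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

def queue_workflow(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """Submit the workflow to a running ComfyUI instance and return its response."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Placeholder graph fragment -- real node IDs and inputs come from the exported JSON.
workflow = {"3": {"class_type": "KSampler", "inputs": {"steps": 35, "cfg": 3.0}}}
payload = json.loads(build_prompt_payload(workflow))
print(payload["prompt"]["3"]["inputs"]["steps"])
```

Queuing the same JSON with different model filenames is a convenient way to batch runs like the ones below.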

Not all videos are included here — only the ones I thought were the best (and even those are just decent for the Dev models). Everything else, including all workflow files, is available on Google Drive, with model names in the filenames: [**Google Drive folder**](https://drive.google.com/drive/folders/1Hdm2dfRT62d0dDg5ldX1Wr8lazboRbW5?usp=sharing)

# Benchmark Results

Each model was run twice: the first run loads the model, the second measures generation time. With the GGUF models something odd happened: iteration time in the upscale pass grew several-fold, which inflated total generation time significantly.

**Dev — 1280x720, steps=35, cfg=3, fps=24, duration=10s (241 frames), no upscale**
sampler: euler | scheduler: linear\_quadratic

https://preview.redd.it/1bknutt85gtg1.png?width=1500&format=png&auto=webp&s=968daecc39d5bf57b6d1a05e472e099f3ae41e04

*Dev-FULL*

https://reddit.com/link/1sdgu9x/video/2ixoekc04gtg1/player

**Distilled — 1280x720, steps=15, cfg=1, fps=24, duration=10s (241 frames), no upscale**
sampler: euler | scheduler: linear\_quadratic

https://preview.redd.it/0ng8zas95gtg1.png?width=1500&format=png&auto=webp&s=138d310b69ba141556d38b79e25d507f254efc1a

*Distilled-FULL*

https://reddit.com/link/1sdgu9x/video/z9p7hn7a4gtg1/player

**Dev - Distilled + Upscale — input 960x544 → target 1920x1080, steps=8+4, cfg=1, fps=24, duration=10s (241 frames), upscale x2**
sampler: euler | scheduler: linear\_quadratic

https://preview.redd.it/3rpk26db5gtg1.png?width=1600&format=png&auto=webp&s=af9b5b39d90beab395dcf4592fffa07dc4030246

*Distilled-FP8+Upscale*

https://reddit.com/link/1sdgu9x/video/eby8rljl4gtg1/player

**Dev - Distilled transformer + GGUF + Upscale — input 960x544 → target 1920x1080, steps=8+4, cfg=1, fps=24, duration=10s (241 frames), upscale x2**
sampler: euler | scheduler: linear\_quadratic

https://preview.redd.it/gd631mac5gtg1.png?width=1920&format=png&auto=webp&s=e8862a4fdfc18a90de0b83d2d9ec2b4d285638d1

*Distilled-gguf+Upscaler*

https://reddit.com/link/1sdgu9x/video/a4spdwi25gtg1/player
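A note on the 241-frame figure that appears in all four runs: LTX-style video models generally require frame counts of the form 8·n + 1 (a temporal compression factor of 8, plus the initial frame), so 10 s × 24 fps = 240 gets rounded up to 241. The "8n + 1" constraint is my reading of the model family, not something stated in the workflow itself; a quick check:

```python
def ltx_frame_count(seconds: float, fps: int, stride: int = 8) -> int:
    """Round seconds*fps to the nearest valid frame count of the form stride*n + 1."""
    raw = round(seconds * fps)
    n = round((raw - 1) / stride)
    return stride * n + 1

print(ltx_frame_count(10, 24))   # 240 -> 241
```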

# Shameless Self-Promo

I built this node after finishing the tests — and honestly wish I had it during them. Would have made organizing and labeling output footage a lot easier.

[**Aligned Text Overlay Video**](https://github.com/Rogala/ComfyUI-rogala?tab=readme-ov-file#aligned-text-overlay-video)

Renders a multi-line text block onto every frame of a video tensor. Supports `%NodeTitle.param%` template tags resolved from the active ComfyUI prompt.
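The template mechanism can be approximated like this: scan the overlay text for `%Title.param%` tags, find the node with that title in the API-format prompt dict, and substitute the parameter value. A rough stand-alone sketch — the node's actual resolution logic may differ, and the graph fragment below is illustrative:

```python
import re

TAG = re.compile(r"%([^.%]+)\.([^.%]+)%")

def resolve_tags(text: str, prompt: dict) -> str:
    """Replace %NodeTitle.param% tags with values from an API-format prompt dict."""
    by_title = {
        node.get("_meta", {}).get("title", ""): node
        for node in prompt.values()
    }
    def sub(match: re.Match) -> str:
        title, param = match.groups()
        node = by_title.get(title)
        if node is None:
            return match.group(0)          # unknown title: leave the tag untouched
        return str(node["inputs"].get(param, match.group(0)))
    return TAG.sub(sub, text)

prompt = {"3": {"class_type": "KSampler",
                "_meta": {"title": "KSampler"},
                "inputs": {"steps": 35, "cfg": 3.0}}}
print(resolve_tags("steps=%KSampler.steps% cfg=%KSampler.cfg%", prompt))
# -> steps=35 cfg=3.0
```

Burning the sampler settings into each clip this way is exactly what would have simplified labeling the benchmark footage above.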

https://preview.redd.it/nepdj0h65gtg1.png?width=1829&format=png&auto=webp&s=c9ad0041e503ff3079d5d17047c34abcfde47002

Check out my GitHub page for a few more repos: [**github.com/Rogala**](https://github.com/Rogala)

https://redd.it/1sdgu9x