Made a 4-minute video from a single 53-word prompt with my new video pipeline tool, which goes from a simple or complex single prompt to a full video. I haven't fully tested the maximum length my context window allows, but it's a revolutionary product on consumer hardware: an RTX 4090 laptop.

https://redd.it/1sdn1ga
@rStableDiffusion
Testing LTX-Video 2.3 — 11 Models, PainterLTXV2 Workflow

# System Environment

|Component|Version|
|:-|:-|
|ComfyUI|v0.18.5 (7782171a)|
|GPU|NVIDIA RTX 5060 Ti (15.93 GB VRAM, Driver 595.79, CUDA 13.2)|
|CPU|Intel Core i3-12100F 12th Gen (4C/8T)|
|RAM|63.84 GB|
|Python|3.14.3|
|Torch|2.11.0+cu130|
|Triton|3.6.0.post26|
|Sage-Attn 2|2.2.0|

# Models Tested

**From** [**Lightricks**](https://huggingface.co/collections/Lightricks/ltx-23)

|Model|Size (GB)|
|:-|:-|
|ltx-2.3-22b-dev.safetensors|43.0|
|ltx-2.3-22b-dev-fp8.safetensors|27.1|
|ltx-2.3-22b-dev-nvfp4.safetensors|20.2|
|ltx-2.3-22b-distilled.safetensors|43.0|
|ltx-2.3-22b-distilled-fp8.safetensors|27.5|

**From** [**Kijai**](https://huggingface.co/Kijai/LTX2.3_comfy)

|Model|Size (GB)|
|:-|:-|
|ltx-2.3-22b-dev\_transformer\_only\_fp8\_scaled.safetensors|21.9|
|ltx-2-3-22b-dev\_transformer\_only\_fp8\_input\_scaled.safetensors|23.3|
|ltx-2.3-22b-distilled\_transformer\_only\_fp8\_scaled.safetensors|21.9|
|ltx-2.3-22b-distilled\_transformer\_only\_fp8\_input\_scaled\_v3.safetensors|23.3|

**From** [**unsloth**](https://huggingface.co/unsloth/LTX-2.3-GGUF)

|Model|Size (GB)|
|:-|:-|
|ltx-2.3-22b-dev-Q8\_0.gguf|21.2|
|ltx-2.3-22b-distilled-Q8\_0.gguf|21.2|

# Additional Components

**Text Encoders**

**From** [**Comfy-Org**](https://huggingface.co/Comfy-Org/ltx-2/tree/main/split_files/text_encoders)

|File|Size (GB)|
|:-|:-|
|gemma\_3\_12B\_it\_fpmixed.safetensors|12.8|

**From** [**Kijai**](https://huggingface.co/Kijai/LTX2.3_comfy) **and** [**unsloth**](https://huggingface.co/unsloth/LTX-2.3-GGUF)

|File|Size (GB)|
|:-|:-|
|ltx-2.3\_text\_projection\_bf16.safetensors|2.2|
|ltx-2.3-22b-dev\_embeddings\_connectors.safetensors|2.2|
|ltx-2.3-22b-distilled\_embeddings\_connectors.safetensors|2.2|

**LoRAs**

**From** [**Lightricks**](https://huggingface.co/collections/Lightricks/ltx-23) **and** [**Comfy-Org**](https://huggingface.co/Comfy-Org/ltx-2)

|File|Size (GB)|Weight used|
|:-|:-|:-|
|ltx-2.3-22b-distilled-lora-384.safetensors|7.1|0.6 (dev models only)|
|ltx-2.3-id-lora-celebvhq-3k.safetensors|1.1|0.3 (all models)|

**VAE**

**From** [**Lightricks**](https://huggingface.co/collections/Lightricks/ltx-23) **/** [**Comfy-Org**](https://huggingface.co/Comfy-Org/ltx-2)

|File|Size (GB)|
|:-|:-|
|LTX23\_audio\_vae\_bf16.safetensors|0.3|
|LTX23\_video\_vae\_bf16.safetensors|1.4|

**From** [**Kijai**](https://huggingface.co/Kijai/LTX2.3_comfy) **and** [**unsloth**](https://huggingface.co/unsloth/LTX-2.3-GGUF)

|File|Size (GB)|
|:-|:-|
|ltx-2.3-22b-dev\_audio\_vae.safetensors|0.3|
|ltx-2.3-22b-dev\_video\_vae.safetensors|1.4|
|ltx-2.3-22b-distilled\_audio\_vae.safetensors|0.3|
|ltx-2.3-22b-distilled\_video\_vae.safetensors|1.4|

**Latent Upscale**

**From** [**Lightricks**](https://huggingface.co/collections/Lightricks/ltx-23)

|File|Size (GB)|
|:-|:-|
|ltx-2.3-spatial-upscaler-x2-1.1.safetensors|0.9|

# Workflow

The official workflows from [ComfyUI/Lightricks](https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/example_workflows/2.3), [RuneXX](https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main), and unsloth (GGUF) all felt too bloated and unclear to work with comfortably. **But maybe I just didn't fully grasp the power of their parameters and the range of possibilities they offer.** I ended up basing everything on [princepainter's ComfyUI-PainterLTXV2](https://github.com/princepainter/ComfyUI-PainterLTXV2) — his combined dual KSampler node is great, and he has solid WAN-2.2 workflows too.

I haven't managed to get truly clean results yet, but I'm getting closer. Still not sure how others are pulling off such high-quality outputs.

Below is an example workflow for Dev models — kept as simple and readable as possible.

https://preview.redd.it/f8qx4rup3gtg1.png?width=1503&format=png&auto=webp&s=e35fb2346b79dd65a966a764fe406e4ae0c5f2c2

Not all videos are included here — only the ones I thought were the best (and even those are just decent in dev). Everything else, including all workflow files, is available on Google Drive with model names in the filenames: [**Google Drive folder**](https://drive.google.com/drive/folders/1Hdm2dfRT62d0dDg5ldX1Wr8lazboRbW5?usp=sharing)

# Benchmark Results

Each model was run twice — first to load, second to measure time. With GGUF models something weird happened: upscale iteration time grew several times over, which inflated total generation time significantly.

**Dev — 1280x720, steps=35, cfg=3, fps=24, duration=10s (241 frames), no upscale** sampler: euler | scheduler: linear\_quadratic

https://preview.redd.it/1bknutt85gtg1.png?width=1500&format=png&auto=webp&s=968daecc39d5bf57b6d1a05e472e099f3ae41e04

*Dev-FULL*

https://reddit.com/link/1sdgu9x/video/2ixoekc04gtg1/player

**Distilled — 1280x720, steps=15, cfg=1, fps=24, duration=10s (241 frames), no upscale** sampler: euler | scheduler: linear\_quadratic

https://preview.redd.it/0ng8zas95gtg1.png?width=1500&format=png&auto=webp&s=138d310b69ba141556d38b79e25d507f254efc1a

*Distilled-FULL*

https://reddit.com/link/1sdgu9x/video/z9p7hn7a4gtg1/player

**Dev - Distilled + Upscale — input 960x544 → target 1920x1080, steps=8+4, cfg=1, fps=24, duration=10s (241 frames), upscale x2** sampler: euler | scheduler: linear\_quadratic

https://preview.redd.it/3rpk26db5gtg1.png?width=1600&format=png&auto=webp&s=af9b5b39d90beab395dcf4592fffa07dc4030246

*Distilled-FP8+Upscale*

https://reddit.com/link/1sdgu9x/video/eby8rljl4gtg1/player

**Dev - Distilled transformer + GGUF + Upscale — input 960x544 → target 1920x1080, steps=8+4, cfg=1, fps=24, duration=10s (241 frames), upscale x2** sampler: euler | scheduler: linear\_quadratic

https://preview.redd.it/gd631mac5gtg1.png?width=1920&format=png&auto=webp&s=e8862a4fdfc18a90de0b83d2d9ec2b4d285638d1

*Distilled-gguf+Upscaler*

https://reddit.com/link/1sdgu9x/video/a4spdwi25gtg1/player

# Shameless Self-Promo

I built this node after finishing the tests — and honestly wish I had it during them. Would have made organizing and labeling output footage a lot easier.

[**Aligned Text Overlay Video**](https://github.com/Rogala/ComfyUI-rogala?tab=readme-ov-file#aligned-text-overlay-video)

Renders a multi-line text block onto every frame of a video tensor. Supports `%NodeTitle.param%` template tags resolved from the active ComfyUI prompt.
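For context, the core idea is straightforward. Here's a minimal sketch, not the node's actual code: it assumes ComfyUI's usual (frames, height, width, channels) float IMAGE tensor and uses Pillow; the function name and defaults are mine, and the real `%NodeTitle.param%` resolution is omitted.

```python
# Minimal sketch (not the node's real implementation): draw a multi-line text
# block onto every frame of a ComfyUI IMAGE tensor, (frames, H, W, C) in 0..1.
import numpy as np
import torch
from PIL import Image, ImageDraw, ImageFont

def overlay_text(frames: torch.Tensor, text: str, xy=(16, 16)) -> torch.Tensor:
    font = ImageFont.load_default()  # swap in ImageFont.truetype(...) in practice
    out = []
    for frame in frames:
        img = Image.fromarray((frame.cpu().numpy() * 255).astype(np.uint8))
        # multiline_text handles the multi-line block; in the real node,
        # %NodeTitle.param% tags would already be resolved to values here
        ImageDraw.Draw(img).multiline_text(xy, text, fill=(255, 255, 255), font=font)
        out.append(torch.from_numpy(np.asarray(img).astype(np.float32) / 255.0))
    return torch.stack(out)
```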

https://preview.redd.it/nepdj0h65gtg1.png?width=1829&format=png&auto=webp&s=c9ad0041e503ff3079d5d17047c34abcfde47002

Check out my GitHub page for a few more repos: [**github.com/Rogala**](https://github.com/Rogala)

https://redd.it/1sdgu9x
@rStableDiffusion
Just a Reminder: if you want ComfyUI to generate faster, just ask it! Add `--fast` to your startup parameters (in your *.bat file) to get roughly a 20-25% boost, depending on the model.
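If you're not sure where that goes: for the Windows portable build it's the launch line inside the *.bat file. A sketch of what that typically looks like (stock portable layout; adjust the path to your own install):

```
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --fast
```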

https://redd.it/1sdv4nz
@rStableDiffusion
FLUX.2 [dev] (FULL - not Klein) works really well in ComfyUI now!
https://redd.it/1sdyjcr
@rStableDiffusion
New models for prompt generation - Qwen3

While I don't provide inferencing services anymore, I do like to train models. I took a base model that does well on the UGI leaderboards (it's my favorite Qwen3 model, because it's hard to uncap a thinking model). It's small enough to run on a potato, but it sucks at writing prompts. I'm lazy, so I want to give it an idea and get 1, maybe 10, prompts generated for me. They also shouldn't read as stupid to the image generation model, and the base model, though abliterated, couldn't figure that out.

So here's the first cut that solves the problem. I compared the base model with the tuned model, and the tuned one is much, much better at writing prompts. It's subjective, so I read the outputs myself. I was happy.

Safetensors version: https://huggingface.co/goonsai-com/Qwen3-gabliterated-image-generation


GGUF version: https://huggingface.co/goonsai-com/Qwen3-gabliterated-image-generation-gguf


This stuff isn't even hard anymore, but it's hard in other ways.

I'd love to hear from you if it works for video as well as it does for writing image prompts. So the way I do this is to give it an instruction around the idea:

```
You have to write image generation prompts for images 1 to 4 with the following concepts. Each prompt is independent of context to the image generation model.

{story or premise or idea}
```
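If you'd rather script it than chat, here's a minimal sketch with Hugging Face `transformers`. The instruction text comes from above; the sample idea and generation settings are my own illustrative guesses, not the author's recommendations.

```python
# Minimal sketch: load the tuned model and ask it for image prompts.
# Sampling settings are illustrative guesses, not recommended values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "goonsai-com/Qwen3-gabliterated-image-generation"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

idea = "a lighthouse keeper who collects storms"  # your story, premise, or idea
instruction = (
    "You have to write image generation prompts for images 1 to 4 with the "
    "following concepts. Each prompt is independent of context to the image "
    f"generation model.\n\n{idea}"
)
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```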

https://redd.it/1sdvlan
@rStableDiffusion