What's the verdict on Sage Attention 3 now? Or should I stick with Sage 2.2?

I use Z-Image Turbo, Wan 2.2, and LTX 2.3.

I noticed that Sage Attention 3 turned the dress in a video of a dancing woman into trousers when using LTX 2.3. I switched to Sage 2.2 and also tried disabling it entirely, and the issue was fixed.

I actually thought it was the GGUF text encoder causing the dress to turn into pants, but to my surprise it was Sage 3.

I went back to 2.2 and only lost a few seconds of speed, but the quality was as good as with it disabled.
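For anyone who wants to check this on their own setup, here's a minimal sketch (my own, not from the post) that A/B-compares the SageAttention kernel against PyTorch's exact SDPA on the same tensors. It assumes the `sageattention` 2.x Python package and a CUDA GPU; Sage 3's API may differ.

```python
# Minimal A/B sketch: compare SageAttention output against PyTorch SDPA
# on the same tensors to spot drift like the dress/trousers issue.
# Assumes the `sageattention` package (Sage 2.x) and a CUDA GPU.
import torch
import torch.nn.functional as F
from sageattention import sageattn

q = torch.randn(1, 16, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

ref = F.scaled_dot_product_attention(q, k, v)  # exact baseline
out = sageattn(q, k, v, tensor_layout="HND")   # quantized kernel

# A large deviation here points at the kernel, not the text encoder.
print("max abs diff:", (ref - out).abs().max().item())
```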

https://redd.it/1s73r4e
@rStableDiffusion
I went from being a total dummy at ComfyUI to generating this I2V using LTX 2.3. I feel so proud of myself.

https://redd.it/1s76eod
@rStableDiffusion
What can you do if your hardware can generate 15,000 tokens/s?

https://taalas.com/

Demo:

https://chatjimmy.ai/

Saw this posted on r/Qwen_AI and r/LocalLLM today. I also remember seeing it a few years ago when they first published their work, but I'd completely forgotten about it.

Basically, instead of running inference on a graphics card where the model is loaded into memory, they burn the model into the hardware itself. Remember CDs? It's cheap to build compared to GPUs: they use 6nm chips instead of the latest process, and no external memory is needed. The biggest downside is that you can't swap models; there's no flexibility.

Thoughts? Would this make live-streamed AI movies and games possible? You could have an MMO where every single NPC has their own unique dialogue, with no delay, for thousands of players.
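Some back-of-envelope math on that NPC scenario (my own illustrative numbers, not from the post):

```python
# Back-of-envelope: how many players could a fixed-function 15,000 tok/s
# chip serve with short NPC lines? The per-line figures are guesses.
CHIP_TOKENS_PER_S = 15_000
TOKENS_PER_NPC_LINE = 30        # assumed length of one dialogue line
LINES_PER_PLAYER_PER_MIN = 2    # assumed NPC chatter rate per player

tokens_per_player_per_s = TOKENS_PER_NPC_LINE * LINES_PER_PLAYER_PER_MIN / 60
players = CHIP_TOKENS_PER_S / tokens_per_player_per_s
print(f"~{players:,.0f} concurrent players per chip")  # ~15,000
```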

What a crazy world we live in.

https://redd.it/1s77t1e
@rStableDiffusion
I see many people praising Klein, Z-Image (Turbo, Base), and other models, but few examples. Please post what you consider to represent the pinnacle of each model, especially for photorealism.
https://redd.it/1s7ahcc
@rStableDiffusion
I developed an LTX 2.3 program based on the desktop version of LTX, with optimizations that bypass the 32GB VRAM limitation. It integrates features such as start/end frames, text-to-video, image-to-video, lip-sync, and video enhancement. The links are in the comments.
https://redd.it/1s7g50w
@rStableDiffusion
Z-Image character LoRA: great success with OneTrainer using these settings.

For Z-Image Base.

OneTrainer GitHub: https://github.com/Nerogar/OneTrainer

Go to https://civitai.com/articles/25701 and grab the file named z-image-base-onetrainer.json from the resources section. I can't share the results, because reasons, but give it a try; it blew my mind. I put it together from random tips I read across multiple subs, so I thought I'd share it back.

I used around 50 images captioned briefly (trigger. expression. pose. angle. clothes. background, 2-3 words each), e.g.: "Natasha. Neutral expression. Reclined on sofa. Low angle handheld selfie. Wearing blue dress. Living room background."

Poses, long shots, low angles, high angles, selfies, positions, expressions, everything works like a charm (provided you captioned for them in your dataset).
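If you're scripting the dataset, here's a throwaway sketch (mine, not from the article) that writes sidecar .txt captions in that short segmented format; trainers like OneTrainer generally pick up same-name .txt files next to the images. The paths and metadata dict are hypothetical.

```python
# Sketch: write sidecar .txt captions in the short "trigger. expression.
# pose. angle. clothes. background." format described above. The dataset
# path and metadata dict are placeholders; fill them in per image.
from pathlib import Path

DATASET = Path("dataset/natasha")   # assumed layout: images + same-name .txt
metadata = {
    "img_001.jpg": ("Natasha", "Neutral expression", "Reclined on sofa",
                    "Low angle handheld selfie", "Wearing blue dress",
                    "Living room background"),
}

for image_name, parts in metadata.items():
    caption = ". ".join(parts) + "."
    (DATASET / image_name).with_suffix(".txt").write_text(caption)
```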

Would be great if I found something similar for Chroma next.

My contribution is configuring it to work with 1024-resolution images, since most of the guides I've seen are for 512.

Works incredibly well when generating at FHD; I use the distill LoRA with 8 steps, so it's reasonably fast. Workflow: https://pastebin.com/UacpHZUG

There are more tips on how to set up the dataset and captions in the article as well, if you're interested.

https://redd.it/1s7fr2b
@rStableDiffusion
Inspired by u/goddess_peeler's work, I created a "VACE Transition Builder" node.

u/goddess_peeler shared a great workflow yesterday. It allows entering the path to a folder and having all the clips stitched together using VACE.

This works amazingly well, so I thought of converting it into a node instead.

https://preview.redd.it/hbth1oy1f4sg1.png?width=1891&format=png&auto=webp&s=7c1b496afabd1947dcb1e0bcccd8fb2b9812d802

For those who haven't seen his post: it basically allows creating automatic transitions between clips and then stitching them all together, making long video generation a breeze. This node aims to replicate his workflow, with the added bonus of being more streamlined and allowing easy clip selection and re-ordering. Mousing over a clip shows a preview of it.

The options node is only needed if you want to tweak the defaults; when it's not added, the same defaults found in the workflow are used. I plan on exposing some of these in the Comfy preferences, so the defaults can be changed.
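For anyone curious what a folder-path input looks like under the hood, here's a bare-bones ComfyUI custom-node skeleton in that spirit. This is my own sketch, not the author's code, and the class/node names are made up.

```python
# Bare-bones ComfyUI custom node with a folder-path input, in the spirit
# of the transition builder (not the author's actual code).
import os

class VaceClipFolder:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"folder": ("STRING", {"default": ""})}}

    RETURN_TYPES = ("STRING",)       # newline-separated clip paths
    FUNCTION = "list_clips"
    CATEGORY = "video/vace"

    def list_clips(self, folder):
        clips = sorted(
            os.path.join(folder, f) for f in os.listdir(folder)
            if f.lower().endswith((".mp4", ".webm", ".mov"))
        )
        return ("\n".join(clips),)

NODE_CLASS_MAPPINGS = {"VaceClipFolder": VaceClipFolder}
```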

You can find this node here.
Hats off again to u/goddess_peeler for a great solution!

I'm still unsure about the name, though.
I hesitated between this and "VACE Stitcher"... any preference? 😅

https://redd.it/1s7ilwe
@rStableDiffusion
Can LTX-2.3 do video-to-video, like LTX-2?

A great feature of LTX-2 is that it can take a video sequence as input and use the voices and motion in it as a seed for generating a new video starting from its last frame.

Can LTX-2.3 do that too? I haven't seen a workflow that does this yet.
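Until someone publishes a proper v2v workflow, one stopgap (my own sketch, using OpenCV) is to extract the last frame of the previous clip and feed it to a plain I2V workflow; this obviously carries over the frame but not the voices or motion.

```python
# Stopgap sketch: pull the last frame of a clip with OpenCV so it can be
# fed into a plain image-to-video workflow as the start image.
# Note: frame-accurate seeking can be unreliable with some codecs.
import cv2

cap = cv2.VideoCapture("previous_clip.mp4")
last = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
cap.set(cv2.CAP_PROP_POS_FRAMES, last)
ok, frame = cap.read()
cap.release()
if ok:
    cv2.imwrite("start_frame.png", frame)  # use as the I2V input image
```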

https://redd.it/1s7ixma
@rStableDiffusion