LTX 2.3 ID-LoRA with First-Last Frame
The official ComfyUI ID-LoRA workflow for LTX-Video 2.3 only supports first-frame conditioning out of the box, which limits how much control you have over character motion and pose. I wanted to add last-frame support with minimal changes to the original — no restructuring, no new samplers, just surgical node edits. You can grab the modified workflow here.
What was changed:
The default workflow uses LTXVImgToVideoInplace (comfy-core) for image conditioning in both the low-res and high-res sampling passes. This node only handles a single frame at a fixed position. The fix was to swap both instances out for LTXVImgToVideoInplaceKJ from KJNodes, which supports multiple images at arbitrary frame positions in a single call.
Concretely:
1. Added last-frame preprocessing — two new nodes mirror the existing first-frame preprocessing pipeline: a ResizeImagesByLongerEdge (1536px) followed by LTXVPreprocess. These feed the last-frame image into both sampling passes.
2. Low-res pass — the LTXVImgToVideoInplace node was replaced with LTXVImgToVideoInplaceKJ configured for 2 images: first frame at position 0, last frame at position -1, both at strength 0.7. One node, both frames conditioned simultaneously.
3. High-res pass — the same conversion applied to the conditioning node after LTXVLatentUpsampler. Both frames are re-conditioned at strength 1.0 so the last frame gets sharpened in the upscale pass just like the first frame. Without this step the last frame came out noticeably blurrier.
4. New subgraph input — a last_frame image input was added to the workflow's subgraph, wired to a LoadImage node on the canvas.
That's it — 2 node type swaps, 2 preprocessing nodes, 1 new input. Everything else (sampler, audio conditioning, LoRA stacking, the upscale pipeline) is untouched from the official Comfy Cloud release. Let me know if you have any questions. Cheers!
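For anyone scripting against this workflow: the multi-image node's position convention (0 for the first frame, -1 for the last) can be resolved to absolute indices the same way Python negative indexing works. A minimal sketch with a hypothetical helper name; the actual KJNodes implementation may differ:

```python
def resolve_frame_position(pos: int, num_frames: int) -> int:
    """Map a possibly-negative conditioning position (0 = first frame,
    -1 = last frame) to an absolute frame index.

    Hypothetical helper for illustration; not part of KJNodes itself.
    """
    if not -num_frames <= pos < num_frames:
        raise ValueError(f"position {pos} out of range for {num_frames} frames")
    return pos % num_frames


# With a 121-frame clip, the two conditioned positions from the workflow:
print(resolve_frame_position(0, 121))   # first frame -> index 0
print(resolve_frame_position(-1, 121))  # last frame  -> index 120
```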
https://redd.it/1t71x0r
@rStableDiffusion
Why is it that the 3-year-old SDXL is still the best base for porn checkpoints, where the best ones on Civitai produce materially better images than the Z-Image or Flux porn checkpoints in terms of realism and skin texture?
https://redd.it/1t71cs5
@rStableDiffusion
Continuous-Time Distribution Matching: A new SOTA method for step distillation.
https://redd.it/1t76p1t
@rStableDiffusion
Inpainting with LTXV 2.3. Results after two weeks of R&D.
Hello!
I am a designer at DOGMA; we do AI work for TV ads, shows, and movies. A Netflix show we worked on recently came out on Netflix Italy, and the company had its first meeting in Hollywood last month.
50% of our work is inpainting on videos, and 100% of our work for Netflix was inpainting, so I've spent the last few weeks doing R&D with LTXV 2.3 to see if and how the tool can help with the practical needs of the movie business. We strongly believe in the sociocultural importance of open-source.
First of all, huge thanks to u/ltxmodel for becoming the main paladin of the democratization of open-source video generation tools, and for the constant improvements to their model. The incredible HDR LoRA is something we were not expecting so soon; please keep up the amazing work. From our tests, LTXV 2.3 T2V and I2V can be pushed locally up to 5K resolution, with results that have very little to envy in the closed-source Seedance 2. Congratulations also to u/RoundAwareness5490 for his outstanding experimental work and effort in creating LoRAs that extend the capabilities of the main model.
Here is the recap of the R&D (translated from Italian to English).
\---
Method 1 / No inpainting LoRA:
You use Add Guide Multi with 2 reference frames, first and last, while the original video goes into VAE Encode. Then you apply an LTXV latent mask to the area that needs to be modified.
Problems: as always when using multiple guide inputs for inpainting, some parts flicker and do not match the original video, especially in the frames close to the first and last reference frames. There is no other way to provide reference frames with this method except by adding more entries in Add Guide Multi. In practice, it is a kind of denoise. It works very well if you do not need precision and can avoid reference frames, relying only on the prompt/LoRA.
\---
Method 2 / Inpainting with the model ltx23_inpaint_masked_r2v_rank32_v1_3000steps.safetensors:
The 3000-step version seems to be the only one that works most of the time.
This model is trained to take as input a video where the original video is on the right, with the part to be inpainted marked in magenta, and a small reference frame on the left. As output, it provides the final inpainted video using that reference. It also sometimes works if you send as input the whole video with no reference and a white overlay on the masked area (similar to VACE).
Problems: it is excellent if you put Trump’s face in the small reference frame, but terrible if you need something precise, because the mini-frame is not even 200 px wide, so it has no way to capture precise information. Adding Add Guide Multi partly solves this, but then you are back to the Add Guide Multi problem, meaning flickering and, above all, a mismatch with the original video close to the reference frames. Sending as input only the video with the magenta-masked area, with the first and last frames already set the way you want them, often, but not always, results in videos where the magenta or white artifacts come back in the form of smoke or solid color.
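The magenta-marking convention above is simple to reproduce when preparing inputs. A minimal pure-Python sketch, with frames as rows of RGB tuples; the real preprocessing presumably works on image tensors, and the helper name is mine:

```python
MAGENTA = (255, 0, 255)

def mark_masked_area(frame, mask):
    """Paint masked pixels magenta, matching the input convention the
    inpainting LoRA was trained on (per the description above).

    frame: 2-D list of (r, g, b) tuples; mask: 2-D list of 0/1 flags.
    Hypothetical helper for illustration only.
    """
    return [
        [MAGENTA if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]


frame = [[(10, 10, 10), (20, 20, 20)],
         [(30, 30, 30), (40, 40, 40)]]
mask = [[0, 1],
        [0, 0]]
print(mark_masked_area(frame, mask))
# only the top-right pixel is replaced with (255, 0, 255)
```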
\---
Method 3 / Inpainting with the model ltx23_inpaint_rank128_v1_02500steps.safetensors or ltx23_inpaint_rank128_v1_10000steps.safetensors:
This model does in fact take the area to be inpainted in the same way VACE did. Here, it seems the masked area should be white instead of magenta. This LoRA does not support any kind of reference, so it is useful for inpainting based only on the prompt. Here too, Add Guide Multi can be used to force start and end reference frames, with all the usage problems and inconsistencies of the previous method.
I tried many variations for each method. For example, I tried passing only the video with the mask applied to all frames except the first and last. I tried using a KSampler Advanced to apply denoise only during the final steps. I tried raising the CFG up to 2.5. All these methods sometimes produce decent results, but never consistent ones.
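As a note on the "denoise only during the final steps" experiment: with a KSampler Advanced-style start_at_step/end_at_step split, a denoise fraction maps to a step window roughly like this. This is a sketch of the arithmetic only, not ComfyUI code, and the exact mapping inside ComfyUI's samplers may differ:

```python
def final_steps_window(total_steps: int, denoise: float):
    """Return (start_at_step, end_at_step) so that only the final
    `denoise` fraction of the schedule is actually sampled.

    Illustrative arithmetic only; hypothetical helper.
    """
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    start = total_steps - round(total_steps * denoise)
    return start, total_steps


print(final_steps_window(30, 0.3))  # -> (21, 30): only the last 9 steps run
```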
The video that came out well yesterday was a complete fluke. If you change the mask by 1 px, it may suddenly, randomly, come out well; change the seed or the mask by 1 px again, and the little white or magenta clouds may come back.
\---
Besides, the author of the inpainting LoRA himself added a huge number of clarifications on the project page, which basically amount to: it does not always work perfectly without fiddling with parameters. That means we can use it, but we can hardly hand a general workflow to a junior at the company to speed up production.
None of the official or unofficial workflows I found does the exact kind of work we need: replacing only one part of a video with something for which we provide an exact visual reference, possibly mixed with depth/canny masks, while keeping and matching the original input video exactly, both in terms of resolution and spatiotemporal coherence.
In all these cases, the only way to get back the original video with only the inpainted part changed is still to recomposite the model output over the original video using the mask. This happens because even if you run inference only on a masked part of the latent, your video will still pass through the VAE and therefore it will be modified. We knew this already, but we always keep hoping they will make an ad hoc model or nodes for this.
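The recomposite step described above (pasting the model output back over the original using the mask) is just a per-pixel blend. A minimal sketch in pure Python; a production pipeline would do this on frames or tensors, typically with a feathered mask, and this is not the actual DOGMA setup:

```python
def recomposite(original, generated, mask):
    """Keep the original wherever mask == 0 and take the model output
    wherever mask == 1, so VAE round-trip changes outside the masked
    region are discarded. Values in (0, 1) blend linearly (feathering).

    All arguments are flat lists of floats of equal length; a sketch only.
    """
    return [o * (1.0 - m) + g * m
            for o, g, m in zip(original, generated, mask)]


orig = [0.0, 0.25, 0.5]   # original pixel values
gen  = [1.0, 0.75, 1.0]   # model output (slightly altered by the VAE)
mask = [0.0, 1.0, 0.5]    # keep / replace / half-feathered
print(recomposite(orig, gen, mask))  # -> [0.0, 0.75, 0.75]
```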
There are ways to solve it, and as you saw yesterday, somehow, sooner or later, you can get a result that works. But it requires too much time and too many attempts, at least based on what I have tested so far. What we need is an easy, fast, stable, consistent, and precisely customizable solution.
\---------------
Today I will start re-testing VACE 2.1 and the experimental 2.2 merge to see how it compares. VACE 2.1 felt almost magical: you could feed it very complex videos with depth maps, reference frames, pose maps, and masks, all nested in a single guiding video, and with zero prompt you would get exactly what you were expecting. But its generation capabilities are too old for May 2026.
https://redd.it/1t77h3n
@rStableDiffusion
Revisiting WAN 2.2 for real-person realism, consented LoRA, retuned settings
https://redd.it/1t7cnaj
@rStableDiffusion
Z-Image Turbo for character LoRAs — honest comparison vs Flux after training the same character on both
https://redd.it/1t7g8de
@rStableDiffusion