ComfyUI Tutorial: LTX 2.3 Just Got Better With Timeline Control On 6GB VRAM
https://youtu.be/L7eE5z4Ih_0
https://redd.it/1tn29fs
@rStableDiffusion
YouTube
ComfyUI Tutorial: LTX Just Got Better With Timeline Control On 6GB VRAM #comfyui #comfyuitutorial
Hello everyone, in this tutorial we'll explore LTX Director, a powerful tool for segmented video generation with the LTX 2.3 model. This custom ComfyUI workflow integrates images, text prompts, and audio files to create dynamic sequences, boosting your…
Nvidia solved VAE? Fast and High-Resolution Latent Decoding with Pixel Diffusion
https://redd.it/1tn3m6n
@rStableDiffusion
My experience generating non-anime art with the Anima base model so far
https://redd.it/1tn39e4
@rStableDiffusion
FeatherOps: Fast fp8 matmul on RDNA3 without native fp8, now supports more models
https://github.com/woct0rdho/ComfyUI-FeatherOps
The kernel itself has not changed much since March; most of my work went into ComfyUI integration. The models tested so far are Anima, LTX 2.3, Qwen-Image, and Wan, and other models may also work out of the box. Some workloads may see a 30-50% speedup, but your mileage may vary.
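For readers unfamiliar with the idea of fp8 matmul on hardware without native fp8, here is a rough Python sketch of the general concept: weights are stored as 8-bit values and decoded to a wider float type just before the multiply. This is not taken from the FeatherOps source. The E4M3 bit layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities) is assumed for illustration, and `decode_e4m3` and `matmul_fp8_weights` are hypothetical names. A real kernel does this on the GPU in fp16, not in Python floats.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one fp8 value in the assumed E4M3 (finite-only) layout."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F  # 4 exponent bits, bias 7
    mantissa = byte & 0x07         # 3 mantissa bits
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan            # the only NaN pattern; there are no infinities
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 2**-6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def matmul_fp8_weights(x, w_bytes):
    """Multiply activations x (M x K floats) by weights stored as fp8 bytes (K x N).

    The weights are widened first, then a plain matmul runs in the wider type.
    """
    w = [[decode_e4m3(b) for b in row] for row in w_bytes]
    cols = len(w[0])
    return [
        [sum(row[k] * w[k][j] for k in range(len(row))) for j in range(cols)]
        for row in x
    ]


# Weights [[1.0, 2.0], [-2.0, 1.0]] encoded as fp8 bytes.
result = matmul_fp8_weights([[1.0, 2.0]], [[0x38, 0x40], [0xC0, 0x38]])
```

The storage saving comes from keeping weights at one byte each; the speedup the post reports depends on how cheaply the decode step can be fused into the matmul, which this sketch does not attempt to model.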
https://redd.it/1tn0noo
@rStableDiffusion
GitHub
GitHub - woct0rdho/ComfyUI-FeatherOps: Fast fp16-fp8 mixed precision matmul on RDNA3/3.5 GPUs without native fp8
Super detailed comparison between klein-4b, nucleus-image, z-image-turbo, sana-1.5-1.6b & qwen-image-gen
https://preview.redd.it/jlzq6sumba3h1.png?width=2496&format=png&auto=webp&s=5e384a54de5831ed5041b0ddbcbe435739d8f0d2
The gallery showcases images for all models for 192 prompts.
Full gallery here: https://imagebench.ai/gallery?v=shhhhhssshs.ssssss
Let me know which model to test next!
https://redd.it/1tn97ls
@rStableDiffusion
Want to pose your characters? Here's a Wan 2.2 Pose Control workflow
https://i.redd.it/2qr1rvpwma3h1.gif
# Wan 2.2 Pose Control
For some time I've been trying to solve character posing with open-weight models. My previous attempt with Flux.2 Klein was reasonably good but suffered from style bleeding and didn't respect original character proportions (like head-to-body ratio). Character consistency is something image-editing models still struggle with (especially for stylized characters) but there's one exception: Wan2.2 I2V Video. Character consistency is something you can expect from a video model, right?
After extensive experiments with the I2V Wan model I discovered a certain prompting technique that lets you "put character from image_1 into pose from image_2".
Here's the workflow link for the impatient.
So, our task sounds like this:
"Take this character on the left and make her copy the pose on the right"
https://preview.redd.it/lxny73n0na3h1.png?width=1309&format=png&auto=webp&s=6183937e4a60a5f7aabd3b5d5d46d8f784a5f960
There are two ways to do this using local open-weight models:
1. Flux.2 Klein character replacement workflow
2. Wan 2.2 Pose Control workflow (this is what this post is about)
And this is what the result looks like for each method:
https://preview.redd.it/pk5u35r7na3h1.png?width=1446&format=png&auto=webp&s=d68aa0f2c59032f2971f5502802ae3240199f3be
Let's compare the results with closed-source models too. Character design is solved but not style fidelity. I guess even big multimodal image-editing models can't reach true character consistency, while for video models it's just an innate property.
https://preview.redd.it/az2b2uq9na3h1.png?width=1334&format=png&auto=webp&s=966e35e80e2d36605f7830d2acbe7bc34437e9a2
The idea is simple: ask Wan 2.2 to generate a sequence of 80 frames using First-Frame-Last-Frame mode. This frame sequence consists of 4 parts:
1. The subject is just standing there
2. The subject moves into the pose from the pose reference
3. The subject morphs into the character from the pose reference
4. The character from the pose reference is in the frame
Our goal here is to get a single frame where our subject is standing/sitting/lying in the pose from the pose reference image, but hasn't yet morphed into the character from the pose reference image. To do that, we have to structure our text prompt so that the transition from the first frame to the last frame is as smooth as possible. Information about the subject (design and style) and information about the pose then meet in the middle of the frame sequence to give us the desired result.
And yes, we generate 80 frames just to get a single image.
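To make the four-part split concrete, here is a minimal Python sketch for deciding which frames to inspect. It assumes the 80 frames divide into four equal parts, which the model does not guarantee, so treat the result as a starting point for eyeballing frames. `split_into_parts`, `candidate_frames`, and the `margin` value are hypothetical names for illustration and are not part of the workflow.

```python
def split_into_parts(total_frames: int = 80, parts: int = 4) -> list[range]:
    """Split frame indices 0..total_frames-1 into contiguous, near-equal ranges."""
    base, extra = divmod(total_frames, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first parts
        ranges.append(range(start, start + size))
        start += size
    return ranges


def candidate_frames(total_frames: int = 80, parts: int = 4, margin: int = 5) -> list[int]:
    """Frames around the point where part 2 (posing) hands over to part 3 (morphing).

    Assumes at least 3 parts. The pose should be reached, and the morph not yet
    visible, somewhere near this boundary.
    """
    segments = split_into_parts(total_frames, parts)
    boundary = segments[2].start  # first frame of part 3
    lo = max(0, boundary - margin)
    hi = min(total_frames, boundary + margin)
    return list(range(lo, hi))


segments = split_into_parts()
frames_to_check = candidate_frames()
```

Under the equal-split assumption the boundary sits at frame 40, so the sketch suggests looking at frames 35 to 44. In practice the best frame may land earlier or later depending on how the prompt paces the motion.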
# How to write a structured prompt
Here are the two prompts that were used in the example video above:
Silver hair woman
0s: girl with short silver hair, in green pleated skirt and leather boots is standing
1s: girl with short silver hair, in green pleated skirt and leather boots turns to the left, kneels, places left hand on her head, puts right hand between her legs
2s: she keeps her pose frozen in place. Scene transitions into another scene
3s: her body transforms into another character with white skin, bald head at white background
Black beard man
0s: black man with sharp teeth in green suit and dark pants is standing at white background
1s: black man with sharp teeth in green suit and dark pants sits in the armchair with tilted head and hand at his chin, crosses legs
2s: he keeps his pose frozen in place. Scene transitions into another scene
3s: her body transforms into another character short orange dress, orange top hat, brown hair and fishnet
The subject description is repeated, so we can extract it using Apply Text Template from the comfy-mtb extension.
https://preview.redd.it/dxq3d6wgna3h1.png?width=1028&format=png&auto=webp&s=e857cd1f4208b628cf4d647f44f425c6f180ce3b
We can extract the subject description and get this template:
Silver hair woman
0s:
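The template above is cut off in this copy of the post, so the sketch below does not try to reproduce it. It only illustrates the general idea of writing the subject description once and substituting it into the timed prompt, using plain Python f-strings. The function `build_prompt` and all of its parameter names are hypothetical, and the syntax is unrelated to the Apply Text Template node. The example values are modelled on the "Black beard man" prompt.

```python
def build_prompt(subject: str, start_action: str, pose_action: str,
                 pronoun: str, possessive: str, target: str) -> str:
    """Assemble a four-step timed prompt, repeating the subject description."""
    lines = [
        f"0s: {subject} {start_action}",
        f"1s: {subject} {pose_action}",
        f"2s: {pronoun} keeps {possessive} pose frozen in place. "
        "Scene transitions into another scene",
        f"3s: {possessive} body transforms into another character {target}",
    ]
    return "\n".join(lines)


prompt = build_prompt(
    subject="black man with sharp teeth in green suit and dark pants",
    start_action="is standing at white background",
    pose_action="sits in the armchair with tilted head and hand at his chin, crosses legs",
    pronoun="he",
    possessive="his",
    target="short orange dress, orange top hat, brown hair and fishnet",
)
print(prompt)
```

Keeping the subject in one variable means a change to the character description only has to be made in one place, and the 0s and 1s lines stay word-for-word identical where they describe the subject.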
Civitai
Wan 2.2 Pose Control - v1.1 | Wan Video Workflows | Civitai
Workflow that lets you pose characters using First-Frame-Last-Frame with target pose as the last frame. For some time I've been trying to solve cha...