Nvidia solved VAE? Fast and High-Resolution Latent Decoding with Pixel Diffusion

https://redd.it/1tn3m6n
@rStableDiffusion
FeatherOps: Fast fp8 matmul on RDNA3 without native fp8, now supports more models

https://github.com/woct0rdho/ComfyUI-FeatherOps

The kernel itself has not changed much since March; most of my work went into ComfyUI integration. Currently tested models are Anima, LTX 2.3, Qwen-Image, and Wan; other models may also work out of the box. For some workloads you may see a 30-50% speedup, but your mileage may vary.

https://redd.it/1tn0noo
@rStableDiffusion
Super detailed comparison between klein-4b, nucleus-image, z-image-turbo, sana-1.5-1.6b & qwen-image-gen

https://preview.redd.it/jlzq6sumba3h1.png?width=2496&format=png&auto=webp&s=5e384a54de5831ed5041b0ddbcbe435739d8f0d2

The gallery shows images from every model for each of the 192 prompts.

Full gallery here: https://imagebench.ai/gallery?v=shhhhhssshs.ssssss

Let me know which model to test next!

https://redd.it/1tn97ls
@rStableDiffusion
Want to pose your characters? Here's Wan 2.2 Pose Control workflow

https://i.redd.it/2qr1rvpwma3h1.gif

# Wan 2.2 Pose Control

For some time I've been trying to solve character posing with open-weight models. My previous attempt with Flux.2 Klein was reasonably good but suffered from style bleeding and didn't respect original character proportions (like head-to-body ratio). Character consistency is something image-editing models still struggle with (especially for stylized characters) but there's one exception: **Wan2.2 I2V Video**. Character consistency is something you can expect from a video model, right?

After extensive experiments with the I2V Wan model I discovered a certain prompting technique that lets you "put character from image\_1 into pose from image\_2".

Here's [the workflow link](https://civitai.com/models/2650202/wan-22-pose-control) for the impatient.

So, our task sounds like this:
**"Take this character on the left and make her copy the pose on the right"**

https://preview.redd.it/lxny73n0na3h1.png?width=1309&format=png&auto=webp&s=6183937e4a60a5f7aabd3b5d5d46d8f784a5f960

There are two ways to do this using local open-weight models:

1. Flux.2 Klein character replacement workflow
2. Wan 2.2 Pose Control workflow (**this is what this post is about**)

And this is what the result looks like for each method:

https://preview.redd.it/pk5u35r7na3h1.png?width=1446&format=png&auto=webp&s=d68aa0f2c59032f2971f5502802ae3240199f3be

Let's compare the results with closed-source models too. Character design is solved but not style fidelity. I guess even big multimodal image-editing models can't reach true character consistency, while for video models it's just an innate property.

https://preview.redd.it/az2b2uq9na3h1.png?width=1334&format=png&auto=webp&s=966e35e80e2d36605f7830d2acbe7bc34437e9a2

The idea is simple: ask Wan 2.2 to generate a sequence of 80 frames using First-Frame-Last-Frame mode. This frame sequence consists of 4 parts:

1. The subject is just standing there
2. The subject moves, copying the pose from the pose reference
3. The subject morphs into the character from the pose reference
4. The character from the pose reference is in the frame

Our goal here is to get a single frame where the subject is standing/sitting/lying in the pose from the pose reference image but hasn't yet morphed into the character from that image. To do that, we have to structure the text prompt so that the transition from the first frame to the last frame is as smooth as possible. Information about the subject (design and style) and information about the pose then meet in the middle of the frame sequence and give us the desired result.

And yes, **we generate 80 frames just to get a single image**.
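To make the "meet in the middle" idea concrete, here is a minimal Python sketch of where to look for the usable frame. It assumes the four parts of the sequence are roughly equal in length, which the post does not state; the function names and the `window` parameter are hypothetical and are not part of the workflow. In practice you would still check the candidate frames by eye.

```python
def segment_frames(total_frames: int, num_segments: int) -> list[range]:
    """Split frame indices 0..total_frames-1 into consecutive segments.

    Assumption: segments are equal in length; the last one absorbs any remainder.
    """
    size = total_frames // num_segments
    segments = []
    for i in range(num_segments):
        start = i * size
        stop = (i + 1) * size if i < num_segments - 1 else total_frames
        segments.append(range(start, stop))
    return segments


def candidate_frames(total_frames: int = 80, num_segments: int = 4,
                     pose_segment: int = 1, window: int = 5) -> list[int]:
    """Return frame indices around the end of the 'moves into pose' segment.

    pose_segment is the zero-based index of the part where the subject
    copies the pose (part 2 in the list above). The subject should be in
    the target pose, but not yet morphed, near the end of that part.
    """
    segments = segment_frames(total_frames, num_segments)
    boundary = segments[pose_segment].stop  # first frame of the next part
    start = max(0, boundary - window)
    stop = min(total_frames, boundary + window)
    return list(range(start, stop))


if __name__ == "__main__":
    print(candidate_frames())
```

With the defaults this narrows 80 frames down to ten frames around the middle of the sequence, which is a much smaller set to inspect manually.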

# How to write a structured prompt

Here are the two prompts that were used in the example video above:

Silver hair woman

0s: girl with short silver hair, in green pleated skirt and leather boots is standing
1s: girl with short silver hair, in green pleated skirt and leather boots turns to the left, kneels, places left hand on her head, puts right hand between her legs
2s: she keeps her pose frozen in place. Scene transitions into another scene
3s: her body transforms into another character with white skin, bald head at white background


Black beard man

0s: black man with sharp teeth in green suit and dark pants is standing at white background
1s: black man with sharp teeth in green suit and dark pants sits in the armchair with tilted head and hand at his chin, crosses legs
2s: he keeps his pose frozen in place. Scene transitions into another scene
3s: her body transforms into another character short orange dress, orange top hat, brown hair and fishnet


The subject description is repeated, so we can extract it using `Apply Text Template` from the **comfy-mtb** extension.

https://preview.redd.it/dxq3d6wgna3h1.png?width=1028&format=png&auto=webp&s=e857cd1f4208b628cf4d647f44f425c6f180ce3b
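As a rough illustration of the same idea outside ComfyUI, the sketch below assembles the four timestamped lines from a subject description written only once. This is a plain-Python stand-in, not the `Apply Text Template` node; the function name, its parameters, and the pronoun handling are assumptions made for the example.

```python
def build_pose_prompt(subject: str, start_action: str, pose_action: str,
                      target: str, pronoun: str = "she") -> str:
    """Build a four-line timestamped prompt in the structure shown above.

    subject      - character description, reused on the 0s and 1s lines
    start_action - what the subject does in the first frame
    pose_action  - how the subject moves into the reference pose
    target       - description of the character in the pose reference
    """
    possessive = {"she": "her", "he": "his"}.get(pronoun, "their")
    lines = [
        f"0s: {subject} {start_action}",
        f"1s: {subject} {pose_action}",
        f"2s: {pronoun} keeps {possessive} pose frozen in place. "
        "Scene transitions into another scene",
        f"3s: {possessive} body transforms into another character {target}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pose_prompt(
        subject="black man with sharp teeth in green suit and dark pants",
        start_action="is standing at white background",
        pose_action="sits in the armchair with tilted head and hand at his chin, crosses legs",
        target="with short orange dress, orange top hat, brown hair and fishnet",
        pronoun="he",
    ))
```

Keeping the subject in one variable means the 0s and 1s lines cannot drift apart when you edit the description.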

We can extract the subject description and get this template:

Silver hair woman

0s: