r/StableDiffusion – Telegram

r/StableDiffusion

@rStableDiffusion

57 subscribers

45.7K photos

3.01K videos

1 file

21.2K links

reddit.com/r/StableDiffusion || reddit.com/r/sdforall

@reddit2telegram || @r_channels

r/StableDiffusion

r/StableDiffusion

9 views10:40

r/StableDiffusion

FeatherOps: Fast fp8 matmul on RDNA3 without native fp8, now supports more models

https://github.com/woct0rdho/ComfyUI-FeatherOps

The kernel itself has changed little since March; most of my work went into ComfyUI integration. Models tested so far are Anima, LTX 2.3, Qwen-Image, and Wan, and other models may also work out of the box. Some workloads see a 30-50% speedup, but your mileage may vary.
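To make "fp8 matmul without native fp8" concrete, here is a minimal sketch of the general idea: weights are stored as 8-bit E4M3 values and upcast in software before the multiply. This is not the FeatherOps kernel, which runs on the GPU. The function names `decode_fp8_e4m3` and `mixed_matvec` are hypothetical, and the choice of the E4M3 "fn" encoding (bias 7, no infinities) is an assumption, since the post does not say which fp8 variant is used.

```python
import math


def decode_fp8_e4m3(byte):
    """Decode one fp8 E4M3 (fn variant) byte into a Python float.

    Layout assumed here: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # this variant has NaN but no infinities
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def mixed_matvec(weights_fp8, activations):
    """Multiply an fp8-encoded matrix (rows of bytes) by a float vector.

    Each weight is upcast on the fly, so storage stays at 8 bits per weight
    while the arithmetic happens in a wider format.
    """
    return [
        sum(decode_fp8_e4m3(b) * x for b, x in zip(row, activations))
        for row in weights_fp8
    ]


weights = [[0x38, 0x40], [0xC0, 0x38]]  # encodes [[1.0, 2.0], [-2.0, 1.0]]
print(mixed_matvec(weights, [1.0, 0.5]))
```

A real kernel would do the upcast inside the matmul tiles so that only the 8-bit weights travel through memory.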

https://redd.it/1tn0noo
@rStableDiffusion

GitHub - woct0rdho/ComfyUI-FeatherOps: Fast fp16-fp8 mixed precision matmul on RDNA3/3.5 GPUs without native fp8

8 views11:40

r/StableDiffusion

Super detailed comparison between klein-4b, nucleus-image, z-image-turbo, sana-1.5-1.6b, and qwen-image-gen

https://preview.redd.it/jlzq6sumba3h1.png?width=2496&format=png&auto=webp&s=5e384a54de5831ed5041b0ddbcbe435739d8f0d2

The gallery showcases images for all models for 192 prompts.

Full gallery here: https://imagebench.ai/gallery?v=shhhhhssshs.ssssss

Let me know which model to test next!

https://redd.it/1tn97ls
@rStableDiffusion

7 views14:40

r/StableDiffusion

Want to pose your characters? Here's Wan 2.2 Pose Control workflow

https://i.redd.it/2qr1rvpwma3h1.gif

# Wan 2.2 Pose Control

For some time I've been trying to solve character posing with open-weight models. My previous attempt with Flux.2 Klein was reasonably good but suffered from style bleeding and didn't respect original character proportions (like head-to-body ratio). Character consistency is something image-editing models still struggle with (especially for stylized characters) but there's one exception: **Wan2.2 I2V Video**. Character consistency is something you can expect from a video model, right?

After extensive experiments with the I2V Wan model I discovered a certain prompting technique that lets you "put character from image\_1 into pose from image\_2".

Here's [the workflow link](https://civitai.com/models/2650202/wan-22-pose-control) for the impatient.

So, our task sounds like this:
**"Take this character on the left and make her copy the pose on the right"**

https://preview.redd.it/lxny73n0na3h1.png?width=1309&format=png&auto=webp&s=6183937e4a60a5f7aabd3b5d5d46d8f784a5f960

There are two ways to do this using local open-weight models:

1. Flux.2 Klein character replacement workflow
2. Wan 2.2 Pose Control workflow (**this is what this post is about**)

And this is what the result looks like for each method:

https://preview.redd.it/pk5u35r7na3h1.png?width=1446&format=png&auto=webp&s=d68aa0f2c59032f2971f5502802ae3240199f3be

Let's compare the results with closed-source models too. They get the character design right but not the style fidelity. I guess even big multimodal image-editing models can't reach true character consistency, while for video models it's just an innate property.

https://preview.redd.it/az2b2uq9na3h1.png?width=1334&format=png&auto=webp&s=966e35e80e2d36605f7830d2acbe7bc34437e9a2

The idea is simple: ask Wan 2.2 to generate a sequence of 80 frames using First-Frame-Last-Frame mode. This frame sequence consists of 4 parts:

1. The subject is just standing there
2. The subject moves, copying the pose from the pose reference
3. The subject morphs into the character from the pose reference
4. The character from the pose reference is in the frame

Our goal here is to get a single frame where our subject is standing/sitting/lying in the pose from the pose reference image, but hasn't yet morphed into the character from the pose reference image. To do that, we have to structure our text prompt so that the transition from the first frame to the last frame is as smooth as possible. Information about the subject (design and style) and information about the pose then meet in the middle of the frame sequence to give us the desired result.

And yes, **we generate 80 frames just to get a single image**.
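Picking that single frame can be narrowed down with a small helper. The sketch below is only an illustration: the function `frames_in_window` is hypothetical, the 16 fps output rate is an assumption (the post does not state it), and treating the `2s:` "frozen pose" segment of the prompt as literal seconds of output is also an assumption. The final choice still needs to be made by eye.

```python
from typing import List


def frames_in_window(total_frames: int, fps: float, start_s: float, end_s: float) -> List[int]:
    """Return indices of frames whose timestamp falls in [start_s, end_s)."""
    return [i for i in range(total_frames) if start_s <= i / fps < end_s]


# Assumed values: 80 frames (from the post), 16 fps (not stated in the post),
# and the "2s" prompt segment taken as the window from 2.0 s to 3.0 s.
candidates = frames_in_window(total_frames=80, fps=16, start_s=2.0, end_s=3.0)
print(candidates[0], candidates[-1], len(candidates))
```

With these assumed values the candidates are frames 32 through 47, which is a much smaller set to inspect than all 80.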

# How to write a structured prompt

Here are the two prompts that were used in the example video above:

Silver hair woman

0s: girl with short silver hair, in green pleated skirt and leather boots is standing
1s: girl with short silver hair, in green pleated skirt and leather boots turns to the left, kneels, places left hand on her head, puts right hand between her legs
2s: she keeps her pose frozen in place. Scene transitions into another scene
3s: her body transforms into another character with white skin, bald head at white background

Black beard man

0s: black man with sharp teeth in green suit and dark pants is standing at white background
1s: black man with sharp teeth in green suit and dark pants sits in the armchair with tilted head and hand at his chin, crosses legs
2s: he keeps his pose frozen in place. Scene transitions into another scene
3s: his body transforms into another character short orange dress, orange top hat, brown hair and fishnet

The subject description is repeated so we can extract it using `Apply Text Template` from the **comfy-mtb** extension.

https://preview.redd.it/dxq3d6wgna3h1.png?width=1028&format=png&auto=webp&s=e857cd1f4208b628cf4d647f44f425c6f180ce3b

We can extract the subject description and get this template:

Silver hair woman

0s: {var_1} is standing
1s: {var_1} turns to the left, kneels, places left hand on her head, puts right hand between her legs
2s: she keeps her pose frozen in place. Scene transitions into another scene
3s: her body transforms into another character with white skin, bald head at white background

Black beard man

0s: {var_1} is standing at white background
1s: {var_1} sits in the armchair with tilted head and hand at his chin, crosses legs
2s: he keeps his pose frozen in place. Scene transitions into another scene
3s: his body transforms into another character short orange dress, orange top hat, brown hair and fishnet

Let's examine the 4 parts of this prompt.

**0s - Initial description**

This is where you describe your first frame. For the most part, '`is standing`' is enough, but you can also specify the initial pose of your subject.

**1s - Actual posing**

This is where you specify the movements the subject must make to get from the initial pose to the target pose. Simple movements (turns left, sits down, crouches, raises hand) separated by commas work best. You can also add '`Camera follows his movement`' if your target pose requires a different camera angle.

**2s - Pause before scene transition**

Always the same: `he/she keeps his pose frozen in place. Scene transitions into another scene`. The "`Scene transitions into another scene`" part is the most important here: surprisingly, Wan 2.2 respects this boundary.

**3s - Anchoring your last frame**

Goes like this: `body transforms into another character <description of the character on the last frame>`. We want Wan 2.2 to understand that the character at the start of the video is different from the character at the end of the video.
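The four parts can be assembled programmatically. The sketch below mirrors the prompt structure from this post in plain Python; it is not the `Apply Text Template` node from comfy-mtb, and the function `build_pose_prompt` and its parameter names are hypothetical.

```python
def build_pose_prompt(subject, start_pose, movements, pronoun, possessive, target_description):
    """Assemble the four-part timed prompt described above.

    subject: repeated subject description (what {var_1} stands for)
    start_pose: initial pose, e.g. "is standing"
    movements: list of simple movements, joined with commas
    pronoun / possessive: e.g. "she" / "her" or "he" / "his"
    target_description: description of the character on the last frame
    """
    lines = [
        f"0s: {subject} {start_pose}",
        f"1s: {subject} {', '.join(movements)}",
        f"2s: {pronoun} keeps {possessive} pose frozen in place. "
        "Scene transitions into another scene",
        f"3s: {possessive} body transforms into another character {target_description}",
    ]
    return "\n".join(lines)


prompt = build_pose_prompt(
    subject="girl with short silver hair, in green pleated skirt and leather boots",
    start_pose="is standing",
    movements=["turns to the left", "kneels", "places left hand on her head"],
    pronoun="she",
    possessive="her",
    target_description="with white skin, bald head at white background",
)
print(prompt)
```

Keeping the subject description in one variable guarantees it is repeated identically in the `0s` and `1s` lines, which is the property the template relies on.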

# Practical example

Let's practice what we've learned. Here's our subject and the pose images:

[\*Pose reference](https://preview.redd.it/5afo9gaona3h1.png?width=1621&format=png&auto=webp&s=e3ee78fcc158b27c67914acda186ce09a982faa4)

Start with the subject description. Nothing fancy here:

https://preview.redd.it/rl0ech4jpa3h1.png?width=713&format=png&auto=webp&s=88b33e2cc5805583d9ba2949985f4b3b125b6b73

Next step is to describe movements:

https://preview.redd.it/jb429dylpa3h1.png?width=812&format=png&auto=webp&s=5a838f95b868f2cc1069fded9ba8f0935dbdc672

And lastly, write the transition to the last frame:

https://preview.redd.it/x474ueqopa3h1.png?width=793&format=png&auto=webp&s=1f820926f036b8471cad89846cfaedb379dc91c1

Unfortunately it fails:

https://i.redd.it/z9y1iogtpa3h1.gif

Wan 2.2 has managed to capture the gun's position but not the pose. The main reason here is that the black clothes in our target image don't let the model "process" the pose. Luckily we can fix it in Flux.2:

`remove hair, remove clothes and draw this person bald and in skin tone underwear. Turn into white wireframe figure`

https://preview.redd.it/tok1fngwpa3h1.png?width=313&format=png&auto=webp&s=57e8f02dcb414fba5819c87c1add4cff0a5fbab5

Run the Pose Control workflow again with the updated prompt:

https://preview.redd.it/s2w5ytcxpa3h1.png?width=726&format=png&auto=webp&s=6c68a52a10ce001302375d7ab1b621a8ffdb7c25

This time the result is much better:

https://i.redd.it/g4olola1qa3h1.gif

With this knowledge you can adapt this workflow for your specific case.

[Link to the workflow](https://civitai.com/models/2650202/wan-22-pose-control) (it includes a note about the recommended Wan 2.2 finetune)

Some tips:

* The whole process works best if there's noticeable contrast between the first frame and the last frame: different hair color, skin color, background, etc. You can even pre-process your pose reference with some other model, turning it into a wireframe mannequin figure, so Wan 2.2 has a better chance of reading the pose.
* If some elements of the character design change (gloves tend to disappear too early), add them to the subject description prompt so the model remembers them.
* If your subject image and pose reference image have different sizes, try adding "Camera zooms in capturing new view" or "Camera zooms out capturing new view".

https://redd.it/1tnbikd
@rStableDiffusion

7 views15:40

r/StableDiffusion

"Trauma" A dark and dramatic animated film (Wan 2.2 ComfyUI)
https://youtu.be/mms1tLikH58

https://redd.it/1tndb3d
@rStableDiffusion

185 | "Trauma" | A dark and dramatic animated film (Wan 2.2 ComfyUI) [4K]

"Trauma" Check out this dark and dramatic experimental music video animated with local Wan 2.2 text2video, exploring a retro anime aesthetic. This is an animation experiment where the director desperately tries to manipulate the viewer’s emotions in the most…

8 views16:40

r/StableDiffusion

LTX 2.3 12GB GGUF Director Workflows! What a great node this one is!
https://redd.it/1tncun2
@rStableDiffusion

7 views17:40

r/StableDiffusion

ComfyUI node for NVIDIA PiD pixel diffusion decoding

https://redd.it/1tneayo
@rStableDiffusion

7 views18:40

r/StableDiffusion

6 views18:40

r/StableDiffusion

Make any video into VR with Muffins flat 2 VR!
https://youtu.be/c5Dj_0qZBLs

https://redd.it/1tnjwm4
@rStableDiffusion

The workflow uses LTX 2.3 to expand/outpaint the original video into a wider panoramic canvas, then applies the panoramic/fisheye conversion pass and refines the result. I also show the optional depth-based 2D-to-3D SBS branch, the LTX enhancer/upscaler section…

7 views20:40

r/StableDiffusion

Testing ZIT and Flux-1 with "NVIDIA PiD — Pixel Diffusion Decoder"

https://redd.it/1tnk3hg
@rStableDiffusion

9 views21:40

r/StableDiffusion

Anima style explorer + Anima lora explorer

https://redd.it/1tnlz59
@rStableDiffusion

8 views22:40

r/StableDiffusion

7 views22:40

r/StableDiffusion

ScreenDiffusion V0.2 Released - Major Refactoring of V0.1 - Easy Install - Open Source.

https://redd.it/1tnkab6
@rStableDiffusion

9 views23:40

r/StableDiffusion

Icarus
https://redd.it/1tnnz1c
@rStableDiffusion

9 views00:40