Qwen Image 2 papers - does that mean anything?
https://huggingface.co/papers/2605.10730
https://preview.redd.it/cmg25rw5ro0h1.png?width=1990&format=png&auto=webp&s=94f7e04f28fbaaccd504dd2502af38b798e59aae
https://preview.redd.it/vyloqa9nro0h1.png?width=1618&format=png&auto=webp&s=175ee402bff154bca8d691e5ef4c2102d5c8f5a3
"We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution photorealism, robust instruction following, and efficient deployment, especially in text-rich and compositionally complex scenarios.
Qwen-Image-2.0 addresses these challenges by coupling Qwen3-VL as the condition encoder with a Multimodal Diffusion Transformer for joint condition-target modeling, supported by large-scale data curation and a customized multi-stage training pipeline. This enables strong multimodal understanding while preserving flexible generation and editing capabilities.
The model supports instructions of up to 1K tokens for generating text-rich content such as slides, posters, infographics, and comics, while significantly improving multilingual text fidelity and typography. It also enhances photorealistic generation with richer details, more realistic textures, and coherent lighting, and follows complex prompts more reliably across diverse styles. Extensive human evaluations show that Qwen-Image-2.0 substantially outperforms previous Qwen-Image models in both generation and editing, marking a step toward more general, reliable, and practical image generation foundation models."
https://redd.it/1taxowh
@rStableDiffusion
What nobody tells you about retouching shiny stuff (and how AI quietly changed my workflow)
https://www.reddit.com/gallery/1tb1eyg
https://redd.it/1tb1i6c
@rStableDiffusion
INT8 in the age of MXFP8. An investigation into the quality of various quantization types, and their speed.
I've seen some MXFP8 posts recently, so I've been wondering how it compares against other quant types.
Most interesting to me is the comparison against INT8, which, unlike MXFP8, has been hardware-accelerated since the RTX 20 series.
So I've spent the past week testing how INT8, via my comfy node "INT8-Fast" (GitHub: BobJohnson24/ComfyUI-INT8-Fast), compares.
PS: All of the text here is human-written and reflects my own conclusions, with the exception of a single clearly marked paragraph.
TL;DR: A rough quality ranking of the quantization types tested is GGUF Q8 > INT8 ConvRot > MXFP8 > FP8 >= INT8 Row.
# Quick glossary:
INT8: A data type storing numbers from -128 to 127. Like FP8, but using integers.
INT8 Row-wise: A slightly fancier way to store INT8 weights and activations with more granularity: one scale per row instead of per tensor.
INT8 Tensor-wise: The easiest and lowest-quality way to do INT8: a single scale for the whole tensor.
INT8 ConvRot: Row-wise INT8, but the weights and activations are rotated in a way that removes outliers before quantization. Reference paper here. (A minimal sketch of the tensor-wise vs row-wise idea follows this glossary.)
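To make the glossary concrete, here is a minimal sketch of tensor-wise vs row-wise symmetric INT8 quantization in PyTorch. This is only an illustration of the general idea, not the INT8-Fast node's actual code; ConvRot would additionally multiply the weights and activations by an orthogonal rotation before this step so that outliers get spread across many values.

```python
import torch

def quantize_int8_tensorwise(w: torch.Tensor):
    # One scale for the whole tensor: simplest, but a single outlier
    # inflates the scale and wastes precision everywhere else.
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def quantize_int8_rowwise(w: torch.Tensor):
    # One scale per row: an outlier only hurts its own row.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(1024, 1024)
w[0, 0] = 50.0  # plant a single outlier

for name, fn in [("tensor-wise", quantize_int8_tensorwise),
                 ("row-wise", quantize_int8_rowwise)]:
    q, s = fn(w)
    err = ((dequantize(q, s) - w).pow(2).mean().sqrt()
           / w.pow(2).mean().sqrt()).item()
    print(f"{name}: relative RMSE = {err:.4f}")
```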
Explaining what the measurements do (AI):
SNR dB: "How loud is the real signal compared to the static/noise the quantization added?"
Cosine Similarity (Cos-sim): "Are the quantized latents pointing in the same direction as the originals, even if they're a slightly different size?"
Rel-RMSE: "On average, how wrong is each value, as a percentage of how big the values actually are?"
/end of AI explanation
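For reference, the three metrics above could be computed between a quantized run's latent and the BF16 baseline latent roughly like this (a minimal sketch; the exact formulas behind the tables may differ in detail):

```python
import torch
import torch.nn.functional as F

def latent_metrics(baseline: torch.Tensor, quantized: torch.Tensor):
    ref = baseline.float().flatten()
    test = quantized.float().flatten()
    err = test - ref

    # SNR in dB: signal power relative to the quantization-noise power.
    snr_db = 10.0 * torch.log10(ref.pow(2).mean() / err.pow(2).mean())

    # Cosine similarity: do the latents point in the same direction?
    cos_sim = F.cosine_similarity(ref, test, dim=0)

    # Relative RMSE: RMS error as a fraction of the reference's RMS magnitude.
    rel_rmse = err.pow(2).mean().sqrt() / ref.pow(2).mean().sqrt()

    return snr_db.item(), cos_sim.item(), rel_rmse.item()
```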
# Methodology:
I capture the cond/uncond latents at every step of the inference process with a modified KSampler node, then compare them against the latents from the unquantized BF16 baseline model.
These tests were run with (roughly) the latest ComfyUI on an RTX 3090.
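As a sketch of how the per-step comparison could be aggregated (the file layout here is hypothetical, not the modified KSampler's actual output format), using `latent_metrics` from above:

```python
import torch

def compare_runs(baseline_path: str, quantized_path: str):
    # Each file is assumed to hold a list of per-step latent tensors
    # captured during sampling of the same prompt and seed.
    base_steps = torch.load(baseline_path)
    quant_steps = torch.load(quantized_path)
    per_step = [latent_metrics(b, q) for b, q in zip(base_steps, quant_steps)]
    snr, cos, rmse = (torch.tensor(col) for col in zip(*per_step))
    # Average over steps; averaging over many samples gives the table entries.
    return snr.mean().item(), cos.mean().item(), rmse.mean().item()
```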
# Results:
Anima, 100 samples at 1MP resolution, 25 steps.
| Metric | INT8 ConvRot | INT8 Row | INT8 Row Bedovyy | INT8 Tensor Silver | FP8 | GGUF_Q8 |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: |
| Rel-RMSE ↓ | 0.09032 ±0.00626 ★ | 0.13396 ±0.00720 | 0.13084 ±0.00920 | 0.23802 ±0.01011 | 0.14523 ±0.00679 | 0.12124 ±0.00714 |
| SNR dB ↑ | 24.05 ±0.53 ★ | 19.68 ±0.39 | 20.24 ±0.52 | 14.48 ±0.36 | 19.66 ±0.35 | 21.98 ±0.46 |
| Cos-sim ↑ | 0.992165 ±0.001113 ★ | 0.984617 ±0.001780 | 0.984765 ±0.002368 | 0.957751 ±0.003461 | 0.981587 ±0.001878 | 0.985553 ±0.001704 |
----
Z-Image turbo, 64 samples, 0.5MP resolution, 8 steps:
| Metric | GGUF_Q8 | INT8 ConvRot | INT8 Row | MXFP8 |
| :--- | ---: | ---: | ---: | ---: |
| Rel-RMSE ↓ | 0.16740 ±0.00628 ★ | 0.19634 ±0.00660 | 0.35659 ±0.00968 | 0.30729 ±0.00645 |
| SNR dB ↑ | 16.42 ±0.29 ★ | 14.86 ±0.26 | 9.27 ±0.23 | 10.59 ±0.18 |
| Cos-sim ↑ | 0.978215 ±0.001696 ★ | 0.971225 ±0.001920 | 0.916394 ±0.004070 | 0.935860 ±0.002428 |
---
HiDream O1, 16 samples, 0.5MP resolution, 24 steps
FP8 Naive refers to using a BF16 checkpoint with the dtype set to FP8, which naively casts most weights to FP8.
| Metric | FP8 Naive | [FP8 Scaled](https://huggingface.co/Comfy-Org/HiDream-O1-Image/blob/main/checkpoints/hidreamo1imagedevfp8scaled.safetensors) | INT8 ConvRot | INT8 Row | MXFP8 |
| :--- | ---: | ---: | ---: | ---: | ---: |
| Rel-RMSE ↓ | 0.23140 ±0.03353 | 0.08793 ±0.01196 | 0.06738 ±0.00849 ★ | 0.40533 ±0.03865 | 0.09269 ±0.00912 |
| SNR dB ↑ | 14.86 ±1.00 | 22.98 ±0.91 | 25.65 ±0.85 ★ | 8.77 ±0.76 | 22.65 ±0.79 |
| Cos-sim ↑ | 0.957479 ±0.013819 | 0.993943 ±0.001945 | 0.996338 ±0.001124 ★ | 0.901425 ±0.020387 | 0.993764 ±0.001271 |
---
Qwen Image 2512, 0.5MP, 16 Samples, 25 steps:
| Metric | FP8 | GGUF Q4_K_M | GGUF Q8 | INT8 ConvRot | INT8 Row | Nunchaku BestQuality |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: |
| Rel-RMSE ↓ | 0.22316 ±0.02186 | 0.25253 ±0.02143 | 0.13382 ±0.02853 ★ | 0.13795 ±0.02225 | 0.16354 ±0.02883 | 0.24947 ±0.02144 |
| SNR dB ↑ | 14.08 ±0.75 | 13.78 ±0.84 | 22.44 ±1.67 ★ | 20.34 ±1.31 | 18.70 ±1.27 | 13.54 ±0.72 |
| Cos-sim ↑ | 0.943337 ±0.010885 | 0.929011 ±0.010479 | 0.967114 ±0.011496 | 0.972459 ±0.007414 ★ | 0.957911 ±0.013642 | 0.927933 ±0.011458 |
---
Anima again, but on a 5060, to see whether MXFP8 just does worse when it's not properly supported by the hardware:
16 Samples, 0.5MP Resolution, 24 steps
| Metric | INT8 ConvRot | MXFP8 |
| :--- | ---: | ---: |
| Rel-RMSE ↓ | 0.08546 ±0.00846 ★ | 0.14716 ±0.01107 |
| SNR dB ↑ | 24.22 ±0.73 ★ | 18.90 ±0.58 |
| Cos-sim ↑ | 0.991708 ±0.001573 ★ | 0.979025 ±0.003469 |
---
If you are still hungry for more, you can find the full comparisons in even higher detail on my GitHub here.
You can also create your own quality comparison with this node.
# Speed:
I don't have as many numbers here. On a 3090, depending on the model, I've seen anywhere from a 1.5x-2x speed-up vs BF16/GGUF. ConvRot adds roughly a 1.15x inference overhead, so you can decide for yourself whether it makes sense for your purposes.
Most models would be offloaded on my available 8GB RTX 5060, so for now I'll go with Anima for ease of use:
Anima, PyTorch 2.13.0.dev20260511+cu132, triton-windows, 1MP, batch size 1, speed measured after 2 warmup rounds for fair testing (a sketch of such a timing loop follows the table):
| Format | Speed (it/s) ↑ | Relative Speedup |
|-------|--------------|--------------|
| bf16 | 0.78 | 1.00× |
| INT8 ConvRot | 1.12 | 1.43× |
| INT8 Row | 1.24 | 1.58× |
| INT8 ConvRot Compile | 1.47 | 1.88× |
| MXFP8 | 0.89 | 1.14× |
| MXFP8 --fast | 0.93 | 1.19× |
| MXFP8 --fast with torch compile | 1.37 | 1.75× |
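For context, "speed after 2 warmup rounds" can be measured with a generic loop like the one below. This is not the author's harness; `run_sampling` is a stand-in for one full sampling run of `steps` denoising steps.

```python
import time

def measure_it_per_s(run_sampling, steps: int, warmup: int = 2, rounds: int = 5):
    # Warmup rounds absorb compilation, caching, and clock ramp-up.
    for _ in range(warmup):
        run_sampling()
    t0 = time.perf_counter()
    for _ in range(rounds):
        run_sampling()  # assumed to block until the images are produced
    elapsed = time.perf_counter() - t0
    return rounds * steps / elapsed  # denoising iterations per second
```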
# Conclusion:
There is no need to look out of your window like this
https://preview.redd.it/jjh0b0lo4p0h1.jpg?width=400&format=pjpg&auto=webp&s=ce808b485717ae9efef17862da32f544ec9b791a
INT8 with ConvRot appears to be faster than MXFP8 while also being higher quality, and unlike MXFP8 it is supported on nearly every Nvidia GPU since 2019.
Caveats: RTX 20 series GPUs only have 4x INT8 FLOPS compared to BF16, meaning you may see less of a gain there.
I hope this helped, bye.
https://redd.it/1tazxqz
@rStableDiffusion
I've made my Chroma V48 DC Workflow available (v48, best Midjourney-style model)
https://redd.it/1tb1uza
@rStableDiffusion
I combined FLUX Fill with ControlNet for structured inpainting
I've been experimenting with FLUX.1-Fill-dev lately and kept running into the same wall: the Fill model is great for mask-based edits, but there's no built-in way to feed it a ControlNet signal (depth, canny, pose, etc.) at the same time.
**The idea is simple:**
FLUX Fill handles the mask-based edit, while ControlNet guides the structure using inputs like **depth, canny, pose, tile, blur, gray, or low-quality conditioning**. This makes the inpainting more controlled, especially when you want the generated object or edit to follow a specific structure or composition.
Since **FLUX.1-Fill-dev was not originally trained jointly with ControlNet**, this is more of an experimental/community implementation. In practice, it works well for structured inpainting, but results depend a lot on the mask quality, control image alignment, and conditioning strength.
**Links**
* Personal Repo: [https://github.com/pratim4dasude/pipline\_flux\_fill\_controlnet\_Inpaint](https://github.com/pratim4dasude/pipline_flux_fill_controlnet_Inpaint)
* Pipeline file (Diffusers community): [https://github.com/huggingface/diffusers/blob/main/examples/community/pipline\_flux\_fill\_controlnet\_Inpaint.py](https://github.com/huggingface/diffusers/blob/main/examples/community/pipline_flux_fill_controlnet_Inpaint.py)
* Community Pipelines README (FLUX Fill ControlNet section): [https://github.com/huggingface/diffusers/tree/main/examples/community#flux-fill-controlnet-pipeline](https://github.com/huggingface/diffusers/tree/main/examples/community#flux-fill-controlnet-pipeline)
* FLUX Pipelines docs: [https://huggingface.co/docs/diffusers/api/pipelines/flux](https://huggingface.co/docs/diffusers/api/pipelines/flux)
* ControlNet in Diffusers docs: [https://huggingface.co/docs/diffusers/api/pipelines/controlnet\_flux](https://huggingface.co/docs/diffusers/api/pipelines/controlnet_flux)
**Code example**

    import torch
    from diffusers import FluxControlNetModel
    from diffusers.utils import load_image
    from pipline_flux_fill_controlnet_Inpaint import FluxControlNetFillInpaintPipeline

    dtype = torch.bfloat16
    device = "cuda"

    controlnet = FluxControlNetModel.from_pretrained(
        "Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0",
        torch_dtype=dtype,
    )

    fill_pipe = FluxControlNetFillInpaintPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Fill-dev",
        controlnet=controlnet,
        torch_dtype=dtype,
    ).to(device)

    img = load_image("imgs/background.jpg")
    mask = load_image("imgs/mask.png")
    ctrl = load_image("imgs/dog_depth_2.png")

    result = fill_pipe(
        prompt="a dog on a bench",
        image=img,
        mask_image=mask,
        control_image=ctrl,
        control_mode=[2],  # canny=0, tile=1, depth=2, blur=3, pose=4
        controlnet_conditioning_scale=0.9,
        control_guidance_start=0.0,
        control_guidance_end=0.8,
        height=1024, width=1024,
        strength=1.0,
        guidance_scale=50.0,
        num_inference_steps=60,
        max_sequence_length=512,
    )

    result.images[0].save("output.jpg")
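The control image in this example is a depth map. If you need to produce one, a possible route (purely illustrative; the repo doesn't prescribe this, and the `Intel/dpt-large` model name plus the `imgs/dog.jpg` source path are just example choices) is the transformers depth-estimation pipeline:

```python
from transformers import pipeline
from diffusers.utils import load_image

# Any monocular depth estimator that returns a PIL image would do here.
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

source = load_image("imgs/dog.jpg")        # hypothetical source photo
depth = depth_estimator(source)["depth"]   # PIL image of the predicted depth
depth.save("imgs/dog_depth_2.png")         # path used as control_image above
```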
If you find this useful, a GitHub star ⭐ would really help support the project.
https://redd.it/1tb5v89
@rStableDiffusion
I shipped an offline SD app for Android. It's slow, your phone will get warm, and it's completely free.
https://redd.it/1tb7v3q
@rStableDiffusion