I combined FLUX Fill with ControlNet for structured inpainting
I've been experimenting with FLUX.1-Fill-dev lately and kept running into the same wall: the Fill model is great for mask-based edits, but there's no built-in way to feed it a ControlNet signal (depth, canny, pose, etc.) at the same time.
**The idea is simple:**
FLUX Fill handles the mask-based edit, while ControlNet guides the structure using inputs like **depth, canny, pose, tile, blur, gray, or low-quality conditioning**. This makes the inpainting more controlled, especially when you want the generated object or edit to follow a specific structure or composition.
Since **FLUX.1-Fill-dev was not originally trained jointly with ControlNet**, this is more of an experimental/community implementation. In practice, it works well for structured inpainting, but results depend a lot on the mask quality, control image alignment, and conditioning strength.
**Links**
* Personal repo: [https://github.com/pratim4dasude/pipline\_flux\_fill\_controlnet\_Inpaint](https://github.com/pratim4dasude/pipline_flux_fill_controlnet_Inpaint)
* Pipeline file (Diffusers community): [https://github.com/huggingface/diffusers/blob/main/examples/community/pipline\_flux\_fill\_controlnet\_Inpaint.py](https://github.com/huggingface/diffusers/blob/main/examples/community/pipline_flux_fill_controlnet_Inpaint.py)
* Community Pipelines README (FLUX Fill ControlNet section): [https://github.com/huggingface/diffusers/tree/main/examples/community#flux-fill-controlnet-pipeline](https://github.com/huggingface/diffusers/tree/main/examples/community#flux-fill-controlnet-pipeline)
* FLUX Pipelines docs: [https://huggingface.co/docs/diffusers/api/pipelines/flux](https://huggingface.co/docs/diffusers/api/pipelines/flux)
* ControlNet in Diffusers docs: [https://huggingface.co/docs/diffusers/api/pipelines/controlnet\_flux](https://huggingface.co/docs/diffusers/api/pipelines/controlnet_flux)
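Note on loading: the code example below imports the pipeline class straight from the pipeline file, so it assumes pipline_flux_fill_controlnet_Inpaint.py sits next to your script. Since the file also lives in the Diffusers community folder (linked above), it should be loadable by name through the standard custom_pipeline mechanism instead. This is a minimal sketch assuming the merged community file keeps the same name, not something from the original post:

import torch
from diffusers import DiffusionPipeline, FluxControlNetModel

controlnet = FluxControlNetModel.from_pretrained(
    "Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0",
    torch_dtype=torch.bfloat16,
)

# custom_pipeline pulls the community pipeline file from the diffusers repo,
# so FluxControlNetFillInpaintPipeline does not need to be copied locally.
fill_pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev",
    custom_pipeline="pipline_flux_fill_controlnet_Inpaint",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
).to("cuda")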
**Code example**
import torch
from diffusers import FluxControlNetModel
from diffusers.utils import load_image
from pipline_flux_fill_controlnet_Inpaint import FluxControlNetFillInpaintPipeline

dtype = torch.bfloat16
device = "cuda"

controlnet = FluxControlNetModel.from_pretrained(
    "Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0",
    torch_dtype=dtype,
)

fill_pipe = FluxControlNetFillInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev",
    controlnet=controlnet,
    torch_dtype=dtype,
).to(device)

img = load_image("imgs/background.jpg")
mask = load_image("imgs/mask.png")
ctrl = load_image("imgs/dog_depth_2.png")

result = fill_pipe(
    prompt="a dog on a bench",
    image=img,
    mask_image=mask,
    control_image=ctrl,
    control_mode=[2],  # canny=0, tile=1, depth=2, blur=3, pose=4
    controlnet_conditioning_scale=0.9,
    control_guidance_start=0.0,
    control_guidance_end=0.8,
    height=1024,
    width=1024,
    strength=1.0,
    guidance_scale=50.0,
    num_inference_steps=60,
    max_sequence_length=512,
)
result.images[0].save("output.jpg")
If you find this useful, a GitHub star ⭐ would really help support the project.
https://redd.it/1tb5v89
@rStableDiffusion
I shipped an offline SD app for Android. It's slow, your phone will get warm, and it's completely free.
https://redd.it/1tb7v3q
@rStableDiffusion
Alice v1: Distillation-Enhanced Video Generation Surpassing Closed-Source Models
https://arxiv.org/abs/2605.08115
https://redd.it/1tbaspl
@rStableDiffusion
Ostris/AI-Toolkit Supports HiDream O1 Training
- Ostris GitHub repo
- HiDream-O1-Image repo
According to Ostris on X/Twitter, you should disable text-embedding caching: "There are not text embeddings. Tokens go directly in." He has some other comments/replies on his Twitter that might be useful, but no magic-bullet fix.
- ComfyUI versions of the checkpoints.
- A test ComfyUI workflow can be found here. Still no official workflow in the templates at the time of this post.
https://redd.it/1tbby44
@rStableDiffusion
Chroma1-HD Character Transfer with Flux.2 Dev
Chroma1-HD with Flux.2 Dev character transfer
This workflow gives multi-modal capabilities to open-source image models. In particular, this workflow combines a text-to-image workflow (Comfy's official Chroma1-HD workflow) and an image-to-image workflow (Comfy's official Flux.2 Dev workflow).
Link to workflow: https://huggingface.co/ussaaron/workflows/blob/main/chroma_flux_character_transfer.json
This workflow is the final result of a ton of experimentation to solve one problem: Using an image reference for a consistent character kneecaps the creativity of an image model. For example, if I want to create a cool cinematic shot with a specific style, including an image reference will reduce the image model's style output into a pretty narrow lane. Generally, the final image will share most of the stylistic elements present in the character image and that's not ideal.
I selected the models for this workflow because, after a ton of testing, I determined they are the best for each modality: Chroma1-HD is the best open-source model for style flexibility and professional photography, and Flux.2 Dev is the best open-source model for facial fidelity and character consistency.
However, just combining these two models is not enough to produce a consistent character transfer solution. I also structured the prompts for both sides of the workflow in a specific way to ensure cohesion from end-to-end. The full prompts are included in the workflow for you to check out.
And here's how it went.
This is my character reference for Crystal Sparkle, a Sora character. I made a 1980s-style model composite of her with an '80s hairstyle (make sure your character has a hairstyle consistent with the era in your Chroma image).
Model composite for Crystal Sparkle
This is the output of the Chroma prompt for a blonde woman wandering through a post-apocalyptic New York City inspired by 1980s grindhouse and sci-fi b-movies.
Chroma1-HD text-to-image output
This is the Flux.2 Dev output after completing the character transfer for Crystal Sparkle.
Flux.2 Dev Image-to-image output
The final result is exactly what I wanted. The Chroma1-HD style, grain, and grunge elements were retained, and Crystal was cleanly added into the shot. This example is just one of thousands of possibilities that are now available with Chroma1-HD.
Note: The settings in this workflow are tuned more for people who want professional photography output. All the settings can be dialed back as needed, and there are a few optional LoRAs that can be removed.
Let me know if you have any questions. Cheers!
https://redd.it/1tbdj5o
@rStableDiffusion
"Masked Generative Transformer Is What You Need for Image Editing"
https://github.com/weichow23/EditMGT
https://redd.it/1tbahnp
@rStableDiffusion
Anima Question
Loving the Anima model with various LoRAs, etc., but sometimes running it without LoRAs produces some interesting styles.
Is there any way to extract the style when it comes from the model's "brain", or do I just post it and hope someone knows?
Cheers.
https://redd.it/1tbhzl5
@rStableDiffusion