Bonsai Image 4B, a pair of low-bit diffusion transformer deployments built from FLUX.2 Klein 4B.
https://redd.it/1tspz3p
@rStableDiffusion
Reddit
From the StableDiffusion community on Reddit: Bonsai Image 4B, a pair of low-bit diffusion transformer deployments built from FLUX.2…
Explore this post and more from the StableDiffusion community
HiDream-O1-Image: C'mon, seriously?
I'm giving HiDream-O1 a shot, and I'm really confused by this model. With editing tasks, the official docs recommend using the non-dev, 50-step model. The thing is, I'm getting way worse results with the "full" model versus the dev model. If I use the dev model, image edits follow my reference images much more closely. I'm running the model in BF16 on a 5090 using the official inference.py script. Check these out:
https://preview.redd.it/guz9jx2m9g4h1.png?width=3328&format=png&auto=webp&s=fe3357fff6404d7732638c50631e4faaa1534980
The image on the right (dev version) pretty much looks exactly like the reference face I gave the model. The one on the left isn't close, and the overall quality looks like it was shot on an awful cheap toy digital camera.
Same input image, same prompt, same seed. I'm shocked the dev model actually follows the input image more closely than the full model (nearly perfectly, actually).
If I'm doing something wrong, I'm dying to know.
https://redd.it/1tsrppa
@rStableDiffusion
Stable Diffusion model recommendations for faster and cleaner outputs in 2026?
I’ve been switching between a few models lately but I still can’t find something that feels both fast and consistently clean in results.
Some models look great but slow everything down, while others are fast but lose detail pretty quickly.
Even with similar settings, the output quality feels pretty inconsistent between different checkpoints.
What models are people actually using these days for a good balance?
https://redd.it/1tsuhtz
@rStableDiffusion
Renting a GPU from a service like RunPod has become prohibitively expensive. The last time I rented one was about 3 months ago. The price for a 4090 was $5 per day for 25. The hourly rate for a 5090 was higher than for an A100 about 3 months ago.
Even simpler GPUs like the 5070 Ti are absurdly expensive.
Previously, it took 2 to 4 years of rental fees to add up to the purchase price of the GPU.
Nowadays, it takes just a few months.
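For anyone who wants to run the same comparison with their own numbers, here is a minimal sketch of the break-even arithmetic. The function name and the $2000 purchase price are made up for illustration; only the $5 per day rate comes from the post. It also ignores electricity, resale value, and idle time.

```python
def break_even_days(gpu_price_usd: float, rental_usd_per_day: float) -> float:
    """Days of continuous rental whose total cost equals the purchase price."""
    if rental_usd_per_day <= 0:
        raise ValueError("rental rate must be positive")
    return gpu_price_usd / rental_usd_per_day


# Hypothetical purchase price (an assumption, not a quoted market price).
ASSUMED_GPU_PRICE_USD = 2000.0
# Daily rate mentioned in the post for a 4090.
RENTAL_USD_PER_DAY = 5.0

days = break_even_days(ASSUMED_GPU_PRICE_USD, RENTAL_USD_PER_DAY)
print(f"Break-even after {days:.0f} days (~{days / 365:.1f} years)")
```

A higher daily rate shortens the break-even period proportionally, which is the shift from years to months described above.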
https://redd.it/1tsvt9z
@rStableDiffusion
Tried capturing that classic SF3 Bengus/Akiman/Ikeno art style in ComfyUI. Anyone else miss this vibe?
Lately, I’ve been feeling nostalgic looking through old artbooks. Am I the only one who feels like the 3D era of Street Fighter lost some of its soul to Westernized, hyper-polished realistic models?
As a visual dev artist, I see this all the time: if you don’t fight to keep the raw, stylized "energy" of the original line art throughout the production pipeline, standard 3D industrial polishing just kills the character's personality.
So, I optimized a Klein workflow in ComfyUI using original Street Fighter III concept art to see if I could recapture that magic. Here are a few test renders with paintovers. Let me know if it hits the spot!
Capcom, please—can we get a next-gen remake that actually preserves the original art style instead of just chasing realism?
https://preview.redd.it/l1mnklezmh4h1.png?width=3066&format=png&auto=webp&s=13ff3754338a77f67ffbe6c33a0b8b92d50af4ea
https://preview.redd.it/ybtoaiezmh4h1.png?width=3554&format=png&auto=webp&s=1359cc09c2aade796ad804656501e8e2fa99e2ce
https://preview.redd.it/rfw80hezmh4h1.png?width=3594&format=png&auto=webp&s=28afa47e03dd917a6bb27829cb1d9269836efc32
https://redd.it/1tsxswt
@rStableDiffusion
Is there anyone else who can't stand ComfyUI and prefers the classic Automatic/Forge UI, or is it just me?
I tried. I swear I tried.
Each time a new "ultra super mega" model appears, I reinstall that bloody UI just so I can test it, and each time some god-awful error list appears telling me that somehow the latest version is missing something critical that needs to be downloaded ASAP. And in 3/4 of cases that's not the right download: it's incompatible or needs additional stuff to work.
Even when everything works out of the box, I look at the final image and think to myself, "I could have had 4 batches of 8 images to choose from instead of 1-4 like here".
I am not lazy, but I have a job and I don't have hours to spend making sure each new model and each new workflow that I download is compatible with every single other version of all dependencies.
Does anyone else feel the same, or am I the only one who keeps using Forge and Automatic?
https://redd.it/1tt5uiq
@rStableDiffusion
ComfyUI_HYWorld2 update. Quality improvement + World Stereo Light models!
https://redd.it/1tt61vq
@rStableDiffusion
LoRA dataset images and captions
Okay. I hear a lot of dos and don'ts, but *gawd* damn, I need more.
Character LoRA. 25 images. All 1024x1024, all consistent, varying, ...in my mind complete for at least a *functional* if not *flexible* LoRA.
How the hell do I caption this to be easy on the model side? I don't want to have to fine-tune knobs and prompt-engineer the way Gemini and other LLMs are doing to my captions.
I have a highly toxic and inflexible LoRA iteration right now. I'm not dumb enough to require crash coursing, but I'm stuck.
I know the *"transient state"* of the image as a whole, including viewpoint, should be tagged, but how does one ensure accuracy for the training of the character?
TriggerWord, camera angle, and objects/lighting/background so they don't get baked into the trigger word, but what ELSE about the character needs to be captioned for flexibility in the character themselves? I know clothing and accessories, so a bunch of crap doesn't get welded to the character, but hairstyles and expressions?
Those *make* the character, but doesn't tagging them... remove them from the character? ...but then don't all expressions and hairstyles get averaged and welded together?
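To make the caption layout described above concrete, here is a rough sketch that assembles one caption from the parts the post already lists: trigger word, camera angle, clothing/accessories, and scene tags. The function name, the comma-separated tag format, and the example tags are all assumptions for illustration, not any trainer's required format. Whether to add hairstyle or expression tags is deliberately left open, since that is the question being asked.

```python
def build_caption(trigger: str, view: str,
                  clothing: list[str], scene: list[str]) -> str:
    """Join the trigger word and the per-image (transient) tags into one caption.

    Everything after the trigger word describes things that should stay
    promptable rather than being absorbed into the trigger word.
    """
    parts = [trigger, view, *clothing, *scene]
    # Drop empty entries and stray whitespace so the caption stays clean.
    return ", ".join(p.strip() for p in parts if p.strip())


# Hypothetical example image: all tag values here are made up.
caption = build_caption(
    trigger="mychar",
    view="side view",
    clothing=["red jacket", "silver earrings"],
    scene=["forest background", "soft morning light"],
)
print(caption)
```

Keeping the fields separate like this makes it easy to check that every image has a view tag and that clothing never leaks into the trigger word by accident.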
https://redd.it/1ttdbmo
@rStableDiffusion
Best anime model for multiple characters or LoRA?
Hey everyone
I'm really struggling with multiple characters and their details. I can write a prompt that says 2girls but I sometimes get 3, etc. Or I try specifying characters and their details and they come out the opposite. Or I say I want small breasts and I get huge ones or something random. Or I get the perfect prompt, then I regenerate it or rerun it later and it's f*cked. Is there something I can do/use in the prompt, or a LoRA I can use, or something? I've tried pony, illumi, illustrious, noobai, wai-illustrious
Hope you can help
Regards
https://redd.it/1tt95u3
@rStableDiffusion
FLUX.2-klein-base-9B ControlLight LoRA Release for changing lighting of a photo
https://yfyang007.github.io/ControlLight/
https://redd.it/1ttfv5z
@rStableDiffusion
ControlLight: Towards Controllable, Consistent, and Generalizable Low-Light Enhancement