A few tries with HiDream O1
Hi,
I've been playing with O1 since yesterday. While I don't yet have enough data to decide whether I'll have a use for this model, I wanted to share a few generations and observations.
1. The square marks: often enough to be jarring, the generated image shows a small square pattern, sometimes all over the image, sometimes only in part of it. It takes some cherry-picking to discard those, but I suspect my settings may not be optimal. Also, on rare occasions it just produces a fried image or a useless pattern. I'm blaming my settings, my config, and the lack of a dedicated ComfyUI node at this point.
2. Like most recent models, it shows low seed-to-seed variation when given a vague prompt.
[A French woman gives this. One needs to be more descriptive. ](https://preview.redd.it/ekddb6diqb0h1.png?width=1024&format=png&auto=webp&s=e1d0d1e40b3c1ebad00eb0b3f5737ced01e9f890)
[A café. It's apparently a place where clean-shaven men are not allowed.](https://preview.redd.it/0b53mpx7sb0h1.png?width=1024&format=png&auto=webp&s=412699058f8aef2eed01ca88d443add5fcee74e3)
3. It has very good editing capabilities at first glance, but I didn't test them enough to form a definitive opinion.
4. It is twice as fast as Qwen2512 on my 4090, generating an image at 1.25 s/it. The recommended setting is 50 steps, but the same goes for other models where we found 20-25 steps to be more than enough.
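To make the step-count trade-off concrete, a quick back-of-the-envelope calculation (the 1.25 s/it figure is from my 4090 runs; decode and other overhead ignored):

```python
# Wall-clock time per image = steps x seconds-per-iteration.
SECONDS_PER_IT = 1.25  # observed rate on a 4090

def gen_time(steps: int, s_per_it: float = SECONDS_PER_IT) -> float:
    """Approximate seconds for one image, ignoring VAE/decode overhead."""
    return steps * s_per_it

print(gen_time(50))  # 62.5 s at the recommended step count
print(gen_time(22))  # 27.5 s in the 20-25 step range
```

So dropping from 50 to ~22 steps cuts generation time by more than half, if quality holds.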
5. It is very good at prompt following, especially for complex images.
I tried to replicate the results in this thread: [https://www.reddit.com/r/StableDiffusion/comments/1pgx89t/contest\_create\_an\_image\_using\_an\_openweight\_model/](https://www.reddit.com/r/StableDiffusion/comments/1pgx89t/contest_create_an_image_using_an_openweight_model/) (Qwen2512 and ZIT are displayed) with the following prompt:
*A wizard with sharp, angular, chiseled facial features sits on an ornate curule chair inside a dim canvas tent. The wizard wears a long dark robe covered with glowing arcane runes and thin metallic embroidery. A wide hood rests on the wizard’s shoulders, showing short, messy white hair. A metal staff leans against the curved leg of the chair. Warm lantern light hangs from a wooden pole and casts deep golden reflections across the tent fabric, creating stretched shadows behind every figure.*
*On the left and right of the wizard stand two human guards dressed in light leather armor reinforced with metal rivets. The male guard has short brown hair, a trimmed beard, and holds a long spear pointed toward the ground. The female guard has a tight braid, leather shoulder plates, and a round small shield strapped to her back. Both guards keep their eyes fixed on the kneeling warrior, their bodies tense, with their spears angled slightly forward. Behind them, the tent wall shows hanging banners with faded heraldic symbols.*
*In front of the wizard, facing him, a wounded warrior kneels on a carpet of red and brown woven patterns. His wrists are bound with heavy iron chains, and his head is lowered. His steel breastplate is cracked, and dust covers his leather boots. A deep cut marks his cheek, and dried blood darkens the edges of his leather gloves. The warrior’s long sword lies on the ground near him, out of reach, its blade reflecting a faint light from the lantern.*
*Behind the kneeling warrior, two green-skinned orcs in dark leather armor grip the chains. Each orc has wide shoulders, muscular arms, and visible tusks curving upward. One orc wears a metal pauldron on a single shoulder, while the other has tribal tattoos on his arms. Their eyes glow under the lantern light, and both keep a firm hold on the chains, pulling them tight. Their boots press heavily into the dusty ground.*
*In the back of the tent, a robed assistant with a simple belt pouch stretches out a leather coin purse toward the orcs. The assistant’s hood hides most of the face, revealing only a thin mouth and a single lock of dark hair. One hand holds the pouch, the other clutches a rolled parchment. A wooden table stands beside the assistant, covered with scrolls, a silver inkpot, and unlit candles. On the ground near the table lie scattered parchment sheets, a metal goblet, and a small open chest filled with coins.*
*The atmosphere is heavy and tense, with dense shadows filling the upper corners of the tent. A subtle cloud of dust floats in the lantern light. The canvas walls show faint marks of wind and sand. Outside the tent entrance, only darkness and a tiny trace of moonlight are visible, creating a dramatic contrast with the warm light inside.*
[The female guard's spear needs editing but for a one-shot it beats the competition. ](https://preview.redd.it/zm3i8j1cub0h1.png?width=2048&format=png&auto=webp&s=fe7ce3fc0aeca94788148711a263659a04abf2e2)
With this prompt:
*A spellcaster unleashes an acid splash spell in a muddy village path. The caster, cloaked and focused, extends one hand forward as two glowing green orbs arc through the air, mid-flight. Nearby, two startled peasants standing side by side have been splashed by acid. Their faces are contorted with pain, their flesh begins to sizzle and bubble, steam rising as holes eat through their rough tunics. A third peasant, reduced to skeleton, rests on its knees between them in a pool of acid.*
[The photographic version](https://preview.redd.it/v4md67fjwb0h1.png?width=2048&format=png&auto=webp&s=025a225f1ddb6618e27a4c5a3660b491d3cb6a1d)
[The cartoon version.](https://preview.redd.it/3wkuls5dwb0h1.png?width=2048&format=png&auto=webp&s=5688bea08279cd5690f0e7ea58550ad80dab4015)
Not perfect, but great prompt adherence.
6. It can come closer to the prompt than NB in some cases, which may explain its high initial rating:
https://preview.redd.it/671wibljxb0h1.png?width=2048&format=png&auto=webp&s=93d6a7144f71788b8b1136b90b48b9f504763a3a
Compare with other models, proprietary and free, here:
[https://www.reddit.com/r/StableDiffusion/comments/1mohl1p/comparison\_of\_models/](https://www.reddit.com/r/StableDiffusion/comments/1mohl1p/comparison_of_models/)
Another sample:
[Nanobanana's.](https://preview.redd.it/0szwchw1yb0h1.png?width=1408&format=png&auto=webp&s=b44e98eba05338c4dba4de72bae62d40e500ed03)
[O1's.](https://preview.redd.it/ypskdi4byb0h1.png?width=2048&format=png&auto=webp&s=639f0b23c7f9e7e8071bbe9fb93898effc20db86)
Or the flying citadel and portal samples:
Other models here: [https://www.reddit.com/r/StableDiffusion/comments/1pa2mca/qwen\_and\_zimageturbo\_zit\_prompt\_adherence\_contest/](https://www.reddit.com/r/StableDiffusion/comments/1pa2mca/qwen_and_zimageturbo_zit_prompt_adherence_contest/)
https://preview.redd.it/yb22farjyb0h1.png?width=2048&format=png&auto=webp&s=4eaac3cb4b41a5054d91b630cd77b5a39f76cb16
https://preview.redd.it/nht918wkyb0h1.png?width=2048&format=png&auto=webp&s=ea5b0c23ff9f68826a34d1b31971de1788f4eed6
7. Or the falling girl:
https://preview.redd.it/q0g68o2zyb0h1.png?width=2048&format=png&auto=webp&s=9558c3070afb37112bfae78fa9b5a26449ef742f
*A young girl tumble from a jagged hole in the ceiling, her small body suspended mid-fall, arms flailing while her long chestnut hair streams upward as though caught in a sudden updraft. She wears a pale cotton dress, simple and slightly wrinkled, the hemp fluttering wildly around her knees as she plunges. Her face is a portrait of surprise and fear, wide hazel eyes staring into the unknown lips, her parted as if mid-gasp. Beside her, a sleek black cat twists and arches, claws extended as although searching for purpose, its green eyes glinting in the half-light. Both are frozen in that fragile instant of descent, their outlines illuminated by the stark contrast of plaster dust and neon glow. They fall into an opulent living room, decorated with refined taste and warm ambient lighting. The girl’s pale dress and scuffed leather shoes seem out of place against the grandeur of velvet upholstery and polished marble surfaces. A velvet sofa in deep burgundy anchors the space, surrounded by glass tables that catch the golden shimmer of a sculptural chandelier overhead. Cushions scatter as if startled by the intrusion, while the cat’s trajectory points it straight toward the rug below. The girl, however, appears weightless and delicate, as though she might have the echo against such refinement. The room opens towards a vast corner window that stretches from floor to ceiling, to reveal the glowing skyline of a modern metropolis. Skyscrapers stand like gleaming monoliths, their facades awash in neon pinks, silvers, and electric blues. Hovering vehicles trace faint lines of light across the night sky. Against this futuristic backdrop, the girl’s old-fashioned dress and bare scraped knees give her an anachronistic, almost storybook presence, like a character who has stumbled from another time into this sleek, unyielding world. Details heighten the dreamlike tension: fragments of plaster hover like a cloud around her slender form, dust motes glowing in the chandelier's warmth; a Persian rug, richly patterned in crimson and gold, directly below her trajectory, as if to cushion or entrap her fall. A half-open book rests on a nearby table, its pages ruffled by the movement of air, as though the apartment itself is holding its breath. The girl's hair and dress ripple in the invisible currents, her face caught between terror and wonder, as if uncertain whether she has stepped into a nightmare or a fantastical new beginning.*
Since the model rendered her out of proportion with the rest of the image, as many models I tried with this prompt do, I used the edit function to make her smaller:
https://preview.redd.it/dqlovgs6zb0h1.png?width=2048&format=png&auto=webp&s=f323cd057d50c20909e853f56a20dd8ca02fe613
8. It doesn't seem to be trained on enough anatomy. A prompt with a man sitting while holding one of his feet with both hands over his knee leads to very bad results, while SOTA models usually pass this test easily. At 8B parameters, it might benefit from finetuning.
All in all, it seems interesting for a lower-parameter model. HiDream claims to have built a pro model with 200B parameters; it will be interesting to see how it compares, both with the open-weight one and with the proprietary SOTA models, so we can gauge whether increasing the parameter count is really the only way forward (which would be disheartening as long as we only get 24-32 GB VRAM cards on personal computers).
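For scale, some weights-only VRAM arithmetic (a rough sketch; real usage adds activations, attention buffers, and framework overhead on top):

```python
# Weights-only memory footprint in GiB: parameters x bytes per parameter.
def weights_gib(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 2**30

print(round(weights_gib(8e9, 2), 1))      # 8B model in fp16/bf16  -> ~14.9 GiB
print(round(weights_gib(200e9, 2), 1))    # 200B model in fp16     -> ~372.5 GiB
print(round(weights_gib(200e9, 0.5), 1))  # 200B model at 4-bit    -> ~93.1 GiB
```

Even aggressively quantized, a 200B model would not fit in 24-32 GB of VRAM, which is why the parameter-scaling question matters for local use.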
https://redd.it/1t9akyg
@rStableDiffusion
Artificial Analysis needs to address HiDream-01 Benchmarks
I'm struggling to understand how an utterly deficient model like HiDream-01 could have performed so well on user preference benchmarks. I don't want to jump to conclusions or speculate baselessly on how they did it, but it absolutely warrants an investigation if people are expected to take this benchmark seriously in the future. I just want an explanation for how something like this happens and, if it was illegitimate, how they will prevent it in the future.
https://redd.it/1t9eifa
@rStableDiffusion
I made some Slider Loras for Ace-Step 1.5 if anyone is interested
https://huggingface.co/Xanthius/Ace-Step-1.5-XL-Concept-Sliders/tree/main
Unfortunately, AI Toolkit doesn't have native support for slider LoRAs for Ace-Step 1.5, but I was able to edit the code enough to get it working properly. I can now train concept sliders in about 10 minutes to an hour each, without needing specific datasets for the concepts. Since nobody else has a working way to train sliders themselves, I decided to put together a collection of them for people to use if they want to.
My first sliders on there are:
- male to female voice
- studio production to lofi
- bass boost
- choir to solo vocalist
- digital to acoustic sound
- aggressive to gentle
- drum intensity
- energetic to calm
- happiness
- soft to projected voice
- talking to singing
- tempo
- danceability
But I intend to add some more if people have ideas for them
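For anyone wondering how these sliders behave at inference time, here's a toy NumPy sketch of the usual mechanism (made-up shapes and values, not Ace-Step's or AI Toolkit's actual code): the low-rank LoRA delta is added to a frozen weight with a signed, continuously adjustable strength.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))        # frozen base weight matrix
A = rng.normal(size=(4, 64)) * 0.01  # low-rank LoRA factors (rank 4)
B = rng.normal(size=(64, 4)) * 0.01

def apply_slider(strength: float) -> np.ndarray:
    """Effective weight: base plus the LoRA delta scaled by a signed strength.
    Positive slides toward one pole (e.g. female voice), negative toward the other."""
    return W + strength * (B @ A)

assert np.allclose(apply_slider(0.0), W)   # strength 0 leaves the base model untouched
delta_pos = apply_slider(+1.0) - W
delta_neg = apply_slider(-1.0) - W
assert np.allclose(delta_pos, -delta_neg)  # the two poles are mirror images
```

This is why a single slider can cover both directions of a concept pair instead of needing two separate LoRAs.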
https://redd.it/1t9e5cj
@rStableDiffusion
OSTRIS about HiDream-O1 LoRA on ToolKit
I am running my first test on training a HiDream-O1 LoRA on AI Toolkit. I don't want to get too excited too early. But this is the coolest model I have EVER seen. Super efficient pixel space. No VAE. No Text Encoder. Trains super fast. This is an industry changing innovation!
https://x.com/ostrisai/status/2053256188142428341
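For context, "pixel space, no VAE" means the network denoises RGB values directly rather than compressed VAE latents, which is why staying efficient there is notable. A rough size comparison (the latent figures assume a typical 8x-downsampling, 4-channel VAE, not HiDream's actual internals):

```python
# Elements the denoiser must process for one 1024x1024 image.
pixel_elems = 1024 * 1024 * 3        # pixel-space model: raw RGB
latent_elems = (1024 // 8) ** 2 * 4  # latent model: 128x128x4 VAE latents

print(pixel_elems, latent_elems, pixel_elems / latent_elems)  # ~48x more elements
```

Working on roughly 48x more elements is normally why pixel-space diffusion is considered expensive, so "super efficient pixel space" would be the interesting claim here.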
https://redd.it/1t9h7ps
@rStableDiffusion
Why is realistic skin such an issue for models?
The internet is full of normal, candid photos of people with natural skin texture. There's a subset of heavily retouched editorial or beauty photography with that smooth porcelain-skin look, but that's clearly a minority of all human images online. Most photos of people are just regular snapshots where skin looks like actual skin.
So why do image models, especially open-source ones, struggle so much to generate realistic-looking people out of the box? Why do they default to this plasticky, airbrushed, over-retouched aesthetic when that's not what the majority of the training data actually looks like?
It's striking how hard it is for models to reproduce something as common and statistically ordinary as normal human skin without specialized prompting, LoRAs, finetunes, or upscalers. Natural skin texture should arguably be the baseline behavior, yet it very obviously isn't. Why?
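One explanation often floated for this (speculation, not settled fact): training objectives that average over many plausible outcomes wash out high-frequency detail, and skin grain is exactly that kind of detail. A toy NumPy illustration of the averaging effect:

```python
import numpy as np

rng = np.random.default_rng(0)
base = np.linspace(0.0, 1.0, 256)  # smooth "skin tone" gradient (no texture)
# 100 equally plausible grainy variants of the same underlying surface.
textures = base + rng.normal(0, 0.1, size=(100, 256))

avg = textures.mean(axis=0)                # what mode-averaging converges toward
grain_single = np.std(textures[0] - base)  # ~0.1: visible texture in any one sample
grain_avg = np.std(avg - base)             # ~0.01: texture mostly averaged away
print(grain_single, grain_avg)
```

Each individual "photo" has texture, but their average is nearly the smooth base, which is one candidate reason models drift toward plasticky skin.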
https://redd.it/1t9gv4z
@rStableDiffusion
Which workflows are you guys using now for LTX 2.3?
Since prompt relay and other new workflows have been released recently, there are far more options for using LTX 2.3. What are some of the best-quality or coolest workflows you've seen or used so far?
https://redd.it/1t9itpr
@rStableDiffusion
I built a site to create free AI videos using LTX 2.3 running on my own GPUs
https://redd.it/1t9juoy
@rStableDiffusion
TenStrip's Workflow is the first LTX 2.3 workflow I've found that actually works for Spicy Content; it's almost like using the old Grok.
https://redd.it/1t9pbjd
@rStableDiffusion
The Anima realism model is crazy good. Don’t miss it!
I’ve been messing with the anima realism model posted here (https://civitai.red/models/2585622/ultrareal-fine-tune-anima). If you want prompt adherence for weird stuff, it does a really good job. What’s cool is you can do hybrid danbooru / natural language and it just goes with it.
I'm stunned at how good it is and surprised it's not getting more traction, especially since this is the author's experiment and neither the model nor this finetune is done yet. The output is decent if you prompt well. It's not as photorealistic as ZIT or whatever, but it will do all the weird danbooru tags other models blush over. I actually think it's a good model for the amateur-photography look everyone here wants.
I use 50 steps, CFG 5, Euler (not ancestral). Anima is slow as hell on my Mac for such a small model, but I'm hoping the devs improve it somehow. It also works with the turbo LoRA!
Additionally I saw someone extracted the realism ‘stuff’ as a lora. It’s in the comments of the civitai page, linked in a random Google Drive.
Anyway try it out and if the author sees this thanks dude. Lmk if I can chip in for another training run. There is so much potential here.
https://redd.it/1t9r8c6
@rStableDiffusion
SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
https://arxiv.org/abs/2605.08043
https://redd.it/1t9s8da
@rStableDiffusion