A few tries with HiDream O1

Hi,

I've been playing with O1 since yesterday. While I don't have enough data yet to decide whether I'll have a use for this model, I wanted to share a few generations and observations.

1: The square marks: often enough to be jarring, the generated image has a small square pattern, sometimes all over the image, sometimes only in part of it. It takes some cherry-picking to discard those, but I suspect my settings may not be optimal. Also, rarely, it just produces a fried image or a useless pattern. I am blaming my settings, my config and the lack of a ComfyUI node at this point.

2: Like most recent models, it shows little variation between seeds when the prompt is vague.

[A French woman gives this. One needs to be more descriptive. ](https://preview.redd.it/ekddb6diqb0h1.png?width=1024&format=png&auto=webp&s=e1d0d1e40b3c1ebad00eb0b3f5737ced01e9f890)

[A café. It's apparently a place where clean-shaven men are not allowed.](https://preview.redd.it/0b53mpx7sb0h1.png?width=1024&format=png&auto=webp&s=412699058f8aef2eed01ca88d443add5fcee74e3)

3: It has very good editing capabilities at first glance, but I haven't tested them enough to form a definitive opinion.

4. It is twice as fast as Qwen2512 on my 4090, generating an image at 1.25 s/it. The recommended setting is 50 steps, but the same was true of other models for which we found that 20-25 steps are more than enough.
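To put those numbers in perspective, here is a quick back-of-the-envelope sketch. It assumes the 1.25 s/it figure stays constant across step counts and ignores model loading and image saving; the function name is only illustrative.

```python
def generation_seconds(steps: int, seconds_per_step: float) -> float:
    """Sampling time only: no model loading, no saving overhead."""
    return steps * seconds_per_step


SECONDS_PER_STEP = 1.25  # the figure reported above for a 4090

for steps in (50, 25, 20):
    total = generation_seconds(steps, SECONDS_PER_STEP)
    print(f"{steps} steps: {total:.2f} s per image")
```

So if 20-25 steps turn out to be enough here too, an image would take roughly half a minute or less instead of a bit over a minute.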

5. It is very good at prompt following, especially for complex images.

I tried to replicate the results in this thread: [https://www.reddit.com/r/StableDiffusion/comments/1pgx89t/contest\_create\_an\_image\_using\_an\_openweight\_model/](https://www.reddit.com/r/StableDiffusion/comments/1pgx89t/contest_create_an_image_using_an_openweight_model/) (Qwen2512 and ZIT are displayed) with the following prompt:

*A wizard with sharp, angular, chiseled facial features sits on an ornate curule chair inside a dim canvas tent. The wizard wears a long dark robe covered with glowing arcane runes and thin metallic embroidery. A wide hood rests on the wizard’s shoulders, showing short, messy white hair. A metal staff leans against the curved leg of the chair. Warm lantern light hangs from a wooden pole and casts deep golden reflections across the tent fabric, creating stretched shadows behind every figure.*

*On the left and right of the wizard stand two human guards dressed in light leather armor reinforced with metal rivets. The male guard has short brown hair, a trimmed beard, and holds a long spear pointed toward the ground. The female guard has a tight braid, leather shoulder plates, and a round small shield strapped to her back. Both guards keep their eyes fixed on the kneeling warrior, their bodies tense, with their spears angled slightly forward. Behind them, the tent wall shows hanging banners with faded heraldic symbols.*

*In front of the wizard, facing him, a wounded warrior kneels on a carpet of red and brown woven patterns. His wrists are bound with heavy iron chains, and his head is lowered. His steel breastplate is cracked, and dust covers his leather boots. A deep cut marks his cheek, and dried blood darkens the edges of his leather gloves. The warrior’s long sword lies on the ground near him, out of reach, its blade reflecting a faint light from the lantern.*

*Behind the kneeling warrior, two green-skinned orcs in dark leather armor grip the chains. Each orc has wide shoulders, muscular arms, and visible tusks curving upward. One orc wears a metal pauldron on a single shoulder, while the other has tribal tattoos on his arms. Their eyes glow under the lantern light, and both keep a firm hold on the chains, pulling them tight. Their boots press heavily into the dusty ground.*

*In the back of the tent, a robed assistant with a simple belt pouch stretches out a leather coin purse toward the orcs. The assistant’s hood hides most of the face, revealing only a thin mouth and a single lock of dark hair. One hand holds the pouch, the other clutches a rolled parchment. A wooden table stands beside the assistant, covered with scrolls, a silver inkpot, and unlit candles. On the ground near the table lie scattered parchment sheets, a metal goblet, and a small open chest filled with coins.*

*The atmosphere is heavy and tense, with dense shadows filling the upper corners of the tent. A subtle cloud of dust floats in the lantern light. The canvas walls show faint marks of wind and sand. Outside the tent entrance, only darkness and a tiny trace of moonlight are visible, creating a dramatic contrast with the warm light inside.*

[The female guard's spear needs editing but for a one-shot it beats the competition. ](https://preview.redd.it/zm3i8j1cub0h1.png?width=2048&format=png&auto=webp&s=fe7ce3fc0aeca94788148711a263659a04abf2e2)


With this prompt:

*A spellcaster unleashes an acid splash spell in a muddy village path. The caster, cloaked and focused, extends one hand forward as two glowing green orbs arc through the air, mid-flight. Nearby, two startled peasants standing side by side have been splashed by acid. Their faces are contorted with pain, their flesh begins to sizzle and bubble, steam rising as holes eat through their rough tunics. A third peasant, reduced to skeleton, rests on its knees between them in a pool of acid.*




[The photographic version](https://preview.redd.it/v4md67fjwb0h1.png?width=2048&format=png&auto=webp&s=025a225f1ddb6618e27a4c5a3660b491d3cb6a1d)



[The cartoon version.](https://preview.redd.it/3wkuls5dwb0h1.png?width=2048&format=png&auto=webp&s=5688bea08279cd5690f0e7ea58550ad80dab4015)

Not perfect, but great prompt adherence.

6. It can be closer to the prompt than NB (Nanobanana) in some cases, which may explain its high initial rating:

https://preview.redd.it/671wibljxb0h1.png?width=2048&format=png&auto=webp&s=93d6a7144f71788b8b1136b90b48b9f504763a3a

Compare with other models, proprietary and free, here:

[https://www.reddit.com/r/StableDiffusion/comments/1mohl1p/comparison\_of\_models/](https://www.reddit.com/r/StableDiffusion/comments/1mohl1p/comparison_of_models/)


Another sample:

[Nanobanana's.](https://preview.redd.it/0szwchw1yb0h1.png?width=1408&format=png&auto=webp&s=b44e98eba05338c4dba4de72bae62d40e500ed03)

[O1's.](https://preview.redd.it/ypskdi4byb0h1.png?width=2048&format=png&auto=webp&s=639f0b23c7f9e7e8071bbe9fb93898effc20db86)


Or the flying citadel and portal samples:


Other models here: [https://www.reddit.com/r/StableDiffusion/comments/1pa2mca/qwen\_and\_zimageturbo\_zit\_prompt\_adherence\_contest/](https://www.reddit.com/r/StableDiffusion/comments/1pa2mca/qwen_and_zimageturbo_zit_prompt_adherence_contest/)

https://preview.redd.it/yb22farjyb0h1.png?width=2048&format=png&auto=webp&s=4eaac3cb4b41a5054d91b630cd77b5a39f76cb16

https://preview.redd.it/nht918wkyb0h1.png?width=2048&format=png&auto=webp&s=ea5b0c23ff9f68826a34d1b31971de1788f4eed6

7. Or for the falling girl:



https://preview.redd.it/q0g68o2zyb0h1.png?width=2048&format=png&auto=webp&s=9558c3070afb37112bfae78fa9b5a26449ef742f

*A young girl tumble from a jagged hole in the ceiling, her small body suspended mid-fall, arms flailing while her long chestnut hair streams upward as though caught in a sudden updraft. She wears a pale cotton dress, simple and slightly wrinkled, the hemp fluttering wildly around her knees as she plunges. Her face is a portrait of surprise and fear, wide hazel eyes staring into the unknown lips, her parted as if mid-gasp. Beside her, a sleek black cat twists and arches, claws extended as although searching for purpose, its green eyes glinting in the half-light. Both are frozen in that fragile instant of descent, their outlines illuminated by the stark contrast of plaster dust and neon glow. They fall into an opulent living room, decorated with refined taste and warm ambient lighting. The girl’s pale dress and scuffed leather shoes seem out of place against the grandeur of velvet upholstery and polished marble surfaces. A velvet sofa in deep burgundy anchors the space, surrounded by glass tables that catch the golden shimmer of a sculptural chandelier
overhead. Cushions scatter as if startled by the intrusion, while the cat’s trajectory points it straight toward the rug below. The girl, however, appears weightless and delicate, as though she might have the echo against such refinement. The room opens towards a vast corner window that stretches from floor to ceiling, to reveal the glowing skyline of a modern metropolis. Skyscrapers stand like gleaming monoliths, their facades awash in neon pinks, silvers, and electric blues. Hovering vehicles trace faint lines of light across the night sky. Against this futuristic backdrop, the girl’s old-fashioned dress and bare scraped knees give her an anachronistic, almost storybook presence, like a character who has stumbled from another time into this sleek, unyielding world. Details heighten the dreamlike tension: fragments of plaster hover like a cloud around her slender form, dust motes glowing in the chandelier's warmth; a Persian rug, richly patterned in crimson and gold, directly below her trajectory, as if to cushion or entrap her fall. A half-open book rests on a nearby table, its pages ruffled by the movement of air, as though the apartment itself is holding its breath. The girl's hair and dress ripple in the invisible currents, her face caught between terror and wonder, as if uncertain whether she has stepped into a nightmare or a fantastical new beginning.*


Since it made the girl out of proportion with the rest of the image, as did many models I tried with this prompt, I used the edit function to make her smaller:

https://preview.redd.it/dqlovgs6zb0h1.png?width=2048&format=png&auto=webp&s=f323cd057d50c20909e853f56a20dd8ca02fe613

8. It doesn't seem to have been trained on enough anatomy. A prompt with a man sitting while holding one of his feet with both hands over his knee leads to very bad results, while SOTA models usually pass this test easily. With only 8B parameters, it might benefit from finetuning.


All in all, it seems interesting for a lower-parameter model. HiDream claims to have built a pro model with 200B parameters. It will be interesting to see how it compares, both with the open-weight one and with the proprietary SOTA models, so we can gauge whether increasing the number of parameters is really the only way forward (which would be disheartening as long as we only get 24-32 GB VRAM cards on personal computers).
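To see why a 200B model would be out of reach for those cards, here is a rough estimate of the memory needed for the weights alone. This is a sketch under simple assumptions: 1 GB is taken as 1e9 bytes, activations and other runtime buffers are ignored, and the precisions listed are generic choices, not anything HiDream has announced.

```python
# Bytes needed to store one parameter at common precisions (generic assumption).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


for label, size in (("8B", 8), ("200B", 200)):
    for precision in BYTES_PER_PARAM:
        print(f"{label} at {precision}: {weight_gb(size, precision):.0f} GB")
```

Under these assumptions the 8B weights need about 16 GB in bf16, while 200B parameters need about 100 GB even at 4 bits per weight, several times what a 24-32 GB card offers.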




https://redd.it/1t9akyg
@rStableDiffusion
Artificial Analysis needs to address HiDream-O1 Benchmarks

I'm struggling to understand how an utterly deficient model like HiDream-O1 could have performed so well on user preference benchmarks. I don't want to jump to conclusions or speculate baselessly on how they did it, but it absolutely warrants an investigation if people are expected to take this benchmark seriously in the future. I just want an explanation of how something like this happens and, if it was illegitimate, how they will prevent it in the future.

https://redd.it/1t9eifa
@rStableDiffusion
I made some Slider Loras for Ace-Step 1.5 if anyone is interested

https://huggingface.co/Xanthius/Ace-Step-1.5-XL-Concept-Sliders/tree/main


Unfortunately, AI Toolkit doesn't have native support for Slider LoRAs for Ace-Step 1.5, but I was able to edit the code enough to get it working properly. I can now train concept sliders in about 10 minutes to an hour each, without needing specific datasets for the concepts. Since nobody else has a working way to train sliders themselves, I decided to put together a collection of them for people to use if they want to.


My first sliders on there are:


\- male to female voice

\- studio production to lofi

\- Bass boost

\- Choir to solo vocalist

\- digital to acoustic sound

\- Aggressive to gentle

\- drum intensity

\- energetic to calm

\- happiness

\- soft to projected voice

\- talking to singing

\- tempo

\- danceability


But I intend to add some more if people have ideas for them.
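For anyone unfamiliar with sliders, here is a toy sketch of the idea. It assumes these behave like an ordinary LoRA whose strength can be set to a positive or negative value, so the same low-rank update pushes a weight matrix toward one end of the concept or the other. All names and numbers below are made up for illustration and are not taken from the repository or from AI Toolkit.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_slider(base, down, up, strength):
    """Return base + strength * (up @ down), the usual LoRA-style update."""
    delta = matmul(up, down)
    return [
        [w + strength * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


base = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 weight matrix
up = [[1.0], [2.0]]              # 2x1 factor (rank 1)
down = [[0.5, -0.5]]             # 1x2 factor (rank 1)

for strength in (-1.0, 0.0, 1.0):
    print(strength, apply_slider(base, down, up, strength))
```

At strength 0 the weights are untouched, and flipping the sign moves them in the opposite direction, which is what would make a single file usable for both ends of something like "energetic to calm".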

https://redd.it/1t9e5cj
@rStableDiffusion
OSTRIS about HiDream-O1 LoRA on ToolKit

I am running my first test on training a HiDream-O1 LoRA on AI Toolkit. I don't want to get too excited too early. But this is the coolest model I have EVER seen. Super efficient pixel space. No VAE. No Text Encoder. Trains super fast. This is an industry changing innovation!

https://x.com/ostrisai/status/2053256188142428341


https://redd.it/1t9h7ps
@rStableDiffusion
Why is realistic skin such an issue for models?

The internet is full of normal, candid photos of people with natural skin texture. There's a subset of heavily retouched editorial or beauty photography with that smooth porcelain skin look, but that’s clearly a minority of all human images online. Most photos of people are just regular snapshots where skin looks like actual skin.

So why do image models, especially open source ones, struggle so much to generate realistic looking people out of the box? Why do they default to this plasticky, airbrushed, over-retouched aesthetic when that’s not what the majority of the training data actually looks like?

It's striking how hard it is for models to reproduce something as common and statistically ordinary as normal human skin without needing specialized prompting, LoRAs, finetunes, or upscalers. Natural skin texture should arguably be the baseline behavior, yet it very obviously isn't. Why?

https://redd.it/1t9gv4z
@rStableDiffusion
Which workflows are you guys using now for LTX 2.3?

Since prompt relay and other new workflows were released recently, it looks like there are far more ways to use LTX 2.3. What are some of the best-quality or coolest workflows you guys have seen or used so far?

https://redd.it/1t9itpr
@rStableDiffusion
I built a site to create free AI videos using LTX 2.3 running on my own GPUs
https://redd.it/1t9juoy
@rStableDiffusion