Artificial Analysis needs to address HiDream-01 Benchmarks
I'm struggling to understand how an utterly deficient model like HiDream-01 could have performed so well on user preference benchmarks. I don't want to jump to conclusions or speculate baselessly on how they did it, but it absolutely warrants an investigation if people are expected to take this benchmark seriously in the future. I just want an explanation for how something like this happens and, if it was illegitimate, how they will prevent it in the future.
https://redd.it/1t9eifa
@rStableDiffusion
I made some Slider Loras for Ace-Step 1.5 if anyone is interested
https://huggingface.co/Xanthius/Ace-Step-1.5-XL-Concept-Sliders/tree/main
Unfortunately, AI Toolkit doesn't have native support for slider LoRAs for Ace-Step 1.5, but I was able to edit the code enough to get it working properly. I can now train concept sliders in about ten minutes to an hour each, without needing concept-specific datasets. Since nobody else has a working way to train sliders themselves, I decided to put together a collection for people to use if they want to.
My first sliders on there are:
- male to female voice
- studio production to lo-fi
- bass boost
- choir to solo vocalist
- digital to acoustic sound
- aggressive to gentle
- drum intensity
- energetic to calm
- happiness
- soft to projected voice
- talking to singing
- tempo
- danceability
I intend to add more if people have ideas for them.
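As a rough illustration of what these sliders do at inference time (this is not the post's actual training code; the weight names, shapes, and scales below are hypothetical): a slider LoRA is a low-rank delta merged into a base weight with a signed strength, so negative scales slide toward one concept pole and positive scales toward the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical base weight of one projection layer in the model.
W = rng.standard_normal((64, 64)).astype(np.float32)

# A rank-4 slider LoRA: the learned concept direction is B @ A.
rank = 4
A = rng.standard_normal((rank, 64)).astype(np.float32)
B = rng.standard_normal((64, rank)).astype(np.float32)

def apply_slider(W, A, B, scale, alpha=1.0):
    """Merge the slider at a signed strength.

    scale < 0 slides toward one pole (e.g. male voice),
    scale > 0 toward the other (e.g. female voice),
    scale = 0 leaves the base model untouched.
    """
    return W + scale * (alpha / A.shape[0]) * (B @ A)

W_neg = apply_slider(W, A, B, scale=-0.8)
W_pos = apply_slider(W, A, B, scale=+0.8)

# At scale 0 the merged weight equals the base weight exactly.
assert np.allclose(apply_slider(W, A, B, 0.0), W)
```

Because the delta is linear in the scale, opposite scales move the weights symmetrically around the base model, which is what makes a single LoRA usable as a two-ended slider.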
https://redd.it/1t9e5cj
@rStableDiffusion
Ostris on training a HiDream-O1 LoRA in AI Toolkit
I am running my first test training a HiDream-O1 LoRA in AI Toolkit. I don't want to get too excited too early, but this is the coolest model I have EVER seen. Super-efficient pixel space. No VAE. No text encoder. Trains super fast. This is an industry-changing innovation!
https://x.com/ostrisai/status/2053256188142428341
https://redd.it/1t9h7ps
@rStableDiffusion
Why is realistic skin such an issue for models?
The internet is full of normal, candid photos of people with natural skin texture. There's a subset of heavily retouched editorial or beauty photography with that smooth porcelain-skin look, but that's clearly a minority of all human images online. Most photos of people are just regular snapshots where skin looks like actual skin.
So why do image models, especially open-source ones, struggle so much to generate realistic-looking people out of the box? Why do they default to this plasticky, airbrushed, over-retouched aesthetic when that's not what the majority of the training data actually looks like?
It's striking how hard it is for models to reproduce something as common and statistically ordinary as normal human skin without specialized prompting, LoRAs, finetunes, or upscalers. Natural skin texture should arguably be the baseline behavior, yet it very obviously isn't. Why?
https://redd.it/1t9gv4z
@rStableDiffusion
Which workflows are you guys using now for LTX 2.3?
Since prompt relay and other new workflows have been released recently, there are far more options for using LTX 2.3. What are some of the best-quality or coolest workflows you've seen or used so far?
https://redd.it/1t9itpr
@rStableDiffusion
I built a site to create free AI videos using LTX 2.3 running on my own GPUs
https://redd.it/1t9juoy
@rStableDiffusion
TenStrip's workflow is the first LTX 2.3 workflow I've found that actually works for spicy content; it's almost like using the old Grok.
https://redd.it/1t9pbjd
@rStableDiffusion
The Anima realism model is crazy good. Don't miss it!
I've been messing with the Anima realism model posted here (https://civitai.red/models/2585622/ultrareal-fine-tune-anima). If you want prompt adherence for weird stuff, it does a really good job. What's cool is you can mix danbooru tags and natural language and it just goes with it.
I'm stunned at how good it is and surprised it isn't getting more traction, especially since this is the author's experiment and neither the model nor this finetune is finished yet. The output is decent if you prompt well. It's not as photorealistic as ZIT or whatever, but it will do all the weird danbooru tags other models blush over. For the amateur-photography look everyone here wants, I actually think it's a good model.
I use 50 steps, CFG 5, Euler (not ancestral). Anima is slow as hell on my Mac for such a small model, but I'm hoping the devs improve that somehow. It also works with the turbo LoRA!
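For what the "CFG 5" part of those settings means: classifier-free guidance extrapolates the denoiser's prompt-conditioned prediction away from its unconditional one by that factor. A minimal numeric sketch (not Anima's actual code; the array values are made up):

```python
import numpy as np

def cfg(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: push the noise prediction along the
    (conditional - unconditional) direction by guidance_scale."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

eps_u = np.array([0.1, 0.2])   # made-up unconditional prediction
eps_c = np.array([0.3, 0.1])   # made-up prompt-conditioned prediction

guided = cfg(eps_u, eps_c, guidance_scale=5.0)
assert np.allclose(guided, [1.1, -0.3])

# A scale of 1.0 reduces to the plain conditional prediction.
assert np.allclose(cfg(eps_u, eps_c, 1.0), eps_c)
```

Higher scales follow the prompt more aggressively at the cost of variety, which is why a middling value like 5 is a common trade-off.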
Someone also extracted the realism "stuff" as a LoRA; it's in the comments of the Civitai page, linked in a random Google Drive.
Anyway, try it out, and if the author sees this: thanks, dude. Let me know if I can chip in for another training run. There is so much potential here.
https://redd.it/1t9r8c6
@rStableDiffusion
SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
https://arxiv.org/abs/2605.08043
https://redd.it/1t9s8da
@rStableDiffusion