r/StableDiffusion

8 views20:40

Qwen Image Bench - Finetune for image eval
https://huggingface.co/Qwen/Qwen-Image-Bench

https://redd.it/1trgfmn
@rStableDiffusion

6 views22:40

LTX 2.3 Character Dialogue

https://redd.it/1trkyr0
@rStableDiffusion

7 views01:40

r/StableDiffusion

What would you run on an RTX Pro 6000 Blackwell?

Lots of people ask about what to run on small GPUs, but nobody asks about big GPUs. What would you do with 96GiB of VRAM?

I play with Z-Image and LTX (and derivatives like Sulphur), and I use Qwen for image editing. I still dabble a bit with older SD1.5 and SDXL models because there are so many useful loras, and they run fast so it's easy to generate a huge batch and then cherry-pick the best results.

Pic is the system with the Blackwell card and the old Ada card. Color-cycling RGB because my inner child is still alive and loves this BS. I'll do minimalism when I die.

https://preview.redd.it/lr0ji4tz964h1.jpg?width=4032&format=pjpg&auto=webp&s=8fc9962eaba7077e08425d0ab52b416a5fb8ee7a

https://redd.it/1trlcpu
@rStableDiffusion

6 views02:40

r/StableDiffusion

Civitai

UnstableAnimaV1-12step - UnstableAnimaV2Turbo6step | Anima Checkpoint | Civitai

V2 anima-base-1 + some of my loras + official anima-turbo-lora-v0.2 test in forge-neo ER SDE BETA step=4-16 prefer 6 CFG=1-4 prefer 1.25 offset=3-1...

4 views06:40

r/StableDiffusion

Question about training a lora for character style consistency

https://redd.it/1trrh6j
@rStableDiffusion

6 views07:40

r/StableDiffusion

Anima IP-Adapter is coming.

It seems someone is working on an IP-Adapter for Anima. If it is good, it could finally make SD 1.5 obsolete.

https://github.com/Wenaka2004/comfyui-anima-ipadapter

https://redd.it/1trw160
@rStableDiffusion

GitHub - Wenaka2004/comfyui-anima-ipadapter: IP-Adapter custom node for Anima in ComfyUI

6 views10:40

r/StableDiffusion

Video Colorizing using LTX 2.3 lora
https://www.youtube.com/watch?v=9k8_EOp5tVI

https://redd.it/1tru7lh
@rStableDiffusion

YouTube

Vogue by Madonna (Colorized)

This is the first time I have tried colorizing monochrome video using local AI. The original music video is only available in 480p resolution, so the quality is not ideal.

Original source video: https://www.youtube.com/watch?v=GuJQSAiODqI

All of my AI demos:…

7 views11:40

r/StableDiffusion

GitHub - orion4d/Orion4D_generative_paint: Generative Paint is a custom node for ComfyUI that adds an advanced painting interface directly usable from the browser.
https://github.com/orion4d/Orion4D_generative_paint

https://redd.it/1try64e
@rStableDiffusion

6 views12:40

r/StableDiffusion

8-step FLUX.2-dev DMD2 distillation

A new 8-step distillation of FLUX.2-dev from a professional lab. I haven't been able to try it yet because it's in diffusers format, but it seems interesting.

Blog: https://www.baseten.co/blog/faster-image-generation-timestep-distillation-flux2/

Model: https://huggingface.co/baseten/distilled_8step_FLUX.2-dev

https://redd.it/1trx9es
@rStableDiffusion

Baseten

Timestep distillation: 2.5x faster FLUX.2 image generation

Timestep distillation compresses FLUX.2 denoising steps from 20 to 8, achieving 2.5x faster image generation without noticeable quality loss.

5 views13:40

r/StableDiffusion

Anima – Sharing Some Prompts and Results

https://redd.it/1ts0byp
@rStableDiffusion

5 views14:40

r/StableDiffusion

Testing The New PID With Z image Turbo Model With 512 to 2048 Resolution Model (RTX3060 VRAM 6GB)

https://redd.it/1ts0lc1
@rStableDiffusion

From the sdforall community on Reddit: Testing The New PID With Z image Turbo Model With 512 to 2048 Resolution Model (RTX3060…

Explore this post and more from the sdforall community

6 views15:40

r/StableDiffusion

Damn... did all of you who use Runpod have very low to 0 availability?
https://redd.it/1trzex3
@rStableDiffusion

6 views16:40

r/StableDiffusion

Presenting Stable Audio Studio: A dedicated app for running Stable Audio models locally
https://redd.it/1trzjgx
@rStableDiffusion

5 views17:40

NVIDIA PiD Preview Inside a Next-Gen Tiled Upscaler & Enhancer

https://redd.it/1ts3ofu
@rStableDiffusion

5 views18:40

r/StableDiffusion

What are the recommended resolutions for Anima? Why are all the CivitAI images vertical?
https://redd.it/1ts6e5t
@rStableDiffusion

4 views19:40

r/StableDiffusion

Attn: Black Forest Labs and other researchers: Perceptual (Oklab) color space models.

TL;DR

Proposal: Training Flow Models in Perceptually Uniform Color Spaces to Simplify Latent Manifolds & Enable Disentangled Chromatic Control

What this means for you: Faster generation (fewer steps needed for clean, stable color), instant palette steering that actually locks to your prompt from step 1, and an end to hue drift / "neon mud" when you push CFG or saturation sliders. For researchers: a mathematically cleaner latent manifold, straighter ODE trajectories, and a testable path toward orthogonal lightness/chroma control without architectural overhaul.

• Flow Matching geometry + Oklab uniformity → reduced trajectory curvature

• β-VAE disentanglement + ΔE(Oklab) loss → orthogonal lightness/chroma axes

• PaletteDiffusion/ColorCond precedents + harmonic rule embeddings → structured conditioning over text

---

[SKIP IF NOT INTERESTED] COLOR SPACE BACKGROUND

sRGB was engineered for 1990s CRT phosphor limits, not human perception or machine learning. It heavily entangles luminance and chrominance, meaning linear interpolation in sRGB crosses perceptually "dead" zones, forcing models to waste capacity learning correction curves. Perceptually uniform spaces like CIELAB and Oklab were explicitly designed so that Euclidean distance ≈ perceived color difference. Oklab (2020) fixes legacy issues with lightness scaling and hue linearity, making it ideal for gradient-based optimization.
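For anyone who wants to see what the conversion looks like in practice, here is a minimal sketch of sRGB to Oklab in plain Python. The matrix coefficients are the ones published in the Oklab post linked in this thread; the function names (`srgb_to_linear`, `srgb_to_oklab`) are made up for illustration and are not from any library.

```python
def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def srgb_to_oklab(red: float, green: float, blue: float):
    """Convert a gamma-encoded sRGB color (channels in [0, 1]) to Oklab (L, a, b)."""
    r, g, b = (srgb_to_linear(v) for v in (red, green, blue))

    # Linear sRGB -> approximate cone responses (LMS).
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b

    # Cube-root nonlinearity (inputs are non-negative for in-gamut colors).
    l_, m_, s_ = (v ** (1.0 / 3.0) for v in (l, m, s))

    # Nonlinear LMS -> Oklab.
    ok_l = 0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_
    ok_a = 1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_
    ok_b = 0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_
    return ok_l, ok_a, ok_b
```

Neutral grays should land very close to a = b = 0, which is a quick sanity check that lightness and chroma really are on separate axes here.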

Oklab Technical Deep Dive: https://bottosson.github.io/posts/oklab/

CIE Color Spaces & Perceptual Uniformity: https://en.wikipedia.org/wiki/CIELAB_color_space

---

FULL PROPOSAL

Dear Black Forest Labs, Hugging Face, and the generative AI research community,

State-of-the-art image generators are currently trained and conditioned on sRGB, a display-referred standard optimized for CRT phosphor response, not for perceptual consistency or machine learning efficiency. While sRGB remains necessary for output rendering, its perceptual non-uniformity introduces unnecessary curvature into the data manifold, forcing models to learn compensatory trajectories rather than intrinsic color structure.

I propose a focused research initiative: fine-tuning a VAE and subsequent Rectified Flow/Flow Matching pipeline using Oklab (or its polar counterpart, Oklch) as the internal color representation, paired with structured harmonic conditioning.
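Since Oklch comes up as the polar counterpart, here is a small sketch of the standard Cartesian-to-polar relationship (chroma C = sqrt(a² + b²), hue h = atan2(b, a)). The function names are hypothetical, and hue is expressed in degrees as an assumption; nothing here is specific to any model pipeline.

```python
import math


def oklab_to_oklch(L: float, a: float, b: float):
    """Oklab (L, a, b) -> Oklch (L, C, h) with hue h in degrees, 0 <= h < 360."""
    chroma = math.hypot(a, b)
    hue = math.degrees(math.atan2(b, a)) % 360.0
    return L, chroma, hue


def oklch_to_oklab(L: float, chroma: float, hue: float):
    """Oklch (L, C, h in degrees) -> Oklab (L, a, b)."""
    rad = math.radians(hue)
    return L, chroma * math.cos(rad), chroma * math.sin(rad)
```

In this form a "saturation slider" is just a scale on C and a hue shift is an offset on h, with L untouched, which is the kind of orthogonal control the proposal is after.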

Trajectory Simplification in Flow Matching:

Rectified flow models approximate optimal transport by learning straight-line velocity fields from noise to data. In sRGB, linear interpolation between saturated hues traverses perceptually desaturated regions, forcing the vector field to learn non-linear corrections to maintain chromatic integrity. Oklab is constructed so that Euclidean distance correlates with perceptual difference (ΔE). Training in Oklab aligns the mathematical trajectories of flow matching with human perceptual geometry, reducing trajectory curvature, lowering effective manifold complexity, and potentially improving convergence and step efficiency.
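To make the "straight-line velocity field" idea concrete, here is a toy sketch of the usual rectified-flow setup: the interpolant x_t = (1 - t)·x0 + t·x1 and its constant velocity target x1 - x0, plus a plain Euler sampler. All names are hypothetical, the vectors stand in for pixels or latents, and the "model" in the usage note is just the ideal constant field, not a trained network.

```python
def interpolate(x0, x1, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Regression target for the velocity field along that path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(x0, velocity_fn, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

If the learned field were perfectly straight, e.g. `euler_sample(noise, lambda x, t: velocity_target(noise, data), steps=1)`, a single step would already land on `data`. Real fields are curved, so more steps are needed; the argument above is that less curvature in color would mean fewer steps.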

Latent Compression & Disentangled Chromatic Subspaces:

Current VAEs compress sRGB images using MSE or LPIPS, neither of which guarantees perceptual uniformity in the latent space. By training a VAE with a differentiable ΔE(Oklab) perceptual loss and optional orthogonal regularization, we can encourage separation of lightness (L) and chromaticity (a,b) within the latent subspace. This mitigates the "color bleed" and hue drift commonly observed under high CFG or during latent interpolation, as perturbations along lightness axes no longer inadvertently modulate chromatic dimensions.
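As a rough illustration of the ΔE(Oklab) loss, here is a sketch that takes pixels already converted to Oklab and computes the mean Euclidean color difference, plus a split into lightness and chroma-plane error. The function names are hypothetical. In an actual training loop this would be written in an autograd framework, probably with a small epsilon inside the square root, since its gradient is undefined at zero distance; that part is an assumption, not something from the proposal.

```python
import math


def delta_e_ok(p, q):
    """Euclidean distance between two Oklab colors (L, a, b)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mean_delta_e(pred, target):
    """Mean Delta-E over two equal-length lists of Oklab pixels."""
    return sum(delta_e_ok(p, q) for p, q in zip(pred, target)) / len(pred)


def split_errors(pred, target):
    """Return (mean |dL|, mean distance in the a/b plane) as separate terms."""
    n = len(pred)
    lightness = sum(abs(p[0] - q[0]) for p, q in zip(pred, target)) / n
    chroma = sum(math.hypot(p[1] - q[1], p[2] - q[2]) for p, q in zip(pred, target)) / n
    return lightness, chroma
```

Reporting the two terms separately is one simple way to check whether a reconstruction error is a brightness problem or a hue/saturation problem.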

Structured Color Conditioning Pathways:

Teaching harmonic relationships to the model doesn't require manual dataset retagging. Multiple scalable pathways exist:

• Automated Lexical Tagging: Cluster dominant colors in Oklab space, map to standardized color names, and attach LLM-derived mood/setting descriptors. This converts implicit palette

bottosson.github.io

A perceptual color space for image processing

From personal project to industry standard Introduction added in 2025 When introduced Oklab in 2020, I never expected it to reach as far as...

4 views20:40
