r/StableDiffusion

Video Colorizing using LTX 2.3 lora
https://www.youtube.com/watch?v=9k8_EOp5tVI

https://redd.it/1tru7lh
@rStableDiffusion

YouTube

Vogue by Madonna (Colorized)

This is the first time I have tried colorizing monochrome video using local AI. The original music video is only available in 480p resolution, so the quality is not ideal.

Original source video: https://www.youtube.com/watch?v=GuJQSAiODqI

All of my AI demos:…

7 views11:40

r/StableDiffusion

GitHub - orion4d/Orion4D_generative_paint: Generative Paint is a custom node for ComfyUI that adds an advanced painting interface directly usable from the browser.
https://github.com/orion4d/Orion4D_generative_paint

https://redd.it/1try64e
@rStableDiffusion

GitHub

GitHub - orion4d/Orion4D_generative_paint: Generative Paint is a custom node for ComfyUI that adds an advanced painting interface…

Generative Paint is a custom node for ComfyUI that adds an advanced painting interface directly usable from the browser. - orion4d/Orion4D_generative_paint

6 views12:40

r/StableDiffusion

8-step FLUX.2-dev DMD2 distillation

A new 8-step distillation of FLUX.2-dev from a professional lab. I haven't been able to try it yet because it's in diffusers format, but it seems interesting.

Blog: https://www.baseten.co/blog/faster-image-generation-timestep-distillation-flux2/

Model: https://huggingface.co/baseten/distilled_8step_FLUX.2-dev

https://redd.it/1trx9es
@rStableDiffusion

Baseten

Timestep distillation: 2.5x faster FLUX.2 image generation

Timestep distillation compresses FLUX.2 denoising steps from 20 to 8, achieving 2.5x faster image generation without noticeable quality loss.

5 views13:40

r/StableDiffusion

Anima – Sharing Some Prompts and Results

https://redd.it/1ts0byp
@rStableDiffusion

From the StableDiffusion community on Reddit: Anima – Sharing Some Prompts and Results

Explore this post and more from the StableDiffusion community

5 views14:40

r/StableDiffusion

Testing The New PID With Z image Turbo Model With 512 to 2048 Resolution Model (RTX3060 VRAM 6GB)

https://redd.it/1ts0lc1
@rStableDiffusion

From the sdforall community on Reddit: Testing The New PID With Z image Turbo Model With 512 to 2048 Resolution Model (RTX3060…

Explore this post and more from the sdforall community

6 views15:40

r/StableDiffusion

Damn... did all of you who use Runpod have very low to 0 availability?
https://redd.it/1trzex3
@rStableDiffusion

6 views16:40

r/StableDiffusion

Presenting Stable Audio Studio: A dedicated app for running Stable Audio models locally
https://redd.it/1trzjgx
@rStableDiffusion

5 views17:40

NVIDIA PiD Preview Inside a Next-Gen Tiled Upscaler & Enhancer

https://redd.it/1ts3ofu
@rStableDiffusion

5 views18:40

r/StableDiffusion

What are the recommended resolutions for Anima? Why are all the CivitAI images vertical?
https://redd.it/1ts6e5t
@rStableDiffusion

4 views19:40


r/StableDiffusion

Attn: Black Forest Labs and other researchers: Perceptual (OKLab) color space models.

**TL;DR**

**Proposal: Training Flow Models in Perceptually Uniform Color Spaces to Simplify Latent Manifolds & Enable Disentangled Chromatic Control**

**What this means for you:** Faster generation (fewer steps needed for clean, stable color), instant palette steering that actually locks to your prompt from step 1, and an end to hue drift / "neon mud" when you push CFG or saturation sliders. For researchers: a mathematically cleaner latent manifold, straighter ODE trajectories, and a testable path toward orthogonal lightness/chroma control without architectural overhaul.

• Flow Matching geometry + Oklab uniformity → reduced trajectory curvature

• β-VAE disentanglement + ΔE(Oklab) loss → orthogonal lightness/chroma axes

• PaletteDiffusion/ColorCond precedents + harmonic rule embeddings → structured conditioning over text

---

**[SKIP IF NOT INTERESTED] COLOR SPACE BACKGROUND**

sRGB was engineered for 1990s CRT phosphor limits, not human perception or machine learning. It heavily entangles luminance and chrominance, meaning linear interpolation in sRGB crosses perceptually "dead" zones, forcing models to waste capacity learning correction curves. Perceptually uniform spaces like CIELAB and Oklab were explicitly designed so that Euclidean distance ≈ perceived color difference. Oklab (2020) fixes legacy issues with lightness scaling and hue linearity, making it ideal for gradient-based optimization.

[Oklab Technical Deep Dive](https://bottosson.github.io/posts/oklab/)

[CIE Color Spaces & Perceptual Uniformity](https://en.wikipedia.org/wiki/CIELAB_color_space)
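
For readers who want to poke at the numbers, here is a minimal sketch of the sRGB to Oklab conversion and the resulting ΔE as plain Euclidean distance. The matrix coefficients are the ones published in the Oklab post linked above; the function names are just illustrative, and this is not taken from any particular library.

```python
import math

def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def srgb_to_oklab(r, g, b):
    """Convert gamma-encoded sRGB (each channel in [0, 1]) to Oklab (L, a, b)."""
    r, g, b = (srgb_to_linear(c) for c in (r, g, b))
    # Linear sRGB -> approximate cone responses (coefficients from the Oklab post)
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    # Cube-root non-linearity (sign-preserving, so zero and negatives are safe)
    l_, m_, s_ = (math.copysign(abs(x) ** (1.0 / 3.0), x) for x in (l, m, s))
    lab_l = 0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_
    lab_a = 1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_
    lab_b = 0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_
    return (lab_l, lab_a, lab_b)

def delta_e_oklab(srgb1, srgb2):
    """Perceptual difference as Euclidean distance between two colors in Oklab."""
    return math.dist(srgb_to_oklab(*srgb1), srgb_to_oklab(*srgb2))

if __name__ == "__main__":
    print(srgb_to_oklab(1.0, 0.0, 0.0))             # pure sRGB red
    print(delta_e_oklab((0, 0, 0), (1, 1, 1)))      # black vs. white
```

White lands at roughly L = 1 with a and b near zero, which is the property that makes the lightness axis easy to reason about separately from chroma.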

---

**FULL PROPOSAL**

Dear Black Forest Labs, Hugging Face, and the generative AI research community,

State-of-the-art image generators are currently trained and conditioned on sRGB, a display-referred standard optimized for CRT phosphor response, not for perceptual consistency or machine learning efficiency. While sRGB remains necessary for output rendering, its perceptual non-uniformity introduces unnecessary curvature into the data manifold, forcing models to learn compensatory trajectories rather than intrinsic color structure.

I propose a focused research initiative: fine-tuning a VAE and subsequent Rectified Flow/Flow Matching pipeline using Oklab (or its polar counterpart, Oklch) as the internal color representation, paired with structured harmonic conditioning.

**Trajectory Simplification in Flow Matching:**

Rectified flow models approximate optimal transport by learning straight-line velocity fields from noise to data. In sRGB, linear interpolation between saturated hues traverses perceptually desaturated regions, forcing the vector field to learn non-linear corrections to maintain chromatic integrity. Oklab is constructed so that Euclidean distance correlates with perceptual difference (ΔE). Training in Oklab aligns the mathematical trajectories of flow matching with human perceptual geometry, reducing trajectory curvature, lowering effective manifold complexity, and potentially improving convergence and step efficiency.
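
To make the "straight-line velocity field" idea concrete, here is a toy sketch of the rectified flow training pair and an Euler sampler, written in plain Python on small vectors. It only illustrates the geometry (if the velocity field really is the constant `x1 - x0`, any step count lands on the data point); it says nothing about whether Oklab actually makes learned fields straighter, which is the open question. All names are illustrative.

```python
def rectified_flow_pair(x0, x1, t):
    """Return the interpolated point x_t and the regression target velocity.

    x0 is a noise sample, x1 a data sample, t a time in [0, 1].
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_v = [b - a for a, b in zip(x0, x1)]
    return x_t, target_v

def velocity_mse(predicted_v, target_v):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted_v, target_v)) / len(target_v)

def euler_sample(x0, velocity_fn, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

if __name__ == "__main__":
    noise, data = [0.0, 0.0, 0.0], [0.7, 0.1, -0.05]   # pretend Oklab pixel
    _, v = rectified_flow_pair(noise, data, 0.5)
    # A perfectly straight field: 1 step and 8 steps give the same endpoint.
    print(euler_sample(noise, lambda x, t: v, 1))
    print(euler_sample(noise, lambda x, t: v, 8))
```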

**Latent Compression & Disentangled Chromatic Subspaces:**

Current VAEs compress sRGB images using MSE or LPIPS, neither of which guarantees perceptual uniformity in the latent space. By training a VAE with a differentiable ΔE(Oklab) perceptual loss and optional orthogonal regularization, we can encourage separation of lightness (L) and chromaticity (a,b) within the latent subspace. This mitigates the "color bleed" and hue drift commonly observed under high CFG or during latent interpolation, as perturbations along lightness axes no longer inadvertently modulate chromatic dimensions.
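
One possible shape for the ΔE(Oklab) loss is sketched below. The proposal does not pin down the exact form, so the separate lightness/chroma weights (`w_light`, `w_chroma`) and the small `eps` are assumptions of mine, and the pixels are assumed to be already converted to Oklab. In a real training setup this would be written in an autodiff framework; plain Python is used here only to show the arithmetic.

```python
import math

def delta_e_oklab_loss(pred, target, w_light=1.0, w_chroma=1.0, eps=0.0):
    """Mean weighted Oklab distance over paired pixels.

    pred and target are equal-length sequences of (L, a, b) triples.
    w_light and w_chroma are hypothetical knobs for weighting the lightness
    and chroma terms separately; eps > 0 keeps the square root smooth at
    zero if this is ported to an autodiff framework.
    """
    total = 0.0
    for (l1, a1, b1), (l2, a2, b2) in zip(pred, target):
        light_term = (l1 - l2) ** 2
        chroma_term = (a1 - a2) ** 2 + (b1 - b2) ** 2
        total += math.sqrt(w_light * light_term + w_chroma * chroma_term + eps)
    return total / len(pred)

if __name__ == "__main__":
    recon = [(0.50, 0.10, 0.00), (0.80, 0.00, 0.05)]
    truth = [(0.55, 0.10, 0.00), (0.80, 0.03, 0.01)]
    print(delta_e_oklab_loss(recon, truth))
    # Penalise chroma errors more heavily than lightness errors:
    print(delta_e_oklab_loss(recon, truth, w_light=1.0, w_chroma=4.0))
```

With both weights at 1 and `eps` at 0 this reduces to the mean Euclidean distance in Oklab.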

**Structured Color Conditioning Pathways:**

Teaching harmonic relationships to the model doesn't require manual dataset retagging. Multiple scalable pathways exist:

• Automated Lexical Tagging: Cluster dominant colors in Oklab space, map to standardized color names, and attach LLM-derived mood/setting descriptors. This converts implicit palette constraints in real-world assets into explicit conditioning signals.

• Geometry-Locked Synthetic Pairs: Generate structural duplicates (via depth/Canny/structure maps) with systematically varied harmonic relationships (complementary, triadic, etc.) for clean ablation studies that isolate color logic from spatial priors.

• Vector-Based Rule Embeddings: Feed numerical Oklch coordinates + harmonic relationship vectors directly into cross-attention or lightweight adapters, bypassing the ambiguity of text tokens entirely.

Each approach trades off between data realism, compute overhead, and conditioning precision. We encourage community experimentation across all three, with shared benchmarking to determine which yields the strongest ΔE stability and palette adherence.
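
As a rough illustration of the vector-based pathway, the sketch below turns a base Oklch color plus a harmonic rule into a flat numeric vector. The hue offsets are the usual color-wheel conventions (complementary = 180°, triadic = 120° apart); the rule table, the sin/cos hue encoding, and the vector layout are my own assumptions, not something specified above or taken from an existing adapter.

```python
import math

def oklab_to_oklch(l, a, b):
    """Oklab (L, a, b) -> Oklch (L, C, h) with hue in degrees [0, 360)."""
    return l, math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360.0

def oklch_to_oklab(l, c, h):
    """Oklch (L, C, h in degrees) -> Oklab (L, a, b)."""
    rad = math.radians(h)
    return l, c * math.cos(rad), c * math.sin(rad)

# Hypothetical rule table: hue offsets in degrees relative to the base color.
HARMONY_OFFSETS = {
    "complementary": [0.0, 180.0],
    "triadic": [0.0, 120.0, 240.0],
}

def harmonic_palette(l, c, h, rule):
    """Rotate the base hue by each offset, keeping lightness and chroma fixed."""
    return [(l, c, (h + off) % 360.0) for off in HARMONY_OFFSETS[rule]]

def conditioning_vector(palette):
    """Flatten a palette to [L, C, cos(h), sin(h), ...].

    Encoding hue as cos/sin avoids the jump between 359 and 0 degrees.
    """
    vec = []
    for l, c, h in palette:
        rad = math.radians(h)
        vec.extend([l, c, math.cos(rad), math.sin(rad)])
    return vec

if __name__ == "__main__":
    palette = harmonic_palette(0.7, 0.12, 30.0, "triadic")
    print(palette)
    print(conditioning_vector(palette))
```

A vector like this could be projected by a small MLP and fed to an adapter or cross-attention, but how well that works compared with text tokens is exactly what would need benchmarking.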

**Expected Outcomes & Measurable Metrics:**

- Reduced Latent Trajectory Curvature: Quantifiable via ODE solver step count, velocity field smoothness, and latent interpolation linearity.

- Hue/Chroma Stability: Lower ΔE deviation under varying CFG scales, step counts, and latent perturbations.

- Linear Color Steering: Independent control over lightness, chroma, and hue via latent axis manipulation without cross-dimensional leakage.

- Palette Adherence Benchmarks: Standardized evaluation of spectral compliance using constrained Oklch injection and harmonic rule accuracy.
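
Two of these metrics are easy to prototype. The sketch below shows one way to score trajectory straightness (chord length divided by path length, so 1.0 means perfectly straight) and hue/chroma drift (worst-case Oklab distance from a reference color). Both definitions are my own simple stand-ins, assumed for illustration; the proposal does not fix the formulas.

```python
import math

def straightness(trajectory):
    """Chord length over path length for a list of points; 1.0 is a straight line."""
    path = sum(math.dist(p, q) for p, q in zip(trajectory, trajectory[1:]))
    if path == 0.0:
        return 1.0  # a stationary trajectory is trivially straight
    return math.dist(trajectory[0], trajectory[-1]) / path

def max_delta_e_drift(reference, variants):
    """Largest Oklab distance between a reference color and its variants.

    reference is an (L, a, b) triple, e.g. the mean color of a region at the
    baseline CFG; variants are the same measurement at other settings.
    """
    return max(math.dist(reference, v) for v in variants)

if __name__ == "__main__":
    straight = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
    print(straightness(straight), straightness(bent))
    print(max_delta_e_drift((0.6, 0.1, 0.0), [(0.6, 0.1, 0.0), (0.62, 0.14, 0.03)]))
```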

This proposal advocates optimizing internal training and conditioning representation to match perceptual geometry, reducing representational overhead, and enabling precise, mathematically grounded chromatic control.

Sincerely,
crantob, A practitioner observing latent space geometry

---

**THREE KEY CHALLENGEABLE CLAIMS & SUPPORTING RESEARCH**

• *Claim 1: Perceptually uniform spaces reduce flow trajectory curvature & improve step efficiency.*

- **Why reviewers push back:** Flow models already approximate straight lines; skeptics argue color space choice won't meaningfully alter optimal transport paths or sampling speed.

- **Supporting theory:** Rectified Flow minimizes transport cost by enforcing straight trajectories. When data representation matches perceptual distance, the velocity field requires fewer non-linear corrections to maintain structural/color integrity along the path.

- **References:**

[Flow Matching for Generative Modeling (Lipman et al.)](https://arxiv.org/abs/2210.02747)

[Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow (Liu et al.)](https://arxiv.org/abs/2209.03003)

• *Claim 2: ΔE(Oklab) + orthogonal regularization disentangles lightness/chroma in VAEs.*

- **Why reviewers push back:** Standard VAEs entangle features regardless of loss function; true disentanglement usually requires heavy architectural priors or explicit labels.

- **Supporting theory:** Capacity constraints (β-VAE) combined with perceptual losses have been shown empirically to encourage separation of semantic axes. Using ΔE as the perceptual metric explicitly penalizes cross-axis gradient coupling between L and (a,b), making orthogonality a trainable prior rather than a statistical accident.

- **References:**

[Early Visual Concept Learning with Unsupervised Deep Learning (Higgins et al.), the arXiv precursor to β-VAE](https://arxiv.org/abs/1606.05579)

[Perceptual Losses for Real-Time Style Transfer and Super-Resolution (Johnson et al.)](https://arxiv.org/abs/1603.08155)

• *Claim 3: Vector-based harmonic conditioning outperforms textual color tokens.*

- **Why reviewers push back:** Text encoders already embed implicit color statistics; explicit vectors may add overhead without measurable gains over fine-tuned CLIP embeddings.

- **Supporting theory:** Text prompts encode statistical co-occurrence, while numerical Oklch vectors encode explicit spectral geometry. Prior work in image-to-image diffusion demonstrates that direct channel/histogram conditioning bypasses CLIP's semantic ambiguity, yielding stricter palette adherence and lower ΔE deviation under identical compute budgets.

- **References:**
[Palette: Image-to-Image Diffusion Models (Saharia et al.)](https://arxiv.org/abs/2111.05826)

[Oklab: A Perceptual Color Space (Ottosson)](https://bottosson.github.io/posts/oklab/)

---

Ideas: mine

Text: me + qwen + GLM fighting each other over it for a couple hours.

https://redd.it/1ts994w
@rStableDiffusion

3 views20:40

r/StableDiffusion

I ported Pixal3D to Apple Silicon
https://blog.chillaid.art/posts/porting-pixal3d-one-cursed-kernel-at-a-time

https://redd.it/1ts82da
@rStableDiffusion

blog.chillaid.art

Chillaid Blog

Tech ramblings and personal thoughts from the guy behind Chillaid Engineering.

3 views21:40

r/StableDiffusion

UPDATE v0.2.20 Nexus BTA My Web UI for Comfy with Predefined Workflow/template

https://redd.it/1tsa458
@rStableDiffusion

3 views22:40

r/StableDiffusion

Help Needed: How to create this type of art in Stable Diffusion? (Models, LoRA & settings)

https://redd.it/1tsfg8l
@rStableDiffusion

From the StableDiffusion community on Reddit: Help Needed: How to create this type of art in Stable Diffusion? (Models, LoRA &…

Explore this post and more from the StableDiffusion community

4 views00:40

r/StableDiffusion

PSA 5060ti 16GB for $300.99. 5070ti 16GB for $699.99. Best Buy in store clearance.

The 5060ti 16GB (SKU 6630626) has been on clearance in Best Buy stores for a couple of weeks at $419.99. A couple of days ago, it dropped to $300.99. The 5070ti 16GB (SKU 6620367) has been on clearance for $699.99. Not all stores will have these prices: some still have the 5060ti at $419.99 and the 5070ti at $799. So YMMV, but a lot of stores do have the lower prices.

This is an in-store-only deal, but your local Best Buy doesn't have to have it in stock. Of course, it's best if it does. If it doesn't, you can order items in Best Buy stores for the same price the store sells them for. So instead of paying the Best Buy online price of $599.99 for the 5060ti, you pay $300.99 when you order it in store. Just go into a store and give them those SKUs to look up the in-store price.

As of this post, both are still available online for shipping. As long as there is stock online, you should be able to order it at your local Best Buy for the in store clearance prices shipped to you. Of course, your local Best Buy has to have it on clearance at that price. It's not guaranteed all will.

Lastly, there's an Nvidia promo for a free copy of 007 First Light going on right now. So you will also get a key to redeem for that game. The game is like $70.

I hope this helps someone.

https://redd.it/1tse4rl
@rStableDiffusion

From the StableDiffusion community on Reddit

Explore this post and more from the StableDiffusion community

3 views01:40
