What would you run on an RTX Pro 6000 Blackwell?
Lots of people ask about what to run on small GPUs, but nobody asks about big GPUs. What would you do with 96GiB of VRAM?
I play with Z-Image and LTX (and derivatives like Sulphur), and I use Qwen for image editing. I still dabble a bit with older SD1.5 and SDXL models because there are so many useful loras, and they run fast so it's easy to generate a huge batch and then cherry-pick the best results.
Pic is the system with the Blackwell card and the old Ada card. Color-cycling RGB because my inner child is still alive and loves this BS. I'll do minimalism when I die.
https://preview.redd.it/lr0ji4tz964h1.jpg?width=4032&format=pjpg&auto=webp&s=8fc9962eaba7077e08425d0ab52b416a5fb8ee7a
https://redd.it/1trlcpu
@rStableDiffusion
Anima-6steps model
https://civitai.com/models/2637029/unstableanimav1-12step?modelVersionId=2988995
anima-base-1 + some of my loras + official anima-turbo-lora-v0.2
test in forge-neo ER SDE BETA
step=4-16 prefer 6
CFG=1-4 prefer 1.25
offset=3-12 prefer 8
The advantage of this version, which uses the official Turbo LoRA, is speed: a 5070 Ti generates an image in under 5 seconds at 6 steps, and the model is quite sensitive to prompts. The downside is that, because it generates so quickly, it is not very sensitive to seed changes, so the output is almost the same for the same prompt.
A simple test with the prompt {1girl tanktop sitting} shows that the basic structure is already established by step 3; steps 4-6 simply add detail; beyond 10 steps the structure changes, but not much.
https://preview.redd.it/on97cjnxh74h1.jpg?width=2688&format=pjpg&auto=webp&s=7571b3d67fc8a20df31377398564ec1c07187f9b
It also works well with LoRAs:
https://preview.redd.it/1cx4ntvsk74h1.png?width=896&format=png&auto=webp&s=38079f372fe3bd24b9a329ed61433d2d017615d8
https://preview.redd.it/eysilx7xk74h1.png?width=896&format=png&auto=webp&s=a4fcf6e5bf818af48b5973cca6a2253d5dd195bd
https://preview.redd.it/lq3hcjmyk74h1.png?width=896&format=png&auto=webp&s=05de4a579a1ddb926c105a2013422864896becc3
https://preview.redd.it/39cuv7a0l74h1.png?width=896&format=png&auto=webp&s=40162b010b297237cde0a1e4b43caffbd7ca96cb
https://preview.redd.it/ijmowxw2l74h1.png?width=896&format=png&auto=webp&s=08236bbbe2457340bd4c8e48adb47db4195f3a72
https://preview.redd.it/z0dos8l8l74h1.png?width=896&format=png&auto=webp&s=9cf57e75d2f3c645e3ac14118f6c8fd08a0b0787
https://preview.redd.it/35wo2cf9l74h1.png?width=896&format=png&auto=webp&s=bac27c6796bc3e445ee6f4a850fd6f75e3f5f0e4
https://preview.redd.it/j13567fbl74h1.png?width=1152&format=png&auto=webp&s=a76b23b0f2ab4bc40693ec2cf12d8224b30bd4f9
https://preview.redd.it/z09k3oncl74h1.png?width=1152&format=png&auto=webp&s=221badfb10c3bff294e20a6d6d3a5657a35197db
https://redd.it/1trqncp
@rStableDiffusion
Civitai
UnstableAnimaV1-12step - UnstableAnimaV2Turbo6step | Anima Checkpoint | Civitai
Question about training a lora for character style consistency
https://redd.it/1trrh6j
@rStableDiffusion
Anima IP-Adapter is coming.
It seems someone is working on an IP-Adapter for Anima. If it is good, it could finally make SD 1.5 obsolete.
https://github.com/Wenaka2004/comfyui-anima-ipadapter
https://redd.it/1trw160
@rStableDiffusion
GitHub
GitHub - Wenaka2004/comfyui-anima-ipadapter: IP-Adapter custom node for Anima in ComfyUI
Video Colorizing using LTX 2.3 lora
https://www.youtube.com/watch?v=9k8_EOp5tVI
https://redd.it/1tru7lh
@rStableDiffusion
YouTube
Vogue by Madonna (Colorized)
This is the first time I have tried colorizing monochrome video using local AI. The original music video is only available in 480p resolution, so the quality is not ideal.
Original source video: https://www.youtube.com/watch?v=GuJQSAiODqI
All of my AI demos:…
GitHub - orion4d/Orion4D_generative_paint: Generative Paint is a custom node for ComfyUI that adds an advanced painting interface directly usable from the browser.
https://github.com/orion4d/Orion4D_generative_paint
https://redd.it/1try64e
@rStableDiffusion
8-step FLUX.2-dev DMD2 distillation
A new 8-step distillation of FLUX.2-dev from a professional lab. I haven't been able to try it yet because it's in diffusers format, but it seems interesting.
Blog: https://www.baseten.co/blog/faster-image-generation-timestep-distillation-flux2/
Model: https://huggingface.co/baseten/distilled_8step_FLUX.2-dev
https://redd.it/1trx9es
@rStableDiffusion
Baseten
Timestep distillation: 2.5x faster FLUX.2 image generation
Timestep distillation compresses FLUX.2 denoising steps from 20 to 8, achieving 2.5x faster image generation without noticeable quality loss.
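As a rough sanity check on that figure, the quoted 2.5x matches the plain ratio of step counts. The sketch below assumes every denoising step costs the same and ignores fixed overhead such as text encoding and VAE decode; `ideal_speedup` is a hypothetical helper name, not something from the Baseten post.

```python
def ideal_speedup(base_steps, distilled_steps):
    """Upper-bound speedup if per-step cost is constant (an assumption)."""
    if base_steps <= 0 or distilled_steps <= 0:
        raise ValueError("step counts must be positive")
    return base_steps / distilled_steps


# Step counts taken from the preview text: 20 steps distilled down to 8.
print(f"{ideal_speedup(20, 8):.1f}x")
```

Measured end-to-end speedup can differ from this ratio once fixed per-image costs are included.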
Testing The New PID With Z image Turbo Model With 512 to 2048 Resolution Model (RTX3060 VRAM 6GB)
https://redd.it/1ts0lc1
@rStableDiffusion
Damn... did all of you who use Runpod have very low to 0 availability?
https://redd.it/1trzex3
@rStableDiffusion
Presenting Stable Audio Studio: A dedicated app for running Stable Audio models locally
https://redd.it/1trzjgx
@rStableDiffusion
NVIDIA PiD Preview Inside a Next-Gen Tiled Upscaler & Enhancer
https://redd.it/1ts3ofu
@rStableDiffusion
What are the recommended resolutions for Anima? Why are all the CivitAI images vertical?
https://redd.it/1ts6e5t
@rStableDiffusion
Attn: Black Forest Labs and other researchers: Perceptual (OKLab) color space models.
TL;DR
Proposal: Training Flow Models in Perceptually Uniform Color Spaces to Simplify Latent Manifolds & Enable Disentangled Chromatic Control
What this means for you: Faster generation (fewer steps needed for clean, stable color), instant palette steering that actually locks to your prompt from step 1, and an end to hue drift / "neon mud" when you push CFG or saturation sliders. For researchers: a mathematically cleaner latent manifold, straighter ODE trajectories, and a testable path toward orthogonal lightness/chroma control without architectural overhaul.
• Flow Matching geometry + Oklab uniformity → reduced trajectory curvature
• β-VAE disentanglement + ΔE(Oklab) loss → orthogonal lightness/chroma axes
• PaletteDiffusion/ColorCond precedents + harmonic rule embeddings → structured conditioning over text
---
---
[SKIP IF NOT INTERESTED] COLOR SPACE BACKGROUND
sRGB was engineered for 1990s CRT phosphor limits, not human perception or machine learning. It heavily entangles luminance and chrominance, meaning linear interpolation in sRGB crosses perceptually "dead" zones, forcing models to waste capacity learning correction curves. Perceptually uniform spaces like CIELAB and Oklab were explicitly designed so that Euclidean distance ≈ perceived color difference. Oklab (2020) fixes legacy issues with lightness scaling and hue linearity, making it ideal for gradient-based optimization.
Oklab Technical Deep Dive: https://bottosson.github.io/posts/oklab/
CIE Color Spaces & Perceptual Uniformity: https://en.wikipedia.org/wiki/CIELAB_color_space
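For readers who want to see what the conversion involves, here is a minimal pure-Python sketch of sRGB to Oklab, using the matrix coefficients published in the Oklab post linked above. The function names are my own, and the code assumes in-gamut sRGB inputs in the 0..1 range; it is an illustration, not part of the proposal.

```python
def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def srgb_to_oklab(r, g, b):
    """Convert gamma-encoded sRGB in [0, 1] to Oklab (L, a, b)."""
    r, g, b = (srgb_to_linear(c) for c in (r, g, b))
    # Linear sRGB -> approximate cone responses (LMS).
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    # Cube root compresses the responses; they are non-negative for in-gamut sRGB.
    l, m, s = (v ** (1 / 3) for v in (l, m, s))
    # Compressed LMS -> lightness L plus opponent axes a (green-red), b (blue-yellow).
    return (
        0.2104542553 * l + 0.7936177850 * m - 0.0040720468 * s,
        1.9779984951 * l - 2.4285922050 * m + 0.4505937099 * s,
        0.0259040371 * l + 0.7827717662 * m - 0.8086757660 * s,
    )


for name, rgb in [("white", (1, 1, 1)), ("blue", (0, 0, 1))]:
    print(name, [round(v, 3) for v in srgb_to_oklab(*rgb)])
```

White should land at roughly L = 1 with a and b near zero, which is a quick way to check the coefficients were copied correctly.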
---
FULL PROPOSAL
Dear Black Forest Labs, Hugging Face, and the generative AI research community,
State-of-the-art image generators are currently trained and conditioned on sRGB, a display-referred standard optimized for CRT phosphor response, not for perceptual consistency or machine learning efficiency. While sRGB remains necessary for output rendering, its perceptual non-uniformity introduces unnecessary curvature into the data manifold, forcing models to learn compensatory trajectories rather than intrinsic color structure.
I propose a focused research initiative: fine-tuning a VAE and subsequent Rectified Flow/Flow Matching pipeline using Oklab (or its polar counterpart, Oklch) as the internal color representation, paired with structured harmonic conditioning.
Trajectory Simplification in Flow Matching:
Rectified flow models approximate optimal transport by learning straight-line velocity fields from noise to data. In sRGB, linear interpolation between saturated hues traverses perceptually desaturated regions, forcing the vector field to learn non-linear corrections to maintain chromatic integrity. Oklab is constructed so that Euclidean distance correlates with perceptual difference (ΔE). Training in Oklab aligns the mathematical trajectories of flow matching with human perceptual geometry, reducing trajectory curvature, lowering effective manifold complexity, and potentially improving convergence and step efficiency.
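The interpolation claim can be checked on a single color pair. The sketch below takes the midpoint between pure blue and pure yellow two ways, once by averaging sRGB values and once by averaging Oklab coordinates, and compares the Oklab chroma of each. The color pair and helper names are my own choices, the matrices are the published Oklab coefficients, and nothing here measures flow-matching training itself.

```python
import math

# Published Oklab matrices: linear sRGB -> LMS, then compressed LMS -> Lab.
M1 = (
    (0.4122214708, 0.5363325363, 0.0514459929),
    (0.2119034982, 0.6806995451, 0.1073969566),
    (0.0883024619, 0.2817188376, 0.6299787005),
)
M2 = (
    (0.2104542553, 0.7936177850, -0.0040720468),
    (1.9779984951, -2.4285922050, 0.4505937099),
    (0.0259040371, 0.7827717662, -0.8086757660),
)


def srgb_to_oklab(rgb):
    """Convert gamma-encoded, in-gamut sRGB (0..1) to Oklab."""
    lin = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4 for c in rgb]
    lms = [sum(w * c for w, c in zip(row, lin)) ** (1 / 3) for row in M1]
    return tuple(sum(w * c for w, c in zip(row, lms)) for row in M2)


def chroma(lab):
    """Distance from the neutral (gray) axis in the a/b plane."""
    return math.hypot(lab[1], lab[2])


blue, yellow = (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)

# Midpoint of a straight line in sRGB, then viewed in Oklab.
mid_srgb = srgb_to_oklab([(p + q) / 2 for p, q in zip(blue, yellow)])

# Midpoint of a straight line drawn directly in Oklab.
lab_blue, lab_yellow = srgb_to_oklab(blue), srgb_to_oklab(yellow)
mid_oklab = tuple((p + q) / 2 for p, q in zip(lab_blue, lab_yellow))

print("chroma at sRGB midpoint: ", round(chroma(mid_srgb), 4))
print("chroma at Oklab midpoint:", round(chroma(mid_oklab), 4))
```

The sRGB midpoint is a neutral gray, while the Oklab midpoint keeps some chroma. Note that it still keeps far less than either endpoint, since these two hues sit nearly opposite each other; keeping chroma roughly constant along the path would need the polar Oklch form instead.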
Latent Compression & Disentangled Chromatic Subspaces:
Current VAEs compress sRGB images using MSE or LPIPS, neither of which guarantees perceptual uniformity in the latent space. By training a VAE with a differentiable ΔE(Oklab) perceptual loss and optional orthogonal regularization, we can encourage separation of lightness (L) and chromaticity (a,b) within the latent subspace. This mitigates the "color bleed" and hue drift commonly observed under high CFG or during latent interpolation, as perturbations along lightness axes no longer inadvertently modulate chromatic dimensions.
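To make the ΔE(Oklab) term concrete: it is the Euclidean distance between two Oklab colors, and it splits cleanly into a lightness part and a chroma-plane part. The sketch below computes both on plain Python tuples that are assumed to already be in Oklab. All function names are hypothetical, and an actual training loss would need the same arithmetic written in an autodiff framework, which is not shown here.

```python
import math


def delta_e_ok(lab1, lab2):
    """Euclidean distance between two Oklab colors (L, a, b)."""
    return math.dist(lab1, lab2)


def split_delta(lab1, lab2):
    """Split the difference into lightness and chroma-plane components."""
    d_light = abs(lab1[0] - lab2[0])
    d_chroma = math.hypot(lab1[1] - lab2[1], lab1[2] - lab2[2])
    return d_light, d_chroma


def mean_delta_e(pixels_a, pixels_b):
    """Average per-pixel Delta E over two equally sized lists of Oklab pixels."""
    if len(pixels_a) != len(pixels_b) or not pixels_a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(delta_e_ok(p, q) for p, q in zip(pixels_a, pixels_b)) / len(pixels_a)


# Made-up example values: a target and a slightly shifted reconstruction.
target = [(0.60, 0.10, 0.05), (0.80, 0.00, 0.00)]
recon = [(0.62, 0.08, 0.05), (0.80, 0.00, 0.03)]

print("mean Delta E:", round(mean_delta_e(target, recon), 4))
for t, r in zip(target, recon):
    print("lightness / chroma error:", [round(v, 4) for v in split_delta(t, r)])
```

Because the two components are orthogonal, ΔE² equals the lightness error squared plus the chroma error squared, so a loss could weight them separately if lightness and chroma should be penalized differently.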
Structured Color Conditioning Pathways:
Teaching harmonic relationships to the model doesn't require manual dataset retagging. Multiple scalable pathways exist:
• Automated Lexical Tagging: Cluster dominant colors in Oklab space, map to standardized color names, and attach LLM-derived mood/setting descriptors. This converts implicit palette
bottosson.github.io
A perceptual color space for image processing
From personal project to industry standard Introduction added in 2025 When introduced Oklab in 2020, I never expected it to reach as far as...