What would you run on an RTX Pro 6000 Blackwell?

Lots of people ask about what to run on small GPUs, but nobody asks about big GPUs. What would you do with 96GiB of VRAM?

I play with Z-Image and LTX (and derivatives like Sulphur), and I use Qwen for image editing. I still dabble a bit with older SD1.5 and SDXL models because there are so many useful loras, and they run fast so it's easy to generate a huge batch and then cherry-pick the best results.

Pic is the system with the Blackwell card and the old Ada card. Color-cycling RGB because my inner child is still alive and loves this BS. I'll do minimalism when I die.

https://preview.redd.it/lr0ji4tz964h1.jpg?width=4032&format=pjpg&auto=webp&s=8fc9962eaba7077e08425d0ab52b416a5fb8ee7a



https://redd.it/1trlcpu
@rStableDiffusion
Anima-6steps model

https://civitai.com/models/2637029/unstableanimav1-12step?modelVersionId=2988995

anima-base-1 + some of my loras + official anima-turbo-lora-v0.2

Tested in forge-neo with ER SDE BETA:

steps = 4-16 (6 preferred)

CFG = 1-4 (1.25 preferred)

offset = 3-12 (8 preferred)

The advantage of this version, which uses the official Turbo LoRA, is speed: a 5070 Ti generates an image in under 5 seconds at 6 steps, and the model is quite sensitive to prompts. The downside is that, because it generates so quickly, it is not sensitive to seed changes, and the output is almost the same for the same prompt.

A simple test with the prompt {1girl tanktop sitting} shows that the basic structure is already established by step 3; steps 4-6 simply add detail; past 10 steps the structure changes, but not by much.

https://preview.redd.it/on97cjnxh74h1.jpg?width=2688&format=pjpg&auto=webp&s=7571b3d67fc8a20df31377398564ec1c07187f9b

It also works well with LoRAs:

https://preview.redd.it/1cx4ntvsk74h1.png?width=896&format=png&auto=webp&s=38079f372fe3bd24b9a329ed61433d2d017615d8

https://preview.redd.it/eysilx7xk74h1.png?width=896&format=png&auto=webp&s=a4fcf6e5bf818af48b5973cca6a2253d5dd195bd

https://preview.redd.it/lq3hcjmyk74h1.png?width=896&format=png&auto=webp&s=05de4a579a1ddb926c105a2013422864896becc3

https://preview.redd.it/39cuv7a0l74h1.png?width=896&format=png&auto=webp&s=40162b010b297237cde0a1e4b43caffbd7ca96cb

https://preview.redd.it/ijmowxw2l74h1.png?width=896&format=png&auto=webp&s=08236bbbe2457340bd4c8e48adb47db4195f3a72

https://preview.redd.it/z0dos8l8l74h1.png?width=896&format=png&auto=webp&s=9cf57e75d2f3c645e3ac14118f6c8fd08a0b0787

https://preview.redd.it/35wo2cf9l74h1.png?width=896&format=png&auto=webp&s=bac27c6796bc3e445ee6f4a850fd6f75e3f5f0e4

https://preview.redd.it/j13567fbl74h1.png?width=1152&format=png&auto=webp&s=a76b23b0f2ab4bc40693ec2cf12d8224b30bd4f9

https://preview.redd.it/z09k3oncl74h1.png?width=1152&format=png&auto=webp&s=221badfb10c3bff294e20a6d6d3a5657a35197db




https://redd.it/1trqncp
@rStableDiffusion
Damn... did all of you who use Runpod have very low to 0 availability?
https://redd.it/1trzex3
@rStableDiffusion
Presenting Stable Audio Studio: A dedicated app for running Stable Audio models locally
https://redd.it/1trzjgx
@rStableDiffusion
NVIDIA PiD Preview Inside a Next-Gen Tiled Upscaler & Enhancer

https://redd.it/1ts3ofu
@rStableDiffusion
What are the recommended resolutions for Anima? Why are all the CivitAI images vertical?
https://redd.it/1ts6e5t
@rStableDiffusion
Attn: Black Forest Labs and other researchers: Perceptual (OKLab) color space models.

**TL;DR**

**Proposal: Training Flow Models in Perceptually Uniform Color Spaces to Simplify Latent Manifolds & Enable Disentangled Chromatic Control**

**What this means for you:** Faster generation (fewer steps needed for clean, stable color), instant palette steering that actually locks to your prompt from step 1, and an end to hue drift / "neon mud" when you push CFG or saturation sliders. For researchers: a mathematically cleaner latent manifold, straighter ODE trajectories, and a testable path toward orthogonal lightness/chroma control without architectural overhaul.

• Flow Matching geometry + Oklab uniformity → reduced trajectory curvature

• β-VAE disentanglement + ΔE(Oklab) loss → orthogonal lightness/chroma axes

• PaletteDiffusion/ColorCond precedents + harmonic rule embeddings → structured conditioning over text

---

**[SKIP IF NOT INTERESTED] COLOR SPACE BACKGROUND**

sRGB was engineered for 1990s CRT phosphor limits, not human perception or machine learning. It heavily entangles luminance and chrominance, meaning linear interpolation in sRGB crosses perceptually "dead" zones, forcing models to waste capacity learning correction curves. Perceptually uniform spaces like CIELAB and Oklab were explicitly designed so that Euclidean distance ≈ perceived color difference. Oklab (2020) fixes legacy issues with lightness scaling and hue linearity, making it ideal for gradient-based optimization.

[Oklab Technical Deep Dive](https://bottosson.github.io/posts/oklab/)

[CIE Color Spaces & Perceptual Uniformity](https://en.wikipedia.org/wiki/CIELAB_color_space)
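
To make the background concrete, here is a minimal NumPy sketch of the sRGB to Oklab conversion, using the matrices published in the Oklab post linked above. The helper names (`srgb_to_linear`, `srgb_to_oklab`, `chroma`) and the blue/yellow test pair are my own choices for illustration, not part of any library. It compares the midpoint of a straight line drawn in sRGB against the midpoint of the same line drawn in Oklab.

```python
import numpy as np

# Matrices from Bjorn Ottosson's Oklab post:
# M1 maps linear sRGB to an LMS-like cone response,
# M2 maps the cube-rooted response to (L, a, b).
M1 = np.array([
    [0.4122214708, 0.5363325363, 0.0514459929],
    [0.2119034982, 0.6806995451, 0.1073969566],
    [0.0883024619, 0.2817188376, 0.6299787005],
])
M2 = np.array([
    [0.2104542553, 0.7936177850, -0.0040720468],
    [1.9779984951, -2.4285922050, 0.4505937099],
    [0.0259040371, 0.7827717662, -0.8086757660],
])

def srgb_to_linear(c):
    """Undo the sRGB transfer curve; input values are expected in [0, 1]."""
    c = np.asarray(c, dtype=np.float64)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def srgb_to_oklab(rgb):
    """Convert gamma-encoded sRGB (last axis = R, G, B) to Oklab (L, a, b)."""
    lms = srgb_to_linear(rgb) @ M1.T
    return np.cbrt(lms) @ M2.T

def chroma(lab):
    """Distance from the neutral axis: sqrt(a^2 + b^2)."""
    return float(np.hypot(lab[..., 1], lab[..., 2]))

blue = np.array([0.0, 0.0, 1.0])
yellow = np.array([1.0, 1.0, 0.0])

# Midpoint of the straight line in sRGB is (0.5, 0.5, 0.5): a neutral gray.
mid_srgb = srgb_to_oklab(0.5 * (blue + yellow))
# Midpoint of the straight line in Oklab keeps some colorfulness.
mid_oklab = 0.5 * (srgb_to_oklab(blue) + srgb_to_oklab(yellow))

print("chroma at sRGB midpoint: ", chroma(mid_srgb))
print("chroma at Oklab midpoint:", chroma(mid_oklab))
```

The sRGB midpoint lands on the gray axis (chroma near zero), which is the "dead zone" described above, while the Oklab midpoint does not. This shows the geometric difference only; it says nothing yet about whether a model trained in Oklab actually benefits.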

---

**FULL PROPOSAL**

Dear Black Forest Labs, Hugging Face, and the generative AI research community,

State-of-the-art image generators are currently trained and conditioned on sRGB, a display-referred standard optimized for CRT phosphor response, not for perceptual consistency or machine learning efficiency. While sRGB remains necessary for output rendering, its perceptual non-uniformity introduces unnecessary curvature into the data manifold, forcing models to learn compensatory trajectories rather than intrinsic color structure.

I propose a focused research initiative: fine-tuning a VAE and subsequent Rectified Flow/Flow Matching pipeline using Oklab (or its polar counterpart, Oklch) as the internal color representation, paired with structured harmonic conditioning.

**Trajectory Simplification in Flow Matching:**

Rectified flow models approximate optimal transport by learning straight-line velocity fields from noise to data. In sRGB, linear interpolation between saturated hues traverses perceptually desaturated regions, forcing the vector field to learn non-linear corrections to maintain chromatic integrity. Oklab is constructed so that Euclidean distance correlates with perceptual difference (ΔE). Training in Oklab aligns the mathematical trajectories of flow matching with human perceptual geometry, reducing trajectory curvature, lowering effective manifold complexity, and potentially improving convergence and step efficiency.
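
For readers who haven't seen the objective written out, this is a rough sketch of the straight-line interpolation and velocity target used in rectified flow training. I'm assuming the common convention where `x0` is noise and `x1` is data; the function names and the toy 3-channel "pixels" are hypothetical stand-ins, not any particular codebase.

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Point on the straight noise-to-data path and its velocity target.

    x0: noise sample, x1: data sample, t: time in [0, 1] (broadcastable).
    """
    x_t = (1.0 - t) * x0 + t * x1   # linear interpolation in whatever space x lives in
    v_target = x1 - x0              # constant velocity along the straight path
    return x_t, v_target

def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((v_pred - v_target) ** 2))

rng = np.random.default_rng(0)
x1 = rng.uniform(size=(4, 3))         # stand-in "data": 4 pixels, 3 channels
x0 = rng.standard_normal((4, 3))      # Gaussian noise
t = rng.uniform(size=(4, 1))          # one time value per sample

x_t, v = rectified_flow_pair(x0, x1, t)

# Following the target velocity for the remaining time lands exactly on the data.
print(np.allclose(x_t + (1.0 - t) * v, x1))
```

The interpolation is linear in the coordinates of whatever space `x1` is expressed in, so the choice of color space (or of the latent space a VAE builds on top of it) decides what "straight" means perceptually. That is the part of the argument this proposal hinges on.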

**Latent Compression & Disentangled Chromatic Subspaces:**

Current VAEs compress sRGB images using MSE or LPIPS, neither of which guarantees perceptual uniformity in the latent space. By training a VAE with a differentiable ΔE(Oklab) perceptual loss and optional orthogonal regularization, we can encourage separation of lightness (L) and chromaticity (a,b) within the latent subspace. This mitigates the "color bleed" and hue drift commonly observed under high CFG or during latent interpolation, as perturbations along lightness axes no longer inadvertently modulate chromatic dimensions.
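
As a sketch of what the ΔE(Oklab) term could look like, here is the forward computation in NumPy. This is an assumption-laden illustration: NumPy has no autodiff, so a real training loss would need the same arithmetic ported to an autodiff framework; `delta_e_ok_loss` and its `eps` parameter are hypothetical names; and reporting the lightness and chroma terms separately is just one way to expose them for independent weighting.

```python
import numpy as np

# Oklab matrices from Bjorn Ottosson's post (linear sRGB -> LMS, LMS' -> Lab).
M1 = np.array([
    [0.4122214708, 0.5363325363, 0.0514459929],
    [0.2119034982, 0.6806995451, 0.1073969566],
    [0.0883024619, 0.2817188376, 0.6299787005],
])
M2 = np.array([
    [0.2104542553, 0.7936177850, -0.0040720468],
    [1.9779984951, -2.4285922050, 0.4505937099],
    [0.0259040371, 0.7827717662, -0.8086757660],
])

def srgb_to_oklab(rgb):
    """Gamma-encoded sRGB in [0, 1] (last axis = channels) to Oklab."""
    c = np.clip(np.asarray(rgb, dtype=np.float64), 0.0, 1.0)  # decoder output may overshoot
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    return np.cbrt(lin @ M1.T) @ M2.T

def delta_e_ok_loss(pred_srgb, target_srgb, eps=1e-12):
    """Mean Euclidean Oklab distance, plus its lightness and chroma parts.

    Returns (mean delta E, mean squared L error, mean squared a/b error).
    eps keeps the square root well-behaved at zero difference.
    """
    d = srgb_to_oklab(pred_srgb) - srgb_to_oklab(target_srgb)
    d_light = d[..., 0] ** 2
    d_chroma = d[..., 1] ** 2 + d[..., 2] ** 2
    delta_e = np.sqrt(d_light + d_chroma + eps)
    return float(delta_e.mean()), float(d_light.mean()), float(d_chroma.mean())

# Two flat gray "images" (H x W x 3) that differ only in brightness.
dark = np.full((8, 8, 3), 0.2)
bright = np.full((8, 8, 3), 0.8)
total, light_term, chroma_term = delta_e_ok_loss(dark, bright)
print(total, light_term, chroma_term)
```

For a pure brightness change the chroma term stays at essentially zero and all of the error shows up in the lightness term. That separation in pixel space is what makes it plausible to regularize the latent axes toward the same split, though whether the latent space actually follows is exactly what would need testing.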

**Structured Color Conditioning Pathways:**

Teaching harmonic relationships to the model doesn't require manual dataset retagging. Multiple scalable pathways exist:

• Automated Lexical Tagging: Cluster dominant colors in Oklab space, map to standardized color names, and attach LLM-derived mood/setting descriptors. This converts implicit palette