Prompting Tips Flux.2-Klein

For Klein 9B using the qwen_3_8b, the prompt path is basically:

your prompt;

1-wrapped in Qwen chat template

2 - Qwen2 tokenizer

3- Qwen3 8B text encoder

4- hidden layers [9, 18, 27\] stacked into conditioning

5- Flux2/Klein transformer cross-attends to that

The local wrapper does this template:

<|imstart|>user
YOUR PROMPT<|im
end|>
<|imstart|>assistant
<think>

</think>

So it is not reading your prompt like CLIP tags. It is reading it like an instruction/message.

What It Accepts Well:

**It should respond best to natural language with clear relationships:**

A woman sitting on a beachfront, looking at the camera, wearing a black dress. The camera is at eye level. Her body is seated facing slightly left. The beach and ocean are behind her.

**Strong prompt concepts:**

\- subject type: woman, man, dog, car

\- action/pose: sitting, standing, walking, looking at camera

\- location: on a beach, inside a kitchen

\- spatial relations: behind her, to her left, in the foreground

\- clothing/object attribution: she is wearing, holding, beside

\- camera/framing: close-up, full body, eye-level, three-quarter view

\- style if phrased plainly: photo, natural lighting, soft shadows

**What It Throws Away Or Weakens**

The big one: Comfy prompt weighting is disabled for this TE.

**So this does not mean much:**

((face:1.4)), [body:0.6], (((identity)))

The tokenizer still sees punctuation/text, but the encoder wrapper passes disable\
weights=True, so classic CLIP-style

emphasis is not applied as weights.

Also weak:

\- giant comma tag soups

\- repeated words as fake emphasis

\- abstract junk like masterpiece, best quality, ultra detailed

\- contradictions: sitting, standing, walking

\- vague modifiers not attached to a noun: beautiful, perfect, cinematic

\- negative prompt logic, unless the sampler/model path explicitly uses it well

\- overly long prompts where important instructions are buried

What Matters Most

Because this is Qwen-style chat encoding, write prompt chunks as sentences with ownership:

Bad:

beach, woman, camera, sitting, black dress, looking, ocean, realistic

Better:

A realistic photo of a woman sitting on a beach. She is looking at the camera. She is wearing a black dress. The ocean is behind her.

For identity/reference workflows "Identity feature transfer", avoid asking the TE to redefine the subject too much. Let the node carry identity, and let prompt carry scene/action:

Keep the same woman. Change only the location: she is sitting on a beachfront, looking at the camera. Natural daylight photo.

Best Prompt Shape For Your Use:

Use this structure:

[identity constraint\].

[scene/location change\].

[pose/action\].

[clothing/body constraint\].

[camera/framing\].

[lighting/style\].

Example:

Keep the same woman from the reference image.
Move her to a sunny beachfront.
She is sitting and looking directly at the camera.
Preserve her face, body proportions, hairstyle, and clothing shape.
Eye-level photo, natural daylight, realistic beach background.

The TE will not literally “obey” every clause, but this format gives Qwen the best chance to encode relationships instead of treating the prompt as a bag of tags.

https://redd.it/1tflqso
@rStableDiffusion
Dream Wan + LTX combination

Given Wan2.2 is much better at learning movement and physics, but LTX is better with audio and lipsync, the dream would be to define the desired motion with a generated Wan clip, and let LTX continue it.

There exists workflows such as RuneXX to try and achieve this, but I've not managed to make LTX replicate and continue Wan's movements, only go off on its own tangent.

Has anyone achieved this? I know Sulphur is impressive, but it's still a long way behind some of the Wan checkpoints especially in terms of physics and prompt adherence.

https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main/Video-2-Video/Extend-Any-Video

https://redd.it/1tfktgi
@rStableDiffusion
This media is not supported in your browser
VIEW IN TELEGRAM
My local workflow for turning SDXL character generations into game-ready 3D assets

https://redd.it/1tfnlr8
@rStableDiffusion
Check out a free prompt writing site I made
https://redd.it/1tfrykt
@rStableDiffusion
Running Modern AI Image Models on a GTX 1060 6GB — A Practical Guide

Tested & verified on NVIDIA GTX 1060 6GB (Pascal Architecture) · ComfyUI · May 2026
Written to counter the widespread misinformation that "only SD 1.5 runs on 6GB VRAM"

As i started with Image work, my inital Goal was to Translate Japanese Text into English on VN Game CGs. I'm personaly really bad with doing IMAGE work, thats why i thought, lets try a AI for that. As i started, i Asked Claude Sonnet, whats possible with my low Hardware and what not. The answer was a crushing one. Only SD1.5 would run on my System. But as most of you know, SD 1.5 is really limeted compared to Pony, SDXL or Illustious Models. Out of curiiousity i started to test out differend Models, to see whats possible and what not. To my and even Sonnets supprise, thats way more, that i ever thought would be.
I share this here for PPL like me, who only habe low End Hardware like GTX1060 to show you guys whats really possible with that, why it is possible and where are the Limits of ur card lies.

Lets start the Guide 😄

# 🖥️ Platform Compatibility — Read This First

**This guide is written exclusively for Windows + NVIDIA GPU users.**

Before diving in, understand why platform matters enormously for low-VRAM setups:

|Platform|NVIDIA|AMD|
|:-|:-|:-|
|**Windows**| This guide — fully tested|⚠️ ROCm support from ComfyUI Desktop v0.7.0, unstable, many plugins CUDA-only|
|**Linux + NVIDIA**| No Shared Video Memory in NVIDIA Linux driver → hard OOM crashes|⚠️ ROCm available, GTT memory (\~50% RAM) as VRAM extension, but stability issues|
|**macOS**| Not covered — 8GB Unified Memory Macs perform worse than GTX 1060 6GB due to OS sharing the same pool. Higher-end Macs work but are not the target audience of this guide.||

**Why Windows NVIDIA works but Linux NVIDIA doesn't:** Windows uses WDDM (Windows Display Driver Model) which automatically provides **Shared Video Memory** — system RAM that acts as a seamless extension of VRAM when it fills up. This is visible in Task Manager as "Shared GPU Memory" and is the foundation that makes everything in this guide possible.

The NVIDIA Linux driver does not implement this feature. When VRAM fills up on Linux with NVIDIA, the result is a hard CUDA Out of Memory error — no graceful fallback, no RAM extension.

**The Linux irony:** Linux is actually far more RAM-efficient than Windows — OS overhead is significantly lower, leaving more RAM available for models. If NVIDIA had implemented Shared Video Memory in their Linux driver, Linux would likely be the *better* platform for low-VRAM AI setups. Unfortunately, that feature simply does not exist there.

**For AMD on Linux:** GTT memory (up to 50% of system RAM) provides similar functionality to Windows Shared Memory, and ComfyUI runs via ROCm — but there are significant drawbacks:

* **GTT limit:** Maximum 50% of system RAM — hardcoded by the Linux kernel TTM memory manager. With 32GB RAM, only 16GB GTT available as VRAM extension
* **Stability issues:** HIP memory errors, slow first generation, VAE decoding failures are commonly reported
* **Plugin compatibility:** Many ComfyUI custom nodes are CUDA-only and untested on ROCm
* **Driver maturity:** ROCm is improving rapidly but still less mature than NVIDIA CUDA on Windows
* **Gaming origin:** AMD's GTT Shared Memory on Linux exists primarily because AMD has actively supported Linux gaming — a use case where VRAM overflow is equally relevant. NVIDIA has not yet implemented an equivalent for their Linux driver, giving AMD a practical advantage for low-VRAM AI workloads on Linux.

Not covered in this guide — mentioned for completeness only.

# ⚠️ The Myth vs. Reality

You will find countless posts online and even AI assistants confidently telling you:

>*"SDXL needs at least 8GB VRAM"*
*"Illustrious XL is impossible on 6GB"*
*"Z-Image Turbo requires 11-12GB"*

**Most of this is wrong — when you use ComfyUI.**

One thing is true: **batch generation is not practical on 6GB VRAM** — sequential single image generation is dramatically faster.
Everything else in that list is a myth.

This guide documents what actually runs on a GTX 1060 6GB, tested hands-on with real benchmarks. No theory, no assumptions — just results.

# 🔑 The Key: ComfyUIe

The single most important decision is your **backend**. ComfyUI's Dynamic VRAM Management changes everything.

|Backend|SDXL/Illustrious|Z-Image Turbo (12GB FP16)|Batch Generation|
|:-|:-|:-|:-|
|**ComfyUI**| Works| Works|⚠️ Sequential only|
|**Forge / A1111**|Not Tested|Not Tested|Not Tested|

ComfyUI streams model components dynamically — loading only what's needed into VRAM at any given moment, offloading the rest to RAM. Forge loads everything at once and crashes.

>⚠️ **Windows Only Caveat:** The dynamic VRAM management described in this guide relies heavily on **Windows Shared Video Memory (WDDM)**. Windows automatically makes system RAM available as an extension of VRAM when needed. This is visible in Task Manager as "GPU Memory" (dedicated + shared). Linux and macOS may not provide the same Shared Video Memory behavior — results on those systems may differ significantly and the setups described here are **not guaranteed to work outside of Windows**.

# Critical Installation Note for Pascal (GTX 10xx)

Download specifically: `ComfyUI_windows_portable_nvidia_cu126.7z`

* NOT `nvidia.7z` (CUDA 13.0 — no Pascal support)
* NOT `nvidia_cu121` (too old)
* cu126 = Python 3.10, explicitly supports Nvidia 10 Series
* ComfyUI will auto-update to CUDA 12.8 after initial installation — this works fine on Pascal

# What Actually Runs — Tested Results

|Model Type|Example|VRAM Usage|Generation Time|Status|
|:-|:-|:-|:-|:-|
|SD 1.5|Any SD 1.5 checkpoint|\~4GB|\~30s| Native|
|SDXL 1.0|Base SDXL|\~5.7GB peak|\~2-3 min| Works|
|Illustrious XL|Mistoon Illustrious|\~4.9GB peak|\~2 min (24 steps, DPM++)| Works|
|Z-Image Turbo FP16|zlImageTurboAnime (12GB model!)|\~11.7GB staged, \~5.7GB active|\~3-4 min| Works|
|Z-Image Turbo FP8|Same model, fp8\_e4m3fn\_fast|\~5.8GB staged|\~3 min| Works, slightly faster|
|Flux.1 DEV / KREA|Quantized Q4-Q8 versions only|Varies|Slow|⚠️ Runs but quality suffers significantly — not recommended|
|Flux.1 FP16|Base model|12GB+|N/A|⚠️ Runs but really slow|
|Flux.2 DEV|Any version|60GB+ base|N/A| Cannot run — base model alone is 60GB|
|Flux.2 Klein 4B|Full or quantized|Manageable|Moderate|⚠️ Runs stably, decent quality — but tiny community, very limited model selection|
|Flux.2 Klein 9B|Quantized / interlaced|\~20GB or quantized|Slow|⚠️ Runs but slow or quality loss — interlaced version more practical but still limited|

# 🧠 Why Illustrious XL Works — The Simple Explanation

People assume SDXL/Illustrious needs 6.5-7GB because that's the file size. But a model consists of separate components:

|Component|Size|Runs on|
|:-|:-|:-|
|**UNet**|\~4.5 GB|**VRAM** (fits!)|
|VAE|\~300 MB|VRAM (on demand)|
|CLIP-L|\~250 MB|CPU/RAM|
|OpenCLIP-G|\~1.8 GB|CPU/RAM|

The UNet — the part that does the actual image generation — fits comfortably in 6GB. The text encoders run on CPU. ComfyUI dynamically loads the VAE only when needed for final decode, then unloads it again.

**Result:** Illustrious XL runs natively and comfortably on a GTX 1060 6GB.

# 🌊 Why Z-Image Turbo Works Well But Flux Doesn't

Both Z-Image Turbo (FP16) and Flux.1 are \~12GB models. So why does one work well and the other only in degraded form?

**Architecture difference:**

* **Z-Image Turbo** uses a **Single-Stream architecture** — text and image processing share one unified attention stream. ComfyUI can stream this layer-by-layer through 6GB because the dependencies between blocks are linear and manageable.
* **Flux** uses a **Dual-Stream architecture** — text and image run in parallel streams that must synchronize at specific points. ComfyUI must hold both streams in memory simultaneously at sync points, making the FP16 base model impossible to run within 6GB.

**The full Flux picture on 6GB VRAM:**

|Model|Verdict|Notes|
|:-|:-|:-|
|**Flux.1 DEV / KREA FP16**| Cannot run|Full model too large|
|**Flux.1 DEV / KREA Q4-Q8**|⚠️ Runs,