Running Modern AI Image Models on a GTX 1060 6GB: A Practical Guide
Tested & verified on NVIDIA GTX 1060 6GB (Pascal Architecture) · ComfyUI · May 2026
Written to counter the widespread misinformation that "only SD 1.5 runs on 6GB VRAM"
When I started working with images, my initial goal was to translate Japanese text into English on VN game CGs. I'm personally really bad at image editing, so I thought: let's try an AI for that. At the start I asked Claude Sonnet what was and wasn't possible on my low-end hardware. The answer was crushing: only SD 1.5 would run on my system. But as most of you know, SD 1.5 is really limited compared to Pony, SDXL, or Illustrious models. Out of curiosity I started testing different models to see what actually works. To my surprise, and even Sonnet's, it is far more than I ever expected.
I'm sharing this for people like me who only have low-end hardware like a GTX 1060, to show what is really possible with it, why it is possible, and where the limits of the card lie.
Let's start the guide.
# 🖥️ Platform Compatibility: Read This First
**This guide is written exclusively for Windows + NVIDIA GPU users.**
Before diving in, understand why platform matters enormously for low-VRAM setups:
|Platform|NVIDIA|AMD|
|:-|:-|:-|
|**Windows**|✅ This guide, fully tested|⚠️ ROCm support from ComfyUI Desktop v0.7.0, unstable, many plugins CUDA-only|
|**Linux**|❌ No Shared Video Memory in NVIDIA Linux driver, hard OOM crashes|⚠️ ROCm available, GTT memory (\~50% RAM) as VRAM extension, but stability issues|
|**macOS**|❌ Not covered. 8GB Unified Memory Macs perform worse than GTX 1060 6GB due to OS sharing the same pool. Higher-end Macs work but are not the target audience of this guide.|-|
**Why Windows NVIDIA works but Linux NVIDIA doesn't:** Windows uses WDDM (Windows Display Driver Model), which automatically provides **Shared Video Memory**: system RAM that acts as a seamless extension of VRAM when it fills up. This is visible in Task Manager as "Shared GPU Memory" and is the foundation that makes everything in this guide possible.
The NVIDIA Linux driver does not implement this feature. When VRAM fills up on Linux with NVIDIA, the result is a hard CUDA Out of Memory error: no graceful fallback, no RAM extension.
**The Linux irony:** Linux is actually far more RAM-efficient than Windows. OS overhead is significantly lower, leaving more RAM available for models. If NVIDIA had implemented Shared Video Memory in their Linux driver, Linux would likely be the *better* platform for low-VRAM AI setups. Unfortunately, that feature simply does not exist there.
**For AMD on Linux:** GTT memory (by default up to 50% of system RAM) provides similar functionality to Windows Shared Memory, and ComfyUI runs via ROCm, but there are significant drawbacks:
* **GTT limit:** Maximum 50% of system RAM by default, a limit set by the Linux kernel TTM memory manager. With 32GB RAM, only 16GB GTT is available as VRAM extension
* **Stability issues:** HIP memory errors, slow first generation, and VAE decoding failures are commonly reported
* **Plugin compatibility:** Many ComfyUI custom nodes are CUDA-only and untested on ROCm
* **Driver maturity:** ROCm is improving rapidly but still less mature than NVIDIA CUDA on Windows
* **Gaming origin:** AMD's GTT Shared Memory on Linux exists primarily because AMD has actively supported Linux gaming, a use case where VRAM overflow is equally relevant. NVIDIA has not yet implemented an equivalent for their Linux driver, giving AMD a practical advantage for low-VRAM AI workloads on Linux.
Not covered in this guide; mentioned for completeness only.
# ⚠️ The Myth vs. Reality
You will find countless posts online and even AI assistants confidently telling you:
>*"SDXL needs at least 8GB VRAM"*
>
>*"Illustrious XL is impossible on 6GB"*
>
>*"Z-Image Turbo requires 11-12GB"*

**Most of this is wrong when you use ComfyUI.**
One thing is true: **batch generation is not practical on 6GB VRAM**. Sequential single-image generation is dramatically faster.
Everything else in that list is a myth.
This guide documents what actually runs on a GTX 1060 6GB, tested hands-on with real benchmarks. No theory, no assumptions, just results.
# The Key: ComfyUI
The single most important decision is your **backend**. ComfyUI's Dynamic VRAM Management changes everything.
|Backend|SDXL/Illustrious|Z-Image Turbo (12GB FP16)|Batch Generation|
|:-|:-|:-|:-|
|**ComfyUI**|✅ Works|✅ Works|⚠️ Sequential only|
|**Forge / A1111**|Not Tested|Not Tested|Not Tested|
ComfyUI streams model components dynamically: it loads only what is needed into VRAM at any given moment and offloads the rest to RAM. Forge is said to load everything at once and crash, but Forge and A1111 were not tested for this guide, so that is not verified here.
>⚠️ **Windows Only Caveat:** The dynamic VRAM management described in this guide relies heavily on **Windows Shared Video Memory (WDDM)**. Windows automatically makes system RAM available as an extension of VRAM when needed. This is visible in Task Manager as "GPU Memory" (dedicated + shared). Linux and macOS may not provide the same Shared Video Memory behavior, so results on those systems may differ significantly and the setups described here are **not guaranteed to work outside of Windows**.
# Critical Installation Note for Pascal (GTX 10xx)
Download specifically: `ComfyUI_windows_portable_nvidia_cu126.7z`
* ❌ NOT `nvidia.7z` (CUDA 13.0, no Pascal support)
* ❌ NOT `nvidia_cu121` (too old)
* ✅ cu126 = Python 3.10, explicitly supports Nvidia 10 Series
* ✅ ComfyUI will auto-update to CUDA 12.8 after initial installation, and this works fine on Pascal
# ✅ What Actually Runs: Tested Results
|Model Type|Example|VRAM Usage|Generation Time|Status|
|:-|:-|:-|:-|:-|
|SD 1.5|Any SD 1.5 checkpoint|\~4GB|\~30s|✅ Native|
|SDXL 1.0|Base SDXL|\~5.7GB peak|\~2-3 min|✅ Works|
|Illustrious XL|Mistoon Illustrious|\~4.9GB peak|\~2 min (24 steps, DPM++)|✅ Works|
|Z-Image Turbo FP16|zlImageTurboAnime (12GB model!)|\~11.7GB staged, \~5.7GB active|\~3-4 min|✅ Works|
|Z-Image Turbo FP8|Same model, fp8\_e4m3fn\_fast|\~5.8GB staged|\~3 min|✅ Works, slightly faster|
|Flux.1 DEV / KREA|Quantized Q4-Q8 versions only|Varies|Slow|⚠️ Runs but quality suffers significantly, not recommended|
|Flux.1 FP16|Base model|12GB+|N/A|❌ Not practical on 6GB: full model too large|
|Flux.2 DEV|Any version|60GB+ base|N/A|❌ Cannot run: base model alone is 60GB|
|Flux.2 Klein 4B|Full or quantized|Manageable|Moderate|⚠️ Runs stably, decent quality, but tiny community and very limited model selection|
|Flux.2 Klein 9B|Quantized / interlaced|\~20GB or quantized|Slow|⚠️ Runs but slow or with quality loss. Interlaced version is more practical but still limited|
# Why Illustrious XL Works: The Simple Explanation
People assume SDXL/Illustrious needs 6.5-7GB because that's the file size. But a model consists of separate components:
|Component|Size|Runs on|
|:-|:-|:-|
|**UNet**|\~4.5 GB|**VRAM** (fits!)|
|VAE|\~300 MB|VRAM (on demand)|
|CLIP-L|\~250 MB|CPU/RAM|
|OpenCLIP-G|\~1.8 GB|CPU/RAM|
The UNet, the part that does the actual image generation, fits comfortably in 6GB. The text encoders run on CPU. ComfyUI dynamically loads the VAE only when needed for the final decode, then unloads it again.
**Result:** Illustrious XL runs natively and comfortably on a GTX 1060 6GB.
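The component table above can be sanity-checked with simple addition. The sketch below is only arithmetic on the approximate sizes from the table. It ignores activations, latents, and driver overhead, and the names `COMPONENTS_GB` and `resident_gb` are made up for this example.

```python
# Approximate component sizes in GB, copied from the table above.
COMPONENTS_GB = {
    "UNet": 4.5,
    "VAE": 0.3,
    "CLIP-L": 0.25,
    "OpenCLIP-G": 1.8,
}
VRAM_GB = 6.0  # GTX 1060 6GB


def resident_gb(component_names):
    """Sum the sizes of the components held in VRAM at the same time."""
    return sum(COMPONENTS_GB[name] for name in component_names)


all_at_once = resident_gb(COMPONENTS_GB)  # naive: everything loaded together
sampling_only = resident_gb(["UNet"])     # during denoising, per the table

print(f"Everything at once: {all_at_once:.2f} GB (limit {VRAM_GB} GB)")
print(f"UNet only:          {sampling_only:.2f} GB (limit {VRAM_GB} GB)")
```

Loading everything together lands above 6GB, while the UNet alone leaves room to spare. That gap is the whole trick.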
# Why Z-Image Turbo Works Well But Flux Doesn't
Both Z-Image Turbo (FP16) and Flux.1 are often described as \~12GB models (for Flux.1 that figure matches the FP8 files; the 12B-parameter FP16 weights are closer to \~24GB). So why does one work well and the other only in degraded form?
**Architecture difference:**
* **Z-Image Turbo** uses a **Single-Stream architecture**: text and image processing share one unified attention stream. ComfyUI can stream this layer-by-layer through 6GB because the dependencies between blocks are linear and manageable.
* **Flux** uses a **Dual-Stream architecture**: text and image run in parallel streams that must synchronize at specific points. ComfyUI must hold both streams in memory simultaneously at sync points, making the FP16 base model impossible to run within 6GB.
**The full Flux picture on 6GB VRAM:**
|Model|Verdict|Notes|
|:-|:-|:-|
|**Flux.1 DEV / KREA FP16**|❌ Cannot run|Full model too large|
|**Flux.1 DEV / KREA Q4-Q8**|⚠️ Runs, not recommended|Quality suffers significantly from heavy quantization|
|**Flux.2 DEV**|❌ Cannot run|Base FP16 model is \~60GB, and no quantization makes this practical|
|**Flux.2 Klein 4B**|⚠️ Runs stably|Decent quality, but tiny community and very limited model selection|
|**Flux.2 Klein 9B**|⚠️ Runs with caveats|\~20GB native. Needs quantization or interlaced mode, both reduce quality|
**Bottom line on Flux:** It can technically run in quantized form, but the quality trade-off is significant enough that it is not worth pursuing on 6GB VRAM. Z-Image Turbo delivers superior results on this hardware.
# RAM Planning for Z-Image Turbo: A Hidden Pitfall
Z-Image Turbo has a RAM requirement that is easy to underestimate. Unlike Illustrious, where the text encoders are small, Z-Image Turbo uses **Qwen 3 4B as its text encoder, and it stays permanently in RAM**.
**Full RAM breakdown for Z-Image Turbo:**
|Component|RAM Usage|Notes|
|:-|:-|:-|
|**Qwen 3 4B Text Encoder (FP16)**|\~7.5 GB|Permanent, never unloaded|
|**Z-Image Turbo model**|\~12 GB|Staged dynamically|
|**ComfyUI + latents + overhead**|\~2-3 GB|Varies|
|**Windows OS**|\~4-6 GB|Background processes|
|**Total**|**\~25-28 GB**|With 32GB RAM: only \~4-7GB headroom|
**The danger with 32GB RAM:** When the model unload doesn't run cleanly (which can happen), Z-Image Turbo ignores Windows Shared Memory settings and aggressively accumulates RAM. Observed peak usage: **20GB+ for the model alone**, pushing total system RAM to the absolute limit. Windows will then start swapping to SSD, causing severe slowdowns or freezes.
**64GB RAM is strongly recommended for Z-Image Turbo.**
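To see how tight 32GB really is, the breakdown can be added up as low/high ranges. This is a rough sketch using only the figures from the table above. The function names are made up for this example, and real usage will vary with drivers, background apps, and workflow.

```python
# (low, high) RAM estimates in GB, copied from the breakdown table above.
RAM_BUDGET_GB = {
    "Qwen 3 4B text encoder (FP16)": (7.5, 7.5),
    "Z-Image Turbo model": (12.0, 12.0),
    "ComfyUI + latents + overhead": (2.0, 3.0),
    "Windows OS": (4.0, 6.0),
}


def total_range(budget):
    """Return the (best case, worst case) total RAM use in GB."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def headroom(installed_gb, budget):
    """Return the (worst case, best case) free RAM in GB."""
    low, high = total_range(budget)
    return installed_gb - high, installed_gb - low


for installed in (32, 64):
    worst, best = headroom(installed, RAM_BUDGET_GB)
    print(f"{installed} GB installed: {worst:.1f} to {best:.1f} GB free")
```

The exact sums come out half a gigabyte off the rounded table values, which is well within the noise of these estimates. The point stands: a single unclean unload is enough to eat the 32GB headroom.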
**The Qwen Q8 workaround:** A quantized Q8 version of the Qwen encoder reduces RAM usage from \~7.5GB to \~4.5GB, saving \~3GB. However, there is an important trade-off:
* Z-Image Turbo already struggles with prompt following compared to tag-based models
* Natural Language prompting requires the encoder to correctly interpret complex sentence structures
* Any quality loss in the encoder hits harder on Z-Image Turbo than on simpler tag-based models
* Only consider Q8 Qwen if RAM pressure is severe and you are willing to accept potentially weaker prompt adherence
# ⚡ FP8 on Pascal: Surprising Results
The GTX 1060 (Pascal) is often said to have no FP8 support. This is partially true but misleading: Pascal has no native FP8 math, but weights can still be stored in FP8 and converted on the fly for computation.
ComfyUI's eager backend reports these FP8 capabilities on Pascal:

    capabilities: ['dequantize_per_tensor_fp8', 'quantize_per_tensor_fp8',
     'quantize_mxfp8', 'dequantize_mxfp8', ...]

**Practical results with** `--fp8_e4m3fn-unet` **+** `--fast fp16_accumulation`**:**
|Metric|FP16|FP8 (e4m3fn\_fast)|
|:-|:-|:-|
|Model staged in VRAM|11,739 MB|5,869 MB|
|Generation speed (steps)|Baseline|Slightly faster|
|Load time|Faster|Slightly slower (conversion on load)|
|Image quality (normal view)|Excellent|Excellent|
|Image quality (300% zoom, eyes)|Sharper fine detail|Slightly softer|
**Conclusion:** FP8 halves the staged model size (11,739 MB to 5,869 MB) with minimal quality difference at normal viewing distances. For drafts and exploration, FP8 is the better choice. For final renders where fine detail matters, use FP16.
**Important:** FP8 works for Z-Image Turbo (a flow-matching transformer model) but NOT for Illustrious/SDXL (UNet architecture). Illustrious will silently fail to generate with `--fp8_e4m3fn-unet` on Pascal.
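The halving in the table above is what you would expect from storage size alone: FP16 uses 2 bytes per weight, FP8 uses 1. Here is a minimal sketch of that estimate, assuming every weight gets converted (in practice some layers may stay in higher precision). `staged_size_mb` is a made-up helper for this example, not a ComfyUI function.

```python
# Storage size per weight in bytes for each format.
BYTES_PER_WEIGHT = {"fp16": 2, "fp8_e4m3fn": 1}


def staged_size_mb(fp16_size_mb, target_dtype):
    """Estimate the staged size after converting all FP16 weights."""
    weight_count_mega = fp16_size_mb / BYTES_PER_WEIGHT["fp16"]
    return weight_count_mega * BYTES_PER_WEIGHT[target_dtype]


fp16_mb = 11_739  # measured FP16 value from the table above
fp8_mb = staged_size_mb(fp16_mb, "fp8_e4m3fn")
print(f"FP16 staged: {fp16_mb} MB, FP8 estimate: {fp8_mb:.1f} MB")
```

The estimate lands within 1 MB of the measured 5,869 MB, so the saving is a pure storage effect and says nothing about compute speed on Pascal.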
# Recommended Startup BAT Files
# BAT 1: FP16 Quality Mode (for Illustrious XL + Z-Image quality renders)

    @echo off
    echo ComfyUI Start - FP16 Fast Mode + Force Model Unload
    echo.
    .\python_embeded\python.exe -s ComfyUI\main.py ^
        --windows-standalone-build ^
        --fast fp16_accumulation ^
        --disable-smart-memory
    pause

# BAT 2: FP8 Draft Mode (for Z-Image Turbo only: drafts & exploration)

    @echo off
    echo ComfyUI Start - FP8 Fast Mode + Force Model Unload
    echo NOTE: FP8 works for Z-Image Turbo. Use FP16 BAT for Illustrious!
    echo.
    .\python_embeded\python.exe -s ComfyUI\main.py ^
        --windows-standalone-build ^
        --fast fp16_accumulation ^
        --fp8_e4m3fn-unet ^
        --disable-smart-memory
    pause

# Why --disable-smart-memory?
This flag changes how ComfyUI handles memory between generations:
**Without the flag (default behavior):**
* Models stay cached in VRAM after use
* VRAM usage accumulates with each image you generate, so later images take longer to finish

**With** `--disable-smart-memory`**:**
* After each use, modules are offloaded from VRAM → RAM
* The model stays in RAM (loaded once from SSD at startup)
* VRAM stays clean and constant between individual generations
* RAM→VRAM transfer is fast (DDR3: \~15-25 GB/s vs SSD: \~500 MB/s), so the overhead is negligible
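If you want to check for yourself that VRAM stays constant between generations, you can read the numbers from `nvidia-smi`, which ships with the NVIDIA driver. The sketch below assumes `nvidia-smi` is on your PATH and that you have a single GPU. The function names are made up for this example. As far as I can tell, `nvidia-smi` reports dedicated VRAM only, so Windows Shared GPU Memory still has to be checked in Task Manager.

```python
import subprocess

# Ask nvidia-smi for used and total memory in MiB as plain CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line):
    """Turn a CSV line like '1234, 6144' into (used_mib, total_mib)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def read_vram_mib():
    """Run nvidia-smi and return (used_mib, total_mib) for the first GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_line(result.stdout.splitlines()[0])
```

Call `read_vram_mib()` once before a generation and once after it finishes. With `--disable-smart-memory` the two "used" values should stay close to each other.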
**⚠️ Batch Generation Reality Check**
Batch generation with Illustrious XL on 6GB VRAM was tested extensively. Here is what actually happens:
ComfyUI processes all batch images **simultaneously**: every denoising step is computed for all images at once. This sounds efficient, but on 6GB VRAM it has a severe cost:
|Method|Time per image|10 images total|Notes|
|:-|:-|:-|:-|
|**Sequential (recommended)**|\~131 seconds|\~22 minutes|Stable, consistent|
|**Batch 10 parallel**|\~1193 seconds|**3h 19min**|\~9x slower than sequential!|
The reason is twofold. First, each parallel step must process the latent data of all 10 images at once, which quickly exhausts VRAM. Second, the GPU does not have enough compute power to render them quickly. The per-step time explodes from \~4.68s/it (one image) to \~463s/it (one step across all 10 images).
**Recommendation: Always generate sequentially on 6GB VRAM.** Run images one by one, because it is dramatically faster than batch mode. `--disable-smart-memory` helps keep VRAM clean between sequential generations, which is its real value here.
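The numbers above can be cross-checked with a few lines of arithmetic. This sketch uses only the measured values from this section. The one assumption is that a batch step covers all 10 images, so its cost is divided by the batch size to compare per image.

```python
# Measured values from the batch test above.
SEQ_S_PER_IMAGE = 131      # seconds per image, sequential
BATCH_S_PER_IMAGE = 1193   # seconds per image, batch of 10
SEQ_S_PER_IT = 4.68        # seconds per step, one image
BATCH_S_PER_IT = 463.0     # seconds per step, all 10 images together
BATCH_SIZE = 10

# Total wall-clock time for 10 images, in minutes.
seq_total_min = SEQ_S_PER_IMAGE * BATCH_SIZE / 60
batch_total_min = BATCH_S_PER_IMAGE * BATCH_SIZE / 60

# Slowdown per image, end to end and per denoising step.
slowdown_per_image = BATCH_S_PER_IMAGE / SEQ_S_PER_IMAGE
slowdown_per_step = (BATCH_S_PER_IT / BATCH_SIZE) / SEQ_S_PER_IT

print(f"Sequential: {seq_total_min:.0f} min, batch: {batch_total_min:.0f} min")
print(f"Slowdown per image: {slowdown_per_image:.1f}x")
print(f"Slowdown per step:  {slowdown_per_step:.1f}x")
```

Both ratios land between 9x and 10x, so the per-step timings and the end-to-end timings tell the same story.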
# 🎯 Z-Image Turbo: Recommended Settings
Z-Image Turbo uses **Qwen 3 4B** as text encoder and requires **natural language prompts**, NOT Danbooru tags.
|Parameter|Value|Notes|
|:-|:-|:-|
|Sampler|`euler_ancestral`|Official recommendation, model trained on this|
|Scheduler|`beta`|Best for Z-Image Turbo|
|Steps|8-10|More steps = diminishing returns|
|CFG|1.0-1.5|Must be low, higher values cause artifacts|
|Negative prompt|Leave empty|Has no effect on Turbo models|
**Prompt style:**
Write like a film director's script, not keyword lists.

    ✅ "A young woman in a black maid uniform standing on a rooftop at sunset,
       fox ears and a fluffy tail, warm golden light from behind,
       looking directly at the viewer with a calm expression."
    ❌ "1girl, maid, fox ears, sunset, masterpiece, best quality, 8k"

# Illustrious XL: Recommended Settings
|Parameter|Value|Notes|
|:-|:-|:-|
|Sampler|`dpmpp_2m_cfg_pp`|Best quality/speed ratio|
|Scheduler|`karras`|Standard recommendation|
|Steps|20-28|Sweet spot for Illustrious|
|CFG|5.0-7.0|Illustrious is CFG-sensitive|
|Resolution|1024×1024 or 896×1152|Must be multiples of 64|
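If you start from an odd image size, the "multiples of 64" rule from the table is easy to apply with a small helper. This is just a sketch. `snap_to_multiple` and `snap_resolution` are names made up for this example and are not part of ComfyUI.

```python
def snap_to_multiple(value, multiple=64):
    """Round to the nearest multiple, but never below one multiple."""
    return max(multiple, round(value / multiple) * multiple)


def snap_resolution(width, height, multiple=64):
    """Snap both sides of a resolution to the given multiple."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)


# Example: a 1000x1150 source image becomes a valid SDXL-style size.
print(snap_resolution(1000, 1150))
```

Sizes that already follow the rule, like 896×1152, pass through unchanged.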
**Quality tags for Illustrious (NOT Pony tags!):**

    masterpiece, best quality, very aesthetic, absurdres

Do NOT use `score_9`, `score_8_up`. Those are Pony-specific and have no effect on Illustrious.
# 💡 Key Insights Summary
1. **ComfyUI is the key**: everything in this guide depends on its dynamic VRAM management. Forge/A1111 were not tested, so do not expect the same results there
2. **Illustrious XL fits on 6GB** because the UNet (\~4.5GB) fits in VRAM and the text encoders go to CPU
3. **Z-Image Turbo (12GB model) runs** due to Single-Stream architecture enabling efficient layer streaming
4. **Flux.1 FP16 does not run**: Dual-Stream architecture requires too much simultaneous VRAM. Heavily quantized versions (Q4-Q8) technically run but quality suffers too much to be worthwhile.
5. **Flux.2 Klein 4B** runs stably but has a tiny community.
6. **FP8 works on Pascal** for Z-Image Turbo via the eager backend and halves the staged model size with minimal quality loss
7. **FP8 does NOT work** for Illustrious/SDXL on Pascal, where it silently fails
8. **CPU**: even the Qwen 3 4B (4B parameter LLM) runs acceptably fast on CPU as an encoder because it only does a single forward pass (encoding), not token-by-token generation
9. **VAE is critical for Flow Matching models** (Z-Image, Flux): wrong VAE = broken output. For Z-Image use flux1-vae, NOT flux2-vae
10. **Newer SDXL and all Illustrious models have the VAE fix built in** β external VAE fix is only needed for older SDXL models
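The "nearly halves VRAM" point for FP8 is just bytes-per-weight arithmetic. A back-of-the-envelope sketch, assuming the 12GB Z-Image Turbo file is stored at 16 bits per weight (my assumption, not measured) and counting weights only:

```python
# Bytes used to store one weight at each precision.
BYTES_PER_WEIGHT = {"fp16": 2, "bf16": 2, "fp8": 1}


def weights_size_gb(size_gb_at_16bit: float, target: str) -> float:
    """Scale a 16-bit checkpoint size to another precision (weights only)."""
    return size_gb_at_16bit * BYTES_PER_WEIGHT[target] / BYTES_PER_WEIGHT["fp16"]


print(weights_size_gb(12.0, "fp8"))  # 6.0
```

Real savings land a bit under half because activations and any layers kept at higher precision are not covered by this estimate.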
# 🖥️ Tested Hardware
* **GPU:** NVIDIA GeForce GTX 1060 6GB (Pascal architecture, GP106)
* **RAM:** 32GB DDR3
* **Storage:** Fast SSD recommended
* **ComfyUI version:** Windows portable cu128 build
* **Driver:** Current NVIDIA drivers (May 2026)
# ⚙️ Minimum & Recommended System Requirements
Running modern models on a 6GB VRAM GPU shifts the bottleneck from VRAM to **RAM and storage**. ComfyUI's Dynamic VRAM Management offloads aggressively to RAM, and this only works if you have enough of it and can transfer it fast enough.
|Component|Minimum|Recommended|Why|
|:-|:-|:-|:-|
|**GPU VRAM**|6GB|6GB|GTX 1060 target|
|**RAM**|32GB|64GB|Models offload to RAM; 32GB works but gets tight with large models + OS overhead|
|**Storage**|Fast SATA SSD|NVMe M.2 SSD|Initial model load from disk; slower SSD = longer cold start per session|
|**CPU**|Any modern|Any modern|Text encoders run on CPU, but only for a single forward pass, so not a bottleneck|
**Why RAM matters so much:**
* A 12GB Z-Image Turbo model staged in RAM needs \~12GB just for the model
* OS + ComfyUI + other background processes easily add another 8-10GB
* With 16GB RAM: constant disk swapping, extremely slow or unstable
* With 32GB RAM: workable, tight on very large models
* With 64GB RAM: comfortable headroom for multiple large models and batch operations
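The list above is simple subtraction. A small sketch using the numbers from this section (12GB model, upper end of the 8-10GB overhead estimate). `ram_headroom_gb` is a made-up helper, and real usage will vary with your workflow:

```python
def ram_headroom_gb(total_ram_gb: float, model_gb: float = 12.0, overhead_gb: float = 10.0) -> float:
    """RAM left after staging the model and paying OS + ComfyUI overhead."""
    return total_ram_gb - model_gb - overhead_gb


for total in (16, 32, 64):
    print(f"{total}GB RAM -> {ram_headroom_gb(total):+.0f}GB headroom")
# 16GB RAM -> -6GB headroom
# 32GB RAM -> +10GB headroom
# 64GB RAM -> +42GB headroom
```

A negative number means the system has to swap to disk, which is the "extremely slow or unstable" case.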
**Why SSD speed matters:** ComfyUI loads the model from disk into RAM once per session. With `--disable-smart-memory`, it then transfers from RAM→VRAM as needed (fast). The initial disk load is the slow part:
* Slow HDD: potentially minutes per model load
* SATA SSD: acceptable, 10-30 seconds
* NVMe M.2: near-instant, 2-5 seconds
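You can sanity-check these load times with size divided by read speed. A rough sketch for a 12GB model; the throughput figures are typical sequential-read values I assumed for illustration, not measurements from this setup:

```python
# Assumed sequential read speeds in MB/s (illustrative, not measured here).
ASSUMED_READ_MB_PER_S = {"hdd": 120, "sata_ssd": 550, "nvme": 3500}


def cold_load_seconds(model_gb: float, drive: str) -> float:
    """Estimate the time to read a model file from disk once."""
    return model_gb * 1000 / ASSUMED_READ_MB_PER_S[drive]


for drive in ASSUMED_READ_MB_PER_S:
    print(f"{drive}: {cold_load_seconds(12, drive):.1f}s")
```

With these assumed speeds the estimate lands around 100 seconds for the HDD, about 22 seconds for SATA and under 4 seconds for NVMe, which lines up with the ranges above. Fragmented HDDs or a busy system will be slower.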
**Bottom line:** A fast GPU with slow RAM or HDD will be severely bottlenecked. The GTX 1060 6GB setup only works well when RAM and storage can keep up.
*This guide was written based on hands-on testing. All benchmarks are real measurements, not theoretical estimates. If your experience differs, please share; community knowledge benefits everyone.*
*The goal of this guide is simple: don't let hardware limitation myths stop you from experimenting. Test first, assume nothing.*
https://redd.it/1tfs3ee
@rStableDiffusion
Best Way to Prompt Qwen, Klein, Zit...You're Welcome
This is the best way to prompt images for Flux Klein, Qwen or Wan. These models were trained on .json in such a way that they understand hierarchical structure, but there is no need to waste your time on all the punctuation.
The parts of an image include: a basic concept or summary, a subject or subjects, attire, expression, pose, hair/makeup/accessories and a background.
So break your prompt into sections. Each concept on its own line, single returns.
Generate your image, and if you want to tweak the prompt you can see at a glance what you need to edit, instead of digging through a messy paragraph to find what you want to change.
\--
professional glamour photography (put LORA Trigger and Medium at top)
Concept
Modern office portrait of woman seated on stool, polished professional workspace aesthetic
pose
Seated on round stool with legs crossed at knees and extended slightly forward
Torso angled slightly toward camera with upright posture
One arm folded across body, other resting on thigh
Head slightly tilted with direct gaze toward viewer
attire
White fitted button-up blouse
Red high-waisted mini skirt
Black sheer pantyhose
Red pointed-toe high heels
secretary glasses worn low on nose, eyes looking over glasses top
gold ankle bracelet on left ankle
gold bangle bracelet
gold stud earrings
hair/makeup/nails
Long straight black hair with blunt bangs
Smooth, sleek styling
Defined brows with eyeliner and mascara
Soft blush with red-toned lip color
Neatly manicured nails in neutral tone
expression
Soft confident smile with direct eye contact
Composed, slightly playful demeanor
Calm and self-assured presence
background
White brick wall backdrop
Desk with computer monitor behind subject
Printer/copier unit on side cabinet
Light-colored tiled floor with blue accent tiles
Bright, even indoor lighting creating clean office look
\--
This was a one-shot generation with Klein, just to show prompt adherence. I wasn't trying to make anything fancy. This is the format I use with Qwen2512 as well. I use LORA files to control my style and avoid stylization words like "masterpiece, trending, best quality, highly realistic, 4k, etc." I let the LORA do all the work and only describe the objects.
https://preview.redd.it/iwwsg89evq1h1.png?width=1280&format=png&auto=webp&s=2d3a5c99a7110560b567c18875047215fbd9cb15
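If you keep prompts in scripts, the sectioned layout maps nicely onto a dict. A minimal sketch, assuming nothing beyond plain Python. `build_sectioned_prompt` is a made-up helper and the short example content is trimmed from the prompt above:

```python
def build_sectioned_prompt(header: str, sections: dict[str, list[str]]) -> str:
    """Join a header line and named sections into one prompt, one concept per line."""
    lines = [header]
    for name, details in sections.items():
        lines.append(name)     # section label on its own line
        lines.extend(details)  # each detail on its own line, single returns
    return "\n".join(lines)


prompt = build_sectioned_prompt(
    "professional glamour photography",
    {
        "Concept": ["Modern office portrait of woman seated on stool"],
        "attire": ["White fitted button-up blouse", "Red high-waisted mini skirt"],
        "background": ["White brick wall backdrop"],
    },
)
print(prompt)
```

Editing one section then means changing one list entry instead of hunting through a paragraph.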
https://redd.it/1tfya25
Generated 1000 liminal/dreamcore images with GPT Image 2 and put them in a dataset - could be useful for training
Was playing around with GPT Image 2 on 2K medium and ended up with about 1000 images that all have this liminal space / dreamcore feel. Empty indoor pools, weird corridors, foggy parking lots at night, that sort of thing.
Instead of letting them sit on my drive I packaged everything up and put it on Hugging Face. Could be decent for fine-tuning SD models or just as a reference set for this aesthetic.
https://huggingface.co/datasets/LukaDev13/Liminal-Dreamcore-1K
If anyone uses it for training I'd be curious how it turns out.
https://redd.it/1tg3rym
Tried using HY-Pano 2.0 and WorldMirror 2.0 together to create some rooms
https://redd.it/1tg3dq9