r/StableDiffusion

LTX 2.3 Experimental Music Video
https://www.youtube.com/watch?v=8PDmOIgKAFk

https://redd.it/1tfk3tq
@rStableDiffusion

Rainbow Connection - Kermit/Jim Henson's song (1979)

This experiment uses the latest local AI technology to create a music video. I am experimenting and playing with the framing and camera angles of the video.

All of my AI demos:
https://www.youtube.com/playlist?list=PLe3OBqR7FeRhZM6SNoIWibQ1PA2JREYtL

4 views09:40

r/StableDiffusion

Prompting Tips Flux.2-Klein

For Klein 9B using the qwen_3_8b, the prompt path is basically:

your prompt;

1-wrapped in Qwen chat template

2 - Qwen2 tokenizer

3- Qwen3 8B text encoder

4- hidden layers [9, 18, 27\] stacked into conditioning

5- Flux2/Klein transformer cross-attends to that

The local wrapper does this template:

<|imstart|>user
YOUR PROMPT<|imend|>
<|imstart|>assistant
<think>

</think>

So it is not reading your prompt like CLIP tags. It is reading it like an instruction/message.

What It Accepts Well:

**It should respond best to natural language with clear relationships:**

A woman sitting on a beachfront, looking at the camera, wearing a black dress. The camera is at eye level. Her body is seated facing slightly left. The beach and ocean are behind her.

**Strong prompt concepts:**

\- subject type: woman, man, dog, car

\- action/pose: sitting, standing, walking, looking at camera

\- location: on a beach, inside a kitchen

\- spatial relations: behind her, to her left, in the foreground

\- clothing/object attribution: she is wearing, holding, beside

\- camera/framing: close-up, full body, eye-level, three-quarter view

\- style if phrased plainly: photo, natural lighting, soft shadows

**What It Throws Away Or Weakens**

The big one: Comfy prompt weighting is disabled for this TE.

**So this does not mean much:**

((face:1.4)), [body:0.6], (((identity)))

The tokenizer still sees punctuation/text, but the encoder wrapper passes disable\weights=True, so classic CLIP-style

emphasis is not applied as weights.

Also weak:

\- giant comma tag soups

\- repeated words as fake emphasis

\- abstract junk like masterpiece, best quality, ultra detailed

\- contradictions: sitting, standing, walking

\- vague modifiers not attached to a noun: beautiful, perfect, cinematic

\- negative prompt logic, unless the sampler/model path explicitly uses it well

\- overly long prompts where important instructions are buried

What Matters Most

Because this is Qwen-style chat encoding, write prompt chunks as sentences with ownership:

Bad:

beach, woman, camera, sitting, black dress, looking, ocean, realistic

Better:

A realistic photo of a woman sitting on a beach. She is looking at the camera. She is wearing a black dress. The ocean is behind her.

For identity/reference workflows "Identity feature transfer", avoid asking the TE to redefine the subject too much. Let the node carry identity, and let prompt carry scene/action:

Keep the same woman. Change only the location: she is sitting on a beachfront, looking at the camera. Natural daylight photo.

Best Prompt Shape For Your Use:

Use this structure:

[identity constraint\].

[scene/location change\].

[pose/action\].

[clothing/body constraint\].

[camera/framing\].

[lighting/style\].

Example:

Keep the same woman from the reference image.
Move her to a sunny beachfront.
She is sitting and looking directly at the camera.
Preserve her face, body proportions, hairstyle, and clothing shape.
Eye-level photo, natural daylight, realistic beach background.

The TE will not literally “obey” every clause, but this format gives Qwen the best chance to encode relationships instead of treating the prompt as a bag of tags.

https://redd.it/1tflqso
@rStableDiffusion

GitHub

GitHub - capitan01R/ComfyUI-Flux2Klein-Enhancer: Flux.2Klein 9B Enhancement Nodes Suite

Flux.2Klein 9B Enhancement Nodes Suite . Contribute to capitan01R/ComfyUI-Flux2Klein-Enhancer development by creating an account on GitHub.

4 views10:40

r/StableDiffusion

Dream Wan + LTX combination

Given Wan2.2 is much better at learning movement and physics, but LTX is better with audio and lipsync, the dream would be to define the desired motion with a generated Wan clip, and let LTX continue it.

There exists workflows such as RuneXX to try and achieve this, but I've not managed to make LTX replicate and continue Wan's movements, only go off on its own tangent.

Has anyone achieved this? I know Sulphur is impressive, but it's still a long way behind some of the Wan checkpoints especially in terms of physics and prompt adherence.

https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main/Video-2-Video/Extend-Any-Video

https://redd.it/1tfktgi
@rStableDiffusion

huggingface.co

RuneXX/LTX-2.3-Workflows at main

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

6 views11:40

r/StableDiffusion

0:22

This media is not supported in your browser

VIEW IN TELEGRAM

My local workflow for turning SDXL character generations into game-ready 3D assets

https://redd.it/1tfnlr8
@rStableDiffusion

5 views12:40

r/StableDiffusion

0:05

This media is not supported in your browser

VIEW IN TELEGRAM

Recreating 80s and 90s anime style with ZIT and LTX 2.3

https://redd.it/1tflhdi
@rStableDiffusion

5 views13:40

r/StableDiffusion

Check out a free prompt writing site I made
https://redd.it/1tfrykt
@rStableDiffusion

5 views15:40

r/StableDiffusion

0:00

This media is not supported in your browser

VIEW IN TELEGRAM

NeuralCompanion

https://redd.it/1tftqwg
@rStableDiffusion

6 views16:40

r/StableDiffusion

Van Gogh Qwen-2512

https://redd.it/1tfv7uo
@rStableDiffusion

From the StableDiffusion community on Reddit: Van Gogh Qwen-2512

Explore this post and more from the StableDiffusion community

5 views17:40

r/StableDiffusion

6 views17:40

r/StableDiffusion

6 views17:41

r/StableDiffusion

Running Modern AI Image Models on a GTX 1060 6GB — A Practical Guide

Tested & verified on NVIDIA GTX 1060 6GB (Pascal Architecture) · ComfyUI · May 2026
Written to counter the widespread misinformation that "only SD 1.5 runs on 6GB VRAM"

As i started with Image work, my inital Goal was to Translate Japanese Text into English on VN Game CGs. I'm personaly really bad with doing IMAGE work, thats why i thought, lets try a AI for that. As i started, i Asked Claude Sonnet, whats possible with my low Hardware and what not. The answer was a crushing one. Only SD1.5 would run on my System. But as most of you know, SD 1.5 is really limeted compared to Pony, SDXL or Illustious Models. Out of curiiousity i started to test out differend Models, to see whats possible and what not. To my and even Sonnets supprise, thats way more, that i ever thought would be.
I share this here for PPL like me, who only habe low End Hardware like GTX1060 to show you guys whats really possible with that, why it is possible and where are the Limits of ur card lies.

Lets start the Guide 😄

# 🖥️ Platform Compatibility — Read This First

**This guide is written exclusively for Windows + NVIDIA GPU users.**

Before diving in, understand why platform matters enormously for low-VRAM setups:

|Platform|NVIDIA|AMD|
|:-|:-|:-|
|**Windows**|✅ This guide — fully tested|⚠️ ROCm support from ComfyUI Desktop v0.7.0, unstable, many plugins CUDA-only|
|**Linux + NVIDIA**|❌ No Shared Video Memory in NVIDIA Linux driver → hard OOM crashes|⚠️ ROCm available, GTT memory (\~50% RAM) as VRAM extension, but stability issues|
|**macOS**|❌ Not covered — 8GB Unified Memory Macs perform worse than GTX 1060 6GB due to OS sharing the same pool. Higher-end Macs work but are not the target audience of this guide.|❌|

**Why Windows NVIDIA works but Linux NVIDIA doesn't:** Windows uses WDDM (Windows Display Driver Model) which automatically provides **Shared Video Memory** — system RAM that acts as a seamless extension of VRAM when it fills up. This is visible in Task Manager as "Shared GPU Memory" and is the foundation that makes everything in this guide possible.

The NVIDIA Linux driver does not implement this feature. When VRAM fills up on Linux with NVIDIA, the result is a hard CUDA Out of Memory error — no graceful fallback, no RAM extension.

**The Linux irony:** Linux is actually far more RAM-efficient than Windows — OS overhead is significantly lower, leaving more RAM available for models. If NVIDIA had implemented Shared Video Memory in their Linux driver, Linux would likely be the *better* platform for low-VRAM AI setups. Unfortunately, that feature simply does not exist there.

**For AMD on Linux:** GTT memory (up to 50% of system RAM) provides similar functionality to Windows Shared Memory, and ComfyUI runs via ROCm — but there are significant drawbacks:

* **GTT limit:** Maximum 50% of system RAM — hardcoded by the Linux kernel TTM memory manager. With 32GB RAM, only 16GB GTT available as VRAM extension
* **Stability issues:** HIP memory errors, slow first generation, VAE decoding failures are commonly reported
* **Plugin compatibility:** Many ComfyUI custom nodes are CUDA-only and untested on ROCm
* **Driver maturity:** ROCm is improving rapidly but still less mature than NVIDIA CUDA on Windows
* **Gaming origin:** AMD's GTT Shared Memory on Linux exists primarily because AMD has actively supported Linux gaming — a use case where VRAM overflow is equally relevant. NVIDIA has not yet implemented an equivalent for their Linux driver, giving AMD a practical advantage for low-VRAM AI workloads on Linux.

Not covered in this guide — mentioned for completeness only.

# ⚠️ The Myth vs. Reality

You will find countless posts online and even AI assistants confidently telling you:

>*"SDXL needs at least 8GB VRAM"*
*"Illustrious XL is impossible on 6GB"*
*"Z-Image Turbo requires 11-12GB"*

**Most of this is wrong — when you use ComfyUI.**

One thing is true: **batch generation is not practical on 6GB VRAM** — sequential single image generation is dramatically faster.

4 views18:40

r/StableDiffusion

4 views18:40

About

Blog

Apps

Platform