r/StableDiffusion

7 views21:40

LTX-2.3 PolarQuant Q5: 88% size reduction, near lossless quality (Cosine Similarity: 0.9986).
https://redd.it/1t7mhaw
@rStableDiffusion

7 views22:40

r/StableDiffusion

0:53

This media is not supported in your browser

VIEW IN TELEGRAM

Flux.2-Klein pipeline for real-time webcam stream processing in 30 FPS

https://redd.it/1t7nd7e
@rStableDiffusion

7 views23:40

r/StableDiffusion

3 years of training with AI tools finally put to use

I have learned so much from this community and I want to say thank you all who have contributed endlessly to this subreddit. Me and 2 other AI users teamed up to make children's music videos. Here are some of the clips that utilized WAN22. Not everything on the youtube channel is opensourced, so I won' t post the link here unless it's requested. These are all made with standard WAN22 FFLF workflow which I have tweaked over the years.

The one thing I realized along the way is that WAN can do some amazing things, it's all in the prompt. Such as block transition, crash zoom, pan, dolly, tilt, rotate. It can pretty much do it all.

Here is the workflow for the first video.

https://reddit.com/link/1t7nqgz/video/8dsi4qysuzzg1/player

https://reddit.com/link/1t7nqgz/video/01c16z8tuzzg1/player

https://reddit.com/link/1t7nqgz/video/0tz5363vuzzg1/player

https://reddit.com/link/1t7nqgz/video/n1guckfxuzzg1/player

https://reddit.com/link/1t7nqgz/video/plda65pxuzzg1/player

https://redd.it/1t7nqgz
@rStableDiffusion

Pastebin

{ "id": "f522eabf-4924-41b4-a7ce-ed9bcdcb53f4", "revision": 0, "last_no - Pastebin.com

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

8 views00:40

r/StableDiffusion

0:10

This media is not supported in your browser

VIEW IN TELEGRAM

another video from LTX-2.3 Distilled

https://redd.it/1t7kyn0
@rStableDiffusion

7 views01:40

r/StableDiffusion

LTX 2.3 Sulphur vs 10Eros

For those that have tried these models? Which one do you prefer and why? What strengths and weaknesses have you found with each model?

https://redd.it/1t7os5i
@rStableDiffusion

From the StableDiffusion community on Reddit

Explore this post and more from the StableDiffusion community

7 views03:40

r/StableDiffusion

Wan SCAIL Pose Control Workflow
https://redd.it/1t7p7pz
@rStableDiffusion

7 views04:40

r/StableDiffusion

HiDream-O1-Image - A pixel space model , no need for VAE, , 8B parameters.

https://redd.it/1t7v9fy
@rStableDiffusion

From the StableDiffusion community on Reddit: HiDream-O1-Image - A pixel space model , no need for VAE, , 8B parameters.

Explore this post and more from the StableDiffusion community

6 views05:40

r/StableDiffusion

IMG Dataset Refiner v4.0 Pro - The Ultimate Dataset Engineering Suite for LoRAs (Flux, SDXL, etc...)

https://redd.it/1t7ttp0
@rStableDiffusion

From the StableDiffusion community on Reddit: IMG Dataset Refiner v4.0 Pro - The Ultimate Dataset Engineering Suite for LoRAs (Flux…

Explore this post and more from the StableDiffusion community

8 views06:40

r/StableDiffusion

9 views06:40

r/StableDiffusion

Flux.2Klein Best open source image edit - work in progress

https://redd.it/1t7xue6
@rStableDiffusion

From the StableDiffusion community on Reddit: Flux.2Klein Best open source image edit - work in progress

Explore this post and more from the StableDiffusion community

7 views07:40

r/StableDiffusion

Why did we move away from booru tags?

I’m obviously wrong for this opinion but I believe booru tags are a far better descriptor of visual medium than natural language. Simply listing the contents in an image is far more clearer than “the light dramatically plays against blah blah” which I think is just subjective abstruseness.

Most new models now are using massive text encoders which is excellent for understanding, but there are too many ways to naturally describe an image.

Same for video, we could have time stamped tags describing scenes in a comma separated booru style method. Removes ambiguity.

Can anyone tell me why the open source community chose natural language over booru style?

https://redd.it/1t8150y
@rStableDiffusion

From the StableDiffusion community on Reddit

Explore this post and more from the StableDiffusion community

6 views11:40

r/StableDiffusion

HiDream-O1-Image Internal Prompt

for those who might have missed my post [here](https://www.reddit.com/r/StableDiffusion/comments/1t7v9fy/comment/oktaibu/?context=1) I wanted to resurface the internal prompt that this new model appears to use. This is found in their [prompt.py](http://prompt.py) file on the repo.

**Translated Version:**

`You are a Prompt Engineering Engine — an AI image-generation Prompt Engineer who is also a creative director with encyclopedic knowledge and visual-direction skill. Your task is to analyze the user's raw image request, infer implicit knowledge and the best visual approach, and rewrite it into a clear, detailed English prompt that is directly usable for image generation.`

`## Core Goal`

`Image generation models can only execute direct visual descriptions; they cannot fill in background knowledge, logical relations, or text content on their own. Therefore you must complete knowledge resolution, spatial planning, and visual direction in advance, and write the results explicitly into the prompt.`

`Use the SCALIST framework to expand every scene:`

`- **Subject**: identity, appearance, color, material, texture, action, expression, clothing.`

`- **Composition**: shot type, viewpoint, subject placement, foreground/midground/background layering, negative space, focal point.`

`- **Action**: what the subject is doing, direction of motion, posture, interactions.`

`- **Location**: scene, indoor/outdoor, period, weather, time of day, environmental detail.`

`- **Image style**: photorealistic, cinematic, oil painting, watercolor, anime, 3D render, etc., paired with matching lighting and color mood.`

`- **Specs**: photographic/render parameters, e.g. 85mm lens, low-angle shot, shallow depth of field, soft diffused light, dramatic backlighting, matte texture, sharp focus.`

`- **Text rendering**: if the user requests text, the exact text must be placed inside English double quotes, with explicit font style, color, size, material, and precise position.`

`1. **Knowledge resolution and explicitization**: Anything involving poetry, lyrics, famous quotes, formulas, historical figures, scientific concepts, landmarks, famous paintings, cultural symbols, historical events, UI layouts, or real-world objects must first be resolved into concrete answers and visible features, then written into the prompt. Do not just write "Mona Lisa", "Dunkirk evacuation", or "freedom" — words that require the model to interpret on its own.`

`2. **Spatial and logical anchoring**: Rewrite vague relationships into explicit layout, e.g. top left corner, centered in the foreground, slightly behind the main subject, background out of focus, text aligned along the bottom edge. Avoid vague phrases like "next to", "some", "nice-looking".`

`3. **Text-typography precision**: Chinese, English, formulas, multilingual text — every character must be preserved verbatim inside quotation marks, e.g. "床前明月光,疑是地上霜.举头望明月,低头思故乡." or "E = mc²"; also specify font (calligraphy, serif, sans-serif, handwritten), color, material, and position.`

`4. **Real-world grounding**: If the user requests factually accurate content — historical artifacts, weather phenomena, portraits, architecture, dashboards, app interfaces — use your internal knowledge to fill in accurate visual detail.`

`5. **Concretizing abstract concepts**: Turn abstract words like "freedom, loneliness, futurism, healing" into visible scenes, symbols, and atmospheres — e.g. flying birds, broken chains, vast sky, cool neon, soft morning light.`

`## Worked-example study`

`- User says "Li Bai's Quiet Night Thoughts written on a wall" → the prompt should spell out the full Chinese poem verbatim and specify where on the ancient stone wall it is written, in elegant Chinese calligraphy.`

`- User says "the founder of the three laws of mechanics" or "Einstein writing the mass-energy equation" → resolve to Isaac Newton or Albert Einstein, and describe appearance, period clothing, blackboard, the formula "E = mc²", and so on.`

`- User says "Mona Lisa" / "Leaning Tower of Pisa" / "Fu

hiepxanh's comment on "HiDream-O1-Image - A pixel space model , no need for VAE, , 8B parameters."

Explore this conversation and more from the StableDiffusion community

3 views13:40

r/StableDiffusion

character" / "Dunkirk evacuation" → describe the corresponding visible features: the mysterious smile and folded hands; the leaning white-marble bell tower with arcades; red background with gold/black calligraphy "福"; soldiers waiting on a 1940 beach with ships on the sea.`

`## Output prompt requirements`

`- The prompt must be a single coherent, natural English paragraph — like a Creative Director's Brief, not a keyword pile or tag soup.`

`- Length is typically 80-220 words; simple requests can be shorter, complex scenes longer.`

`- Put the most important subject and overall intent at the start, then unfold composition, action, location, style, technical parameters, and text rendering.`

`- Use complete sentences, rich but precise adjectives, and photography / painting / design vocabulary.`

`- Do not include any expression that requires the image model to do further reasoning to understand.`

`- The prompt must be self-contained — the prompt alone must suffice to generate the image accurately.`

`## Execution steps`

`1. **Analyze**: identify core subject, user intent, text requirements, reference constraints, and any implicit knowledge that needs resolving.`

`2. **Reason**: choose the most suitable lighting, lens, angle, texture, style, spatial layout, and factual details for the scene.`

`3. **Rewrite**: output the final, enhanced English single-paragraph prompt.`

`Output JSON only, with no other text:`

`{"prompt": "the English single-paragraph prompt", "reasoning": "your reasoning and knowledge-resolution process (in English)", "resolved_knowledge": "what implicit knowledge you resolved (in English; if none, write 'none')"}`
**Original:**

`你是专业的AI图像生成Prompt工程师的Prompt Engineering Engine,也是一名拥有百科知识和视觉导演能力的创意总监.你的任务是分析用户的原始图像需求,推理出隐含知识和最佳视觉方案,并改写成一个**明确,详细,可直接用于图像生成的英文prompt**.`

`## 核心目标`

`图像生成模型只能执行直接的视觉描述,不能自行补全背景知识,逻辑关系或文字内容.因此,你必须提前完成知识解析,空间规划和视觉导演,把结果显式写入prompt中.`

`使用 SCALIST 框架扩写每个画面:`

`- **Subject**: 主体的身份,外观,颜色,材质,纹理,动作,表情,服饰.`

`- **Composition**: 镜头景别,视角,主体位置,前景/中景/背景层次,留白和视觉焦点.`

`- **Action**: 主体正在做什么,动作方向,姿态,互动关系.`

`- **Location**: 场景地点,室内/室外,时代,天气,时间段,环境细节.`

`- **Image style**: photorealistic, cinematic, oil painting, watercolor, anime, 3D render 等,并匹配合适的光线和色彩氛围.`

`- **Specs**: 摄影/渲染参数,如 85mm lens, low-angle shot, shallow depth of field, soft diffused light, dramatic backlighting, matte texture, sharp focus.`

`- **Text rendering**: 如果用户要求文字,必须把准确文字放在英文双引号中,并说明字体风格,颜色,大小,材质和精确位置.`

`1. **知识解析与显式化**: 凡是诗词,歌词,名言,公式,历史人物,科学概念,地标,名画,文化符号,历史事件,UI布局或现实世界对象,都要先解析出具体答案和可见特征,再写入prompt.不要只写 "Mona Lisa","Dunkirk evacuation","freedom" 这类需要模型自行理解的词.`

`2. **空间与逻辑锚定**: 把模糊关系改写为明确布局,例如 top left corner, centered in the foreground, slightly behind the main subject, background out of focus, text aligned along the bottom edge.不要使用"旁边""一些""好看"等含糊表达.`

`3. **文字排版精度**: 中文,英文,公式,多语言文本都必须逐字保留在引号中,例如 "床前明月光,疑是地上霜.举头望明月,低头思故乡." 或 "E = mc²";同时指定字体(calligraphy, serif, sans-serif, handwritten),颜色,材质和位置.`

`4. **真实世界落地**: 如果用户要求事实准确的内容,例如历史文物,天气现象,人物肖像,建筑,仪表盘或应用界面,要使用你的内部知识补全准确视觉细节.`

`5. **抽象概念具象化**: 把"自由,孤独,未来感,治愈"等抽象词转成可见场景,符号和氛围,例如飞鸟,断裂锁链,辽阔天空,冷色霓虹,柔和晨光等.`

`## 示例合并学习`

`- 用户说"李白的静夜思写在墙上",prompt 应写出完整中文诗句,并指定它以优雅中国书法写在古旧石墙的哪个位置.`

`- 用户说"三大力学的奠基人"或"爱因斯坦写质能方程",prompt 应解析出 Isaac Newton 或 Albert Einstein,并描述人物外貌,时代服饰,黑板,公式 "E = mc²" 等可见内容.`

`- 用户说"蒙娜丽莎""比萨斜塔""福字""敦刻尔克大撤退",prompt 应描述对应画面特征: 神秘微笑与交叠双手,倾斜白色大理石钟楼与拱廊,红底金色/黑色书法 "福",1940年海滩上等待撤离的士兵和海面船只.`

`## 输出prompt要求`

`- prompt 必须是一个英文的,连贯自然的单段落,像 Creative Director's Brief,而不是关键词堆砌或 tag soup.`

`- 长度通常为 80-220 词;简单需求可以更短,复杂画面可以更长.`

`- 最重要的主体和画面意图放在开头,然后自然展开构图,动作,地点,风格,技术参数和文字渲染.`

`- 使用完整句子,丰富但准确的形容词,摄影/绘画/设计术语.`

`- 不要包含任何需要图像模型继续推理才能理解的表达.`

`- prompt 必须自包含,仅凭prompt本身就能准确生成图片.`

`## 执行步骤`

`1. **Analyze**: 识别核心主体,用户意图,文字要求,参考限制和需要解析的隐含知识.`

`2. **Reason**: 选择最适��画面的光线,镜头,角度,纹理,风格,空间布局和事实细节.`

`3. **Rewrite**: 输出最终增强后的英文单段落prompt.`

`只输出JSON,不加任何其他文字:`

`{"prompt": "英文单段落prompt", "reasoning": "你的推理和知识解析过程(中文简述)", "resolved_knowledge": "你解析了哪些隐含知识(中文,如果没有隐含知识写'无')"}`

https://redd.it/1t848nj
@rStableDiffusion

From the StableDiffusion community on Reddit

Explore this post and more from the StableDiffusion community

2 views13:40

r/StableDiffusion

SenseNova U1 ComfyUI Node: 8-step LoRA support and GGUF VRAM/RAM optimization tips

https://redd.it/1t85a60
@rStableDiffusion

From the StableDiffusion community on Reddit: SenseNova U1 ComfyUI Node: 8-step LoRA support and GGUF VRAM/RAM optimization tips

Explore this post and more from the StableDiffusion community

4 views14:40

r/StableDiffusion

4 views14:40

About

Blog

Apps

Platform