r/StableDiffusion

Someone posted a real Monet to twitter but said it was AI generated. The replies are amazing, pretentious and confidently wrong
https://redd.it/1tcxmdy
@rStableDiffusion

5 views13:40

r/StableDiffusion

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config)
https://redd.it/1tcxhoq
@rStableDiffusion

5 views14:40

r/StableDiffusion

Qwen-Image-VAE-2.0 Technical Report

arxiv.org/pdf/2605.13565

"We present Qwen-Image-VAE-2.0, a suite of high-compression Variational Autoencoders (VAEs) that achieve significant advances in both reconstruction fidelity and diffusability. To address the reconstruction bottlenecks of high compression, we adopt an improved architecture featuring Global Skip Connections (GSC) and expanded latent channels. Moreover, we scale training to billions of images and incorporate a synthetic rendering engine to improve performance in text-rich scenarios. To tackle the convergence challenges of high-dimensional latent space, we implement an enhanced semantic alignment strategy to make the latent space highly amenable to diffusion modeling. To optimize computational efficiency, we leverage an asymmetric and attention-free encoder-decoder backbone to minimize encoding overhead. We present a comprehensive evaluation of Qwen-Image-VAE-2.0 on public reconstruction benchmarks. To evaluate performance in text-rich scenarios, we propose OmniDoc-TokenBench, a new benchmark comprising a diverse collection of real-world documents coupled with specialized OCR-based evaluation metrics. Qwen-Image-VAE-2.0 achieves state-of-the-art reconstruction performance, demonstrating exceptional capabilities in both general domains and text-rich scenarios at high compression ratio. Furthermore, downstream DiT experiments reveal our models possess superior diffusability, significantly accelerating convergence compared to existing high-compression baselines. These establish Qwen-Image-VAE-2.0 as a leading model with high compression, superior reconstruction, and exceptional diffusability."

Key innovations:

Global Skip Connections (GSC): This architectural change allows the model to "remember" fine details from the original image and pass them directly through the compression bottleneck, significantly improving the clarity of the final output.
Asymmetric & Attention-Free Backbone: They made the encoder (which processes the image) very lightweight and fast while keeping the decoder (which reconstructs the image) powerful. By removing "Attention" layers in the VAE itself, they drastically reduced the computational cost (FLOPs).
Semantic Alignment Strategy: To make the model better for generating images (diffusability), they forced the latent space to align more closely with visual "meaning." This helps downstream models learn much faster.
Synthetic Rendering for Text: They trained the model on billions of images, including a massive set of synthetically rendered documents. This makes this VAE exceptionally good at reconstructing OCR-rich images (documents, posters, covers etc.) where most other VAEs fail.
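
For a rough sense of what "high compression" and "expanded latent channels" mean in numbers, here is a minimal Python sketch. It assumes a variant tag such as f16c128 (the tags appear in the benchmark quote in this post) reads as spatial downsampling factor 16 with 128 latent channels. The post only confirms that f is the spatial compression factor, so the channel reading is an assumption. The helper names are hypothetical and not part of any Qwen release.

```python
import re


def latent_shape(variant: str, height: int, width: int) -> tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for a tag like 'f16c128'.

    Assumption: 'f<N>' is the spatial downsampling factor and 'c<M>' is the
    number of latent channels. This naming is inferred, not confirmed by the post.
    """
    match = re.fullmatch(r"f(\d+)c(\d+)", variant)
    if match is None:
        raise ValueError(f"unrecognized variant tag: {variant!r}")
    factor, channels = int(match.group(1)), int(match.group(2))
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the spatial factor")
    return channels, height // factor, width // factor


def compression_ratio(variant: str, height: int, width: int) -> float:
    """Number of RGB input values divided by the number of latent values."""
    channels, latent_h, latent_w = latent_shape(variant, height, width)
    return (3 * height * width) / (channels * latent_h * latent_w)


if __name__ == "__main__":
    for tag in ("f16c128", "f32c192"):
        print(tag, latent_shape(tag, 256, 256), compression_ratio(tag, 256, 256))
```

Under those assumptions, f16c128 turns a 256×256 RGB image into a 128×16×16 latent (6× fewer values), and f32c192 turns it into 192×8×8 (16× fewer values).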

alibaba/OmniDoc-TokenBench

"We conduct a comprehensive evaluation on OmniDoc-TokenBench (~3K text-rich images, 256×256 resolution). Models are grouped by spatial compression factor and sorted by NED within each group.

Our Qwen-Image-VAE-2.0 achieves state-of-the-art reconstruction across all compression ratios. The f16c128 variant attains SSIM 0.9706 and PSNR 30.45 dB, surpassing the best f8 baseline (FLUX.1-dev at 0.9364 / 26.24 dB) despite 2× higher spatial compression. In terms of text fidelity (NED), f16c128 reaches 0.9617, exceeding all evaluated VAEs. Even under extreme f32 compression, our f32c192 achieves NED 0.8555, surpassing multiple f16 baselines."

https://preview.redd.it/yrt8rsc8241h1.png?width=1918&format=png&auto=webp&s=3b812d1a9b4be2f9d2d6922d685c5077b7c9e242
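
For readers unfamiliar with the metrics quoted above, here is a minimal Python sketch of PSNR and an edit-distance text score. The post does not give the benchmark's exact NED formula. Since higher is reported as better, this sketch assumes the score is 1 minus the Levenshtein distance divided by the longer string's length. The function names are hypothetical, and the benchmark's own OCR pipeline is not reproduced here.

```python
import math


def psnr(original: list[float], reconstructed: list[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB over two flat lists of pixel values."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def ned_similarity(reference: str, hypothesis: str) -> float:
    """Assumed NED score: 1.0 for identical text, 0.0 for entirely different text."""
    longest = max(len(reference), len(hypothesis))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(reference, hypothesis) / longest
```

In a setup like the one described, `reference` would be the text OCR reads from the original image and `hypothesis` the text OCR reads from the VAE reconstruction, so a blurred or garbled glyph lowers the score.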

https://redd.it/1tczg0t
@rStableDiffusion

4 views15:40

r/StableDiffusion

LTX Director - All-In-One Timeline Editor. I2V, T2V, FLFF, Prompt Relay, Custom Audio, and more! Unlock LTX 2.3's full potential!
https://youtu.be/fZgtkRcu4_k

https://redd.it/1tczxqw
@rStableDiffusion

YouTube

LTX Director - The All-In-One Timeline Editor. I2V, T2V, FLFF, Prompt Relay, Custom Audio, and more!

A Complete Timeline Editor For LTX 2.3.
Download for free here: https://github.com/WhatDreamsCost/WhatDreamsCost-ComfyUI

Example workflows here:
https://github.com/WhatDreamsCost/WhatDreamsCost-ComfyUI/tree/main/example_workflows

Main Features:

- Fully…

5 views16:40

r/StableDiffusion

Anima base v1.0 has been released.
https://redd.it/1td4lbn
@rStableDiffusion

4 views17:40

r/StableDiffusion

AsymFLUX.2-klein-9B - Pixel Space Model.

Pixel-space text-to-image model AsymFLUX.2-klein finetuned from black-forest-labs/FLUX.2-klein-base-9B, using the AsymFlow method proposed in the paper:

https://preview.redd.it/moe2i7xjt51h1.png?width=3518&format=png&auto=webp&s=a56904867faa1523161bb71b4414939cfd9277a2

HF: Lakonik/AsymFLUX.2-klein-9B · Hugging Face

Paper: [2605.12964 Asymmetric Flow Models](https://arxiv.org/abs/2605.12964)

Code: LakonLab/docs/AsymFlow.md at main · Lakonik/LakonLab

https://redd.it/1td9ojh
@rStableDiffusion

4 views20:40

r/StableDiffusion

I got tired of messy prompt libraries, so I made my own

https://redd.it/1td8g8q
@rStableDiffusion

4 views21:40

r/StableDiffusion

anima pv2 vs anima pv3 vs anima-base v1

[A close-up, high-contrast illustration depicts a terrifying embrace between two female figures against a dark, shadowy background. In the foreground, a young woman with long, messy blonde hair and pale skin sits in a state of distress. She wears a loose, white, long-sleeved dress or gown that appears slightly soiled. Her blue eyes are wide and filled with tears, and her mouth is slightly open in a grimace of fear as she looks forward. Looming directly behind her is a monstrous, demonic figure with long, disheveled black hair that blends into the darkness. This figure has glowing red eyes and a wide, menacing grin that reveals sharp teeth. She is embracing the blonde woman from behind, her body pressed close. Her left hand, which appears blackened or gloved, grips the blonde woman's chin and jaw forcefully, tilting her head slightly. The dark figure extends a long, pink tongue, licking the side of the blonde woman's face near her cheek, adding to the predatory and violating nature of the scene. The lighting is dramatic, highlighting the blonde woman's tears and the texture of her white dress while casting the attacker mostly in shadow, emphasizing the horror and intensity of the moment. The art style is painterly with visible brushstrokes, giving it a gritty, textured look reminiscent of dark anime or horror manga.](https://preview.redd.it/sk66rj8n861h1.png?width=2592&format=png&auto=webp&s=80c937ab94ad392e6cd621e87da4392ae88c79bd)

[A full-body, front-facing shot of a dark, multi-limbed silhouette figure rising from a mass of indistinct, shadowy forms at the bottom of the frame. The central figure has long, wild black hair flowing upward and outward as if caught in wind or supernatural force; its face is partially visible — pale with sharp features, eyes closed or downcast, expression serene yet ominous. Extending from its torso are eight elongated arms, each ending in clawed hands splayed in dynamic, reaching poses — some pointing upward, others outward or downward, creating a radial symmetry around the body. Behind the figure’s head glows a large, textured circular halo or sunburst pattern rendered in beige and ochre tones, radiating thin lines outward like rays of light or energy; within this circle, near the top center, appears a single black Japanese kanji character “神” (kami/god). The background resembles aged parchment or canvas, stained with rust-colored smudges and faint vertical striations, enhancing the antique, ritualistic feel. Lighting is high-contrast: the figure is nearly pure black against the luminous backdrop, emphasizing form through negative space while leaving facial details and limb contours sharply defined. The atmosphere is mythic, divine, and terrifying — blending Eastern iconography with grotesque multiplicity to evoke a deity of chaos, power, or judgment emerging from primordial darkness within a sacred, weathered pictorial field.](https://preview.redd.it/dzs8x5wr861h1.png?width=2592&format=png&auto=webp&s=47538e7afc05ef14a2b42bee7a3122ae31b2b6f5)

[A full-body, side-profile shot of two individuals standing back-to-back against a stark white background. The taller figure on the left is a man with shoulder-length dark hair falling across his forehead and neck; he wears a black long-sleeved shirt that clings to his muscular torso, revealing defined shoulders and collarbones under dramatic lighting. His face is turned slightly downward, eyes half-lidded, expression somber or contemplative. Behind him and to the right stands a shorter individual — likely a woman — with short, spiky dark hair and sharp facial features; she wears a form-fitting black turtleneck dress or top, her body angled away but head turned toward the viewer’s left, gaze steady and intense. A small geometric earring or accessory glints at her left earlobe. Lighting originates from the front-right, casting deep shadows along their backs and sides while illuminating parts of their faces, necks, and arms in high contrast. The composition emphasizes physical proximity without touch, suggesting tension, alliance, or shared burden. No environment exists beyond the pure white void, isolating the figures entirely. The atmosphere is minimalist, emotionally charged, and stylized — focusing on silhouette, posture, and interplay of light and shadow to convey intimacy, defiance, or silent solidarity between two bodies locked in mutual orientation within an abstract space.](https://preview.redd.it/suij99mw861h1.png?width=3168&format=png&auto=webp&s=45dfba040eec63a4e940f188904632150b9d1983)

[@zuwai kani,A side-by-side composite image displays two individuals in separate indoor settings, each framed from the chest up. On the left, a man with short, light brown hair and a neatly trimmed goatee wears a white collared shirt, black tie, and dark gray suit jacket; his eyebrows are furrowed, eyes narrowed, and mouth set in a stern line, conveying intensity or displeasure. The background behind him is softly blurred but suggests an office or formal interior with warm tones and indistinct furniture. On the right, a young woman with long, straight platinum blonde hair parted down the middle gazes forward with wide, pale eyes and slightly parted lips, expression neutral to mildly surprised. She wears a thin black choker necklace with a small silver pendant and a sleeveless white top. Her background is similarly out of focus, showing muted beige walls and possibly wooden cabinetry, indicating a domestic or casual indoor space. Lighting is even across both figures, highlighting facial features and clothing textures without dramatic shadows. The atmosphere is tense and juxtaposed — contrasting masculine authority with feminine passivity through direct gaze, attire, and emotional expression within isolated, everyday environments.](https://preview.redd.it/ksl1fb23961h1.png?width=3168&format=png&auto=webp&s=e45a2a52d0139cb775e29922d99d2e1177ab88ea)

[@zunta,A close-up shot of a young woman with long dark brown hair and glasses, her face turned upward in profile as she gazes at a thick stack of Japanese 10,000 yen bills being held directly in front of her mouth by an unseen person’s hand. Her cheeks are flushed pink, eyes half-lidded with a dreamy, adoring expression, lips slightly parted as if about to kiss or accept the money. The hand holding the cash is pale, emerging from the left side of the frame, clad in a beige sleeve; the bills are bound with a white paper band, and the portrait on the note — featuring a historical figure — is clearly visible. Below the image, centered at the bottom, the text “I love you.” appears in simple white sans-serif font against the gray background. The backdrop is indistinct — smudged shades of gray and black suggesting smoke, shadow, or abstract darkness — keeping all focus on the interaction between the woman and the money. Lighting is flat and even, highlighting facial features and currency details without dramatic contrast. The atmosphere is surreal, transactional, and emotionally charged — reducing affection to material exchange through literal visual metaphor within a minimal, stylized setting.](https://preview.redd.it/af98bt59961h1.png?width=3168&format=png&auto=webp&s=b8eec340cb04b30094353d3fbbd5f363fc163ce5)

[@zuharu,A medium close-up shot of a group of five people tightly huddled together in an indoor setting. At the top left, a man with dark slicked-back hair and stubble has his mouth wide open in a scream, tears streaming from his eyes, while his right hand grips the head of the woman below him. In the center, a young woman with short dark blue hair and wide blue eyes grits her teeth in an expression of strain or anger, her face pressed against the others. To the lower left, a young woman with voluminous curly pink hair smiles broadly with closed eyes, her arms wrapped around the group in an enthusiastic embrace. At the bottom center, a young person with spiky blond hair and wide orange eyes stares forward with a shocked expression, their face partially obscured by the others. On the right, a young woman with shoulder-length brown hair and purple eyes smiles brightly with her mouth open, leaning into the huddle with her hands clasped near her chest. The background consists of blurred wooden paneling and hanging tassels, suggesting a traditional room interior. The lighting is warm and even, highlighting the exaggerated facial expressions and physical closeness of the group, creating an atmosphere of chaotic, overwhelming emotional intensity and forced intimacy.](https://preview.redd.it/vwqys3bb961h1.png?width=3168&format=png&auto=webp&s=8d2bd4ade9ac828dce661384c61700852fd8eab4)

[@zhongerweiyuan,A medium shot captures a young woman with long, straight dark hair and bangs, seated atop a gray cylindrical utility pole against a plain pale green background. She wears a light lavender sailor-style school uniform with a white collar, dark blue bow at the chest, and matching pleated skirt; her right leg is bent with foot resting on the pole’s surface, left knee raised, hand placed near her ankle. Her expression is neutral to slightly concerned, eyes wide and directed forward. Extending from behind her lower back is a long, slender, vibrant pink tail that curves upward and arcs toward the upper right of the frame — its tip frayed or feathered in texture. Below her, two horizontal black cables stretch across the bottom edge, anchored by white ceramic insulators mounted on the pole. Lighting is flat and even, casting no shadows, emphasizing clean lines and solid color fields. The atmosphere is surreal and stylized — blending mundane urban infrastructure with fantastical anatomical detail through minimal setting, focused composition, and abrupt juxtaposition of ordinary attire with supernatural appendage.](https://preview.redd.it/z6ytg09d961h1.png?width=3168&format=png&auto=webp&s=5d29eef526d059cb2774e58be856dee69f21ba2d)

[@zeronis,A vertical two-panel composition depicts two characters in contrasting settings and emotional states. In the top panel, a young woman with shoulder-length black hair and glowing orange eyes leans forward against a starry night sky filled with dense constellations and nebulae; she wears a white long-sleeved shirt under a dark vest with a black bow tie, her right hand raised near her chin in a playful gesture, mouth open mid-speech as if asking a question — overlaid text in yellow reads “do u like stars?” Her cheeks are flushed pink, and faint shadows suggest ambient light from above or behind. In the bottom panel, a young man with messy blond hair lies on his back in green grass, wearing a torn white tank top that reveals bruises and dirt on his torso and arms; his expression is dazed and exhausted, eyes half-lidded, lips parted with visible teeth, sweat glistening on his forehead and neck. The background is tightly framed on the grass blades surrounding him, emphasizing grounding and physical weariness. Lighting contrasts sharply: celestial brilliance above versus muted natural daylight below. The atmosphere juxtaposes whimsical curiosity with weary realism, using visual disparity to imply narrative tension or ironic disconnect between the characters’ experiences within a single thematic exchange.](https://preview.redd.it/bz9lv3ag961h1.png?width=3168&format=png&auto=webp&s=3b33eb30d4a20fcb46534c316cf76406dce41725)

[@zawar379,A medium shot captures a man in mid-swing, wielding a large double-bitted axe with both hands raised above his right shoulder. He wears a black knit beanie pulled low over his forehead, revealing thick brown hair at the sides and back; his face is contorted into an intense grimace — brows furrowed, eyes narrowed, lips pressed tight around a clenched jaw. His attire includes a red-and-black plaid flannel shirt with rolled-up sleeves exposing white undershirt cuffs, paired with faded blue jeans. The axe has a light-colored wooden handle and a dark metal head with two sharp blades angled outward. His body is twisted dynamically: left leg bent forward, right leg trailing behind, torso rotated to generate momentum. Lighting is studio-style, directional from front-left, casting soft shadows on the plain beige backdrop that isolates him completely. The

2 views22:40
