Great news: the ERNIE editing model is expected to be released by the end of this month
https://redd.it/1sm05ml
@rStableDiffusion
Last week in Generative Image & Video
I curate a weekly multimodal AI roundup; here are the open-source image & video highlights from the past week:
Numina - Finally makes AI video generators count objects correctly. Ask for three cats, get three cats. Reads attention during generation, catches counting errors, corrects without retraining. [GitHub](https://github.com/H-EmbodVis/NUMINA) | [Project](https://h-embodvis.github.io/NUMINA/)
https://reddit.com/link/1slz1rq/video/t623pxnc2bvg1/player
Prompt Relay - Training-free temporal control for multi-event video generation. Routes each prompt to a specific time segment with zero computational overhead. Plug-and-play with Wan2.2, CogVideo, HunyuanVideo. Project
https://preview.redd.it/j1mpwbgt3bvg1.jpg?width=1900&format=pjpg&auto=webp&s=905891a7d7397a6a9f83d74b9824f7d6aa7f8005
Inspatio World - Takes a normal video and reconstructs a 4D world you can explore. Walk around in 3D, scrub time forward and back, no visible drift. Runs on consumer GPUs. [GitHub](https://github.com/inspatio/inspatio-world) | [Demo](https://world.inspatio.com/)
https://reddit.com/link/1slz1rq/video/wn2lgoqy2bvg1/player
C-MET (Cross-Modal Emotion Transfer) - Emotion editing for talking-face video via text, audio, or video prompts. CLIP-based alignment. Beats SadTalker and EDTalk. Project | GitHub
https://reddit.com/link/1slz1rq/video/q1f3ewi73bvg1/player
LTX 2.3 IC-LoRA Outpaint - By oumoumad. Extends LTX Video with outpainting that actually holds up. [Hugging Face](https://huggingface.co/oumoumad/LTX-2.3-22b-IC-LoRA-Outpaint)
ComfyUI-Image-Conveyor - By xmarre. Sequential drag-and-drop image queuing, processes one image per prompt run, supports manual reordering. GitHub
https://preview.redd.it/nl092r753bvg1.png?width=538&format=png&auto=webp&s=6e0ac1ca2ea6a2429fa1ab29fc7c2fdd071f94bf
Honorable Mentions:
Alibaba HappyHorse - New text- and image-to-video model, currently on top of the Artificial Analysis rankings. Still in beta (not available yet). [Benchmark](https://artificialanalysis.ai/text-to-video)
https://reddit.com/link/1slz1rq/video/q1xew5o13bvg1/player
Google FIT - 1.13M-triplet dataset for fit-aware virtual try-on with body measurements and 3D physics-based draping. Built on FLUX.1 + LoRA. Beats IDM-VTON on fit metrics. Project
https://preview.redd.it/ge0zqa0f3bvg1.png?width=1456&format=png&auto=webp&s=b1e56c273442c9ac42412a44a9494c96d2c136c2
Check out the full roundup for more demos, papers, and resources.
https://redd.it/1slz1rq
@rStableDiffusion
Complex & Weird Prompt Test: ERNIE Turbo | Flux.2 Klein 4B | Z-Image Turbo
https://redd.it/1sm32pz
@rStableDiffusion
I built a real-time telemetry dashboard for LTX 2.3 and discovered that "clean" math kills cinematic motion
Test 1
Test 2
Been doing controlled scheduler experiments and the results broke my assumptions completely.
Same prompt. Same seed. Same settings. Only the scheduler curve changed.
The scheduler graph is the top-left blue graph. The noisy video is from the debug sampler's VAE preview.
Test 1 — steady decay curve (the "correct" math):
The video drifted. The model had too much time wandering in low-frequency noise. Character features warped. Background slowly lost coherence. The clean curve was the problem.
Test 2 — deliberate spike injected at the transition phase:
The spike forced the model to align with the prompt's kinetic requirements. The sob physics and flame flicker hit with near-perfect accuracy. "Shocking" the latent space prevented the drift entirely and locked the character into the high-velocity motion path.
The takeaway: a stable sigma curve in LTX 2.3 can be a recipe for identity loss. The model needs pressure at the right moment, not a smooth ride.
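Here's a minimal sketch of the idea if you want to experiment. It is not the exact curve from my tests; the spike position and strength are placeholder values, and the output goes to any sampler that accepts explicit sigmas (e.g. a ComfyUI SamplerCustom SIGMAS input):

```python
import torch

def make_decay_sigmas(sigma_max=1.0, sigma_min=0.002, steps=30):
    """Plain monotonic log-linear decay (the 'clean' Test 1 style curve),
    with the trailing 0.0 most custom-sigma samplers expect."""
    t = torch.linspace(0, 1, steps)
    sig = sigma_max * (sigma_min / sigma_max) ** t
    return torch.cat([sig, sig.new_zeros(1)])

def inject_spike(sigmas, at_frac=0.5, boost=1.6):
    """Test 2 style perturbation: briefly raise sigma around the transition
    phase so the sampler has to re-commit to the prompt's motion.
    at_frac and boost are illustrative values, not tuned numbers."""
    sigmas = sigmas.clone()
    i = max(1, int(at_frac * (len(sigmas) - 2)))  # stay away from the final 0.0
    spiked = sigmas[i] * boost
    # keep the schedule valid: never exceed the previous (larger) sigma
    sigmas[i] = torch.minimum(spiked, sigmas[i - 1] * 0.999)
    return sigmas

sigmas = inject_spike(make_decay_sigmas(steps=30))  # pass as custom sigmas
```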
To actually see what was happening inside the sampler I built a debug dashboard that tracks sigma, SNR, velocity, cosine similarity, and high/mid/low frequency noise energy per step. That's what's shown in the image. Without it I would never have spotted the drift pattern.
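The dashboard itself is a custom build, but the per-step numbers are straightforward to approximate. A rough sketch of what such per-step logging can look like (not the actual dashboard code; the SNR proxy, the velocity definition, and the frequency-band cut-offs are arbitrary choices):

```python
import torch
import torch.nn.functional as F

def step_telemetry(latent, prev_latent, sigma):
    """Per-step metrics: sigma, a rough SNR proxy, latent 'velocity',
    cosine similarity to the previous latent (drift proxy), and the share
    of spectral energy in low/mid/high spatial-frequency bands."""
    stats = {"sigma": float(sigma)}
    stats["snr"] = float(latent.std() / max(float(sigma), 1e-8))  # rough proxy
    if prev_latent is not None:
        stats["velocity"] = float((latent - prev_latent).norm() / latent.numel() ** 0.5)
        stats["cos_sim"] = float(F.cosine_similarity(
            latent.flatten(), prev_latent.flatten(), dim=0))
    # radial FFT energy split over the spatial dims of a (C, H, W) latent
    spec = torch.fft.fftshift(torch.fft.fft2(latent.float()), dim=(-2, -1)).abs() ** 2
    h, w = spec.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    r = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2).sqrt()
    r = r / r.max()
    total = spec.sum()
    stats["low_freq"] = float(spec[..., r < 0.15].sum() / total)   # cut-offs are arbitrary
    stats["mid_freq"] = float(spec[..., (r >= 0.15) & (r < 0.5)].sum() / total)
    stats["high_freq"] = float(spec[..., r >= 0.5].sum() / total)
    return stats
```

Call it once per sampler step with the current latent and sigma, and plot the resulting series.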
Full breakdown of the methodology and the dashboard build (still in development) here:
https://www.linkedin.com/pulse/developing-real-time-telemetry-dashboard-ltx-video-23-bezuidenhout-5laaf/
https://redd.it/1sm58vl
@rStableDiffusion
Dear mods, please care about this place. What currently happens is bullshit.
Today a lot of folks posted the GPUs they've bought. What the fuck does this have to do with the essential core and original purpose of this sub? Moderate it, for fuck's sake, that's your job. Otherwise please find mods that actually care and don't have a dozen subs to their names, I'm looking at you /u/dbzer0.
But what I find really disturbing: you want to show off that you can buy something expensive in today's economy? Honestly, great for you! But what the fuck does this have to do with this sub? Go to /r/pcmasterrace.
This space used to be so much more about open-source models and sharing workflow optimizations with each other. Please get back to that, since your rules state this too.
Nowadays it's a gallery of images without workflows, and now GPU flexing? Why is it like this?
We got a lot of models; the past weeks were filled with new open-source releases all around the open space, but that information doesn't get to the top anymore. Nothing gets filtered anymore. People can post any sub-par image generation.
Do your job, mods, please.
https://redd.it/1smj348
@rStableDiffusion
Comparison of low Steps, Klein 9b x Z image turbo x Ernie Turbo x Qwen 2512 8 Steps
https://redd.it/1sme1k0
@rStableDiffusion
Tencent HY-World-2.0 is now public
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
https://huggingface.co/tencent/HY-World-2.0
https://github.com/Tencent-Hunyuan/HY-World-2.0
https://preview.redd.it/x2nhoprmtfvg1.png?width=1920&format=png&auto=webp&s=e480c8bc65589154130efeaadfca70bb74d46b0e
https://3d-models.hunyuan.tencent.com/world/
https://3d-models.hunyuan.tencent.com/world/world2_0/HY_World_2_0.pdf
https://redd.it/1smmer5
@rStableDiffusion
WAI-ANIMA 1.0 released
https://civitai.red/models/2544636?modelVersionId=2859702
https://redd.it/1smnwjl
@rStableDiffusion
I tested Ernie Image Turbo (fp8, nvfp4, fp16 and INT8) with Nano Banana Pro 2 Prompts so you won't have to
https://redd.it/1smo359
@rStableDiffusion
Motif-Video-2B
https://huggingface.co/Motif-Technologies/Motif-Video-2B
https://motiftech.io/videoshowcase
Training strong video generation models usually requires massive datasets, large parameter counts, and substantial compute. Motif-Video 2B asks whether competitive text-to-video quality is reachable at a much smaller budget — fewer than 10M training clips and under 100,000 H200 GPU hours — and shows that the answer is yes, provided the model design explicitly separates objectives that scaling would otherwise leave entangled.
Our central observation is that prompt alignment, temporal consistency, and fine-detail recovery interfere with one another when handled through the same pathway. Motif-Video 2B addresses this objective interference architecturally rather than relying on scale alone, through two contributions:
Shared Cross-Attention. A residual cross-attention mechanism that reuses self-attention K/V weights to stabilize text–video alignment under long-context token sparsity, where standard joint attention dilutes text influence as the video token sequence grows.
Three-stage DDT-style backbone. 12 dual-stream + 16 single-stream + 8 DDT decoder layers, separating early modality fusion, joint representation learning, and high-frequency detail reconstruction into dedicated components. Per-block attention analysis shows that the DDT decoder spontaneously develops inter-frame attention structure absent from the encoder layers.
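As a rough illustration of the shared cross-attention idea above (a conceptual sketch only, not the actual Motif-Video 2B layer; the head layout, gating, and normalization are assumptions): the text branch reuses the self-attention K/V projection weights and contributes through a gated residual.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedCrossAttention(nn.Module):
    """Conceptual sketch: video self-attention plus a text cross-attention
    branch that reuses the SAME k/v projection weights, added as a gated
    residual. Details in the real model may differ."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)   # shared by the video and text branches
        self.v = nn.Linear(dim, dim)   # shared as well
        self.out = nn.Linear(dim, dim)
        self.gate = nn.Parameter(torch.zeros(1))  # residual starts at zero

    def _heads(self, x):
        b, n, d = x.shape
        return x.view(b, n, self.num_heads, d // self.num_heads).transpose(1, 2)

    def forward(self, video_tokens, text_tokens):
        q = self._heads(self.q(video_tokens))
        # plain self-attention over the video tokens
        self_out = F.scaled_dot_product_attention(
            q, self._heads(self.k(video_tokens)), self._heads(self.v(video_tokens)))
        # cross-attention to text tokens through the same k/v projections
        cross_out = F.scaled_dot_product_attention(
            q, self._heads(self.k(text_tokens)), self._heads(self.v(text_tokens)))
        merged = self_out + self.gate * cross_out   # residual text contribution
        b, h, n, hd = merged.shape
        return self.out(merged.transpose(1, 2).reshape(b, n, h * hd))
```

The zero-initialized gate is one common way to add such a residual pathway without destabilizing the base attention early in training.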
"Training strong video generation models usually requires massive datasets, large parameter counts, and substantial compute. Motif-Video 2B asks whether competitive text-to-video quality is reachable at a much smaller budget — fewer than 10M training clips and under 100,000 H200 GPU hours — and shows that the answer is yes, provided the model design explicitly separates objectives that scaling would otherwise leave entangled.
Our central observation is that prompt alignment, temporal consistency, and fine-detail recovery interfere with one another when handled through the same pathway. Motif-Video 2B addresses this objective interference architecturally rather than relying on scale alone, through two contributions:
Shared Cross-Attention. A residual cross-attention mechanism that reuses self-attention K/V weights to stabilize text–video alignment under long-context token sparsity, where standard joint attention dilutes text influence as the video token sequence grows.
Three-stage DDT-style backbone. 12 dual-stream + 16 single-stream + 8 DDT decoder layers, separating early modality fusion, joint representation learning, and high-frequency detail reconstruction into dedicated components. Per-block attention analysis shows that the DDT decoder spontaneously develops inter-frame attention structure absent from the encoder layers."
https://redd.it/1smonvh
@rStableDiffusion