Great news: the ERNIE editing model is expected to be released by the end of this month
https://redd.it/1sm05ml
@rStableDiffusion
Last week in Generative Image & Video
I curate a weekly multimodal AI roundup; here are the open-source image & video highlights from the past week:
Numina - Finally makes AI video generators count objects correctly. Ask for three cats, get three cats. Reads attention during generation, catches counting errors, corrects without retraining. [GitHub](https://github.com/H-EmbodVis/NUMINA) | [Project](https://h-embodvis.github.io/NUMINA/)
https://reddit.com/link/1slz1rq/video/t623pxnc2bvg1/player
Prompt Relay - Training-free temporal control for multi-event video generation. Routes each prompt to a specific time segment with zero computational overhead. Plug-and-play with Wan2.2, CogVideo, HunyuanVideo. Project
https://preview.redd.it/j1mpwbgt3bvg1.jpg?width=1900&format=pjpg&auto=webp&s=905891a7d7397a6a9f83d74b9824f7d6aa7f8005
Inspatio World - Takes a normal video and reconstructs a 4D world you can explore. Walk around in 3D, scrub time forward and back, no visible drift. Runs on consumer GPUs. [GitHub](https://github.com/inspatio/inspatio-world) | [Demo](https://world.inspatio.com/)
https://reddit.com/link/1slz1rq/video/wn2lgoqy2bvg1/player
C-MET (Cross-Modal Emotion Transfer) - Emotion editing for talking-face video via text, audio, or video prompts. CLIP-based alignment. Beats SadTalker and EDTalk. Project | GitHub
https://reddit.com/link/1slz1rq/video/q1f3ewi73bvg1/player
LTX 2.3 IC-LoRA Outpaint - By oumoumad. Extends LTX Video with outpainting that actually holds up. [Hugging Face](https://huggingface.co/oumoumad/LTX-2.3-22b-IC-LoRA-Outpaint)
ComfyUI-Image-Conveyor - By xmarre. Sequential drag-and-drop image queuing, processes one image per prompt run, supports manual reordering. GitHub
https://preview.redd.it/nl092r753bvg1.png?width=538&format=png&auto=webp&s=6e0ac1ca2ea6a2429fa1ab29fc7c2fdd071f94bf
Honorable Mentions:
Alibaba HappyHorse - New text- and image-to-video model, currently on top of the Artificial Analysis rankings. Still in beta (not available yet). [Benchmark](https://artificialanalysis.ai/text-to-video)
https://reddit.com/link/1slz1rq/video/q1xew5o13bvg1/player
Google FIT - 1.13M-triplet dataset for fit-aware virtual try-on with body measurements and 3D physics-based draping. Built on FLUX.1 + LoRA. Beats IDM-VTON on fit metrics. Project
https://preview.redd.it/ge0zqa0f3bvg1.png?width=1456&format=png&auto=webp&s=b1e56c273442c9ac42412a44a9494c96d2c136c2
Check out the full roundup for more demos, papers, and resources.
https://redd.it/1slz1rq
@rStableDiffusion
NUMINA paper: [CVPR 2026] When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
Complex & Weird Prompt Test: ERNIE Turbo | Flux.2 Klein 4B | Z-Image Turbo
https://redd.it/1sm32pz
@rStableDiffusion
I built a real-time telemetry dashboard for LTX 2.3 and discovered that "clean" math kills cinematic motion
Test1
Test2
Been doing controlled scheduler experiments and the results broke my assumptions completely.
Same prompt. Same seed. Same settings. Only the scheduler curve changed.
The scheduler graph is the top-left blue graph. The noisy video comes from the debug sampler's VAE preview.
Test 1 — steady decay curve (the "correct" math):
The video drifted. The model had too much time wandering in low-frequency noise. Character features warped. Background slowly lost coherence. The clean curve was the problem.
Test 2 — deliberate spike injected at the transition phase:
The spike forced the model to align with the prompt's kinetic requirements. The sob physics and flame flicker hit with near-perfect accuracy. "Shocking" the latent space prevented the drift entirely and locked the character into the high-velocity motion path.
The takeaway: a stable sigma curve in LTX 2.3 can be a recipe for identity loss. The model needs pressure at the right moment, not a smooth ride.
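The two tested schedules can be sketched as follows. This is an illustrative reconstruction, not the author's actual code: the exact curve shape, spike position, and boost factor are assumptions, but it shows the idea of starting from a smooth decay and deliberately raising one sigma at the transition phase.

```python
import numpy as np

def sigma_schedule(steps: int, sigma_max: float = 1.0, sigma_min: float = 0.01) -> np.ndarray:
    # Smooth exponential decay -- the "clean" curve from Test 1.
    return np.geomspace(sigma_max, sigma_min, steps)

def inject_spike(sigmas: np.ndarray, at_step: int, boost: float = 1.5) -> np.ndarray:
    # Deliberately raise one sigma mid-schedule -- the "shock" from Test 2.
    # `at_step` and `boost` are hypothetical knobs you would tune per prompt.
    spiked = sigmas.copy()
    spiked[at_step] *= boost
    return spiked

sigmas = sigma_schedule(20)
spiked = inject_spike(sigmas, at_step=10)
```

The spiked schedule is identical everywhere except the one elevated step, which briefly pushes the sampler back toward higher-noise territory before the decay resumes.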
To actually see what was happening inside the sampler I built a debug dashboard that tracks sigma, SNR, velocity, cosine similarity, and high/mid/low frequency noise energy per step. That's what's shown in the image. Without it I would never have spotted the drift pattern.
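The per-step metrics listed above can be approximated like this. A minimal sketch only: the function name, the SNR convention (1/sigma² here), and the three-way radial split of the power spectrum are my assumptions, not the dashboard's actual definitions.

```python
import numpy as np

def step_telemetry(latent_prev: np.ndarray, latent_cur: np.ndarray, sigma: float) -> dict:
    # Velocity: how far the latent moved this step.
    a, b = latent_prev.ravel(), latent_cur.ravel()
    velocity = float(np.linalg.norm(b - a))
    # Cosine similarity between successive latents (drift indicator).
    cos_sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    # One common SNR convention: inverse noise variance.
    snr = 1.0 / (sigma ** 2 + 1e-8)
    # Split the power spectrum into low/mid/high radial-frequency bands.
    spec = np.abs(np.fft.fftn(latent_cur)) ** 2
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in latent_cur.shape], indexing="ij")
    r = np.sqrt(sum(f ** 2 for f in freqs))
    cut = r.max() / 3
    low = float(spec[r < cut].sum())
    mid = float(spec[(r >= cut) & (r < 2 * cut)].sum())
    high = float(spec[r >= 2 * cut].sum())
    return {"velocity": velocity, "cos_sim": cos_sim, "snr": snr,
            "low": low, "mid": mid, "high": high}

rng = np.random.default_rng(0)
prev = rng.standard_normal((16, 16))
cur = prev + 0.1 * rng.standard_normal((16, 16))
m = step_telemetry(prev, cur, sigma=0.5)
```

Logging a dict like this at every sampler step is enough to plot the drift pattern the post describes: a slowly falling `cos_sim` with energy pooling in the low band.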
Full breakdown of the methodology and the developing dashboard build here:
https://www.linkedin.com/pulse/developing-real-time-telemetry-dashboard-ltx-video-23-bezuidenhout-5laaf/
https://redd.it/1sm58vl
@rStableDiffusion
Dear mods, please care about this place. What currently happens is bullshit.
Today a lot of folks posted the GPUs they've bought. What the fuck has this to do with the essential core and initial purpose of this sub? Moderate it, for fuck's sake, that's your job. Otherwise please find mods who actually care and don't have a dozen subs to their names, I'm looking at you /u/dbzer0.
But what I find really disturbing: do you want to show off that you can buy something expensive in today's economy? Honestly, great for you! But what the fuck has this to do with this sub? Go to /r/pcmasterrace.
This space was so much more about open source models, sharing and workflow optimizations with each other. Please get back to that since your rules state this too.
Nowadays it's a gallery of images without workflows and now GPU flexing? Why is it like this today?
We got a lot of models; the past weeks were filled with new open-source releases all around the open space, but that information doesn't reach the top anymore. Nothing gets filtered anymore. People can post any sub-par image generations.
Do your job mods, please.
https://redd.it/1smj348
@rStableDiffusion
Comparison of low Steps, Klein 9b x Z image turbo x Ernie Turbo x Qwen 2512 8 Steps
https://redd.it/1sme1k0
@rStableDiffusion