LTX 2.3 audio as standalone speech model.

User @wildmindai from X posted about this new model. Has anyone here tried it yet?

Emotional TTS with Scenema Audio.

- Zero-shot expressive voice cloning, speech gen
- 8-step distilled with Gemma 3 12B text encoding
- stage directions via <action> tags
- runs at 1.5x real-time on RTX 4090
- fits in 16GB VRAM
- 13 languages, 48kHz stereo output

it also gens matching environment sounds
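
For anyone wondering what those numbers mean in practice, here is a rough back-of-the-envelope sketch. It assumes "1.5x real-time" means 1.5 seconds of audio produced per second of compute (the post doesn't define it), and it assumes 16-bit PCM when estimating raw output size (the post only states 48kHz stereo). The function names are made up for illustration and are not part of the model's API.

```python
def generation_seconds(audio_seconds: float, realtime_factor: float = 1.5) -> float:
    """Estimated wall-clock time to generate a clip.

    Assumption: realtime_factor = seconds of audio produced per second of compute.
    """
    return audio_seconds / realtime_factor


def pcm_bytes(audio_seconds: float, sample_rate: int = 48_000,
              channels: int = 2, bytes_per_sample: int = 2) -> int:
    """Raw uncompressed size of a clip.

    Assumption: 16-bit samples (2 bytes); the post only gives rate and channel count.
    """
    return int(audio_seconds * sample_rate * channels * bytes_per_sample)


# One minute of speech under these assumptions:
print(generation_seconds(60))  # 40.0 seconds of compute
print(pcm_bytes(60))           # 11520000 bytes, about 11.5 MB raw
```

So under that reading, a one-minute clip would take roughly 40 seconds to generate on the quoted hardware.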

https://huggingface.co/ScenemaAI/scenema-audio

https://redd.it/1tab0tb
@rStableDiffusion
I have to pretend I hate image generation AI to avoid getting banned or insulted on 99% of Reddit or the internet, even though Stable Diffusion is actually what I like and am most excited about right now. Why do people hate AI so much, especially image generation AI?

I'm not even saying I care if they know the difference between open-source and closed-source image-generating AI, or if they insult me or not.

What I want to know is why so many people hate AI, especially image-generating AI.

At first, I thought it only bothered artists. Then I thought it might also bother those who are afraid of not being able to distinguish AI from reality.

But it's practically 99% of people who hate AI, and I just can't understand why.

For example, I've been using Blender for years. I learned to model, sculpt, and animate as an amateur. Thanks to AI, things that used to take me months now take me seconds. Isn't that supposed to be a good thing?

I don't feel bad or like I've wasted my time using Blender; I simply feel fortunate to have found a better tool for what I needed.

EDIT 1: When I say "Stable Diffusion" I mean the open source model community, all models, not "SD" specifically.

https://redd.it/1tahphc
@rStableDiffusion
Built an open-source one-prompt-to-cinematic-reel pipeline on a single GPU: FLUX.2 [klein] for character keyframes, Wan2.2-I2V for animation, vision critic with auto-retry, music + 9-language narration in the same pipeline
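
The "vision critic with auto-retry" step can be pictured with a minimal sketch like the one below. This is not the project's actual code: `generate`, `critic`, the 0.7 threshold, and the attempt limit are all hypothetical placeholders, and the stubs stand in for the real image-to-video model and vision model.

```python
from typing import Callable, Tuple


def generate_with_retry(
    generate: Callable[[int], str],   # hypothetical: seed -> clip identifier
    critic: Callable[[str], float],   # hypothetical: clip -> quality score in [0, 1]
    threshold: float = 0.7,           # assumed acceptance cutoff
    max_attempts: int = 3,            # assumed retry budget
) -> Tuple[str, float]:
    """Regenerate until the critic accepts a clip, else return the best one seen."""
    best_clip, best_score = "", float("-inf")
    for seed in range(max_attempts):
        clip = generate(seed)
        score = critic(clip)
        if score > best_score:
            best_clip, best_score = clip, score
        if score >= threshold:
            break  # good enough, stop spending GPU time
    return best_clip, best_score


# Stubs standing in for the real models (purely illustrative).
fake_scores = {"clip-seed0": 0.4, "clip-seed1": 0.8, "clip-seed2": 0.9}
result = generate_with_retry(lambda seed: f"clip-seed{seed}", fake_scores.__getitem__)
print(result)  # ('clip-seed1', 0.8)
```

Keeping the best clip seen so far means the pipeline still has something to use when no attempt clears the threshold.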

https://redd.it/1tamqbf
@rStableDiffusion