LTX 2.3 audio as standalone speech model.
User @wildmindai from X posted about this new model. Has anyone here tried it yet?
Emotional TTS with Scenema Audio:
- Zero-shot expressive voice cloning and speech generation
- 8-step distilled, with Gemma 3 12B text encoding
- Stage directions via <action> tags
- Runs at 1.5x real-time on an RTX 4090
- Fits in 16 GB VRAM
- 13 languages, 48 kHz stereo output
It also generates matching environment sounds.
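The stage-direction convention can be illustrated with a small script parser. This is only a sketch: the `<action>` tag name comes from the post, but the assumption of paired `<action>…</action>` tags and the split logic are mine, not Scenema Audio's actual preprocessor.

```python
import re

# Matches paired <action>...</action> stage directions (an assumed format).
ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)

def split_script(script: str) -> tuple[str, list[str]]:
    """Split a script into (spoken text, list of stage directions)."""
    directions = [m.strip() for m in ACTION_RE.findall(script)]
    spoken = ACTION_RE.sub(" ", script)      # drop the tags from the speech
    spoken = " ".join(spoken.split())        # collapse leftover whitespace
    return spoken, directions

text, actions = split_script(
    "I can't believe it. <action>whispers, trembling</action> You came back."
)
# text keeps only the words to be spoken; actions carries the directions.
```

A real TTS front end would feed `text` to synthesis and use `actions` to condition delivery, but that mapping is model-specific.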
https://huggingface.co/ScenemaAI/scenema-audio
https://redd.it/1tab0tb
@rStableDiffusion
I have to pretend I hate image-generation AI to avoid getting banned or insulted on 99% of Reddit and the wider internet, even though Stable Diffusion is actually what I like and am most excited about right now. Why do people hate AI so much, especially image-generation AI?
I'm not even saying I care if they know the difference between open-source and closed-source image-generating AI, or if they insult me or not.
What I want to know is why so many people hate AI, especially image-generating AI.
At first, I thought it only bothered artists. Then I thought it might also bother those who are afraid of not being able to distinguish AI from reality.
But it's practically 99% of people who hate AI, and I just can't understand why.
For example, I've been using Blender for years. I learned to model, sculpt, and animate as an amateur. Thanks to AI, things that used to take me months now take me seconds. Isn't that supposed to be a good thing?
I don't feel bad or like I've wasted my time using Blender; I simply feel fortunate to have found a better tool for what I needed.
EDIT 1: When I say "Stable Diffusion" I mean the open source model community, all models, not "SD" specifically.
https://redd.it/1tahphc
@rStableDiffusion
Built an open-source one-prompt-to-cinematic-reel pipeline on a single GPU — FLUX.2 [klein] for character keyframes, Wan2.2-I2V for animation, vision critic with auto-retry, music + 9-language narration in the same pipeline
https://redd.it/1tamqbf
@rStableDiffusion
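The "vision critic with auto-retry" stage of the pipeline above can be sketched as a regenerate-until-accepted loop. Everything here is a stand-in: the real generator is FLUX.2 [klein] and the critic is a vision model, but both are stubs below so that only the retry logic itself is shown.

```python
import random

def generate_frame(prompt: str, seed: int) -> str:
    """Stub generator: stands in for a FLUX.2 keyframe call."""
    return f"frame({prompt}, seed={seed})"

def critic_score(frame: str) -> float:
    """Stub vision critic: stands in for a VLM scoring the frame in [0, 1]."""
    return random.random()

def generate_with_retry(prompt: str, threshold: float = 0.7,
                        max_attempts: int = 5) -> str:
    """Regenerate with new seeds until the critic accepts, else keep the best."""
    best_frame, best_score = None, -1.0
    for attempt in range(max_attempts):
        frame = generate_frame(prompt, seed=attempt)
        score = critic_score(frame)
        if score > best_score:
            best_frame, best_score = frame, score
        if score >= threshold:
            return frame                    # critic accepted this attempt
    return best_frame                       # fall back to the best attempt seen
```

Keeping the best-scoring attempt as a fallback means the pipeline always produces a frame even when the critic never clears the threshold; the threshold and attempt budget are tuning knobs, not values from the post.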
Looks like RuneXX made that dub LoRA for LTX: turn any silent video into a speaking one.
Video-2-Video/LTX-2.3_-_V2V_Just_Talk_dub_any_silent_video_multilanguage.json · RuneXX/LTX-2.3-Workflows at main
https://redd.it/1tabyy3
@rStableDiffusion
ComfyUI Support for HiDream-01-Image Released
Support for HiDream-01-Image has been merged into ComfyUI (thanks to Kijai).
ComfyUI versions of the checkpoints.
https://redd.it/1tapxvf
@rStableDiffusion