Ex_Files_GPT_Building_AI_Apps.zip
114.9 KB
🗣 Dia is a new open-source text-to-speech model from Nari Labs with 1.6 billion parameters that can generate rich, full-fledged dialogue.
Key features:
- Ultra-realistic dialogue. Generates coordinated lines for two speakers, tagged [S1] and [S2], within a single text.
- Emotions and tone. You can set the tone and intonation with an audio prompt, and also control non-verbal sounds: laughter, coughing, sighs, etc.
- Voice cloning. Clones a voice from a short sample: upload audio and its transcript, and the model adapts to that timbre.
GitHub: the code is 100% Python and uses PyTorch 2.0 and CUDA 12.6.
Performance and requirements:
The full version requires ≈10 GB of VRAM; quantization of the model is planned.
Installation and launch:
pip install git+https://github.com/nari-labs/dia.git
git clone https://github.com/nari-labs/dia.git
cd dia
uv run app.py # or python app.py
The Gradio interface immediately shows the difference from ElevenLabs and Sesame CSM-1B.
License: Apache 2.0.
Dia is great for ML research in TTS: you get open weight files, a flexible scripting API, and a UI for quickly testing hypotheses.
Dia currently only supports speech generation in English.
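As a rough illustration of the [S1]/[S2] tagging described in the key features, here is a minimal sketch that assembles a two-speaker script as a single string. The helper name `build_dialogue_script` is hypothetical and not part of Dia, and the `(laughs)` marker is an assumed spelling for a non-verbal cue; check the repository for the tags the model actually recognizes.

```python
from typing import Iterable, Tuple


def build_dialogue_script(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker, line) pairs into one text tagged with [S1] / [S2].

    Hypothetical helper for illustration only; it is not part of Dia.
    """
    parts = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            raise ValueError(f"only two speakers are supported, got {speaker}")
        text = " ".join(line.split())  # collapse stray whitespace and newlines
        if not text:
            continue  # skip empty lines rather than emit a bare tag
        parts.append(f"[S{speaker}] {text}")
    return " ".join(parts)


script = build_dialogue_script([
    (1, "Have you tried the new model?"),
    (2, "Not yet. (laughs) Is it any good?"),  # "(laughs)" is an assumed cue format
])
print(script)
```

The resulting string is the kind of single tagged text the post describes; passing it to the model is left out here because that call depends on the library version you install.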
▪️ Demo
▪️ GitHub
▪️ HF
Prompt share: Plush fruits
Prompt:
Soft and plush 3D model of a [subject] with a [key detail], rendered in a cute, stylized aesthetic. The texture is velvety and squeezable, emphasizing the charm of animated [object type] designs. Clean background, centered composition
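If you want to reuse the template for several subjects, the bracketed placeholders can be filled programmatically. The sketch below is one possible approach; the function name `fill_prompt` and the example values (strawberry, tiny stitched smile, fruit) are illustrative assumptions, not part of the original prompt.

```python
import re

PLUSH_TEMPLATE = (
    "Soft and plush 3D model of a [subject] with a [key detail], rendered in a "
    "cute, stylized aesthetic. The texture is velvety and squeezable, emphasizing "
    "the charm of animated [object type] designs. Clean background, centered composition"
)


def fill_prompt(template: str, values: dict) -> str:
    """Replace every [placeholder] in the template with its value from `values`."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder [{key}]")
        return values[key]

    return re.sub(r"\[([^\]]+)\]", substitute, template)


# Example values are made up for illustration.
prompt = fill_prompt(PLUSH_TEMPLATE, {
    "subject": "strawberry",
    "key detail": "tiny stitched smile",
    "object type": "fruit",
})
print(prompt)
```

Raising on a missing placeholder avoids sending a half-filled prompt with literal brackets to the image model.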
From prompt engineering to AI agents and automation, these are the skills that separate AI users from AI builders.
Start learning now to stay ahead.
🧠 7 AI Skills You Must Have in 2026