Data Science by ODS.ai 🦜
51K subscribers
363 photos
34 videos
7 files
1.52K links
First Telegram Data Science channel. Covering all technical and popular staff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of former. To reach editors contact: @haarrp
Download Telegram
Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

High-quality #speechrecognition systems require large amounts of dataβ€”yet many languages have little data available. Check out new research into an end-to-end system trained as a single model allowing for real-time multilingual speech recognition.

Link: https://ai.googleblog.com/2019/09/large-scale-multilingual-speech.html

#speech #audio #DL #Google
​​Online speech recognition with wav2letter@anywhere

Facebook have open-sourced wav2letter@anywhere, an inference framework for online speech recognition that delivers state-of-the-art performance.

Link: https://ai.facebook.com/blog/online-speech-recognition-with-wav2letteranywhere/

#wav2letter #audiolearning #soundlearning #sound #acoustic #audio #facebook
​​MMS: Scaling Speech Technology to 1000+ languages

Get ready for a breakthrough in speech technology that is set to revolutionize the world of communication! The field, which has so far been restricted to around a hundred languages, barely scratches the surface of the more than 7,000 languages spoken globally. The Massively Multilingual Speech (MMS) project is taking a monumental leap to bridge this gap, increasing the number of supported languages by an astounding 10 to 40 times, depending on the task. This unprecedented expansion will be a game-changer, significantly improving global access to information and creating a more inclusive digital landscape.

This incredible feat is achieved through the creation of a new dataset drawn from publicly available religious texts and the strategic implementation of self-supervised learning. The MMS project's achievements are staggering, including the development of pre-trained wav2vec 2.0 models for 1,406 languages, a single multilingual automatic speech recognition model for 1,107 languages, speech synthesis models for as many languages, and a language identification model for a whopping 4,017 languages. Even more impressive is the significant improvement in accuracy - our multilingual speech recognition model more than halves the word error rate of Whisper on 54 languages of the FLEURS benchmark, despite being trained on a significantly smaller dataset.

Paper link: https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/
Blogpost link: https://ai.facebook.com/blog/multilingual-model-speech-recognition/
Code link: https://github.com/facebookresearch/fairseq/tree/main/examples/mms

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-mms
#deeplearning #speechrecognition #tts #audio
Forwarded from Machinelearning
🌟Qwen2-Audio: ΠžΠ±Ρ‰Π°ΠΉΡ‚Π΅ΡΡŒ с LLM ΠΏΠΎΠΌΠΎΡ‰ΡŒΡŽ голоса.

Qwen2-Audio - Π°ΡƒΠ΄ΠΈΠΎ-языковых модСль, которая способна ΠΏΡ€ΠΈΠ½ΠΈΠΌΠ°Ρ‚ΡŒ Π°ΡƒΠ΄ΠΈΠΎ ΠΈ тСкст Π½Π° Π²Ρ…ΠΎΠ΄ ΠΈ Π³Π΅Π½Π΅Ρ€ΠΈΡ€ΠΎΠ²Π°Ρ‚ΡŒ тСкст Π½Π° Π²Ρ‹Ρ…ΠΎΠ΄Π΅.

ΠŸΡ€Π΅Π΄ΡƒΡΠΌΠΎΡ‚Ρ€Π΅Π½ΠΎ Π΄Π²Π° Ρ€Π΅ΠΆΠΈΠΌΠ° взаимодСйствия:
🟠голосовой Ρ‡Π°Ρ‚: ΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚Π΅Π»ΠΈ ΠΌΠΎΠ³ΡƒΡ‚ ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚ΡŒ голос для ΠΏΠ΅Ρ€Π΅Π΄Π°Ρ‡ΠΈ инструкций ΠΌΠΎΠ΄Π΅Π»ΠΈ Π±Π΅Π· Π±Π΅Π· Π²Π²ΠΎΠ΄Π° тСкста;
πŸŸ Π°ΡƒΠ΄ΠΈΠΎ-Π°Π½Π°Π»ΠΈΠ·: ΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚Π΅Π»ΠΈ ΠΌΠΎΠ³ΡƒΡ‚ ΠΏΡ€Π΅Π΄ΠΎΡΡ‚Π°Π²Π»ΡΡ‚ΡŒ Π°ΡƒΠ΄ΠΈΠΎΠΈΠ½Ρ„ΠΎΡ€ΠΌΠ°Ρ†ΠΈΡŽ (Π²ΠΊΠ»ΡŽΡ‡Π°Ρ Ρ€Π΅Ρ‡ΡŒ, Π·Π²ΡƒΠΊ, ΠΌΡƒΠ·Ρ‹ΠΊΡƒ) ΠΈ тСкстовыС инструкции для Π°Π½Π°Π»ΠΈΠ·Π°.

ОбС ΠΎΠΏΡƒΠ±Π»ΠΈΠΊΠΎΠ²Π°Π½Π½Ρ‹Π΅ ΠΌΠΎΠ΄Π΅Π»ΠΈ ΠΏΠΎΠ΄Π΄Π΅Ρ€ΠΆΠΈΠ²Π°ΡŽΡ‚ 8 языков ΠΈ Π΄ΠΈΠ°Π»Π΅ΠΊΡ‚ΠΎΠ²: китайский, английский, кантонский, французский, ΠΈΡ‚Π°Π»ΡŒΡΠ½ΡΠΊΠΈΠΉ, испанский, Π½Π΅ΠΌΠ΅Ρ†ΠΊΠΈΠΉ ΠΈ японский:

🟒Qwen2-Audio-7B

🟒Qwen2-Audio-7B-Instruct

Π˜Π½Ρ„Π΅Ρ€Π΅Π½Ρ Π½Π° transformers Π² cli Π²ΠΎΠ·ΠΌΠΎΠΆΠ΅Π½ Π² Π½Π΅ΡΠΊΠΎΠ»ΡŒΠΊΠΈΡ… Ρ€Π΅ΠΆΠΈΠΌΠ°Ρ…:

πŸŸ ΠΏΡ€ΠΎΡΡ‚ΠΎΠΉ инфСрСнс ΠΌΠΎΠ΄Π΅Π»ΠΈ Qwen2-Audio;
πŸŸ ΠΏΠ°ΠΊΠ΅Ρ‚Π½Ρ‹ΠΉ инфСрСнс (Π½Π°ΠΏΡ€ΠΈΠΌΠ΅Ρ€, нСсколько тСкстовых запросов ΠΊ Π°ΡƒΠ΄ΠΈΠΎΡ„Π°ΠΉΠ»Ρƒ);
πŸŸ ΠΈΠ½Ρ„Π΅Ρ€Π΅Π½Ρ Π°Π½Π°Π»ΠΈΠ·Π° Π°ΡƒΠ΄ΠΈΠΎ (Π² этом Ρ€Π΅ΠΆΠΈΠΌΠ΅ доступны ΠΈ тСкстовыС ΠΈ Π°ΡƒΠ΄ΠΈΠΎ-инструкции);
πŸŸ ΠΈΠ½Ρ„Π΅Ρ€Π΅Π½Ρ голосового Ρ‡Π°Ρ‚Π°.


β–ΆοΈΠ›ΠΎΠΊΠ°Π»ΡŒΠ½Ρ‹ΠΉ запуск с GradioUI:


# Ensure you have latest Hugging face transformers
pip install git+https://github.com/huggingface/transformers

# to build a web UI demoinstall the following packages
pip install -r requirements_web_demo.txt

# run Gradio web UI
python demo/web_demo_audio.py



πŸ“ŒΠ›ΠΈΡ†Π΅Π½Π·ΠΈΡ€ΠΎΠ²Π°Π½ΠΈΠ΅ : Apache 2.0


πŸŸ‘Π‘Ρ‚Ρ€Π°Π½ΠΈΡ†Π° ΠΏΡ€ΠΎΠ΅ΠΊΡ‚Π°
πŸŸ‘ΠšΠΎΠ»Π»Π΅ΠΊΡ†ΠΈΡ ΠΌΠΎΠ΄Π΅Π»Π΅ΠΉ Π½Π° HF
🟑Arxiv
πŸŸ‘Π‘ΠΎΠΎΠ±Ρ‰Π΅ΡΡ‚Π²ΠΎ Π² Discord
🟑Demo
πŸ–₯Github [ Stars: 618 | Issues: 7 | Forks: 17]

@ai_machinelearning_big_data

#AI #LLM #ML #Qwen2
Please open Telegram to view this post
VIEW IN TELEGRAM
Please open Telegram to view this post
VIEW IN TELEGRAM