Speech Technology
New Mandarin TTS dataset

https://www.openslr.org/138/

SHALCAS22A
Identifier: SLR138

Summary: A Chinese Mandarin corpus by Shanghai Acoustics Laboratory, CAS and Wuxi Sandu Intelligent Technology Co., Ltd.

Category: Speech

License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Downloads (use a mirror closer to you):
SHALCAS22A.tgz [3.9G] ( Corpus ) Mirrors: [US] [EU] [CN]


About this resource:

SHALCAS22A is a 1-channel Chinese Mandarin speech corpus by Shanghai Acoustics Laboratory, CAS and Wuxi Sandu Intelligent Technology Co., Ltd. It was recorded with a Hi-Fi microphone in a quiet environment. The corpus contains 14,580 utterances from 60 speakers, with 243 utterances per speaker.
The contents include number passwords, short Chinese words, and long Chinese sentences. The mapping between the content and utterance is given in content.txt.

This corpus can be used in text-dependent speaker verification on number passwords, text-independent speaker verification on short utterances, and other speech-related fields. Please cite the corpus as "SHALCAS22A, a free Chinese Mandarin corpus by Shanghai Acoustics Laboratory, CAS and Wuxi Sandu Intelligent Technology Co., Ltd., 2022".

Contact: Feng Hong, hongfeng@mail.ioa.ac.cn
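The stated counts (60 speakers × 243 utterances = 14,580) can be sanity-checked on a downloaded copy. A minimal sketch, assuming one subdirectory per speaker holding that speaker's WAV files — the actual layout and the content.txt mapping format should be checked against the release:

```python
from pathlib import Path

def expected_total(speakers: int = 60, utts_per_speaker: int = 243) -> int:
    """Total utterance count the corpus claims: 60 x 243 = 14,580."""
    return speakers * utts_per_speaker

def check_corpus(root: str, speakers: int = 60, utts_per_speaker: int = 243):
    """Return (is_consistent, per-speaker WAV counts) for a local copy.

    Assumes a hypothetical layout of one directory per speaker; adjust
    the glob pattern to the real corpus structure.
    """
    counts = {d.name: len(list(d.glob("*.wav")))
              for d in Path(root).iterdir() if d.is_dir()}
    ok = (len(counts) == speakers
          and all(c == utts_per_speaker for c in counts.values()))
    return ok, counts
```

If the check fails, comparing `counts` against content.txt is the natural next step.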
Open Preview for #ICASSP2023 is now available on @IEEEXplore! Available through June 10, you can browse all papers accepted to ICASSP 2023 free of charge. Browse research here: https://hubs.la/Q01N_PdX0
EfficientSpeech, or ES for short, is an efficient neural text-to-speech (TTS) model. It generates mel spectrograms at an mRTF of 104, i.e. 104 seconds of speech per second of computation, on a Raspberry Pi 4. Its tiny version has a footprint of just 266k parameters, and generating 6 seconds of speech consumes only 90 MFLOPS.

https://github.com/roatienza/efficientspeech

https://roatienza.github.io/efficientspeech-demo/
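The mRTF figure above is just generated audio duration divided by wall-clock synthesis time. A small sketch of how such a number can be measured — `measure_mrtf` and the `synthesize` callable interface are illustrative assumptions, not EfficientSpeech's API:

```python
import time

def mrtf(audio_seconds: float, synth_seconds: float) -> float:
    """mRTF: seconds of speech produced per second of synthesis time.
    EfficientSpeech reports mRTF ~104 on an RPi4."""
    return audio_seconds / synth_seconds

def measure_mrtf(synthesize, text: str, audio_seconds: float) -> float:
    """Time an arbitrary `synthesize` callable (hypothetical interface)
    and report its mRTF for the produced `audio_seconds` of speech."""
    start = time.perf_counter()
    synthesize(text)
    return mrtf(audio_seconds, time.perf_counter() - start)
```

For example, 6 seconds of audio synthesized in 0.05 s corresponds to an mRTF of 120.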
May 12, 2023: Challenge announcement
May 19, 2023: Leaderboard is online and accepting submissions
June 26, 2023: New Language Track Submission Deadline
July 07, 2023: Paper / Model Submission Deadline
July 10, 2023: Paper Revision Deadline

🌍🗣️SUPERB benchmark is back with ML-SUPERB, its multilingual version! The challenge, as one of the #ASRU2023 challenges, includes 3 tracks:
1️⃣ML-SUPERB: For multilingual SSL
2️⃣New language: For extending to new languages!
3️⃣Research: For research papers

More to see 👉 https://multilingual.superbbenchmark.org
Universal Source Separation with Weakly Labelled Data

abs: https://arxiv.org/abs/2305.07447
paper page: https://huggingface.co/papers/2305.07447
github: https://github.com/bytedance/uss
The first Arabic TTS Challenge - QASR TTS 1.0 is on!! Register and build your own Arabic Anchor Voice and contribute to enriching #ArabicAI #ASRU2023Challenge
More details: https://arabicspeech.org/qasr-challenge/

https://twitter.com/shammur_absar/status/1658429029483986944
Recent advances in the AudioLM family: 100x higher speed, better consistency, no quality hit - a new paper from the AudioLM team.

Give it a listen: https://google-research.github.io/seanet/soundstorm/examples/

arXiv:
https://arxiv.org/abs/2305.09636
Final VoxCeleb Challenge

https://mm.kaist.ac.kr/datasets/voxceleb/voxsrc/competition2023.html

Timeline
May 20th Development set for verification tracks released.
May 31st Development set for diarisation tracks released.
June 1st Test set released and evaluation server open.
Early August Deadline for submission of results; invitation to workshop speakers.
August 20th Challenge workshop
https://twitter.com/csteinmetz1/status/1659458441197355008

I was complaining that LLMs don't have ears... This paper is a solid attempt to try to make that happen.

abs: https://arxiv.org/abs/2305.10790
Work from Yuan Gong et al. at MIT