Speech Technology
New Mandarin TTS dataset

https://www.openslr.org/138/

SHALCAS22A
Identifier: SLR138

Summary: A Chinese Mandarin corpus by Shanghai Acoustics Laboratory, CAS and Wuxi Sandu Intelligent Technology Co., Ltd.

Category: Speech

License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

Downloads (use a mirror closer to you):
SHALCAS22A.tgz [3.9G] ( Corpus ) Mirrors: [US] [EU] [CN]


About this resource:

SHALCAS22A is a 1-channel Chinese Mandarin speech corpus by Shanghai Acoustics Laboratory, CAS and Wuxi Sandu Intelligent Technology Co., Ltd. It was recorded with a Hi-Fi microphone in a quiet environment. The corpus contains 14,580 utterances from 60 speakers, with 243 utterances per speaker.
The contents include number passwords, short Chinese words, and long Chinese sentences. The mapping between the content and utterance is given in content.txt.

This corpus can be used in text-dependent speaker verification on number passwords, text-independent speaker verification on short utterances, and other speech-related fields. Please cite the corpus as "SHALCAS22A, a free Chinese Mandarin corpus by Shanghai Acoustics Laboratory, CAS and Wuxi Sandu Intelligent Technology Co., Ltd., 2022".

Contact: Feng Hong, hongfeng@mail.ioa.ac.cn
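The stated counts (60 speakers × 243 utterances = 14,580) can be sanity-checked on a downloaded copy. A minimal sketch, assuming one subdirectory per speaker holding that speaker's WAV files — the actual layout and the content.txt mapping format should be checked against the release:

```python
from pathlib import Path

def expected_total(speakers: int = 60, utts_per_speaker: int = 243) -> int:
    """Total utterance count the corpus claims: 60 x 243 = 14,580."""
    return speakers * utts_per_speaker

def check_corpus(root: str, speakers: int = 60, utts_per_speaker: int = 243):
    """Return (is_consistent, per-speaker WAV counts) for a local copy.

    Assumes a hypothetical layout of one directory per speaker; adjust
    the glob pattern to the real corpus structure.
    """
    counts = {d.name: len(list(d.glob("*.wav")))
              for d in Path(root).iterdir() if d.is_dir()}
    ok = (len(counts) == speakers
          and all(c == utts_per_speaker for c in counts.values()))
    return ok, counts
```

If the check fails, comparing `counts` against content.txt is the natural next step.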
Open Preview for #ICASSP2023 is now available on @IEEEXplore! Available through June 10, you can browse all papers accepted to ICASSP 2023 free of charge. Browse research here: https://hubs.la/Q01N_PdX0
EfficientSpeech, or ES for short, is an efficient neural text-to-speech (TTS) model. It generates mel spectrograms at an mRTF of 104, i.e. 104 seconds of speech per second of computation, on a Raspberry Pi 4. Its tiny version has a footprint of just 266k parameters, and generating 6 seconds of speech consumes only 90 MFLOPS.

https://github.com/roatienza/efficientspeech

https://roatienza.github.io/efficientspeech-demo/
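The mRTF figure above is just generated audio duration divided by wall-clock synthesis time. A small sketch of how such a number can be measured — `measure_mrtf` and the `synthesize` callable interface are illustrative assumptions, not EfficientSpeech's API:

```python
import time

def mrtf(audio_seconds: float, synth_seconds: float) -> float:
    """mRTF: seconds of speech produced per second of synthesis time.
    EfficientSpeech reports mRTF ~104 on an RPi4."""
    return audio_seconds / synth_seconds

def measure_mrtf(synthesize, text: str, audio_seconds: float) -> float:
    """Time an arbitrary `synthesize` callable (hypothetical interface)
    and report its mRTF for the produced `audio_seconds` of speech."""
    start = time.perf_counter()
    synthesize(text)
    return mrtf(audio_seconds, time.perf_counter() - start)
```

For example, 6 seconds of audio synthesized in 0.05 s corresponds to an mRTF of 120.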
May 12, 2023: Challenge announcement
May 19, 2023: Leaderboard is online and accepting submissions
June 26, 2023: New Language Track Submission Deadline
July 07, 2023: Paper / Model Submission Deadline
July 10, 2023: Paper Revision Deadline

🌍🗣️SUPERB benchmark is back with ML-SUPERB, its multilingual version! The challenge, as one of the #ASRU2023 challenges, includes 3 tracks:
1️⃣ML-SUPERB: For multilingual SSL
2️⃣New language: For extending to new languages!
3️⃣Research: For research papers

More to see 👉 https://multilingual.superbbenchmark.org
Universal Source Separation with Weakly Labelled Data

abs: https://arxiv.org/abs/2305.07447
paper page: https://huggingface.co/papers/2305.07447
github: https://github.com/bytedance/uss
The first Arabic TTS Challenge - QASR TTS 1.0 is on!! Register and build your own Arabic Anchor Voice and contribute to enriching #ArabicAI #ASRU2023Challenge
More details: https://arabicspeech.org/qasr-challenge/

https://twitter.com/shammur_absar/status/1658429029483986944
Recent advances in the AudioLM family: 100x higher speed, better consistency, no quality hit - a new paper from the AudioLM team.

Give it a listen: https://google-research.github.io/seanet/soundstorm/examples/

arXiv:
https://arxiv.org/abs/2305.09636
Final VoxCeleb Challenge

https://mm.kaist.ac.kr/datasets/voxceleb/voxsrc/competition2023.html

Timeline
May 20th Development set for verification tracks released.
May 31st Development set for diarisation tracks released.
June 1st Test set released and evaluation server open.
Early August Deadline for submission of results; invitation to workshop speakers.
August 20th Challenge workshop
https://twitter.com/csteinmetz1/status/1659458441197355008

I was complaining that LLMs don't have ears... This paper is a solid attempt to try to make that happen.

abs: https://arxiv.org/abs/2305.10790
Work from Yuan Gong et al. at MIT