New SOTA on TTS from Microsoft Research Asia (outside of ICASSP)
Uses 24 hours (13100 utterances) from LJSpeech, 200M text sentences for phoneme encoder pretraining and a g2p model. 8 V100 GPUs. 3000 epochs.
https://speechresearch.github.io/naturalspeech/
In the meantime, all Interspeech 2021 videos have been made available: https://www.superlectures.com/interspeech2021/tutorials
https://www.youtube.com/channel/UC2-z0HD4WpSbJONj73BgfwQ/videos
https://www.youtube.com/watch?v=-p_awLZWLeI
https://github.com/facebookresearch/vocoder-benchmark
VocBench from Facebook
Autoregressive vocoders: WaveNet, WaveRNN
GANs: Parallel WaveGAN, MelGAN
Diffusion: WaveGrad, DiffWave
All in one place, with a common input/output interface and a modern codebase from Facebook.
Might be useful for VC if it's easy to condition those vocoders on custom features.
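To make the "common input/output interface" point concrete, here is a minimal sketch of what such an interface could look like. Everything in it (`Vocoder`, `generate`, `hop_length`, `SilenceVocoder`, `synthesize`) is hypothetical and is not VocBench's actual API; it only assumes the usual contract that a vocoder maps frame-level features to `num_frames * hop_length` audio samples. If the real interface accepts arbitrary feature dimensions like this, swapping mels for custom VC features would be straightforward.

```python
from typing import List, Protocol

# Hypothetical interface for illustration only: NOT VocBench's real API.
Features = List[List[float]]  # [num_frames][feature_dim], e.g. 80 mel bins


class Vocoder(Protocol):
    hop_length: int  # audio samples produced per conditioning frame

    def generate(self, features: Features) -> List[float]:
        """Turn frame-level features into a waveform."""
        ...


class SilenceVocoder:
    """Toy stand-in that emits silence of the correct length."""

    def __init__(self, hop_length: int = 256) -> None:
        self.hop_length = hop_length

    def generate(self, features: Features) -> List[float]:
        return [0.0] * (len(features) * self.hop_length)


def synthesize(vocoder: Vocoder, features: Features) -> List[float]:
    # Any object satisfying the protocol can be swapped in here, which is
    # what makes side-by-side benchmarking (or custom VC features) easy.
    if features and any(len(f) != len(features[0]) for f in features):
        raise ValueError("all frames must have the same feature dimension")
    return vocoder.generate(features)
```

For example, `synthesize(SilenceVocoder(hop_length=256), [[0.0] * 80] * 10)` returns 2560 samples: 10 frames times a hop of 256.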
Neural HMM: learns alignments fast
https://shivammehta007.github.io/Neural-HMM/
Claims to converge with 500 utterances; I couldn't get it to work with that little data. I think it should with 2k utterances.
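The alignment learning comes from summing over all monotonic alignments with the HMM forward algorithm. Below is a minimal sketch of that forward pass for a left-to-right, no-skip HMM (one state per input symbol; each step either stays or advances by one), which is my understanding of the Neural HMM setup. Assumptions: emissions arrive as precomputed log-probabilities (in the paper a network produces them), stay probabilities are strictly between 0 and 1, and there is no extra exit transition from the last state. The names `forward_log_likelihood` and `stay_prob` are hypothetical.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forward_log_likelihood(
    log_emit: Sequence[Sequence[float]],  # [T][N]: log p(x_t | state n)
    stay_prob: Sequence[float],           # [N]: P(stay in state n), 0 < p < 1
) -> float:
    """Log-likelihood of a left-to-right, no-skip HMM (hypothetical sketch).

    Paths start in state 0 at t=0 and must end in state N-1 at t=T-1.
    Summing over every such path is what lets the model learn the
    text-to-frame alignment without an external aligner.
    """
    T, N = len(log_emit), len(stay_prob)
    log_stay = [math.log(p) for p in stay_prob]
    log_next = [math.log1p(-p) for p in stay_prob]

    # alpha[n] = log p(x_0..x_t, state_t = n)
    alpha: List[float] = [NEG_INF] * N
    alpha[0] = log_emit[0][0]
    for t in range(1, T):
        new = [NEG_INF] * N
        for n in range(N):
            acc = alpha[n] + log_stay[n]  # stayed in state n
            if n > 0:  # or advanced from state n-1
                acc = logaddexp(acc, alpha[n - 1] + log_next[n - 1])
            new[n] = acc + log_emit[t][n]
        alpha = new
    return alpha[N - 1]
```

With 3 frames, 2 states, uniform emissions and a stay probability of 0.5, there are two valid paths (0→0→1 and 0→1→1), each with probability 0.25, so the result is log(0.5). If there are fewer frames than states, no path exists and the result is -inf.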
Prosody annotations for Switchboard: https://groups.inf.ed.ac.uk/switchboard/index.html
Neural Text to Speech Synthesis Tutorial
https://github.com/tts-tutorial/icassp2022
Survey paper: https://arxiv.org/abs/2106.15561
Convolutional Pitch Tracker (ICASSP 2018)
https://marl.github.io/crepe/
PyTorch port with lots of usage details: https://github.com/maxrmorrison/torchcrepe
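CREPE outputs, per frame, a 360-way activation over pitch bins, and a pitch in Hz is read out as a weighted average of bin pitches near the peak. Here is a minimal stdlib sketch of that decoding step. The bin layout (360 bins, 20 cents apart, offset 1997.3794 cents relative to 10 Hz) and the ±4-bin window follow the original crepe code as I recall it; treat those constants as an assumption to check against the repository. The function names are hypothetical.

```python
import math
from typing import Sequence

# Assumed bin layout: 360 bins, 20 cents apart, cents measured relative to 10 Hz.
N_BINS = 360
CENTS_OFFSET = 1997.3794084376191
CENTS_MAPPING = [CENTS_OFFSET + 20.0 * i for i in range(N_BINS)]


def cents_to_hz(cents: float) -> float:
    """Convert cents above 10 Hz back to a frequency in Hz."""
    return 10.0 * 2.0 ** (cents / 1200.0)


def local_average_cents(activation: Sequence[float], radius: int = 4) -> float:
    """Activation-weighted average of bin cents in a window around the argmax."""
    if len(activation) != N_BINS:
        raise ValueError(f"expected {N_BINS} activations, got {len(activation)}")
    center = max(range(N_BINS), key=lambda i: activation[i])
    lo, hi = max(0, center - radius), min(N_BINS, center + radius + 1)
    weights = list(activation[lo:hi])
    total = sum(weights)
    if total <= 0:
        raise ValueError("activation window sums to zero")
    return sum(w * c for w, c in zip(weights, CENTS_MAPPING[lo:hi])) / total


def activation_to_hz(activation: Sequence[float]) -> float:
    """Decode one frame of (hypothetical) CREPE activations to Hz."""
    return cents_to_hz(local_average_cents(activation))
```

Averaging around the peak instead of taking the raw argmax gives sub-bin resolution: equal activation on two neighbouring bins decodes to the pitch halfway between them (10 cents above the lower bin).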