New SOTA on TTS from Microsoft Research Asia (outside of ICASSP)
Uses 24 hours (13,100 utterances) of LJSpeech, plus 200M text sentences for phoneme encoder pretraining and a G2P (grapheme-to-phoneme) model. Trained on 8 V100 GPUs for 3000 epochs.
https://speechresearch.github.io/naturalspeech/
In the meantime, all Interspeech 2021 videos have been made available: https://www.superlectures.com/interspeech2021/tutorials
https://www.youtube.com/channel/UC2-z0HD4WpSbJONj73BgfwQ/videos