Vol Building AGI

New SOTA on TTS from Microsoft Research Asia (outside of ICASSP)

Uses 24 hours (13,100 utterances) of LJSpeech, 200M text sentences for phoneme encoder pretraining, and a G2P model. Trained on 8 V100 GPUs for 3000 epochs.
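For scale, here is a back-of-the-envelope calculation that uses only the figures quoted above. LJSpeech is roughly, not exactly, 24 hours, so treat the results as approximate.

```python
# Rough numbers derived only from the figures in the post:
# ~24 hours of LJSpeech audio, 13,100 utterances, 3000 training epochs.
HOURS = 24
UTTERANCES = 13_100
EPOCHS = 3_000

total_seconds = HOURS * 3600
avg_utt_seconds = total_seconds / UTTERANCES  # mean clip length in seconds
utterance_passes = UTTERANCES * EPOCHS        # clips processed over all epochs
audio_hours_seen = HOURS * EPOCHS             # hours of audio processed over all epochs

print(f"average utterance: {avg_utt_seconds:.1f} s")  # average utterance: 6.6 s
print(f"utterances processed: {utterance_passes:,}")  # utterances processed: 39,300,000
print(f"audio processed: {audio_hours_seen:,} h")     # audio processed: 72,000 h
```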

https://speechresearch.github.io/naturalspeech/

33 views · 11:24

Vol Building AGI

In the meantime, all Interspeech 2021 videos have been made available: https://www.superlectures.com/interspeech2021/tutorials

https://www.youtube.com/channel/UC2-z0HD4WpSbJONj73BgfwQ/videos

33 views · edited 14:23

Vol Building AGI

5297-1.pdf (888.6 KB)

https://www.youtube.com/watch?v=-p_awLZWLeI

https://github.com/facebookresearch/vocoder-benchmark

VocBench from Facebook

Autoregressive vocoders: WaveNet, WaveRNN
GANs: Parallel WaveGAN, MelGAN
Diffusion: WaveGrad, DiffWave

All in one place, with a common input/output interface and a modern codebase from Facebook.
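To illustrate what a common input/output interface across vocoder families buys you, here is a minimal sketch. The `Vocoder` protocol, `SilenceVocoder` and `synthesize_all` are hypothetical names, not VocBench's actual API: every model takes mel frames and returns waveform samples, so a benchmark can run all of them on the same input and check the output length against the hop size.

```python
from typing import Callable, Dict, List, Protocol

Mel = List[List[float]]  # [frames][mel_bins]
Wave = List[float]       # raw audio samples


class Vocoder(Protocol):
    """Hypothetical shared interface: mel frames in, waveform out."""
    hop_length: int  # audio samples generated per mel frame

    def __call__(self, mel: Mel) -> Wave: ...


class SilenceVocoder:
    """Toy placeholder that returns silence. A real entry would wrap a trained
    WaveNet, WaveRNN, Parallel WaveGAN, MelGAN, WaveGrad or DiffWave model."""

    def __init__(self, hop_length: int = 256) -> None:
        self.hop_length = hop_length

    def __call__(self, mel: Mel) -> Wave:
        return [0.0] * (len(mel) * self.hop_length)


def synthesize_all(mel: Mel, registry: Dict[str, Callable[[], Vocoder]]) -> Dict[str, Wave]:
    """Run every registered vocoder on the same input and verify output length."""
    outputs: Dict[str, Wave] = {}
    for name, factory in registry.items():
        vocoder = factory()
        wave = vocoder(mel)
        expected = len(mel) * vocoder.hop_length
        if len(wave) != expected:
            raise ValueError(f"{name}: got {len(wave)} samples, expected {expected}")
        outputs[name] = wave
    return outputs


registry: Dict[str, Callable[[], Vocoder]] = {
    "toy-hop256": SilenceVocoder,
    "toy-hop300": lambda: SilenceVocoder(hop_length=300),
}
mel = [[0.0] * 80 for _ in range(10)]  # 10 frames x 80 mel bins
waves = synthesize_all(mel, registry)
print({name: len(w) for name, w in waves.items()})  # {'toy-hop256': 2560, 'toy-hop300': 3000}
```

The design choice is that the benchmark only depends on the protocol, so swapping an autoregressive model for a GAN or diffusion one does not change the evaluation loop.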

Might be useful for VC (voice conversion) if it’s easy to condition those vocoders on custom features.
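Conditioning a vocoder on custom features mostly comes down to matching time resolutions. Below is a minimal sketch under stated assumptions: the custom features (for example VC bottleneck features) arrive at a hop of 160 samples, the vocoder expects a hop of 256 at the same sample rate, and both numbers plus the helper names `resample_frames` and `upsample_to_samples` are hypothetical. Nearest-neighbour resampling aligns the frame rates, and repeating each frame `hop` times gives one conditioning vector per output sample, the simplest form of local conditioning. Many vocoders (MelGAN, for instance) use learned upsampling instead.

```python
from typing import List

Frames = List[List[float]]  # [frames][feature_dim]


def resample_frames(feats: Frames, src_hop: int, dst_hop: int) -> Frames:
    """Nearest-neighbour resampling of a frame sequence from one hop size to
    another (both hops in samples, same sample rate)."""
    n_samples = len(feats) * src_hop
    out: Frames = []
    for i in range(n_samples // dst_hop):
        center = i * dst_hop + dst_hop // 2           # sample at the centre of output frame i
        src = min(center // src_hop, len(feats) - 1)  # source frame covering that sample
        out.append(list(feats[src]))
    return out


def upsample_to_samples(feats: Frames, hop: int) -> Frames:
    """Repeat each frame `hop` times so there is one conditioning vector
    per output audio sample."""
    return [list(frame) for frame in feats for _ in range(hop)]


# Hypothetical example: 8 frames of 1-D custom features at hop 160,
# fed to a vocoder that expects hop 256 at the same sample rate.
custom = [[float(k)] for k in range(8)]
aligned = resample_frames(custom, src_hop=160, dst_hop=256)
print(aligned)  # [[0.0], [2.0], [4.0], [5.0], [7.0]]
per_sample = upsample_to_samples(aligned, hop=256)
print(len(per_sample))  # 1280
```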

36 views · edited 14:53

Vol Building AGI

Neural HMM: learns alignments fast

https://shivammehta007.github.io/Neural-HMM/

It promises to converge with 500 utterances, but I couldn’t get it to work with only that much data. I think it should work with 2k utterances.
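The fast, stable alignment comes from replacing attention with a left-to-right, no-skip HMM: each state either stays or advances by one per frame, so every alignment is monotonic. Here is a minimal pure-Python sketch of the two core computations, assuming fixed per-state stay probabilities and given emission log-likelihoods. In the actual model both come from neural networks, so this is a simplification and not the authors' code. The forward algorithm gives the training likelihood summed over all alignments, and Viterbi gives the single best alignment as per-state durations.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b)) that handles -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forward_log_likelihood(log_emit: List[List[float]], stay: List[float]) -> float:
    """log p(x_1..T), summed over every monotonic alignment that starts in state 0
    and ends in the last state. log_emit[t][j] = log p(x_t | state j);
    stay[j] in (0, 1) is the probability of staying in state j for one more frame."""
    T, N = len(log_emit), len(stay)
    alpha = [NEG_INF] * N
    alpha[0] = log_emit[0][0]
    for t in range(1, T):
        new = [NEG_INF] * N
        for j in range(N):
            score = alpha[j] + math.log(stay[j])  # stay in state j
            if j > 0:                             # or advance from state j - 1
                score = logaddexp(score, alpha[j - 1] + math.log(1.0 - stay[j - 1]))
            new[j] = score + log_emit[t][j]
        alpha = new
    return alpha[N - 1]


def viterbi_durations(log_emit: List[List[float]], stay: List[float]) -> Tuple[float, List[int]]:
    """Best single alignment: returns its log score and the frames spent in each state."""
    T, N = len(log_emit), len(stay)
    if T < N:
        raise ValueError("need at least one frame per state")
    delta = [NEG_INF] * N
    delta[0] = log_emit[0][0]
    back: List[List[int]] = []  # back[t - 1][j] = best predecessor of state j at frame t
    for t in range(1, T):
        new, ptr = [NEG_INF] * N, [0] * N
        for j in range(N):
            best, arg = delta[j] + math.log(stay[j]), j
            if j > 0:
                adv = delta[j - 1] + math.log(1.0 - stay[j - 1])
                if adv > best:
                    best, arg = adv, j - 1
            new[j], ptr[j] = best + log_emit[t][j], arg
        delta = new
        back.append(ptr)
    path = [N - 1]  # trace back from the final state
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[N - 1], [path.count(j) for j in range(N)]


# Toy input: 3 states, 6 frames; frame t is best explained by state pattern[t].
high, low = math.log(0.8), math.log(0.1)
pattern = [0, 0, 1, 1, 2, 2]
log_emit = [[high if j == s else low for j in range(3)] for s in pattern]
stay = [0.5, 0.5, 0.5]
print(forward_log_likelihood(log_emit, stay))
score, durations = viterbi_durations(log_emit, stay)
print(durations)  # [2, 2, 2]
```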

36 views · 15:20
