Vol Building AGI
580 subscribers
116 photos
9 videos
12 files
199 links
Past topics: speech synthesis, transformers, LSTM, recurrence
Download Telegram
5297-1.pdf
888.6 KB
https://www.youtube.com/watch?v=-p_awLZWLeI

https://github.com/facebookresearch/vocoder-benchmark

VocBench from Facebook

Autoregressive vocoders: WaveNet, WaveRNN
GANs: Parallel WaveGAN, MelGAN
Diffusion: WaveGrad, DiffWave

All in one place with a common input-output interface with modern codebase from Facebook.

Might be useful for VC if it’s easy to make condition those vocoders using custom features.
Neural HMM: learns alignments fast

https://shivammehta007.github.io/Neural-HMM/

Promises to converge with 500 utterances, i couldn’t get it to work with that much data. I think with 2k utterances it should.
Prosody annotations for Switchboard: https://groups.inf.ed.ac.uk/switchboard/index.html
Convolutional Pitch Tracker (ICASSP 2018)

https://marl.github.io/crepe/

PyTorch port with lots of usage details: https://github.com/maxrmorrison/torchcrepe
Transformer-based sprocket successor, uses TTS pretraining. Available as egs/arctic/vc1 in ESPnet. Sounds much worse than sprocket.


http://www.kecl.ntt.co.jp/people/kameoka.hirokazu/Demos/vtn/index.html
Ephraim1985_Speech_enhancement_using_a_minimum_mean_square_error.pdf
311.1 KB
Dealing with residual vocoder noise:

LogMMSE Speech Enhancement and Noise Reduction

https://github.com/rajivpoddar/logmmse



y_enh = logmmse(y, sr, output_file=None, initial_noise=1, window_size=160, noise_threshold=0.15)
Vol Building AGI
StarGANv2-VC authors mentioned this method as one achieving highest MOS on VCC-2020 🤯 https://github.com/yl4579/StarGANv2-VC I need to take a closer look at VTN
VTN is T23,

T10 is ASR and prosody encoder fed into speaker-dependent TTS fed into WaveNet with single Gaussian outputs. The alternative system of T10 was an autoregressive LSTM that converted PPG into melspc and was used for two male-male parallel speakers.
On AMP and HiFi-GAN: may need to remove the bias from convolution