Neural HMM: learns alignments fast
https://shivammehta007.github.io/Neural-HMM/
Promises to converge with 500 utterances; I couldn't get it to work with that little data. I think it should work with 2k utterances.
Prosody annotations for Switchboard: https://groups.inf.ed.ac.uk/switchboard/index.html
Vol Building AGI
Neural Text to Speech Synthesis Tutorial
https://github.com/tts-tutorial/icassp2022
Survey paper: https://arxiv.org/abs/2106.15561
Convolutional Pitch Tracker (ICASSP 2018)
https://marl.github.io/crepe/
PyTorch port with lots of usage details: https://github.com/maxrmorrison/torchcrepe
Transformer-based sprocket successor, uses TTS pretraining. Available as egs/arctic/vc1 in ESPnet. Sounds much worse than sprocket.
http://www.kecl.ntt.co.jp/people/kameoka.hirokazu/Demos/vtn/index.html
Ephraim1985_Speech_enhancement_using_a_minimum_mean_square_error.pdf
311.1 KB
Dealing with residual vocoder noise:
LogMMSE Speech Enhancement and Noise Reduction
https://github.com/rajivpoddar/logmmse
from logmmse import logmmse
y_enh = logmmse(y, sr, output_file=None, initial_noise=1, window_size=160, noise_threshold=0.15)
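For intuition about what log-MMSE does per frequency bin, here is a minimal sketch of the gain function as it is commonly stated from the Ephraim and Malah paper attached above: G = xi/(1+xi) * exp(0.5 * E1(v)) with v = xi*gamma/(1+xi), where xi is the a priori SNR and gamma the a posteriori SNR. This is not the code of the logmmse package; the names `exp1` and `logmmse_gain` are made up for illustration, and noise estimation, framing and the decision-directed SNR update are left out.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x):
    """Exponential integral E1(x) = integral from x to infinity of exp(-t)/t dt, for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 40):
            term *= -x / k          # term is now (-x)^k / k!
            total += term / k
        return -EULER_GAMMA - math.log(x) - total
    # Continued fraction, evaluated from the tail backwards with a fixed depth:
    # E1(x) = exp(-x) / (x + 1 - 1/(x + 3 - 4/(x + 5 - 9/(x + 7 - ...))))
    t = 0.0
    for k in range(60, 0, -1):
        t = k * k / (x + 2 * k + 1 - t)
    return math.exp(-x) / (x + 1 - t)


def logmmse_gain(xi, gamma):
    """Spectral gain for one bin: xi = a priori SNR, gamma = a posteriori SNR (linear, > 0)."""
    wiener = xi / (1.0 + xi)
    v = wiener * gamma
    return wiener * math.exp(0.5 * exp1(v))


if __name__ == "__main__":
    for xi, gamma in [(0.1, 1.0), (1.0, 1.0), (10.0, 10.0)]:
        print(xi, gamma, round(logmmse_gain(xi, gamma), 4))
```

At high SNR the exponential term goes to 1 and the gain approaches the Wiener gain xi/(1+xi); at low SNR the bin is attenuated, which is what suppresses a steady residual noise floor.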
The StarGANv2-VC authors mentioned this method as the one achieving the highest MOS on VCC-2020 🤯
https://github.com/yl4579/StarGANv2-VC
I need to take a closer look at VTN
VTN is T23. T10 is an ASR and prosody encoder fed into a speaker-dependent TTS, which is fed into a WaveNet with single-Gaussian outputs. The alternative T10 system was an autoregressive LSTM that converted PPGs into mel spectrograms and was used for two male-male parallel speakers.
https://prml-lab-speech-team.github.io/demo/FreGAN2/
A vocoder that uses the discrete wavelet transform (DWT) in the discriminator and has a progressive generator structure, similar to StyleGAN2, that produces the arguments of the inverse DWT (iDWT).
https://github.com/prml-lab-speech-team/demo/tree/master/FreGAN2/code
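To make the DWT/iDWT part concrete: a one-level DWT splits a signal into a low and a high subband at half the sample rate, and the iDWT puts them back together without loss. Below is a minimal sketch using the Haar wavelet. Haar is an assumption picked because it is the simplest case, not necessarily the wavelet Fre-GAN 2 uses, and the function names are made up for illustration.

```python
import math


def haar_dwt(x):
    """One-level Haar DWT: returns (low, high) subbands, each half the input length."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(low, high):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out


if __name__ == "__main__":
    signal = [1.0, 3.0, 2.0, 6.0]
    low, high = haar_dwt(signal)
    print(low, high)
    print(haar_idwt(low, high))  # matches the input up to float rounding
```

Because the transform is invertible, a generator that outputs subbands and then applies the iDWT still produces a full-rate waveform, and a discriminator that sees subbands loses no information compared with one that sees plainly downsampled audio.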
ICLR 2022
HiFi-GAN + chunked autoregression trains faster and keeps track of pitch better
https://github.com/descriptinc/cargan
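The chunked autoregression idea can be sketched as a loop: audio is generated one chunk at a time, and each chunk is conditioned on the tail of what has already been generated. The code below is only a toy under stated assumptions. `toy_generator` is a hypothetical stand-in for the GAN generator, it emits one sample per feature value (a real vocoder emits many samples per mel frame), and the chunk and context sizes are arbitrary.

```python
def toy_generator(features, context):
    """Hypothetical stand-in for a generator: one output sample per feature value,
    shifted by half the mean of the previously generated context."""
    bias = sum(context) / len(context)
    return [0.5 * bias + f for f in features]


def generate_chunked(features, chunk_size, context_size, generator=toy_generator):
    """Generate chunk by chunk, feeding the last `context_size` samples back in."""
    if chunk_size < 1 or context_size < 1:
        raise ValueError("chunk_size and context_size must be positive")
    audio = [0.0] * context_size  # zero context before the first chunk
    for start in range(0, len(features), chunk_size):
        chunk_features = features[start:start + chunk_size]
        context = audio[-context_size:]  # most recent generated samples
        audio.extend(generator(chunk_features, context))
    return audio[context_size:]  # drop the artificial zero context


if __name__ == "__main__":
    print(generate_chunked([1.0] * 6, chunk_size=2, context_size=2))
```

Within a chunk all samples are produced in parallel, so generation stays much cheaper than sample-by-sample autoregression, while the context carried across chunk boundaries gives the model a way to keep phase and pitch continuous.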