Vol Building AGI

Neural HMM: learns alignments fast

https://shivammehta007.github.io/Neural-HMM/

It promises convergence with 500 utterances, but I couldn't get it to work with that little data. I think it should work with 2k utterances.
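
As I understand the paper, Neural HMM replaces attention with a left-to-right, no-skip HMM. A neural network produces that HMM's emission and transition probabilities, and training maximizes the exact likelihood computed with the forward algorithm. Summing over every monotonic alignment, instead of learning one through attention, is presumably why alignments come so quickly. Below is a minimal log-space sketch of that forward pass. The network is assumed away and replaced by plain arrays, and forcing the path to end in the last state is my own simplification.

```python
import numpy as np

def forward_log_likelihood(log_emission, log_stay, log_move):
    """Log-likelihood of T frames under a left-to-right, no-skip HMM.

    log_emission: (T, N) array of log p(o_t | state n); hypothetical input that
        a neural decoder would produce in Neural HMM.
    log_stay, log_move: (N,) arrays of log P(stay in n) and log P(advance n -> n+1).
    """
    T, N = log_emission.shape
    log_alpha = np.full(N, -np.inf)
    log_alpha[0] = log_emission[0, 0]  # every path starts in the first state
    for t in range(1, T):
        stay = log_alpha + log_stay  # remain in the same state
        move = np.full(N, -np.inf)
        move[1:] = log_alpha[:-1] + log_move[:-1]  # advance exactly one state (no skips)
        log_alpha = np.logaddexp(stay, move) + log_emission[t]
    # Simplifying assumption: only paths that end in the last state count.
    return log_alpha[-1]
```

With a 2-state model, 3 frames, and 0.5 stay/move probabilities, the two valid paths (0,0,1) and (0,1,1) each have probability 0.25, so the result is log 0.5.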

https://github.com/mindslab-ai/assem-vc

Official code for Assem-VC (ICASSP 2022); the repository now lives under maum-ai/assem-vc.

Prosody annotations for Switchboard: https://groups.inf.ed.ac.uk/switchboard/index.html

Neural Text to Speech Synthesis Tutorial

https://github.com/tts-tutorial/icassp2022

Survey paper: https://arxiv.org/abs/2106.15561

Convolutional Pitch Tracker (ICASSP 2018)

https://marl.github.io/crepe/

PyTorch port with lots of usage details: https://github.com/maxrmorrison/torchcrepe
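
For each frame, CREPE outputs a 360-bin salience vector with bins spaced 20 cents apart. The sketch below decodes one frame to Hz. It mirrors, as far as I know, the local weighted-average decoding in the original `crepe` package: a salience-weighted mean of the cents values within ±4 bins of the peak. The function name `salience_to_hz` is hypothetical, and treating the peak salience as confidence is an assumption on my part.

```python
import numpy as np

# 360 bins, 20 cents apart; bin 0 sits at ~1997.38 cents relative to 10 Hz (~31.7 Hz).
CENTS_MAPPING = np.linspace(0, 7180, 360) + 1997.3794084376191

def salience_to_hz(salience):
    """Decode one frame of CREPE-style salience (shape (360,)) to (f0_hz, confidence)."""
    salience = np.asarray(salience, dtype=float)
    center = int(np.argmax(salience))
    # Weighted average of cents over a +-4 bin window around the peak.
    start, end = max(0, center - 4), min(len(salience), center + 5)
    weights = salience[start:end]
    cents = np.sum(weights * CENTS_MAPPING[start:end]) / np.sum(weights)
    f0_hz = 10.0 * 2.0 ** (cents / 1200.0)  # cents relative to 10 Hz -> Hz
    return f0_hz, float(salience[center])
```

Averaging around the peak, rather than taking the argmax alone, gives sub-bin (finer than 20 cents) resolution.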

VTN (Voice Transformer Network): a Transformer-based successor to sprocket that uses TTS pretraining. It is available as egs/arctic/vc1 in ESPnet. It sounds much worse than sprocket.

http://www.kecl.ntt.co.jp/people/kameoka.hirokazu/Demos/vtn/index.html

Paper (attached PDF): Ephraim 1985, "Speech enhancement using a minimum mean square error …"

Dealing with residual vocoder noise:

LogMMSE Speech Enhancement and Noise Reduction

https://github.com/rajivpoddar/logmmse

from logmmse import logmmse
y_enh = logmmse(y, sr, output_file=None, initial_noise=1, window_size=160, noise_threshold=0.15)
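
For intuition about what that call does: the Ephraim and Malah log-spectral amplitude (LogMMSE) estimator scales each STFT bin by G = ξ/(1+ξ)·exp(½·E1(v)), with v = ξγ/(1+ξ). Here γ is the a posteriori SNR, ξ is the a priori SNR estimated with the decision-directed rule, and E1 is the exponential integral. The bare-bones sketch below assumes the noise power spectrum is already known. The function names, α = 0.98, and the ξ floor are my own choices, and unlike the logmmse package it does no noise tracking or voice-activity detection.

```python
import numpy as np
from scipy.special import exp1  # exponential integral E1

def lsa_gain(xi, gamma):
    """LogMMSE gain: xi/(1+xi) * exp(0.5 * E1(v)), v = xi*gamma/(1+xi)."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * np.exp(0.5 * exp1(v))

def enhance_stft(noisy_stft, noise_psd, alpha=0.98, xi_min=1e-3):
    """Apply the LogMMSE gain frame by frame (hypothetical helper).

    noisy_stft: (frames, bins) complex STFT of the noisy signal.
    noise_psd: (bins,) noise power estimate, assumed given (e.g. averaged
        over leading noise-only frames).
    """
    enhanced = np.empty_like(noisy_stft)
    prev = None  # |X_hat|^2 / noise_psd from the previous frame
    for i, frame in enumerate(noisy_stft):
        gamma = np.maximum(np.abs(frame) ** 2 / noise_psd, 1e-12)  # a posteriori SNR
        ml = np.maximum(gamma - 1.0, 0.0)  # maximum-likelihood a priori SNR
        # Decision-directed a priori SNR estimate, floored to limit musical noise.
        xi = ml if prev is None else alpha * prev + (1.0 - alpha) * ml
        xi = np.maximum(xi, xi_min)
        gain = lsa_gain(xi, gamma)
        enhanced[i] = gain * frame
        prev = gain ** 2 * gamma
    return enhanced
```

At high SNR the gain approaches 1 and the bin passes through unchanged. In noise-only bins it drops well below 1, which is what removes the residual vocoder hiss.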

The StarGANv2-VC authors mention this method as the one achieving the highest MOS in VCC 2020 🤯

https://github.com/yl4579/StarGANv2-VC

I need to take a closer look at VTN

VTN is T23.

T10 is an ASR model plus a prosody encoder feeding a speaker-dependent TTS model, followed by a WaveNet vocoder with single-Gaussian outputs. T10's alternative system was an autoregressive LSTM that converted PPGs into mel spectrograms; it was used for the two male-to-male parallel speakers.

On AMP (automatic mixed precision) with HiFi-GAN: you may need to remove the bias from the convolutions.

https://prml-lab-speech-team.github.io/demo/FreGAN2/

Fre-GAN 2: a vocoder that uses the discrete wavelet transform (DWT) in its discriminator and has a progressive generator structure, similar to StyleGAN2, that produces the inputs to an inverse DWT (iDWT).

https://github.com/prml-lab-speech-team/demo/tree/master/FreGAN2/code
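
The sketch below shows what "producing iDWT arguments" means at the signal level. A one-level DWT splits a waveform into half-rate low- and high-frequency sub-bands, and the iDWT reassembles them exactly. A generator can therefore emit sub-bands, and a discriminator can downsample without discarding information, unlike average pooling. For simplicity it assumes the orthonormal Haar wavelet; I haven't checked which wavelet Fre-GAN 2 actually uses, and the function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT for an even-length 1-D signal."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # approximation (low-frequency) sub-band
    high = (even - odd) / np.sqrt(2.0)  # detail (high-frequency) sub-band
    return low, high

def haar_idwt(low, high):
    """Inverse of haar_dwt: rebuild even/odd samples and interleave them."""
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2.0)
    x[1::2] = (low - high) / np.sqrt(2.0)
    return x
```

Because the transform is orthonormal, it also preserves energy: the squared norms of the two sub-bands sum to the squared norm of the input.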
