Vol Building AGI

VTN (Voice Transformer Network): a Transformer-based successor to sprocket that uses TTS pretraining. Available as egs/arctic/vc1 in ESPnet. Sounds much worse than sprocket.

http://www.kecl.ntt.co.jp/people/kameoka.hirokazu/Demos/vtn/index.html

Ephraim1985_Speech_enhancement_using_a_minimum_mean_square_error.pdf

Dealing with residual vocoder noise:

LogMMSE Speech Enhancement and Noise Reduction

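https://github.com/rajivpoddar/logmmse

```python
from logmmse import logmmse

# y: 1-D NumPy array holding the noisy waveform, sr: its sample rate in Hz
y_enh = logmmse(y, sr, output_file=None, initial_noise=1, window_size=160, noise_threshold=0.15)
```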
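For intuition about what `logmmse` does in each frequency bin, here is a minimal pure-Python sketch of the Ephraim-Malah (1985) log-spectral amplitude (LSA) gain together with a decision-directed a priori SNR update. It is not the `logmmse` package's code: the helper names, the `alpha=0.98` smoothing factor and the small clamp on `v` are assumptions, and a real implementation applies this per STFT bin with a running noise estimate.

```python
import math

EULER_GAMMA = 0.5772156649015329


def expint_e1(x: float) -> float:
    """Exponential integral E1(x) for x > 0."""
    if x <= 0.0:
        raise ValueError("expint_e1 expects x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, 60):
            term *= -x / k  # term == (-x)^k / k!
            delta = -term / k
            total += delta
            if abs(delta) < 1e-17 * abs(total):
                break
        return total
    # x > 1: continued fraction evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 300):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        step = c * d
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return h * math.exp(-x)


def lsa_gain(xi: float, gamma: float) -> float:
    """LSA gain G = xi/(1+xi) * exp(0.5 * E1(v)) with v = xi/(1+xi) * gamma.

    xi:    a priori SNR (clean speech power / noise power), estimated
    gamma: a posteriori SNR (noisy power / noise power), observed
    """
    ratio = xi / (1.0 + xi)
    v = max(ratio * gamma, 1e-12)  # assumption: clamp so E1 stays finite at v = 0
    return ratio * math.exp(0.5 * expint_e1(v))


def decision_directed_xi(gamma: float, prev_gain: float, prev_gamma: float,
                         alpha: float = 0.98) -> float:
    """Blend the previous frame's clean-speech estimate (prev_gain^2 * prev_gamma,
    i.e. |A_prev|^2 / noise power) with the instantaneous estimate max(gamma - 1, 0)."""
    return alpha * prev_gain ** 2 * prev_gamma + (1.0 - alpha) * max(gamma - 1.0, 0.0)


if __name__ == "__main__":
    for xi in (0.1, 1.0, 10.0):
        print(xi, round(lsa_gain(xi, gamma=2.0), 3))
```

The gain tends to `xi / (1 + xi)` (a Wiener-like gain) at high SNR, and the `exp(0.5 * E1(v))` factor boosts it when the observed power is low.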

The StarGANv2-VC authors mention this method as the one that achieved the highest MOS in VCC 2020 🤯

https://github.com/yl4579/StarGANv2-VC

I need to take a closer look at VTN

In VCC 2020, VTN is system T23.

T10 is an ASR and prosody encoder feeding a speaker-dependent TTS model, which in turn feeds a WaveNet vocoder with single-Gaussian outputs. T10's alternative system was an autoregressive LSTM that converted PPGs into mel spectrograms; it was used for the two male-to-male parallel speakers.
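"Single-Gaussian outputs" means the vocoder predicts a mean and a log standard deviation for every waveform sample instead of a softmax over quantized sample values. The sketch below shows the per-sample negative log-likelihood used for training and the sample-by-sample generation loop. The `toy_predict` function stands in for the WaveNet and is hypothetical, as is the `min_log_scale=-7.0` lower bound (a ClariNet-style clamp, assumed here); this is not T10's actual code.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Maps all previously generated samples to (mean, log_scale) of the next one.
Predictor = Callable[[Sequence[float]], Tuple[float, float]]


def gaussian_nll(x: float, mean: float, log_scale: float,
                 min_log_scale: float = -7.0) -> float:
    """Negative log-likelihood of sample x under N(mean, exp(log_scale)^2)."""
    log_scale = max(log_scale, min_log_scale)  # assumed lower bound for stability
    z = (x - mean) / math.exp(log_scale)
    return 0.5 * math.log(2.0 * math.pi) + log_scale + 0.5 * z * z


def sample_autoregressive(predict: Predictor, num_samples: int,
                          seed: int = 0) -> List[float]:
    """Draw one sample at a time, each conditioned on everything generated so far."""
    rng = random.Random(seed)
    out: List[float] = []
    for _ in range(num_samples):
        mean, log_scale = predict(out)
        out.append(mean + math.exp(log_scale) * rng.gauss(0.0, 1.0))
    return out


if __name__ == "__main__":
    # Hypothetical stand-in for the network: damped copy of the last sample, fixed scale.
    def toy_predict(history: Sequence[float]) -> Tuple[float, float]:
        last = history[-1] if history else 0.0
        return 0.9 * last, math.log(0.1)

    wave = sample_autoregressive(toy_predict, num_samples=5)
    print([round(s, 3) for s in wave])
    print(gaussian_nll(0.0, mean=0.0, log_scale=0.0))  # equals 0.5 * ln(2 * pi)
```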

On AMP and HiFi-GAN: you may need to remove the bias from the convolutions.

https://prml-lab-speech-team.github.io/demo/FreGAN2/

A vocoder that uses the discrete wavelet transform (DWT) in the discriminator and has a progressive generator, similar in structure to StyleGAN2, whose stages produce the inputs to an inverse DWT (iDWT).

https://github.com/prml-lab-speech-team/demo/tree/master/FreGAN2/code
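To make the DWT part concrete: a one-level DWT splits a waveform into a low band and a high band, each at half the sample rate, and the iDWT merges them back exactly, so the rate is halved without discarding information. The sketch below assumes the Haar wavelet and plain Python lists; the function names (including the toy `progressive_synthesis`, where each stage supplies a high band and the iDWT doubles the resolution) are hypothetical and this is not FreGAN2's code.

```python
import math
from typing import List, Sequence, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_dwt(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One-level Haar DWT: returns (low band, high band), each half the length of x."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * INV_SQRT2 for i in range(0, len(x), 2)]
    return low, high


def haar_idwt(low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Inverse of haar_dwt: rebuilds each even/odd sample pair from (low, high)."""
    if len(low) != len(high):
        raise ValueError("bands must have equal length")
    out: List[float] = []
    for a, d in zip(low, high):
        out.append((a + d) * INV_SQRT2)
        out.append((a - d) * INV_SQRT2)
    return out


def progressive_synthesis(base: Sequence[float],
                          highs: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical toy: the running signal is the low band, each stage adds a
    high band of the same length, and the iDWT doubles the resolution."""
    signal = list(base)
    for high in highs:
        signal = haar_idwt(signal, high)
    return signal


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.0, 5.0]
    low, high = haar_dwt(x)
    print(low, high)
    print(haar_idwt(low, high))  # recovers x up to floating-point rounding
```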

ICLR 2022

HiFi-GAN plus chunked autoregression (CARGAN): trains faster and tracks pitch better.

https://github.com/descriptinc/cargan

https://www.maxrmorrison.com/sites/cargan/
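A rough sketch of what "chunked autoregression" means at inference time: the waveform is produced one chunk of frames at a time, non-autoregressively within the chunk, and each chunk is conditioned on the last few samples already generated (zero-padded at the start). The `generator` callable, the parameter names and the toy values are hypothetical; CARGAN's real chunk size, context length and conditioning network are defined in its repository.

```python
from typing import Callable, List, Sequence

# generator(frames, previous_samples) -> hop_length * len(frames) new samples
ChunkGenerator = Callable[[Sequence[float], Sequence[float]], List[float]]


def generate_chunked(
    features: Sequence[float],
    hop_length: int,
    chunk_frames: int,
    context_len: int,
    generator: ChunkGenerator,
) -> List[float]:
    """Parallel within a chunk, autoregressive across chunks."""
    audio: List[float] = []
    for start in range(0, len(features), chunk_frames):
        frames = features[start:start + chunk_frames]
        # Last `context_len` samples generated so far, left-padded with zeros.
        history = audio[max(0, len(audio) - context_len):]
        context = [0.0] * (context_len - len(history)) + history
        chunk = generator(frames, context)
        if len(chunk) != hop_length * len(frames):
            raise ValueError("generator returned the wrong number of samples")
        audio.extend(chunk)
    return audio


if __name__ == "__main__":
    hop = 2

    # Hypothetical stand-in for the GAN generator: each frame value plus the
    # last context sample, repeated `hop` times.
    def toy_generator(frames: Sequence[float], context: Sequence[float]) -> List[float]:
        offset = context[-1] if context else 0.0
        return [f + offset for f in frames for _ in range(hop)]

    print(generate_chunked([1.0, 2.0, 3.0, 4.0], hop, chunk_frames=2,
                           context_len=1, generator=toy_generator))
    # prints [1.0, 1.0, 2.0, 2.0, 5.0, 5.0, 6.0, 6.0]
```

The second chunk's values are shifted by the last sample of the first chunk, which is exactly the information a fully parallel HiFi-GAN would not see across chunk boundaries.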

https://serrjoa.github.io/projects/universe/

Score-based diffusion for universal speech enhancement (55 distortion types)

Base model: 49M parameters, trained for 5 days on 2× V100 with AMP.
The paper then describes improvements to the model.
Scaled-up model: 189M parameters, trained for 14 days on 8× V100.
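Converting the reported training budgets to GPU-hours (plain arithmetic on the numbers above, assuming the GPUs ran for the full stated time) puts the scale-up in perspective:

```latex
\begin{aligned}
\text{base: } & 5\ \text{days} \times 24\ \tfrac{\text{h}}{\text{day}} \times 2\ \text{V100} = 240\ \text{GPU-hours} \\
\text{scaled up: } & 14\ \text{days} \times 24\ \tfrac{\text{h}}{\text{day}} \times 8\ \text{V100} = 2688\ \text{GPU-hours} \\
\text{ratios: } & \tfrac{2688}{240} = 11.2\times\ \text{compute}, \qquad \tfrac{189\text{M}}{49\text{M}} \approx 3.86\times\ \text{parameters}
\end{aligned}
```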

Neural Phonetic Alignment with pretrained models for English:
https://github.com/lingjzhu/charsiu/
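Neural aligners of this kind typically predict a phone label for each fixed-length frame; turning those frame labels into time intervals is then a run-length merge. A minimal sketch, assuming 10 ms frames and a plain list of labels as input; this is not Charsiu's API, and `frames_to_segments` and the example labels are hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, str]  # (start_sec, end_sec, phone)


def frames_to_segments(labels: Sequence[str], frame_shift: float = 0.01) -> List[Segment]:
    """Merge runs of identical per-frame labels into (start, end, label) intervals."""
    segments: List[Segment] = []
    run_start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the input or when the label changes.
        if i == len(labels) or labels[i] != labels[run_start]:
            segments.append((round(run_start * frame_shift, 6),
                             round(i * frame_shift, 6),
                             labels[run_start]))
            run_start = i
    return segments


if __name__ == "__main__":
    frames = ["sil", "sil", "HH", "HH", "HH", "AY", "AY", "sil"]
    for start, end, phone in frames_to_segments(frames):
        print(f"{start:.2f}-{end:.2f} {phone}")
```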
