Vol Building AGI

https://serrjoa.github.io/projects/universe/

UNIVERSE: score-based diffusion for universal speech enhancement, covering 55 distortion types

Base model: 49M parameters, trained for 5 days on 2× V100 with AMP (automatic mixed precision)
The paper goes on to describe further improvements to the model.
Scaled-up model: 189M parameters, trained for 14 days on 8× V100
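A quick back-of-the-envelope comparison of the two configurations, using only the numbers in this post. "GPU-days" (wall-clock days × number of GPUs) is a crude proxy that ignores differences in batch size, precision settings and utilization, so treat the ratios as rough.

```python
# Rough compute comparison of the two UNIVERSE configurations
# (parameter counts, days and GPU counts taken from the post).
configs = {
    "base": {"params_m": 49, "days": 5, "gpus": 2},
    "scaled": {"params_m": 189, "days": 14, "gpus": 8},
}

def gpu_days(cfg):
    """Total V100-days: wall-clock training days multiplied by the GPU count."""
    return cfg["days"] * cfg["gpus"]

base_gd = gpu_days(configs["base"])      # 5 * 2 = 10
scaled_gd = gpu_days(configs["scaled"])  # 14 * 8 = 112
param_ratio = configs["scaled"]["params_m"] / configs["base"]["params_m"]
compute_ratio = scaled_gd / base_gd

print(f"base: {base_gd} V100-days, scaled: {scaled_gd} V100-days")
print(f"params x{param_ratio:.2f}, GPU-days x{compute_ratio:.1f}")
```

So roughly 3.9× the parameters for about 11× the GPU time.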

09:31

Vol Building AGI

Neural Phonetic Alignment with pretrained models for English:
https://github.com/lingjzhu/charsiu/

GitHub - lingjzhu/charsiu: Charsiu: A neural phonetic aligner.
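To make "phonetic alignment" concrete: assuming an aligner that predicts one phone label per fixed-length frame, turning its output into time-stamped phone segments is a run-length collapse. This is a sketch under that assumption, not Charsiu's own API; the 10 ms frame shift, the `[SIL]` label and `frames_to_segments` are all hypothetical.

```python
from itertools import groupby

# Hypothetical 10 ms frame shift; check the real aligner's configuration.
FRAME_SHIFT_S = 0.01

def frames_to_segments(frame_labels, frame_shift=FRAME_SHIFT_S):
    """Collapse per-frame phone labels into (phone, start_s, end_s) segments."""
    segments = []
    start = 0  # frame index where the current run begins
    for phone, run in groupby(frame_labels):
        end = start + len(list(run))
        segments.append((phone, round(start * frame_shift, 3), round(end * frame_shift, 3)))
        start = end
    return segments

# Made-up frame predictions for the word "hi" surrounded by silence.
frames = ["[SIL]"] * 3 + ["HH"] * 4 + ["AY"] * 6 + ["[SIL]"] * 2
segments = frames_to_segments(frames)
for seg in segments:
    print(seg)
```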

11:09

Vol Building AGI

Lightweight speech encoder

https://github.com/yl4579/AuxiliaryASR

GitHub - yl4579/AuxiliaryASR: Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment)
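The "joint CTC-S2S" idea in general form: a CTC head and an attention (sequence-to-sequence) decoder share an encoder, and training combines both losses. The sketch shows generic greedy CTC decoding and a weighted loss sum; it is not taken from the AuxiliaryASR code, and the blank token name and the 0.5 weight are assumptions.

```python
from itertools import groupby

BLANK = "<blank>"  # hypothetical name for the CTC blank token

def ctc_greedy_decode(frame_argmax):
    """Greedy CTC decoding: merge consecutive repeats, then drop blank tokens."""
    collapsed = [label for label, _ in groupby(frame_argmax)]
    return [label for label in collapsed if label != BLANK]

def joint_ctc_s2s_loss(ctc_loss, s2s_loss, ctc_weight=0.5):
    """Weighted sum of the CTC and attention (S2S) losses; 0.5 is an assumed weight."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * s2s_loss

frames = ["HH", "HH", BLANK, "AY", "AY", BLANK, BLANK, "AY"]
print(ctc_greedy_decode(frames))  # the blank between the two AY runs keeps them separate
```

The blank token is what lets CTC emit the same phoneme twice in a row; without it, the two `AY` runs would merge into one.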

11:33

Vol Building AGI

StyleGAN3-style anti-aliased generator meets vocoder (BigVGAN). Trained on all of LibriTTS, and it generalizes to laughter and music.

https://arxiv.org/abs/2206.04658

https://github.com/NVIDIA/BigVGAN

https://bigvgan-demo.github.io
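As I understand the BigVGAN paper, its anti-aliasing comes from applying the periodic Snake activation between upsampling and low-pass downsampling filters. The sketch below shows only the scalar Snake function itself; in the actual model `alpha` is learned per channel and the filtering is omitted here.

```python
import math

def snake(x, alpha=1.0):
    """Snake activation: x + (1 / alpha) * sin^2(alpha * x).

    Its derivative is 1 + sin(2 * alpha * x) >= 0, so the function is
    monotonically non-decreasing while still adding a periodic component
    whose frequency grows with alpha.
    """
    return x + (1.0 / alpha) * math.sin(alpha * x) ** 2

for x in (0.0, 0.5, 1.0, 1.5):
    print(x, round(snake(x, alpha=2.0), 4))
```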

07:35

Vol Building AGI

Try StarGAN-VC and ACVAE-VC to speak like a dog: ACVAE-VC sounds more dog-like, while StarGAN-VC keeps better speech clarity.

https://arxiv.org/abs/2206.04780

https://github.com/suzuki256/dog-dataset

07:53

Vol Building AGI

ACL 2022: Direct speech-to-speech translation with discrete units, Lee et al.

https://ai.facebook.com/blog/advancing-direct-speech-to-speech-modeling-with-discrete-units/
Meta does direct speech-to-speech translation by predicting discrete units with a Transformer encoder-decoder and feeding those units to a vocoder. I noted that they don’t use pitch information as a HiFi-GAN input, and that they use a small duration-prediction block borrowed from FastSpeech 2.
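Why a duration predictor is needed: as I understand the paper, the translation model predicts "reduced" units, with consecutive repeats collapsed, so the vocoder must expand them back to frame rate. The sketch shows both halves with made-up unit IDs; the ground-truth run lengths stand in for what a learned duration predictor would output, and the function names are hypothetical.

```python
from itertools import groupby

def reduce_units(units):
    """Collapse consecutive duplicate units; each run length becomes a duration."""
    reduced, durations = [], []
    for unit, run in groupby(units):
        reduced.append(unit)
        durations.append(len(list(run)))
    return reduced, durations

def length_regulate(reduced, durations):
    """FastSpeech-style length regulator: repeat each unit `duration` times."""
    expanded = []
    for unit, duration in zip(reduced, durations):
        expanded.extend([unit] * duration)
    return expanded

units = [12, 12, 12, 7, 7, 45, 12, 12]  # made-up frame-level unit IDs
reduced, durations = reduce_units(units)
print(reduced, durations)
```

A real duration predictor outputs real-valued estimates that get rounded to integers, so the expanded sequence only approximates the original timing.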

09:49

Vol Building AGI

https://twitter.com/ysaito_human/status/1536521048568438785

Let's learn Japanese

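Yuki Saito: I'll be presenting as a guest lecturer in the Advanced Topics in Signal Processing course starting at 13:00 today 🤓 The slides are here 👇 (they're on SlideShare, but I think you can view them without any problems) slideshare.net/YukiSaito8/neu…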

20:10

Vol Building AGI

Very neat TTS composer from Sonantic: https://www.youtube.com/watch?v=fNtwg-lXie8

How Sonantic AI Voices Work