lucidrains/soundstorm-pytorch
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch
Language: Python
#artificial_intelligence #attention_mechanism #audio_generation #deep_learning #non_autoregressive #transformers
Stars: 181 Issues: 0 Forks: 6
https://github.com/lucidrains/soundstorm-pytorch
OFA-Sys/ONE-PEACE
A general representation model across vision, audio, and language modalities.
Language: Python
#audio_language #foundation_models #multimodal #representation_learning #vision_language
Stars: 185 Issues: 2 Forks: 5
https://github.com/OFA-Sys/ONE-PEACE
Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
VASTDynamics/Vaporizer2
Vaporizer2 hybrid wavetable additive / subtractive VST / AU / AAX synthesizer / sampler workstation plugin
Language: C++
#aax #audio #audiounit_plugins #cpp #daw #music #plugin #sampler #synthesizer #vst #vst3 #vst3_plugin #wavetable
Stars: 186 Issues: 5 Forks: 9
https://github.com/VASTDynamics/Vaporizer2
huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
#audio #speech_recognition #whisper
Stars: 261 Issues: 2 Forks: 9
https://github.com/huggingface/distil-whisper
ZiqiaoPeng/SyncTalk
[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
#audio_driven_talking_face #talking_face #talking_face_generation #talking_head
Stars: 180 Issues: 5 Forks: 2
https://github.com/ZiqiaoPeng/SyncTalk
TuneNN/TuneNN
A transformer-based network model for pitch detection
Language: Python
#audio #machine_learning #music #pitch_detection #pitch_estimation
Stars: 142 Issues: 0 Forks: 3
https://github.com/TuneNN/TuneNN
ali-vilab/dreamtalk
Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
Language: Python
#audio_visual_learning #face_animation #talking_head #video_generation
Stars: 217 Issues: 7 Forks: 20
https://github.com/ali-vilab/dreamtalk
Lessica/TrollRecorder
WIP: A simple audio recorder for TrollStore.
Language: Objective-C++
#audio_recorder #ios #jailbreak #trollstore #tweak
Stars: 282 Issues: 1 Forks: 10
https://github.com/Lessica/TrollRecorder
(i18n/CLI) Not the first, but the best phone call recorder with TrollStore.
jishengpeng/WavTokenizer
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Language: Python
#acoustic #audio_representation #codec #dac #encodec #gpt4o #music_representation_learning #semantic #soundstream #speech_language_model #speech_representation #text_to_speech
Stars: 332 Issues: 6 Forks: 20
https://github.com/jishengpeng/WavTokenizer
antgroup/echomimic_v2
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Language: Python
#audio_driven_portrait_animations #audio_driven_talking_face #human_animation #talking_face_generation #talking_head
Stars: 307 Issues: 5 Forks: 28
https://github.com/antgroup/echomimic_v2
Tencent/HunyuanCustom
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Language: Python
#audio_driven #diffusion_models #image_to_video #image_to_video_generation #video_editing #video_generation
Stars: 360 Issues: 4 Forks: 14
https://github.com/Tencent/HunyuanCustom