ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

📝 Summary:
IndexTTS builds on XTTS and Tortoise to improve naturalness and zero-shot voice cloning. It adds hybrid character-pinyin modeling for Chinese and an optimized vector quantization scheme, which the authors report gives more controllable usage, faster inference, and better performance than the TTS systems they compare against.
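
To make the hybrid character-pinyin idea concrete, here is a minimal sketch of a mixed input sequence in which most characters stay as-is and a few are swapped for pinyin so their pronunciation is pinned. The function name `hybrid_tokens`, the override format, and the tone-number pinyin spelling are all assumptions made for illustration; this is not the IndexTTS tokenizer or its API.

```python
from typing import Dict, List


def hybrid_tokens(text: str, pinyin_overrides: Dict[int, str]) -> List[str]:
    """Return one token per character, using pinyin where an override is given.

    pinyin_overrides maps a character index to a pinyin string
    (hypothetical format, e.g. "hang2" for the second tone of "hang").
    """
    tokens: List[str] = []
    for i, ch in enumerate(text):
        # Use the pinyin token if the caller pinned this position,
        # otherwise keep the original character.
        tokens.append(pinyin_overrides.get(i, ch))
    return tokens


# The character 行 can be read "xing2" or "hang2"; pin the first one.
tokens = hybrid_tokens("银行行长", {1: "hang2", 3: "zhang3"})
print(tokens)  # ['银', 'hang2', '行', 'zhang3']
```

A mixed sequence like this lets a user correct a polyphonic character at inference time by replacing only that position, while the rest of the sentence stays in ordinary characters.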

🔹 Publication Date: Published on Feb 8, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.05512
• PDF: https://arxiv.org/pdf/2502.05512
• Github: https://github.com/index-tts/index-tts

🔹 Models citing this paper:
https://huggingface.co/IndexTeam/IndexTTS-2
https://huggingface.co/IndexTeam/Index-TTS
https://huggingface.co/Toxzic/indextts-colab

🔹 Spaces citing this paper:
https://huggingface.co/spaces/IndexTeam/IndexTTS
https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena
https://huggingface.co/spaces/jairwaal/image

==================================

For more data science resources:
https://t.me/DataScienceT

#TextToSpeech #ZeroShotLearning #VoiceCloning #AI #MachineLearning
OpenVoice: Versatile Instant Voice Cloning

📝 Summary:
OpenVoice clones a speaker's voice from a short reference audio clip. It gives flexible control over voice styles and performs zero-shot cross-lingual cloning, covering new languages without extensive training data for them. The authors also report that it is computationally efficient.

🔹 Publication Date: Published on Dec 3, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2312.01479
• PDF: https://arxiv.org/pdf/2312.01479
• Github: https://github.com/myshell-ai/openvoice

🔹 Models citing this paper:
https://huggingface.co/rsxdalv/OpenVoiceV2
https://huggingface.co/ameerazam08/Udiff
https://huggingface.co/flopml/OpenVoice-v2

🔹 Datasets citing this paper:
https://huggingface.co/datasets/tsinghua-ee/QualiSpeech
https://huggingface.co/datasets/dlxjj/Openvoice
https://huggingface.co/datasets/Pendrokar/open_tts_tracker

🔹 Spaces citing this paper:
https://huggingface.co/spaces/Russell1123213123/testOpenVoice
https://huggingface.co/spaces/gauthamk28/gauthamk28_voice
https://huggingface.co/spaces/blayks07/OpenVoice-main

#VoiceCloning #AIResearch #SpeechSynthesis #ZeroShotLearning #CrossLingualAI
Dynamic Reflections: Probing Video Representations with Text Alignment

📝 Summary:
This work presents the first comprehensive study on video-text representation alignment. It reveals alignment depends on data richness and correlates with downstream task performance, suggesting its value for general video understanding. This introduces video-text alignment as a zero-shot method ...
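
One common way to score alignment between two embedding spaces is mutual k-nearest-neighbor overlap: for each item, compare its nearest neighbors in the video space with its nearest neighbors in the text space. The sketch below shows that metric on paired embeddings. Whether the paper uses exactly this metric is an assumption, and the function names and cosine-similarity choice are illustrative only.

```python
import numpy as np


def knn_indices(emb: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors (cosine similarity) for each row."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # an item is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def mutual_knn_alignment(video_emb: np.ndarray, text_emb: np.ndarray, k: int) -> float:
    """Mean fraction of shared k-nearest neighbors across the two spaces.

    Row i of video_emb and row i of text_emb must describe the same item.
    Returns a value in [0, 1]; 1 means identical neighborhoods.
    """
    nn_v = knn_indices(video_emb, k)
    nn_t = knn_indices(text_emb, k)
    overlap = [len(set(v) & set(t)) / k for v, t in zip(nn_v, nn_t)]
    return float(np.mean(overlap))


rng = np.random.default_rng(0)
video = rng.normal(size=(50, 16))   # stand-in video embeddings
text = rng.normal(size=(50, 32))    # stand-in text embeddings (different width)
print(mutual_knn_alignment(video, video, k=5))  # identical spaces
print(mutual_knn_alignment(video, text, k=5))   # unrelated random spaces
```

The metric needs no training and no shared embedding dimension, since it only compares neighborhood structure.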

🔹 Publication Date: Published on Nov 4, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02767
• PDF: https://arxiv.org/pdf/2511.02767
• Project Page: https://video-prh.github.io/

#VideoUnderstanding #TextAlignment #VideoTextAI #ZeroShotLearning #RepresentationLearning
NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering

📝 Summary:
NAF upsamples Vision Foundation Model (VFM) features zero-shot by learning adaptive spatial-and-content weights. It works across VFMs without retraining and is reported to outperform VFM-specific upsamplers, reaching state-of-the-art results on a range of tasks at low computational cost.
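
To illustrate what "spatial-and-content weights" means, the sketch below upsamples a low-resolution feature map by averaging nearby feature cells, weighting each one by spatial distance and by similarity in a high-resolution guide image. Note the swap: this is classic joint bilateral upsampling with hand-set Gaussian kernels, not NAF itself, which learns its weights. The function name, parameters, and grayscale guide are assumptions for illustration.

```python
import numpy as np


def joint_bilateral_upsample(feats, guide, scale, radius=1, sigma_s=1.0, sigma_c=0.1):
    """Upsample feats (h, w, c) to (h*scale, w*scale, c) using a guide image.

    guide is a (h*scale, w*scale) grayscale image. Weights are fixed
    Gaussians (an assumption; NAF learns them instead).
    """
    h, w, c = feats.shape
    H, W = h * scale, w * scale
    assert guide.shape == (H, W)
    # Guide value for each low-res cell: mean of its scale x scale block.
    guide_lr = guide.reshape(h, scale, w, scale).mean(axis=(1, 3))
    out = np.zeros((H, W, c), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            # Position of this pixel in low-res cell coordinates.
            cy = (y + 0.5) / scale - 0.5
            cx = (x + 0.5) / scale - 0.5
            iy, ix = int(round(cy)), int(round(cx))
            logits, values = [], []
            for ny in range(max(iy - radius, 0), min(iy + radius, h - 1) + 1):
                for nx in range(max(ix - radius, 0), min(ix + radius, w - 1) + 1):
                    d_space = (ny - cy) ** 2 + (nx - cx) ** 2
                    d_content = (guide[y, x] - guide_lr[ny, nx]) ** 2
                    logits.append(-d_space / (2 * sigma_s**2) - d_content / (2 * sigma_c**2))
                    values.append(feats[ny, nx])
            # Softmax over the neighborhood so the weights sum to 1.
            wts = np.exp(np.array(logits) - max(logits))
            wts /= wts.sum()
            out[y, x] = wts @ np.array(values)
    return out


# A sharp vertical edge in the guide keeps the two feature regions separate.
feats = np.array([[[0.0], [1.0]], [[0.0], [1.0]]])        # (2, 2, 1)
guide = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (4, 1))   # (4, 4)
up = joint_bilateral_upsample(feats, guide, scale=2)
print(up.shape)
```

Because the content term suppresses neighbors on the other side of an edge, the upsampled features follow the guide image's boundaries instead of blurring across them.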

🔹 Publication Date: Published on Nov 23, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18452
• PDF: https://arxiv.org/pdf/2511.18452
• Github: https://github.com/valeoai/NAF

#ZeroShotLearning #ComputerVision #FeatureUpsampling #DeepLearning #AIResearch
MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory

📝 Summary:
MG-Nav is a dual-scale framework for zero-shot visual navigation, unifying global memory-guided planning via a Sparse Spatial Memory Graph with local geometry-enhanced control using a VGGT-adapter. It achieves state-of-the-art performance and robustness in unseen environments.

🔹 Publication Date: Published on Nov 27, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22609
• PDF: https://arxiv.org/pdf/2511.22609

#VisualNavigation #Robotics #AI #ComputerVision #ZeroShotLearning
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

📝 Summary:
Dream2Flow bridges video generation and robotic control using 3D object flow. It reconstructs 3D object motions from generated videos, enabling zero-shot manipulation of diverse objects through trajectory tracking without task-specific demonstrations.

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24766
• PDF: https://arxiv.org/pdf/2512.24766

#VideoGeneration #Robotics #3DVision #AI #ZeroShotLearning