ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

📝 Summary:
UniAVGen uses dual Diffusion Transformers and Asymmetric Cross-Modal Interaction for unified audio-video generation. The framework ensures precise spatiotemporal synchronization and semantic consistency between the two modalities. It outperforms existing methods in audio-video synchronization and consistency while using far fewer training samples.
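The summary only names the mechanism, so what follows is a minimal, dependency-free Python sketch of the general idea it points at. Two token streams (standing in for the video and audio Diffusion Transformers) exchange information through cross-attention, and each direction has its own gate so the interaction need not be symmetric. The function names `cross_attend` and `interact`, the gate parameters, and the use of unprojected residual cross-attention are hypothetical simplifications, not UniAVGen's actual Asymmetric Cross-Modal Interaction.

```python
# Hypothetical sketch of gated, two-direction cross-modal attention.
# It is NOT the UniAVGen implementation; all names and gates are illustrative.
import math
from typing import List, Tuple

Tokens = List[List[float]]  # a sequence of token vectors, all of width d


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Tokens, context: Tokens) -> Tokens:
    """Scaled dot-product attention: each query attends over all context tokens.

    Keys and values are the context tokens themselves (no learned projections),
    which keeps the sketch dependency-free.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, context)) for c in range(d)])
    return out


def interact(video: Tokens, audio: Tokens,
             gate_a2v: float, gate_v2a: float) -> Tuple[Tokens, Tokens]:
    """One hypothetical interaction step between a video and an audio stream.

    Each direction has its own gate, so one modality can influence the other
    more strongly than the reverse. Both updates read the *input* streams,
    so the order of the two directions does not matter.
    """
    a2v = cross_attend(video, audio)  # video queries, audio context
    v2a = cross_attend(audio, video)  # audio queries, video context
    new_video = [[x + gate_a2v * y for x, y in zip(vt, ut)] for vt, ut in zip(video, a2v)]
    new_audio = [[x + gate_v2a * y for x, y in zip(at, ut)] for at, ut in zip(audio, v2a)]
    return new_video, new_audio


# Usage: 3 video tokens, 2 identical audio tokens; only audio -> video is enabled.
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.5], [0.5, 0.5]]
new_video, new_audio = interact(video, audio, gate_a2v=1.0, gate_v2a=0.0)
print(new_video)  # → [[1.5, 0.5], [0.5, 1.5], [1.5, 1.5]]
print(new_audio)  # → [[0.5, 0.5], [0.5, 0.5]]
```

Keeping the two directions as separate calls with separate gates is what allows the asymmetry; a practical model would typically add learned query, key, and value projections and repeat this step inside each transformer block.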

🔹 Publication Date: Published on Nov 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03334
• PDF: https://arxiv.org/pdf/2511.03334
• Project Page: https://mcg-nju.github.io/UniAVGen/
• Github: https://mcg-nju.github.io/UniAVGen/

==================================

For more data science resources:
https://t.me/DataScienceT

#GenerativeAI #AudioVideoGeneration #DiffusionModels #CrossModalAI #DeepLearning

Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs

📝 Summary:
New benchmarks reveal that multimodal large language models (MLLMs) suffer from cross-modal inconsistency: given the same information as an image, as text, or as a mix of both, they fail to reason consistently across these modalities. Visual characteristics such as color and resolution significantly affect performance, even when text recognition is perfect. This hi...
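The summary does not say how inconsistency is scored, so here is one plausible way to quantify it, sketched in Python under stated assumptions. Each benchmark item records the model's answer to the same question posed as text, as an image, and as a mix, and answers are compared by normalized exact match. The modality labels, the `normalize` rule, and the sample answers are hypothetical, and the sketch does not model the visual factors (color, resolution) mentioned above.

```python
# Hypothetical consistency scoring; not the benchmark's actual metric.
from itertools import combinations
from typing import Dict, List

# Each item maps a modality label to the model's answer for the same question.
# The labels and the exact-match comparison are illustrative assumptions.
Item = Dict[str, str]


def normalize(answer: str) -> str:
    # Compare answers case-insensitively and ignore extra whitespace.
    return " ".join(answer.lower().split())


def full_consistency(items: List[Item]) -> float:
    """Fraction of items where the model gives the same answer in every modality."""
    consistent = sum(1 for item in items if len({normalize(a) for a in item.values()}) == 1)
    return consistent / len(items)


def pairwise_agreement(items: List[Item], m1: str, m2: str) -> float:
    """Fraction of items where the answers under modalities m1 and m2 match."""
    agree = sum(1 for item in items if normalize(item[m1]) == normalize(item[m2]))
    return agree / len(items)


# Usage with made-up answers for four items.
items = [
    {"text": "Paris", "image": "paris", "mixed": "Paris"},
    {"text": "42", "image": "24", "mixed": "42"},
    {"text": "blue", "image": "Blue ", "mixed": "red"},
    {"text": "yes", "image": "yes", "mixed": "yes"},
]
print(full_consistency(items))  # → 0.5
for m1, m2 in combinations(["text", "image", "mixed"], 2):
    print(m1, m2, pairwise_agreement(items, m1, m2))
# → text image 0.75
# → text mixed 0.75
# → image mixed 0.5
```

Reporting pairwise agreement alongside full consistency shows which modality pair drives the disagreement, which a single aggregate score would hide.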

🔹 Publication Date: Published on Dec 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.08923
• PDF: https://arxiv.org/pdf/2512.08923

#MLLMs #CrossModalAI #AIResearch #ComputerVision #NLP