✨UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions
📝 Summary:
UniAVGen uses dual Diffusion Transformers and Asymmetric Cross-Modal Interaction for unified audio-video generation. The framework is designed for precise spatiotemporal synchronization and semantic consistency between audio and video. It is reported to outperform existing methods in audio-video synchronization and consistency while using far fewer training samples.
🔹 Publication Date: Nov 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03334
• PDF: https://arxiv.org/pdf/2511.03334
• Project Page: https://mcg-nju.github.io/UniAVGen/
• Github: https://mcg-nju.github.io/UniAVGen/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#GenerativeAI #AudioVideoGeneration #DiffusionModels #CrossModalAI #DeepLearning
✨Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs
📝 Summary:
New benchmarks reveal MLLMs struggle with cross-modal inconsistency, failing to reason consistently across image, text, and mixed modalities with the same information. Visual characteristics like color and resolution significantly impact performance, even when text recognition is perfect. This hi...
🔹 Publication Date: Dec 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.08923
• PDF: https://arxiv.org/pdf/2512.08923
#MLLMs #CrossModalAI #AIResearch #ComputerVision #NLP