✨Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment
📝 Summary:
The SANTA framework targets object and action hallucinations in video captions produced by multimodal LLMs. It uses self-augmented contrastive alignment to identify likely hallucinations, then aligns regional objects and actions with visual phrases to improve factual accuracy. Experiments show SANTA o...
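The paper's exact objective is not reproduced here, but the core idea of contrastively aligning region features with phrase embeddings can be sketched with a standard symmetric InfoNCE loss. The function name `info_nce` and the use of plain NumPy are illustrative assumptions, not SANTA's implementation:

```python
import numpy as np

def info_nce(region_feats, phrase_feats, temperature=0.07):
    """Symmetric InfoNCE-style alignment loss (illustrative sketch).
    Row i of region_feats and row i of phrase_feats form a positive pair;
    every other cross pairing serves as a negative."""
    # L2-normalize so dot products become cosine similarities
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    p = phrase_feats / np.linalg.norm(phrase_feats, axis=1, keepdims=True)
    logits = (r @ p.T) / temperature  # (N, N) similarity matrix

    def xent(l):
        # Cross-entropy with the diagonal (matched pairs) as the target class
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average region-to-phrase and phrase-to-region directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing such a loss pulls each region embedding toward its matching phrase while pushing it away from unmatched phrases, which is the general mechanism the summary's "alignment" refers to.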
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04356
• PDF: https://arxiv.org/pdf/2512.04356
• Project Page: https://kpc0810.github.io/santa/
• Github: https://kpc0810.github.io/santa/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MultimodalLLMs #AI #Hallucinations #VideoUnderstanding #ContrastiveLearning
✨CPPO: Contrastive Perception for Vision Language Policy Optimization
📝 Summary:
CPPO improves vision-language model fine-tuning by detecting perception tokens through entropy shifts. It then applies a Contrastive Perception Loss to enhance multimodal reasoning, outperforming prior methods more efficiently.
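CPPO's own detection procedure is not specified in this summary; the entropy-shift idea can still be illustrated with a hedged sketch. Assumed here (not taken from the paper): the helper names, a fixed threshold, and that we compare per-token next-token distributions with and without the image:

```python
import numpy as np

def token_entropy(probs, eps=1e-12):
    """Shannon entropy of each token's next-token distribution, shape (T, V)."""
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def find_perception_tokens(p_with_image, p_without_image, threshold=0.5):
    """Illustrative sketch: flag positions whose predictive entropy drops
    sharply once the image is provided. Such tokens plausibly depend on
    visual perception rather than on language priors alone."""
    shift = token_entropy(p_without_image) - token_entropy(p_with_image)
    return np.where(shift > threshold)[0]
```

A token the model predicts confidently only when it can see the image shows a large entropy drop and gets flagged; tokens predicted equally well from text alone do not.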
🔹 Publication Date: Published on Jan 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00501
• PDF: https://arxiv.org/pdf/2601.00501
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModels #MultimodalAI #ContrastiveLearning #DeepLearning #AIResearch