✨Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment
📝 Summary:
The SANTA framework targets object and action hallucinations in video captions produced by multimodal LLMs. It uses self-augmented contrastive alignment to identify likely hallucinations, then aligns regional objects and actions with visual phrases to improve factual accuracy. Experiments show SANTA o...
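The paper's exact objective is not reproduced here, but the core idea of contrastively aligning region features with phrase embeddings can be sketched with a standard symmetric InfoNCE loss. The function name `info_nce` and the use of plain NumPy are illustrative assumptions, not SANTA's implementation:

```python
import numpy as np

def info_nce(region_feats, phrase_feats, temperature=0.07):
    """Symmetric InfoNCE-style alignment loss (illustrative sketch).
    Row i of region_feats and row i of phrase_feats form a positive pair;
    every other cross pairing serves as a negative."""
    # L2-normalize so dot products become cosine similarities
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    p = phrase_feats / np.linalg.norm(phrase_feats, axis=1, keepdims=True)
    logits = (r @ p.T) / temperature  # (N, N) similarity matrix

    def xent(l):
        # Cross-entropy with the diagonal (matched pairs) as the target class
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average region-to-phrase and phrase-to-region directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing such a loss pulls each region embedding toward its matching phrase while pushing it away from unmatched phrases, which is the general mechanism the summary's "alignment" refers to.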
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04356
• PDF: https://arxiv.org/pdf/2512.04356
• Project Page: https://kpc0810.github.io/santa/
• Github: https://kpc0810.github.io/santa/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MultimodalLLMs #AI #Hallucinations #VideoUnderstanding #ContrastiveLearning
✨CPPO: Contrastive Perception for Vision Language Policy Optimization
📝 Summary:
CPPO improves vision-language model fine-tuning by detecting perception tokens through entropy shifts. It then applies a Contrastive Perception Loss to enhance multimodal reasoning, outperforming prior methods more efficiently.
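CPPO's own detection procedure is not specified in this summary; the entropy-shift idea can still be illustrated with a hedged sketch. Assumed here (not taken from the paper): the helper names, a fixed threshold, and that we compare per-token next-token distributions with and without the image:

```python
import numpy as np

def token_entropy(probs, eps=1e-12):
    """Shannon entropy of each token's next-token distribution, shape (T, V)."""
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def find_perception_tokens(p_with_image, p_without_image, threshold=0.5):
    """Illustrative sketch: flag positions whose predictive entropy drops
    sharply once the image is provided. Such tokens plausibly depend on
    visual perception rather than on language priors alone."""
    shift = token_entropy(p_without_image) - token_entropy(p_with_image)
    return np.where(shift > threshold)[0]
```

A token the model predicts confidently only when it can see the image shows a large entropy drop and gets flagged; tokens predicted equally well from text alone do not.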
🔹 Publication Date: Published on Jan 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00501
• PDF: https://arxiv.org/pdf/2601.00501
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModels #MultimodalAI #ContrastiveLearning #DeepLearning #AIResearch