ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.
Title: When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

📝 Summary:
MIRA is a new benchmark for evaluating models that use intermediate visual images to enhance reasoning. It includes 546 multimodal problems that require models to generate and utilize visual cues. Experiments show that models achieve a 33.7% performance gain with visual cues compared to text-only prompts...
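As a quick illustration of how such a gain might be computed, assuming it is a relative improvement over the text-only baseline (the scores below are hypothetical, not MIRA's reported numbers):

```python
def relative_gain(score_with_cues: float, score_text_only: float) -> float:
    """Relative performance gain of visual-cue prompting over a text-only baseline."""
    return (score_with_cues - score_text_only) / score_text_only

# Hypothetical accuracies, for illustration only
baseline = 0.40
with_cues = 0.5348
print(f"{relative_gain(with_cues, baseline):.1%}")  # prints 33.7%
```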

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02779
• PDF: https://arxiv.org/pdf/2511.02779

==================================

For more data science resources:
https://t.me/DataScienceT

#VisualReasoning #ChainOfThought #MultimodalAI #AIBenchmark #ComputerVision
Title: TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

📝 Summary:
TiViBench is a new benchmark that assesses the reasoning of image-to-video models across four dimensions and 24 tasks. Commercial models show stronger reasoning potential. VideoTPO, a test-time strategy, significantly enhances performance, advancing reasoning in video generation.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13704
• PDF: https://arxiv.org/pdf/2511.13704
• Project Page: https://haroldchen19.github.io/TiViBench-Page/
• Github: https://haroldchen19.github.io/TiViBench-Page/

==================================


#VideoGeneration #AIBenchmark #ComputerVision #DeepLearning #AIResearch
Title: Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image

📝 Summary:
MMRB2 is a new benchmark for multimodal reward models, evaluating them on interleaved image and text tasks using 4,000 expert-annotated preferences. It shows top models like Gemini 3 Pro achieve 75-80% accuracy, still below human performance, highlighting areas for improvement in these models.
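A reward-model benchmark of this kind typically scores a model by how often it assigns the expert-chosen response a higher reward than the rejected one. A minimal sketch, with a toy stand-in scoring function (all names here are hypothetical, not MMRB2's API):

```python
def preference_accuracy(pairs, reward_fn):
    """Fraction of (chosen, rejected) pairs where the reward model
    assigns the chosen response a strictly higher score."""
    correct = sum(reward_fn(chosen) > reward_fn(rejected) for chosen, rejected in pairs)
    return correct / len(pairs)

# Toy stand-in reward: longer responses score higher (illustration only)
toy_reward = len
pairs = [
    ("a detailed answer", "ok"),
    ("thorough", "hi"),
    ("x", "a longer reply"),
]
print(preference_accuracy(pairs, toy_reward))  # 2 of 3 pairs correct, ~0.67
```

On a benchmark like MMRB2, `pairs` would be the expert-annotated preferences and `reward_fn` the multimodal reward model under evaluation.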

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16899
• PDF: https://arxiv.org/pdf/2512.16899
• Github: https://github.com/facebookresearch/MMRB2/tree/main

==================================


#MultimodalAI #RewardModels #AIBenchmark #MachineLearning #AIResearch
Title: A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

📝 Summary:
This paper introduces LongShOTBench, a diagnostic benchmark for long-form multimodal video understanding with open-ended questions and agentic tool use. It also presents LongShOTAgent, an agentic system for video analysis. Results show state-of-the-art models struggle significantly, highlighting ...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16978
• PDF: https://arxiv.org/pdf/2512.16978
• Project Page: https://mbzuai-oryx.github.io/LongShOT/
• Github: https://github.com/mbzuai-oryx/longshot

Datasets citing this paper:
https://huggingface.co/datasets/MBZUAI/longshot-bench

==================================


#VideoAI #MultimodalAI #AgenticAI #AIBenchmark #AIResearch