ML Research Hub

✨ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

📝 Summary:
ARM-Thinker is an agentic reward model that uses external tools like image cropping and document retrieval to verify judgments in multimodal reasoning tasks. This significantly improves accuracy, interpretability, and visual grounding compared to existing reward models, achieving substantial perf...

🔹 Publication Date: Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05111
• PDF: https://arxiv.org/pdf/2512.05111
• Project Page: https://github.com/InternLM/ARM-Thinker
• GitHub: https://github.com/open-compass/VLMEvalKit/pull/1334

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#MultimodalAI #AgenticAI #RewardModels #VisualReasoning #AIResearch


✨Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image

📝 Summary:
MMRB2 is a new benchmark for multimodal reward models that evaluates them on interleaved image and text tasks using 4,000 expert-annotated preferences. The strongest models, such as Gemini 3 Pro, reach 75-80% accuracy, which is still below human performance and leaves clear room for improvement.
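
As a rough illustration of what "accuracy on annotated preferences" usually means for a reward model, here is a minimal sketch of pairwise preference accuracy. It is not the MMRB2 evaluation code: the tuple layout, the `pairwise_accuracy` helper, and the `toy_score` function are all hypothetical and text-only, chosen just to show the metric.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical record layout: (prompt, response_a, response_b, human_choice),
# where human_choice is "a" or "b". Real benchmark items would also carry images.
PreferencePair = Tuple[str, str, str, str]


def pairwise_accuracy(
    pairs: Sequence[PreferencePair],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the higher-scored response matches the human choice."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    correct = 0
    for prompt, resp_a, resp_b, human_choice in pairs:
        # The reward model "prefers" whichever response it scores higher
        # (ties go to response A in this sketch).
        model_choice = "a" if score(prompt, resp_a) >= score(prompt, resp_b) else "b"
        correct += model_choice == human_choice
    return correct / len(pairs)


def toy_score(prompt: str, response: str) -> float:
    # Stand-in reward model (assumption): longer responses score higher.
    return float(len(response))


toy_pairs = [
    ("q1", "short", "a longer answer", "b"),
    ("q2", "detailed reply", "ok", "a"),
    ("q3", "yes", "no, because ...", "a"),
    ("q4", "fine", "great answer here", "b"),
]

print(pairwise_accuracy(toy_pairs, toy_score))  # 0.75
```

Random guessing on such pairs gives about 50%, which is the baseline against which a 75-80% score should be read.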

🔹 Publication Date: Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16899
• PDF: https://arxiv.org/pdf/2512.16899
• GitHub: https://github.com/facebookresearch/MMRB2/tree/main

==================================

#MultimodalAI #RewardModels #AIbenchmark #MachineLearning #AIResearch


✨SWE-RM: Execution-free Feedback For Software Engineering Agents

📝 Summary:
This paper introduces SWE-RM, a robust, execution-free reward model for software engineering agents. It overcomes limitations of execution-based feedback, improving coding agent performance in both test-time scaling and reinforcement learning. SWE-RM achieves new state-of-the-art results for open...

🔹 Publication Date: Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21919
• PDF: https://arxiv.org/pdf/2512.21919

==================================

#SoftwareEngineering #AI #ReinforcementLearning #CodingAgents #RewardModels
