ML Research Hub
32.7K subscribers
4.09K photos
237 videos
23 files
4.41K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Download Telegram
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

📝 Summary:
ARM-Thinker is an agentic reward model that uses external tools like image cropping and document retrieval to verify judgments in multimodal reasoning tasks. This significantly improves accuracy, interpretability, and visual grounding compared to existing reward models, achieving substantial perf...

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05111
• PDF: https://arxiv.org/pdf/2512.05111
• Project Page: https://github.com/InternLM/ARM-Thinker
• Github: https://github.com/open-compass/VLMEvalKit/pull/1334

==================================

For more data science resources:
https://t.me/DataScienceT

#MultimodalAI #AgenticAI #RewardModels #VisualReasoning #AIResearch
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image

📝 Summary:
MMRB2 is a new benchmark for multimodal reward models, evaluating them on interleaved image and text tasks using 4,000 expert-annotated preferences. It shows top models like Gemini 3 Pro achieve 75-80% accuracy, still below human performance, highlighting areas for improvement in these models.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16899
• PDF: https://arxiv.org/pdf/2512.16899
• Github: https://github.com/facebookresearch/MMRB2/tree/main

==================================

For more data science resources:
https://t.me/DataScienceT

#MultimodalAI #RewardModels #AIbenchmark #MachineLearning #AIResearch
1
SWE-RM: Execution-free Feedback For Software Engineering Agents

📝 Summary:
This paper introduces SWE-RM, a robust, execution-free reward model for software engineering agents. It overcomes limitations of execution-based feedback, improving coding agent performance in both test-time scaling and reinforcement learning. SWE-RM achieves new state-of-the-art results for open...

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21919
• PDF: https://arxiv.org/pdf/2512.21919

==================================

For more data science resources:
https://t.me/DataScienceT

#SoftwareEngineering #AI #ReinforcementLearning #CodingAgents #RewardModels
1