ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

📝 Summary:
MIND-V generates long-horizon, physically plausible robotic manipulation videos. This hierarchical framework uses semantic reasoning and an RL-based physical alignment strategy to synthesize robust, coherent actions, addressing data scarcity.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06628
• PDF: https://arxiv.org/pdf/2512.06628
• Project Page: https://github.com/Richard-Zhang-AI/MIND-V
• Github: https://github.com/Richard-Zhang-AI/MIND-V

==================================

For more data science resources:
https://t.me/DataScienceT

#Robotics #VideoGeneration #ReinforcementLearning #AI #MachineLearning
MOA: Multi-Objective Alignment for Role-Playing Agents

📝 Summary:
MOA is a reinforcement-learning framework for role-playing agents that uses multi-objective optimization and thought-augmented rollout. It simultaneously improves multiple skills like domain knowledge and linguistic style, addressing limitations of prior methods. MOA outperforms strong baselines,...

🔹 Publication Date: Published on Dec 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.09756
• PDF: https://arxiv.org/pdf/2512.09756

#AI #ReinforcementLearning #MultiObjectiveOptimization #RolePlayingAgents #MachineLearning
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

📝 Summary:
MiMo-7B is a 7B LLM optimized for reasoning through pre-training with data mixing and Multi-Token Prediction. Post-training uses reinforcement learning on math and programming problems. This approach enables MiMo-7B to achieve superior reasoning performance, outperforming larger models and OpenAI...

🔹 Publication Date: Published on May 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.07608
• PDF: https://arxiv.org/pdf/2505.07608
• Github: https://github.com/XiaomiMiMo/MiMo

🔹 Models citing this paper:
https://huggingface.co/XiaomiMiMo/MiMo-7B-RL
https://huggingface.co/XiaomiMiMo/MiMo-7B-Base
https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-0530

Spaces citing this paper:
https://huggingface.co/spaces/ISEEKYAN/megatron_memory_estimator
https://huggingface.co/spaces/ISEEKYAN/megatron_memory_estimator_old
https://huggingface.co/spaces/sizzlebop/ZeroGPU-LLM-Inference

#LLM #AI #ReinforcementLearning #MachineLearning #Reasoning
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

📝 Summary:
JustRL uses a minimal single-stage RL approach with fixed hyperparameters to achieve state-of-the-art performance on 1.5B reasoning models. It uses less compute and shows stable training, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649

#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning

📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household envir...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Project Page: https://hybridrobotics.github.io/MomaGraph/

#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

📝 Summary:
Seed-Prover 1.5 is a formal theorem-proving model that uses agentic reinforcement learning and an efficient scaling workflow. It achieves superior performance in solving undergraduate, graduate, and PhD-level math problems with reduced computational resources. This demonstrates the potential of l...

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17260
• PDF: https://arxiv.org/pdf/2512.17260
• Github: https://github.com/ByteDance-Seed/Seed-Prover

#TheoremProving #ReinforcementLearning #AI #Mathematics #AI4Math
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

📝 Summary:
Turn-PPO improves multi-turn reinforcement learning for LLM agents by using a turn-level MDP for advantage estimation. This PPO variant outperforms GRPO and standard PPO, addressing limitations in long-horizon reasoning. It demonstrates effectiveness on WebShop and Sokoban datasets.
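
As a rough illustration of turn-level advantage estimation, the sketch below applies standard generalized advantage estimation (GAE) with one step per turn instead of one step per token. The function name, the default `gamma` and `lam` values, and the assumption of one scalar reward and one value estimate per turn are all hypothetical; this is not the paper's implementation.

```python
from typing import List


def turn_level_gae(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,  # hypothetical default, not taken from the paper
    lam: float = 0.95,    # hypothetical default, not taken from the paper
) -> List[float]:
    """Standard GAE where each step is one whole turn of the episode.

    rewards[t] is the scalar reward received after turn t.
    values[t] is a critic's value estimate at the start of turn t.
    The episode is assumed to end after the last turn (bootstrap value 0).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have one entry per turn")

    advantages = [0.0] * len(rewards)
    next_value = 0.0  # value after the final turn
    running = 0.0     # discounted sum of future TD errors
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # turn-level TD error
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Three turns, reward only at the end, an untrained (all-zero) critic.
print(turn_level_gae([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))
# → [1.0, 1.0, 1.0]
```

In this sketch every token generated within a turn would share that turn's advantage, which is one plausible way to connect a turn-level estimate to a token-level policy update.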

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17008
• PDF: https://arxiv.org/pdf/2512.17008

#LLM #ReinforcementLearning #AI #MachineLearning #AgenticAI
Meta-RL Induces Exploration in Language Agents

📝 Summary:
LaMer, a Meta-RL framework, enhances LLM agents' exploration and adaptation in RL tasks. It significantly improves their performance and generalization across diverse environments, demonstrating Meta-RL's effectiveness for robust adaptation in language agents.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16848
• PDF: https://arxiv.org/pdf/2512.16848

#MetaRL #LLMAgents #ReinforcementLearning #NLP #AI
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

📝 Summary:
Memory-T1 is an RL framework improving temporal reasoning in long dialogues by selecting relevant sessions. It uses rewards for accuracy, evidence, and temporal consistency to achieve state-of-the-art performance on Time-Dialog and robustness to extensive histories.
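
One simple way to picture a reward built from accuracy, evidence, and temporal consistency is a weighted sum of three scores. The sketch below assumes each score is already computed and lies in [0, 1]; the weights, the class name, and the linear combination itself are hypothetical choices for illustration, and the paper may combine its rewards differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the summary does not state how the terms are balanced.
    accuracy: float = 1.0
    evidence: float = 0.5
    temporal: float = 0.5


def combined_reward(
    accuracy: float,
    evidence: float,
    temporal: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of three per-response scores, each assumed to be in [0, 1]."""
    for name, score in (("accuracy", accuracy), ("evidence", evidence), ("temporal", temporal)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {score}")
    return (
        weights.accuracy * accuracy
        + weights.evidence * evidence
        + weights.temporal * temporal
    )


# Correct answer, wrong sessions cited, temporally consistent.
print(combined_reward(accuracy=1.0, evidence=0.0, temporal=1.0))
# → 1.5
```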

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20092
• PDF: https://arxiv.org/pdf/2512.20092
• Github: https://github.com/Elvin-Yiming-Du/Memory-T1/

#ReinforcementLearning #TemporalReasoning #NLP #DialogueSystems #AI
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

📝 Summary:
Autoregressive (AR) models face inefficient exploration and sparse rewards in RL. Internal RL uses a higher-order model to learn temporal abstraction controllers. This enables efficient learning from sparse rewards where standard RL fails.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20605
• PDF: https://arxiv.org/pdf/2512.20605

#ReinforcementLearning #HierarchicalRL #AutoregressiveModels #MachineLearning #ArtificialIntelligence
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

📝 Summary:
MAI-UI introduces a family of foundation GUI agents tackling real-world deployment challenges. It uses a self-evolving data pipeline, device-cloud collaboration, and online RL to set a new state of the art in GUI grounding and mobile navigation, significantly boosting performance and privacy.

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22047
• PDF: https://arxiv.org/pdf/2512.22047

#GUIAgents #AI #ReinforcementLearning #MobileTech #HCI
SWE-RM: Execution-free Feedback For Software Engineering Agents

📝 Summary:
This paper introduces SWE-RM, a robust, execution-free reward model for software engineering agents. It overcomes limitations of execution-based feedback, improving coding agent performance in both test-time scaling and reinforcement learning. SWE-RM achieves new state-of-the-art results for open...

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21919
• PDF: https://arxiv.org/pdf/2512.21919

#SoftwareEngineering #AI #ReinforcementLearning #CodingAgents #RewardModels
Act2Goal: From World Model To General Goal-conditioned Policy

📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute precisely. This allows strong generalization and rapid online adaptation, significantly boosting real-robo...

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/

#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

📝 Summary:
Youtu-Agent scales LLM agent productivity, automating generation and enabling continuous evolution. Its hybrid optimization, using in-context learning and scalable reinforcement learning, yields top performance and boosted capabilities.

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24615
• PDF: https://arxiv.org/pdf/2512.24615
• Project Page: https://tencentcloudadp.github.io/youtu-agent/
• Github: https://github.com/TencentCloudADP/youtu-tip

#LLM #AIAgents #ReinforcementLearning #MachineLearning #AI
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

📝 Summary:
SenseNova-MARS empowers Vision-Language Models with interleaved visual reasoning and dynamic tool use like search and cropping via reinforcement learning. It achieves state-of-the-art performance on complex visual tasks, outperforming proprietary models on new and existing benchmarks.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24330
• PDF: https://arxiv.org/pdf/2512.24330
• Github: https://github.com/OpenSenseNova/SenseNova-MARS

Datasets citing this paper:
https://huggingface.co/datasets/sensenova/SenseNova-MARS-Data
https://huggingface.co/datasets/sensenova/HR-MMSearch

#MultimodalAI #ReinforcementLearning #VisionLanguageModels #AgenticAI #ComputerVision
Diversity or Precision? A Deep Dive into Next Token Prediction

📝 Summary:
This paper proposes a pre-training objective that reshapes the token-output distribution for better RL exploration. It uses reward shaping to balance diversity and precision in next-token prediction. Contrary to intuition, a precision-oriented prior yields a superior exploration spac...

🔹 Publication Date: Published on Dec 28, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22955
• PDF: https://arxiv.org/pdf/2512.22955

#NextTokenPrediction #ReinforcementLearning #LLM #NLP #AIResearch
Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning

📝 Summary:
This paper addresses Preference Mode Collapse (PMC) in text-to-image diffusion models, where models lose diversity despite high reward scores. It introduces D^2-Align, a framework that mitigates PMC by directionally correcting the reward signal during optimization. This novel approach maintains gen...

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24146
• PDF: https://arxiv.org/pdf/2512.24146

#DiffusionModels #ReinforcementLearning #GenerativeAI #MachineLearning #AIResearch
Unified Thinker: A General Reasoning Modular Core for Image Generation

📝 Summary:
Unified Thinker introduces a modular reasoning core for image generation, decoupling a Thinker from the generator. It uses reinforcement learning to optimize visual correctness, substantially improving image reasoning and generation quality.

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03127
• PDF: https://arxiv.org/pdf/2601.03127

#ImageGeneration #AIResearch #ReinforcementLearning #DeepLearning #GenerativeAI
RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

📝 Summary:
RL-AWB is a novel framework combining statistical methods with deep reinforcement learning for improved nighttime auto white balance. It is the first RL approach for color constancy, mimicking expert tuning. This method shows superior generalization across various lighting conditions, and a new m...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05249
• PDF: https://arxiv.org/pdf/2601.05249
• Project Page: https://ntuneillee.github.io/research/rl-awb/

#ReinforcementLearning #ComputerVision #ImageProcessing #AutoWhiteBalance #LowLightImaging
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

📝 Summary:
This paper demonstrates extreme data efficiency in RL for LLMs. Training on a single, carefully designed sample, an approach called polymath learning, significantly enhances multidisciplinary reasoning, outperforming traditional methods that rely on large datasets. The findings suggest sample quality and design...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03111
• PDF: https://arxiv.org/pdf/2601.03111

#ReinforcementLearning #LLMs #DataEfficiency #AI #DeepLearning