ML Research Hub – Telegram

ML Research Hub

32.8K subscribers

4.21K photos

253 videos

23 files

4.54K links

Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho

Download Telegram

About

Blog

Apps

Platform

ML Research Hub

32.8K subscribers

ML Research Hub

✨MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

📝 Summary:
MIND-V generates long-horizon, physically plausible robotic manipulation videos. This hierarchical framework uses semantic reasoning and an RL-based physical alignment strategy to synthesize robust, coherent actions, addressing data scarcity.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06628
• PDF: https://arxiv.org/pdf/2512.06628
• Project Page: https://github.com/Richard-Zhang-AI/MIND-V
• Github: https://github.com/Richard-Zhang-AI/MIND-V

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#Robotics #VideoGeneration #ReinforcementLearning #AI #MachineLearning

260 views09:03

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨MOA: Multi-Objective Alignment for Role-Playing Agents

📝 Summary:
MOA is a reinforcement-learning framework for role-playing agents that uses multi-objective optimization and thought-augmented rollout. It simultaneously improves multiple skills like domain knowledge and linguistic style, addressing limitations of prior methods. MOA outperforms strong baselines,...

🔹 Publication Date: Published on Dec 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.09756
• PDF: https://arxiv.org/pdf/2512.09756

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#AI #ReinforcementLearning #MultiObjectiveOptimization #RolePlayingAgents #MachineLearning

❤1

324 views08:03

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

📝 Summary:
MiMo-7B is a 7B LLM optimized for reasoning through pre-training with data mixing and Multi-Token Prediction. Post-training uses reinforcement learning on math and programming problems. This approach enables MiMo-7B to achieve superior reasoning performance, outperforming larger models and OpenAI...

🔹 Publication Date: Published on May 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.07608
• PDF: https://arxiv.org/pdf/2505.07608
• Github: https://github.com/XiaomiMiMo/MiMo

🔹 Models citing this paper:
• https://huggingface.co/XiaomiMiMo/MiMo-7B-RL
• https://huggingface.co/XiaomiMiMo/MiMo-7B-Base
• https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-0530

✨ Spaces citing this paper:
• https://huggingface.co/spaces/ISEEKYAN/megatron_memory_estimator
• https://huggingface.co/spaces/ISEEKYAN/megatron_memory_estimator_old
• https://huggingface.co/spaces/sizzlebop/ZeroGPU-LLM-Inference

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#LLM #AI #ReinforcementLearning #MachineLearning #Reasoning

MiMo: Unlocking the Reasoning Potential of Language Model -- From...

We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing...

279 views12:05

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

📝 Summary:
JustRL uses a minimal single-stage RL approach with fixed hyperparameters to achieve state-of-the-art performance on 1.5B reasoning models. It uses less compute and shows stable training, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling

❤1

331 views12:06

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning

📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household envir...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Github: https://hybridrobotics.github.io/MomaGraph/

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics

❤2

402 views17:07

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

📝 Summary:
Seed-Prover 1.5 is a formal theorem-proving model that uses agentic reinforcement learning and an efficient scaling workflow. It achieves superior performance in solving undergraduate, graduate, and PhD-level math problems with reduced computational resources. This demonstrates the potential of l...

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17260
• PDF: https://arxiv.org/pdf/2512.17260
• Github: https://github.com/ByteDance-Seed/Seed-Prover

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#TheoremProving #ReinforcementLearning #AI #Mathematics #AI4Math

❤2

232 views03:00

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

📝 Summary:
Turn-PPO improves multi-turn reinforcement learning for LLM agents by using a turn-level MDP for advantage estimation. This PPO variant outperforms GRPO and standard PPO, addressing limitations in long-horizon reasoning. It demonstrates effectiveness on WebShop and Sokoban datasets.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17008
• PDF: https://arxiv.org/pdf/2512.17008

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#LLM #ReinforcementLearning #AI #MachineLearning #AgenticAI

❤1

195 views04:02

✨ Explore Data Science 📝 Write your paper

ML Research Hub

This media is not supported in your browser

VIEW IN TELEGRAM

✨Meta-RL Induces Exploration in Language Agents

📝 Summary:
LaMer, a Meta-RL framework, enhances LLM agents exploration and adaptation in RL tasks. It significantly improves their performance and generalization across diverse environments, proving Meta-RLs effectiveness for robust adaptation in language agents.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16848
• PDF: https://arxiv.org/pdf/2512.16848

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#MetaRL #LLMAgents #ReinforcementLearning #NLP #AI

382 views12:05

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

📝 Summary:
Memory-T1 is an RL framework improving temporal reasoning in long dialogues by selecting relevant sessions. It uses rewards for accuracy, evidence, and temporal consistency to achieve state-of-the-art performance on Time-Dialog and robustness to extensive histories.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20092
• PDF: https://arxiv.org/pdf/2512.20092
• Github: https://github.com/Elvin-Yiming-Du/Memory-T1/

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#ReinforcementLearning #TemporalReasoning #NLP #DialogueSystems #AI

❤1

436 views16:04

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

📝 Summary:
AR models face inefficient exploration and sparse rewards in RL. Internal RL uses a higher-order model to learn temporal abstraction controllers. This enables efficient learning from sparse rewards where standard RL fails.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20605
• PDF: https://arxiv.org/pdf/2512.20605

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#ReinforcementLearning #HierarchicalRL #AutoregressiveModels #MachineLearning #ArtificialIntelligence

❤2

473 views13:58

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

📝 Summary:
MAI-UI introduces a family of foundation GUI agents tackling real-world deployment challenges. It uses a self-evolving data pipeline, device-cloud collaboration, and online RL to set new state-of-the-art in GUI grounding and mobile navigation, significantly boosting performance and privacy.

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22047
• PDF: https://arxiv.org/pdf/2512.22047

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#GUIAgents #AI #ReinforcementLearning #MobileTech #HCI

❤2

248 views03:00

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨SWE-RM: Execution-free Feedback For Software Engineering Agents

📝 Summary:
This paper introduces SWE-RM, a robust, execution-free reward model for software engineering agents. It overcomes limitations of execution-based feedback, improving coding agent performance in both test-time scaling and reinforcement learning. SWE-RM achieves new state-of-the-art results for open...

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21919
• PDF: https://arxiv.org/pdf/2512.21919

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#SoftwareEngineering #AI #ReinforcementLearning #CodingAgents #RewardModels

❤1

275 views03:01

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Act2Goal: From World Model To General Goal-conditioned Policy

📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute precisely. This allows strong generalization and rapid online adaptation, significantly boosting real-robo...

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/
• Github: https://act2goal.github.io/

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning

121 views09:58

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

📝 Summary:
Youtu-Agent scales LLM agent productivity, automating generation and enabling continuous evolution. Its hybrid optimization, using in-context learning and scalable reinforcement learning, yields top performance and boosted capabilities.

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24615
• PDF: https://arxiv.org/pdf/2512.24615
• Project Page: https://tencentcloudadp.github.io/youtu-agent/
• Github: https://github.com/TencentCloudADP/youtu-tip

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#LLM #AIAgents #ReinforcementLearning #MachineLearning #AI

313 views03:00

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

📝 Summary:
SenseNova-MARS empowers Vision-Language Models with interleaved visual reasoning and dynamic tool use like search and cropping via reinforcement learning. It achieves state-of-the-art performance on complex visual tasks, outperforming proprietary models on new and existing benchmarks.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24330
• PDF: https://arxiv.org/pdf/2512.24330
• Github: https://github.com/OpenSenseNova/SenseNova-MARS

✨ Datasets citing this paper:
• https://huggingface.co/datasets/sensenova/SenseNova-MARS-Data
• https://huggingface.co/datasets/sensenova/HR-MMSearch

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#MultimodalAI #ReinforcementLearning #VisionLanguageModels #AgenticAI #ComputerVision

❤1

250 views03:00

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Diversity or Precision? A Deep Dive into Next Token Prediction

📝 Summary:
This paper proposes a pre-training objective that reshapes the token-output distribution for better RL exploration. It uses reward-shaping to balance diversity and precision in next-token prediction. Contrary to intuition, a precision-oriented prior surprisingly yields a superior exploration spac...

🔹 Publication Date: Published on Dec 28, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22955
• PDF: https://arxiv.org/pdf/2512.22955

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#NextTokenPrediction #ReinforcementLearning #LLM #NLP #AIResearch

❤1

474 views09:03

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning

📝 Summary:
This paper addresses Preference Mode Collapse PMC in text-to-image diffusion models, where models lose diversity despite high reward scores. It introduces D^2-Align, a framework that mitigates PMC by directionally correcting the reward signal during optimization. This novel approach maintains gen...

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24146
• PDF: https://arxiv.org/pdf/2512.24146

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#DiffusionModels #ReinforcementLearning #GenerativeAI #MachineLearning #AIResearch

239 views02:00

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Unified Thinker: A General Reasoning Modular Core for Image Generation

📝 Summary:
Unified Thinker introduces a modular reasoning core for image generation, decoupling a Thinker from the generator. It uses reinforcement learning to optimize visual correctness, substantially improving image reasoning and generation quality.

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03127
• PDF: https://arxiv.org/pdf/2601.03127

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#ImageGeneration #AIResearch #ReinforcementLearning #DeepLearning #GenerativeAI

❤2

314 views17:07

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

📝 Summary:
RL-AWB is a novel framework combining statistical methods with deep reinforcement learning for improved nighttime auto white balance. It is the first RL approach for color constancy, mimicking expert tuning. This method shows superior generalization across various lighting conditions, and a new m...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05249
• PDF: https://arxiv.org/pdf/2601.05249
• Project Page: https://ntuneillee.github.io/research/rl-awb/

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#ReinforcementLearning #ComputerVision #ImageProcessing #AutoWhiteBalance #LowLightImaging

❤2

181 views08:03

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

📝 Summary:
This paper demonstrates extreme data efficiency in RL for LLMs. A single, carefully designed training sample, called polymath learning, significantly enhances multidisciplinary reasoning, outperforming traditional methods that rely on large datasets. The findings suggest sample quality and design...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03111
• PDF: https://arxiv.org/pdf/2601.03111

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#ReinforcementLearning #LLMs #DataEfficiency #AI #DeepLearning

❤1

205 views14:06

✨ Explore Data Science 📝 Write your paper