✨On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral
📝 Summary:
GRPO in tool-integrated RL collapses due to Lazy Likelihood Displacement (LLD), a systematic drop in response likelihoods. LLDS regularization addresses this by preserving likelihoods, stabilizing training, preventing gradient explosion, and substantially improving performance.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04220
• PDF: https://arxiv.org/pdf/2512.04220
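💡 A rough sketch of the failure mode and fix described above (an illustration under stated assumptions, not the paper's implementation; the function name and penalty form are invented here): a GRPO-style update can be augmented with a term that penalizes drops in the likelihood of positively rewarded responses relative to a reference policy.
```python
# Illustrative sketch (not the paper's code): add a likelihood-preservation
# term to a GRPO-style objective so positively rewarded responses do not
# lose probability mass during training. All names here are hypothetical.
import torch

def grpo_loss_with_lld_penalty(logp_new, logp_ref, rewards, lam=0.1):
    """logp_new/logp_ref: (G,) summed log-probs of G sampled responses under
    the current and reference policies; rewards: (G,) scalar rewards."""
    # Group-relative advantages, as in GRPO: standardize within the group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Standard policy-gradient surrogate.
    pg = -(adv.detach() * logp_new).mean()
    # Likelihood-displacement penalty: punish drops in log-likelihood of
    # positively rewarded responses relative to the reference policy.
    pos = adv.detach() > 0
    displacement = torch.relu(logp_ref - logp_new)  # >0 only if likelihood fell
    penalty = displacement[pos].mean() if pos.any() else logp_new.new_zeros(())
    return pg + lam * penalty

# Tiny usage example with random numbers:
g = 8
logp_new = torch.randn(g, requires_grad=True)
logp_ref = torch.randn(g)
rewards = torch.randint(0, 2, (g,)).float()
loss = grpo_loss_with_lld_penalty(logp_new, logp_ref, rewards)
loss.backward()
```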
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #MachineLearning #AI #DeepLearning #AIResearch
✨ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
📝 Summary:
ReVSeg enhances video object segmentation by performing sequential reasoning within pretrained vision-language models, optimized with reinforcement learning. This achieves state-of-the-art results and provides interpretable reasoning.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02835
• PDF: https://arxiv.org/pdf/2512.02835
• Project Page: https://clementine24.github.io/ReVSeg/
• Github: https://github.com/Clementine24/ReVSeg
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoSegmentation #ReinforcementLearning #VisionLanguageModels #ComputerVision #DeepLearning
✨Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
📝 Summary:
This paper introduces Entropy Ratio Clipping (ERC) to stabilize reinforcement learning. ERC uses the entropy ratio between policies as a global metric, imposing constraints to address distributional shifts overlooked by PPO-Clip. Experiments show consistent performance improvements.
🔹 Publication Date: Published on Dec 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05591
• PDF: https://arxiv.org/pdf/2512.05591
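💡 A minimal sketch of the stated idea, assuming the entropy ratio H(pi_new)/H(pi_old) is penalized outside a trust band; the soft-clip form and the eps/beta hyperparameters are assumptions for illustration, not the paper's specification.
```python
# Hedged sketch of the ERC idea from the abstract: treat the ratio of policy
# entropies as a global statistic and softly penalize updates that push it
# outside a trust band. The exact constraint form is an assumption.
import torch
import torch.nn.functional as F

def entropy(logits):
    logp = F.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(-1).mean()  # mean token-level entropy

def erc_penalty(logits_new, logits_old, eps=0.2, beta=1.0):
    h_new = entropy(logits_new)
    h_old = entropy(logits_old).detach()
    ratio = h_new / (h_old + 1e-8)
    # Soft clip: zero inside [1-eps, 1+eps], quadratic outside.
    excess = torch.clamp(ratio - (1 + eps), min=0) + torch.clamp((1 - eps) - ratio, min=0)
    return beta * excess.pow(2)

# Added on top of a PPO-Clip surrogate loss:
logits_new = torch.randn(4, 16, 100, requires_grad=True)  # (batch, seq, vocab)
logits_old = torch.randn(4, 16, 100)
loss = erc_penalty(logits_new, logits_old)
loss.backward()
```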
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #MachineLearning #DeepLearning #AI #ERC
✨PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling
📝 Summary:
PaCo-RL is a reinforcement learning framework for consistent image generation. It introduces PaCo-Reward for human-aligned consistency evaluation and PaCo-GRPO for efficient RL optimization. The framework achieves state-of-the-art consistency with improved training efficiency.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04784
• PDF: https://arxiv.org/pdf/2512.04784
• Project Page: https://x-gengroup.github.io/HomePage_PaCo-RL/
🔹 Models citing this paper:
• https://huggingface.co/X-GenGroup/PaCo-Reward-7B
• https://huggingface.co/X-GenGroup/PaCo-Reward-7B-Lora
• https://huggingface.co/X-GenGroup/PaCo-FLUX.1-dev-Lora
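💡 To make "pairwise reward modeling" concrete, here is a generic Bradley-Terry-style sketch (not PaCo-Reward's actual architecture or API): a small reward model scores a (reference, candidate) pair for consistency, trained so the human-preferred candidate scores higher than the rejected one.
```python
# Minimal pairwise reward-model sketch for consistency; the class and its
# architecture are hypothetical stand-ins for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseConsistencyRM(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, ref_feat, cand_feat):
        # Score how consistent the candidate is with the reference.
        return self.scorer(torch.cat([ref_feat, cand_feat], dim=-1)).squeeze(-1)

def pairwise_loss(rm, ref, chosen, rejected):
    # Bradley-Terry: maximize P(chosen > rejected) = sigmoid(s_c - s_r).
    return -F.logsigmoid(rm(ref, chosen) - rm(ref, rejected)).mean()

rm = PairwiseConsistencyRM()
ref, chosen, rejected = (torch.randn(8, 512) for _ in range(3))
loss = pairwise_loss(rm, ref, chosen, rejected)
loss.backward()
```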
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #ImageGeneration #AI #DeepLearning #GenerativeAI
✨VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning
📝 Summary:
VG-Refiner improves visual reasoning by addressing unreliable tool outputs. It uses a two-stage think-rethink mechanism and a refinement reward to correct poor tool results, significantly improving accuracy and correction ability in referring and grounding tasks.
🔹 Publication Date: Published on Dec 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06373
• PDF: https://arxiv.org/pdf/2512.06373
• Github: https://github.com/VoyageWang/VG-Refiner
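💡 A toy sketch of a think-rethink refinement reward in the spirit of the summary; iou and refinement_reward are hypothetical placeholders, and the bonus scheme is an assumption, not the paper's reward.
```python
# Toy refinement reward: pay the IoU of the final box, plus a bonus only
# when the rethink stage actually improved on the initial tool output.

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def refinement_reward(initial_box, refined_box, gold_box, bonus=0.5):
    base = iou(refined_box, gold_box)
    # Extra credit only when rethinking improved grounding over the tool's box.
    improved = iou(refined_box, gold_box) > iou(initial_box, gold_box)
    return base + (bonus if improved else 0.0)

print(refinement_reward((0, 0, 4, 4), (1, 1, 5, 5), (1, 1, 5, 5)))  # 1.5
```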
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisualReasoning #ReinforcementLearning #ComputerVision #AIResearch #MachineLearning
✨GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
📝 Summary:
GLM-4.1V-Thinking is a vision-language model using a reasoning-centric training framework. It achieves state-of-the-art multimodal reasoning across various tasks like STEM and long document understanding. The model outperforms larger models and competes with closed-source systems like GPT-4o.
🔹 Publication Date: Published on Jul 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.01006
• PDF: https://arxiv.org/pdf/2507.01006
• Github: https://github.com/THUDM/GLM-4.1V-Thinking
🔹 Models citing this paper:
• https://huggingface.co/zai-org/GLM-4.1V-9B-Thinking
• https://huggingface.co/zai-org/GLM-4.5V
• https://huggingface.co/zai-org/GLM-4.6V-Flash
✨ Spaces citing this paper:
• https://huggingface.co/spaces/zai-org/GLM-4.1V-9B-Thinking-Demo
• https://huggingface.co/spaces/zai-org/GLM-4.1V-9B-Thinking-API-Demo
• https://huggingface.co/spaces/akhaliq/anycoder
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#GLM41VThinking #MultimodalAI #VisionLanguageModels #ReinforcementLearning #AIResearch
✨Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning
📝 Summary:
Reinforcement Learning enhances decoding-based regression by introducing sequence-level rewards. This overcomes token-level limitations, improving precision and generalization. It establishes a robust and accurate paradigm for numerical prediction.
🔹 Publication Date: Published on Dec 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06533
• PDF: https://arxiv.org/pdf/2512.06533
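💡 A small sketch of the sequence-level reward idea: score the fully decoded string by its numeric error instead of per-token cross-entropy. The parsing logic and reward shape are assumptions for illustration.
```python
# Sequence-level reward for decoding-based regression: the whole decoded
# number is judged at once, so near-misses and wild misses are separated.
import math

def sequence_reward(decoded: str, target: float) -> float:
    """Reward the decoded string by numeric closeness, not per token."""
    try:
        pred = float(decoded)
    except ValueError:
        return -1.0  # unparseable outputs get the worst reward
    # Bounded negative absolute error, so the reward lives in (0, 1].
    return math.exp(-abs(pred - target))

# Token-level CE would treat "3.15" and "9.99" as similar one-token
# mistakes against "3.14"; the sequence-level reward does not:
print(sequence_reward("3.15", 3.14))  # ~0.99, near-perfect
print(sequence_reward("9.99", 3.14))  # ~0.001, heavily penalized
```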
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #MachineLearning #Regression #DataScience #AI
✨MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment
📝 Summary:
MIND-V generates long-horizon, physically plausible robotic manipulation videos. This hierarchical framework uses semantic reasoning and an RL-based physical alignment strategy to synthesize robust, coherent actions, addressing data scarcity.
🔹 Publication Date: Published on Dec 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06628
• PDF: https://arxiv.org/pdf/2512.06628
• Github: https://github.com/Richard-Zhang-AI/MIND-V
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #VideoGeneration #ReinforcementLearning #AI #MachineLearning
✨MOA: Multi-Objective Alignment for Role-Playing Agents
📝 Summary:
MOA is a reinforcement-learning framework for role-playing agents that uses multi-objective optimization and thought-augmented rollout. It simultaneously improves multiple skills like domain knowledge and linguistic style, addressing limitations of prior methods. MOA outperforms strong baselines,...
🔹 Publication Date: Published on Dec 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.09756
• PDF: https://arxiv.org/pdf/2512.09756
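💡 To illustrate the multi-objective part (skill names, normalization, and weights are assumptions, not MOA's published recipe): per-skill rewards can be z-normalized within a batch and scalarized into one signal before the policy update.
```python
# Scalarizing multiple per-skill rewards into one training signal.
import statistics

def combine_rewards(per_skill: dict[str, list[float]],
                    weights: dict[str, float]) -> list[float]:
    """per_skill maps a skill name to one reward per rollout in the batch."""
    n = len(next(iter(per_skill.values())))
    combined = [0.0] * n
    for skill, rewards in per_skill.items():
        mu = statistics.mean(rewards)
        sigma = statistics.pstdev(rewards) or 1.0
        for i, r in enumerate(rewards):
            # z-normalize within the batch so no single skill dominates.
            combined[i] += weights[skill] * (r - mu) / sigma
    return combined

scores = combine_rewards(
    {"knowledge": [0.9, 0.2, 0.5], "style": [0.3, 0.8, 0.4]},
    {"knowledge": 0.6, "style": 0.4},
)
print(scores)
```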
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #ReinforcementLearning #MultiObjectiveOptimization #RolePlayingAgents #MachineLearning
✨MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
📝 Summary:
MiMo-7B is a 7B LLM optimized for reasoning through pre-training with data mixing and Multi-Token Prediction. Post-training uses reinforcement learning on math and programming problems. This approach enables MiMo-7B to achieve superior reasoning performance, outperforming larger models and OpenAI...
🔹 Publication Date: Published on May 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.07608
• PDF: https://arxiv.org/pdf/2505.07608
• Github: https://github.com/XiaomiMiMo/MiMo
🔹 Models citing this paper:
• https://huggingface.co/XiaomiMiMo/MiMo-7B-RL
• https://huggingface.co/XiaomiMiMo/MiMo-7B-Base
• https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-0530
✨ Spaces citing this paper:
• https://huggingface.co/spaces/ISEEKYAN/megatron_memory_estimator
• https://huggingface.co/spaces/ISEEKYAN/megatron_memory_estimator_old
• https://huggingface.co/spaces/sizzlebop/ZeroGPU-LLM-Inference
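💡 A hedged sketch of the Multi-Token Prediction objective mentioned above: auxiliary heads predict tokens several steps ahead to densify supervision. Head count, parameter sharing, and loss weighting here are assumptions, not MiMo's exact design.
```python
# Multi-Token Prediction sketch: extra linear heads over the same hidden
# states predict tokens 1, 2, and 3 steps ahead; losses are averaged.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTPHeads(nn.Module):
    def __init__(self, hidden=64, vocab=1000, horizons=(1, 2, 3)):
        super().__init__()
        self.horizons = horizons
        self.heads = nn.ModuleList(nn.Linear(hidden, vocab) for _ in horizons)

    def loss(self, h, tokens):
        """h: (B, T, hidden) hidden states; tokens: (B, T) target ids."""
        total = 0.0
        for head, k in zip(self.heads, self.horizons):
            logits = head(h[:, :-k])   # positions that can look k steps ahead
            target = tokens[:, k:]     # the token k steps in the future
            total = total + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), target.reshape(-1)
            )
        return total / len(self.horizons)

mtp = MTPHeads()
h = torch.randn(2, 16, 64)
tokens = torch.randint(0, 1000, (2, 16))
print(mtp.loss(h, tokens))
```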
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #AI #ReinforcementLearning #MachineLearning #Reasoning
✨JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
📝 Summary:
JustRL uses a minimal single-stage RL approach with fixed hyperparameters to achieve state-of-the-art performance on 1.5B reasoning models. It uses less compute and shows stable training, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling
✨MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning
📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household environments.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Project Page: https://hybridrobotics.github.io/MomaGraph/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics
✨Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience
📝 Summary:
Seed-Prover 1.5 is a formal theorem-proving model that uses agentic reinforcement learning and an efficient scaling workflow. It achieves superior performance in solving undergraduate, graduate, and PhD-level math problems with reduced computational resources, demonstrating the potential of learning from experience.
🔹 Publication Date: Published on Dec 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17260
• PDF: https://arxiv.org/pdf/2512.17260
• Github: https://github.com/ByteDance-Seed/Seed-Prover
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#TheoremProving #ReinforcementLearning #AI #Mathematics #AI4Math
✨Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
📝 Summary:
Turn-PPO improves multi-turn reinforcement learning for LLM agents by using a turn-level MDP for advantage estimation. This PPO variant outperforms GRPO and standard PPO, addressing limitations in long-horizon reasoning. It demonstrates effectiveness on WebShop and Sokoban datasets.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17008
• PDF: https://arxiv.org/pdf/2512.17008
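💡 A minimal sketch of turn-level advantage estimation (the general idea, not Turn-PPO's exact estimator): one reward and one value per dialogue turn, discounted returns computed over turns, and every token in a turn trained with its turn's advantage. The baseline choice and gamma are assumptions.
```python
# Turn-level advantages: each dialogue turn is one step of an MDP.

def turn_advantages(turn_rewards: list[float],
                    turn_values: list[float],
                    gamma: float = 0.99) -> list[float]:
    """One reward and one value estimate per turn; one advantage per turn."""
    advantages, ret = [], 0.0
    for r, v in zip(reversed(turn_rewards), reversed(turn_values)):
        ret = r + gamma * ret          # discounted return-to-go over turns
        advantages.append(ret - v)     # advantage vs. the turn-level baseline
    return advantages[::-1]

# Every token emitted in turn t is then trained with advantages[t], instead
# of one trajectory-level advantage shared by the whole multi-turn episode:
print(turn_advantages([0.0, 0.0, 1.0], [0.3, 0.5, 0.6]))
```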
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #ReinforcementLearning #AI #MachineLearning #AgenticAI
✨Meta-RL Induces Exploration in Language Agents
📝 Summary:
LaMer, a Meta-RL framework, enhances LLM agents' exploration and adaptation in RL tasks. It significantly improves their performance and generalization across diverse environments, demonstrating Meta-RL's effectiveness for robust adaptation in language agents.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16848
• PDF: https://arxiv.org/pdf/2512.16848
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MetaRL #LLMAgents #ReinforcementLearning #NLP #AI
✨Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
📝 Summary:
Memory-T1 is an RL framework improving temporal reasoning in long dialogues by selecting relevant sessions. It uses rewards for accuracy, evidence, and temporal consistency to achieve state-of-the-art performance on Time-Dialog and robustness to extensive histories.
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20092
• PDF: https://arxiv.org/pdf/2512.20092
• Github: https://github.com/Elvin-Yiming-Du/Memory-T1/
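💡 As an illustration of the reward design described above (the scoring functions and weights are assumptions made for the example): a composite reward can combine answer accuracy, evidence overlap with gold sessions, and temporal ordering.
```python
# Composite reward sketch: accuracy + evidence F1 + temporal consistency.

def composite_reward(answer_correct: bool,
                     cited_sessions: set[int],
                     gold_sessions: set[int],
                     cited_in_order: bool,
                     w=(0.6, 0.3, 0.1)) -> float:
    accuracy = 1.0 if answer_correct else 0.0
    # Evidence reward: F1 overlap between cited and gold supporting sessions.
    if cited_sessions and gold_sessions:
        tp = len(cited_sessions & gold_sessions)
        prec, rec = tp / len(cited_sessions), tp / len(gold_sessions)
        evidence = 2 * prec * rec / (prec + rec) if tp else 0.0
    else:
        evidence = 0.0
    temporal = 1.0 if cited_in_order else 0.0
    return w[0] * accuracy + w[1] * evidence + w[2] * temporal

print(composite_reward(True, {2, 5}, {2, 5, 7}, cited_in_order=True))  # 0.94
```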
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #TemporalReasoning #NLP #DialogueSystems #AI
✨Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
📝 Summary:
Autoregressive (AR) models face inefficient exploration and sparse rewards in RL. Internal RL uses a higher-order model to learn temporal-abstraction controllers, enabling efficient learning from sparse rewards where standard RL fails.
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20605
• PDF: https://arxiv.org/pdf/2512.20605
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #HierarchicalRL #AutoregressiveModels #MachineLearning #ArtificialIntelligence
✨MAI-UI Technical Report: Real-World Centric Foundation GUI Agents
📝 Summary:
MAI-UI introduces a family of foundation GUI agents tackling real-world deployment challenges. It uses a self-evolving data pipeline, device-cloud collaboration, and online RL to set a new state of the art in GUI grounding and mobile navigation, significantly boosting performance and privacy.
🔹 Publication Date: Published on Dec 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22047
• PDF: https://arxiv.org/pdf/2512.22047
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#GUIAgents #AI #ReinforcementLearning #MobileTech #HCI
✨SWE-RM: Execution-free Feedback For Software Engineering Agents
📝 Summary:
This paper introduces SWE-RM, a robust, execution-free reward model for software engineering agents. It overcomes limitations of execution-based feedback, improving coding agent performance in both test-time scaling and reinforcement learning. SWE-RM achieves new state-of-the-art results for open...
🔹 Publication Date: Published on Dec 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21919
• PDF: https://arxiv.org/pdf/2512.21919
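💡 A toy sketch of execution-free feedback for test-time scaling: a reward model ranks candidate patches without running the test suite, and best-of-n keeps the top-scoring one. score_patch is a placeholder, not SWE-RM's interface.
```python
# Best-of-n selection with a learned, execution-free patch scorer.

def best_of_n(issue: str, patches: list[str], score_patch) -> str:
    """Rank candidate patches by a learned score instead of test execution."""
    return max(patches, key=lambda p: score_patch(issue, p))

# Toy stand-in scorer: prefer patches mentioning the issue's key terms.
toy_score = lambda issue, patch: sum(tok in patch for tok in issue.split())
print(best_of_n("fix off-by-one in pagination",
                ["adjust page index by one", "rewrite logging"], toy_score))
```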
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#SoftwareEngineering #AI #ReinforcementLearning #CodingAgents #RewardModels
✨Act2Goal: From World Model To General Goal-conditioned Policy
📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute precisely. This allows strong generalization and rapid online adaptation, significantly boosting real-robot performance.
🔹 Publication Date: Published on Dec 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning