ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
WorldVLA: Towards Autoregressive Action World Model

📝 Summary:
WorldVLA unifies vision-language-action (VLA) models and world models, showing that the two mutually enhance image understanding and action generation. It addresses autoregressive action-prediction errors with an attention-mask strategy that significantly improves performance.
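
The attention-mask strategy invites a concrete picture. Below is a minimal sketch of one plausible reading, assuming a sequence of observation/text tokens followed by action tokens in which each action token is barred from attending to earlier generated action tokens, so autoregressive errors do not compound; the token layout and masking rule are our assumptions, not necessarily the paper's exact design.

```python
import numpy as np

def action_blind_mask(token_types):
    """token_types: 0 = observation/text token, 1 = action token.
    Returns boolean M where M[i, j] == True lets token i attend to token j."""
    t = np.asarray(token_types, dtype=bool)
    n = t.size
    mask = np.tril(np.ones((n, n), dtype=bool))           # standard causal mask
    act_to_act = np.outer(t, t) & ~np.eye(n, dtype=bool)  # action -> other action
    mask[act_to_act] = False       # actions keep obs context, lose past actions
    return mask

# Three observation tokens followed by three action tokens:
print(action_blind_mask([0, 0, 0, 1, 1, 1]).astype(int))
```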

🔹 Publication Date: Published on Jun 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.21539
• PDF: https://arxiv.org/pdf/2506.21539
• Github: https://github.com/alibaba-damo-academy/WorldVLA

🔹 Models citing this paper:
https://huggingface.co/Alibaba-DAMO-Academy/WorldVLA
https://huggingface.co/jcenaa/WorldVLA-ActionModel-LIBERO-Goal-256
https://huggingface.co/jcenaa/WorldVLA-ActionModel-LIBERO-10-256

==================================

For more data science resources:
https://t.me/DataScienceT

#AI #MachineLearning #Robotics #ComputerVision #WorldModels
Geometrically-Constrained Agent for Spatial Reasoning

📝 Summary:
The Geometrically-Constrained Agent (GCA) resolves the semantic-to-geometric gap in VLMs for spatial reasoning. It uses a formal task constraint to guide the VLM from semantic analysis to constrained tool execution, achieving state-of-the-art performance.
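
As a rough illustration of the control flow the summary describes, here is a hedged sketch in which a VLM's semantic proposal must pass a formal geometric check before any tool executes; everything below (the box constraint, the retry loop, the function names) is a hypothetical stand-in, not GCA's actual interface.

```python
from dataclasses import dataclass

@dataclass
class BoxConstraint:
    """Formal task constraint: a point must lie inside an axis-aligned box."""
    lo: tuple
    hi: tuple

    def satisfied(self, p):
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))

def run_constrained_agent(vlm_propose, tool_execute, constraint, max_retries=3):
    """vlm_propose() -> candidate 3D point; tool_execute(p) performs the action."""
    for _ in range(max_retries):
        p = vlm_propose()              # semantic analysis stage
        if constraint.satisfied(p):    # formal constraint gate
            return tool_execute(p)     # constrained tool execution
    raise RuntimeError("no constraint-satisfying proposal found")
```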

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22659
• PDF: https://arxiv.org/pdf/2511.22659
• Project Page: https://gca-spatial-reasoning.github.io
• Github: https://github.com/gca-spatial-reasoning/gca

==================================

For more data science resources:
https://t.me/DataScienceT

#SpatialReasoning #VLMs #AI #Robotics #DeepLearning
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

📝 Summary:
GR-RL improves VLA policies for dexterous long-horizon manipulation. It filters and augments demonstrations, then refines the policy with RL. This enables unprecedentedly complex tasks, notably autonomously lacing a shoe.
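
A minimal sketch of the data pipeline as summarized: filter demonstrations by a score, augment the survivors, then hand them to RL fine-tuning. The score function and the noise-based augmentation here are illustrative assumptions, not GR-RL's actual recipe.

```python
import random

def filter_demos(demos, score_fn, keep_frac=0.5):
    """Keep the top-scoring fraction of demonstrations."""
    ranked = sorted(demos, key=score_fn, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_frac))]

def augment(demo, noise=0.01):
    """Jitter recorded actions slightly to widen state coverage."""
    return [(obs, [a + random.gauss(0.0, noise) for a in act])
            for obs, act in demo]

def build_rl_dataset(demos, score_fn, n_aug=4):
    """Filtered originals plus n_aug augmented copies each, ready for RL."""
    out = []
    for d in filter_demos(demos, score_fn):
        out.append(d)
        out.extend(augment(d) for _ in range(n_aug))
    return out
```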

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01801
• PDF: https://arxiv.org/pdf/2512.01801

==================================

For more data science resources:
https://t.me/DataScienceT

#Robotics #ReinforcementLearning #DexterousManipulation #RoboticManipulation #AI
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference

📝 Summary:
VLASH is an asynchronous inference framework for VLAs. It achieves fast, accurate, low-latency robotic control by estimating future robot states to bridge the prediction-execution gap. This enables VLAs to perform high-precision tasks such as ping-pong with significant speedup and reduced latency.
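
The core idea as summarized lends itself to a small sketch: rather than conditioning on the stale state observed when inference begins, extrapolate the robot state forward by the expected inference latency, so the action chunk matches the state the robot will actually be in when it arrives. The constant-velocity dynamics below are our simplification.

```python
import numpy as np

def extrapolate_state(q, qdot, latency_s):
    """Constant-velocity rollout of the joint state over the inference delay."""
    return np.asarray(q, dtype=float) + np.asarray(qdot, dtype=float) * latency_s

def async_step(policy, q, qdot, latency_s=0.05):
    q_future = extrapolate_state(q, qdot, latency_s)  # state when chunk lands
    return policy(q_future)                           # actions for t + latency
```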

🔹 Publication Date: Published on Nov 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01031
• PDF: https://arxiv.org/pdf/2512.01031
• Github: https://github.com/mit-han-lab/vlash

==================================

For more data science resources:
https://t.me/DataScienceT

#Robotics #VisionLanguageModels #RealTimeAI #AIResearch #MachineLearning
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic

📝 Summary:
OpenREAD enhances autonomous driving via end-to-end reinforcement fine-tuning for both reasoning and planning. It uses an LLM critic to quantify open-ended reasoning, achieving state-of-the-art performance by addressing prior limitations.
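
A hedged sketch of an LLM-as-critic reward in the spirit of the summary: the critic scores a free-form driving rationale and the normalized score becomes the RL reward. `ask_llm` is a placeholder you would back with your own model endpoint, and the prompt format is our assumption.

```python
def critic_reward(ask_llm, scene_desc, rationale):
    """ask_llm(prompt) -> str reply; returns a reward in [0, 1]."""
    prompt = (
        "Rate the following driving rationale for this scene on a 0-10 scale.\n"
        f"Scene: {scene_desc}\nRationale: {rationale}\n"
        "Answer with a single number only."
    )
    reply = ask_llm(prompt)
    try:
        return max(0.0, min(10.0, float(reply.strip()))) / 10.0
    except ValueError:
        return 0.0  # unparseable critic output earns no reward
```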

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01830
• PDF: https://arxiv.org/pdf/2512.01830
• Github: https://github.com/wyddmw/OpenREAD

==================================

For more data science resources:
https://t.me/DataScienceT

#AutonomousDriving #LLMs #ReinforcementLearning #AI #Robotics
A Hierarchical Framework for Humanoid Locomotion with Supernumerary Limbs

📝 Summary:
A hierarchical control framework enables stable humanoid locomotion with supernumerary limbs. It combines learning-based gait generation with model-based limb balancing, improving stability and reducing the Dynamic Time Warping distance of the CoM trajectory by 47%. This decoupled design effectively mitigates dyna...
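
The reported metric, Dynamic Time Warping (DTW) distance between CoM trajectories, is easy to reproduce. Here is a textbook DTW implementation over trajectories of points, for readers who want to run the same kind of comparison; it is not the paper's code.

```python
import numpy as np

def dtw_distance(a, b):
    """a, b: (T, d) arrays of trajectory points; returns the DTW distance."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])      # pointwise distance
            D[i, j] = cost + min(D[i - 1, j],               # insertion
                                 D[i, j - 1],               # deletion
                                 D[i - 1, j - 1])           # match
    return D[n, m]

# Example: a reference CoM path vs. a time-warped copy of it.
t = np.linspace(0, 1, 50)
ref = np.stack([t, np.sin(2 * np.pi * t)], axis=1)
warped = np.stack([t**1.2, np.sin(2 * np.pi * t**1.2)], axis=1)
print(dtw_distance(ref, warped))
```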

🔹 Publication Date: Published on Nov 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00077
• PDF: https://arxiv.org/pdf/2512.00077
• Github: https://github.com/heyzbw/HuSLs

==================================

For more data science resources:
https://t.me/DataScienceT

#Robotics #HumanoidRobotics #Locomotion #ControlSystems #SupernumeraryLimbs
SimScale: Learning to Drive via Real-World Simulation at Scale

📝 Summary:
SimScale is a simulation framework that synthesizes diverse driving scenarios from real-world logs. Co-training with this data significantly improves the robustness and generalization of driving policies, and performance scales with simulation data even without new real-world input.
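
A minimal sketch of a co-training loop consistent with the summary: every batch mixes logged real-world samples with synthesized simulation samples at a fixed ratio. The ratio and the opaque `train_step` are illustrative assumptions, not SimScale's configuration.

```python
import random

def cotrain(real_data, sim_data, train_step, steps=1000, sim_frac=0.5, batch=32):
    """Each batch draws sim_frac of its samples from simulation, rest from logs."""
    n_sim = int(batch * sim_frac)
    for _ in range(steps):
        samples = random.sample(sim_data, n_sim) + \
                  random.sample(real_data, batch - n_sim)
        random.shuffle(samples)      # avoid ordering the model could exploit
        train_step(samples)
```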

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23369
• PDF: https://arxiv.org/pdf/2511.23369
• Project Page: https://opendrivelab.com/SimScale
• Github: https://github.com/OpenDriveLab/SimScale

==================================

For more data science resources:
https://t.me/DataScienceT

#AutonomousDriving #Simulation #AI #MachineLearning #Robotics
Mixture of Horizons in Action Chunking

📝 Summary:
VLA models struggle with a fixed action-chunk horizon. The Mixture of Horizons (MoH) strategy combines different horizons to obtain both global foresight and fine-grained precision. This improves robotic performance, generalizability, and throughput, achieving a new state-of-the-art.
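
One plausible, deliberately simplified reading of a horizon mixture: query the policy for action chunks of several lengths and blend the first step of each, so long chunks contribute foresight and short chunks contribute precision. The horizons and blend weights below are illustrative, not the paper's recipe.

```python
import numpy as np

def moh_next_action(policy, obs, horizons=(1, 4, 16), weights=(0.5, 0.3, 0.2)):
    """policy(obs, h) -> (h, action_dim) chunk; blends the first step of each."""
    firsts = [np.asarray(policy(obs, h))[0] for h in horizons]
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize blend weights
    return sum(wi * a for wi, a in zip(w, firsts))
```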

🔹 Publication Date: Published on Nov 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19433
• PDF: https://arxiv.org/pdf/2511.19433
• Project Page: https://timsty1.github.io/moh/
• Github: https://github.com/Timsty1/MixtureOfHorizons/tree/main

==================================

For more data science resources:
https://t.me/DataScienceT

#Robotics #AI #MachineLearning #DeepLearning #ReinforcementLearning
MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory

📝 Summary:
MG-Nav is a dual-scale framework for zero-shot visual navigation, unifying global memory-guided planning via a Sparse Spatial Memory Graph with local geometry-enhanced control using a VGGT-adapter. It achieves state-of-the-art performance and robustness in unseen environments.
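
As a rough picture of what a sparse spatial memory can look like, the sketch below adds a graph node only when an observation falls outside a radius of every stored node, and links it to its nearest neighbor; the threshold and linking rule are our assumptions, not MG-Nav's construction.

```python
import numpy as np

class SparseSpatialMemory:
    def __init__(self, radius=1.0):
        self.radius = radius
        self.positions = []   # node coordinates
        self.edges = []       # (i, j) index pairs

    def insert(self, pos):
        """Returns the index of the node representing this position."""
        pos = np.asarray(pos, dtype=float)
        if self.positions:
            dists = [np.linalg.norm(pos - p) for p in self.positions]
            nearest = int(np.argmin(dists))
            if dists[nearest] < self.radius:
                return nearest                    # too close: reuse existing node
            self.edges.append((nearest, len(self.positions)))
        self.positions.append(pos)
        return len(self.positions) - 1
```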

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22609
• PDF: https://arxiv.org/pdf/2511.22609

==================================

For more data science resources:
https://t.me/DataScienceT

#VisualNavigation #Robotics #AI #ComputerVision #ZeroShotLearning
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

📝 Summary:
SpaceTools introduces Double Interactive Reinforcement Learning (DIRL), a two-phase RL framework that enables vision-language models to coordinate multiple tools for precise spatial reasoning, achieving state-of-the-art performance on benchmarks and real-world robot tasks.

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04069
• PDF: https://arxiv.org/pdf/2512.04069
• Project Page: https://spacetools.github.io/

==================================

For more data science resources:
https://t.me/DataScienceT

#ReinforcementLearning #VisionLanguageModels #Robotics #SpatialReasoning #AI
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

📝 Summary:
MIND-V generates long-horizon, physically plausible robotic manipulation videos. This hierarchical framework uses semantic reasoning and an RL-based physical alignment strategy to synthesize robust, coherent actions, addressing data scarcity.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06628
• PDF: https://arxiv.org/pdf/2512.06628
• Github: https://github.com/Richard-Zhang-AI/MIND-V

==================================

For more data science resources:
https://t.me/DataScienceT

#Robotics #VideoGeneration #ReinforcementLearning #AI #MachineLearning
X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale

📝 Summary:
X-Humanoid generates large-scale humanoid video datasets from human videos to boost embodied AI. It uses generative video editing, finetuned on synthetic data, to translate human actions into full-body humanoid motions, generating over 3.6M robotized frames. This method outperforms existing solutions.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04537
• PDF: https://arxiv.org/pdf/2512.04537

==================================

For more data science resources:
https://t.me/DataScienceT

#XHumanoid #EmbodiedAI #Robotics #GenerativeAI #ComputerVision
LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator

📝 Summary:
LEO-RobotAgent is a general-purpose language-driven framework that uses large language models to enable various robot types to complete complex tasks. It enhances human-robot interaction and task planning, demonstrating strong generalization, robustness, and efficiency across different scenarios.

🔹 Publication Date: Published on Dec 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10605
• PDF: https://arxiv.org/pdf/2512.10605
• Github: https://github.com/LegendLeoChen/LEO-RobotAgent

==================================

For more data science resources:
https://t.me/DataScienceT

#Robotics #LLM #HumanRobotInteraction #EmbodiedAI #AI
Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

📝 Summary:
Vision-Language-Action (VLA) models integrate visual, linguistic, and action capabilities for autonomous driving. They aim for interpretable, human-aligned policies, addressing the limitations of prior systems. This paper characterizes VLA paradigms, datasets, and future challenges.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16760
• PDF: https://arxiv.org/pdf/2512.16760
• Project Page: https://worldbench.github.io/vla4ad
• Github: https://github.com/worldbench/awesome-vla-for-ad

==================================

For more data science resources:
https://t.me/DataScienceT

#VLAModels #AutonomousDriving #AI #DeepLearning #Robotics
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning

📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household environments.
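
For readers unfamiliar with state-aware scene graphs, here is a hedged sketch of the kind of structure the summary implies: nodes carry object states and edges carry relations that a planner can query as preconditions. The schema is our illustration, not MomaGraph's.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    states: dict = field(default_factory=dict)    # object -> state string
    relations: set = field(default_factory=set)   # (subject, relation, object)

    def update_state(self, obj, state):
        self.states[obj] = state

    def relate(self, subj, rel, obj):
        self.relations.add((subj, rel, obj))

g = SceneGraph()
g.update_state("fridge", "closed")
g.relate("milk", "inside", "fridge")
# A planner can check preconditions such as "must the fridge be opened first?":
needs_opening = (g.states["fridge"] == "closed"
                 and ("milk", "inside", "fridge") in g.relations)
print(needs_opening)
```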

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Project Page: https://hybridrobotics.github.io/MomaGraph/

==================================

For more data science resources:
https://t.me/DataScienceT

#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

📝 Summary:
This survey offers a structured guide to Vision-Language-Action VLA models in robotics. It breaks down five key challenges: representation, execution, generalization, safety, and datasets, serving as a roadmap for researchers.

🔹 Publication Date: Published on Dec 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11362
• PDF: https://arxiv.org/pdf/2512.11362
• Project Page: https://suyuz1.github.io/Survery/
• Github: https://suyuz1.github.io/VLA-Survey-Anatomy/

==================================

For more data science resources:
https://t.me/DataScienceT

#VLAModels #Robotics #ArtificialIntelligence #VisionLanguage #AIResearch
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

📝 Summary:
Dream-VL and Dream-VLA are diffusion-based vision-language and vision-language-action models. They achieve state-of-the-art performance in visual planning and robotic control, surpassing autoregressive baselines via their diffusion backbone's superior action generation.

🔹 Publication Date: Published on Dec 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22615
• PDF: https://arxiv.org/pdf/2512.22615
• Project Page: https://hkunlp.github.io/blog/2025/dream-vlx/
• Github: https://github.com/DreamLM/Dream-VLX

==================================

For more data science resources:
https://t.me/DataScienceT

#VisionLanguageModels #DiffusionModels #Robotics #AI #ComputerVision
Act2Goal: From World Model To General Goal-conditioned Policy

📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute them precisely. This allows strong generalization and rapid online adaptation, significantly boosting real-robot performance.
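
A hedged sketch of the plan-then-track loop the summary suggests: the world model imagines intermediate states between the current observation and the goal, and a low-level controller tracks each in turn. `world_model` and `controller` are hypothetical interfaces, not Act2Goal's API.

```python
def act2goal_rollout(world_model, controller, state, goal, n_subgoals=4):
    """world_model.plan(state, goal, n) -> list of imagined intermediate states;
    the controller tracks each intermediate state until it is reached."""
    for subgoal in world_model.plan(state, goal, n_subgoals):
        while not controller.reached(state, subgoal):
            action = controller.act(state, subgoal)   # low-level tracking step
            state = controller.step(state, action)    # environment transition
    return state
```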

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/

==================================

For more data science resources:
https://t.me/DataScienceT

#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning
GR-Dexter Technical Report

📝 Summary:
GR-Dexter introduces a hardware-model-data framework for bimanual dexterous-hand manipulation with VLA models. It combines a new 21-DoF hand, teleoperation for data collection, and diverse datasets. The framework achieves strong performance and robust generalization in real-world manipulation tasks.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24210
• PDF: https://arxiv.org/pdf/2512.24210
• Project Page: https://byte-dexter.github.io/gr-dexter/

==================================

For more data science resources:
https://t.me/DataScienceT

#Robotics #DexterousManipulation #VLA #RobotHardware #MachineLearning
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

📝 Summary:
Dream2Flow bridges video generation and robotic control using 3D object flow. It reconstructs 3D object motions from generated videos, enabling zero-shot manipulation of diverse objects through trajectory tracking without task-specific demonstrations.
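
As an illustration of trajectory tracking over a reconstructed 3D object flow: given per-frame object waypoints recovered from a generated video, step the end-effector toward each next waypoint. The proportional controller below is a simple stand-in for whatever tracking scheme the paper actually uses.

```python
import numpy as np

def track_object_flow(flow, ee_pos, gain=0.5):
    """flow: (T, 3) object waypoints; yields end-effector position commands."""
    ee = np.asarray(ee_pos, dtype=float)
    for target in np.asarray(flow, dtype=float):
        ee = ee + gain * (target - ee)   # proportional step toward waypoint
        yield ee

# Example: follow a straight-line flow from the origin.
flow = np.linspace([0, 0, 0], [0.3, 0.0, 0.2], num=10)
for cmd in track_object_flow(flow, ee_pos=[0.0, 0.1, 0.0]):
    print(np.round(cmd, 3))
```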

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24766
• PDF: https://arxiv.org/pdf/2512.24766

==================================

For more data science resources:
https://t.me/DataScienceT

#VideoGeneration #Robotics #3DVision #AI #ZeroShotLearning