✨WorldVLA: Towards Autoregressive Action World Model
📝 Summary:
WorldVLA unifies VLA and world models, showing mutual enhancement in image understanding and action generation. It addresses autoregressive action prediction errors with an attention mask strategy that significantly improves performance.
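🔹 Illustrative sketch (an assumption, not the paper's code): one plausible reading of the attention-mask strategy is that action tokens attend to the vision/language prefix but not to previously generated action tokens, so errors in earlier actions cannot propagate through the chunk.
```python
# Minimal sketch (assumption): build an attention mask where action tokens attend to the
# vision/text prefix but NOT to earlier action tokens.
import numpy as np

def build_action_mask(n_prefix: int, n_action: int) -> np.ndarray:
    """Return a (T, T) boolean mask; True = position may be attended to."""
    T = n_prefix + n_action
    mask = np.tril(np.ones((T, T), dtype=bool))        # standard causal mask
    act = slice(n_prefix, T)
    mask[act, act] = np.eye(n_action, dtype=bool)      # actions see only themselves;
    return mask                                        # the full prefix stays visible via tril

if __name__ == "__main__":
    print(build_action_mask(n_prefix=4, n_action=3).astype(int))
```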
🔹 Publication Date: Published on Jun 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.21539
• PDF: https://arxiv.org/pdf/2506.21539
• Project Page: https://github.com/alibaba-damo-academy/WorldVLA
• Github: https://github.com/alibaba-damo-academy/WorldVLA
🔹 Models citing this paper:
• https://huggingface.co/Alibaba-DAMO-Academy/WorldVLA
• https://huggingface.co/jcenaa/WorldVLA-ActionModel-LIBERO-Goal-256
• https://huggingface.co/jcenaa/WorldVLA-ActionModel-LIBERO-10-256
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #MachineLearning #Robotics #ComputerVision #WorldModels
✨Geometrically-Constrained Agent for Spatial Reasoning
📝 Summary:
Geometrically-Constrained Agent (GCA) resolves the semantic-to-geometric gap in VLMs for spatial reasoning. It uses a formal task constraint to guide the VLM from semantic analysis to constrained tool execution, achieving SOTA performance.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22659
• PDF: https://arxiv.org/pdf/2511.22659
• Project Page: https://gca-spatial-reasoning.github.io
• Github: https://github.com/gca-spatial-reasoning/gca
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#SpatialReasoning #VLMs #AI #Robotics #DeepLearning
✨GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
📝 Summary:
GR-RL improves VLA policies for dexterous long-horizon manipulation. It filters and augments demonstrations, then refines them with RL. This enables tasks of unprecedented complexity, notably autonomously lacing a shoe.
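🔹 Illustrative sketch (all names and the scoring rule are assumptions, not GR-RL's pipeline): the filter-augment-refine idea reduced to a toy loop.
```python
# Toy pipeline sketch: keep the best demonstrations, augment them, then nudge the policy
# with a placeholder RL-style update driven by rollout return.
import random

def filter_demos(demos, keep_ratio=0.5):
    ranked = sorted(demos, key=lambda d: d["score"], reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]

def augment(demo, noise=0.01):
    return {"score": demo["score"],
            "actions": [a + random.gauss(0.0, noise) for a in demo["actions"]]}

def rl_update(params, rollout_return, lr=1e-3):
    return [p + lr * rollout_return for p in params]   # stand-in for a policy-gradient step

if __name__ == "__main__":
    demos = [{"score": s, "actions": [0.1, 0.2, 0.3]} for s in (0.9, 0.2, 0.7)]
    dataset = [augment(d) for d in filter_demos(demos)]
    print(len(dataset), rl_update([0.0, 0.0], rollout_return=1.0))
```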
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01801
• PDF: https://arxiv.org/pdf/2512.01801
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #ReinforcementLearning #DexterousManipulation #RoboticManipulation #AI
✨VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
📝 Summary:
VLASH is an asynchronous inference framework for VLAs. It achieves fast, accurate, and low-latency robotic control by estimating future robot states to bridge prediction-execution gaps. This enables VLAs to perform high-precision tasks such as ping-pong with significant speedup and reduced latency.
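🔹 Illustrative sketch (assumptions, not VLASH's API): while the current action chunk executes, the next inference starts on a predicted future state, so the new chunk is ready by the time the robot reaches the state it was planned from.
```python
# Sketch of future-state-aware asynchronous inference with dummy dynamics and a dummy model.
import threading, time

INFER_LATENCY = 0.05   # assumed model latency in seconds

def predict_future_state(state, velocity, dt):
    return state + velocity * dt            # constant-velocity stand-in for a state estimator

def infer_chunk(state):
    time.sleep(INFER_LATENCY)               # pretend the VLA forward pass runs here
    return [state + 0.1 * i for i in range(3)]   # dummy 3-step action chunk

def control_loop(steps=3):
    state, velocity = 0.0, 1.0
    chunk = infer_chunk(state)
    for _ in range(steps):
        future = predict_future_state(state, velocity, INFER_LATENCY)
        result = {}
        worker = threading.Thread(target=lambda: result.update(chunk=infer_chunk(future)))
        worker.start()                      # inference overlaps with execution
        for a in chunk:                     # execute the current chunk meanwhile
            state = a
        worker.join()
        chunk = result["chunk"]
    return state

if __name__ == "__main__":
    print(control_loop())
```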
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01031
• PDF: https://arxiv.org/pdf/2512.01031
• Github: https://github.com/mit-han-lab/vlash
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #VisionLanguageModels #RealTimeAI #AIResearch #MachineLearning
✨OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
📝 Summary:
OpenREAD enhances autonomous driving via end-to-end reinforcement fine-tuning for both reasoning and planning. It uses an LLM critic to quantify open-ended reasoning, achieving state-of-the-art performance by addressing prior limitations.
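🔹 Illustrative sketch (the critic below is a keyword stand-in, not OpenREAD's implementation): an LLM critic maps free-form driving reasoning to a scalar score that is combined with a planning term during reinforcement fine-tuning.
```python
# Hedged sketch of an LLM-as-critic reward; a real system would prompt a critic model here.
def llm_critic_score(reasoning: str) -> float:
    keywords = ("pedestrian", "yield", "lane", "speed")
    return sum(k in reasoning.lower() for k in keywords) / len(keywords)

def total_reward(reasoning: str, planning_error: float, w_reason=0.5, w_plan=0.5) -> float:
    return w_reason * llm_critic_score(reasoning) - w_plan * planning_error

if __name__ == "__main__":
    r = total_reward("Yield to the pedestrian, then keep lane at reduced speed.",
                     planning_error=0.2)
    print(round(r, 3))
```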
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01830
• PDF: https://arxiv.org/pdf/2512.01830
• Github: https://github.com/wyddmw/OpenREAD
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AutonomousDriving #LLMs #ReinforcementLearning #AI #Robotics
✨A Hierarchical Framework for Humanoid Locomotion with Supernumerary Limbs
📝 Summary:
A hierarchical control framework enables stable humanoid locomotion with supernumerary limbs. It combines learning-based gait with model-based limb balancing, improving stability and reducing the CoM trajectory Dynamic Time Warping distance by 47%. This decoupled design effectively mitigates dyna...
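🔹 For reference, the 47% figure is a reduction in Dynamic Time Warping (DTW) distance between center-of-mass trajectories; below is a standard DTW distance for 1-D trajectories (not the paper's code).
```python
# Classic DTW distance between two 1-D trajectories of possibly different lengths.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

if __name__ == "__main__":
    reference = np.sin(np.linspace(0, np.pi, 50))     # e.g. desired CoM height profile
    measured = np.sin(np.linspace(0, np.pi, 60)) + 0.02
    print(f"DTW distance: {dtw_distance(reference, measured):.3f}")
```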
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00077
• PDF: https://arxiv.org/pdf/2512.00077
• Github: https://github.com/heyzbw/HuSLs
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #HumanoidRobotics #Locomotion #ControlSystems #SupernumeraryLimbs
✨SimScale: Learning to Drive via Real-World Simulation at Scale
📝 Summary:
SimScale is a simulation framework synthesizing diverse driving scenarios from logs. Co-training with this data significantly improves autonomous driving robustness and generalization, scaling with simulation data even without new real-world input.
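🔹 Illustrative sketch (the mixing ratio and loader names are assumptions): co-training simply means each batch draws from both real driving logs and synthesized scenarios.
```python
# Minimal co-training batch sampler over real and simulated data.
import random

def make_batch(real_data, sim_data, batch_size=8, sim_fraction=0.5):
    n_sim = int(batch_size * sim_fraction)
    batch = random.sample(sim_data, n_sim) + random.sample(real_data, batch_size - n_sim)
    random.shuffle(batch)
    return batch

if __name__ == "__main__":
    real = [{"source": "real", "id": i} for i in range(100)]
    sim = [{"source": "sim", "id": i} for i in range(100)]
    print([x["source"] for x in make_batch(real, sim)])
```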
🔹 Publication Date: Published on Nov 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23369
• PDF: https://arxiv.org/pdf/2511.23369
• Project Page: https://opendrivelab.com/SimScale
• Github: https://github.com/OpenDriveLab/SimScale
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AutonomousDriving #Simulation #AI #MachineLearning #Robotics
✨Mixture of Horizons in Action Chunking
📝 Summary:
VLA models struggle with a fixed action chunk horizon. The Mixture of Horizons (MoH) strategy combines different horizons for both global foresight and fine-grained precision. This improves robotic performance, generalizability, and throughput, achieving a new state of the art.
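🔹 Illustrative sketch (the fusion rule here is an assumption, not the paper's exact method): predict action chunks at several horizons, then blend the steps they share so long horizons contribute foresight and short horizons contribute precision.
```python
# Fuse action chunks predicted at multiple horizons over their shared (shortest) prefix.
import numpy as np

def fuse_horizons(chunks, weights):
    """chunks: dict horizon -> (horizon, action_dim) array; returns (min_horizon, action_dim)."""
    h_min = min(chunks)
    stacked = np.stack([chunks[h][:h_min] for h in sorted(chunks)])   # (n_horizons, h_min, d)
    w = np.asarray([weights[h] for h in sorted(chunks)], dtype=float)
    w = w / w.sum()
    return np.einsum("k,khd->hd", w, stacked)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chunks = {4: rng.normal(size=(4, 7)), 8: rng.normal(size=(8, 7)), 16: rng.normal(size=(16, 7))}
    fused = fuse_horizons(chunks, weights={4: 0.5, 8: 0.3, 16: 0.2})
    print(fused.shape)   # (4, 7): the executed steps blend all horizons
```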
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19433
• PDF: https://arxiv.org/pdf/2511.19433
• Project Page: https://timsty1.github.io/moh/
• Github: https://github.com/Timsty1/MixtureOfHorizons/tree/main
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #AI #MachineLearning #DeepLearning #ReinforcementLearning
✨MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory
📝 Summary:
MG-Nav is a dual-scale framework for zero-shot visual navigation, unifying global memory-guided planning via a Sparse Spatial Memory Graph with local geometry-enhanced control using a VGGT-adapter. It achieves state-of-the-art performance and robustness in unseen environments.
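🔹 Illustrative sketch (the distance threshold and graph layout are assumptions, not MG-Nav's design): a sparse spatial memory stores a new node only when the agent is far from every remembered node, and links consecutive nodes so a global planner can search the graph.
```python
# Toy sparse spatial memory graph for navigation.
import math

class SparseMemoryGraph:
    def __init__(self, min_dist=1.0):
        self.nodes, self.edges, self.min_dist = [], [], min_dist

    def observe(self, position):
        # add a node only if it is sufficiently far from all stored nodes
        if all(math.dist(position, n) >= self.min_dist for n in self.nodes):
            self.nodes.append(position)
            if len(self.nodes) > 1:
                self.edges.append((len(self.nodes) - 2, len(self.nodes) - 1))

if __name__ == "__main__":
    g = SparseMemoryGraph(min_dist=1.0)
    for p in [(0, 0), (0.2, 0.1), (1.5, 0), (3.0, 0.2)]:
        g.observe(p)
    print(g.nodes, g.edges)   # only well-separated positions are kept
```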
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22609
• PDF: https://arxiv.org/pdf/2511.22609
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisualNavigation #Robotics #AI #ComputerVision #ZeroShotLearning
✨SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
📝 Summary:
SpaceTools introduces Double Interactive Reinforcement Learning (DIRL). This two-phase RL framework enables Vision-Language Models to coordinate multiple tools for precise spatial reasoning, achieving state-of-the-art performance on benchmarks and real-world robot tasks.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04069
• PDF: https://arxiv.org/pdf/2512.04069
• Project Page: https://spacetools.github.io/
• Github: https://spacetools.github.io/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #VisionLanguageModels #Robotics #SpatialReasoning #AI
✨MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment
📝 Summary:
MIND-V generates long-horizon, physically plausible robotic manipulation videos. This hierarchical framework uses semantic reasoning and an RL-based physical alignment strategy to synthesize robust, coherent actions, addressing data scarcity.
🔹 Publication Date: Published on Dec 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06628
• PDF: https://arxiv.org/pdf/2512.06628
• Project Page: https://github.com/Richard-Zhang-AI/MIND-V
• Github: https://github.com/Richard-Zhang-AI/MIND-V
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #VideoGeneration #ReinforcementLearning #AI #MachineLearning
✨X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale
📝 Summary:
X-Humanoid generates large-scale humanoid video datasets from human videos to boost embodied AI. It uses generative video editing, finetuned on synthetic data, to translate human actions into full-body humanoid motions, generating over 3.6M robotized frames. This method outperforms existing solut...
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04537
• PDF: https://arxiv.org/pdf/2512.04537
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#XHumanoid #EmbodiedAI #Robotics #GenerativeAI #ComputerVision
✨LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator
📝 Summary:
LEO-RobotAgent is a general-purpose language-driven framework that uses large language models to enable various robot types to complete complex tasks. It enhances human-robot interaction and task planning, demonstrating strong generalization, robustness, and efficiency across different scenarios.
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10605
• PDF: https://arxiv.org/pdf/2512.10605
• Github: https://github.com/LegendLeoChen/LEO-RobotAgent
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #LLM #HumanRobotInteraction #EmbodiedAI #AI
✨Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
📝 Summary:
Vision-Language-Action (VLA) models integrate visual, linguistic, and action capabilities for autonomous driving. They aim for interpretable and human-aligned policies, addressing prior system limitations. This paper characterizes VLA paradigms, datasets, and future challenges.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16760
• PDF: https://arxiv.org/pdf/2512.16760
• Project Page: https://worldbench.github.io/vla4ad
• Github: https://github.com/worldbench/awesome-vla-for-ad
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VLAModels #AutonomousDriving #AI #DeepLearning #Robotics
✨MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning
📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household envir...
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Github: https://hybridrobotics.github.io/MomaGraph/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics
✨An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
📝 Summary:
This survey offers a structured guide to Vision-Language-Action (VLA) models in robotics. It breaks down five key challenges: representation, execution, generalization, safety, and datasets, serving as a roadmap for researchers.
🔹 Publication Date: Published on Dec 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11362
• PDF: https://arxiv.org/pdf/2512.11362
• Project Page: https://suyuz1.github.io/Survery/
• Github: https://suyuz1.github.io/VLA-Survey-Anatomy/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VLAModels #Robotics #ArtificialIntelligence #VisionLanguage #AIResearch
✨Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
📝 Summary:
Dream-VL and Dream-VLA are diffusion-based vision-language and vision-language-action models. They achieve state-of-the-art performance in visual planning and robotic control, surpassing autoregressive baselines via their diffusion backbone's superior action generation.
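🔹 Illustrative sketch (the denoiser is a trivial stand-in, not Dream-VLA): a diffusion backbone samples the whole action chunk by iteratively denoising from Gaussian noise, rather than emitting action tokens one by one autoregressively.
```python
# Toy diffusion-style action chunk sampler illustrating the loop structure only.
import numpy as np

def denoiser(noisy_actions, t, target):
    # placeholder for the learned network: predicts the clean chunk from the noisy one
    return target + 0.0 * noisy_actions * t

def sample_action_chunk(target, steps=10, chunk_shape=(8, 7), seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=chunk_shape)                 # start from pure noise
    for t in np.linspace(1.0, 0.0, steps):
        x0_hat = denoiser(x, t, target)
        x = t * x + (1.0 - t) * x0_hat               # move toward the predicted clean chunk
    return x

if __name__ == "__main__":
    goal = np.zeros((8, 7))
    print(np.abs(sample_action_chunk(goal)).max())   # converges toward the clean chunk
```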
🔹 Publication Date: Published on Dec 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22615
• PDF: https://arxiv.org/pdf/2512.22615
• Project Page: https://hkunlp.github.io/blog/2025/dream-vlx/
• Github: https://github.com/DreamLM/Dream-VLX
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModels #DiffusionModels #Robotics #AI #ComputerVision
✨Act2Goal: From World Model To General Goal-conditioned Policy
📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute precisely. This allows strong generalization and rapid online adaptation, significantly boosting real-robo...
🔹 Publication Date: Published on Dec 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/
• Github: https://act2goal.github.io/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning
✨GR-Dexter Technical Report
📝 Summary:
GR-Dexter introduces a hardware-model-data framework for bimanual dexterous-hand robot manipulation using VLA models. It combines a new 21-DoF hand, teleoperation for data, and diverse datasets. This framework achieves strong performance and robust generalization in real-world manipulation tasks.
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24210
• PDF: https://arxiv.org/pdf/2512.24210
• Project Page: https://byte-dexter.github.io/gr-dexter/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #DexterousManipulation #VLA #RobotHardware #MachineLearning
✨Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow
📝 Summary:
Dream2Flow bridges video generation and robotic control using 3D object flow. It reconstructs 3D object motions from generated videos, enabling zero-shot manipulation of diverse objects through trajectory tracking without task-specific demonstrations.
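🔹 Illustrative sketch (the grasp-offset tracking rule is an assumption, not Dream2Flow's method): given a 3D object trajectory recovered from a generated video, derive end-effector waypoints that replay that motion from the object's current real-world pose.
```python
# Convert an imagined 3D object flow into end-effector tracking waypoints.
import numpy as np

def flow_to_waypoints(object_flow, current_object_pos, grasp_offset):
    """object_flow: (T, 3) positions from the generated video; returns (T, 3) EE waypoints."""
    deltas = object_flow - object_flow[0]               # motion relative to the first frame
    return current_object_pos + deltas + grasp_offset   # replay that motion from the real pose

if __name__ == "__main__":
    flow = np.stack([np.array([0.0, 0.0, 0.1 * t]) for t in range(5)])  # imagined 5-frame lift
    waypoints = flow_to_waypoints(flow,
                                  current_object_pos=np.array([0.4, 0.0, 0.02]),
                                  grasp_offset=np.array([0.0, 0.0, 0.05]))
    print(waypoints)
```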
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24766
• PDF: https://arxiv.org/pdf/2512.24766
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoGeneration #Robotics #3DVision #AI #ZeroShotLearning