✨RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies
📝 Summary:
RoboChallenge is an online evaluation system for robotic control algorithms, especially vision-language-action (VLA) models. It enables large-scale, reproducible real-robot testing to survey state-of-the-art models.
🔹 Publication Date: Published on Oct 20, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.17950
• PDF: https://arxiv.org/pdf/2510.17950
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #AI #MachineLearning #EmbodiedAI #RoboticsEvaluation
✨LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation
📝 Summary:
The paper introduces LEGO-Eval, a tool-augmented framework, and LEGO-Bench, a benchmark of detailed instructions, to improve 3D scene evaluation. LEGO-Eval assesses scene-instruction alignment more accurately than VLMs, and current generation methods largely fail to create realistic scenes.
🔹 Publication Date: Published on Nov 4, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03001
• PDF: https://arxiv.org/pdf/2511.03001
• Project Page: https://gyeomh.github.io/LEGO-Eval/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #3DGeneration #EvaluationMetrics #VLMs #Benchmarking
✨AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models
📝 Summary:
AffordBot uses MLLMs and chain-of-thought reasoning for fine-grained 3D embodied reasoning. It predicts affordance elements' location, motion type, and axis in 3D scenes per instructions. It achieves state-of-the-art by projecting 3D elements for 2D MLLMs.
🔹 Publication Date: Published on Nov 13, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10017
• PDF: https://arxiv.org/pdf/2511.10017
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AffordBot #MLLM #EmbodiedAI #3DReasoning #Robotics
✨PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image
📝 Summary:
PhysX-Anything generates simulation-ready physical 3D assets from single images, crucial for embodied AI. It uses a novel VLM-based model and an efficient 3D representation, enabling direct use in robotic policy learning.
🔹 Publication Date: Published on Nov 17, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13648
• PDF: https://arxiv.org/pdf/2511.13648
• Project Page: https://physx-anything.github.io/
• Github: https://github.com/ziangcao0312/PhysX-Anything
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Caoza/PhysX-Mobility
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #3DReconstruction #Robotics #ComputerVision #AIResearch
✨FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI
📝 Summary:
FreeAskWorld is an interactive simulator using LLMs for human-centric embodied AI with complex social behaviors. It offers a large dataset, improving agent semantic understanding and interaction competency, highlighting interaction as a key information modality.
🔹 Publication Date: Published on Nov 17, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13524
• PDF: https://arxiv.org/pdf/2511.13524
• Github: https://github.com/AIR-DISCOVER/FreeAskWorld
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Astronaut-PENG/FreeAskWorld
• https://huggingface.co/datasets/Astronaut-PENG/FreeWorld
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #LLMs #AISimulation #HumanAI #AIResearch
✨MiMo-Embodied: X-Embodied Foundation Model Technical Report
📝 Summary:
MiMo-Embodied is the first cross-embodied foundation model. It achieves state-of-the-art performance in both autonomous driving and embodied AI, demonstrating positive transfer through multi-stage learning and fine-tuning.
🔹 Publication Date: Published on Nov 20, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16518
• PDF: https://arxiv.org/pdf/2511.16518
• Github: https://github.com/XiaomiMiMo/MiMo-Embodied
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#FoundationModels #EmbodiedAI #AutonomousDriving #AI #Robotics
✨GigaWorld-0: World Models as Data Engine to Empower Embodied AI
📝 Summary:
GigaWorld-0 is a unified world model framework that generates high-quality, diverse, and physically plausible VLA data by integrating video and 3D modeling. This synthetic data enables embodied AI models to achieve strong real-world performance on physical robots without any real-world training.
🔹 Publication Date: Published on Nov 25, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19861
• PDF: https://arxiv.org/pdf/2511.19861
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #WorldModels #SyntheticData #AI #Robotics
✨Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution
📝 Summary:
A new task, ORS3D, is introduced for embodied agents, requiring language understanding, 3D grounding, and efficient parallel task scheduling. The ORS3D-60K dataset and GRANT, an embodied LLM with a scheduling token mechanism, enable agents to minimize total completion time.
🔹 Publication Date: Published on Nov 24, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19430
• PDF: https://arxiv.org/pdf/2511.19430
• Project Page: https://h-embodvis.github.io/GRANT/
• Github: https://github.com/H-EmbodVis/GRANT
🔹 Models citing this paper:
• https://huggingface.co/H-EmbodVis/GRANT
✨ Datasets citing this paper:
• https://huggingface.co/datasets/H-EmbodVis/ORS3D-60K
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #LLM #Robotics #TaskScheduling #AIResearch
✨DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action
📝 Summary:
DualVLA tackles action degeneration in VLAs by boosting action performance while retaining reasoning. It uses dual-layer data pruning and dual-teacher adaptive distillation. This balances precise action and multimodal understanding, leading to high success rates.
🔹 Publication Date: Published on Nov 27, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22134
• PDF: https://arxiv.org/pdf/2511.22134
• Project Page: https://costaliya.github.io/DualVLA/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #VLAs #AIagents #DeepLearning #AIResearch
✨SIMA 2: A Generalist Embodied Agent for Virtual Worlds
📝 Summary:
SIMA 2 is a Gemini-based embodied agent for 3D virtual worlds. It reasons about goals, handles complex instructions, and autonomously learns new skills. This closes the gap with human performance and validates continuous learning agents.
🔹 Publication Date: Published on Dec 4, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04797
• PDF: https://arxiv.org/pdf/2512.04797
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #AI #VirtualWorlds #ReinforcementLearning #AIagents
✨X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale
📝 Summary:
X-Humanoid generates large-scale humanoid video datasets from human videos to boost embodied AI. It uses generative video editing, finetuned on synthetic data, to translate human actions into full-body humanoid motions, generating over 3.6M robotized frames. This method outperforms existing solutions.
🔹 Publication Date: Published on Dec 4, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04537
• PDF: https://arxiv.org/pdf/2512.04537
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#XHumanoid #EmbodiedAI #Robotics #GenerativeAI #ComputerVision
✨LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator
📝 Summary:
LEO-RobotAgent is a general-purpose language-driven framework that uses large language models to enable various robot types to complete complex tasks. It enhances human-robot interaction and task planning, demonstrating strong generalization, robustness, and efficiency across different scenarios.
🔹 Publication Date: Published on Dec 11, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10605
• PDF: https://arxiv.org/pdf/2512.10605
• Github: https://github.com/LegendLeoChen/LEO-RobotAgent
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #LLM #HumanRobotInteraction #EmbodiedAI #AI
✨MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning
📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household environments.
🔹 Publication Date: Published on Dec 18, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Project Page: https://hybridrobotics.github.io/MomaGraph/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics