✨PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image
📝 Summary:
PhysX-Anything generates simulation-ready physical 3D assets from a single image, a capability crucial for embodied AI. It uses a novel VLM-based model and an efficient 3D representation, enabling direct use in robotic policy learning.
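Since the assets are billed as simulation-ready, a natural smoke test is to drop one into a physics engine. Below is a minimal sketch using PyBullet; the asset path is hypothetical and assumes the pipeline exports a URDF-style articulated asset (check the repo for the actual format).

```python
import pybullet as p
import pybullet_data

# Connect headless and set up a basic scene.
p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")

# Hypothetical path: a URDF exported by PhysX-Anything for one input image.
asset = p.loadURDF("physx_anything_output/asset.urdf", basePosition=[0, 0, 0.5])

# Roll the simulation forward; a policy would read state and apply actions here.
for _ in range(240):
    p.stepSimulation()

pos, orn = p.getBasePositionAndOrientation(asset)
print("settled pose:", pos, orn)
p.disconnect()
```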
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13648
• PDF: https://arxiv.org/pdf/2511.13648
• Project Page: https://physx-anything.github.io/
• Github: https://github.com/ziangcao0312/PhysX-Anything
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Caoza/PhysX-Mobility
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #3DReconstruction #Robotics #ComputerVision #AIResearch
✨FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI
📝 Summary:
FreeAskWorld is an interactive simulator using LLMs for human-centric embodied AI with complex social behaviors. It provides a large-scale dataset that improves agents' semantic understanding and interaction competence, highlighting interaction as a key information modality.
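For a first look at the released data, a hedged starting point is the standard Hugging Face datasets loader. The split name is an assumption and the schema is unknown here, so the snippet just inspects whatever fields are actually there.

```python
from datasets import load_dataset

# Hypothetical usage: load the FreeAskWorld dataset from the Hugging Face Hub.
# The split name and field layout below are assumptions, not the released schema.
ds = load_dataset("Astronaut-PENG/FreeAskWorld", split="train")

print(ds)          # inspect features to see the actual schema
example = ds[0]
for key, value in example.items():
    print(key, type(value))
```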
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13524
• PDF: https://arxiv.org/pdf/2511.13524
• Github: https://github.com/AIR-DISCOVER/FreeAskWorld
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Astronaut-PENG/FreeAskWorld
• https://huggingface.co/datasets/Astronaut-PENG/FreeWorld
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #LLMs #AISimulation #HumanAI #AIResearch
✨MiMo-Embodied: X-Embodied Foundation Model Technical Report
📝 Summary:
MiMo-Embodied is the first cross-embodied foundation model. It achieves state-of-the-art performance in both autonomous driving and embodied AI, demonstrating positive transfer through multi-stage learning and fine-tuning.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16518
• PDF: https://arxiv.org/pdf/2511.16518
• Github: https://github.com/XiaomiMiMo/MiMo-Embodied
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#FoundationModels #EmbodiedAI #AutonomousDriving #AI #Robotics
✨GigaWorld-0: World Models as Data Engine to Empower Embodied AI
📝 Summary:
GigaWorld-0 is a unified world-model framework that generates high-quality, diverse, and physically plausible vision-language-action (VLA) training data by integrating video and 3D modeling. This synthetic data enables embodied AI models to achieve strong real-world performance on physical robots without any real-world training.
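The "data engine" idea, generate rollouts, filter for physical plausibility, train on the survivors, can be sketched generically. Everything below is a schematic stand-in with made-up types and thresholds, not GigaWorld-0's actual pipeline.

```python
from dataclasses import dataclass
from typing import List
import random

@dataclass
class Episode:
    frames: List[str]            # rendered video frames (placeholder: file paths)
    actions: List[List[float]]   # per-frame action vectors
    plausibility: float          # physics-consistency score in [0, 1]

def generate_episode(prompt: str) -> Episode:
    """Stand-in for a world model's video + 3D rollout generator."""
    n = 8
    return Episode(
        frames=[f"{prompt}_{t}.png" for t in range(n)],
        actions=[[random.uniform(-1, 1)] * 7 for _ in range(n)],
        plausibility=random.random(),
    )

# Hypothetical data-engine loop: sample prompts, generate rollouts,
# keep only physically plausible episodes for VLA training.
prompts = ["open the drawer", "pick up the mug", "push the chair"]
dataset = [ep for ep in (generate_episode(pr) for pr in prompts * 10)
           if ep.plausibility > 0.5]
print(f"kept {len(dataset)} episodes for VLA training")
```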
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19861
• PDF: https://arxiv.org/pdf/2511.19861
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #WorldModels #SyntheticData #AI #Robotics
✨Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution
📝 Summary:
A new task, ORS3D, is introduced for embodied agents, requiring language understanding, 3D grounding, and efficient parallel task scheduling. The ORS3D-60K dataset and GRANT, an embodied LLM with a scheduling token mechanism, enable agents to minimize total completion time.
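The scheduling objective is easiest to see with numbers: tasks with a passive phase (a washing machine running unattended) can overlap with active work, shrinking total completion time. A toy calculation with made-up durations:

```python
# Toy illustration of the ORS3D objective: some tasks have a passive phase
# during which the agent is free to do active work, so interleaving reduces
# total completion time. Durations are invented for illustration.

tasks = [
    {"name": "laundry",    "active": 2,  "passive": 30},  # load, then wait
    {"name": "cook",       "active": 20, "passive": 0},
    {"name": "wipe table", "active": 5,  "passive": 0},
]

# Naive sequential execution: wait out every passive phase.
sequential = sum(t["active"] + t["passive"] for t in tasks)

# Interleaved: start the passive-heavy task first, fill its wait with active work.
agent_busy = sum(t["active"] for t in tasks)
laundry_done = tasks[0]["active"] + tasks[0]["passive"]
interleaved = max(agent_busy, laundry_done)

print(f"sequential: {sequential} min, interleaved: {interleaved} min")
# sequential: 57 min, interleaved: 32 min
```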
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19430
• PDF: https://arxiv.org/pdf/2511.19430
• Project Page: https://h-embodvis.github.io/GRANT/
• Github: https://github.com/H-EmbodVis/GRANT
🔹 Models citing this paper:
• https://huggingface.co/H-EmbodVis/GRANT
✨ Datasets citing this paper:
• https://huggingface.co/datasets/H-EmbodVis/ORS3D-60K
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #LLM #Robotics #TaskScheduling #AIResearch
✨DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action
📝 Summary:
DualVLA tackles action degeneration in vision-language-action (VLA) models by boosting action performance while retaining reasoning. It uses dual-layer data pruning and dual-teacher adaptive distillation to balance precise action with multimodal understanding, leading to high success rates.
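A dual-teacher distillation step can be sketched schematically: two teachers supervise one student, with an adaptive weight deciding which dominates. The weighting rule and tensors below are illustrative stand-ins, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

# Schematic dual-teacher distillation (not the authors' exact formulation):
# an action teacher and a reasoning teacher each supervise the student, and
# an adaptive weight decides which teacher dominates. All tensors are dummies.
batch, dim = 4, 16
student_logits = torch.randn(batch, dim, requires_grad=True)
action_teacher = torch.randn(batch, dim)   # e.g., an action-expert VLA
reason_teacher = torch.randn(batch, dim)   # e.g., a general multimodal VLM

def kd(student, teacher, tau=2.0):
    """Standard temperature-scaled KL distillation loss."""
    return F.kl_div(F.log_softmax(student / tau, dim=-1),
                    F.softmax(teacher / tau, dim=-1),
                    reduction="batchmean") * tau ** 2

# "Adaptive" weighting stand-in: trust the more confident teacher.
conf_a = F.softmax(action_teacher, -1).max(-1).values
conf_r = F.softmax(reason_teacher, -1).max(-1).values
alpha = (conf_a / (conf_a + conf_r)).mean()

loss = (alpha * kd(student_logits, action_teacher)
        + (1 - alpha) * kd(student_logits, reason_teacher))
loss.backward()
print("distillation loss:", loss.item())
```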
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22134
• PDF: https://arxiv.org/pdf/2511.22134
• Project Page: https://costaliya.github.io/DualVLA/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #VLAs #AIagents #DeepLearning #AIResearch
✨SIMA 2: A Generalist Embodied Agent for Virtual Worlds
📝 Summary:
SIMA 2 is a Gemini-based embodied agent for 3D virtual worlds. It reasons about goals, handles complex instructions, and autonomously learns new skills, narrowing the gap to human performance and demonstrating the viability of continually learning agents.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04797
• PDF: https://arxiv.org/pdf/2512.04797
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #AI #VirtualWorlds #ReinforcementLearning #AIagents
✨X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale
📝 Summary:
X-Humanoid generates large-scale humanoid video datasets from human videos to boost embodied AI. It uses generative video editing, finetuned on synthetic data, to translate human actions into full-body humanoid motions, producing over 3.6M robotized frames. This method outperforms existing solutions.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04537
• PDF: https://arxiv.org/pdf/2512.04537
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#XHumanoid #EmbodiedAI #Robotics #GenerativeAI #ComputerVision
✨LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator
📝 Summary:
LEO-RobotAgent is a general-purpose language-driven framework that uses large language models to enable various robot types to complete complex tasks. It enhances human-robot interaction and task planning, demonstrating strong generalization, robustness, and efficiency across different scenarios.
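The core loop of such a framework, language in, primitive sequence out, dispatch to the robot, can be sketched with a stub planner. The primitive set and the canned plan below are assumptions for illustration, not the paper's actual interface.

```python
# Minimal sketch of an LLM-driven robot agent loop in the spirit of
# LEO-RobotAgent. The primitives and the canned plan are hypothetical.

PRIMITIVES = {
    "move_to":  lambda target: print(f"[robot] moving to {target}"),
    "grasp":    lambda target: print(f"[robot] grasping {target}"),
    "place_on": lambda target: print(f"[robot] placing object on {target}"),
}

def llm_plan(instruction: str) -> list[tuple[str, str]]:
    """Stand-in for an LLM call that maps language to a primitive sequence."""
    # A real system would prompt an LLM here; this canned plan is illustrative.
    return [("move_to", "cup"), ("grasp", "cup"),
            ("move_to", "shelf"), ("place_on", "shelf")]

def execute(instruction: str) -> None:
    for skill, arg in llm_plan(instruction):
        PRIMITIVES[skill](arg)   # dispatch each planned step to the robot

execute("put the cup on the shelf")
```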
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10605
• PDF: https://arxiv.org/pdf/2512.10605
• Github: https://github.com/LegendLeoChen/LEO-RobotAgent
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #LLM #HumanRobotInteraction #EmbodiedAI #AI
✨MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning
📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household environments.
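A state-aware scene graph is essentially nodes with mutable states plus relational edges; the field names below are illustrative, not the MomaGraph schema. The payoff is that planning preconditions (open the fridge before fetching the milk) fall out of a simple graph query.

```python
from dataclasses import dataclass, field

# Sketch of a state-aware scene graph: nodes carry object states and edges
# carry spatial/functional relations. Names are hypothetical.

@dataclass
class Node:
    name: str
    state: dict = field(default_factory=dict)   # e.g., {"open": False}

@dataclass
class Edge:
    subj: str
    relation: str   # e.g., "inside", "on_top_of"
    obj: str

nodes = {
    "fridge": Node("fridge", {"open": False}),
    "milk":   Node("milk",   {"grasped": False}),
}
edges = [Edge("milk", "inside", "fridge")]

# A planner can read preconditions off the graph: fetching the milk requires
# opening the fridge first because the milk is inside a closed container.
blocked = any(e.relation == "inside" and not nodes[e.obj].state["open"]
              for e in edges if e.subj == "milk")
print("must open fridge first:", blocked)
```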
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Project Page: https://hybridrobotics.github.io/MomaGraph/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics
✨CoV: Chain-of-View Prompting for Spatial Reasoning
📝 Summary:
Chain-of-View (CoV) prompting helps vision-language models improve spatial reasoning in 3D embodied question answering. It actively selects question-aligned views and iteratively adjusts camera positions to gather context, significantly boosting performance without additional training.
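Since CoV is training-free prompting, its loop is easy to sketch: render a view, ask the VLM, and move the camera until the answer is confident. The renderer and VLM below are stubs, and the view-update rule and stopping threshold are assumptions.

```python
import random

# Schematic Chain-of-View loop with stubbed renderer and VLM.

def render(view: dict) -> str:
    return f"image@yaw={view['yaw']}"            # placeholder renderer

def vlm_answer(image: str, question: str) -> tuple[str, float]:
    return "on the table", random.uniform(0, 1)  # (answer, confidence) stub

def chain_of_view(question: str, max_steps: int = 5) -> str:
    view = {"yaw": 0}
    best_answer, best_conf = "", 0.0
    for _ in range(max_steps):
        answer, conf = vlm_answer(render(view), question)
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if conf > 0.9:                           # confident enough: stop exploring
            break
        view["yaw"] = (view["yaw"] + 60) % 360   # rotate toward a new view
    return best_answer

print(chain_of_view("Where is the mug?"))
```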
🔹 Publication Date: Published on Jan 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05172
• PDF: https://arxiv.org/pdf/2601.05172
• Github: https://github.com/ziplab/CoV
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#SpatialReasoning #VisionLanguageModels #PromptEngineering #EmbodiedAI #AIResearch