✨WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
📝 Summary:
WMPO is a pixel-based world-model framework for on-policy VLA reinforcement learning that avoids real-world interaction. It uses pixel predictions aligned with VLA features to boost sample efficiency, performance, self-correction, and generalization in robotic manipulation.
🔹 Publication Date: Published on Nov 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09515
• PDF: https://arxiv.org/pdf/2511.09515
• Project Page: https://wm-po.github.io/
• Github: https://github.com/WM-PO/WMPO
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #VLAModels #WorldModels #Robotics #AI
✨PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
📝 Summary:
PAN is a general, interactable world model that predicts future states through high-quality, action-conditioned video simulation. It uses a Generative Latent Prediction (GLP) architecture that combines LLM-based latent dynamics with a video diffusion decoder, producing detailed, long-term coherent simulations that support reasoning and acting.
🔹 Publication Date: Published on Nov 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09057
• PDF: https://arxiv.org/pdf/2511.09057
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#WorldModels #AI #Simulation #GenerativeAI #Robotics
✨RynnVLA-002: A Unified Vision-Language-Action and World Model
📝 Summary:
RynnVLA-002 unifies a Vision-Language-Action model and a world model, enabling joint learning of environmental dynamics and action planning. The two components enhance each other, yielding 97.4% success in simulation and a 50% boost in real-world robot tasks.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17502
• PDF: https://arxiv.org/pdf/2511.17502
• Github: https://github.com/alibaba-damo-academy/RynnVLA-002
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageAction #WorldModels #Robotics #AI #DeepLearning
✨Target-Bench: Can World Models Achieve Mapless Path Planning with Semantic Targets?
📝 Summary:
Target-Bench evaluates world models on mapless robot path planning toward semantic targets in real-world environments. It shows that off-the-shelf models perform poorly, but that fine-tuning significantly improves their planning capability.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17792
• PDF: https://arxiv.org/pdf/2511.17792
• Project Page: https://target-bench.github.io/
• Github: https://github.com/TUM-AVS/target-bench
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #PathPlanning #WorldModels #ArtificialIntelligence #MachineLearning
✨GigaWorld-0: World Models as Data Engine to Empower Embodied AI
📝 Summary:
GigaWorld-0 is a unified world model framework that generates high-quality, diverse, and physically plausible VLA data by integrating video and 3D modeling. This synthetic data enables embodied AI models to achieve strong real-world performance on physical robots without any real-world training.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19861
• PDF: https://arxiv.org/pdf/2511.19861
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EmbodiedAI #WorldModels #SyntheticData #AI #Robotics
✨Reinforcing Action Policies by Prophesying
📝 Summary:
ProphRL improves Vision-Language-Action policies by overcoming the limits of imitation learning. It uses Prophet, a learned world-model simulator, together with tailored reinforcement learning components (FA-GRPO and FlowScale) for data-efficient and stable post-training. This yields significant success gains on benchmarks a...
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20633
• PDF: https://arxiv.org/pdf/2511.20633
• Project Page: https://logosroboticsgroup.github.io/ProphRL/
• Github: https://github.com/LogosRoboticsGroup/ProphRL
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #ProphRL #WorldModels #Robotics #DeepLearning
✨GigaBrain-0: A World Model-Powered Vision-Language-Action Model
📝 Summary:
GigaBrain-0 is a VLA model that uses world model-generated data to overcome limitations of real robot data, improving cross-task generalization and policy robustness. This boosts real-world performance on complex manipulation tasks.
🔹 Publication Date: Published on Oct 22
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.19430
• PDF: https://arxiv.org/pdf/2510.19430
• Project Page: https://gigabrain0.github.io/
• Github: https://github.com/open-gigaai/giga-brain-0
🔹 Models citing this paper:
• https://huggingface.co/open-gigaai/GigaBrain-0-3.5B-Base
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VLAModels #WorldModels #Robotics #AI #MachineLearning
✨WorldVLA: Towards Autoregressive Action World Model
📝 Summary:
WorldVLA unifies VLA and world models, showing mutual enhancement in image understanding and action generation. It addresses autoregressive action prediction errors with an attention mask strategy that significantly improves performance.
🔹 Publication Date: Published on Jun 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2506.21539
• PDF: https://arxiv.org/pdf/2506.21539
• Project Page: https://github.com/alibaba-damo-academy/WorldVLA
• Github: https://github.com/alibaba-damo-academy/WorldVLA
🔹 Models citing this paper:
• https://huggingface.co/Alibaba-DAMO-Academy/WorldVLA
• https://huggingface.co/jcenaa/WorldVLA-ActionModel-LIBERO-Goal-256
• https://huggingface.co/jcenaa/WorldVLA-ActionModel-LIBERO-10-256
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #MachineLearning #Robotics #ComputerVision #WorldModels
✨RELIC: Interactive Video World Model with Long-Horizon Memory
📝 Summary:
RELIC is a unified framework enabling real-time, memory-aware exploration of scenes with user control. It integrates long-horizon memory and spatial consistency using video-diffusion distillation, achieving 16 FPS generation with robust 3D coherence.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04040
• PDF: https://arxiv.org/pdf/2512.04040
• Project Page: https://relic-worldmodel.github.io/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#WorldModels #VideoDiffusion #DeepLearning #RealTimeAI #ComputerVision
✨Act2Goal: From World Model To General Goal-conditioned Policy
📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute precisely. This allows strong generalization and rapid online adaptation, significantly boosting real-robo...
🔹 Publication Date: Published on Dec 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/
• Github: https://act2goal.github.io/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning
✨Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models
📝 Summary:
This paper presents a four-stage framework for AI in digital twins: modeling, mirroring, intervention, and autonomous management. It details how physics-informed AI and large language models empower proactive, self-improving digital twins, acknowledging key challenges.
🔹 Publication Date: Published on Jan 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01321
• PDF: https://arxiv.org/pdf/2601.01321
• Github: https://github.com/rongzhou7/Awesome-Digital-Twin-AI/tree/main
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DigitalTwin #AI #LLM #WorldModels #PhysicsInformedAI
✨DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
📝 Summary:
DrivingGen is the first comprehensive benchmark for generative driving world models, addressing prior evaluation gaps. It uses diverse datasets and new metrics to assess visual realism, trajectory plausibility, temporal coherence, and controllability. Benchmarking reveals trade-offs between visua...
🔹 Publication Date: Published on Jan 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.01528
• PDF: https://arxiv.org/pdf/2601.01528
• Project Page: https://drivinggen-bench.github.io/
• Github: https://github.com/youngzhou1999/DrivingGen
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/yangzhou99/DrivingGen
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AutonomousDriving #GenerativeAI #WorldModels #AIResearch #Benchmarking