✨Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
📝 Summary:
Ariadne is a framework that uses synthetic mazes and reinforcement learning with verifiable rewards (RLVR) to enhance visual-centric spatial reasoning in VLMs. It expanded VLM capabilities, raising accuracy from 0 percent to over 50 percent, and significantly improved zero-shot generalization on real-world benchmarks.
🔹 Publication Date: Published on Nov 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00710
• PDF: https://arxiv.org/pdf/2511.00710
• Project Page: https://mingheshen.github.io/Ariadne/
🔹 Models citing this paper:
• https://huggingface.co/KOKKKOKK/Ariadne
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VLM #AI #MachineLearning #ComputerVision #SpatialReasoning
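The RLVR idea above hinges on rewards that can be checked mechanically. Below is a hypothetical sketch of such a verifiable reward for maze navigation; the grid encoding, move set, and binary reward are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of an RLVR-style verifiable reward for maze navigation.
# The maze encoding, move set, and reward scheme are assumptions for
# illustration; they are not taken from the Ariadne paper.

def solves_maze(grid, start, goal, moves):
    """Replay a move sequence on a grid maze; 1 = wall, 0 = free cell."""
    deltas = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
    r, c = start
    for m in moves:
        dr, dc = deltas[m]
        r, c = r + dr, c + dc
        # Out-of-bounds or a wall invalidates the whole trajectory.
        if not (0 <= r < len(grid) and 0 <= c < len(grid[0])) or grid[r][c]:
            return False
    return (r, c) == goal

def verifiable_reward(grid, start, goal, moves):
    """Binary reward: 1.0 only if the model's path actually reaches the goal."""
    return 1.0 if solves_maze(grid, start, goal, moves) else 0.0

maze = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(verifiable_reward(maze, (0, 0), (0, 2), "DDRRUU"))  # 1.0
print(verifiable_reward(maze, (0, 0), (0, 2), "RR"))      # 0.0
```

Because the reward is computed by replaying the path, it cannot be gamed by fluent but wrong reasoning, which is the appeal of verifiable rewards for RL training.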
✨G^2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
📝 Summary:
G^2VLM integrates 3D geometry learning into vision-language models to overcome their spatial intelligence deficits. It unifies 3D reconstruction and spatial reasoning, leveraging learned 3D features to achieve strong performance in both tasks.
🔹 Publication Date: Published on Nov 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21688
• PDF: https://arxiv.org/pdf/2511.21688
• Project Page: https://gordonhu608.github.io/g2vlm.github.io/
• Github: https://github.com/InternRobotics/G2VLM
🔹 Models citing this paper:
• https://huggingface.co/InternRobotics/G2VLM-2B-MoT
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModels #3DReconstruction #SpatialReasoning #ComputerVision #ArtificialIntelligence
✨Geometrically-Constrained Agent for Spatial Reasoning
📝 Summary:
The Geometrically-Constrained Agent (GCA) resolves the semantic-to-geometric gap in VLMs for spatial reasoning. It uses a formal task constraint to guide the VLM from semantic analysis to constrained tool execution, achieving SOTA performance.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22659
• PDF: https://arxiv.org/pdf/2511.22659
• Project Page: https://gca-spatial-reasoning.github.io
• Github: https://github.com/gca-spatial-reasoning/gca
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#SpatialReasoning #VLMs #AI #Robotics #DeepLearning
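The "constrained tool execution" step can be pictured as a formal predicate gating tool outputs. The sketch below is hypothetical: the max-distance constraint and the `project_to_plane` tool are invented for illustration and are not the paper's actual constraint language or tools.

```python
# Hypothetical sketch of constraint-gated tool execution in the spirit of GCA.
# The constraint form (a max-distance predicate) and the tool are invented
# for illustration, not taken from the paper.
import math

def distance_constraint(max_dist):
    """Formal task constraint: accept a 3D point only within max_dist of the origin."""
    def check(point):
        return math.dist(point, (0.0, 0.0, 0.0)) <= max_dist
    return check

def run_tool_with_constraint(tool, args, constraint):
    """Execute a geometric tool, but reject outputs violating the constraint."""
    result = tool(*args)
    if not constraint(result):
        raise ValueError(f"tool output {result} violates the task constraint")
    return result

# A stand-in "tool" of the kind a VLM agent might call.
def project_to_plane(x, y, z):
    return (x, y, 0.0)

point = run_tool_with_constraint(project_to_plane, (0.3, 0.4, 5.0),
                                 distance_constraint(1.0))
print(point)  # (0.3, 0.4, 0.0)
```

The point of the pattern is that the constraint is checked formally, so the agent's tool use stays grounded in geometry rather than in free-form semantic guesses.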
✨Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation
📝 Summary:
Video4Spatial uses video diffusion models with only visual data to perform complex spatial tasks like navigation and object grounding. It demonstrates strong spatial understanding, planning, and generalization, advancing visuospatial reasoning.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03040
• PDF: https://arxiv.org/pdf/2512.03040
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Video4Spatial #VisuospatialAI #DiffusionModels #SpatialReasoning #ComputerVision
✨Artemis: Structured Visual Reasoning for Perception Policy Learning
📝 Summary:
Artemis improves visual perception by using structured spatial reasoning over (label, bounding-box) pairs instead of intermediate linguistic reasoning. This avoids language ambiguity, enables direct supervision, and leads to strong performance and generalization across diverse visual tasks.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01988
• PDF: https://arxiv.org/pdf/2512.01988
• Project Page: https://vi-ocean.github.io/projects/artemis/
• Github: https://github.com/WayneTomas/Artemis
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisualPerception #ComputerVision #SpatialReasoning #AI #MachineLearning
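Direct supervision on (label, bounding-box) pairs can be made concrete with a simple geometric loss. The IoU-based objective below is an illustrative assumption, not the paper's exact training loss.

```python
# Hypothetical sketch of direct supervision on (label, bounding-box) pairs,
# as opposed to supervising free-form linguistic reasoning. The IoU-based
# loss is an illustrative choice, not the paper's exact objective.

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def pair_loss(pred, target):
    """Loss over (label, box) pairs: label mismatch costs 1, else 1 - IoU."""
    total = 0.0
    for (p_label, p_box), (t_label, t_box) in zip(pred, target):
        total += 1.0 if p_label != t_label else 1.0 - iou(p_box, t_box)
    return total / len(pred)

pred = [("cup", (10, 10, 20, 20))]
target = [("cup", (10, 10, 20, 20))]
print(pair_loss(pred, target))  # 0.0
```

Unlike a text-matching loss on a chain of thought, this signal is unambiguous: the prediction is either the right box for the right label or it is penalized by a measurable geometric amount.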
✨SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
📝 Summary:
SpaceTools introduces Double Interactive Reinforcement Learning (DIRL). This two-phase RL framework enables Vision Language Models to coordinate multiple tools for precise spatial reasoning, achieving state-of-the-art performance on benchmarks and real-world robot tasks.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04069
• PDF: https://arxiv.org/pdf/2512.04069
• Project Page: https://spacetools.github.io/
• Github: https://spacetools.github.io/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #VisionLanguageModels #Robotics #SpatialReasoning #AI
✨Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
📝 Summary:
DSR Suite addresses the weak dynamic spatial reasoning of vision-language models. It creates 4D training data from videos using an automated pipeline and integrates geometric priors via a Geometry Selection Module. This significantly enhances VLM dynamic spatial reasoning capability while maintaining gen...
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20557
• PDF: https://arxiv.org/pdf/2512.20557
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VisionLanguageModels #SpatialReasoning #4D #ComputerVision #AIResearch