ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries

📝 Summary:
Ariadne is a framework that uses synthetic mazes and reinforcement learning with verifiable rewards (RLVR) to strengthen visual-centric spatial reasoning in VLMs. It extends VLM capability boundaries, raising maze-solving accuracy from 0% to over 50%, and significantly improves zero-shot generalization on real-world benchmarks.
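The core of the RLVR recipe is that a maze solution can be checked mechanically, so the reward needs no learned judge. A minimal sketch of such a verifiable reward (grid layout, move encoding, and binary scoring are illustrative assumptions, not Ariadne's actual spec):

```python
# Verifiable reward for maze solving: the predicted move sequence is
# replayed against the grid, yielding a binary reward with no learned judge.
# Grid cells: 0 = free, 1 = wall. Moves: U/D/L/R.

def maze_reward(grid, start, goal, moves):
    """Return 1.0 iff `moves` walks from start to goal without hitting a wall."""
    deltas = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
    r, c = start
    for m in moves:
        dr, dc = deltas[m]
        r, c = r + dr, c + dc
        if not (0 <= r < len(grid) and 0 <= c < len(grid[0])) or grid[r][c] == 1:
            return 0.0  # out of bounds or wall: verifiably wrong
    return 1.0 if (r, c) == goal else 0.0

grid = [
    [0, 0, 1],
    [1, 0, 0],
    [1, 0, 0],
]
print(maze_reward(grid, (0, 0), (2, 2), "RDDR"))  # valid path -> 1.0
print(maze_reward(grid, (0, 0), (2, 2), "DD"))    # walks into a wall -> 0.0
```

Because mazes are generated synthetically, task difficulty (size, wall density, path length) can be dialed up progressively to probe where the model's reasoning breaks.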

🔹 Publication Date: Published on Nov 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.00710
• PDF: https://arxiv.org/pdf/2511.00710
• Project Page: https://mingheshen.github.io/Ariadne/

🔹 Models citing this paper:
https://huggingface.co/KOKKKOKK/Ariadne

==================================

For more data science resources:
https://t.me/DataScienceT

#VLM #AI #MachineLearning #ComputerVision #SpatialReasoning
G^2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning

📝 Summary:
G^2VLM integrates 3D geometry learning into vision-language models to address their weak spatial intelligence. It unifies 3D reconstruction and spatial reasoning, leveraging learned 3D features to achieve strong performance on both tasks.

🔹 Publication Date: Published on Nov 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.21688
• PDF: https://arxiv.org/pdf/2511.21688
• Project Page: https://gordonhu608.github.io/g2vlm.github.io/
• Github: https://github.com/InternRobotics/G2VLM

🔹 Models citing this paper:
https://huggingface.co/InternRobotics/G2VLM-2B-MoT

==================================


#VisionLanguageModels #3DReconstruction #SpatialReasoning #ComputerVision #ArtificialIntelligence
Geometrically-Constrained Agent for Spatial Reasoning

📝 Summary:
The Geometrically-Constrained Agent (GCA) bridges the semantic-to-geometric gap in VLMs for spatial reasoning. It uses a formal task constraint to guide the VLM from semantic analysis to constrained tool execution, achieving state-of-the-art performance.
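The idea of gating tool execution behind a formal constraint can be sketched in a few lines; the predicate, proposal format, and tool interface below are illustrative assumptions, not GCA's actual design:

```python
# Constraint-gated tool execution: a semantic proposal from a VLM is only
# acted on if it satisfies a formal geometric predicate; otherwise it is
# rejected before any tool runs.

def inside_workspace(point, bounds):
    """Formal constraint: the target point must lie inside the workspace box."""
    (xmin, ymin), (xmax, ymax) = bounds
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax

def execute_if_valid(proposal, bounds, tool):
    """Run the tool only when the proposed target passes the constraint."""
    if inside_workspace(proposal["target"], bounds):
        return tool(proposal["target"])
    return None  # constraint violated: reject instead of executing

move_log = []
result = execute_if_valid(
    {"target": (0.4, 0.2)},
    bounds=((0.0, 0.0), (1.0, 1.0)),
    tool=lambda p: move_log.append(p) or "moved",
)
print(result, move_log)  # constraint holds, so the tool ran exactly once
```

The point of the design is that the VLM's free-form semantic output never reaches an actuator directly; every action must first pass a checkable geometric condition.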

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22659
• PDF: https://arxiv.org/pdf/2511.22659
• Project Page: https://gca-spatial-reasoning.github.io
• Github: https://github.com/gca-spatial-reasoning/gca

==================================


#SpatialReasoning #VLMs #AI #Robotics #DeepLearning
Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation

📝 Summary:
Video4Spatial uses video diffusion models with only visual data to perform complex spatial tasks like navigation and object grounding. It demonstrates strong spatial understanding, planning, and generalization, advancing visuospatial reasoning.

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03040
• PDF: https://arxiv.org/pdf/2512.03040

==================================


#Video4Spatial #VisuospatialAI #DiffusionModels #SpatialReasoning #ComputerVision
Artemis: Structured Visual Reasoning for Perception Policy Learning

📝 Summary:
Artemis improves visual perception by using structured spatial reasoning over label bounding-box pairs instead of intermediate linguistic reasoning. This avoids language ambiguity, enables direct supervision, and yields strong performance and generalization across diverse visual tasks.
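Why do box-level outputs enable direct supervision where free-form text does not? A predicted (label, box) pair can be scored geometrically against ground truth, e.g. with intersection-over-union. A minimal sketch (the box format, threshold, and scoring rule are assumptions for illustration, not Artemis's training objective):

```python
# Direct supervision on (label, box) pairs: instead of fuzzily matching
# free-form text, a prediction is scored with label equality plus IoU,
# giving an unambiguous, differentiable-free training signal.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def pair_score(pred, gt, thresh=0.5):
    """Score a (label, box) prediction: label must match and IoU must clear the threshold."""
    same_label = pred[0] == gt[0]
    overlap = iou(pred[1], gt[1])
    return overlap if same_label and overlap >= thresh else 0.0

print(pair_score(("cup", (10, 10, 50, 50)), ("cup", (12, 12, 52, 52))))
```

A wrong label or a poorly localized box both score zero, with no room for the ambiguity that plagues matching two natural-language descriptions of the same region.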

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01988
• PDF: https://arxiv.org/pdf/2512.01988
• Project Page: https://vi-ocean.github.io/projects/artemis/
• Github: https://github.com/WayneTomas/Artemis

==================================


#VisualPerception #ComputerVision #SpatialReasoning #AI #MachineLearning
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

📝 Summary:
SpaceTools introduces Double Interactive Reinforcement Learning (DIRL), a two-phase RL framework that enables vision-language models to coordinate multiple tools for precise spatial reasoning, achieving state-of-the-art performance on benchmarks and real-world robot tasks.
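The tool-coordination pattern itself is simple to illustrate: the model chains calls to specialized tools and a task-level reward scores the final answer. A toy sketch (tool names, the "which is closer" task, and the scene format are assumptions, not SpaceTools' actual toolset):

```python
# Tool-coordinated spatial reasoning: chain two calls to a depth tool,
# then compare, and score the final answer with a verifiable task reward.

def depth_tool(scene, obj):
    """Toy stand-in for a metric depth estimator queried per object."""
    return scene["depths"][obj]

def answer_which_is_closer(scene, obj_a, obj_b):
    """Coordinate two tool calls to answer a spatial question."""
    da = depth_tool(scene, obj_a)
    db = depth_tool(scene, obj_b)
    return obj_a if da < db else obj_b

def reward(scene, pred):
    """Binary task-level reward against the scene's ground truth."""
    return 1.0 if pred == scene["closest"] else 0.0

scene = {"depths": {"cup": 0.8, "lamp": 1.5}, "closest": "cup"}
pred = answer_which_is_closer(scene, "cup", "lamp")
print(pred, reward(scene, pred))  # cup 1.0
```

In the two-phase setup described by the summary, RL would first shape reliable use of each tool in isolation, then optimize the coordination policy that decides which tools to call and in what order.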

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04069
• PDF: https://arxiv.org/pdf/2512.04069
• Project Page: https://spacetools.github.io/

==================================


#ReinforcementLearning #VisionLanguageModels #Robotics #SpatialReasoning #AI
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

📝 Summary:
DSR Suite addresses vision-language models' weak dynamic spatial reasoning. It creates 4D training data from videos using an automated pipeline and integrates geometric priors via a Geometry Selection Module. This significantly enhances VLM dynamic spatial reasoning capability while maintaining gen...

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20557
• PDF: https://arxiv.org/pdf/2512.20557

==================================


#VisionLanguageModels #SpatialReasoning #4D #ComputerVision #AIResearch