ML Research Hub

✨Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

📝 Summary:
VR-Bench evaluates video models' spatial reasoning using maze-solving tasks. It demonstrates that video models excel in spatial perception and reasoning, outperforming VLMs, and benefit from diverse sampling during inference. These findings show the strong potential of reasoning via video for spa...

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.15065
• PDF: https://arxiv.org/pdf/2511.15065
• Project Page: https://imyangc7.github.io/VRBench_Web/
• Github: https://github.com/ImYangC7/VR-Bench

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#VideoModels #AIReasoning #SpatialAI #ComputerVision #MachineLearning

❤1

283 views03:00

✨iMontage: Unified, Versatile, Highly Dynamic Many-to-many Image Generation

📝 Summary:
iMontage repurposes pre-trained video models to generate high-quality, diverse image sets. It uses a unified framework and minimal adaptation, combining temporal coherence with image diversity for natural transitions and expanded dynamics across many tasks.

🔹 Publication Date: Published on Nov 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20635
• PDF: https://arxiv.org/pdf/2511.20635
• Project Page: https://kr1sjfu.github.io/iMontage-web/

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#ImageGeneration #DeepLearning #ComputerVision #AIMethods #VideoModels

230 views04:02