ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
EdgeTAM: On-Device Track Anything Model

📝 Summary:
EdgeTAM optimizes SAM 2 for mobile devices by addressing memory attention bottlenecks with a novel 2D Spatial Perceiver. This lightweight Transformer encodes frame-level memories to reduce computational cost. A distillation pipeline improves performance, enabling high-quality video segmentation a...
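The memory-compression idea in the summary can be illustrated with a generic Perceiver-style cross-attention step, where a small set of latent vectors attends over a much larger set of frame-memory tokens. This is only a rough sketch under assumptions: it is not EdgeTAM's actual 2D Spatial Perceiver, and the function name `compress_memory`, the random weights, and all sizes (4096 memory tokens, 256 latents, width 32) are hypothetical values chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress_memory(memory, latents, w_q, w_k, w_v):
    """Cross-attend K latents to N memory tokens, returning K compressed tokens.

    Hypothetical helper: a generic Perceiver-style step, not the paper's module.
    """
    q = latents @ w_q                                   # (K, d) queries from latents
    k = memory @ w_k                                    # (N, d) keys from memory tokens
    v = memory @ w_v                                    # (N, d) values from memory tokens
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))      # (K, N), each row sums to 1
    return attn @ v                                     # (K, d)

rng = np.random.default_rng(0)
d, n_tokens, n_latents = 32, 64 * 64, 256               # illustrative sizes only
memory = rng.standard_normal((n_tokens, d))             # stand-in for one frame's memory
latents = rng.standard_normal((n_latents, d))           # stand-in for learned latents
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

compressed = compress_memory(memory, latents, w_q, w_k, w_v)
print(memory.shape, "->", compressed.shape)
```

In a setup like this, a later memory-attention layer would attend over 256 tokens per stored frame instead of 4096, which is where the cost saving would come from.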

🔹 Publication Date: Published on Jan 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2501.07256
• PDF: https://arxiv.org/pdf/2501.07256
• Github: https://github.com/facebookresearch/edgetam

🔹 Models citing this paper:
https://huggingface.co/yonigozlan/EdgeTAM-hf
https://huggingface.co/facebook/EdgeTAM

🔹 Spaces citing this paper:
https://huggingface.co/spaces/merve/EdgeTAM
https://huggingface.co/spaces/yonigozlan/edgetam
https://huggingface.co/spaces/facebook/EdgeTAM

==================================

For more data science resources:
https://t.me/DataScienceT

#EdgeAI #VideoSegmentation #ComputerVision #MobileAI #DeepLearning
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

📝 Summary:
ReVSeg improves video object segmentation by running sequential reasoning inside pretrained vision-language models and optimizing that reasoning chain with reinforcement learning. It achieves state-of-the-art results and produces interpretable reasoning.

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02835
• PDF: https://arxiv.org/pdf/2512.02835
• Project Page: https://clementine24.github.io/ReVSeg/
• Github: https://github.com/Clementine24/ReVSeg

#VideoSegmentation #ReinforcementLearning #VisionLanguageModels #ComputerVision #DeepLearning
MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation

📝 Summary:
MeViS is a multi-modal dataset for referring motion expression video segmentation, addressing the need to segment and track objects based on their motion descriptions. It provides text and audio annotations for complex videos, enabling research into motion-guided video understanding.

🔹 Publication Date: Published on Dec 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10945
• PDF: https://arxiv.org/pdf/2512.10945
• Project Page: https://henghuiding.com/MeViS/

#VideoSegmentation #MultiModalAI #ComputerVision #Dataset #MotionUnderstanding