✨EdgeTAM: On-Device Track Anything Model
📝 Summary:
EdgeTAM optimizes SAM 2 for mobile devices by addressing its memory attention bottleneck with a novel 2D Spatial Perceiver, a lightweight Transformer that encodes frame-level memories to reduce computational cost. A distillation pipeline further improves performance, enabling high-quality on-device video segmentation.
🔹 Publication Date: Published on Jan 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2501.07256
• PDF: https://arxiv.org/pdf/2501.07256
• Github: https://github.com/facebookresearch/edgetam
🔹 Models citing this paper:
• https://huggingface.co/yonigozlan/EdgeTAM-hf
• https://huggingface.co/facebook/EdgeTAM
✨ Spaces citing this paper:
• https://huggingface.co/spaces/merve/EdgeTAM
• https://huggingface.co/spaces/yonigozlan/edgetam
• https://huggingface.co/spaces/facebook/EdgeTAM
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EdgeAI #VideoSegmentation #ComputerVision #MobileAI #DeepLearning
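The core idea behind the 2D Spatial Perceiver can be sketched as cross-attention from a small, fixed set of learnable latents to the flattened 2D memory feature map, so later memory attention runs over a handful of latent tokens instead of all H×W spatial tokens. The sketch below is illustrative only; all class and parameter names are hypothetical and do not reflect EdgeTAM's actual implementation.

```python
import torch
import torch.nn as nn

class SpatialPerceiver(nn.Module):
    """Hypothetical Perceiver-style memory compressor: a fixed set of
    learnable latents cross-attends to the flattened frame-level feature
    map, compressing H*W spatial tokens down to num_latents tokens."""

    def __init__(self, dim=64, num_latents=16, num_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ff = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim))

    def forward(self, feat):
        # feat: (B, C, H, W) frame-level memory features
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)          # (B, H*W, C)
        q = self.latents.unsqueeze(0).expand(b, -1, -1)   # (B, L, C)
        out, _ = self.cross_attn(q, tokens, tokens)       # (B, L, C)
        return out + self.ff(out)

perceiver = SpatialPerceiver()
mem = torch.randn(2, 64, 32, 32)   # 1024 spatial tokens per frame
compressed = perceiver(mem)        # only 16 tokens reach memory attention
print(compressed.shape)            # torch.Size([2, 16, 64])
```

Downstream memory attention now scales with the latent count (16 here) rather than the spatial resolution, which is the kind of saving that matters on mobile hardware.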
✨ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
📝 Summary:
ReVSeg enhances video object segmentation by eliciting sequential reasoning chains from pretrained vision-language models and optimizing them with reinforcement learning, achieving state-of-the-art results while providing interpretable reasoning.
🔹 Publication Date: Published on Dec 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02835
• PDF: https://arxiv.org/pdf/2512.02835
• Project Page: https://clementine24.github.io/ReVSeg/
• Github: https://github.com/Clementine24/ReVSeg
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoSegmentation #ReinforcementLearning #VisionLanguageModels #ComputerVision #DeepLearning
✨MeViS: A Multi-Modal Dataset for Referring Motion Expression Video Segmentation
📝 Summary:
MeViS is a multi-modal dataset for referring motion expression video segmentation, addressing the need to segment and track objects based on their motion descriptions. It provides text and audio annotations for complex videos, enabling research into motion-guided video understanding.
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10945
• PDF: https://arxiv.org/pdf/2512.10945
• Project Page: https://henghuiding.com/MeViS/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VideoSegmentation #MultiModalAI #ComputerVision #Dataset #MotionUnderstanding