✨DINOv3
📝 Summary:
DINOv3 is a self-supervised vision foundation model that excels across tasks. It scales up models and training data, prevents dense-feature degradation during long training via Gram anchoring, and applies post-hoc strategies for flexibility across resolutions and model sizes. The resulting model outperforms specialized state-of-the-art models without fine-tuning.
🔹 Publication Date: Published on Aug 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10104
• HF Collection: https://huggingface.co/collections/facebook/dinov3
• PDF: https://arxiv.org/pdf/2508.10104
• Project Page: https://ai.meta.com/blog/dinov3-self-supervised-vision-model/
• Github: https://github.com/facebookresearch/dinov3
🔹 Models citing this paper:
• https://huggingface.co/facebook/dinov3-vit7b16-pretrain-lvd1689m
• https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m
• https://huggingface.co/facebook/dinov3-vitl16-pretrain-lvd1689m
✨ Datasets citing this paper:
• https://huggingface.co/datasets/zhuangzhe1229/test_dataset
• https://huggingface.co/datasets/simon123905/vitl
✨ Spaces citing this paper:
• https://huggingface.co/spaces/atalaydenknalbant/DINOv3
• https://huggingface.co/spaces/manu02/DINOv3-Interactive-Patch-Cosine-Similarity
• https://huggingface.co/spaces/merve/dinov3-viz
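💡 The Gram-anchoring idea, roughly: keep the student's patch-to-patch similarity structure (its Gram matrix) close to that of an earlier "Gram teacher" checkpoint, so dense features don't degrade over long training. Below is a minimal PyTorch sketch under that reading; the function names are ours, not the released code:
```python
import torch
import torch.nn.functional as F

def gram_matrix(patch_feats: torch.Tensor) -> torch.Tensor:
    """Pairwise patch similarities; patch_feats has shape (B, N, D)."""
    x = F.normalize(patch_feats, dim=-1)      # unit-norm patch features
    return x @ x.transpose(1, 2)              # (B, N, N) Gram matrix

def gram_anchor_loss(student_feats, teacher_feats):
    """Penalize drift of the student's patch-similarity structure from an
    earlier 'Gram teacher' snapshot (illustrative sketch only)."""
    g_s = gram_matrix(student_feats)
    with torch.no_grad():
        g_t = gram_matrix(teacher_feats)
    return (g_s - g_t).pow(2).mean()          # Frobenius-style penalty

# e.g. total = ssl_loss + lambda_gram * gram_anchor_loss(s_feats, t_feats)
```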
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DINOv3 #SelfSupervisedLearning #ComputerVision #FoundationModels #AI
🤖🧠 Concerto: How Joint 2D-3D Self-Supervised Learning Is Redefining Spatial Intelligence
🗓️ 09 Nov 2025
📚 AI News & Trends
The world of artificial intelligence is evolving rapidly, and self-supervised learning has become a driving force behind breakthroughs in computer vision and 3D scene understanding. Traditional supervised learning relies heavily on labeled datasets, which are expensive and time-consuming to produce. Self-supervised learning, by contrast, extracts meaningful patterns without manual labels, allowing models to ...
#SelfSupervisedLearning #ComputerVision #3DSceneUnderstanding #SpatialIntelligence #AIResearch #DeepLearning
✨VideoSSR: Video Self-Supervised Reinforcement Learning
📝 Summary:
VideoSSR is a novel self-supervised reinforcement learning framework that leverages intrinsic video information to generate high-quality training data. It uses three pretext tasks and the VideoSSR-30K dataset, improving MLLM performance across 17 benchmarks by over 5%.
🔹 Publication Date: Published on Nov 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06281
• PDF: https://arxiv.org/pdf/2511.06281
• Github: https://github.com/lcqysl/VideoSSR
🔹 Models citing this paper:
• https://huggingface.co/yhx12/VideoSSR
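💡 The summary doesn't spell out the three pretext tasks, but the pattern is: derive tasks from the video itself so the reward is verifiable without labels. A hypothetical clip-reordering example (names and task are ours, for illustration):
```python
import random

def make_reorder_sample(frames):
    """Build a self-supervised sample: shuffle clip order; the ground-truth
    permutation comes for free, so the reward is verifiable."""
    order = list(range(len(frames)))
    random.shuffle(order)
    return [frames[i] for i in order], order

def reorder_reward(predicted_order, true_order):
    """Fraction of positions the model restores correctly; usable as a
    rule-based RL reward with no human annotation."""
    hits = sum(p == t for p, t in zip(predicted_order, true_order))
    return hits / len(true_order)
```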
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ReinforcementLearning #SelfSupervisedLearning #VideoAI #MachineLearning #DeepLearning
✨OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation
📝 Summary:
OlmoEarth is a novel multimodal spatio-temporal foundation model for Earth observation data. It employs new self-supervised learning methods to achieve state-of-the-art performance on many tasks. It is deployed as a platform for non-profits and NGOs.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13655
• PDF: https://arxiv.org/pdf/2511.13655
• Project Page: https://olmoearth.allenai.org/
• Github: https://github.com/allenai/olmoearth_pretrain
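💡 The "stable latent image modeling" details aren't in this summary; a common recipe it evokes is masked modeling against EMA-teacher latents. A generic, illustrative sketch (not the OlmoEarth code; the `student`/`ema_teacher` signatures are assumed):
```python
import torch
import torch.nn.functional as F

def latent_mim_loss(student, ema_teacher, patches, mask_ratio=0.6):
    """Generic latent masked-image-modeling step (illustrative only):
    regress EMA-teacher latents at masked positions, a common recipe
    for stable SSL targets. patches: (B, N, D) tokenized observations."""
    B, N, _ = patches.shape
    mask = torch.rand(B, N, device=patches.device) < mask_ratio
    with torch.no_grad():
        targets = ema_teacher(patches)        # teacher latents, all tokens
    preds = student(patches, mask=mask)       # predict latents at masked slots
    return F.mse_loss(preds[mask], targets[mask])
```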
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EarthObservation #FoundationModels #AI #RemoteSensing #SelfSupervisedLearning
✨UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity
📝 Summary:
UnSAMv2 enables continuous control of segmentation granularity for the SAM model without human annotations. It uses self-supervised learning on unlabeled data to discover mask-granularity pairs and introduces a novel control embedding. UnSAMv2 significantly enhances SAM-2's performance across various segmentation tasks.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13714
• PDF: https://arxiv.org/pdf/2511.13714
• Project Page: https://yujunwei04.github.io/UnSAMv2-Project-Page/
• Github: https://github.com/yujunwei04/UnSAMv2
✨ Spaces citing this paper:
• https://huggingface.co/spaces/yujunwei04/UnSAMv2
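💡 One way to read the "control embedding": embed a continuous granularity value g ∈ [0, 1] as an extra prompt token that conditions the mask decoder. A hypothetical sketch (our parameterization, not necessarily the paper's):
```python
import torch
import torch.nn as nn

class GranularityEmbedding(nn.Module):
    """Map a continuous granularity value g in [0, 1] to a prompt-token
    embedding (our sketch of a 'control embedding')."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        return self.mlp(g.view(-1, 1))        # (B, dim) control token

# usage sketch: prompt = torch.cat([point_tokens, ctrl(g)[:, None]], dim=1)
```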
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #ComputerVision #SelfSupervisedLearning #ImageSegmentation #DeepLearning
✨Φeat: Physically-Grounded Feature Representation
📝 Summary:
Φeat is a new self-supervised visual backbone that captures material identity, including reflectance and mesostructure. It learns robust features invariant to external physical factors such as shape and lighting, promoting physics-aware perception.
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11270
• PDF: https://arxiv.org/pdf/2511.11270
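💡 The summary describes invariance to shape and lighting while preserving material identity; a standard way to train that is to attract embeddings of the same material rendered under different extrinsics. An illustrative InfoNCE stand-in (ours, not the paper's exact objective):
```python
import torch
import torch.nn.functional as F

def material_invariance_loss(z_a, z_b, temperature=0.1):
    """InfoNCE over two views of the *same material* rendered under
    different shape/lighting (illustrative stand-in).
    z_a, z_b: (B, D) embeddings of paired views."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature      # (B, B) similarities
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, labels)    # matched pairs attract
```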
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ComputerVision #SelfSupervisedLearning #DeepLearning #FeatureLearning #PhysicsAwareAI
✨EvoVLA: Self-Evolving Vision-Language-Action Model
📝 Summary:
EvoVLA is a self-supervised VLA framework that tackles stage hallucination in long-horizon robotic manipulation. It uses triplet contrastive learning, pose-based exploration, and memory to prevent shortcuts. EvoVLA significantly improves success rates and sample efficiency and reduces hallucination in simulation and real-world settings.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16166
• PDF: https://arxiv.org/pdf/2511.16166
• Project Page: https://aigeeksgroup.github.io/EvoVLA/
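💡 A sketch of how triplet contrastive learning can fight stage hallucination: pull an observation toward its true task stage and push it away from a wrong or shortcut stage. Assumed details, not the paper's implementation:
```python
import torch
import torch.nn.functional as F

def stage_triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet objective sketch: anchor = observation embedding,
    positive = its true stage, negative = a wrong/shortcut stage.
    All inputs: (B, D)."""
    d_pos = 1 - F.cosine_similarity(anchor, positive)  # to true stage
    d_neg = 1 - F.cosine_similarity(anchor, negative)  # to wrong stage
    return F.relu(d_pos - d_neg + margin).mean()
```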
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Robotics #VisionLanguageAction #SelfSupervisedLearning #AI #DeepLearning
✨TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition
📝 Summary:
TRivia is a self-supervised fine-tuning method for vision-language models to learn table recognition from unlabeled data. It uses a question-answering reward mechanism to autonomously optimize the model. This open-source solution outperforms state-of-the-art systems on popular benchmarks.
🔹 Publication Date: Published on Dec 1
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01248
• PDF: https://arxiv.org/pdf/2512.01248
• Github: https://github.com/opendatalab/TRivia
🔹 Models citing this paper:
• https://huggingface.co/opendatalab/TRivia-3B
✨ Spaces citing this paper:
• https://huggingface.co/spaces/opendatalab/TRivia-3B
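💡 The question-answering reward, as we read it: ask cell-lookup questions whose answers are known, and score the model's predicted table by how many it answers correctly. A minimal sketch (the data layout is our assumption):
```python
def qa_reward(pred_table, qa_pairs):
    """Sketch of a QA-style verifiable reward for table recognition:
    each question asks for a cell (row label, column label) and the
    reward is the fraction answered correctly from the *predicted* table.
    pred_table: dict[(row, col) -> str]; qa_pairs: [((row, col), answer)].
    The paper's actual reward design may differ."""
    correct = sum(pred_table.get(key, "").strip() == ans.strip()
                  for key, ans in qa_pairs)
    return correct / max(len(qa_pairs), 1)
```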
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#TableRecognition #VisionLanguageModels #SelfSupervisedLearning #AI #DeepLearning
🤖🧠 S3PRL Toolkit: Advancing Self-Supervised Speech Representation Learning
🗓️ 13 Dec 2025
📚 AI News & Trends
The field of speech technology has witnessed a transformative shift in recent years, powered by the rise of self-supervised learning (SSL). Instead of relying on large amounts of labeled data, self-supervised models learn from the patterns and structures inherent in raw audio, enabling powerful and general-purpose speech representations. At the forefront of this innovation stands ...
#S3PRL #SelfSupervisedLearning #SpeechTechnology #SSL #SpeechRepresentationLearning #AI
✨Puzzle Curriculum GRPO for Vision-Centric Reasoning
📝 Summary:
Puzzle Curriculum GRPO (PC-GRPO) improves VLM visual reasoning without annotations. It uses self-supervised puzzle environments to provide verifiable rewards, with a difficulty-aware curriculum to enhance consistency and accuracy.
🔹 Publication Date: Published on Dec 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.14944
• PDF: https://arxiv.org/pdf/2512.14944
• Project Page: https://pcgrpo.github.io/
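💡 Puzzle environments give verifiable rewards for free: shuffle image tiles, ask the VLM to recover the permutation, and score against the known answer; a difficulty-aware curriculum can then grow the grid. An illustrative sketch (the thresholds are ours):
```python
import random

def make_jigsaw(tiles, grid):
    """Self-supervised puzzle: shuffling image tiles yields a task whose
    solution is known, so correctness is verifiable without annotations."""
    perm = list(range(grid * grid))
    random.shuffle(perm)
    return [tiles[i] for i in perm], perm

def puzzle_reward(pred_perm, true_perm):
    """Fraction of tiles placed back correctly."""
    return sum(p == t for p, t in zip(pred_perm, true_perm)) / len(true_perm)

def next_grid(grid, avg_reward, up=0.8, down=0.3):
    """Difficulty-aware curriculum sketch: grow the grid when the policy
    masters the current level, shrink it when it struggles."""
    if avg_reward > up:
        return grid + 1
    return max(2, grid - 1) if avg_reward < down else grid
```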
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#VLM #VisualReasoning #SelfSupervisedLearning #ComputerVision #AI