✨TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models
📝 Summary:
TabTune is a unified library that standardizes the workflow for tabular foundation models. It provides consistent access to state-of-the-art models, diverse adaptation strategies, and integrated evaluation for performance, calibration, and fairness.
🔹 Publication Date: Published on Nov 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02802
• PDF: https://arxiv.org/pdf/2511.02802
• Github: https://github.com/Lexsi-Labs/TabTune
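The workflow the summary describes (pick a model, pick an adaptation strategy, get a unified evaluation report) can be sketched as below. This is a minimal illustrative sketch only: the class and method names are hypothetical stand-ins, not TabTune's documented API, so check the GitHub repo above for the real interface.

```python
# Hypothetical sketch of a unified tabular-FM workflow.
# Names are illustrative placeholders, NOT TabTune's actual API.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.calibration import calibration_curve

class TabularFMPipeline:
    """Stand-in wrapper bundling model choice, an adaptation strategy,
    and evaluation of performance plus calibration in one object."""

    def __init__(self, model_name: str, adaptation: str = "zero_shot"):
        self.model_name = model_name              # e.g. a tabular foundation model id
        self.adaptation = adaptation              # e.g. "zero_shot", "finetune", "peft"
        self._model = LogisticRegression(max_iter=1000)  # placeholder learner

    def fit(self, X, y):
        # A real library would dispatch to in-context inference, full
        # fine-tuning, or parameter-efficient tuning of the chosen model here.
        self._model.fit(X, y)
        return self

    def evaluate(self, X, y) -> dict:
        proba = self._model.predict_proba(X)[:, 1]
        preds = (proba >= 0.5).astype(int)
        frac_pos, mean_pred = calibration_curve(y, proba, n_bins=10)
        ece = float(np.mean(np.abs(frac_pos - mean_pred)))  # crude, unweighted calibration error
        return {"accuracy": accuracy_score(y, preds), "ece": ece}

# Usage with any binary tabular dataset (X: features, y: 0/1 labels):
# report = TabularFMPipeline("tabpfn-like-model").fit(X_train, y_train).evaluate(X_test, y_test)
```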
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#TabularData #FoundationModels #MachineLearning #DataScience #AIResearch
✨DINOv3
📝 Summary:
DINOv3 is a self-supervised vision model excelling across tasks. It scales dataset and model size, prevents dense feature degradation via Gram anchoring, and uses post-hoc strategies for flexibility. This versatile foundation model outperforms specialized state-of-the-art models without fine-tuning.
🔹 Publication Date: Published on Aug 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10104
• Hugging Face Collection: https://huggingface.co/collections/facebook/dinov3
• PDF: https://arxiv.org/pdf/2508.10104
• Project Page: https://ai.meta.com/blog/dinov3-self-supervised-vision-model/
• Github: https://github.com/facebookresearch/dinov3
🔹 Models citing this paper:
• https://huggingface.co/facebook/dinov3-vit7b16-pretrain-lvd1689m
• https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m
• https://huggingface.co/facebook/dinov3-vitl16-pretrain-lvd1689m
✨ Datasets citing this paper:
• https://huggingface.co/datasets/zhuangzhe1229/test_dataset
• https://huggingface.co/datasets/simon123905/vitl
✨ Spaces citing this paper:
• https://huggingface.co/spaces/atalaydenknalbant/DINOv3
• https://huggingface.co/spaces/manu02/DINOv3-Interactive-Patch-Cosine-Similarity
• https://huggingface.co/spaces/merve/dinov3-viz
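The checkpoints listed above live on the Hugging Face Hub, so dense features can be pulled roughly as in the sketch below. This assumes a recent transformers release with DINOv3 support; otherwise the official GitHub repo provides loading code.

```python
# Sketch: extract DINOv3 features via the Hugging Face Hub.
# Assumes a transformers version with DINOv3 support; model id from the links above.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

model_id = "facebook/dinov3-vitb16-pretrain-lvd1689m"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Per-patch (dense) features for segmentation/correspondence probes; the model
# may also expose a pooled global embedding for retrieval or linear heads.
patch_features = outputs.last_hidden_state   # (1, num_tokens, hidden_dim)
print(patch_features.shape)
```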
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#DINOv3 #SelfSupervisedLearning #ComputerVision #FoundationModels #AI
✨OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation
📝 Summary:
OlmoEarth is a novel multimodal spatio-temporal foundation model for Earth observation data. It employs new self-supervised learning methods to achieve state-of-the-art performance on many tasks. It is deployed as a platform for non-profits and NGOs.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13655
• PDF: https://arxiv.org/pdf/2511.13655
• Project Page: https://olmoearth.allenai.org/
• Github: https://github.com/allenai/olmoearth_pretrain
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#EarthObservation #FoundationModels #AI #RemoteSensing #SelfSupervisedLearning
✨Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
📝 Summary:
Kandinsky 5.0 is a family of state-of-the-art foundation models for high-resolution image and video generation. It includes Lite and Pro versions at different parameter scales and uses advanced training techniques for superior quality and speed. This publicly available framework aims to advance generative modeling research.
🔹 Publication Date: Published on Nov 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14993
• PDF: https://arxiv.org/pdf/2511.14993
• Project Page: https://kandinskylab.ai/
• Github: https://github.com/kandinskylab/kandinsky-5
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#FoundationModels #ImageGeneration #VideoGeneration #AI #DeepLearning
✨Medal S: Spatio-Textual Prompt Model for Medical Segmentation
📝 Summary:
Medal S is a medical segmentation foundation model using spatio-textual prompts for efficient, high-accuracy multi-class segmentation across diverse modalities. It uniquely aligns volumetric prompts with text embeddings and processes masks in parallel, significantly outperforming prior methods.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13001
• PDF: https://arxiv.org/pdf/2511.13001
• Github: https://github.com/yinghemedical/Medal-S
🔹 Models citing this paper:
• https://huggingface.co/spc819/Medal-S-V1.0
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MedicalSegmentation #FoundationModels #AI #DeepLearning #ComputerVision
✨Scaling Spatial Intelligence with Multimodal Foundation Models
📝 Summary:
SenseNova-SI is a scaled multimodal foundation model that achieves superior spatial intelligence. Trained on 8 million diverse data samples, it delivers unprecedented performance on various spatial benchmarks. The models are publicly released to foster further research.
🔹 Publication Date: Published on Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13719
• PDF: https://arxiv.org/pdf/2511.13719
• Project Page: https://huggingface.co/sensenova/SenseNova-SI-1.1-InternVL3-8B
• Github: https://github.com/OpenSenseNova/SenseNova-SI
🔹 Models citing this paper:
• https://huggingface.co/sensenova/SenseNova-SI-InternVL3-8B
• https://huggingface.co/sensenova/SenseNova-SI-InternVL3-2B
• https://huggingface.co/sensenova/SenseNova-SI-1.1-InternVL3-2B
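The released checkpoints are InternVL3 fine-tunes, so loading presumably follows the InternVL convention on the Hub. The snippet below is a sketch under that assumption; the trust_remote_code path and chat interface mirror InternVL3-style usage and may differ for these specific repos.

```python
# Sketch: loading a SenseNova-SI checkpoint, assuming it follows the
# InternVL3 remote-code convention on the Hugging Face Hub (unverified here).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "sensenova/SenseNova-SI-1.1-InternVL3-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

# InternVL-style chat call with an image and a spatial-reasoning question;
# `load_image` and `chat` come from the repo's remote code in InternVL3-style models.
# pixel_values = load_image("scene.jpg").to(torch.bfloat16)
# question = "<image>\nWhich object is closer to the camera, the chair or the lamp?"
# print(model.chat(tokenizer, pixel_values, question,
#                  generation_config=dict(max_new_tokens=64)))
```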
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MultimodalAI #FoundationModels #SpatialIntelligence #ComputerVision #AI
✨MiMo-Embodied: X-Embodied Foundation Model Technical Report
📝 Summary:
MiMo-Embodied is the first cross-embodied foundation model. It achieves state-of-the-art performance in both autonomous driving and embodied AI, demonstrating positive transfer through multi-stage learning and fine-tuning.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16518
• PDF: https://arxiv.org/pdf/2511.16518
• Github: https://github.com/XiaomiMiMo/MiMo-Embodied
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#FoundationModels #EmbodiedAI #AutonomousDriving #AI #Robotics
✨SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking
📝 Summary:
SAM2S is a foundation model enhancing interactive video object segmentation in surgery. It leverages a new large benchmark, robust memory, and temporal learning to achieve superior accuracy (80.42 J&F) and real-time performance in surgical video analysis.
🔹 Publication Date: Published on Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16618
• PDF: https://arxiv.org/pdf/2511.16618
• Project Page: https://jinlab-imvr.github.io/SAM2S
• Github: https://github.com/jinlab-imvr/SAM2S
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#SurgicalAI #MedicalImaging #ComputerVision #FoundationModels #DeepLearning
✨Pillar-0: A New Frontier for Radiology Foundation Models
📝 Summary:
Pillar-0 is a new radiology foundation model pretrained on diverse CT/MRI scans, utilizing RATE for scalable label extraction. It significantly outperforms existing models across various radiology tasks and extends to new applications like lung cancer risk prediction and brain hemorrhage detection.
🔹 Publication Date: Published on Nov 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17803
• PDF: https://arxiv.org/pdf/2511.17803
• Github: https://github.com/YalaLab/rate-evals
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#Radiology #FoundationModels #AI #MedicalImaging #MachineLearning
✨Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus
📝 Summary:
Benchmarking LLMs on subjective tasks like emotional intelligence is challenging. The Language Model Council (LMC) uses a democratic process with 20 LLMs to formulate, administer, and evaluate tests. This yields more robust, less biased rankings that align better with human leaderboards.
🔹 Publication Date: Published on Jun 12, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2406.08598
• PDF: https://arxiv.org/pdf/2406.08598
• Github: https://github.com/llm-council/llm-council
✨ Datasets citing this paper:
• https://huggingface.co/datasets/llm-council/emotional_application
✨ Spaces citing this paper:
• https://huggingface.co/spaces/llm-council/llm-council
• https://huggingface.co/spaces/llm-council/sandbox
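The council's consensus step (every judge model ranks every candidate, and per-judge ranks are aggregated into one leaderboard) can be illustrated with a small routine like the one below. It shows the general idea only, not the paper's exact scoring or aggregation rule.

```python
# Sketch of council-style consensus ranking: each judge LLM scores every
# candidate, scores become per-judge ranks, and mean rank decides the order.
# Generic illustration only, not the paper's exact aggregation rule.
from collections import defaultdict

def council_rank(judge_scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """judge_scores[judge][candidate] = score given by `judge` to `candidate`.
    Returns candidates sorted by mean rank (lower is better)."""
    rank_sums = defaultdict(float)
    for judge, scores in judge_scores.items():
        ordered = sorted(scores, key=scores.get, reverse=True)  # this judge's ranking
        for rank, candidate in enumerate(ordered, start=1):
            rank_sums[candidate] += rank
    mean_ranks = {c: s / len(judge_scores) for c, s in rank_sums.items()}
    return sorted(mean_ranks.items(), key=lambda kv: kv[1])

# Toy example: three judge LLMs scoring two candidates on a subjective task.
scores = {
    "judge_a": {"model_x": 8.0, "model_y": 6.5},
    "judge_b": {"model_x": 7.0, "model_y": 7.5},
    "judge_c": {"model_x": 9.0, "model_y": 5.0},
}
print(council_rank(scores))  # model_x wins with the lower mean rank
```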
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #Benchmarking #AIEvaluation #FoundationModels #ConsensusAI
✨MedSAM3: Delving into Segment Anything with Medical Concepts
📝 Summary:
MedSAM-3 is a text-promptable medical segmentation model fine-tuned from SAM 3 using semantic concept labels. It enables precise, open-vocabulary text-based segmentation of anatomical structures and integrates MLLMs for advanced reasoning. This approach significantly outperforms existing models.
🔹 Publication Date: Published on Nov 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19046
• PDF: https://arxiv.org/pdf/2511.19046
• Github: https://github.com/Joey-S-Liu/MedSAM3
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MedicalAI #ImageSegmentation #DeepLearning #MLLMs #FoundationModels
✨Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
📝 Summary:
Z-Image is an efficient 6B-parameter diffusion transformer achieving state-of-the-art image generation with significantly reduced computational cost. It enables sub-second inference and consumer hardware compatibility, challenging the scale-at-all-costs paradigm.
🔹 Publication Date: Published on Nov 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22699
• PDF: https://arxiv.org/pdf/2511.22699
• Project Page: https://tongyi-mai.github.io/Z-Image-blog/
• Github: https://github.com/Tongyi-MAI/Z-Image
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #DiffusionModels #EfficientAI #FoundationModels #MachineLearning
✨From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
📝 Summary:
This paper provides a practical guide to code LLMs, covering their lifecycle from data to deployment. It examines techniques, analyzes various models, and discusses real-world challenges like correctness and security. Experiments on pre-training and fine-tuning are included.
🔹 Publication Date: Published on Nov 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18538
• PDF: https://arxiv.org/pdf/2511.18538
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#CodeLLMs #AI #MachineLearning #SoftwareEngineering #FoundationModels
✨OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion
📝 Summary:
OmniFusion is a multimodal translation system integrating pretrained foundation models with LLMs via a novel fusion strategy. It enables simultaneous multilingual translation using audio and visual inputs, reducing latency and improving quality over cascaded systems.
🔹 Publication Date: Published on Nov 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00234
• PDF: https://arxiv.org/pdf/2512.00234
• Github: https://github.com/saikoneru/OmniFusion
🔹 Models citing this paper:
• https://huggingface.co/skoneru/OmniFusion
• https://huggingface.co/skoneru/OmniFusion_v2
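The modular-fusion idea, projecting pretrained audio and visual encoder outputs into the LLM's embedding space so translation conditions on both streams, is sketched generically below. Dimensions and the single linear projection per modality are illustrative assumptions, not OmniFusion's actual architecture.

```python
# Generic modular-fusion sketch: project encoder features into the LLM embedding
# space and prepend them to the text embeddings. Sizes are illustrative only.
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    def __init__(self, audio_dim=1024, visual_dim=768, llm_dim=4096):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, llm_dim)    # maps audio encoder states
        self.visual_proj = nn.Linear(visual_dim, llm_dim)  # maps visual encoder states

    def forward(self, audio_feats, visual_feats, text_embeds):
        # audio_feats: (B, Ta, audio_dim), visual_feats: (B, Tv, visual_dim),
        # text_embeds: (B, Tt, llm_dim) from the LLM's own embedding layer.
        prefix = torch.cat(
            [self.audio_proj(audio_feats), self.visual_proj(visual_feats)], dim=1
        )
        # The LLM then attends over [audio | visual | text] as one sequence.
        return torch.cat([prefix, text_embeds], dim=1)

fusion = ModalityFusion()
out = fusion(torch.randn(1, 50, 1024), torch.randn(1, 16, 768), torch.randn(1, 12, 4096))
print(out.shape)  # torch.Size([1, 78, 4096])
```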
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MultimodalAI #LLMs #MachineTranslation #FoundationModels #AIResearch
✨LFM2 Technical Report
📝 Summary:
LFM2 is a family of compact foundation models designed for efficient on-device deployment. It uses hardware-in-the-loop architecture search and advanced training to achieve high performance across diverse tasks, including multimodal applications.
🔹 Publication Date: Published on Nov 28
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23404
• PDF: https://arxiv.org/pdf/2511.23404
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#FoundationModels #EdgeAI #MultimodalAI #AIResearch #MachineLearning
✨Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
📝 Summary:
Echo-4o-Image is a 180K-sample synthetic dataset generated with GPT-4o. It enhances image generation by covering rare scenarios and providing clean text-to-image supervision. This improves model performance and transferability across various foundation models.
🔹 Publication Date: Published on Aug 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09987
• PDF: https://arxiv.org/pdf/2508.09987
• Project Page: https://yejy53.github.io/Echo-4o/
• Github: https://yejy53.github.io/Echo-4o
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Yejy53/Echo-4o-Image
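Since the dataset above is hosted on the Hub, it can presumably be loaded with the datasets library as below; split and column names are not given in this post, so inspect the object before relying on specific fields.

```python
# Sketch: loading Echo-4o-Image from the Hugging Face Hub.
# Split/column names are not documented in this post, so inspect `ds` first.
from datasets import load_dataset

ds = load_dataset("Yejy53/Echo-4o-Image")  # dataset id from the link above
print(ds)                                  # shows available splits and columns

# first_split = next(iter(ds.values()))
# print(first_split[0])                    # e.g. a prompt and a synthetic image
```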
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageGeneration #GPT4o #SyntheticData #AIResearch #FoundationModels
✨The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation
📝 Summary:
This paper highlights the gap between SAM2 and SAM3. SAM2 uses spatial prompts for geometric segmentation, but SAM3 is a concept-driven multimodal model with a unified vision-language architecture. SAM3 represents a new class of foundation model for concept-driven segmentation.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06032
• PDF: https://arxiv.org/pdf/2512.06032
• Github: https://github.com/Applied-AI-Research-Lab/The-SAM2-to-SAM3-Gap-in-the-Segment-Anything-Model-Family
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#ImageSegmentation #FoundationModels #ComputerVision #MultimodalAI #AIResearch
✨SAM Audio: Segment Anything in Audio
📝 Summary:
SAM Audio is a foundation model for general audio separation. It unifies text, visual, and temporal-span prompts, achieving state-of-the-art performance across diverse audio types. It also introduces a new real-world separation benchmark.
🔹 Publication Date: Published on Dec 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18099
• PDF: https://arxiv.org/pdf/2512.18099
• Project Page: https://ai.meta.com/samaudio/
• Github: https://github.com/facebookresearch/sam-audio
🔹 Models citing this paper:
• https://huggingface.co/facebook/sam-audio-large
• https://huggingface.co/facebook/sam-audio-small
• https://huggingface.co/facebook/sam-audio-base
✨ Spaces citing this paper:
• https://huggingface.co/spaces/lpeterl/sam-audio-webui
• https://huggingface.co/spaces/Arrcttacsrks/SAM-Audio-Demo
• https://huggingface.co/spaces/chippie1/SAM-Audio-Demo
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AudioSeparation #FoundationModels #AI #DeepLearning #SAMAudio
✨A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
📝 Summary:
This survey reviews self-evolving AI agents that adapt to dynamic environments via automatic enhancement from interaction data. It proposes a unified framework and systematically reviews current techniques, addressing evaluation, safety, and ethics.
🔹 Publication Date: Published on Aug 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07407
• PDF: https://arxiv.org/pdf/2508.07407
• Project Page: https://huggingface.co/spaces/X-iZhang/Awesome-Self-Evolving-Agents
• Github: https://github.com/EvoAgentX/Awesome-Self-Evolving-Agents
✨ Spaces citing this paper:
• https://huggingface.co/spaces/X-iZhang/Awesome-Self-Evolving-Agents
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#SelfEvolvingAI #AIAgents #FoundationModels #LifelongLearning #ArtificialIntelligence
✨Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding
📝 Summary:
Omni-Weather is a new multimodal foundation model that unifies weather generation and understanding in a single architecture. It uses shared self-attention and a Chain-of-Thought dataset for interpretable, high-quality outputs, achieving state-of-the-art performance.
🔹 Publication Date: Published on Dec 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21643
• PDF: https://arxiv.org/pdf/2512.21643
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#WeatherGeneration #FoundationModels #MultimodalAI #AIResearch #DeepLearning