ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho

TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models

📝 Summary:
TabTune is a unified library that standardizes the workflow for tabular foundation models. It provides consistent access to state-of-the-art models, diverse adaptation strategies, and integrated evaluation for performance, calibration, and fairness.
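
As a rough illustration of the three evaluation axes the library standardizes (performance, calibration, fairness), here is a minimal NumPy sketch of accuracy, expected calibration error, and a demographic-parity gap. This is not TabTune's API; the data and names are hypothetical stand-ins.

```python
# Minimal sketch of the metrics a unified evaluation pass might report.
# NOT TabTune's API -- plain NumPy with hypothetical stand-in data.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: average gap between predicted confidence and empirical accuracy."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            acc = (pred[mask] == labels[mask]).mean()
            ece += mask.mean() * abs(acc - conf[mask].mean())
    return ece

def demographic_parity_gap(pred, group):
    """Fairness: difference in positive-prediction rate between two groups."""
    return abs(pred[group == 0].mean() - pred[group == 1].mean())

rng = np.random.default_rng(0)
probs = rng.dirichlet([2, 2], size=200)   # stand-in model probabilities
labels = rng.integers(0, 2, size=200)
group = rng.integers(0, 2, size=200)      # stand-in sensitive attribute
print("accuracy:", (probs.argmax(1) == labels).mean())
print("ECE:", expected_calibration_error(probs, labels))
print("DP gap:", demographic_parity_gap(probs.argmax(1), group))
```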

🔹 Publication Date: Published on Nov 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.02802
• PDF: https://arxiv.org/pdf/2511.02802
• Github: https://github.com/Lexsi-Labs/TabTune

==================================

For more data science resources:
https://t.me/DataScienceT

#TabularData #FoundationModels #MachineLearning #DataScience #AIResearch
DINOv3

📝 Summary:
DINOv3 is a self-supervised vision model that excels across tasks. It scales training data and model size, prevents dense-feature degradation via Gram anchoring, and uses post-hoc strategies for flexibility. This versatile foundation model outperforms specialized state-of-the-art models without fine-tuning.
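
For intuition on Gram anchoring, here is a hedged PyTorch sketch: the student's patch-to-patch Gram matrix is pulled toward that of a frozen earlier (anchor) checkpoint so dense features do not drift during long training. Shapes, normalization, and loss weighting here are assumptions, not the paper's exact recipe.

```python
# Hedged sketch of the Gram-anchoring idea (assumed details, not DINOv3's code).
import torch
import torch.nn.functional as F

def gram_anchor_loss(student_patches, anchor_patches):
    """Both inputs: (B, N, D) patch features."""
    s = F.normalize(student_patches, dim=-1)
    a = F.normalize(anchor_patches, dim=-1)
    gram_s = s @ s.transpose(1, 2)   # (B, N, N) patch-to-patch similarities
    gram_a = a @ a.transpose(1, 2)
    return (gram_s - gram_a.detach()).pow(2).mean()

B, N, D = 2, 196, 768                 # batch, patches, feature dim (assumed)
student = torch.randn(B, N, D, requires_grad=True)
with torch.no_grad():
    anchor = torch.randn(B, N, D)     # frozen earlier-checkpoint features
loss = gram_anchor_loss(student, anchor)
loss.backward()
```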

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.10104
• Hugging Face Collection: https://huggingface.co/collections/facebook/dinov3
• PDF: https://arxiv.org/pdf/2508.10104
• Project Page: https://ai.meta.com/blog/dinov3-self-supervised-vision-model/
• Github: https://github.com/facebookresearch/dinov3

🔹 Models citing this paper:
https://huggingface.co/facebook/dinov3-vit7b16-pretrain-lvd1689m
https://huggingface.co/facebook/dinov3-vitb16-pretrain-lvd1689m
https://huggingface.co/facebook/dinov3-vitl16-pretrain-lvd1689m

🔹 Datasets citing this paper:
https://huggingface.co/datasets/zhuangzhe1229/test_dataset
https://huggingface.co/datasets/simon123905/vitl

🔹 Spaces citing this paper:
https://huggingface.co/spaces/atalaydenknalbant/DINOv3
https://huggingface.co/spaces/manu02/DINOv3-Interactive-Patch-Cosine-Similarity
https://huggingface.co/spaces/merve/dinov3-viz

==================================

For more data science resources:
https://t.me/DataScienceT

#DINOv3 #SelfSupervisedLearning #ComputerVision #FoundationModels #AI
OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation

📝 Summary:
OlmoEarth is a novel multimodal spatio-temporal foundation model for Earth observation data. It employs new self-supervised learning methods to achieve state-of-the-art performance on many tasks. It is deployed as a platform for non-profits and NGOs.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13655
• PDF: https://arxiv.org/pdf/2511.13655
• Project Page: https://olmoearth.allenai.org/
• Github: https://github.com/allenai/olmoearth_pretrain

==================================

For more data science resources:
https://t.me/DataScienceT

#EarthObservation #FoundationModels #AI #RemoteSensing #SelfSupervisedLearning
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

📝 Summary:
Kandinsky 5.0 is a family of state-of-the-art foundation models for high-resolution image and video generation. It includes Lite and Pro versions with varying parameter counts and uses advanced training techniques for superior quality and speed. This publicly available framework aims to advance generative modeling research.

🔹 Publication Date: Published on Nov 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14993
• PDF: https://arxiv.org/pdf/2511.14993
• Project Page: https://kandinskylab.ai/
• Github: https://github.com/kandinskylab/kandinsky-5

==================================

For more data science resources:
https://t.me/DataScienceT

#FoundationModels #ImageGeneration #VideoGeneration #AI #DeepLearning
Medal S: Spatio-Textual Prompt Model for Medical Segmentation

📝 Summary:
Medal S is a medical segmentation foundation model using spatio-textual prompts for efficient, high-accuracy multi-class segmentation across diverse modalities. It uniquely aligns volumetric prompts with text embeddings and processes masks in parallel, significantly outperforming prior methods.
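
To make "aligning volumetric features with text embeddings" concrete, the illustrative sketch below scores every voxel against each class's text embedding in a single einsum, yielding all class masks in parallel. The shapes and dot-product scoring are assumptions for illustration, not Medal S's exact design.

```python
# Illustrative text-voxel alignment; shapes and scoring are assumed.
import torch

D, C = 64, 5                        # feature dim, number of class prompts
feat = torch.randn(D, 16, 16, 16)   # volumetric image features (D, X, Y, Z)
text = torch.randn(C, D)            # one embedding per text prompt

logits = torch.einsum("cd,dxyz->cxyz", text, feat)  # (C, X, Y, Z) class scores
masks = logits.argmax(dim=0)        # per-voxel class label, all classes at once
print(masks.shape)                  # torch.Size([16, 16, 16])
```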

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13001
• PDF: https://arxiv.org/pdf/2511.13001
• Github: https://github.com/yinghemedical/Medal-S

🔹 Models citing this paper:
https://huggingface.co/spc819/Medal-S-V1.0

==================================

For more data science resources:
https://t.me/DataScienceT

#MedicalSegmentation #FoundationModels #AI #DeepLearning #ComputerVision
Scaling Spatial Intelligence with Multimodal Foundation Models

📝 Summary:
SenseNova-SI is a scaled multimodal foundation model built for spatial intelligence. Trained on 8 million diverse data samples, it sets unprecedented performance on various spatial benchmarks. The models are publicly released to foster further research.

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13719
• PDF: https://arxiv.org/pdf/2511.13719
• Project Page: https://huggingface.co/sensenova/SenseNova-SI-1.1-InternVL3-8B
• Github: https://github.com/OpenSenseNova/SenseNova-SI

🔹 Models citing this paper:
https://huggingface.co/sensenova/SenseNova-SI-InternVL3-8B
https://huggingface.co/sensenova/SenseNova-SI-InternVL3-2B
https://huggingface.co/sensenova/SenseNova-SI-1.1-InternVL3-2B

==================================

For more data science resources:
https://t.me/DataScienceT

#MultimodalAI #FoundationModels #SpatialIntelligence #ComputerVision #AI
MiMo-Embodied: X-Embodied Foundation Model Technical Report

📝 Summary:
MiMo-Embodied is the first cross-embodied foundation model. It achieves state-of-the-art performance in both autonomous driving and embodied AI, demonstrating positive transfer through multi-stage learning and fine-tuning.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16518
• PDF: https://arxiv.org/pdf/2511.16518
• Github: https://github.com/XiaomiMiMo/MiMo-Embodied

==================================

For more data science resources:
https://t.me/DataScienceT

#FoundationModels #EmbodiedAI #AutonomousDriving #AI #Robotics
SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking

📝 Summary:
SAM2S is a foundation model enhancing interactive video object segmentation in surgery. It leverages a new large benchmark, robust memory, and temporal learning to achieve superior accuracy (80.42 J&F) and real-time performance in surgical video analysis.

🔹 Publication Date: Published on Nov 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16618
• PDF: https://arxiv.org/pdf/2511.16618
• Project Page: https://jinlab-imvr.github.io/SAM2S
• Github: https://github.com/jinlab-imvr/SAM2S

==================================

For more data science resources:
https://t.me/DataScienceT

#SurgicalAI #MedicalImaging #ComputerVision #FoundationModels #DeepLearning
Pillar-0: A New Frontier for Radiology Foundation Models

📝 Summary:
Pillar-0 is a new radiology foundation model pretrained on diverse CT/MRI scans, utilizing RATE for scalable label extraction. It significantly outperforms existing models across various radiology tasks and extends to new applications like lung cancer risk prediction and brain hemorrhage detection.

🔹 Publication Date: Published on Nov 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.17803
• PDF: https://arxiv.org/pdf/2511.17803
• Github: https://github.com/YalaLab/rate-evals

==================================

For more data science resources:
https://t.me/DataScienceT

#Radiology #FoundationModels #AI #MedicalImaging #MachineLearning
Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus

📝 Summary:
Benchmarking LLMs on subjective tasks like emotional intelligence is challenging. The Language Model Council (LMC) uses a democratic process with 20 LLMs to formulate, administer, and evaluate tests. This yields more robust, less biased rankings that align better with human leaderboards.
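
As a toy illustration of consensus ranking across judge models, the sketch below aggregates per-judge rankings with a Borda count; the council's actual protocol (test formulation, administration, and evaluation) is richer than this, and the judge names are hypothetical.

```python
# Toy Borda-count aggregation of judge rankings (illustrative only).
from collections import defaultdict

def borda_consensus(rankings):
    """rankings: list of lists, each ordering contestants best-to-worst."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, contestant in enumerate(ranking):
            scores[contestant] += n - 1 - position   # best gets n-1 points
    return sorted(scores, key=scores.get, reverse=True)

judge_votes = [
    ["model_a", "model_b", "model_c"],   # judge 1's ranking
    ["model_b", "model_a", "model_c"],   # judge 2's ranking
    ["model_a", "model_c", "model_b"],   # judge 3's ranking
]
print(borda_consensus(judge_votes))      # -> ['model_a', 'model_b', 'model_c']
```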

🔹 Publication Date: Published on Jun 12, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2406.08598
• PDF: https://arxiv.org/pdf/2406.08598
• Github: https://github.com/llm-council/llm-council

🔹 Datasets citing this paper:
https://huggingface.co/datasets/llm-council/emotional_application

🔹 Spaces citing this paper:
https://huggingface.co/spaces/llm-council/llm-council
https://huggingface.co/spaces/llm-council/sandbox

==================================

For more data science resources:
https://t.me/DataScienceT

#LLM #Benchmarking #AIEvaluation #FoundationModels #ConsensusAI
MedSAM3: Delving into Segment Anything with Medical Concepts

📝 Summary:
MedSAM-3 is a text-promptable medical segmentation model fine-tuned on SAM 3 using semantic conceptual labels. It enables precise, open-vocabulary text-based segmentation of anatomical structures and integrates MLLMs for advanced reasoning. This approach significantly outperforms existing models.

🔹 Publication Date: Published on Nov 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19046
• PDF: https://arxiv.org/pdf/2511.19046
• Github: https://github.com/Joey-S-Liu/MedSAM3

==================================

For more data science resources:
https://t.me/DataScienceT

#MedicalAI #ImageSegmentation #DeepLearning #MLLMs #FoundationModels
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

📝 Summary:
Z-Image is an efficient 6B-parameter diffusion transformer achieving state-of-the-art image generation with significantly reduced computational cost. It enables sub-second inference and consumer hardware compatibility, challenging the scale-at-all-costs paradigm.

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22699
• PDF: https://arxiv.org/pdf/2511.22699
• Project Page: https://tongyi-mai.github.io/Z-Image-blog/
• Github: https://github.com/Tongyi-MAI/Z-Image

==================================

For more data science resources:
https://t.me/DataScienceT

#ImageGeneration #DiffusionModels #EfficientAI #FoundationModels #MachineLearning
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

📝 Summary:
This paper provides a practical guide to code LLMs, covering their lifecycle from data to deployment. It examines techniques, analyzes various models, and discusses real-world challenges like correctness and security. Experiments on pre-training and fine-tuning are included.

🔹 Publication Date: Published on Nov 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18538
• PDF: https://arxiv.org/pdf/2511.18538

==================================

For more data science resources:
https://t.me/DataScienceT

#CodeLLMs #AI #MachineLearning #SoftwareEngineering #FoundationModels
OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion

📝 Summary:
OmniFusion is a multimodal translation system integrating pretrained foundation models with LLMs via a novel fusion strategy. It enables simultaneous multilingual translation using audio and visual inputs, reducing latency and improving quality over cascaded systems.

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00234
• PDF: https://arxiv.org/pdf/2512.00234
• Github: https://github.com/saikoneru/OmniFusion

🔹 Models citing this paper:
https://huggingface.co/skoneru/OmniFusion
https://huggingface.co/skoneru/OmniFusion_v2

==================================

For more data science resources:
https://t.me/DataScienceT

#MultimodalAI #LLMs #MachineTranslation #FoundationModels #AIResearch
LFM2 Technical Report

📝 Summary:
LFM2 is a family of compact foundation models designed for efficient on-device deployment. It uses hardware-in-the-loop architecture search and advanced training to achieve high performance across diverse tasks, including multimodal applications.

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23404
• PDF: https://arxiv.org/pdf/2511.23404

==================================

For more data science resources:
https://t.me/DataScienceT

#FoundationModels #EdgeAI #MultimodalAI #AIResearch #MachineLearning
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

📝 Summary:
Echo-4o-Image is a 180K-sample synthetic dataset generated with GPT-4o. It enhances image generation by covering rare scenarios and providing clean text-to-image supervision. This improves model performance and transferability across various foundation models.

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09987
• PDF: https://arxiv.org/pdf/2508.09987
• Project Page: https://yejy53.github.io/Echo-4o/

🔹 Datasets citing this paper:
https://huggingface.co/datasets/Yejy53/Echo-4o-Image

==================================

For more data science resources:
https://t.me/DataScienceT

#ImageGeneration #GPT4o #SyntheticData #AIResearch #FoundationModels
The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation

📝 Summary:
This paper highlights the gap between SAM2 and SAM3. SAM2 uses spatial prompts for geometric segmentation, but SAM3 is a concept-driven multimodal model with a unified vision-language architecture. SAM3 represents a new class of foundation model for concept-driven segmentation.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06032
• PDF: https://arxiv.org/pdf/2512.06032
• Github: https://github.com/Applied-AI-Research-Lab/The-SAM2-to-SAM3-Gap-in-the-Segment-Anything-Model-Family

==================================

For more data science resources:
https://t.me/DataScienceT

#ImageSegmentation #FoundationModels #ComputerVision #MultimodalAI #AIResearch
SAM Audio: Segment Anything in Audio

📝 Summary:
SAM Audio is a foundation model for general audio separation. It unifies text, visual, and temporal span prompts, achieving state-of-the-art performance across diverse audio types. It also introduces a new real-world separation benchmark.
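
As a conceptual sketch of unifying heterogeneous prompts into one conditioning signal, the toy module below embeds text, temporal-span, and visual prompts into a shared space and averages them. All module names and sizes are hypothetical; this is not SAM Audio's architecture.

```python
# Hypothetical unified prompt encoder (conceptual only, not SAM Audio's code).
import torch
import torch.nn as nn

class UnifiedPromptEncoder(nn.Module):
    def __init__(self, dim=256, vocab=1000):
        super().__init__()
        self.text = nn.EmbeddingBag(vocab, dim)   # bag-of-tokens text stub
        self.span = nn.Linear(2, dim)             # (start, end) span in seconds
        self.visual = nn.Linear(512, dim)         # stand-in visual features

    def forward(self, text_ids=None, span=None, visual_feat=None):
        parts = []
        if text_ids is not None:
            parts.append(self.text(text_ids))
        if span is not None:
            parts.append(self.span(span))
        if visual_feat is not None:
            parts.append(self.visual(visual_feat))
        return torch.stack(parts).mean(0)         # one shared conditioning vector

enc = UnifiedPromptEncoder()
cond = enc(text_ids=torch.tensor([[3, 41, 7]]),
           span=torch.tensor([[1.5, 4.0]]))
print(cond.shape)   # torch.Size([1, 256])
```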

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18099
• PDF: https://arxiv.org/pdf/2512.18099
• Project Page: https://ai.meta.com/samaudio/
• Github: https://github.com/facebookresearch/sam-audio

🔹 Models citing this paper:
https://huggingface.co/facebook/sam-audio-large
https://huggingface.co/facebook/sam-audio-small
https://huggingface.co/facebook/sam-audio-base

🔹 Spaces citing this paper:
https://huggingface.co/spaces/lpeterl/sam-audio-webui
https://huggingface.co/spaces/Arrcttacsrks/SAM-Audio-Demo
https://huggingface.co/spaces/chippie1/SAM-Audio-Demo

==================================

For more data science resources:
https://t.me/DataScienceT

#AudioSeparation #FoundationModels #AI #DeepLearning #SAMAudio
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

📝 Summary:
This survey reviews self-evolving AI agents that adapt to dynamic environments via automatic enhancement from interaction data. It proposes a unified framework and systematically reviews current techniques, addressing evaluation, safety, and ethics.

🔹 Publication Date: Published on Aug 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07407
• PDF: https://arxiv.org/pdf/2508.07407
• Project Page: https://huggingface.co/spaces/X-iZhang/Awesome-Self-Evolving-Agents
• Github: https://github.com/EvoAgentX/Awesome-Self-Evolving-Agents

🔹 Spaces citing this paper:
https://huggingface.co/spaces/X-iZhang/Awesome-Self-Evolving-Agents

==================================

For more data science resources:
https://t.me/DataScienceT

#SelfEvolvingAI #AIAgents #FoundationModels #LifelongLearning #ArtificialIntelligence
Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding

📝 Summary:
Omni-Weather is a new multimodal foundation model that unifies weather generation and understanding in a single architecture. It uses shared self-attention and a Chain-of-Thought dataset for interpretable, high-quality outputs, achieving state-of-the-art performance.

🔹 Publication Date: Published on Dec 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21643
• PDF: https://arxiv.org/pdf/2512.21643

==================================

For more data science resources:
https://t.me/DataScienceT

#WeatherGeneration #FoundationModels #MultimodalAI #AIResearch #DeepLearning