ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

📝 Summary:
Learnable multipliers are introduced to address weight decay-induced normalization artifacts in large language model training, outperforming traditional methods while reducing computational overhead. ...
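As a rough illustration of the idea in the summary above, here is a minimal sketch of a matrix layer whose output is scaled by a separate multiplier. It is not the paper's implementation: the function names, the scalar form of the multiplier, and the toy numbers are all assumptions made for this example. It only shows the general point that a multiplier can compensate when the weight matrix itself is shrunk (as weight decay tends to do).

```python
from typing import List


def matvec(W: List[List[float]], x: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def scaled_layer(W: List[List[float]], x: List[float], s: float) -> List[float]:
    """Hypothetical layer: y = s * (W x), where s would be a trained scalar."""
    return [s * v for v in matvec(W, x)]


W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -1.0]

# Reference output with an untouched matrix and a neutral multiplier.
y_ref = scaled_layer(W, x, s=1.0)

# Shrink the matrix by half (standing in for the effect of weight decay)
# and let the multiplier absorb the lost scale.
W_small = [[0.5 * w for w in row] for row in W]
y_comp = scaled_layer(W_small, x, s=2.0)

print(y_ref, y_comp)
```

Both calls produce the same vector, so the overall scale of the layer is carried by `s` rather than by the norm of `W`.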

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04890
• PDF: https://arxiv.org/pdf/2601.04890
• Project Page: https://tiiuae.github.io/Falcon-H1/
• Github: https://github.com/tiiuae/falcon-h1

==================================

For more data science resources:
https://t.me/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing

📝 Summary:
Re-Align addresses the gap between understanding and generation in in-context image generation and editing through structured reasoning-guided alignment and reinforcement learning training.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05124
• PDF: https://arxiv.org/pdf/2601.05124
• Project Page: https://hrz2000.github.io/realign/
• Github: https://github.com/hrz2000/realign

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Guardians of the Hair: Rescuing Soft Boundaries in Depth, Stereo, and Novel Views

📝 Summary:
HairGuard is a framework designed to recover fine-grained soft boundary details in 3D vision tasks. It refines depth around these ambiguous regions and synthesizes novel views, achieving state-of-the-art performance for delicate structures like hair.

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03362
• PDF: https://arxiv.org/pdf/2601.03362

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Enhancing Object Detection with Privileged Information: A Model-Agnostic Teacher-Student Approach

📝 Summary:
The Learning Using Privileged Information (LUPI) paradigm enhances object detection accuracy by integrating additional training-time information through teacher-student architectures without increasing inference...

🔹 Publication Date: Published on Jan 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02016
• PDF: https://arxiv.org/pdf/2601.02016

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
AT²PO: Agentic Turn-based Policy Optimization via Tree Search

📝 Summary:
AT²PO is a unified framework for multi-turn agentic reinforcement learning that improves exploration diversity, credit assignment, and policy optimization through tree search and turn-level learning o...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04767
• PDF: https://arxiv.org/pdf/2601.04767
• Github: https://github.com/zzfoutofspace/ATPO

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

📝 Summary:
RL-AWB is a novel framework combining statistical methods with deep reinforcement learning for improved nighttime auto white balance. It is the first RL approach for color constancy, mimicking expert tuning. This method shows superior generalization across various lighting conditions, and a new m...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05249
• PDF: https://arxiv.org/pdf/2601.05249
• Project Page: https://ntuneillee.github.io/research/rl-awb/

==================================

#ReinforcementLearning #ComputerVision #ImageProcessing #AutoWhiteBalance #LowLightImaging
Beyond Binary Preference: Aligning Diffusion Models to Fine-grained Criteria by Decoupling Attributes

📝 Summary:
Current diffusion model alignment struggles with complex, fine-grained human expertise due to simplified preferences. This paper proposes a framework with hierarchical criteria and Complex Preference Optimization (CPO), maximizing positive and minimizing negative attributes to improve generation qu...

🔹 Publication Date: Published on Jan 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04300
• PDF: https://arxiv.org/pdf/2601.04300

==================================

#DiffusionModels #AIAlignment #MachineLearning #GenerativeAI #PreferenceLearning
Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset

📝 Summary:
This paper introduces IMDD-1M, a large dataset of 1 million industrial defect image-text pairs. It enables training a vision-language foundation model tailored for industrial use. This model achieves comparable performance with less data for specialized tasks, promoting data-efficient quality ins...

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24160
• PDF: https://arxiv.org/pdf/2512.24160

==================================

#IndustrialAI #VisionLanguageModel #DefectDetection #MultimodalAI #ComputerVision
AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering

📝 Summary:
AgentDevel reframes LLM agent improvement as release engineering, treating agents as shippable software. It emphasizes stable, auditable improvements through an externalized pipeline that prioritizes non-regression, leading to more reliable and traceable agent development.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04620
• PDF: https://arxiv.org/pdf/2601.04620

==================================

#LLMAgents #ReleaseEngineering #SoftwareDevelopment #AIResearch #MLOps
VERSE: Visual Embedding Reduction and Space Exploration. Clustering-Guided Insights for Training Data Enhancement in Visually-Rich Document Understanding

📝 Summary:
VERSE analyzes Vision-Language Models by visualizing latent representations to find error-prone clusters. It guides synthetic data generation to boost performance in these areas. This significantly improves F1 scores, allowing on-premise models to match or exceed top SaaS solutions.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05125
• PDF: https://arxiv.org/pdf/2601.05125
• Project Page: https://huggingface.co/spaces/de-Rodrigo/Embeddings
• Github: https://github.com/nachoDRT/VrDU-Doctor

==================================

#VisionLanguageModels #DeepLearning #EmbeddingVisualization #SyntheticData #DocumentUnderstanding
ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3D Gaussian Splatting

📝 Summary:
ProFuse enhances open-vocabulary 3DGS understanding via an efficient, context-aware framework. It uses a pre-registration phase to fuse semantic features onto Gaussians for cross-view coherence, completing semantic attachment twice as fast as SOTA.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04754
• PDF: https://arxiv.org/pdf/2601.04754
• Project Page: https://chiou1203.github.io/ProFuse/
• Github: https://chiou1203.github.io/ProFuse/

==================================

#3DGaussianSplatting #ComputerVision #OpenVocabulary #3DReconstruction #DeepLearning
Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models

📝 Summary:
Targeting high-entropy tokens in vision-language models causes significant semantic degradation with reduced budgets. This attack strategy reveals critical transferable safety risks across different VLM architectures.
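To make "high-entropy tokens" concrete, here is a minimal sketch that scores each token position by the Shannon entropy of its predicted distribution and picks the most uncertain positions. The function names and the toy distributions are assumptions for illustration only; this shows the selection step, not the attack described in the paper.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def top_entropy_positions(dists: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k positions with the highest entropy."""
    scores = [token_entropy(d) for d in dists]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy per-position distributions over a 3-token vocabulary (made up).
dists = [
    [1.0, 0.0, 0.0],        # fully confident
    [0.5, 0.5, 0.0],        # two-way uncertainty
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain
    [0.9, 0.1, 0.0],        # mostly confident
]

selected = top_entropy_positions(dists, k=2)
print(selected)
```

With these toy inputs the uniform and the two-way positions are selected, since they carry the most uncertainty.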

🔹 Publication Date: Published on Dec 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21815
• PDF: https://arxiv.org/pdf/2512.21815

==================================

#VisionLanguageModels #AdversarialAI #AIsecurity #MachineLearning #DeepLearning
Multi-Agent Software Development through Cross-Team Collaboration

📝 Summary:
Existing multi-agent LLM software development yields a single solution, missing better alternatives. This paper introduces Cross-Team Collaboration (CTC), a framework where multiple agent teams propose and communicate diverse decisions. This significantly improves software quality and generalizes well.

🔹 Publication Date: Published on Jun 13, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2406.08979
• PDF: https://arxiv.org/pdf/2406.08979
• Github: https://github.com/OpenBMB/ChatDev

Spaces citing this paper:
https://huggingface.co/spaces/shanghengdu/LLM-Agent-Optimization-PaperList

==================================

#MultiAgentSystems #LLMAgents #SoftwareDevelopment #AICollaboration #AIResearch
CoV: Chain-of-View Prompting for Spatial Reasoning

📝 Summary:
Chain-of-View (CoV) prompting enhances spatial reasoning in 3D embodied question answering for vision-language models. It actively explores environments by selecting question-aligned views and iteratively adjusting camera positions to gather context, leading to significant performance gains across ...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05172
• PDF: https://arxiv.org/pdf/2601.05172

==================================

#SpatialReasoning #VisionLanguageModels #EmbodiedAI #Prompting #AI
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

📝 Summary:
This paper demonstrates extreme data efficiency in RL for LLMs. Training on a single, carefully designed sample, an approach called polymath learning, significantly enhances multidisciplinary reasoning, outperforming traditional methods that rely on large datasets. The findings suggest sample quality and design...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03111
• PDF: https://arxiv.org/pdf/2601.03111

==================================

#ReinforcementLearning #LLMs #DataEfficiency #AI #DeepLearning
LEMAS: A 150K-Hour Large-scale Extensible Multilingual Audio Suite with Generative Speech Models

📝 Summary:
LEMAS introduces the largest open-source 150K-hour multilingual speech dataset with word-level timestamps. Models trained on this dataset, LEMAS-TTS and LEMAS-Edit, achieve high-quality zero-shot speech synthesis and seamless speech editing.

🔹 Publication Date: Published on Jan 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04233
• PDF: https://arxiv.org/pdf/2601.04233
• Project Page: https://huggingface.co/spaces/LEMAS-Project/LEMAS-Edit

🔹 Models citing this paper:
https://huggingface.co/LEMAS-Project/LEMAS-TTS

Datasets citing this paper:
https://huggingface.co/datasets/LEMAS-Project/LEMAS-Dataset-train
https://huggingface.co/datasets/LEMAS-Project/LEMAS-Dataset-eval

Spaces citing this paper:
https://huggingface.co/spaces/LEMAS-Project/LEMAS-TTS
https://huggingface.co/spaces/LEMAS-Project/LEMAS-Edit
https://huggingface.co/spaces/Kaiden423/LEMAS-TTS

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Multi-Scale Local Speculative Decoding for Image Generation

📝 Summary:
Multi-Scale Local Speculative Decoding accelerates autoregressive image generation through multi-resolution drafting and spatially informed verification while maintaining semantic quality and perceptu...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05149
• PDF: https://arxiv.org/pdf/2601.05149
• Project Page: https://qualcomm-ai-research.github.io/mulo-sd-webpage/
• Github: https://qualcomm-ai-research.github.io/mulo-sd-webpage

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing

📝 Summary:
Behavior cloning demonstrates improved performance and causal reasoning through scaling model size and training data, achieving human-level gameplay in 3D video games.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04575
• PDF: https://arxiv.org/pdf/2601.04575
• Project Page: https://elefant-ai.github.io/open-p2p/
• Github: https://github.com/elefant-ai/open-p2p

🔹 Models citing this paper:
https://huggingface.co/elefantai/open-p2p

Datasets citing this paper:
https://huggingface.co/datasets/elefantai/p2p-toy-examples
https://huggingface.co/datasets/elefantai/p2p-full-data

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Scaling Large-Language-Model-based Multi-Agent Collaboration

📝 Summary:
This paper introduces MacNet, which organizes multi-agent collaboration as a directed acyclic graph (DAG) for reasoning, outperforming baselines and scaling to many agents. It reports a collaborative scaling law in which collaborative emergence appears much earlier than neural emergence.
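As a loose illustration of agents arranged in a DAG, the sketch below runs a few placeholder "agents" in topological order so each one sees the outputs of its upstream nodes. This is not MacNet's code: the graph, the agent names, and the plain functions standing in for LLM calls are all hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

Agent = Callable[[List[str]], str]


def run_dag(predecessors: Dict[str, List[str]], agents: Dict[str, Agent]) -> Dict[str, str]:
    """Run each agent after all of its predecessors, passing their outputs in."""
    outputs: Dict[str, str] = {}
    for name in TopologicalSorter(predecessors).static_order():
        upstream = [outputs[p] for p in predecessors.get(name, [])]
        outputs[name] = agents[name](upstream)
    return outputs


# Hypothetical four-node graph: one drafter, two reviewers, one merger.
predecessors = {
    "draft": [],
    "review_a": ["draft"],
    "review_b": ["draft"],
    "merge": ["review_a", "review_b"],
}

# Plain functions stand in for model calls.
agents: Dict[str, Agent] = {
    "draft": lambda up: "draft",
    "review_a": lambda up: f"A({up[0]})",
    "review_b": lambda up: f"B({up[0]})",
    "merge": lambda up: "+".join(up),
}

results = run_dag(predecessors, agents)
print(results["merge"])
```

Because the graph is acyclic, every agent runs exactly once and only after its inputs exist; adding agents means adding nodes and edges rather than changing the runner.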

🔹 Publication Date: Published on Jun 11, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2406.07155
• PDF: https://arxiv.org/pdf/2406.07155
• Project Page: https://github.com/OpenBMB/ChatDev/tree/macnet
• Github: https://github.com/OpenBMB/ChatDev/tree/macnet

Spaces citing this paper:
https://huggingface.co/spaces/shanghengdu/LLM-Agent-Optimization-PaperList

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
PyramidalWan: On Making Pretrained Video Model Pyramidal for Efficient Inference

📝 Summary:
Pyramidal diffusion models offer efficient inference by varying resolution based on noise level. This paper presents a low-cost finetuning pipeline to convert pretrained diffusion models into pyramidal ones while maintaining output quality. The authors also explore step distillation for enhanced efficiency.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04792
• PDF: https://arxiv.org/pdf/2601.04792
• Project Page: https://qualcomm-ai-research.github.io/PyramidalWan

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research