ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents

📝 Summary:
TowerMind is a new low-computation tower defense environment for evaluating large language model planning and decision-making with multimodal observations. Experiments show a performance gap between large language models and humans, revealing limitations in model planning and action use.

🔹 Publication Date: Published on Jan 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05899
• PDF: https://arxiv.org/pdf/2601.05899
• Github: https://github.com/tb6147877/TowerMind

==================================

For more data science resources:
https://t.me/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
SAM 3D: 3Dfy Anything in Images

📝 Summary:
SAM 3D reconstructs 3D objects from single images, predicting geometry, texture, and layout. It uses a multi-stage training framework combining synthetic pretraining and real-world alignment, overcoming the 3D data barrier. It achieves significant gains in human preference tests.

🔹 Publication Date: Published on Nov 20, 2025

🔹 Paper Links:
• ArxivLens Page: https://arxivlens.com/PaperView/Details/sam-3d-3dfy-anything-in-images-9667-03d581e7
• PDF: https://arxiv.org/pdf/2511.16624
• Project Page: https://ai.meta.com/sam3d/
• Github: https://github.com/facebookresearch/sam-3d-objects

🔹 Models citing this paper:
https://huggingface.co/facebook/sam-3d-objects
https://huggingface.co/jetjodh/sam-3d-objects
https://huggingface.co/RunyiY/d3mas

🔹 Spaces citing this paper:
https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Text-to-3D
https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Image-to-3D
https://huggingface.co/spaces/HorizonRobotics/EmbodiedGen-Texture-Gen

==================================

What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

📝 Summary:
Current VLMs struggle with real-world underspecified queries. A new benchmark reveals explicit query rewriting improves performance by 8-22 points across models. This gap stems from natural query under-specification, not merely model capability.

🔹 Publication Date: Published on Jan 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06165
• PDF: https://arxiv.org/pdf/2601.06165

🔹 Datasets citing this paper:
https://huggingface.co/datasets/HAERAE-HUB/HAERAE-VISION

==================================

TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning

📝 Summary:
TourPlanner addresses travel planning challenges through multi-path reasoning and constraint-gated reinforcement learning to optimize both hard and soft constraints effectively.

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04698
• PDF: https://arxiv.org/pdf/2601.04698

==================================

AI-Researcher: Autonomous Scientific Innovation

📝 Summary:
AI-Researcher automates the scientific research process, achieving high implementation success and manuscript quality through a comprehensive benchmark system.

🔹 Publication Date: Published on May 24, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.18705
• PDF: https://arxiv.org/pdf/2505.18705
• Github: https://github.com/hkuds/ai-researcher

==================================

PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

📝 Summary:
Parallel Coordinated Reasoning enables large-scale test-time compute scaling beyond sequential reasoning limitations through parallel exploration and a message-passing architecture.

🔹 Publication Date: Published on Jan 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05593
• PDF: https://arxiv.org/pdf/2601.05593
• Github: https://github.com/stepfun-ai/PaCoRe

==================================

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

📝 Summary:
The VideoDR benchmark enables video question answering by combining cross-frame visual extraction, web retrieval, and multi-hop reasoning in open-domain settings.

🔹 Publication Date: Published on Jan 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06943
• PDF: https://arxiv.org/pdf/2601.06943
• Github: https://github.com/QuantaAlpha/VideoDR-Benchmark

==================================

Boosting Latent Diffusion Models via Disentangled Representation Alignment

📝 Summary:
Latent Diffusion Models generate high-quality images by operating in compressed latent space, typically obtained through image tokenizers such as Variational Autoencoders (VAEs). In pursuit of a gener...

🔹 Publication Date: Published on Jan 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05823
• PDF: https://arxiv.org/pdf/2601.05823
• Github: https://github.com/Kwai-Kolors/Send-VAE

==================================

ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration

📝 Summary:
ET-Agent is a training framework that calibrates tool-use behavior in large language models through self-evolving data flywheels and behavior calibration training to improve task execution effectivene...

🔹 Publication Date: Published on Jan 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06860
• PDF: https://arxiv.org/pdf/2601.06860

🔹 Models citing this paper:
https://huggingface.co/zhangboguodong/ET-Agent-based-on-Qwen2.5-7B-it

==================================

Structured Episodic Event Memory

📝 Summary:
Structured Episodic Event Memory (SEEM) enhances LLMs with a hierarchical memory architecture combining graph and episodic layers for improved narrative coherence and reasoning.

🔹 Publication Date: Published on Jan 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06411
• PDF: https://arxiv.org/pdf/2601.06411

==================================

Lost in the Noise: How Reasoning Models Fail with Contextual Distractors

📝 Summary:
NoisyBench benchmark reveals significant performance degradation in state-of-the-art models when exposed to noisy contextual information, with agentic workflows amplifying errors and attention mechani...

🔹 Publication Date: Published on Jan 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.07226
• PDF: https://arxiv.org/pdf/2601.07226

==================================

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests

📝 Summary:
Code LLMs trained on fully synthetic data using a feature-based synthesis pipeline achieve superior performance on competitive programming benchmarks while reducing dependence on real-world coding dat...

🔹 Publication Date: Published on Jan 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06953
• PDF: https://arxiv.org/pdf/2601.06953
• Github: https://github.com/JieWu02/X-Coder

🔹 Models citing this paper:
https://huggingface.co/IIGroup/X-Coder-SFT-Qwen3-8B
https://huggingface.co/IIGroup/X-Coder-SFT-Qwen2.5-7B
https://huggingface.co/IIGroup/X-Coder-RL-Qwen2.5-7B

🔹 Datasets citing this paper:
https://huggingface.co/datasets/IIGroup/X-Coder-SFT-376k
https://huggingface.co/datasets/IIGroup/X-Coder-RL-40k

==================================

ShowUI-Aloha: Human-Taught GUI Agent

📝 Summary:
ShowUI-Aloha presents a pipeline that converts unstructured human screen recordings into structured GUI tasks through recording, semantic interpretation, planning, and execution components.

🔹 Publication Date: Published on Jan 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.07181
• PDF: https://arxiv.org/pdf/2601.07181
• Project Page: https://showlab.github.io/Aloha_Page/

==================================

SketchJudge: A Diagnostic Benchmark for Grading Hand-drawn Diagrams with Multimodal Large Language Models

📝 Summary:
The SketchJudge benchmark evaluates multimodal large language models' ability to grade hand-drawn STEM diagrams, revealing significant limitations in visual understanding compared to human performance.

🔹 Publication Date: Published on Jan 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06944
• PDF: https://arxiv.org/pdf/2601.06944

==================================

BabyVision: Visual Reasoning Beyond Language

📝 Summary:
Current multimodal large language models exhibit significant gaps in fundamental visual understanding compared to human children, as demonstrated by the BabyVision benchmark.

🔹 Publication Date: Published on Jan 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06521
• PDF: https://arxiv.org/pdf/2601.06521

==================================

3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence

📝 Summary:
3D CoCa v2 enhances 3D captioning by combining contrastive vision-language learning with spatially-aware 3D scene encoding and test-time search for improved generalization across diverse environments.

🔹 Publication Date: Published on Jan 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06496
• PDF: https://arxiv.org/pdf/2601.06496
• Github: https://github.com/AIGeeksGroup/3DCoCav2

==================================

e5-omni: Explicit Cross-modal Alignment for Omni-modal Embeddings

📝 Summary:
Omni-modal embedding models face challenges with modality-dependent similarity scaling, ineffective in-batch negatives, and mismatched statistics across modalities, which are addressed through explici...

🔹 Publication Date: Published on Jan 7

🔹 Paper Links:
• Hugging Face Collection: https://huggingface.co/collections/Haon-Chen/e5-omni
• PDF: https://arxiv.org/pdf/2601.03666

🔹 Models citing this paper:
https://huggingface.co/Haon-Chen/e5-omni-3B
https://huggingface.co/Haon-Chen/e5-omni-7B

==================================

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era

📝 Summary:
MegaFlow is a distributed orchestration system for large-scale AI agent training and evaluation. It addresses the lack of open-source infrastructure by providing efficient scheduling, resource allocation, and task management through modular services. MegaFlow successfully handles tens of thousand...

🔹 Publication Date: Published on Jan 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.07526
• PDF: https://arxiv.org/pdf/2601.07526

==================================

Dr. Zero: Self-Evolving Search Agents without Training Data

📝 Summary:
A data-free self-evolution framework enables large language models to autonomously improve reasoning capabilities through iterative question generation and solving, achieving performance comparable to...

🔹 Publication Date: Published on Jan 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.07055
• PDF: https://arxiv.org/pdf/2601.07055
• Github: https://github.com/facebookresearch/drzero

==================================
