ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
πŸ€–πŸ§  DeepEval: The Ultimate LLM Evaluation Framework for AI Developers

πŸ—“οΈ 07 Oct 2025
πŸ“š AI News & Trends

In today’s AI-driven world, large language models (LLMs) have become central to modern applications, from chatbots to intelligent AI agents. However, ensuring the accuracy, reliability, and safety of these models is a significant challenge. Even small errors, biases, or hallucinations can result in misleading information, frustrated users, or business setbacks. This is where DeepEval, an ...
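
To make the evaluation idea concrete, here is a toy metric in the spirit of such frameworks. This is not DeepEval's actual API; the class names and the token-overlap scoring are invented for illustration (real metrics typically use an LLM judge rather than word overlap).

```python
import re

class EvalCase:
    """One evaluation example: prompt, model answer, and retrieved context."""
    def __init__(self, input, actual_output, retrieval_context):
        self.input = input
        self.actual_output = actual_output
        self.retrieval_context = retrieval_context

class FaithfulnessMetric:
    """Toy metric: fraction of answer tokens supported by the context.
    Real frameworks score this with an LLM judge, not token overlap."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.score = None
        self.passed = None

    def measure(self, case):
        tokens = lambda s: re.findall(r"\w+", s.lower())
        context = set(tokens(" ".join(case.retrieval_context)))
        answer = tokens(case.actual_output)
        supported = sum(1 for tok in answer if tok in context)
        self.score = supported / max(len(answer), 1)
        self.passed = self.score >= self.threshold
        return self.score

case = EvalCase(
    input="Where is the Eiffel Tower?",
    actual_output="The Eiffel Tower is in Paris",
    retrieval_context=["The Eiffel Tower is a landmark in Paris, France."],
)
metric = FaithfulnessMetric(threshold=0.5)
score = metric.measure(case)
print(f"faithfulness={score:.2f} passed={metric.passed}")
```

The pattern, test case plus thresholded metric, is the general shape such frameworks share, whatever the scoring backend.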

#DeepEval #LLM #AIDevelopment #LanguageModels #ModelEvaluation #ArtificialIntelligence
✨CodeClash: Benchmarking Goal-Oriented Software Engineering

πŸ“ Summary:
CodeClash is a benchmark that evaluates language models on open-ended, goal-oriented code development through competitive tournaments. It shows that LMs struggle with strategic reasoning and long-term codebase maintenance, performing poorly against human experts.

πŸ”Ή Publication Date: Published on Nov 2

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.00839
β€’ PDF: https://arxiv.org/pdf/2511.00839

==================================

For more data science resources:
βœ“ https://t.me/DataScienceT

#LanguageModels #SoftwareEngineering #AIEvaluation #CodeDevelopment #Benchmarking
✨Diffusion Language Models are Super Data Learners

πŸ“ Summary:
Diffusion language models (DLMs) consistently outperform autoregressive models, especially in low-data settings. This is due to any-order modeling, iterative bidirectional denoising, and Monte Carlo augmentation. DLMs maintain advantages at scale, achieving strong performance even by repeating limi...

πŸ”Ή Publication Date: Published on Nov 5

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.03276
β€’ PDF: https://arxiv.org/pdf/2511.03276
β€’ Project Page: https://github.com/JinjieNi/dlms-are-super-data-learners
β€’ Github: https://github.com/JinjieNi/OpenMoE2

==================================


#DiffusionModels #LanguageModels #MachineLearning #LowDataLearning #AI
✨Dense Motion Captioning

πŸ“ Summary:
The paper introduces Dense Motion Captioning, a new task for 3D human motion understanding. It presents CompMo, a large dataset with complex, temporally annotated motions, and DEMO, a model combining a language model with a motion adapter to generate detailed, grounded captions.

πŸ”Ή Publication Date: Published on Nov 7

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.05369
β€’ PDF: https://arxiv.org/pdf/2511.05369
β€’ Project Page: https://xusy2333.com/demo/
β€’ Github: https://github.com/41xu/DEMO

==================================


#MotionCaptioning #3DMotion #ComputerVision #LanguageModels #AIResearch
✨Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks

πŸ“ Summary:
Llama-Embed-Nemotron-8B is an open-source text embedding model achieving state-of-the-art performance, especially on multilingual tasks. Its success comes from a novel data mix, validated through detailed ablation studies, making it a universal solution.
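
As a rough illustration of how such embedding models are used, the sketch below compares toy vectors with cosine similarity. The 4-d vectors are made-up stand-ins, not real llama-embed-nemotron-8b outputs, which are far higher-dimensional.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for model outputs.
query_en  = [0.9, 0.1, 0.0, 0.4]   # "What is the capital of France?"
doc_fr    = [0.8, 0.2, 0.1, 0.5]   # "Paris est la capitale de la France."
doc_other = [0.0, 0.9, 0.8, 0.1]   # unrelated document

sim_match = cosine_similarity(query_en, doc_fr)
sim_other = cosine_similarity(query_en, doc_other)
print(f"cross-lingual match: {sim_match:.3f}, unrelated: {sim_other:.3f}")
```

Cross-lingual retrieval with a model like this is exactly this ranking step: embed the query and candidates, then sort by cosine similarity.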

πŸ”Ή Publication Date: Published on Nov 10

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.07025
β€’ PDF: https://arxiv.org/pdf/2511.07025

πŸ”Ή Models citing this paper:
β€’ https://huggingface.co/nvidia/llama-embed-nemotron-8b

==================================


#TextEmbeddings #MultilingualNLP #CrossLingual #LanguageModels #AIResearch
✨Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models

πŸ“ Summary:
This paper proposes an AI agent framework for adaptive long-form writing. It uses recursive task decomposition and dynamically integrates retrieval, reasoning, and composition, overcoming rigid outline-based methods. The framework consistently outperforms state-of-the-art approaches.
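
A minimal sketch of recursive task decomposition, using hypothetical retrieve/reason/compose task types named after the paper's description; the decomposition rules here are invented for illustration, not the framework's actual logic.

```python
# Each writing task is a dict with a type and a goal. Composition tasks are
# recursively expanded into retrieval, reasoning, and smaller composition
# subtasks until a depth limit is hit.

def decompose(task, depth=0, max_depth=2):
    """Recursively expand a writing task into typed leaf subtasks."""
    if depth == max_depth or task["type"] != "compose":
        return [task]
    topic = task["goal"]
    subtasks = [
        {"type": "retrieve", "goal": f"gather sources on {topic}"},
        {"type": "reason",   "goal": f"outline key arguments for {topic}"},
        {"type": "compose",  "goal": f"draft section on {topic}"},
    ]
    plan = []
    for sub in subtasks:
        plan.extend(decompose(sub, depth + 1, max_depth))
    return plan

plan = decompose({"type": "compose", "goal": "the history of neural networks"})
for step in plan:
    print(step["type"], "->", step["goal"])
```

The point of the recursive form is adaptivity: how deep a branch expands can depend on the task, rather than being fixed by a one-shot outline.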

πŸ”Ή Publication Date: Published on Mar 11

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2503.08275
β€’ PDF: https://arxiv.org/pdf/2503.08275
β€’ Github: https://github.com/principia-ai/WriteHERE

==================================


#AI #LanguageModels #LongformWriting #NLP #GenerativeAI
✨AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models

πŸ“ Summary:
AraLingBench is a human-annotated benchmark that evaluates the Arabic linguistic competence of LLMs using expert-designed questions. It reveals that models achieve surface proficiency but lack deep understanding, often relying on memorization rather than true comprehension.

πŸ”Ή Publication Date: Published on Nov 18

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.14295
β€’ PDF: https://arxiv.org/pdf/2511.14295

✨ Datasets citing this paper:
β€’ https://huggingface.co/datasets/hammh0a/AraLingBench

==================================


#ArabicNLP #LLMEvaluation #AIResearch #LanguageModels #NLPBenchmarking
✨Computer-Use Agents as Judges for Generative User Interface

πŸ“ Summary:
This paper introduces a framework where computer-use agents (CUAs) act as judges for coding language models (Coders) to automatically design GUIs. The goal is to optimize interfaces for CUA efficiency and task solvability, rather than human aesthetics, using a new benchmark called AUI-Gym.

πŸ”Ή Publication Date: Published on Nov 19

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.15567
β€’ PDF: https://arxiv.org/pdf/2511.15567
β€’ Project Page: https://showlab.github.io/AUI/
β€’ Github: https://github.com/showlab/AUI/

==================================


#AIAgents #GUIDesign #GenerativeAI #AIevaluation #LanguageModels
✨AICC: Parse HTML Finer, Make Models Better -- A 7.3T AI-Ready Corpus Built by a Model-Based HTML Parser

πŸ“ Summary:
This paper introduces MinerU-HTML, a novel language-model-based HTML parser that extracts web content semantically, preserving structure better than heuristic methods. It constructs the 7.3T AICC corpus, demonstrating that models trained on AICC significantly outperform those from other parsers, ...
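
For contrast with the model-based parser, here is the kind of rule-based heuristic baseline such work aims to replace, using only the standard library. The tag skip-list is an invented heuristic, not MinerU-HTML's method.

```python
from html.parser import HTMLParser

class HeuristicExtractor(HTMLParser):
    """Keeps text outside script/style/nav-style tags -- the crude rule-based
    extraction that model-based parsers aim to improve on."""
    SKIP = {"script", "style", "nav", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

page = """<html><body>
<nav>Home | About</nav>
<h1>Scaling Laws</h1>
<p>Models improve predictably with data.</p>
<script>trackPageview();</script>
</body></html>"""

extractor = HeuristicExtractor()
extractor.feed(page)
print(extractor.chunks)
```

Rules like this break on real-world markup (inline boilerplate, content inside generic divs), which is the failure mode a learned parser addresses.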

πŸ”Ή Publication Date: Published on Nov 20

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.16397
β€’ PDF: https://arxiv.org/pdf/2511.16397

✨ Datasets citing this paper:
β€’ https://huggingface.co/datasets/opendatalab/AICC

==================================


#AI #HTMLParsing #Corpus #LanguageModels #WebData
✨Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT

πŸ“ Summary:
ReVeL converts multiple-choice questions into verifiable open-form questions to address unreliable MCQA metrics and answer guessing. The framework improves data efficiency and robustness for multimodal language models, revealing significant score inflation in MCQA benchmarks.
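
A hedged sketch of the MCQA-to-OpenQA idea: drop the options, ask the question open-form, and verify the free-text answer against the gold answer after normalization. The helper names and the exact-match verifier are assumptions, not ReVeL's actual pipeline.

```python
import re

def to_open_form(mcq):
    """Strip the options so the model cannot guess among A/B/C/D."""
    return {"question": mcq["question"], "gold": mcq["options"][mcq["answer"]]}

def normalize(text):
    """Lowercase and drop punctuation before comparison."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def verify(prediction, gold):
    """Exact match after normalization; real verifiers are more lenient."""
    return normalize(prediction) == normalize(gold)

mcq = {
    "question": "Which planet is largest?",
    "options": {"A": "Mars", "B": "Jupiter", "C": "Venus", "D": "Earth"},
    "answer": "B",
}
open_q = to_open_form(mcq)
print(verify("Jupiter!", open_q["gold"]))  # True
print(verify("Saturn", open_q["gold"]))    # False
```

With no options visible, a correct answer can no longer come from guessing among four choices, which is the score inflation the paper measures.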

πŸ”Ή Publication Date: Published on Nov 21

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.17405
β€’ PDF: https://arxiv.org/pdf/2511.17405
β€’ Project Page: https://flageval-baai.github.io/ReVeL/

==================================


#OpenQA #VisionLanguage #LanguageModels #AIEvaluation #MachineLearning
✨Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM

πŸ“ Summary:
Xmodel-2.5 is a 1.3B language model designed for efficient edge deployments. It uses maximal-update parameterization and a novel training curriculum that switches from AdamW to Muon, improving reasoning skills by 4.58% while maintaining efficiency.
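
The optimizer-switch curriculum can be sketched as a training loop that changes its update rule at a step threshold. The two rules below are plain SGD/momentum stand-ins on a toy quadratic, not real AdamW or Muon implementations.

```python
# Toy objective: f(w) = w^2, so grad = 2w. The loop uses one update rule
# for the first `switch_at` steps and another afterwards -- the scaffold
# of a two-phase optimizer curriculum.

def sgd_step(w, grad, lr=0.1):
    return w - lr * grad

def momentum_step(w, grad, state, lr=0.1, beta=0.9):
    state["v"] = beta * state.get("v", 0.0) + grad
    return w - lr * state["v"]

def train(steps=100, switch_at=50):
    w, state = 5.0, {}
    for step in range(steps):
        grad = 2.0 * w
        if step < switch_at:
            w = sgd_step(w, grad)
        else:
            w = momentum_step(w, grad, state)
    return w

final_w = train()
print(f"final w: {final_w:.6f}")
```

In a real curriculum the switch also requires carrying over or reinitializing optimizer state, which this toy loop sidesteps by starting momentum fresh.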

πŸ”Ή Publication Date: Published on Nov 23

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.19496
β€’ PDF: https://arxiv.org/pdf/2511.19496
β€’ Github: https://github.com/XiaoduoAILab/Xmodel-2.5

πŸ”Ή Models citing this paper:
β€’ https://huggingface.co/XiaoduoAILab/Xmodel-2.5

==================================


#SLM #EdgeAI #LanguageModels #DeepLearning #ReasoningAI
✨Masks Can Be Distracting: On Context Comprehension in Diffusion Language Models

πŸ“ Summary:
Masked diffusion language models (MDLMs) show locality bias and poor context comprehension because appended mask tokens act as distractors. The authors introduce a mask-agnostic loss function that improves MDLM robustness by mitigating the masks' distracting effect.
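
One way to picture a mask-agnostic loss: average the negative log-likelihood only over positions the model is genuinely asked to infill, skipping appended mask padding. Shapes, values, and the sentinel convention below are invented; this is not the paper's code.

```python
import math

MASK = -1  # sentinel token id standing in for [MASK]

def masked_ce(logprobs, targets, inputs):
    """Average NLL over genuinely masked positions only; positions with no
    target are treated as appended mask padding and skipped."""
    losses = []
    for lp, tgt, tok in zip(logprobs, targets, inputs):
        if tok == MASK and tgt is not None:   # real infill position
            losses.append(-lp[tgt])
    return sum(losses) / len(losses)

# Sequence: [7, MASK, 9, MASK, MASK]; the last two masks are padding.
inputs  = [7, MASK, 9, MASK, MASK]
targets = [None, 3, None, None, None]       # only position 1 has a target
logprobs = [
    {},                       # unmasked position, unused
    {3: math.log(0.5)},       # model puts p=0.5 on the correct token
    {},
    {3: math.log(0.1)},       # padding position: would distort the loss
    {3: math.log(0.1)},
]
loss = masked_ce(logprobs, targets, inputs)
print(round(loss, 4))  # -ln(0.5) ~= 0.6931
```

Including the two padding positions would roughly triple this loss, which is the distraction effect the paper's loss design removes.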

πŸ”Ή Publication Date: Published on Nov 26

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2511.21338
β€’ PDF: https://arxiv.org/pdf/2511.21338

==================================


#LanguageModels #DiffusionModels #NLP #ContextComprehension #AIResearch
✨Scaling Behavior of Discrete Diffusion Language Models

πŸ“ Summary:
Research on discrete diffusion language models (DLMs) shows that their scaling behavior depends on the noise type. Uniform diffusion is more parameter- and data-efficient than masked diffusion, making it promising for data-bound settings. A 10B-parameter model confirmed this.
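
The two noise types compared here can be sketched on a toy vocabulary: masked diffusion replaces tokens with [MASK], while uniform diffusion replaces them with random vocabulary tokens. Illustrative only; the rate and vocabulary are made up.

```python
import random

VOCAB = list(range(10))
MASK = "[MASK]"

def masked_corrupt(tokens, rate, rng):
    """Masked noise: corrupted positions all become the same [MASK] symbol."""
    return [MASK if rng.random() < rate else t for t in tokens]

def uniform_corrupt(tokens, rate, rng):
    """Uniform noise: corrupted positions become random vocabulary tokens."""
    return [rng.choice(VOCAB) if rng.random() < rate else t for t in tokens]

rng = random.Random(0)
seq = [1, 2, 3, 4, 5, 6, 7, 8]
print(masked_corrupt(seq, 0.5, rng))
print(uniform_corrupt(seq, 0.5, rng))
```

The practical difference: under uniform noise the model must denoise plausible-but-wrong tokens rather than an obvious placeholder, so every position carries a learning signal.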

πŸ”Ή Publication Date: Published on Dec 11

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.10858
β€’ PDF: https://arxiv.org/pdf/2512.10858
β€’ Github: https://github.com/dvruette/gidd-easydel

==================================


#DiffusionModels #LanguageModels #NLP #AIResearch #DeepLearning
✨Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

πŸ“ Summary:
Canon layers are lightweight architectural components that enhance language model reasoning depth and breadth by promoting horizontal information flow. They improve performance across various architectures, validated in synthetic tasks and real-world pretraining.
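
A toy rendering of "horizontal information flow": each position mixes in a causal weighted sum of a few preceding hidden states, added back through a residual connection. The scalar states and fixed weights are invented stand-ins, similar in spirit to, but not an implementation of, the paper's Canon layers.

```python
def canon_layer(hidden, weights):
    """hidden: list of per-token scalars (stand-ins for hidden vectors).
    weights[k] multiplies the state k positions back; causal only."""
    out = []
    for t in range(len(hidden)):
        mixed = sum(w * hidden[t - k]
                    for k, w in enumerate(weights) if t - k >= 0)
        out.append(hidden[t] + mixed)   # residual connection
    return out

hidden = [1.0, 2.0, 3.0, 4.0]
out = canon_layer(hidden, weights=[0.5, 0.25])
print(out)  # [1.5, 3.25, 5.0, 6.75]
```

Because the sum only reaches backwards, the mixing preserves causality and can be dropped into any autoregressive architecture, which is why the paper can test it across architectures.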

πŸ”Ή Publication Date: Published on Dec 19

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.17351
β€’ PDF: https://arxiv.org/pdf/2512.17351
β€’ Project Page: https://physics.allen-zhu.com/part-4-architecture-design/part-4-1
β€’ Github: https://github.com/facebookresearch/PhysicsLM4

==================================


#LanguageModels #LLM #AIArchitecture #DeepLearning #NLP
✨Bolmo: Byteifying the Next Generation of Language Models

πŸ“ Summary:
Bolmo introduces competitive byte-level language models by efficiently converting existing subword models. This byteification overcomes subword tokenization limitations, matching subword-model performance with minimal additional training and making byte-level LMs practical.
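
The byteification idea on a single string: text becomes a sequence of raw UTF-8 bytes, giving a fixed 256-entry vocabulary with no out-of-vocabulary tokens. A toy round-trip, not Bolmo's conversion procedure.

```python
def to_bytes(text):
    """Tokenize as raw UTF-8 bytes: vocabulary size is always 256."""
    return list(text.encode("utf-8"))

def from_bytes(ids):
    """Invert the byte tokenization."""
    return bytes(ids).decode("utf-8")

ids = to_bytes("naïve")
print(ids)              # 'ï' becomes two bytes: [110, 97, 195, 175, 118, 101]
print(from_bytes(ids))  # round-trips to 'naïve'
```

The trade-off this makes visible: sequences get longer (multi-byte characters expand), which is why making byte-level models competitive requires the efficiency work the paper describes.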

πŸ”Ή Publication Date: Published on Dec 17

πŸ”Ή Paper Links:
β€’ arXiv Page: https://arxiv.org/abs/2512.15586
β€’ PDF: https://arxiv.org/pdf/2512.15586

πŸ”Ή Models citing this paper:
β€’ https://huggingface.co/allenai/Bolmo-7B
β€’ https://huggingface.co/allenai/Bolmo-1B

✨ Datasets citing this paper:
β€’ https://huggingface.co/datasets/allenai/bolmo_mix

==================================


#LanguageModels #ByteLevelLMs #NLP #DeepLearning #AIResearch