ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks

📝 Summary:
Llama-Embed-Nemotron-8B is an open-source text embedding model that achieves state-of-the-art performance, especially on multilingual and cross-lingual tasks. The authors attribute its results to a novel data mix and back their design choices with detailed ablation studies, positioning it as a universal embedding model.
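
As a rough illustration of how embeddings like these are typically used for cross-lingual retrieval, here is a minimal sketch that ranks documents by cosine similarity to a query. It does not call the model: the 3-dimensional vectors, the document names, and the helper functions `cosine_similarity` and `rank_documents` are all hypothetical stand-ins for real embedding output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors invented for illustration; a real embedding model
# would produce much higher-dimensional vectors.
query = [0.9, 0.1, 0.0]
docs = {
    "doc_de": [0.8, 0.2, 0.1],
    "doc_fr": [0.1, 0.9, 0.2],
    "doc_ja": [0.0, 0.1, 0.95],
}

ranking = rank_documents(query, docs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a cross-lingual setup, the query and the documents may be in different languages; the ranking works the same way as long as one model maps them into a shared vector space.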

🔹 Publication Date: Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07025
• PDF: https://arxiv.org/pdf/2511.07025

🔹 Models citing this paper:
https://huggingface.co/nvidia/llama-embed-nemotron-8b

==================================

For more data science resources:
https://t.me/DataScienceT

#TextEmbeddings #MultilingualNLP #CrossLingual #LanguageModels #AIResearch
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

📝 Summary:
GRAN-TED improves text encoders for diffusion models by addressing evaluation and adaptation challenges. It introduces TED-6K, an efficient text-only benchmark that predicts generation quality 750x faster. Using this, GRAN-TED develops a superior encoder via a two-stage training method, enhancing...

🔹 Publication Date: Dec 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15560
• PDF: https://arxiv.org/pdf/2512.15560

#DiffusionModels #TextEmbeddings #AIResearch #MachineLearning #NLP