ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks

📝 Summary:
Llama-Embed-Nemotron-8B is an open-source text embedding model that reports state-of-the-art performance, especially on multilingual and cross-lingual tasks. The authors credit a novel data mix and back their design choices with detailed ablation studies, positioning the model as a universal embedding solution.

🔹 Publication Date: Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07025
• PDF: https://arxiv.org/pdf/2511.07025

🔹 Models citing this paper:
https://huggingface.co/nvidia/llama-embed-nemotron-8b

==================================

For more data science resources:
https://t.me/DataScienceT

#TextEmbeddings #MultilingualNLP #CrossLingual #LanguageModels #AIResearch
MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation

📝 Summary:
This paper introduces MultiMed-ST, a large-scale multilingual medical speech translation dataset. With 290,000 samples in five languages, it is presented as the largest medical machine translation (MT) dataset and the largest multilingual speech translation (ST) dataset. The paper also provides an extensive comparative analysis.

🔹 Publication Date: Apr 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.03546
• PDF: https://arxiv.org/pdf/2504.03546
• Project Page: https://github.com/leduckhai/MultiMed-ST
• Github: https://github.com/leduckhai/MultiMed-ST

🔹 Models citing this paper:
https://huggingface.co/leduckhai/MultiMed-ST

🔹 Datasets citing this paper:
https://huggingface.co/datasets/leduckhai/MultiMed-ST

🔹 Spaces citing this paper:
https://huggingface.co/spaces/HaoVuong/MedicalASR

#SpeechTranslation #MedicalAI #MultilingualNLP #MachineTranslation #Dataset