✨Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks
📝 Summary:
Llama-Embed-Nemotron-8B is an open-source text embedding model that achieves state-of-the-art performance, especially on multilingual tasks. The authors attribute its results to a novel data mix and report detailed ablation studies, positioning the model as a universal solution for multilingual and cross-lingual tasks.
🔹 Publication Date: Nov 10, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07025
• PDF: https://arxiv.org/pdf/2511.07025
🔹 Models citing this paper:
• https://huggingface.co/nvidia/llama-embed-nemotron-8b
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#TextEmbeddings #MultilingualNLP #CrossLingual #LanguageModels #AIResearch
✨MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation
📝 Summary:
This paper introduces MultiMed-ST, a large-scale many-to-many multilingual speech translation dataset for the medical domain. With 290,000 samples in five languages, the authors describe it as the largest medical machine translation (MT) dataset and the largest multilingual speech translation (ST) dataset. The work also provides an extensive comparative analysis.
🔹 Publication Date: Apr 4, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.03546
• PDF: https://arxiv.org/pdf/2504.03546
• Project Page: https://github.com/leduckhai/MultiMed-ST
• Github: https://github.com/leduckhai/MultiMed-ST
🔹 Models citing this paper:
• https://huggingface.co/leduckhai/MultiMed-ST
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/leduckhai/MultiMed-ST
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/HaoVuong/MedicalASR
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#SpeechTranslation #MedicalAI #MultilingualNLP #MachineTranslation #Dataset