✨Beyond English: Toward Inclusive and Scalable Multilingual Machine Translation with LLMs
📝 Summary:
LMT introduces new multilingual translation models covering 60 languages, centered on Chinese and English. It uses Strategic Downsampling and Parallel Multilingual Prompting to improve translation quality and cross-lingual transfer, and the authors report state-of-the-art performance.
🔹 Publication Date: Nov 10, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07003
• PDF: https://arxiv.org/pdf/2511.07003
• Project Page: https://github.com/NiuTrans/LMT
• Github: https://github.com/NiuTrans/LMT
🔹 Models citing this paper:
• https://huggingface.co/NiuTrans/LMT-60-1.7B
• https://huggingface.co/NiuTrans/LMT-60-0.6B-Base
• https://huggingface.co/NiuTrans/LMT-60-0.6B
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MultilingualTranslation #LLMs #MachineTranslation #NLP #AI
✨DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains
📝 Summary:
A new benchmark, DiscoX, and evaluation system, Metric-S, are introduced for discourse-level, expert Chinese-English translation. Findings show advanced LLMs still fall short of human performance, underscoring challenges in professional machine translation.
🔹 Publication Date: Nov 14, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10984
• PDF: https://arxiv.org/pdf/2511.10984
==================================
#MachineTranslation #NLP #LLM #Benchmarking #AI
✨OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion
📝 Summary:
OmniFusion is a multimodal translation system integrating pretrained foundation models with LLMs via a novel fusion strategy. It enables simultaneous multilingual translation using audio and visual inputs, reducing latency and improving quality over cascaded systems.
🔹 Publication Date: Nov 28, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00234
• PDF: https://arxiv.org/pdf/2512.00234
• Github: https://github.com/saikoneru/OmniFusion
🔹 Models citing this paper:
• https://huggingface.co/skoneru/OmniFusion
• https://huggingface.co/skoneru/OmniFusion_v2
==================================
#MultimodalAI #LLMs #MachineTranslation #FoundationModels #AIResearch
✨MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation
📝 Summary:
MultiMed-ST is a large-scale many-to-many multilingual medical speech translation dataset with 290,000 samples in five languages. The authors describe it as the largest medical MT dataset and the largest multilingual ST dataset. The work also provides an extensive comparative analysis.
🔹 Publication Date: Apr 4, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.03546
• PDF: https://arxiv.org/pdf/2504.03546
• Project Page: https://github.com/leduckhai/MultiMed-ST
• Github: https://github.com/leduckhai/MultiMed-ST
🔹 Models citing this paper:
• https://huggingface.co/leduckhai/MultiMed-ST
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/leduckhai/MultiMed-ST
🔹 Spaces citing this paper:
• https://huggingface.co/spaces/HaoVuong/MedicalASR
==================================
#SpeechTranslation #MedicalAI #MultilingualNLP #MachineTranslation #Dataset
✨Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems
📝 Summary:
Simulstream is an open-source toolkit for evaluating and demonstrating streaming speech-to-text translation. It supports long-form audio, incremental decoding, and re-translation, plus offers an interactive demo interface.
🔹 Publication Date: Dec 19, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17648
• PDF: https://arxiv.org/pdf/2512.17648
• Project Page: https://pypi.org/project/simulstream/
==================================
#SpeechToText #MachineTranslation #NLP #OpenSource #StreamingAI