✨UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
📝 Summary:
UniCorn is a self-improvement framework enhancing multimodal model generation. It uses self-play and cognitive reconstruction, without external data or supervision. UniCorn achieves state-of-the-art text-to-image generation.
🔹 Publication Date: Published on Jan 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03193
• PDF: https://arxiv.org/pdf/2601.03193
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
📝 Summary:
AGL1K is an audio geo-localization benchmark introduced to advance the geospatial reasoning capabilities of audio language models, using curated audio clips and evaluation across multiple models.
🔹 Publication Date: Published on Jan 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03227
• PDF: https://arxiv.org/pdf/2601.03227
• Github: https://github.com/Rising0321/AGL1K
✨ Spaces citing this paper:
• https://huggingface.co/spaces/RisingZhang/AudioGeoLoc
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨SOP: A Scalable Online Post-Training System for Vision-Language-Action Models
📝 Summary:
SOP is a scalable online post-training system for VLA models that enables real-world robot policy adaptation. It uses a robot fleet to continuously learn from interaction, improving task proficiency while maintaining generality. SOP significantly boosts VLA model performance within hours.
🔹 Publication Date: Published on Jan 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03044
• PDF: https://arxiv.org/pdf/2601.03044
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨AceFF: A State-of-the-Art Machine Learning Potential for Small Molecules
📝 Summary:
AceFF is a new machine learning potential for small molecule drug discovery. It offers DFT-level accuracy with high speed, supporting essential elements and charged states. Validation shows it is state-of-the-art for organic molecules.
🔹 Publication Date: Published on Jan 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00581
• PDF: https://arxiv.org/pdf/2601.00581
• Github: https://github.com/torchmd/torchmd-net
🔹 Models citing this paper:
• https://huggingface.co/Acellera/AceFF-2.0
==================================
#MachineLearning #DrugDiscovery #ComputationalChemistry #AIforScience #SmallMolecules
✨U-Net-Like Spiking Neural Networks for Single Image Dehazing
📝 Summary:
DehazeSNN introduces a U-Net-like Spiking Neural Network with an Orthogonal Leaky-Integrate-and-Fire Block for efficient image dehazing. It achieves competitive performance with reduced computational resources and a smaller model size.
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23950
• PDF: https://arxiv.org/pdf/2512.23950
• Github: https://github.com/HaoranLiu507/DehazeSNN
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners
📝 Summary:
Large reasoning models show multilingual latent reasoning, stronger in resource-rich languages but weaker in low-resource ones. Despite varying strength, their internal prediction evolution is consistent across languages, suggesting an English-centered latent reasoning pathway.
🔹 Publication Date: Published on Jan 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02996
• PDF: https://arxiv.org/pdf/2601.02996
• Github: https://github.com/cisnlp/multilingual-latent-reasoner
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨UniVideo: Unified Understanding, Generation, and Editing for Videos
📝 Summary:
UniVideo, a dual-stream framework combining a Multimodal Large Language Model and a Multimodal DiT, extends unified modeling to video generation and editing, achieving state-of-the-art performance and...
🔹 Publication Date: Published on Oct 9, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.08377
• PDF: https://arxiv.org/pdf/2510.08377
• Project Page: https://congwei1230.github.io/UniVideo/
• Github: https://github.com/KwaiVGI/UniVideo
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MindWatcher: Toward Smarter Multimodal Tool-Integrated Reasoning
📝 Summary:
MindWatcher is a tool-integrated reasoning agent using interleaved thinking and multimodal chain-of-thought. It autonomously coordinates diverse tools for complex tasks without human prompts. It outperforms larger models and provides agent training insights.
🔹 Publication Date: Published on Dec 29, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23412
• PDF: https://arxiv.org/pdf/2512.23412
• Github: https://github.com/TIMMY-CHAN/MindWatcher
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
📝 Summary:
MDAgent2 enables automated molecular dynamics code generation and question answering through domain-adapted language models and a multi-agent runtime system.
🔹 Publication Date: Published on Jan 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.02075
• PDF: https://arxiv.org/pdf/2601.02075
• Github: https://github.com/FredericVAN/PKU_MDAgent2
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Choreographing a World of Dynamic Objects
📝 Summary:
CHORD is a universal generative framework that extracts Lagrangian motion information from Eulerian video representations to synthesize diverse 4D dynamic scenes without requiring category-specific ru...
🔹 Publication Date: Published on Jan 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04194
• PDF: https://arxiv.org/pdf/2601.04194
• Project Page: https://yanzhelyu.github.io/chord/
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research