✨X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
📝 Summary:
Code LLMs trained on fully synthetic data using a feature-based synthesis pipeline achieve superior performance on competitive programming benchmarks while reducing dependence on real-world coding dat...
🔹 Publication Date: Published on Jan 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06953
• PDF: https://arxiv.org/pdf/2601.06953
• Github: https://github.com/JieWu02/X-Coder
🔹 Models citing this paper:
• https://huggingface.co/IIGroup/X-Coder-SFT-Qwen3-8B
• https://huggingface.co/IIGroup/X-Coder-SFT-Qwen2.5-7B
• https://huggingface.co/IIGroup/X-Coder-RL-Qwen2.5-7B
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/IIGroup/X-Coder-SFT-376k
• https://huggingface.co/datasets/IIGroup/X-Coder-RL-40k
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #DataScience #MachineLearning #HuggingFace #Research
✨ShowUI-Aloha: Human-Taught GUI Agent
📝 Summary:
ShowUI-Aloha presents a pipeline that converts unstructured human screen recordings into structured GUI tasks through recording, semantic interpretation, planning, and execution components.
🔹 Publication Date: Published on Jan 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.07181
• PDF: https://arxiv.org/pdf/2601.07181
• Project Page: https://showlab.github.io/Aloha_Page/
✨SketchJudge: A Diagnostic Benchmark for Grading Hand-drawn Diagrams with Multimodal Large Language Models
📝 Summary:
The SketchJudge benchmark evaluates multimodal large language models' ability to grade hand-drawn STEM diagrams, revealing significant limitations in visual understanding compared to human performance.
🔹 Publication Date: Published on Jan 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06944
• PDF: https://arxiv.org/pdf/2601.06944
✨BabyVision: Visual Reasoning Beyond Language
📝 Summary:
Current multimodal large language models exhibit significant gaps in fundamental visual understanding compared to human children, as demonstrated by the BabyVision benchmark.
🔹 Publication Date: Published on Jan 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06521
• PDF: https://arxiv.org/pdf/2601.06521
✨3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence
📝 Summary:
3D CoCa v2 enhances 3D captioning by combining contrastive vision-language learning with spatially-aware 3D scene encoding and test-time search for improved generalization across diverse environments....
🔹 Publication Date: Published on Jan 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.06496
• PDF: https://arxiv.org/pdf/2601.06496
• Github: https://github.com/AIGeeksGroup/3DCoCav2
✨e5-omni: Explicit Cross-modal Alignment for Omni-modal Embeddings
📝 Summary:
Omni-modal embedding models face challenges with modality-dependent similarity scaling, ineffective in-batch negatives, and mismatched statistics across modalities, which are addressed through explici...
🔹 Publication Date: Published on Jan 7
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03666
• Collection: https://huggingface.co/collections/Haon-Chen/e5-omni
• PDF: https://arxiv.org/pdf/2601.03666
🔹 Models citing this paper:
• https://huggingface.co/Haon-Chen/e5-omni-3B
• https://huggingface.co/Haon-Chen/e5-omni-7B
✨MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era
📝 Summary:
MegaFlow is a distributed orchestration system for large-scale AI agent training and evaluation. It addresses the lack of open-source infrastructure by providing efficient scheduling, resource allocation, and task management through modular services. MegaFlow successfully handles tens of thousand...
🔹 Publication Date: Published on Jan 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.07526
• PDF: https://arxiv.org/pdf/2601.07526
✨Dr. Zero: Self-Evolving Search Agents without Training Data
📝 Summary:
A data-free self-evolution framework enables large language models to autonomously improve reasoning capabilities through iterative question generation and solving, achieving performance comparable to...
🔹 Publication Date: Published on Jan 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.07055
• PDF: https://arxiv.org/pdf/2601.07055
• Github: https://github.com/facebookresearch/drzero
✨GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts
📝 Summary:
Large reasoning models' inference latency can be reduced by routing reasoning steps to larger models based on the entropy of their first token, enabling efficient collaborative inference without addit...
🔹 Publication Date: Published on Jan 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05110
• PDF: https://arxiv.org/pdf/2601.05110
• Github: https://github.com/Zengwh02/GlimpRouter
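💡 Illustration (not from the paper): the GlimpRouter summary above describes routing on first-token entropy. The snippet below is only a minimal sketch of that idea under stated assumptions. The function names, the threshold value of 1.0, and the rule "high entropy goes to the larger model" are all hypothetical choices for illustration; the repository linked above is the reference for the actual method.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw logits to probabilities (max-shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_step(first_token_logits: Sequence[float], threshold: float = 1.0) -> str:
    """Pick which model should generate a reasoning step.

    Assumption: a high-entropy first token signals an uncertain step, so the
    step is sent to the larger model; otherwise the small model keeps it.
    The threshold is a hypothetical tuning knob, not a value from the paper.
    """
    if first_token_entropy(first_token_logits) > threshold:
        return "large"
    return "small"


if __name__ == "__main__":
    # Flat distribution over four tokens: uncertain first token.
    print(route_step([0.0, 0.0, 0.0, 0.0]))
    # Sharply peaked distribution: confident first token.
    print(route_step([10.0, 0.0, 0.0, 0.0]))
```

Only one token's distribution is inspected per step, so the routing decision itself costs a single forward pass of the small model in this sketch.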
✨OpenTinker: Separating Concerns in Agentic Reinforcement Learning
📝 Summary:
OpenTinker provides a modular infrastructure for reinforcement learning of large language model agents with separated components and managed execution runtime.
🔹 Publication Date: Published on Jan 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.07376
• PDF: https://arxiv.org/pdf/2601.07376
• Project Page: https://open-tinker.github.io/opentinker-page/
• Github: https://github.com/open-tinker/OpenTinker