ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios

📝 Summary:
SWE-EVO is a new benchmark that evaluates AI coding agents on long-horizon, multi-step software evolution tasks spanning many files. It reveals a significant gap in current models' abilities: even top models achieve only a 21 percent resolution rate. This highlights their struggle with sust...

🔹 Publication Date: Dec 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18470
• PDF: https://arxiv.org/pdf/2512.18470

Datasets citing this paper:
https://huggingface.co/datasets/Fsoft-AIC/SWE-EVO

==================================

For more data science resources:
https://t.me/DataScienceT

#AICoding #SoftwareEvolution #Benchmarking #LLMs #AIResearch