✨SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
📝 Summary:
SWE-EVO is a new benchmark for AI coding agents that evaluates them on long-horizon, multi-step software evolution tasks spanning many files. It reveals a significant gap in current models' abilities, with even top models achieving only 21 percent resolution. This highlights their struggle with sust...
🔹 Publication Date: Published on Dec 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18470
• PDF: https://arxiv.org/pdf/2512.18470
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Fsoft-AIC/SWE-EVO
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AICoding #SoftwareEvolution #Benchmarking #LLMs #AIResearch