✨SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
📝 Summary:
SWE-EVO is a new benchmark for AI coding agents that evaluates them on long-horizon, multi-step software evolution tasks spanning many files. It reveals a significant gap in current models' abilities, with even top models achieving only 21 percent resolution. This highlights their struggle with sust...
🔹 Publication Date: Published on Dec 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18470
• PDF: https://arxiv.org/pdf/2512.18470
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Fsoft-AIC/SWE-EVO
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AICoding #SoftwareEvolution #Benchmarking #LLMs #AIResearch