https://puzzlellms.github.io/posts/Evaluating-Large-Language-Models-Trained-on-Code/
Evaluating Large Language Models Trained on Code - PuzzleLLMs