https://akillness.github.io/posts/llm-agents-eval/
Microsoft's AgentEval seems like a promising tool to assist with evaluating LLM agents! - Fodev Jeong