PythonHub
News & links about Python programming.
https://pythonhub.dev/
Macro Evals for Agentic Systems

This cookbook outlines a macro-evaluation workflow for analyzing multi-agent systems at scale using a simulated electric vehicle order pipeline. It demonstrates how to look past individual responses and evaluate systemic behaviors such as orchestration, routing, and tool choices by combining lower-level execution checks (via Promptfoo) into population-level trace analyses to discover and...

https://developers.openai.com/cookbook/examples/partners/macro_evals_for_agentic_systems/macro_evals_for_agentic_systems
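
To make the idea of population-level analysis concrete, here is a minimal sketch of rolling per-trace check results up into per-route pass rates and tool-usage counts. The record fields (`trace_id`, `route`, `tools`, `checks_passed`) and the sample values are hypothetical; they are not Promptfoo's output schema or the cookbook's data, just an assumed shape for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical trace records: one per simulated order run. The field names
# are assumptions for illustration, not a real Promptfoo or cookbook schema.
traces = [
    {"trace_id": "t1", "route": "inventory", "tools": ["check_stock"], "checks_passed": True},
    {"trace_id": "t2", "route": "inventory", "tools": ["check_stock", "reserve_vehicle"], "checks_passed": False},
    {"trace_id": "t3", "route": "billing", "tools": ["charge_card"], "checks_passed": True},
    {"trace_id": "t4", "route": "billing", "tools": ["charge_card"], "checks_passed": True},
]


def summarize(records):
    """Aggregate per-trace results into per-route pass rates and tool counts."""
    by_route = defaultdict(lambda: {"total": 0, "passed": 0})
    tool_counts = Counter()
    for record in records:
        stats = by_route[record["route"]]
        stats["total"] += 1
        stats["passed"] += int(record["checks_passed"])
        tool_counts.update(record["tools"])
    pass_rates = {
        route: stats["passed"] / stats["total"] for route, stats in by_route.items()
    }
    return pass_rates, tool_counts


pass_rates, tool_counts = summarize(traces)
for route, rate in sorted(pass_rates.items()):
    print(f"{route}: {rate:.0%} of traces passed their checks")
print("tool usage:", dict(tool_counts))
```

Grouping by route is only one possible cut; the same loop could key on agent, tool sequence, or any other attribute recorded per trace.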