https://akillness.github.io/posts/evaluating-llm/
How to evaluate LLM Model? - Fodev Jeong