https://mpost.io/ja/openai-new-process-supervised-reward-modeling-improves-ai-reasoning/