https://mpost.io/tr/researchers-replicated-openais-work-based-on-proximal-policy-optimisation-ppo-in-rlhf/