https://mpost.io/ms/researchers-replicated-openais-work-based-on-proximal-policy-optimisation-ppo-in-rlhf/