https://mpost.io/uk/researchers-replicated-openais-work-based-on-proximal-policy-optimisation-ppo-in-rlhf/