https://mpost.io/fi/researchers-replicated-openais-work-based-on-proximal-policy-optimisation-ppo-in-rlhf/