https://123dok.net/document/lq511m7y-stationary-markov-decision-processes-worst-approach-reinforcement-learning.html