Issue on page /content/exercises/reinforcement/markov.html
Task: Are there policies that maximize return for $\gamma=1$, but not for $\gamma<1$?
Answer in solution: yes, all policies that visit 5 regularly, but not via the shortest path. Example: 1→2→3→6→9→8→7→4→5
My answer: no. $\gamma=1$ is impossible for stationary continuous problems with positive returns: if a policy reaches state 5 from all other states (in some way), then the return is infinite, and infinity is not a maximum.
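
To illustrate the argument, here is a minimal sketch. It assumes (this is not stated in the task) that a reward of 1 is collected each time state 5 is reached and that a policy returns to state 5 every `cycle_length` steps. The name `truncated_return` and its parameters are made up for this illustration.

```python
def truncated_return(cycle_length, gamma, reward=1.0, horizon=1000):
    """Discounted return over the first `horizon` steps.

    Assumed reward model (hypothetical): `reward` is received at every
    step t that is a multiple of `cycle_length`, i.e. on each visit to state 5.
    """
    total = 0.0
    for t in range(1, horizon + 1):
        if t % cycle_length == 0:
            total += (gamma ** t) * reward
    return total


# gamma < 1: the sum converges, and the shorter cycle is strictly better.
short_discounted = truncated_return(cycle_length=2, gamma=0.9)
long_discounted = truncated_return(cycle_length=8, gamma=0.9)

# gamma = 1: the truncated sum keeps growing with the horizon for every
# cycle length, so no finite limit exists for either policy.
short_h1 = truncated_return(cycle_length=2, gamma=1.0, horizon=1000)
short_h2 = truncated_return(cycle_length=2, gamma=1.0, horizon=2000)
long_h1 = truncated_return(cycle_length=8, gamma=1.0, horizon=1000)
long_h2 = truncated_return(cycle_length=8, gamma=1.0, horizon=2000)
```

Under these assumptions, doubling the horizon doubles the undiscounted return for both the short and the long cycle, so with $\gamma=1$ both returns diverge and neither policy attains a maximum. With $\gamma<1$ the returns are finite and the shortest path wins.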