Issue on page /content/reinforcement/dynamic-programming.html
In policy iteration:
I don't think the mapping in the Bellman fixed-point iteration v = Bv is a matrix-vector multiplication. The Bellman equation is a linear system, but because we have to add r, the system is not homogeneous, so the mapping v -> Bv is affine rather than linear.
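
To illustrate the point, here is a small sketch. It assumes the operator has the usual policy-evaluation form Bv = r + gamma * P v; the names `P`, `r`, `gamma` and all the numbers are made up for this example and are not taken from the page.

```python
# Assumed (not taken from the page): a 2-state MDP under a fixed policy,
# with transition matrix P, reward vector r and discount factor gamma.
P = [[0.5, 0.5],
     [0.0, 1.0]]
r = [1.0, 2.0]
gamma = 0.5


def matvec(M, v):
    """Plain matrix-vector product M v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def bellman(v):
    """Assumed policy-evaluation operator B v = r + gamma * P v."""
    return [r_i + gamma * pv_i for r_i, pv_i in zip(r, matvec(P, v))]


zero = [0.0, 0.0]
v = [1.0, 3.0]
two_v = [2.0 * x for x in v]

# A matrix-vector product maps the zero vector to zero; B maps it to r.
b_zero = bellman(zero)

# A linear map would satisfy B(2v) = 2 B(v); here the two sides differ by r.
b_two_v = bellman(two_v)
two_b_v = [2.0 * x for x in bellman(v)]
gap = [a - b for a, b in zip(two_b_v, b_two_v)]

print("B(0)          =", b_zero)
print("B(2v)         =", b_two_v)
print("2 B(v)        =", two_b_v)
print("2 B(v) - B(2v) =", gap)
```

If the operator really has this form, B(0) equals r instead of 0 and B(2v) differs from 2 B(v) by exactly r, so B cannot be written as a single matrix acting on v.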