Issue on page /content/reinforcement/monte-carlo.html
in \eps-soft-policies:
the \eps-greedy policy shown after the algorithm isn't \eps-greedy in the sense of the definition (see basic notations), because the probability of greedy behaviour is not 1-\eps. Better:
\begin{equation*}
\pi(a,s)=\begin{cases}
1-\varepsilon,&\text{if }a=a_{\mathrm{g}}(s),\\
\frac{\varepsilon}{|\mathcal{A}(s)|-1},&\text{else}.
\end{cases}
\end{equation*}
or, in case of more than one Q-maximizing action: split 1-\eps among the Q-maximizing actions and \eps among the remaining actions
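
A rough sketch of what the suggestion could look like in code, in case it helps. The function name `eps_greedy_probs` and the `tol` parameter are made up for illustration and are not taken from the page. It assumes the shares are split evenly, and that if every action is Q-maximizing (so there is no remainder to give \eps to) the whole probability mass goes to the greedy actions.

```python
def eps_greedy_probs(q_values, eps, tol=1e-12):
    """Action probabilities for one state under the suggested policy.

    q_values: Q(s, a) for each action a available in the state.
    eps:      exploration probability, 0 <= eps <= 1.
    tol:      hypothetical tolerance for treating Q-values as tied.
    """
    if not q_values:
        raise ValueError("q_values must contain at least one action")
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")

    q_max = max(q_values)
    # Mark every action whose Q-value ties with the maximum as greedy.
    is_greedy = [abs(q - q_max) <= tol for q in q_values]
    n_greedy = sum(is_greedy)
    n_other = len(q_values) - n_greedy

    if n_other == 0:
        # Assumption: with no non-greedy action, eps cannot be assigned,
        # so the greedy actions share the full probability mass.
        return [1.0 / n_greedy] * n_greedy

    # Greedy actions share 1 - eps, all remaining actions share eps.
    return [
        (1.0 - eps) / n_greedy if greedy else eps / n_other
        for greedy in is_greedy
    ]
```

With a single Q-maximizing action this reduces to the case distinction above: the greedy action gets 1-\eps and each other action gets \eps/(|A(s)|-1).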