Hendrik Weiß · 88197946
--- a/content_md/reinforcement/approximate-methods/policy-evaluation.md

+++ b/content_md/reinforcement/approximate-methods/policy-evaluation.md

 @@ -51,7 +51,7 @@ Each of these three tabular methods gives rise to an approximate value function

 ## Loss Function and Gradient

-Given training samples $(s_1,a_1,y_1),\ldots,(s_n,a_n,y_n)$ with inputs $(s_l,a_l)$ and targets $y_l$ we want to find the weigths $w$ of $Q_w$ by gradient descent. The loss function for training is usual mean squared error, but with additional weight factors $\mu_k$:
+Given training samples $(s_1,a_1,y_1),\ldots,(s_n,a_n,y_n)$ with inputs $(s_l,a_l)$ and targets $y_l$ we want to find the weights $w$ of $Q_w$ by gradient descent. The loss function for training is the usual mean squared error, but with additional weight factors $\mu_l$:
 \begin{equation*}
 L(w):=\sum_{l=1}^n\mu_l\,\bigl(Q_w(s_l,a_l)-y_l\bigr)^2
 \end{equation*}