Update policy-evaluation.md

Closed Hendrik Weiß requested to merge henwe--fh-zwickau.de-main-patch-46518 into main
@@ -51,7 +51,7 @@ Each of these three tabular methods gives rise to an approximate value function
## Loss Function and Gradient
-Given training samples $(s_1,a_1,y_1),\ldots,(s_n,a_n,y_n)$ with inputs $(s_l,a_l)$ and targets $y_l$ we want to find the weights $w$ of $Q_w$ by gradient descent. The loss function for training is the usual mean squared error, but with additional weight factors $\mu_k$:
+Given training samples $(s_1,a_1,y_1),\ldots,(s_n,a_n,y_n)$ with inputs $(s_l,a_l)$ and targets $y_l$ we want to find the weights $w$ of $Q_w$ by gradient descent. The loss function for training is the usual mean squared error, but with additional weight factors $\mu_l$:
\begin{equation*}
L(w):=\sum_{l=1}^n\mu_l\,\bigl(Q_w(s_l,a_l)-y_l\bigr)^2
\end{equation*}
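
For concreteness, here is a minimal NumPy sketch of this weighted loss and its gradient, applied in one gradient-descent step. The linear parametrization $Q_w(s,a)=w\cdot\phi(s,a)$ and the feature map `phi` are illustrative assumptions, not part of the patched document.

```python
import numpy as np

# Hypothetical linear parametrization Q_w(s, a) = w . phi(s, a);
# the feature map phi is an assumption for illustration only.
def phi(s, a):
    return np.array([s, a, s * a, 1.0])

def weighted_mse_loss(w, samples, mu):
    """L(w) = sum_l mu_l * (Q_w(s_l, a_l) - y_l)^2"""
    loss = 0.0
    for (s, a, y), m in zip(samples, mu):
        loss += m * (np.dot(w, phi(s, a)) - y) ** 2
    return loss

def weighted_mse_grad(w, samples, mu):
    """grad L(w) = sum_l 2 * mu_l * (Q_w(s_l, a_l) - y_l) * grad_w Q_w(s_l, a_l);
    for the linear model, grad_w Q_w(s_l, a_l) = phi(s_l, a_l)."""
    grad = np.zeros_like(w)
    for (s, a, y), m in zip(samples, mu):
        grad += 2.0 * m * (np.dot(w, phi(s, a)) - y) * phi(s, a)
    return grad

# One gradient-descent step on toy data with a small learning rate.
samples = [(0.0, 1.0, 0.5), (1.0, 0.0, -0.2)]  # (s_l, a_l, y_l)
mu = [1.0, 0.5]                                # weight factors mu_l
w = np.zeros(4)
w -= 0.01 * weighted_mse_grad(w, samples, mu)
```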