Update policy-evaluation.md
@@ -51,7 +51,7 @@ Each of these three tabular methods gives rise to an approximate value function
#Loss Function and Gradient: inconsistent usage of the training sample. The role of the weight factors is not clear: if n is the sample size, there is no need for weight factors. If the weight factors are the frequencies of the state-action-target triples, then we need the frequency table of the sample including the target value (triples, not pairs), and n is not the sample size but the number of distinct triples.
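
To illustrate the two readings of n, here is a minimal sketch. It assumes a squared-error loss and weights equal to the relative frequency of each triple. The sample, the table `q`, and both function names are hypothetical and not taken from policy-evaluation.md.

```python
from collections import Counter

# Hypothetical sample of (state, action, target) triples; values are illustrative only.
sample = [
    ("s0", "a0", 1.0),
    ("s0", "a0", 1.0),
    ("s0", "a0", 2.0),
    ("s1", "a1", 0.0),
]

# Hypothetical tabular estimate q(s, a).
q = {("s0", "a0"): 1.5, ("s1", "a1"): 0.5}


def loss_over_sample(sample, q):
    """Reading 1: n is the sample size, so every term has the same weight 1/n."""
    n = len(sample)
    return sum((q[(s, a)] - y) ** 2 for s, a, y in sample) / n


def loss_over_triples(sample, q):
    """Reading 2: sum over distinct triples, weighted by relative frequency."""
    counts = Counter(sample)  # frequency table keyed by the full triple, not the pair
    total = len(sample)
    return sum((c / total) * (q[(s, a)] - y) ** 2 for (s, a, y), c in counts.items())


unweighted = loss_over_sample(sample, q)
weighted = loss_over_triples(sample, q)
num_distinct_triples = len(Counter(sample))
```

Under these assumptions both functions return the same value, but the first sums over 4 sample points and the second over 3 distinct triples, so the text should state which of the two n denotes.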