Update policy-evaluation.md
#Loss Function and Gradient: inconsistent usage of the training sample. The role of the weight factors is not clear: if n is the sample size, there is no need for weight factors. If the weight factors are the frequencies of state-action-target triples, then we need the frequency table of the sample including the target value (triples, not pairs), and n is not the sample size but the number of distinct triples.
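
To make the two readings concrete, here is a minimal sketch. It assumes a squared-error loss, and the sample, the predictions `q`, and both function names are made up for illustration; none of them are taken from policy-evaluation.md. Summing over all n samples without weights gives the same value as summing over the distinct triples weighted by their frequencies.

```python
from collections import Counter

# Hypothetical training sample of (state, action, target) triples.
sample = [
    ("s0", "a0", 1.0),
    ("s0", "a0", 1.0),  # same triple observed twice
    ("s0", "a0", 2.0),  # same state-action pair, different target
    ("s1", "a1", 0.5),
]

# Hypothetical fixed predictions q(s, a), standing in for the model output.
q = {("s0", "a0"): 1.5, ("s1", "a1"): 0.0}


def loss_per_sample(sample, q):
    """Reading 1: n is the sample size, so no weight factors are needed."""
    return sum((q[(s, a)] - y) ** 2 for s, a, y in sample)


def loss_per_triple(sample, q):
    """Reading 2: sum over distinct triples, weighted by their frequency."""
    freq = Counter(sample)  # frequency table keyed by the full triple
    return sum(w * (q[(s, a)] - y) ** 2 for (s, a, y), w in freq.items())


print(loss_per_sample(sample, q), loss_per_triple(sample, q))
print(len(sample), len(Counter(sample)))  # sample size vs. distinct triples
```

Keying the frequency table by state-action pairs alone would merge the targets 1.0 and 2.0 for `("s0", "a0")`, which is why the table has to include the target value.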