
Update stateless.md

Hendrik Weiß requested to merge henwe--fh-zwickau.de-main-patch-12319 into main

In 111 and 126: an action that has not been chosen yet should have an action value smaller than every possible reward value, because we maximize the return. Sometimes it is nice to give negative rewards, and then an initial value of zero is too high. I think -\infty is better than zero, but then there is a problem with the update of Q_t at the beginning (it occurs when n=0).
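
To illustrate the n=0 problem, here is a minimal sketch. It assumes that stateless.md uses the usual incremental sample-average update Q <- Q + (R - Q) / (n + 1); the function names `incremental_update` and `guarded_update` are hypothetical and not taken from the file. With Q = -\infty the first update computes -\infty + \infty, which is NaN in floating-point arithmetic. One possible workaround is to let the first observed reward replace the placeholder.

```python
import math


def incremental_update(q, n, reward):
    """Assumed update rule: Q <- Q + (R - Q) / (n + 1), n = pulls so far."""
    return q + (reward - q) / (n + 1)


def guarded_update(q, n, reward):
    """Same rule, but the first reward overwrites the -inf placeholder."""
    if n == 0:
        return reward
    return q + (reward - q) / (n + 1)


q_init = -math.inf  # value of an action that has never been chosen

# Naive update at n = 0: (-inf) + (reward - (-inf)) / 1 = -inf + inf = nan
naive = incremental_update(q_init, 0, -1.0)

# Guarded update at n = 0 returns the reward itself
guarded = guarded_update(q_init, 0, -1.0)

# Later updates behave like the plain sample average
second = guarded_update(guarded, 1, -3.0)

print(naive, guarded, second)
```

The guard only changes the very first update of each action, so all later estimates stay plain sample averages.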
