Chapter 3 SARSA

Page 60, the first square bullet:

(s_0, a_{UP}): \text{ The agent moves out of the corridor, receives a reward of 0, and the} \newline \text{episode terminates, so } Q^*(s_0, a_{UP}) = 0.

contains a typo in the index of the state. It should read:

(s_1, a_{UP}): \text{ The agent moves out of the corridor, receives a reward of 0, and the} \newline \text{episode terminates, so } Q^*(s_1, a_{UP}) = 0.

Page 62, Figure 3.3, Episode 4, Time step 5, the target value calculation:

0 + 0.9 + 0.9 = 0.81

contains a typo: the second + should have been ×. It should read:

0 + 0.9 \times 0.9 = 0.81
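
As a quick check of the corrected arithmetic, the sketch below reads the expression as a SARSA target of the form r + γ Q(s', a'), with r = 0 and both the discount factor and the next Q-value equal to 0.9. That reading, and the helper name `sarsa_target`, are assumptions made for illustration; they are not stated in the erratum itself.

```python
def sarsa_target(reward, gamma, next_q, done=False):
    """Return r + gamma * Q(s', a'), or just r if the episode has ended.

    Hypothetical helper: the argument names are illustrative only.
    """
    if done:
        return reward
    return reward + gamma * next_q


# Assumed reading of the corrected line: r = 0, gamma = 0.9, Q(s', a') = 0.9.
target = sarsa_target(reward=0.0, gamma=0.9, next_q=0.9)
print(round(target, 2))
```

With addition in place of multiplication the same inputs would give 1.8, which is why the printed 0.81 only makes sense with the × operator.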

Algorithm 3.1 SARSA, line 13:

θ = θ − α∇θJ(θ)

contains a typo: the loss function should be denoted L, not J. It should read:

θ = θ − α∇θL(θ)
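
The corrected update is an ordinary gradient descent step on the loss L. A minimal sketch follows, assuming a one-parameter model and a squared-error loss against a fixed target; the loss, the target value 0.81, and all function names are hypothetical stand-ins, not the book's implementation.

```python
def loss(theta, q_target=0.81, feature=1.0):
    """Hypothetical loss L(theta): half squared error of a linear Q estimate."""
    return 0.5 * (q_target - theta * feature) ** 2


def grad_loss(theta, q_target=0.81, feature=1.0):
    """Analytic gradient of the loss above with respect to theta."""
    return -(q_target - theta * feature) * feature


def gradient_step(theta, alpha):
    """One update: theta = theta - alpha * grad L(theta)."""
    return theta - alpha * grad_loss(theta)


theta = 0.0
for _ in range(200):
    theta = gradient_step(theta, alpha=0.1)

print(round(theta, 4))
```

Because the step subtracts the gradient of L, each update lowers the loss for a small enough learning rate, and theta moves toward the assumed target.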

Last updated 2 years ago
