
# Chapter 3 SARSA

### Page 60, Section 3.2.1 Intuition for Temporal Difference Learning

Page 60, the first square bullet:

$$
(s_0, a_{UP}): \text{ The agent moves out of the corridor, receives a reward of 0, and the} \newline \text{episode terminates, so } Q^*(s_0, a_{UP}) = 0.
$$

contains a typo in the index of the state. It should read:

$$
(s_1, a_{UP}): \text{ The agent moves out of the corridor, receives a reward of 0, and the} \newline \text{episode terminates, so } Q^*(s_1, a_{UP}) = 0.
$$

### Page 62, Section 3.2.1 Intuition for Temporal Difference Learning

Page 62, Figure 3.3, Episode 4, Time step 5, the target value calculation:

$$
0 + 0.9 + 0.9 = 0.81
$$

contains a typo: the second + should be a multiplication sign (×). It should read:

$$
0 + 0.9 \times 0.9 =0.81
$$
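The corrected arithmetic is the one-step SARSA target: the reward plus the discounted Q-value of the next state-action pair. Below is a minimal Python sketch. It assumes that the three terms in Figure 3.3 are the reward r = 0, the discount γ = 0.9, and the next Q-value Q(s', a') = 0.9. The function name `td_target` and its `done` flag are illustrative and do not come from the book. On a terminal transition the bootstrap term is dropped, which is also why Q\*(s₁, a_UP) = 0 in the Page 60 entry.

```python
def td_target(reward: float, gamma: float, q_next: float, done: bool = False) -> float:
    """One-step SARSA target: y = r + gamma * Q(s', a').

    If the transition is terminal there is no next state, so the
    bootstrap term gamma * Q(s', a') is dropped.
    """
    if done:
        return reward
    return reward + gamma * q_next


# Figure 3.3, Episode 4, time step 5 (assumed: r = 0, gamma = 0.9, Q(s', a') = 0.9)
print(round(td_target(0.0, 0.9, 0.9), 2))  # 0.81

# Terminal transition with zero reward, e.g. moving out of the corridor
print(td_target(0.0, 0.9, 0.9, done=True))  # 0.0
```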

### Page 67, Section 3.4 SARSA Algorithm, Algorithm 3.1

Algorithm 3.1 SARSA, line 13:

$$
\theta = \theta - \alpha \nabla_{\theta} J(\theta)
$$

contains a typo: the loss function *J* should be *L*. It should read:

$$
\theta = \theta - \alpha \nabla_{\theta} L(\theta)
$$
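Line 13 takes a gradient-descent step on the loss *L*. The sketch below is a minimal illustration that rests on several assumptions:

- The Q-function is linear, Q(s, a; θ) = θ · φ(s, a), over hypothetical feature vectors φ.
- The loss is the mean squared error L(θ) = (1/N) Σ (yᵢ − Q(sᵢ, aᵢ; θ))².
- The SARSA targets yᵢ are held fixed, so no gradient flows through them.

The book's deep SARSA agent uses a neural network. The linear model here is only a stand-in that lets the gradient be written out by hand. The helper names `q_value`, `mse_loss`, and `sgd_step` are hypothetical.

```python
from typing import List, Sequence, Tuple

# A batch item is (features phi(s, a), fixed SARSA target y).
Batch = Sequence[Tuple[Sequence[float], float]]


def q_value(theta: Sequence[float], phi: Sequence[float]) -> float:
    """Linear Q-function: Q(s, a; theta) = theta . phi(s, a)."""
    return sum(t * f for t, f in zip(theta, phi))


def mse_loss(theta: Sequence[float], batch: Batch) -> float:
    """L(theta) = (1/N) * sum_i (y_i - Q(s_i, a_i; theta))^2."""
    return sum((y - q_value(theta, phi)) ** 2 for phi, y in batch) / len(batch)


def sgd_step(theta: Sequence[float], batch: Batch, alpha: float) -> List[float]:
    """Apply theta = theta - alpha * grad_theta L(theta).

    The targets y are treated as constants, so
    dL/dtheta_j = -(2/N) * sum_i (y_i - Q_i) * phi_ij.
    """
    n = len(batch)
    grad = [0.0] * len(theta)
    for phi, y in batch:
        error = y - q_value(theta, phi)
        for j, f in enumerate(phi):
            grad[j] += -2.0 * error * f / n
    # Subtract the gradient: descend on the loss L.
    return [t - alpha * g for t, g in zip(theta, grad)]


theta = [0.0, 0.0]
batch = [([1.0, 0.0], 0.81), ([0.0, 1.0], 1.0)]  # illustrative data only
new_theta = sgd_step(theta, batch, alpha=0.1)
print(mse_loss(theta, batch) > mse_loss(new_theta, batch))  # True
```

The minus sign is what makes this a minimization step. This is likely why the symbol matters: in the book's REINFORCE chapter, *J* denotes an objective that is maximized by gradient ascent, whereas SARSA minimizes a loss.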
