Foundations of Deep Reinforcement Learning

Errata

Chapter 3 SARSA


Page 60, Section 3.2.1 Intuition for Temporal Difference Learning

Page 60, the first square bullet:

$(s_0, a_{UP})$: The agent moves out of the corridor, receives a reward of 0, and the episode terminates, so $Q^*(s_0, a_{UP}) = 0$.

contains a typo in the index of the state. It should read:

$(s_1, a_{UP})$: The agent moves out of the corridor, receives a reward of 0, and the episode terminates, so $Q^*(s_1, a_{UP}) = 0$.
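
For context, a minimal worked check of why this value is 0, assuming the standard Bellman optimality relation with discount factor $\gamma$: the transition is terminal, so there is no future value to bootstrap from.

```latex
% Worked check (sketch), assuming Q^*(s, a) = r + gamma * max_a' Q^*(s', a')
% in the deterministic corridor, with reward r = 0 and a terminal next state s'.
\begin{aligned}
Q^*(s_1, a_{UP}) &= r + \gamma \max_{a'} Q^*(s', a') \\
                 &= 0 + \gamma \cdot 0 \\
                 &= 0
\end{aligned}
```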

Page 62, Section 3.2.1 Intuition for Temporal Difference Learning

Page 62, Figure 3.3, Episode 4, Time step 5, the target value calculation:

$$0 + 0.9 + 0.9 = 0.81$$

contains a typo; the second $+$ should have been $\times$. It should read:

$$0 + 0.9 \times 0.9 = 0.81$$
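
For readers checking the arithmetic, here is a short worked version of the corrected target, assuming the TD target form $r + \gamma Q(s', a')$ with $\gamma = 0.9$, reward $r = 0$, and the current estimate $Q(s', a') = 0.9$ shown in the figure:

```latex
% Worked check (sketch), assuming the TD target form r + gamma * Q(s', a')
% with gamma = 0.9, r = 0, and Q(s', a') = 0.9 as in Figure 3.3.
\begin{aligned}
Q_{\text{tar}}(s, a) &= r + \gamma \, Q(s', a') \\
                     &= 0 + 0.9 \times 0.9 \\
                     &= 0.81
\end{aligned}
```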

Page 67, Section 3.4 SARSA Algorithm, Algorithm 3.1

Algorithm 3.1 SARSA, line 13:

$$\theta = \theta - \alpha \nabla_\theta J(\theta)$$

contains a typo; the loss function $J$ should have been $L$. It should read:

$$\theta = \theta - \alpha \nabla_\theta L(\theta)$$
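
For context, line 13 is the standard gradient step on the loss $L(\theta)$. Below is a minimal, hypothetical PyTorch-style sketch of that step; it is not the book's SLM Lab code, and the network sizes and variable names are illustrative only.

```python
# Hypothetical sketch of Algorithm 3.1, line 13: theta <- theta - alpha * grad_theta L(theta),
# where L(theta) is the MSE between Q(s, a) and the SARSA target r + gamma * Q(s', a').
import torch
import torch.nn as nn

gamma, alpha = 0.9, 0.01
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # illustrative sizes
optimizer = torch.optim.SGD(q_net.parameters(), lr=alpha)             # plain SGD: theta -= alpha * grad

def sarsa_update(s, a, r, s_next, a_next, done):
    """One update step; s, s_next: (batch, 4) floats; a, a_next: (batch,) longs; r, done: (batch,) floats."""
    q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)            # Q(s, a)
    with torch.no_grad():                                             # the target is treated as a constant
        q_next = q_net(s_next).gather(1, a_next.unsqueeze(1)).squeeze(1)
        q_tar = r + gamma * (1.0 - done) * q_next                     # r + gamma * Q(s', a'), no bootstrap if terminal
    loss = nn.functional.mse_loss(q_pred, q_tar)                      # L(theta)
    optimizer.zero_grad()
    loss.backward()                                                   # grad_theta L(theta)
    optimizer.step()                                                  # theta <- theta - alpha * grad
    return loss.item()
```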