Foundations of Deep Reinforcement Learning
Errata
Chapter 6 Advantage Actor-Critic

Page 142, Section 6.3 A2C Algorithm, Algorithm 6.1

Thanks to Jérémie Clair Coté for this correction.

Algorithm 6.1 A2C, lines 18 and 20:

$$\theta_C = \theta_C + \alpha_C \nabla_{\theta_C} L_{val}(\theta_C)$$
$$\theta_A = \theta_A + \alpha_A \nabla_{\theta_A} L_{pol}(\theta_A)$$

each contain a typo. The second term on the right-hand side of each equation should be subtracted, not added, since the loss is being minimized. They should read:

$$\theta_C = \theta_C - \alpha_C \nabla_{\theta_C} L_{val}(\theta_C)$$
$$\theta_A = \theta_A - \alpha_A \nabla_{\theta_A} L_{pol}(\theta_A)$$
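To see why the sign matters, here is a minimal sketch (not from the book) of gradient descent on a toy quadratic loss: subtracting the gradient moves the parameter toward the minimizer, exactly as the corrected updates for $\theta_C$ and $\theta_A$ do. The loss $L(\theta) = (\theta - 3)^2$ and the learning rate are illustrative choices.

```python
# Toy example: minimize L(theta) = (theta - 3)^2 by gradient descent.
# The corrected A2C updates follow the same pattern:
#   theta <- theta - alpha * grad(L)
def grad(theta):
    # dL/dtheta for L(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = 0.0   # initial parameter
alpha = 0.1   # learning rate

for _ in range(100):
    theta = theta - alpha * grad(theta)  # subtract: descend the loss

print(round(theta, 4))  # converges to the minimizer, theta = 3.0
```

With a `+` instead of a `-` in the update line, each step would instead climb the loss surface and `theta` would diverge, which is precisely the error the erratum corrects.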
