Chapter 6 Advantage Actor-Critic

Page 142, Section 6.3 A2C Algorithm, Algorithm 6.1

Thanks to Jérémie Clair Coté for this correction.

Algorithm 6.1 A2C, lines 18 and 20:

\theta_C = \theta_C + \alpha_C \nabla_{\theta_C} L_{val}(\theta_C)

\theta_A = \theta_A + \alpha_A \nabla_{\theta_A} L_{pol}(\theta_A)

each contain a typo. The second term on the right-hand side of each equation should be subtracted, not added, because the loss is being minimized: gradient descent moves the parameters against the gradient of the loss. They should read:

\theta_C = \theta_C - \alpha_C \nabla_{\theta_C} L_{val}(\theta_C)

\theta_A = \theta_A - \alpha_A \nabla_{\theta_A} L_{pol}(\theta_A)
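
As a quick sanity check on the sign, the sketch below applies both versions of the update to a toy one-dimensional loss. The quadratic loss, the starting point, and the learning rate are illustrative assumptions, not taken from Algorithm 6.1; the same sign argument applies to both L_{val} and L_{pol}.

```python
# Toy check of the update sign. The quadratic loss below is a hypothetical
# stand-in for L_val or L_pol; it is not the A2C loss itself.

def loss(theta):
    # Minimum at theta = 3.0
    return (theta - 3.0) ** 2

def grad(theta):
    # Analytic gradient of the toy loss
    return 2.0 * (theta - 3.0)

def step(theta, alpha, sign):
    # sign = -1.0 gives the corrected update (gradient descent);
    # sign = +1.0 gives the update as originally printed.
    return theta + sign * alpha * grad(theta)

theta0 = 0.0   # assumed starting parameter
alpha = 0.1    # assumed learning rate

theta_minus = step(theta0, alpha, -1.0)  # corrected: theta - alpha * grad
theta_plus = step(theta0, alpha, +1.0)   # as printed: theta + alpha * grad

print("initial loss:       ", loss(theta0))
print("after minus update: ", loss(theta_minus))
print("after plus update:  ", loss(theta_plus))
```

With the minus sign the loss falls below its initial value; with the plus sign it rises, which is why the printed update would maximize the loss rather than minimize it.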


