Chapter 6 Advantage Actor-Critic

Page 142, Section 6.3 A2C Algorithm, Algorithm 6.1

Thanks to Jérémie Clair Coté for this correction.

Algorithm 6.1 A2C, lines 18 and 20:

$$\theta_C = \theta_C + \alpha_C \nabla_{\theta_C} L_{val}(\theta_C)$$
$$\theta_A = \theta_A + \alpha_A \nabla_{\theta_A} L_{pol}(\theta_A)$$

each contain a typo. The second term on the right-hand side of each equation should be subtracted, not added, because the loss is being minimized. They should read:

$$\theta_C = \theta_C - \alpha_C \nabla_{\theta_C} L_{val}(\theta_C)$$
$$\theta_A = \theta_A - \alpha_A \nabla_{\theta_A} L_{pol}(\theta_A)$$
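
To see why the sign matters, here is a minimal sketch that applies one update step in each direction. It uses a toy quadratic loss, which is an assumption made purely for illustration and is not the A2C value or policy loss from the book. The names `loss`, `grad`, and `step` are hypothetical helpers.

```python
def loss(theta):
    # Toy quadratic loss with its minimum at theta = 3 (illustrative assumption).
    return (theta - 3.0) ** 2


def grad(theta):
    # Analytic gradient of the toy loss with respect to theta.
    return 2.0 * (theta - 3.0)


def step(theta, alpha, sign):
    # sign = -1.0: subtract the gradient term (the corrected update).
    # sign = +1.0: add the gradient term (the update as originally printed).
    return theta + sign * alpha * grad(theta)


theta0 = 0.0   # arbitrary starting parameter
alpha = 0.1    # arbitrary learning rate

loss_start = loss(theta0)
loss_descent = loss(step(theta0, alpha, -1.0))
loss_ascent = loss(step(theta0, alpha, +1.0))

print("start:", loss_start)
print("after subtracting the gradient term:", loss_descent)
print("after adding the gradient term:", loss_ascent)
```

On this toy problem, subtracting the gradient term lowers the loss, while adding it moves the parameter away from the minimum and raises the loss.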
