Chapter 7 PPO

Page 176, Section 7.2 Proximal Policy Optimization (PPO)

Thanks to Jérémie Clair Coté for suggesting we clarify this and for the discussion, and to HyeAnn Lee for the correction.

Page 176, the last sentence of the 1st paragraph and the first two sentences of the 2nd paragraph read:

Page 178, Section 7.3 PPO Algorithm, Algorithm 7.2

Thanks to Jérémie Clair Coté for this correction.

Algorithm 7.2 PPO with clipping, line 35:

contains a typo. The second term on the right-hand side of the equation should be subtracted, not added, since the loss is being minimized. It should read:

Note that the actor parameter update on line 33 of Algorithm 7.2 is correct as printed, because the policy "loss" for PPO is formulated as an objective to be maximized (see Equation 7.39 on page 177), so its gradient term is added rather than subtracted.
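The sign convention behind both updates can be checked on a toy problem. The sketch below is only an illustration and is not taken from the book: the scalar parameters, the quadratic `value_loss` and `policy_objective`, and the learning rate are all hypothetical stand-ins. It shows that subtracting the gradient lowers a loss that is being minimized, while adding the gradient raises an objective that is being maximized.

```python
# Toy illustration only (hypothetical functions, not the book's networks).

def value_loss(theta_c, target=1.0):
    # Stand-in for a critic loss: minimized at theta_c == target.
    return 0.5 * (theta_c - target) ** 2

def value_loss_grad(theta_c, target=1.0):
    return theta_c - target

def policy_objective(theta_a, best=2.0):
    # Stand-in for an actor objective: maximized at theta_a == best.
    return -0.5 * (theta_a - best) ** 2

def policy_objective_grad(theta_a, best=2.0):
    return -(theta_a - best)

lr = 0.1          # assumed step size
theta_c = 0.0     # toy critic parameter
theta_a = 0.0     # toy actor parameter

# Critic: the loss is minimized, so the gradient term is subtracted.
critic_descent = theta_c - lr * value_loss_grad(theta_c)
# Adding the gradient instead (the typo) moves the parameter the wrong way.
critic_ascent = theta_c + lr * value_loss_grad(theta_c)

# Actor: the objective is maximized, so the gradient term is added.
actor_ascent = theta_a + lr * policy_objective_grad(theta_a)

loss_before = value_loss(theta_c)
loss_after_descent = value_loss(critic_descent)
loss_after_ascent = value_loss(critic_ascent)
objective_before = policy_objective(theta_a)
objective_after = policy_objective(actor_ascent)

print(loss_after_descent < loss_before < loss_after_ascent)
print(objective_after > objective_before)
```

In this toy setting the subtracted update reduces the value loss, the added update increases it, and the added update on the actor side increases the objective, which matches the reasoning in the correction.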
