# Chapter 7 PPO

### Page 176, Section 7.2 Proximal Policy Optimization (PPO)

*Thanks to Jérémie Clair Coté for suggesting we clarify this and for the discussion, and to HyeAnn Lee for the correction.*

Page 176, the last sentence of the 1st paragraph and the first two sentences of the 2nd paragraph read:

$$
\text{To see why this is the case, consider when } r\_t(\theta)A\_t \text{ would assume large positive} \newline \text{values, which is either } A\_t > 0, r\_t(\theta) > 0, \text{ or } A\_t < 0, r\_t(\theta) < 0.
$$

$$
\text{When } A\_t > 0, r\_t(\theta) > 0, \text{ if } r\_t(\theta) \text{ becomes much larger than 1, the upper clip term} \newline 1 - \epsilon \text{ applies to upper-bound } r\_t(\theta) \le 1 + \epsilon, \text{ hence } J^{CLIP} \le (1 + \epsilon)A\_t. \text{ On the} \newline \text{ other hand, when } A\_t < 0, r\_t(\theta) < 0, \text{ if } r\_t(\theta) \text{ becomes much smaller than 1, the} \newline \text{lower clip term } 1 - \epsilon \text{ applies to again upper-bound } J^{CLIP} \le (1 - \epsilon)A\_t.
$$

This is confusing for two reasons: **(1)** $$r\_t(\theta)$$ cannot be < 0 because it is a ratio of two probabilities, and **(2)** the upper clip term is mistyped as $$1 - \epsilon$$ instead of $$1 + \epsilon$$. The sentences should be replaced with:

$$
\text{To see why this is the case, let's consider when } A\_t \text{ is either } \gt 0 \text{ or } \lt 0. \text{ Note} \newline \text{ that } r\_t(\theta) \text{ is always } \ge 0 \text{ because it is a ratio of two probabilities}.
$$

$$
\text{When } A\_t > 0, \text{ if } r\_t(\theta) > 1 + \epsilon, \text{ the upper clip term } 1 + \epsilon \text{ applies to upper-bound } \newline r\_t(\theta) \le 1 + \epsilon, \text{ hence } J^{CLIP} \le (1 + \epsilon)A\_t. \text{ On the other hand, when } A\_t < 0, \text{ if } \newline r\_t(\theta) < 1 - \epsilon, \text{ the lower clip term } 1 - \epsilon \text{ applies to again upper-bound } \newline J^{CLIP} \le (1 - \epsilon)A\_t.
$$
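The two bounds in the corrected text can be checked numerically. The sketch below assumes the standard per-timestep form of the clipped objective, min(r·A, clip(r, 1 − ε, 1 + ε)·A); the function names `clip` and `clipped_surrogate` and the value ε = 0.2 are illustrative choices, not taken from the book.

```python
def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-timestep clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    Assumed standard form of the PPO clipped objective (hypothetical helper).
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return min(unclipped, clipped)


eps = 0.2
tol = 1e-12  # tolerance for floating-point comparison

# Case A_t > 0: the objective never exceeds (1 + eps) * A_t,
# no matter how large the probability ratio becomes.
advantage = 2.0
for ratio in (0.5, 1.0, 1.5, 3.0):
    assert clipped_surrogate(ratio, advantage, eps) <= (1.0 + eps) * advantage + tol

# Case A_t < 0: the objective never exceeds (1 - eps) * A_t,
# no matter how small the (non-negative) ratio becomes.
advantage = -2.0
for ratio in (0.1, 0.5, 1.0, 3.0):
    assert clipped_surrogate(ratio, advantage, eps) <= (1.0 - eps) * advantage + tol
```

Note that the ratios tested are all non-negative, consistent with the observation that a ratio of two probabilities cannot be less than 0.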

### Page 178, Section 7.3 PPO Algorithm, Algorithm 7.2

*Thanks to Jérémie Clair Coté for this correction.*

Algorithm 7.2 PPO with clipping, line 35:

$$
\theta\_C = \theta\_C + \alpha\_C \nabla\_{\theta\_C} L\_{val}(\theta\_C)
$$

contains a typo. The second term on the right-hand side of the equation should be subtracted, not added, because the loss is being minimized. It should read:

$$
\theta\_C = \theta\_C - \alpha\_C \nabla\_{\theta\_C} L\_{val}(\theta\_C)
$$

Note that the actor parameter update on line 33 of Algorithm 7.2 is correct because the policy "loss" for PPO is formulated as an objective to be maximized (see Equation 7.39 on page 177).
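The effect of the sign can be seen on a toy problem. The sketch below is not the book's critic: it assumes a single scalar parameter and a hypothetical squared-error loss, with the names `value_loss`, `value_loss_grad`, `descent_step`, and `ascent_step` chosen for illustration only.

```python
def value_loss(theta_c, target):
    """Toy squared-error loss standing in for L_val (illustrative assumption)."""
    return (theta_c - target) ** 2


def value_loss_grad(theta_c, target):
    """Analytic gradient of the toy loss with respect to theta_c."""
    return 2.0 * (theta_c - target)


def descent_step(theta_c, target, lr):
    """Corrected update: subtract the gradient to minimize the loss."""
    return theta_c - lr * value_loss_grad(theta_c, target)


def ascent_step(theta_c, target, lr):
    """Update with a plus sign: moves uphill, which suits an objective being maximized."""
    return theta_c + lr * value_loss_grad(theta_c, target)


theta_c, target, lr = 0.0, 1.0, 0.1

loss_before = value_loss(theta_c, target)
loss_descent = value_loss(descent_step(theta_c, target, lr), target)
loss_ascent = value_loss(ascent_step(theta_c, target, lr), target)

# Subtracting the gradient lowers the loss; adding it raises the loss.
assert loss_descent < loss_before < loss_ascent
```

The same plus-sign step that is wrong for a loss is the appropriate one for a quantity being maximized, which is why the two updates in the algorithm carry opposite signs.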

