Appendix B Example Environments

Page 347, B.1.1 CartPole-v0

There is an error in the termination condition. The pole falls over when it is greater than 12 degrees from vertical, not 15 degrees.
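As a minimal sketch of the angle part of this termination check, assuming the 12-degree threshold that the OpenAI Gym CartPole implementation uses: the function name `pole_has_fallen` and the constant name are hypothetical, chosen for illustration, and the cart-position and time-limit conditions are left out.

```python
import math

# Assumed threshold: 12 degrees from vertical, expressed in radians.
# The constant name is illustrative, not taken from the book.
POLE_ANGLE_LIMIT_RAD = 12 * 2 * math.pi / 360


def pole_has_fallen(pole_angle_rad: float) -> bool:
    """Return True when the pole angle exceeds the limit on either side.

    Hypothetical helper: it covers only the angle condition, not the
    cart leaving the track or the maximum episode length.
    """
    return abs(pole_angle_rad) > POLE_ANGLE_LIMIT_RAD
```

For example, `pole_has_fallen(math.radians(13))` evaluates to true, while an angle of 11 degrees in either direction does not end the episode.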

Page 347, B.1.3 LunarLander-v2

There is an error in the objective definition. "Land the lander without crashing fuel" should read "Land the lander without crashing and conserve fuel."

Page 348, B.1.3 LunarLander-v2

The reward definition is incomplete. The reward for the LunarLander-v2 environment is defined as follows: -100 for crashing, +100 for landing, -0.3 per time step when firing the main engine, +100 to +140 for landing between the flags (landing pad) with zero speed, and +10 for each leg ground contact.
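To make the arithmetic concrete, here is a rough tally of the fixed-size components listed above. It is only an illustration under simplifying assumptions, not the environment's actual reward code: the function and constant names are hypothetical, and the +100 to +140 component for reaching the landing pad is omitted because it depends on the lander's position and speed.

```python
# Values taken from the reward description above; names are illustrative.
CRASH_REWARD = -100.0
LANDING_REWARD = 100.0
MAIN_ENGINE_COST_PER_STEP = -0.3
LEG_CONTACT_REWARD = 10.0


def tally_fixed_rewards(main_engine_steps: int,
                        legs_in_contact: int,
                        crashed: bool,
                        landed: bool) -> float:
    """Sum the fixed-size reward components for one episode.

    Hypothetical helper: it ignores the position- and speed-dependent
    reward for approaching the landing pad.
    """
    total = MAIN_ENGINE_COST_PER_STEP * main_engine_steps
    total += LEG_CONTACT_REWARD * legs_in_contact
    if crashed:
        total += CRASH_REWARD
    elif landed:
        total += LANDING_REWARD
    return total
```

Under these assumptions, an episode that fires the main engine for 10 time steps, touches down on both legs, and lands collects roughly -3 + 20 + 100 from these components, before the landing-pad component is added.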
