Appendix B Example Environments
Page 347, B.1.1 CartPole-v0
There is an error in the termination condition. The pole falls over when it is greater than 12 degrees from vertical, not 15 degrees.
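The threshold above is stated in degrees, while the CartPole observation reports the pole angle in radians, so a unit conversion is needed when checking the condition by hand. The following is a minimal sketch of that check, not the environment's own code; the function name `pole_has_fallen` and its parameters are hypothetical, and the threshold is left as an argument rather than hard-coded.

```python
import math


def pole_has_fallen(pole_angle_rad, threshold_deg):
    """Return True if the pole angle exceeds the threshold.

    pole_angle_rad: pole angle from vertical, in radians (assumed to be
        the unit used by the environment's observation).
    threshold_deg: termination threshold in degrees, as quoted in the text.
    """
    # Convert the quoted threshold to radians so both values share a unit.
    threshold_rad = math.radians(threshold_deg)
    # The pole can fall to either side, so compare the absolute angle.
    return abs(pole_angle_rad) > threshold_rad
```

Passing the threshold explicitly keeps the sketch independent of any particular environment version.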
Page 347, B.1.3 LunarLander-v2
There is an error in the objective definition. "Land the lander without crashing fuel" should read "Land the lander without crashing and conserve fuel."
Page 348, B.1.3 LunarLander-v2
The reward definition is not complete. The reward for the LunarLander-v2 environment is defined as follows: -100 for crashing, +100 for landing, -0.3 per time step when firing the main engine, +100 to +140 for landing between the flags (landing pad) with zero speed, and +10 for each leg ground contact.
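To make the arithmetic concrete, the sketch below simply adds up the reward terms listed above for one hypothetical episode. It is an illustration of the bookkeeping only: the actual environment assigns reward step by step, and the function name `tally_listed_rewards` and all of its parameters are assumptions introduced here.

```python
def tally_listed_rewards(crashed, landed, main_engine_steps,
                         leg_contacts, pad_bonus=0.0):
    """Sum the listed reward terms for one hypothetical episode.

    crashed / landed: booleans for the two terminal outcomes.
    main_engine_steps: number of time steps with the main engine firing.
    leg_contacts: number of leg ground contacts.
    pad_bonus: bonus for landing between the flags with zero speed;
        the text gives a range of +100 to +140 (0 if not earned).
    """
    total = 0.0
    if crashed:
        total -= 100.0                  # -100 for crashing
    if landed:
        total += 100.0                  # +100 for landing
    total -= 0.3 * main_engine_steps    # -0.3 per main-engine time step
    total += 10.0 * leg_contacts        # +10 per leg ground contact
    total += pad_bonus                  # landing-pad bonus, if any
    return total
```

For example, an episode that lands on the pad with both legs down after 100 main-engine steps and a pad bonus of 120 would be tallied as `tally_listed_rewards(False, True, 100, 2, pad_bonus=120.0)`.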