Discrete Environment Benchmark
| Env. \ Alg. | DQN | DDQN+PER | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---|---|---|---|---|---|
| Breakout | 80.88 | 182 | 377 | 398 | 443 | 3.51* |
| Pong | 18.48 | 20.5 | 19.31 | 19.56 | 20.58 | 19.87* |
| Qbert | 5494 | 11426 | 12405 | 13590 | 13460 | 923* |
| Seaquest | 1185 | 4405 | 1070 | 1684 | 1715 | 171* |
| LunarLander | 192 | 233 | 25.21 | 68.23 | 214 | 276 |
| UnityHallway | -0.32 | 0.27 | 0.08 | -0.96 | 0.73 | 0.01 |
| UnityPushBlock | 4.88 | 4.93 | 4.68 | 4.93 | 4.97 | -0.70 |
Episode scores at the end of training attained by SLM Lab implementations on discrete-action control problems. Reported episode scores are the average over the last 100 checkpoints, then averaged over 4 Sessions. A Random baseline with score averaged over 100 episodes is included. Results marked with * were trained using the hybrid synchronous/asynchronous version of SAC to parallelize and speed up training. For SAC, Breakout, Pong, and Seaquest were trained for 2M frames instead of 10M frames.
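The aggregation behind the reported numbers is straightforward to reproduce: average the episode score over the last 100 checkpoints within each Session, then average those per-Session scores across the 4 Sessions. The sketch below illustrates this under the assumption that each Session's checkpoint returns are available as a plain list of floats; the helper names (`session_score`, `trial_score`) and the synthetic data are illustrative, not SLM Lab's actual API.

```python
import numpy as np

def session_score(checkpoint_returns, last_n=100):
    """Average episode score over the last `last_n` checkpoints of one Session."""
    return float(np.mean(checkpoint_returns[-last_n:]))

def trial_score(sessions):
    """Average the per-Session scores over all Sessions (4 in the benchmark)."""
    return float(np.mean([session_score(s) for s in sessions]))

# Example with 4 hypothetical Sessions of 1000 checkpoint scores each.
rng = np.random.default_rng(0)
sessions = [list(200 + 10 * rng.standard_normal(1000)) for _ in range(4)]
print(f"Reported score: {trial_score(sessions):.2f}")
```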







