Discrete Environment Benchmark
| Env. \ Alg. | DQN | DDQN+PER | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---|---|---|---|---|---|
| Breakout | 80.88 | 182 | 377 | 398 | 443 | 3.51* |
| Pong | 18.48 | 20.5 | 19.31 | 19.56 | 20.58 | 19.87* |
| Qbert | 5494 | 11426 | 12405 | 13590 | 13460 | 923* |
| Seaquest | 1185 | 4405 | 1070 | 1684 | 1715 | 171* |
| LunarLander | 192 | 233 | 25.21 | 68.23 | 214 | 276 |
| UnityHallway | -0.32 | 0.27 | 0.08 | -0.96 | 0.73 | 0.01 |
| UnityPushBlock | 4.88 | 4.93 | 4.68 | 4.93 | 4.97 | -0.70 |
Episode scores at the end of training attained by SLM Lab implementations on discrete-action control problems. Each reported score is the average over the last 100 checkpoints of a Session, then averaged over 4 Sessions. A Random baseline with score averaged over 100 episodes is included. Results marked with * were trained using the hybrid synchronous/asynchronous version of SAC to parallelize and speed up training. For SAC, Breakout, Pong and Seaquest were trained for 2M frames instead of 10M frames.

For the full Atari benchmark, see Atari Benchmark.
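The two-stage averaging described in the caption can be sketched as follows. This is a minimal illustration, not SLM Lab's own code: the function names `session_score` and `benchmark_score` are hypothetical, the checkpoint scores are synthetic, and it assumes each Session is simply a list of per-checkpoint episode scores in training order.

```python
from statistics import mean


def session_score(checkpoint_scores, window=100):
    """Mean of the last `window` checkpoint scores of one Session.

    Hypothetical helper: assumes `checkpoint_scores` is ordered from the
    first checkpoint to the last. Sessions with fewer than `window`
    checkpoints are averaged over whatever is available.
    """
    if not checkpoint_scores:
        raise ValueError("a Session needs at least one checkpoint score")
    return mean(checkpoint_scores[-window:])


def benchmark_score(sessions, window=100):
    """Average the per-Session scores across all Sessions."""
    if not sessions:
        raise ValueError("at least one Session is required")
    return mean(session_score(scores, window) for scores in sessions)


# Synthetic data (not real benchmark results): 4 Sessions, each with
# 20 early checkpoints at 0.0 followed by 100 checkpoints at a fixed score.
example_sessions = [
    [0.0] * 20 + [final] * 100
    for final in (10.0, 20.0, 30.0, 40.0)
]

print(benchmark_score(example_sessions))
```

Because only the last 100 checkpoints of each Session enter the average, the early low-scoring checkpoints in the synthetic data do not affect the result; the reported number reflects performance at the end of training rather than over the whole run.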