🎯 Discrete Benchmark
Classic Control & Box2D Results
SLM Lab v5.2 validates algorithms on Gymnasium discrete environments using the TorchArc architecture. These benchmarks cover:
Classic Control: CartPole, Acrobot, Pendulum - simple physics tasks ideal for algorithm validation
Box2D: LunarLander - 2D physics with more complex dynamics
Results below are from February-March 2026 benchmark runs using Gymnasium v5 environments. Existing algorithms use TorchArc specs; CrossQ uses standard MLP specs.
All trained models and metrics are publicly available on HuggingFace.
Methodology
Results show Trial-level performance:
Trial = 4 Sessions with different random seeds
Session = One complete training run
Score = Final 100-checkpoint moving average (total_reward_ma)
The trial score is the mean across 4 sessions, providing statistically meaningful results.
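The scoring rule above can be written out in a few lines. This is an illustrative sketch, not SLM Lab's internals; the `trial_score` helper and the synthetic session data are made up for the example:

```python
def trial_score(session_rewards, window=100):
    """Mean over sessions of each session's final `window`-checkpoint average.

    session_rewards: one list of checkpoint rewards (total_reward_ma inputs)
    per session; the trial score averages the 4 per-session scores.
    """
    session_scores = []
    for rewards in session_rewards:      # one reward series per session/seed
        tail = rewards[-window:]         # final 100 checkpoints
        session_scores.append(sum(tail) / len(tail))
    return sum(session_scores) / len(session_scores)

# 4 sessions (different seeds) whose checkpoints sit at 450, 460, 470, 480
sessions = [[450.0 + 10 * i] * 500 for i in range(4)]
print(trial_score(sessions))  # -> 465.0
```

Averaging over a trailing window and then over seeds is what makes the reported score robust to single lucky episodes or seeds.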
Standardized Settings
| Environments | num_envs | max_frame | log_frequency | grace_period |
| --- | --- | --- | --- | --- |
| Classic Control | 4 | 2e5-3e5 | 500 | 1e4 |
| Box2D | 8 | 3e5 | 1000 | 5e4 |
The grace_period is the minimum frames before ASHA early stopping can terminate underperforming trials.
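The ASHA rule the grace_period feeds into can be sketched simply. This is a generic illustration of successive-halving-style stopping, not SLM Lab's or Ray Tune's actual scheduler code; `asha_should_stop` and its arguments are hypothetical:

```python
def asha_should_stop(frame, score, peer_scores, grace_period, quantile=0.5):
    """Sketch of ASHA-style early stopping.

    A trial is never cut before grace_period frames; after that, it is cut
    if its score falls below the chosen quantile of its peers' scores.
    """
    if frame < grace_period:
        return False                      # still inside the grace period
    if not peer_scores:
        return False                      # nothing to compare against
    cutoff = sorted(peer_scores)[int(len(peer_scores) * quantile)]
    return score < cutoff

# Inside the grace period, even a weak trial survives...
assert asha_should_stop(5_000, 10.0, [100.0, 200.0], grace_period=10_000) is False
# ...but past it, clear underperformers are terminated.
assert asha_should_stop(20_000, 10.0, [100.0, 200.0], grace_period=10_000) is True
```

The grace period exists because early scores are noisy: algorithms like DQN often look flat until the replay buffer warms up, and cutting them at frame 0 would discard viable configurations.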
v5 vs v4 Difficulty: Gymnasium environments have stricter termination and reward handling:
LunarLander-v3 is notably harder than v2 - stricter landing criteria, lower typical scores
Pendulum-v1 uses different reward scaling than v0
Expect 5-15% lower scores compared to OpenAI Gym benchmarks
See Gymnasium docs for environment-specific changes.
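One concrete piece of the stricter termination handling: Gymnasium splits the old `done` flag into `terminated` (true end state) and `truncated` (time limit). A minimal sketch of why the distinction matters for value targets - this is the standard handling, shown with a hypothetical `td_target` helper rather than SLM Lab's code:

```python
def td_target(reward, next_value, terminated, truncated, gamma=0.99):
    """One-step TD target under Gymnasium's terminated/truncated split.

    Only a true termination (terminated=True) zeroes out the bootstrap;
    a time-limit truncation is not a real end state, so we still bootstrap
    from the next state's value. `truncated` is accepted to make that
    explicit: it does not change the target.
    """
    if terminated:
        return reward                     # episode truly ended; no future value
    return reward + gamma * next_value    # bootstrap, even when truncated

# True termination (e.g. a crash in LunarLander): no bootstrap
assert td_target(-100.0, 50.0, terminated=True, truncated=False) == -100.0
# Time-limit truncation: the next state's value still counts
assert td_target(1.0, 50.0, terminated=False, truncated=True) == 50.5
```

Conflating truncation with termination biases value estimates downward, which is one reason scores drop when moving from older Gym versions to Gymnasium.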
Running Benchmarks
Local - runs on your machine (Classic Control completes in minutes on CPU):
Remote - cloud GPU via dstack, auto-syncs to HuggingFace:
Remote setup: cp .env.example .env then set HF_TOKEN. See Remote Training for dstack config.
GPU not required for Classic Control. These environments train fast on CPU. Box2D (LunarLander) benefits from GPU but still runs fine locally.
Download and Replay
Results
Classic Control
CartPole-v1
Docs | State: Box(4) | Action: Discrete(2) | Target: >400
Settings: max_frame 2e5 | num_envs 4 | max_session 4 | log_frequency 500
| Algorithm | Status | Score | Spec |
| --- | --- | --- | --- |
| REINFORCE | ✅ | 483.31 | reinforce_cartpole_arc |
| SARSA | ✅ | 430.95 | sarsa_boltzmann_cartpole_arc |
| DQN | ⚠️ | 239.94 | dqn_boltzmann_cartpole_arc |
| DDQN+PER | ✅ | 451.51 | ddqn_per_boltzmann_cartpole_arc |
| A2C | ✅ | 496.68 | a2c_gae_cartpole_arc |
| PPO | ✅ | 498.94 | ppo_cartpole_arc |
| SAC | ✅ | 406.09 | sac_cartpole_arc |
| CrossQ | ⚠️ | 334.59 | crossq_cartpole |

Acrobot-v1
Docs | State: Box(6) | Action: Discrete(3) | Target: >-100
Settings: max_frame 3e5 | num_envs 4 | max_session 4 | log_frequency 500
| Algorithm | Status | Score | Spec |
| --- | --- | --- | --- |
| DQN | ✅ | -94.17 | dqn_boltzmann_acrobot_arc |
| DDQN+PER | ✅ | -83.92 | ddqn_per_acrobot_arc |
| A2C | ✅ | -83.99 | a2c_gae_acrobot_arc |
| PPO | ✅ | -81.28 | ppo_acrobot_arc |
| SAC | ✅ | -92.60 | sac_acrobot_arc |
| CrossQ | ❌ | -103.13 | crossq_acrobot |

Pendulum-v1
Docs | State: Box(3) | Action: Box(1) | Target: >-200
Settings: max_frame 3e5 | num_envs 4 | max_session 4 | log_frequency 500
| Algorithm | Status | Score | Spec |
| --- | --- | --- | --- |
| A2C | ❌ | -820.74 | a2c_gae_pendulum_arc |
| PPO | ✅ | -174.87 | ppo_pendulum_arc |
| SAC | ✅ | -150.97 | sac_pendulum_arc |
| CrossQ | ✅ | -145.66 | crossq_pendulum |

Box2D
LunarLander-v3 (Discrete)
Docs | State: Box(8) | Action: Discrete(4) | Target: >200
Settings: max_frame 3e5 | num_envs 8 | max_session 4 | log_frequency 1000
| Algorithm | Status | Score | Spec |
| --- | --- | --- | --- |
| DQN | ⚠️ | 195.21 | dqn_concat_lunar_arc |
| DDQN+PER | ✅ | 265.90 | ddqn_per_concat_lunar_arc |
| A2C | ❌ | 27.38 | a2c_gae_lunar_arc |
| PPO | ⚠️ | 183.30 | ppo_lunar_arc |
| SAC | ⚠️ | 106.17 | sac_lunar_arc |
| CrossQ | ❌ | 139.21 | crossq_lunar |

LunarLander-v3 (Continuous)
Docs | State: Box(8) | Action: Box(2) | Target: >200
Settings: max_frame 3e5 | num_envs 8 | max_session 4 | log_frequency 1000
| Algorithm | Status | Score | Spec |
| --- | --- | --- | --- |
| A2C | ❌ | -76.81 | a2c_gae_lunar_continuous_arc |
| PPO | ⚠️ | 132.58 | ppo_lunar_continuous_arc |
| SAC | ⚠️ | 125.00 | sac_lunar_continuous_arc |
| CrossQ | ✅ | 268.91 | crossq_lunar_continuous |

Legend: ✅ Solved | ⚠️ Close (>80%) | ❌ Failed
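The legend's thresholds can be written out explicitly. A sketch with a hypothetical `status` helper; note that the 80% rule only makes sense as written for positive targets (for negative targets like Acrobot's, 0.8 × target is stricter than the target itself):

```python
def status(score, target):
    """Map a trial score to the legend symbol, assuming a positive target."""
    if score >= target:
        return "solved"       # ✅ at or above the target
    if score >= 0.8 * target:
        return "close"        # ⚠️ above 80% of the target
    return "failed"           # ❌ below 80% of the target

assert status(483.31, 400) == "solved"   # REINFORCE on CartPole
assert status(334.59, 400) == "close"    # CrossQ on CartPole
assert status(27.38, 200) == "failed"    # A2C on LunarLander
```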
Historical Results (v4)
OpenAI Gym Results (v4)
These results from SLM Lab v4 used OpenAI Gym environments (now deprecated). Environment versions differ from current Gymnasium versions. Unity environments are no longer included in the core package.
| Env | DQN | DDQN+PER | A2C (GAE) | A2C (n-step) | PPO | SAC |
| --- | --- | --- | --- | --- | --- | --- |
| Breakout | 80.88 | 182 | 377 | 398 | 443 | 3.51* |
| Pong | 18.48 | 20.5 | 19.31 | 19.56 | 20.58 | 19.87* |
| Qbert | 5494 | 11426 | 12405 | 13590 | 13460 | 923* |
| Seaquest | 1185 | 4405 | 1070 | 1684 | 1715 | 171* |
| LunarLander | 192 | 233 | 25.21 | 68.23 | 214 | 276 |
Scores are the episode score at the end of training: the average over the last 100 checkpoints, averaged over 4 Sessions. Results marked with * used async SAC.
For the full Atari benchmark, see Atari Benchmark.