Atari Benchmark
A2C & PPO Atari Results (v5)
SLM Lab v5 validates A2C and PPO on ALE (Arcade Learning Environment) environments. The ALE provides 50+ classic Atari 2600 games as standardized RL benchmarks.
54 games were tested; all results are available on HuggingFace.
v5 vs v4 Difficulty: Gymnasium ALE v5 is significantly harder than OpenAI Gym's NoFrameskip-v4:
Sticky actions (repeat_action_probability=0.25) per Machado et al. (2018)
Deterministic frame skipping with proper action handling
Stricter termination conditions
Expect 10-40% lower scores compared to older benchmarks. Some games (Bowling, Skiing) are much harder in v5.
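The effect of sticky actions can be illustrated with a small simulation. This is a minimal sketch, not ALE or SLM Lab code: the function name, the seed, and the choice of action 0 as the initial previous action are assumptions made for illustration. It also applies the check once per agent step for simplicity, whereas the emulator may apply it at a finer granularity.

```python
import random


def apply_sticky_actions(agent_actions, repeat_action_probability=0.25, seed=0):
    """Return the actions that would actually be executed.

    Illustrative only: with probability `repeat_action_probability` the
    previously executed action is repeated instead of the agent's new choice.
    """
    rng = random.Random(seed)
    executed = []
    previous = 0  # assumption: action 0 (NOOP) precedes the first step
    for action in agent_actions:
        if rng.random() < repeat_action_probability:
            action = previous  # the agent's new action is ignored this step
        executed.append(action)
        previous = action
    return executed


agent_actions = [1, 2, 3, 4] * 2500  # 10,000 requested actions
executed = apply_sticky_actions(agent_actions)
overridden = sum(a != e for a, e in zip(agent_actions, executed))
print(f"fraction of actions overridden: {overridden / len(agent_actions):.3f}")
```

Roughly a quarter of the requested actions are replaced, which is why policies that depend on frame-perfect timing tend to score lower under v5.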
Methodology
Results show Trial-level performance:
Trial = 4 Sessions with different random seeds
Session = One complete training run
Score = Final 100-checkpoint moving average (total_reward_ma)
The trial score is the mean across 4 sessions, providing statistically meaningful results.
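The scoring rule above can be sketched as follows. The helper names and the synthetic reward lists are hypothetical; only the rule itself (final 100-checkpoint moving average per session, then the mean over 4 sessions) comes from the text.

```python
from statistics import mean


def session_score(checkpoint_rewards, window=100):
    """Final moving average: mean of the last `window` checkpoint rewards."""
    return mean(checkpoint_rewards[-window:])


def trial_score(sessions, window=100):
    """Mean of the per-session final moving averages."""
    return mean(session_score(rewards, window) for rewards in sessions)


# Hypothetical data: 4 sessions (seeds), each with 150 checkpoint rewards.
sessions = [
    [float(i + offset) for i in range(150)]
    for offset in (0, 10, 20, 30)
]
print(trial_score(sessions))  # 114.5
```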
Configuration
Settings: max_frame 10e6 | num_envs 16 | max_session 4 | log_frequency 10000
Algorithm Specs (all use Nature CNN [32,64,64] + 512fc):
DDQN+PER: Skipped - off-policy variants are ~6x slower (~230 fps vs ~1500 fps), so they are not cost-effective at 10M frames
A2C: a2c_gae_atari.json - RMSprop (lr=7e-4), training_frequency=32
PPO: ppo_atari.json - AdamW (lr=2.5e-4), minibatch=256, horizon=128, epochs=4
Environment: Gymnasium ALE v5 with life_loss_info=true, sticky actions (repeat_action_probability=0.25)
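As a rough check on these settings, the arithmetic below derives the PPO batch structure and the approximate wall-clock time per game. It is a sketch under two assumptions that the text does not state: that one rollout collects `horizon` steps from every environment, and that max_frame counts frames summed over all environments. The fps figures are the approximate ones quoted above.

```python
max_frame = int(10e6)   # total training frames (from Settings)
num_envs = 16           # parallel environments (from Settings)
horizon = 128           # PPO steps per environment per rollout
minibatch = 256         # PPO minibatch size
epochs = 4              # PPO epochs per rollout

# Assumption: one rollout gathers `horizon` steps from every environment.
batch = horizon * num_envs                                # 2048 transitions
minibatches_per_epoch = batch // minibatch                # 8
grad_steps_per_rollout = minibatches_per_epoch * epochs   # 32
rollouts = max_frame // batch                             # 4882


def hours(frames, fps):
    """Wall-clock hours needed to process `frames` at a given throughput."""
    return frames / fps / 3600


print(batch, minibatches_per_epoch, grad_steps_per_rollout, rollouts)
print(f"on-policy at ~1500 fps: {hours(max_frame, 1500):.1f} h")
print(f"off-policy at ~230 fps: {hours(max_frame, 230):.1f} h")
```

At ~1500 fps a 10M-frame run takes a little under 2 hours of pure stepping, while ~230 fps pushes it to about 12 hours, which matches the roughly 6x slowdown cited for DDQN+PER.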
PPO Lambda Variants
Different games benefit from different lambda values for GAE. All variants use the same spec file:
| Spec Name | Lambda | Best for |
|---|---|---|
| ppo_atari | 0.95 | Strategic games (default) |
| ppo_atari_lam85 | 0.85 | Mixed games |
| ppo_atari_lam70 | 0.70 | Action games |
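To see what the lambda parameter changes, here is a minimal sketch of Generalized Advantage Estimation on a single episode. It is not SLM Lab's implementation; the function name, gamma=0.99, and the toy episode are assumptions chosen for illustration.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE for one episode that terminates at the last step.

    `values` holds V(s_0) .. V(s_{T-1}); the value after the final
    step is taken to be 0 because the episode has ended.
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD error
        running = delta + gamma * lam * running              # discounted sum of TD errors
        advantages[t] = running
        next_value = values[t]
    return advantages


# Hypothetical episode: a single reward arrives at the final step.
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
values = [0.0] * 5  # an untrained critic, for illustration
for lam in (0.95, 0.85, 0.70):
    first = gae_advantages(rewards, values, lam=lam)[0]
    print(f"lambda={lam:.2f}: advantage at t=0 is {first:.3f}")
```

A lower lambda shrinks the credit assigned to early steps for a distant reward, trading variance for bias. That is one plausible reason fast-feedback action games can prefer 0.70 while games with delayed rewards prefer 0.95.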
Lambda Comparison Table
Shows scores for all three lambda variants where tested. Bold = best score, - = not tested.
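| Environment | ppo_atari (0.95) | ppo_atari_lam85 (0.85) | ppo_atari_lam70 (0.70) |
|---|---|---|---|
| ALE/AirRaid-v5 | **8245** | - | - |
| ALE/Alien-v5 | **1453** | 1353 | 1274 |
| ALE/Amidar-v5 | 574 | **580** | - |
| ALE/Assault-v5 | 4059 | **4293** | 3314 |
| ALE/Asterix-v5 | 2967 | **3482** | - |
| ALE/Asteroids-v5 | 1497 | **1554** | - |
| ALE/Atlantis-v5 | **792886** | 754k | 710k |
| ALE/BankHeist-v5 | **1045** | **1045** | - |
| ALE/BattleZone-v5 | 21270 | **26383** | 13857 |
| ALE/BeamRider-v5 | **2765** | - | - |
| ALE/Berzerk-v5 | **1072** | - | - |
| ALE/Bowling-v5 | **46.45** | - | - |
| ALE/Boxing-v5 | **91.17** | - | - |
| ALE/Breakout-v5 | 191 | 292 | **327** |
| ALE/Carnival-v5 | 3071 | 3013 | **3967** |
| ALE/Centipede-v5 | 3917 | - | **4915** |
| ALE/ChopperCommand-v5 | **5355** | - | - |
| ALE/CrazyClimber-v5 | 107183 | **107370** | - |
| ALE/Defender-v5 | 37162 | - | **51439** |
| ALE/DemonAttack-v5 | 7755 | - | **16558** |
| ALE/DoubleDunk-v5 | **-2.38** | - | - |
| ALE/ElevatorAction-v5 | **5446** | 363 | 3933 |
| ALE/Enduro-v5 | 414 | **898** | 872 |
| ALE/FishingDerby-v5 | 22.80 | **27.10** | - |
| ALE/Freeway-v5 | **31.30** | - | - |
| ALE/Frostbite-v5 | **301** | 275 | 267 |
| ALE/Gopher-v5 | 4172 | - | **6508** |
| ALE/Gravitar-v5 | **599** | 253 | 145 |
| ALE/Hero-v5 | 21052 | **28238** | - |
| ALE/IceHockey-v5 | **-3.93** | -5.58 | -7.36 |
| ALE/Jamesbond-v5 | **662** | - | - |
| ALE/JourneyEscape-v5 | -1582 | **-1252** | -1547 |
| ALE/Kangaroo-v5 | 2623 | **9912** | - |
| ALE/Krull-v5 | **7841** | - | - |
| ALE/KungFuMaster-v5 | 18973 | 28334 | **29068** |
| ALE/MsPacman-v5 | 2308 | **2372** | 2297 |
| ALE/NameThisGame-v5 | **5993** | - | - |
| ALE/Phoenix-v5 | 7940 | - | **15659** |
| ALE/Pong-v5 | 15.01 | **16.91** | 12.85 |
| ALE/Pooyan-v5 | 4704 | - | **5716** |
| ALE/Qbert-v5 | **15094** | - | - |
| ALE/Riverraid-v5 | 7319 | **9428** | - |
| ALE/RoadRunner-v5 | 24204 | **37015** | - |
| ALE/Robotank-v5 | **20.07** | 8.24 | 2.59 |
| ALE/Seaquest-v5 | **1796** | - | - |
| ALE/Skiing-v5 | **-19340** | -22980 | -29975 |
| ALE/Solaris-v5 | **2094** | - | - |
| ALE/SpaceInvaders-v5 | **726** | - | - |
| ALE/StarGunner-v5 | 31862 | - | **47495** |
| ALE/Surround-v5 | **-2.52** | - | -6.79 |
| ALE/Tennis-v5 | -7.66 | **-4.41** | - |
| ALE/TimePilot-v5 | **4668** | - | - |
| ALE/Tutankham-v5 | 203 | **217** | - |
| ALE/UpNDown-v5 | **182472** | - | - |
| ALE/VideoPinball-v5 | 31385 | - | **56746** |
| ALE/WizardOfWor-v5 | **5814** | 5466 | 4740 |
| ALE/YarsRevenge-v5 | **17120** | - | - |
| ALE/Zaxxon-v5 | **10756** | - | - |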
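Choosing a lambda variant per game from these scores can be sketched as below. The rows are copied from the comparison above, assuming the three values for each game are listed in the order lambda 0.95, 0.85, 0.70; `None` stands in for "-" (not tested), and the helper name is hypothetical.

```python
# A few rows copied from the comparison; None marks "not tested".
scores = {
    "ALE/Breakout-v5": {0.95: 191, 0.85: 292, 0.70: 327},
    "ALE/Pong-v5": {0.95: 15.01, 0.85: 16.91, 0.70: 12.85},
    "ALE/Phoenix-v5": {0.95: 7940, 0.85: None, 0.70: 15659},
    "ALE/Skiing-v5": {0.95: -19340, 0.85: -22980, 0.70: -29975},
}


def best_lambda(row):
    """Return (lambda, score) for the highest tested score in a row."""
    tested = {lam: s for lam, s in row.items() if s is not None}
    lam = max(tested, key=tested.get)
    return lam, tested[lam]


for env, row in scores.items():
    lam, score = best_lambda(row)
    print(f"{env}: lambda={lam} (score {score})")
```

Higher is better for every game, including those with negative scores such as Skiing, so a plain maximum is sufficient.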
Running Benchmarks
All games use the same spec file with variable substitution for the environment.
Remote (recommended) - cloud GPU via dstack, auto-syncs to HuggingFace:
Remote setup: cp .env.example .env then set HF_TOKEN. See Remote Training for dstack config.
Local - runs on your machine (requires GPU, ~2-3 hours per game):
A GPU is required for Atari. Each game runs 10M frames and takes 2-3 hours on a cloud GPU (L4/A10G). Local CPU training is not practical. Cloud GPUs via dstack are faster and often cheaper than running on local hardware.
Download and Replay
Results
Skipped (hard exploration): Adventure, MontezumaRevenge, Pitfall, PrivateEye, Venture
Training Curves
Multi-trial comparison plots showing A2C vs PPO mean returns (moving average) vs training frames. Shaded regions show standard deviation across 4 sessions.
Historical Results (v4)
OpenAI Gym Atari Results (v4)
Deprecated Environments: These v4 results used OpenAI Gym NoFrameskip-v4 environments (no sticky actions). Gymnasium ALE v5 environments are harder due to sticky action probability. Results are not directly comparable.
| Environment | | | | | |
|---|---|---|---|---|---|
| Adventure | -0.94 | -0.92 | -0.77 | -0.85 | -0.3 |
| AirRaid | 1876 | 3974 | 4202 | 3557 | 4028 |
| Alien | 822 | 1574 | 1519 | 1627 | 1413 |
| Amidar | 90.95 | 431 | 577 | 418 | 795 |
| Assault | 1392 | 2567 | 3366 | 3312 | 3619 |
| Asterix | 1253 | 6866 | 5559 | 5223 | 6132 |
| Asteroids | 439 | 426 | 2951 | 2147 | 2186 |
| Atlantis | 68679 | 644810 | 2747371 | 2259733 | 2148077 |
| BankHeist | 131 | 623 | 855 | 1170 | 1183 |
| BattleZone | 6564 | 6395 | 4336 | 4533 | 13649 |
| BeamRider | 2799 | 5870 | 2659 | 4139 | 4299 |
| Berzerk | 319 | 401 | 1073 | 763 | 860 |
| Bowling | 30.29 | 39.5 | 24.51 | 23.75 | 31.64 |
| Boxing | 72.11 | 90.98 | 1.57 | 1.26 | 96.53 |
| Breakout | 80.88 | 182 | 377 | 398 | 443 |
| Carnival | 4280 | 4773 | 2473 | 1827 | 4566 |
| Centipede | 1899 | 2153 | 3909 | 4202 | 5003 |
| ChopperCommand | 1083 | 4020 | 3043 | 1280 | 3357 |
| CrazyClimber | 46984 | 88814 | 106256 | 109998 | 116820 |
| Defender | 281999 | 313018 | 665609 | 657823 | 534639 |
| DemonAttack | 1705 | 19856 | 23779 | 19615 | 121172 |
| DoubleDunk | -21.44 | -22.38 | -5.15 | -13.3 | -6.01 |
| ElevatorAction | 32.62 | 17.91 | 9966 | 8818 | 6471 |
| Enduro | 437 | 959 | 787 | 0.0 | 1926 |
| FishingDerby | -88.14 | -1.7 | 16.54 | 1.65 | 36.03 |
| Freeway | 24.46 | 30.49 | 30.97 | 0.0 | 32.11 |
| Frostbite | 98.8 | 2497 | 277 | 261 | 1062 |
| Gopher | 1095 | 7562 | 929 | 1545 | 2933 |
| Gravitar | 87.34 | 258 | 313 | 433 | 223 |
| Hero | 1051 | 12579 | 16502 | 19322 | 17412 |
| IceHockey | -14.96 | -14.24 | -5.79 | -6.06 | -6.43 |
| Jamesbond | 44.87 | 702 | 521 | 453 | 561 |
| JourneyEscape | -4818 | -2003 | -921 | -2032 | -1094 |
| Kangaroo | 1965 | 8897 | 67.62 | 554 | 4989 |
| Krull | 5522 | 6650 | 7785 | 6642 | 8477 |
| KungFuMaster | 2288 | 16547 | 31199 | 25554 | 34523 |
| MontezumaRevenge | 0.0 | 0.02 | 0.08 | 0.19 | 1.08 |
| MsPacman | 1175 | 2215 | 1965 | 2158 | 2350 |
| NameThisGame | 3915 | 4474 | 5178 | 5795 | 6386 |
| Phoenix | 2909 | 8179 | 16345 | 13586 | 30504 |
| Pitfall | -68.83 | -73.65 | -101 | -31.13 | -35.93 |
| Pong | 18.48 | 20.5 | 19.31 | 19.56 | 20.58 |
| Pooyan | 1958 | 2741 | 2862 | 2531 | 6799 |
| PrivateEye | 784 | 303 | 93.22 | 78.07 | 50.12 |
| Qbert | 5494 | 11426 | 12405 | 13590 | 13460 |
| Riverraid | 953 | 10492 | 8308 | 7565 | 9636 |
| RoadRunner | 15237 | 29047 | 30152 | 31030 | 32956 |
| Robotank | 3.43 | 9.05 | 2.98 | 2.27 | 2.27 |
| Seaquest | 1185 | 4405 | 1070 | 1684 | 1715 |
| Skiing | -14094 | -12883 | -19481 | -14234 | -24713 |
| Solaris | 612 | 1396 | 2115 | 2236 | 1892 |
| SpaceInvaders | 451 | 670 | 733 | 750 | 797 |
| StarGunner | 3565 | 38238 | 44816 | 48410 | 60579 |
| Tennis | -23.78 | -10.33 | -22.42 | -19.06 | -11.52 |
| TimePilot | 2819 | 1884 | 3331 | 3440 | 4398 |
| Tutankham | 35.03 | 159 | 161 | 175 | 211 |
| UpNDown | 2043 | 11632 | 89769 | 18878 | 262208 |
| Venture | 4.56 | 9.61 | 0.0 | 0.0 | 11.84 |
| VideoPinball | 8056 | 79730 | 35371 | 40423 | 58096 |
| WizardOfWor | 869 | 328 | 1516 | 1247 | 4283 |
| YarsRevenge | 5816 | 15698 | 27097 | 11742 | 10114 |
| Zaxxon | 442 | 54.28 | 64.72 | 24.7 | 641 |
The table above presents results for 62 Atari games. All agents were trained for 10M frames (40M including skipped frames). Reported results are the episode score at the end of training, averaged over the previous 100 evaluation checkpoints with each checkpoint averaged over 4 Sessions.