# Atari Benchmark

## A2C, PPO & SAC Atari Results

SLM Lab v5.2 validates A2C, PPO, and SAC on [ALE (Arcade Learning Environment)](https://ale.farama.org/environments/) environments using the TorchArc neural network architecture. The ALE provides 50+ classic Atari 2600 games as standardized RL benchmarks.

|                                                     MsPacman                                                     |                                                     Breakout                                                     |                                                     Qbert                                                     |                                                     BeamRider                                                     |
| :--------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------: |
| ![MsPacman](https://user-images.githubusercontent.com/8209263/63994685-5cb30d00-caaa-11e9-8f35-78e29a7d60f5.gif) | ![Breakout](https://user-images.githubusercontent.com/8209263/63994695-650b4800-caaa-11e9-9982-2462738caa45.gif) | ![Qbert](https://user-images.githubusercontent.com/8209263/63994672-54f36880-caaa-11e9-9757-7780725b53af.gif) | ![BeamRider](https://user-images.githubusercontent.com/8209263/63994698-689ecf00-caaa-11e9-991f-0a5e9c2f5804.gif) |

**57 games tested** with all results available on [HuggingFace](https://huggingface.co/datasets/SLM-Lab/benchmark).

{% hint style="warning" %}
**v5 vs v4 Difficulty:** Gymnasium ALE v5 is significantly harder than OpenAI Gym's NoFrameskip-v4:

* **Sticky actions** (`repeat_action_probability=0.25`) per [Machado et al. (2018)](https://arxiv.org/abs/1709.06009)
* **Deterministic frame skipping** with proper action handling
* **Stricter termination** conditions

Expect **10-40% lower scores** compared to older benchmarks. Some games (Bowling, Skiing) are much harder in v5.
{% endhint %}
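
To make the effect of sticky actions concrete, here is a minimal, self-contained sketch that does not call ALE. It assumes a simplified step-level model: at each agent step, the previously executed action is repeated with probability `repeat_action_probability`. The real emulator applies the check at a finer granularity (per emulated frame), and the function name `apply_sticky_actions` and the NOOP starting action are illustrative assumptions.

```python
import random


def apply_sticky_actions(actions, repeat_action_probability=0.25, seed=0):
    """Return the actions that would actually be executed.

    Simplified model (assumption): at every step, with probability
    `repeat_action_probability`, the previously executed action is
    repeated instead of the agent's choice.
    """
    rng = random.Random(seed)
    executed = []
    previous = 0  # assumption: action 0 (NOOP) precedes the first step
    for action in actions:
        if rng.random() < repeat_action_probability:
            action = previous  # the agent's choice is ignored this step
        executed.append(action)
        previous = action
    return executed


# The agent alternates between two actions for 10,000 steps.
requested = [1 + (t % 2) for t in range(10_000)]
executed = apply_sticky_actions(requested)
overridden = sum(r != e for r, e in zip(requested, executed))

# The observed rate can be below 25% because a repeated action
# sometimes coincides with what the agent asked for anyway.
print(f"{overridden / len(requested):.1%} of steps ignored the agent's action")
```

The agent cannot rely on precisely timed action sequences under this noise, which is one reason v5 scores tend to be lower than NoFrameskip-v4 scores.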

### Methodology

Results show **Trial-level** performance:

1. **Trial** = 4 Sessions with different random seeds
2. **Session** = One complete training run
3. **Score** = Final 100-checkpoint moving average (`total_reward_ma`)

The trial score is the mean of the 4 session scores, which averages out seed-to-seed variance and makes results more reliable than a single run.
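
The aggregation described above can be sketched as follows. This is an illustration of the arithmetic only, not SLM Lab's implementation: the helper names and the synthetic reward data are hypothetical, and the way SLM Lab computes `total_reward_ma` internally may differ in detail.

```python
from statistics import mean


def final_moving_average(checkpoint_rewards, window=100):
    """Mean of the last `window` checkpoint rewards (fewer if the run is shorter)."""
    tail = checkpoint_rewards[-window:]
    return mean(tail)


def trial_score(sessions, window=100):
    """Average the per-session final moving averages into one trial score."""
    return mean(final_moving_average(rewards, window) for rewards in sessions)


# Hypothetical data: 4 sessions (seeds), each logging 1,000 checkpoint rewards.
sessions = [
    [float(seed + step) for step in range(1000)]
    for seed in range(4)
]
print(trial_score(sessions))
```

Each session contributes one number (its final moving average), so a single unusually lucky or unlucky seed shifts the trial score by at most a quarter of its deviation.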

### Configuration

**Settings**: num\_envs 16 | max\_session 4 | log\_frequency 10000

**Algorithm Specs** (all use Nature CNN \[32,64,64] + 512fc):

* **A2C**: [a2c\_atari\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/a2c/a2c_atari_arc.yaml) - RMSprop (lr=7e-4), training\_frequency=32, max\_frame=10e6
* **PPO**: [ppo\_atari\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_atari_arc.yaml) - AdamW (lr=2.5e-4), minibatch=256, horizon=128, epochs=4, max\_frame=10e6
* **SAC**: [sac\_atari\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_atari_arc.yaml) - Categorical SAC, AdamW (lr=3e-4), training\_iter=3, training\_frequency=4, max\_frame=2e6
* **CrossQ** *(experimental)*: [crossq\_atari.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_atari.yaml) - Categorical CrossQ, AdamW (lr=1e-3), training\_iter=3, FC1024 critics, max\_frame=2e6; tested on 6 games only

{% hint style="info" %}
**SAC and CrossQ use only 2M frames**, versus 10M for A2C and PPO. CrossQ runs at \~320 fps (2.5x faster than SAC) but generally underperforms SAC and PPO on Atari, because cross-batch batch normalization (BN) is less effective with temporally correlated frames.
{% endhint %}

**Environment:** Gymnasium ALE v5 with `life_loss_info=true`, sticky actions (`repeat_action_probability=0.25`)
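
As a rough sanity check, the PPO settings listed above can be turned into a training budget. The sketch below rests on two assumptions that are not stated on this page: that `max_frame` counts environment steps summed over all parallel environments, and that PPO collects `horizon` steps from each environment before every update. The function name `ppo_budget` is hypothetical.

```python
def ppo_budget(num_envs=16, horizon=128, minibatch_size=256, epochs=4,
               max_frame=10_000_000, log_frequency=10_000):
    """Derive update counts from the benchmark settings.

    Assumptions: `max_frame` counts steps summed over all environments,
    and one PPO update consumes `num_envs * horizon` steps.
    """
    batch_size = num_envs * horizon
    minibatches_per_epoch = batch_size // minibatch_size
    return {
        "batch_size": batch_size,
        "gradient_steps_per_update": minibatches_per_epoch * epochs,
        "policy_updates": max_frame // batch_size,
        # Assumes one logged checkpoint every `log_frequency` frames.
        "checkpoints": max_frame // log_frequency,
    }


for name, value in ppo_budget().items():
    print(f"{name}: {value}")
```

Under the same checkpoint assumption, a 100-checkpoint moving average covers the last 1M frames: a tenth of a 10M-frame A2C or PPO run, but half of a 2M-frame SAC run.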

### PPO Lambda Variants

Different games benefit from different lambda values for GAE (Generalized Advantage Estimation). All variants live in the same spec file (`ppo_atari_arc.yaml`) and are selected by `SPEC_NAME`:

| SPEC\_NAME             | Lambda | Best for                  |
| ---------------------- | ------ | ------------------------- |
| ppo\_atari\_arc        | 0.95   | Strategic games (default) |
| ppo\_atari\_lam85\_arc | 0.85   | Mixed games               |
| ppo\_atari\_lam70\_arc | 0.70   | Action games              |
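
To show what lambda controls, here is a minimal sketch of the standard GAE recursion for a single trajectory. It is not SLM Lab's code: the discount `gamma=0.99`, the function name, and the toy trajectory are assumptions for illustration, and episode terminations inside the trajectory are ignored.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    `values` has one more entry than `rewards`: the last entry is the
    bootstrap value of the state reached after the final reward.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of this and all later TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Hypothetical trajectory: a single reward arrives at the last of 20 steps,
# and the critic predicts zero everywhere.
rewards = [0.0] * 19 + [1.0]
values = [0.0] * 21
for lam in (0.95, 0.85, 0.70):
    first = gae_advantages(rewards, values, lam=lam)[0]
    print(f"lambda={lam:.2f}: advantage credited to step 0 = {first:.4f}")
```

Lower lambda makes the credit assigned to early actions decay faster, so the advantage estimate leans more on the critic and less on long reward sequences. That trades variance for bias, and the best trade-off differs between games.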

<details>

<summary>PPO Lambda Comparison Table - click to expand</summary>

Shows the best PPO lambda variant per game. **Bold** = best score, `-` = not tested.

| ENV                   | ppo\_atari\_arc | ppo\_atari\_lam85\_arc | ppo\_atari\_lam70\_arc |
| --------------------- | --------------- | ---------------------- | ---------------------- |
| ALE/AirRaid-v5        | **7042.84**     | -                      | -                      |
| ALE/Alien-v5          | **1789.26**     | -                      | -                      |
| ALE/Amidar-v5         | -               | **584.28**             | -                      |
| ALE/Assault-v5        | -               | **4448.16**            | -                      |
| ALE/Asterix-v5        | -               | **3235.46**            | -                      |
| ALE/Asteroids-v5      | -               | **1577.92**            | -                      |
| ALE/Atlantis-v5       | **848087.19**   | -                      | -                      |
| ALE/BankHeist-v5      | **1058.25**     | -                      | -                      |
| ALE/BattleZone-v5     | -               | **27176.78**           | -                      |
| ALE/BeamRider-v5      | **2761.75**     | -                      | -                      |
| ALE/Berzerk-v5        | **835.46**      | -                      | -                      |
| ALE/Bowling-v5        | **45.02**       | -                      | -                      |
| ALE/Boxing-v5         | **92.18**       | -                      | -                      |
| ALE/Breakout-v5       | -               | -                      | **326.47**             |
| ALE/Carnival-v5       | -               | -                      | **3912.59**            |
| ALE/Centipede-v5      | -               | -                      | **4780.75**            |
| ALE/ChopperCommand-v5 | **5391.30**     | -                      | -                      |
| ALE/CrazyClimber-v5   | -               | **112094.03**          | -                      |
| ALE/Defender-v5       | -               | -                      | **47894.69**           |
| ALE/DemonAttack-v5    | -               | -                      | **19370.38**           |
| ALE/DoubleDunk-v5     | **-3.03**       | -                      | -                      |
| ALE/Enduro-v5         | -               | **986.46**             | -                      |
| ALE/FishingDerby-v5   | -               | **25.71**              | -                      |
| ALE/Freeway-v5        | **32.42**       | -                      | -                      |
| ALE/Frostbite-v5      | **284.07**      | -                      | -                      |
| ALE/Gopher-v5         | -               | -                      | **6500.38**            |
| ALE/Gravitar-v5       | **602.58**      | -                      | -                      |
| ALE/Hero-v5           | -               | **22477.89**           | -                      |
| ALE/IceHockey-v5      | **-4.05**       | -                      | -                      |
| ALE/Jamesbond-v5      | **710.98**      | -                      | -                      |
| ALE/JourneyEscape-v5  | -               | **-1248.98**           | -                      |
| ALE/Kangaroo-v5       | -               | -                      | **10660.35**           |
| ALE/Krull-v5          | **7874.33**     | -                      | -                      |
| ALE/KungFuMaster-v5   | -               | -                      | **28128.04**           |
| ALE/MsPacman-v5       | -               | **2330.74**            | -                      |
| ALE/NameThisGame-v5   | **6879.23**     | -                      | -                      |
| ALE/Phoenix-v5        | -               | -                      | **13923.26**           |
| ALE/Pong-v5           | -               | **16.69**              | -                      |
| ALE/Pooyan-v5         | -               | -                      | **5308.66**            |
| ALE/Qbert-v5          | **15460.48**    | -                      | -                      |
| ALE/Riverraid-v5      | -               | **9599.75**            | -                      |
| ALE/RoadRunner-v5     | -               | **37980.95**           | -                      |
| ALE/Robotank-v5       | **21.04**       | -                      | -                      |
| ALE/Seaquest-v5       | **1775.14**     | -                      | -                      |
| ALE/Skiing-v5         | **-28217.28**   | -                      | -                      |
| ALE/Solaris-v5        | **2212.78**     | -                      | -                      |
| ALE/SpaceInvaders-v5  | **892.49**      | -                      | -                      |
| ALE/StarGunner-v5     | -               | -                      | **49328.73**           |
| ALE/Surround-v5       | **-4.47**       | -                      | -                      |
| ALE/Tennis-v5         | -               | **-12.27**             | -                      |
| ALE/TimePilot-v5      | **4432.73**     | -                      | -                      |
| ALE/Tutankham-v5      | -               | **210.87**             | -                      |
| ALE/UpNDown-v5        | -               | **147168.80**          | -                      |
| ALE/VideoPinball-v5   | -               | -                      | **38370.30**           |
| ALE/WizardOfWor-v5    | **6100.42**     | -                      | -                      |
| ALE/YarsRevenge-v5    | **12873.91**    | -                      | -                      |
| ALE/Zaxxon-v5         | **9523.49**     | -                      | -                      |

</details>

### Running Benchmarks

Each algorithm uses a single spec file for all games; the `-s env=<ENV>` flag substitutes the environment at launch.

**Remote (recommended)** - cloud GPU via [dstack](https://dstack.ai), auto-syncs to HuggingFace:

```bash
# A2C (10M frames)
source .env && slm-lab run-remote --gpu -s env=ALE/Breakout-v5 -s max_frame=1e7 \
  slm_lab/spec/benchmark_arc/a2c/a2c_atari_arc.yaml a2c_gae_atari_arc train -n breakout-a2c

# PPO (10M frames)
source .env && slm-lab run-remote --gpu -s env=ALE/Breakout-v5 -s max_frame=1e7 \
  slm_lab/spec/benchmark_arc/ppo/ppo_atari_arc.yaml ppo_atari_lam70_arc train -n breakout-ppo

# SAC (2M frames - off-policy, more sample-efficient but slower per frame)
source .env && slm-lab run-remote --gpu -s env=ALE/Breakout-v5 \
  slm_lab/spec/benchmark_arc/sac/sac_atari_arc.yaml sac_atari_arc train -n breakout-sac
```

Remote setup: `cp .env.example .env` then set `HF_TOKEN`. See [Remote Training](/slm-lab/using-slm-lab/remote-training.md) for dstack config.
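
When launching several games, the commands above can be generated rather than typed by hand. The sketch below only assembles command strings from the flags shown in the examples above; it does not run anything. The helper `run_remote_command` is hypothetical, and the game-to-variant pairs are taken from the PPO lambda comparison table.

```python
import shlex


def run_remote_command(env, spec_file, spec_name, run_name, max_frame=None):
    """Build one `slm-lab run-remote` invocation as an argument list."""
    args = ["slm-lab", "run-remote", "--gpu", "-s", f"env={env}"]
    if max_frame is not None:
        args += ["-s", f"max_frame={max_frame}"]
    args += [spec_file, spec_name, "train", "-n", run_name]
    return args


PPO_SPEC_FILE = "slm_lab/spec/benchmark_arc/ppo/ppo_atari_arc.yaml"

# Best PPO variant per game, from the lambda comparison table.
games = {
    "Breakout": "ppo_atari_lam70_arc",
    "Pong": "ppo_atari_lam85_arc",
    "Qbert": "ppo_atari_arc",
}

for game, spec_name in games.items():
    args = run_remote_command(
        env=f"ALE/{game}-v5",
        spec_file=PPO_SPEC_FILE,
        spec_name=spec_name,
        run_name=f"{game.lower()}-ppo",
        max_frame="1e7",
    )
    print(shlex.join(args))
```

The printed lines still need the environment from `.env` (for example by prefixing `source .env &&`, as in the commands above) before they are run in a shell.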

**Local** - runs on your machine (requires GPU, \~2-3 hours per game):

```bash
slm-lab run -s env=ALE/Breakout-v5 -s max_frame=1e7 slm_lab/spec/benchmark_arc/ppo/ppo_atari_arc.yaml ppo_atari_arc train
```

{% hint style="warning" %}
**GPU required for Atari.** Each A2C or PPO run covers 10M frames and takes 2-3 hours per game on a cloud GPU (L4/A10G); SAC runs cover 2M frames. Local CPU training is not practical. Cloud GPUs via dstack are faster and often cheaper than running on local hardware.
{% endhint %}

### Download and Replay

```bash
# List Atari experiments (requires HF_REPO=SLM-Lab/benchmark in .env)
source .env && slm-lab list | grep atari

# Download a specific game
source .env && slm-lab pull ppo_atari_lam70_arc_breakout

# Replay
slm-lab run slm_lab/spec/benchmark_arc/ppo/ppo_atari_arc.yaml ppo_atari_lam70_arc enjoy@data/ppo_atari_lam70_arc_breakout_*/ppo_atari_lam70_arc_breakout_t0_spec.json
```

***

## Results

| ENV                   | Score     | SPEC\_NAME             | HF Data                                                                                                                                                                             |
| --------------------- | --------- | ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ALE/AirRaid-v5        | 7042.84   | ppo\_atari\_arc        | [ppo\_atari\_arc\_airraid\_2026\_02\_13\_124015](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_airraid_2026_02_13_124015)                          |
|                       | 1832.54   | sac\_atari\_arc        | [sac\_atari\_arc\_airraid\_2026\_02\_17\_104002](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_airraid_2026_02_17_104002)                          |
|                       | 5067      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_airraid\_2026\_02\_01\_082446](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_airraid_2026_02_01_082446)                          |
| ALE/Alien-v5          | 1789.26   | ppo\_atari\_arc        | [ppo\_atari\_arc\_alien\_2026\_02\_13\_124017](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_alien_2026_02_13_124017)                              |
|                       | 833.53    | sac\_atari\_arc        | [sac\_atari\_arc\_alien\_2026\_02\_15\_200940](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_alien_2026_02_15_200940)                              |
|                       | 1488      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_alien\_2026\_02\_01\_000858](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_alien_2026_02_01_000858)                              |
| ALE/Amidar-v5         | 584.28    | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_amidar\_2026\_02\_13\_124155](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_amidar_2026_02_13_124155)               |
|                       | 185.45    | sac\_atari\_arc        | [sac\_atari\_arc\_amidar\_2026\_02\_16\_042529](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_amidar_2026_02_16_042529)                            |
|                       | 330       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_amidar\_2026\_02\_01\_082251](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_amidar_2026_02_01_082251)                            |
| ALE/Assault-v5        | 4448.16   | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_assault\_2026\_02\_13\_124219](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_assault_2026_02_13_124219)             |
|                       | 1009.42   | sac\_atari\_arc        | [sac\_atari\_arc\_assault\_2026\_02\_16\_042532](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_assault_2026_02_16_042532)                          |
|                       | 1646      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_assault\_2026\_02\_01\_082252](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_assault_2026_02_01_082252)                          |
| ALE/Asterix-v5        | 3235.46   | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_asterix\_2026\_02\_13\_124329](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_asterix_2026_02_13_124329)             |
|                       | 1504.44   | sac\_atari\_arc        | [sac\_atari\_arc\_asterix\_2026\_02\_16\_064430](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_asterix_2026_02_16_064430)                          |
|                       | 2712      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_asterix\_2026\_02\_01\_082315](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_asterix_2026_02_01_082315)                          |
| ALE/Asteroids-v5      | 1577.92   | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_asteroids\_2026\_02\_13\_171445](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_asteroids_2026_02_13_171445)         |
|                       | 1203.52   | sac\_atari\_arc        | [sac\_atari\_arc\_asteroids\_2026\_02\_16\_051747](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_asteroids_2026_02_16_051747)                      |
|                       | 2106      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_asteroids\_2026\_02\_01\_082328](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_asteroids_2026_02_01_082328)                      |
| ALE/Atlantis-v5       | 848087.19 | ppo\_atari\_arc        | [ppo\_atari\_arc\_atlantis\_2026\_02\_13\_171349](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_atlantis_2026_02_13_171349)                        |
|                       | 56787.32  | sac\_atari\_arc        | [sac\_atari\_arc\_atlantis\_2026\_02\_17\_105837](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_atlantis_2026_02_17_105837)                        |
|                       | 873365    | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_atlantis\_2026\_02\_01\_082330](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_atlantis_2026_02_01_082330)                        |
| ALE/BankHeist-v5      | 1058.25   | ppo\_atari\_arc        | [ppo\_atari\_arc\_bankheist\_2026\_02\_13\_230416](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_bankheist_2026_02_13_230416)                      |
|                       | 138.43    | sac\_atari\_arc        | [sac\_atari\_arc\_bankheist\_2026\_02\_17\_105306](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_bankheist_2026_02_17_105306)                      |
|                       | 1099      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_bankheist\_2026\_02\_01\_082403](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_bankheist_2026_02_01_082403)                      |
| ALE/BattleZone-v5     | 27176.78  | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_battlezone\_2026\_02\_13\_171436](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_battlezone_2026_02_13_171436)       |
|                       | 6906.47   | sac\_atari\_arc        | [sac\_atari\_arc\_battlezone\_2026\_02\_17\_112313](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_battlezone_2026_02_17_112313)                    |
|                       | 2437      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_battlezone\_2026\_02\_01\_082425](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_battlezone_2026_02_01_082425)                    |
| ALE/BeamRider-v5      | 2761.75   | ppo\_atari\_arc        | [ppo\_atari\_arc\_beamrider\_2026\_02\_13\_171450](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_beamrider_2026_02_13_171450)                      |
|                       | 4061.05   | sac\_atari\_arc        | [sac\_atari\_arc\_beamrider\_2026\_02\_17\_110505](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_beamrider_2026_02_17_110505)                      |
|                       | 2767      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_beamrider\_2026\_02\_01\_000921](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_beamrider_2026_02_01_000921)                      |
| ALE/Berzerk-v5        | 835.46    | ppo\_atari\_arc        | [ppo\_atari\_arc\_berzerk\_2026\_02\_13\_171449](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_berzerk_2026_02_13_171449)                          |
|                       | 313.87    | sac\_atari\_arc        | [sac\_atari\_arc\_berzerk\_2026\_02\_17\_105608](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_berzerk_2026_02_17_105608)                          |
|                       | 439       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_berzerk\_2026\_02\_01\_082540](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_berzerk_2026_02_01_082540)                          |
| ALE/Bowling-v5        | 45.02     | ppo\_atari\_arc        | [ppo\_atari\_arc\_bowling\_2026\_02\_13\_230507](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_bowling_2026_02_13_230507)                          |
|                       | 26.55     | sac\_atari\_arc        | [sac\_atari\_arc\_bowling\_2026\_02\_18\_101223](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_bowling_2026_02_18_101223)                          |
|                       | 23.96     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_bowling\_2026\_02\_01\_082529](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_bowling_2026_02_01_082529)                          |
| ALE/Boxing-v5         | 92.18     | ppo\_atari\_arc        | [ppo\_atari\_arc\_boxing\_2026\_02\_13\_230504](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_boxing_2026_02_13_230504)                            |
|                       | 44.03     | sac\_atari\_arc        | [sac\_atari\_arc\_boxing\_2026\_02\_15\_201228](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_boxing_2026_02_15_201228)                            |
|                       | 1.80      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_boxing\_2026\_02\_01\_082539](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_boxing_2026_02_01_082539)                            |
| ALE/Breakout-v5       | 326.47    | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_breakout\_2026\_02\_13\_230455](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_breakout_2026_02_13_230455)           |
|                       | 20.23     | sac\_atari\_arc        | [sac\_atari\_arc\_breakout\_2026\_02\_15\_201235](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_breakout_2026_02_15_201235)                        |
|                       | 273       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_breakout\_2026\_01\_31\_213610](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_breakout_2026_01_31_213610)                        |
|                       | 4.40      | crossq\_atari          | [crossq\_atari\_breakout\_2026\_02\_25\_030241](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_atari_breakout_2026_02_25_030241)                           |
| ALE/Carnival-v5       | 3912.59   | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_carnival\_2026\_02\_13\_230438](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_carnival_2026_02_13_230438)           |
|                       | 3501.37   | sac\_atari\_arc        | [sac\_atari\_arc\_carnival\_2026\_02\_17\_105834](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_carnival_2026_02_17_105834)                        |
|                       | 2170      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_carnival\_2026\_02\_01\_082726](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_carnival_2026_02_01_082726)                        |
| ALE/Centipede-v5      | 4780.75   | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_centipede\_2026\_02\_13\_230434](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_centipede_2026_02_13_230434)         |
|                       | 2255.45   | sac\_atari\_arc        | [sac\_atari\_arc\_centipede\_2026\_02\_18\_101425](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_centipede_2026_02_18_101425)                      |
|                       | 1382      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_centipede\_2026\_02\_01\_082643](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_centipede_2026_02_01_082643)                      |
| ALE/ChopperCommand-v5 | 5391.30   | ppo\_atari\_arc        | [ppo\_atari\_arc\_choppercommand\_2026\_02\_13\_230448](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_choppercommand_2026_02_13_230448)            |
|                       | 1036.91   | sac\_atari\_arc        | [sac\_atari\_arc\_choppercommand\_2026\_02\_17\_110523](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_choppercommand_2026_02_17_110523)            |
|                       | 2446      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_choppercommand\_2026\_02\_01\_082626](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_choppercommand_2026_02_01_082626)            |
| ALE/CrazyClimber-v5   | 112094.03 | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_crazyclimber\_2026\_02\_13\_230445](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_crazyclimber_2026_02_13_230445)   |
|                       | 75712.12  | sac\_atari\_arc        | [sac\_atari\_arc\_crazyclimber\_2026\_02\_15\_201349](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_crazyclimber_2026_02_15_201349)                |
|                       | 96943     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_crazyclimber\_2026\_02\_01\_082625](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_crazyclimber_2026_02_01_082625)                |
| ALE/Defender-v5       | 47894.69  | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_defender\_2026\_02\_14\_023317](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_defender_2026_02_14_023317)           |
|                       | 4386.79   | sac\_atari\_arc        | [sac\_atari\_arc\_defender\_2026\_02\_18\_101518](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_defender_2026_02_18_101518)                        |
|                       | 33149     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_defender\_2026\_02\_01\_082658](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_defender_2026_02_01_082658)                        |
| ALE/DemonAttack-v5    | 19370.38  | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_demonattack\_2026\_02\_14\_023650](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_demonattack_2026_02_14_023650)     |
|                       | 4555.58   | sac\_atari\_arc        | [sac\_atari\_arc\_demonattack\_2026\_02\_18\_101610](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_demonattack_2026_02_18_101610)                  |
|                       | 2962      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_demonattack\_2026\_02\_01\_082717](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_demonattack_2026_02_01_082717)                  |
| ALE/DoubleDunk-v5     | -3.03     | ppo\_atari\_arc        | [ppo\_atari\_arc\_doubledunk\_2026\_02\_14\_043639](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_doubledunk_2026_02_14_043639)                    |
|                       | -18.65    | sac\_atari\_arc        | [sac\_atari\_arc\_doubledunk\_2026\_02\_17\_160707](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_doubledunk_2026_02_17_160707)                    |
|                       | -1.69     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_doubledunk\_2026\_02\_01\_082901](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_doubledunk_2026_02_01_082901)                    |
| ALE/Enduro-v5         | 986.46    | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_enduro\_2026\_02\_11\_101739](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_enduro_2026_02_11_101739)               |
|                       | 45.80     | sac\_atari\_arc        | [sac\_atari\_arc\_enduro\_2026\_02\_17\_160716](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_enduro_2026_02_17_160716)                            |
|                       | 681       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_enduro\_2026\_02\_01\_001123](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_enduro_2026_02_01_001123)                            |
| ALE/FishingDerby-v5   | 25.71     | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_fishingderby\_2026\_02\_14\_024158](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_fishingderby_2026_02_14_024158)   |
|                       | -75.82    | sac\_atari\_arc        | [sac\_atari\_arc\_fishingderby\_2026\_02\_17\_160848](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_fishingderby_2026_02_17_160848)                |
|                       | -16.38    | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_fishingderby\_2026\_02\_01\_082906](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_fishingderby_2026_02_01_082906)                |
| ALE/Freeway-v5        | 32.42     | ppo\_atari\_arc        | [ppo\_atari\_arc\_freeway\_2026\_02\_14\_023359](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_freeway_2026_02_14_023359)                          |
|                       | 0.00      | sac\_atari\_arc        | [sac\_atari\_arc\_freeway\_2026\_02\_17\_161324](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_freeway_2026_02_17_161324)                          |
|                       | 23.13     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_freeway\_2026\_02\_01\_082931](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_freeway_2026_02_01_082931)                          |
| ALE/Frostbite-v5      | 284.07    | ppo\_atari\_arc        | [ppo\_atari\_arc\_frostbite\_2026\_02\_14\_024247](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_frostbite_2026_02_14_024247)                      |
|                       | 355.80    | sac\_atari\_arc        | [sac\_atari\_arc\_frostbite\_2026\_02\_17\_160759](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_frostbite_2026_02_17_160759)                      |
|                       | 266       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_frostbite\_2026\_02\_01\_082915](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_frostbite_2026_02_01_082915)                      |
| ALE/Gopher-v5         | 6500.38   | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_gopher\_2026\_02\_14\_024237](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_gopher_2026_02_14_024237)               |
|                       | 1608.59   | sac\_atari\_arc        | [sac\_atari\_arc\_gopher\_2026\_02\_17\_161047](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_gopher_2026_02_17_161047)                            |
|                       | 984       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_gopher\_2026\_02\_01\_133323](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_gopher_2026_02_01_133323)                            |
| ALE/Gravitar-v5       | 602.58    | ppo\_atari\_arc        | [ppo\_atari\_arc\_gravitar\_2026\_02\_14\_075743](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_gravitar_2026_02_14_075743)                        |
|                       | 233.02    | sac\_atari\_arc        | [sac\_atari\_arc\_gravitar\_2026\_02\_17\_160858](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_gravitar_2026_02_17_160858)                        |
|                       | 270       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_gravitar\_2026\_02\_01\_133244](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_gravitar_2026_02_01_133244)                        |
| ALE/Hero-v5           | 22477.89  | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_hero\_2026\_02\_15\_232615](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_hero_2026_02_15_232615)                   |
|                       | 4873.09   | sac\_atari\_arc        | [sac\_atari\_arc\_hero\_2026\_02\_17\_161420](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_hero_2026_02_17_161420)                                |
|                       | 18680     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_hero\_2026\_02\_01\_175903](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_hero_2026_02_01_175903)                                |
| ALE/IceHockey-v5      | -4.05     | ppo\_atari\_arc        | [ppo\_atari\_arc\_icehockey\_2026\_02\_14\_231829](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_icehockey_2026_02_14_231829)                      |
|                       | -19.78    | sac\_atari\_arc        | [sac\_atari\_arc\_icehockey\_2026\_02\_18\_101834](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_icehockey_2026_02_18_101834)                      |
|                       | -5.92     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_icehockey\_2026\_02\_01\_175745](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_icehockey_2026_02_01_175745)                      |
| ALE/Jamesbond-v5      | 710.98    | ppo\_atari\_arc        | [ppo\_atari\_arc\_jamesbond\_2026\_02\_14\_080649](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_jamesbond_2026_02_14_080649)                      |
|                       | 328.27    | sac\_atari\_arc        | [sac\_atari\_arc\_jamesbond\_2026\_02\_17\_220305](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_jamesbond_2026_02_17_220305)                      |
|                       | 460       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_jamesbond\_2026\_02\_01\_175945](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_jamesbond_2026_02_01_175945)                      |
| ALE/JourneyEscape-v5  | -1248.98  | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_journeyescape\_2026\_02\_14\_080656](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_journeyescape_2026_02_14_080656) |
|                       | -3268.80  | sac\_atari\_arc        | [sac\_atari\_arc\_journeyescape\_2026\_02\_17\_215843](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_journeyescape_2026_02_17_215843)              |
|                       | -965      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_journeyescape\_2026\_02\_01\_084415](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_journeyescape_2026_02_01_084415)              |
| ALE/Kangaroo-v5       | 10660.35  | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_kangaroo\_2026\_02\_16\_030656](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_kangaroo_2026_02_16_030656)           |
|                       | 2990.74   | sac\_atari\_arc        | [sac\_atari\_arc\_kangaroo\_2026\_02\_17\_220652](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_kangaroo_2026_02_17_220652)                        |
|                       | 322       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_kangaroo\_2026\_02\_01\_084415](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_kangaroo_2026_02_01_084415)                        |
| ALE/Krull-v5          | 7874.33   | ppo\_atari\_arc        | [ppo\_atari\_arc\_krull\_2026\_02\_14\_080657](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_krull_2026_02_14_080657)                              |
|                       | 6630.02   | sac\_atari\_arc        | [sac\_atari\_arc\_krull\_2026\_02\_17\_221656](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_krull_2026_02_17_221656)                              |
|                       | 7519      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_krull\_2026\_02\_01\_084420](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_krull_2026_02_01_084420)                              |
| ALE/KungFuMaster-v5   | 28128.04  | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_kungfumaster\_2026\_02\_14\_080730](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_kungfumaster_2026_02_14_080730)   |
|                       | 9932.72   | sac\_atari\_arc        | [sac\_atari\_arc\_kungfumaster\_2026\_02\_17\_221024](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_kungfumaster_2026_02_17_221024)                |
|                       | 23006     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_kungfumaster\_2026\_02\_01\_085101](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_kungfumaster_2026_02_01_085101)                |
| ALE/MsPacman-v5       | 2330.74   | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_mspacman\_2026\_02\_14\_102435](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_mspacman_2026_02_14_102435)           |
|                       | 1336.96   | sac\_atari\_arc        | [sac\_atari\_arc\_mspacman\_2026\_02\_17\_221523](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_mspacman_2026_02_17_221523)                        |
|                       | 2110      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_mspacman\_2026\_02\_01\_001100](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_mspacman_2026_02_01_001100)                        |
|                       | 327.79    | crossq\_atari          | [crossq\_atari\_mspacman\_2026\_02\_23\_171317](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_atari_mspacman_2026_02_23_171317)                           |
| ALE/NameThisGame-v5   | 6879.23   | ppo\_atari\_arc        | [ppo\_atari\_arc\_namethisgame\_2026\_02\_14\_103319](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_namethisgame_2026_02_14_103319)                |
|                       | 3992.71   | sac\_atari\_arc        | [sac\_atari\_arc\_namethisgame\_2026\_02\_17\_220905](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_namethisgame_2026_02_17_220905)                |
|                       | 5412      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_namethisgame\_2026\_02\_01\_132733](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_namethisgame_2026_02_01_132733)                |
| ALE/Phoenix-v5        | 13923.26  | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_phoenix\_2026\_02\_14\_102636](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_phoenix_2026_02_14_102636)             |
|                       | 3958.46   | sac\_atari\_arc        | [sac\_atari\_arc\_phoenix\_2026\_02\_17\_222102](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_phoenix_2026_02_17_222102)                          |
|                       | 5635      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_phoenix\_2026\_02\_01\_085101](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_phoenix_2026_02_01_085101)                          |
| ALE/Pong-v5           | 16.69     | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_pong\_2026\_02\_14\_103722](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_pong_2026_02_14_103722)                   |
|                       | 10.89     | sac\_atari\_arc        | [sac\_atari\_arc\_pong\_2026\_02\_17\_160429](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_pong_2026_02_17_160429)                                |
|                       | 10.17     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_pong\_2026\_01\_31\_213635](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_pong_2026_01_31_213635)                                |
|                       | -20.59    | crossq\_atari          | [crossq\_atari\_pong\_2026\_02\_23\_171158](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_atari_pong_2026_02_23_171158)                                   |
| ALE/Pooyan-v5         | 5308.66   | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_pooyan\_2026\_02\_14\_114730](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_pooyan_2026_02_14_114730)               |
|                       | 2530.78   | sac\_atari\_arc        | [sac\_atari\_arc\_pooyan\_2026\_02\_17\_220346](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_pooyan_2026_02_17_220346)                            |
|                       | 2997      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_pooyan\_2026\_02\_01\_132748](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_pooyan_2026_02_01_132748)                            |
| ALE/Qbert-v5          | 15460.48  | ppo\_atari\_arc        | [ppo\_atari\_arc\_qbert\_2026\_02\_14\_120409](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_qbert_2026_02_14_120409)                              |
|                       | 3331.98   | sac\_atari\_arc        | [sac\_atari\_arc\_qbert\_2026\_02\_17\_223117](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_qbert_2026_02_17_223117)                              |
|                       | 12619     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_qbert\_2026\_01\_31\_213720](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_qbert_2026_01_31_213720)                              |
|                       | 3189.73   | crossq\_atari          | [crossq\_atari\_qbert\_2026\_02\_25\_030458](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_atari_qbert_2026_02_25_030458)                                 |
| ALE/Riverraid-v5      | 9599.75   | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_riverraid\_2026\_02\_14\_124700](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_riverraid_2026_02_14_124700)         |
|                       | 4744.95   | sac\_atari\_arc        | [sac\_atari\_arc\_riverraid\_2026\_02\_18\_014310](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_riverraid_2026_02_18_014310)                      |
|                       | 6558      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_riverraid\_2026\_02\_01\_132507](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_riverraid_2026_02_01_132507)                      |
| ALE/RoadRunner-v5     | 37980.95  | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_roadrunner\_2026\_02\_14\_124844](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_roadrunner_2026_02_14_124844)       |
|                       | 25975.39  | sac\_atari\_arc        | [sac\_atari\_arc\_roadrunner\_2026\_02\_18\_015052](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_roadrunner_2026_02_18_015052)                    |
|                       | 29810     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_roadrunner\_2026\_02\_01\_132509](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_roadrunner_2026_02_01_132509)                    |
| ALE/Robotank-v5       | 21.04     | ppo\_atari\_arc        | [ppo\_atari\_arc\_robotank\_2026\_02\_14\_124751](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_robotank_2026_02_14_124751)                        |
|                       | 9.01      | sac\_atari\_arc        | [sac\_atari\_arc\_robotank\_2026\_02\_18\_032313](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_robotank_2026_02_18_032313)                        |
|                       | 2.80      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_robotank\_2026\_02\_01\_132434](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_robotank_2026_02_01_132434)                        |
| ALE/Seaquest-v5       | 1775.14   | ppo\_atari\_arc        | [ppo\_atari\_arc\_seaquest\_2026\_02\_11\_095444](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_seaquest_2026_02_11_095444)                        |
|                       | 1565.44   | sac\_atari\_arc        | [sac\_atari\_arc\_seaquest\_2026\_02\_18\_020822](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_seaquest_2026_02_18_020822)                        |
|                       | 850       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_seaquest\_2026\_02\_01\_001001](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_seaquest_2026_02_01_001001)                        |
|                       | 234.63    | crossq\_atari          | [crossq\_atari\_seaquest\_2026\_02\_25\_030441](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_atari_seaquest_2026_02_25_030441)                           |
| ALE/Skiing-v5         | -28217.28 | ppo\_atari\_arc        | [ppo\_atari\_arc\_skiing\_2026\_02\_14\_174807](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_skiing_2026_02_14_174807)                            |
|                       | -17464.22 | sac\_atari\_arc        | [sac\_atari\_arc\_skiing\_2026\_02\_18\_024444](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_skiing_2026_02_18_024444)                            |
|                       | -14235    | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_skiing\_2026\_02\_01\_132451](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_skiing_2026_02_01_132451)                            |
| ALE/Solaris-v5        | 2212.78   | ppo\_atari\_arc        | [ppo\_atari\_arc\_solaris\_2026\_02\_14\_124751](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_solaris_2026_02_14_124751)                          |
|                       | 1803.74   | sac\_atari\_arc        | [sac\_atari\_arc\_solaris\_2026\_02\_18\_031943](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_solaris_2026_02_18_031943)                          |
|                       | 2224      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_solaris\_2026\_02\_01\_212137](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_solaris_2026_02_01_212137)                          |
| ALE/SpaceInvaders-v5  | 892.49    | ppo\_atari\_arc        | [ppo\_atari\_arc\_spaceinvaders\_2026\_02\_14\_131114](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_spaceinvaders_2026_02_14_131114)              |
|                       | 507.33    | sac\_atari\_arc        | [sac\_atari\_arc\_spaceinvaders\_2026\_02\_18\_033139](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_spaceinvaders_2026_02_18_033139)              |
|                       | 784       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_spaceinvaders\_2026\_02\_01\_000950](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_spaceinvaders_2026_02_01_000950)              |
|                       | 404.50    | crossq\_atari          | [crossq\_atari\_spaceinvaders\_2026\_02\_25\_030410](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_atari_spaceinvaders_2026_02_25_030410)                 |
| ALE/StarGunner-v5     | 49328.73  | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_stargunner\_2026\_02\_14\_131149](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_stargunner_2026_02_14_131149)       |
|                       | 4295.97   | sac\_atari\_arc        | [sac\_atari\_arc\_stargunner\_2026\_02\_18\_033151](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_stargunner_2026_02_18_033151)                    |
|                       | 8665      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_stargunner\_2026\_02\_01\_132406](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_stargunner_2026_02_01_132406)                    |
| ALE/Surround-v5       | -4.47     | ppo\_atari\_arc        | [ppo\_atari\_arc\_surround\_2026\_02\_14\_132941](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_surround_2026_02_14_132941)                        |
|                       | -9.87     | sac\_atari\_arc        | [sac\_atari\_arc\_surround\_2026\_02\_18\_034423](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_surround_2026_02_18_034423)                        |
|                       | -9.72     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_surround\_2026\_02\_01\_132215](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_surround_2026_02_01_132215)                        |
| ALE/Tennis-v5         | -12.27    | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_tennis\_2026\_02\_14\_173639](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_tennis_2026_02_14_173639)               |
|                       | -397.44   | sac\_atari\_arc        | [sac\_atari\_arc\_tennis\_2026\_02\_18\_032540](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_tennis_2026_02_18_032540)                            |
|                       | -2873     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_tennis\_2026\_02\_01\_175829](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_tennis_2026_02_01_175829)                            |
| ALE/TimePilot-v5      | 4432.73   | ppo\_atari\_arc        | [ppo\_atari\_arc\_timepilot\_2026\_02\_14\_173642](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_timepilot_2026_02_14_173642)                      |
|                       | 3164.97   | sac\_atari\_arc        | [sac\_atari\_arc\_timepilot\_2026\_02\_18\_102038](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_timepilot_2026_02_18_102038)                      |
|                       | 3376      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_timepilot\_2026\_02\_01\_175930](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_timepilot_2026_02_01_175930)                      |
| ALE/Tutankham-v5      | 210.87    | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_tutankham\_2026\_02\_14\_173722](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_tutankham_2026_02_14_173722)         |
|                       | 147.25    | sac\_atari\_arc        | [sac\_atari\_arc\_tutankham\_2026\_02\_18\_102729](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_tutankham_2026_02_18_102729)                      |
|                       | 167       | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_tutankham\_2026\_02\_01\_132347](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_tutankham_2026_02_01_132347)                      |
| ALE/UpNDown-v5        | 147168.80 | ppo\_atari\_lam85\_arc | [ppo\_atari\_lam85\_arc\_upndown\_2026\_02\_15\_232448](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam85_arc_upndown_2026_02_15_232448)             |
|                       | 3351.89   | sac\_atari\_arc        | [sac\_atari\_arc\_upndown\_2026\_02\_18\_135442](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_upndown_2026_02_18_135442)                          |
|                       | 57099     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_upndown\_2026\_02\_01\_132435](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_upndown_2026_02_01_132435)                          |
| ALE/VideoPinball-v5   | 38370.30  | ppo\_atari\_lam70\_arc | [ppo\_atari\_lam70\_arc\_videopinball\_2026\_02\_14\_173728](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_lam70_arc_videopinball_2026_02_14_173728)   |
|                       | 21088.68  | sac\_atari\_arc        | [sac\_atari\_arc\_videopinball\_2026\_02\_18\_141245](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_videopinball_2026_02_18_141245)                |
|                       | 25310     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_videopinball\_2026\_02\_01\_083457](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_videopinball_2026_02_01_083457)                |
| ALE/WizardOfWor-v5    | 6100.42   | ppo\_atari\_arc        | [ppo\_atari\_arc\_wizardofwor\_2026\_02\_14\_173945](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_wizardofwor_2026_02_14_173945)                  |
|                       | 1241.92   | sac\_atari\_arc        | [sac\_atari\_arc\_wizardofwor\_2026\_02\_18\_140750](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_wizardofwor_2026_02_18_140750)                  |
|                       | 2682      | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_wizardofwor\_2026\_02\_01\_132449](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_wizardofwor_2026_02_01_132449)                  |
| ALE/YarsRevenge-v5    | 12873.91  | ppo\_atari\_arc        | [ppo\_atari\_arc\_yarsrevenge\_2026\_02\_14\_174019](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_yarsrevenge_2026_02_14_174019)                  |
|                       | 13710.18  | sac\_atari\_arc        | [sac\_atari\_arc\_yarsrevenge\_2026\_02\_18\_134921](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_yarsrevenge_2026_02_18_134921)                  |
|                       | 24371     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_yarsrevenge\_2026\_02\_01\_132224](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_yarsrevenge_2026_02_01_132224)                  |
| ALE/Zaxxon-v5         | 9523.49   | ppo\_atari\_arc        | [ppo\_atari\_arc\_zaxxon\_2026\_02\_14\_174806](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_atari_arc_zaxxon_2026_02_14_174806)                            |
|                       | 3205.98   | sac\_atari\_arc        | [sac\_atari\_arc\_zaxxon\_2026\_02\_18\_135502](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_atari_arc_zaxxon_2026_02_18_135502)                            |
|                       | 29.46     | a2c\_gae\_atari\_arc   | [a2c\_gae\_atari\_zaxxon\_2026\_02\_01\_131758](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/a2c_gae_atari_zaxxon_2026_02_01_131758)                            |

**Skipped**: Adventure, MontezumaRevenge, Pitfall, PrivateEye, and Venture (hard exploration); ElevatorAction (deprecated env).
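
Every data link in the table above is the same base URL followed by the run folder name, so the links can be rebuilt from the folder names alone. The sketch below shows that pattern. The helper names `data_url` and `split_run_folder` are hypothetical (they are not part of SLM Lab), and the reading of the folder name as `<prefix>_<game>_<YYYY>_<MM>_<DD>_<HHMMSS>` is an assumption inferred from the rows in the table, not a documented format.

```python
from datetime import datetime

# Base URL copied from the data links in the results table.
BASE_URL = "https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data"


def data_url(run_folder: str) -> str:
    """Return the HuggingFace browse URL for a run folder (hypothetical helper)."""
    return f"{BASE_URL}/{run_folder}"


def split_run_folder(run_folder: str):
    """Split a run folder name into (prefix, game, timestamp).

    Assumes the last four underscore-separated fields are
    YYYY, MM, DD, HHMMSS and the field before them is the game name.
    """
    prefix, game, year, month, day, clock = run_folder.rsplit("_", 5)
    stamp = datetime.strptime(f"{year}{month}{day}{clock}", "%Y%m%d%H%M%S")
    return prefix, game, stamp


if __name__ == "__main__":
    folder = "ppo_atari_arc_qbert_2026_02_14_120409"
    print(data_url(folder))
    print(split_run_folder(folder))
```

Note that the prefix recovered this way is the folder prefix, which may differ from the spec name in the table (the A2C folders, for example, omit the `_arc` suffix).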

### Training Curves

Mean returns (moving average) versus training frames for A2C, PPO, and SAC. Shaded regions show the standard deviation across 4 sessions.
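
As a rough illustration of how a curve with a shaded band of this kind can be produced, the sketch below smooths each session with a trailing moving average and then takes the mean and standard deviation across sessions at each point. This is a minimal sketch, not SLM Lab's plotting code: the function names, the window size, the order of smoothing and averaging, and the use of the population standard deviation are all assumptions.

```python
from statistics import fmean, pstdev


def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(fmean(values[start:i + 1]))
    return out


def mean_and_band(sessions, window):
    """Return (mean, lower, upper) curves across equally long sessions.

    Assumption: each session is smoothed first, then the mean and
    population standard deviation are taken across sessions.
    """
    smoothed = [moving_average(s, window) for s in sessions]
    mean, lower, upper = [], [], []
    for point in zip(*smoothed):
        m = fmean(point)
        sd = pstdev(point)
        mean.append(m)
        lower.append(m - sd)
        upper.append(m + sd)
    return mean, lower, upper


if __name__ == "__main__":
    # Four toy sessions standing in for the 4 sessions per trial (made-up numbers).
    toy_sessions = [
        [0, 10, 20, 30],
        [5, 15, 25, 35],
        [0, 5, 10, 15],
        [10, 20, 30, 40],
    ]
    mean, lower, upper = mean_and_band(toy_sessions, window=2)
    print(mean)
    print(lower)
    print(upper)
```

The `lower` and `upper` lists correspond to the edges of the shaded region, and `mean` to the solid line.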

|                                                                                                                                                                           |                                                                                                                                                                             |                                                                                                                                                                         |
| :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|       ![AirRaid](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/AirRaid-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |          ![Alien](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Alien-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)          |       ![Amidar](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Amidar-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |
|       ![Assault](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Assault-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |        ![Asterix](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Asterix-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)        |    ![Asteroids](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Asteroids-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)    |
|      ![Atlantis](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Atlantis-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)      |      ![BankHeist](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/BankHeist-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)      |   ![BattleZone](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/BattleZone-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)   |
|     ![BeamRider](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/BeamRider-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)     |        ![Berzerk](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Berzerk-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)        |      ![Bowling](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Bowling-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)      |
|        ![Boxing](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Boxing-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)        |       ![Breakout](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Breakout-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |     ![Carnival](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Carnival-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)     |
|     ![Centipede](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Centipede-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)     | ![ChopperCommand](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/ChopperCommand-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218) | ![CrazyClimber](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/CrazyClimber-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218) |
|      ![Defender](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Defender-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)      |    ![DemonAttack](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/DemonAttack-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)    |   ![DoubleDunk](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/DoubleDunk-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)   |
|        ![Enduro](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Enduro-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)        |   ![FishingDerby](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/FishingDerby-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)   |      ![Freeway](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Freeway-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)      |
|     ![Frostbite](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Frostbite-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)     |         ![Gopher](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Gopher-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)         |     ![Gravitar](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Gravitar-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)     |
|          ![Hero](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Hero-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)          |      ![IceHockey](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/IceHockey-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)      |    ![Jamesbond](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Jamesbond-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)    |
| ![JourneyEscape](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/JourneyEscape-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218) |       ![Kangaroo](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Kangaroo-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |        ![Krull](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Krull-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)        |
|  ![KungFuMaster](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/KungFuMaster-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)  |       ![MsPacman](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/MsPacman-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       | ![NameThisGame](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/NameThisGame-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218) |
|       ![Phoenix](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Phoenix-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |           ![Pong](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Pong-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)           |       ![Pooyan](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Pooyan-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |
|         ![Qbert](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Qbert-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)         |      ![Riverraid](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Riverraid-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)      |   ![RoadRunner](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/RoadRunner-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)   |
|      ![Robotank](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Robotank-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)      |       ![Seaquest](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Seaquest-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |       ![Skiing](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Skiing-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |
|       ![Solaris](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Solaris-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |  ![SpaceInvaders](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/SpaceInvaders-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)  |   ![StarGunner](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/StarGunner-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)   |
|      ![Surround](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Surround-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)      |         ![Tennis](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Tennis-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)         |    ![TimePilot](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/TimePilot-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)    |
|     ![Tutankham](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Tutankham-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)     |        ![UpNDown](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/UpNDown-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)        | ![VideoPinball](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/VideoPinball-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218) |
|   ![WizardOfWor](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/WizardOfWor-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)   |    ![YarsRevenge](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/YarsRevenge-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)    |       ![Zaxxon](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Zaxxon-v5_multi_trial_graph_mean_returns_ma_vs_frames.png?v=20260218)       |

***

## Historical Results (v4)

<details>

<summary>OpenAI Gym Atari Results (v4) - click to expand</summary>

{% hint style="warning" %}
**Deprecated Environments:** These v4 results used OpenAI Gym `NoFrameskip-v4` environments, which have no sticky actions. Gymnasium ALE v5 environments are harder because they enable sticky actions (`repeat_action_probability=0.25`), so v4 and v5 results are not directly comparable.
{% endhint %}

* [Upload PR #427](https://github.com/kengz/SLM-Lab/pull/427)
* [Google Drive data: DQN](https://drive.google.com/file/d/1taFdNmrL535zJ4V7wRNORwkH_mgSoiYz/view?usp=sharing)
* [Google Drive data: DDQN+PER](https://drive.google.com/file/d/1PrMn-qvh51szKm-AbFdBphXDicDoRO0d/view?usp=sharing)
* [Google Drive data: A2C (GAE)](https://drive.google.com/file/d/10T7ehim0cGfWxWZkMHjNG5m1sph0OuHb/view?usp=sharing)
* [Google Drive data: A2C (n-step)](https://drive.google.com/file/d/17v_PVkxucFtVzm2MW9nWAQv7mn76Ukc3/view?usp=sharing)
* [Google Drive data: PPO](https://drive.google.com/file/d/1CQdF_jBZKNL58cDIMvIlvLLY-LPj01nv/view?usp=sharing)
* [Google Drive data: all Atari Graphs](https://drive.google.com/file/d/11g9pC-MEzIuRYOoqvyqIxn4GtXpIvdau/view?usp=sharing)

|      Env. \ Alg. |   DQN   |  DDQN+PER  |  A2C (GAE)  | A2C (n-step) |     PPO    |
| ---------------: | :-----: | :--------: | :---------: | :----------: | :--------: |
|        Adventure |  -0.94  |    -0.92   |    -0.77    |     -0.85    |  **-0.3**  |
|          AirRaid |   1876  |    3974    |   **4202**  |     3557     |    4028    |
|            Alien |   822   |    1574    |     1519    |   **1627**   |    1413    |
|           Amidar |  90.95  |     431    |     577     |      418     |   **795**  |
|          Assault |   1392  |    2567    |     3366    |     3312     |  **3619**  |
|          Asterix |   1253  |  **6866**  |     5559    |     5223     |    6132    |
|        Asteroids |   439   |     426    |   **2951**  |     2147     |    2186    |
|         Atlantis |  68679  |   644810   | **2747371** |    2259733   |   2148077  |
|        BankHeist |   131   |     623    |     855     |     1170     |  **1183**  |
|       BattleZone |   6564  |    6395    |     4336    |     4533     |  **13649** |
|        BeamRider |   2799  |  **5870**  |     2659    |     4139     |    4299    |
|          Berzerk |   319   |     401    |   **1073**  |      763     |     860    |
|          Bowling |  30.29  |  **39.5**  |    24.51    |     23.75    |    31.64   |
|           Boxing |  72.11  |    90.98   |     1.57    |     1.26     |  **96.53** |
|         Breakout |  80.88  |     182    |     377     |      398     |   **443**  |
|         Carnival |   4280  |  **4773**  |     2473    |     1827     |    4566    |
|        Centipede |   1899  |    2153    |     3909    |     4202     |  **5003**  |
|   ChopperCommand |   1083  |  **4020**  |     3043    |     1280     |    3357    |
|     CrazyClimber |  46984  |    88814   |    106256   |    109998    | **116820** |
|         Defender |  281999 |   313018   |  **665609** |    657823    |   534639   |
|      DemonAttack |   1705  |    19856   |    23779    |     19615    | **121172** |
|       DoubleDunk |  -21.44 |   -22.38   |  **-5.15**  |     -13.3    |    -6.01   |
|   ElevatorAction |  32.62  |    17.91   |   **9966**  |     8818     |    6471    |
|           Enduro |   437   |     959    |     787     |      0.0     |  **1926**  |
|     FishingDerby |  -88.14 |    -1.7    |    16.54    |     1.65     |  **36.03** |
|          Freeway |  24.46  |    30.49   |    30.97    |      0.0     |  **32.11** |
|        Frostbite |   98.8  |  **2497**  |     277     |      261     |    1062    |
|           Gopher |   1095  |  **7562**  |     929     |     1545     |    2933    |
|         Gravitar |  87.34  |     258    |     313     |    **433**   |     223    |
|             Hero |   1051  |    12579   |    16502    |   **19322**  |    17412   |
|        IceHockey |  -14.96 |   -14.24   |  **-5.79**  |     -6.06    |    -6.43   |
|        Jamesbond |  44.87  |   **702**  |     521     |      453     |     561    |
|    JourneyEscape |  -4818  |    -2003   |   **-921**  |     -2032    |    -1094   |
|         Kangaroo |   1965  |  **8897**  |    67.62    |      554     |    4989    |
|            Krull |   5522  |    6650    |     7785    |     6642     |  **8477**  |
|     KungFuMaster |   2288  |    16547   |    31199    |     25554    |  **34523** |
| MontezumaRevenge |   0.0   |    0.02    |     0.08    |     0.19     |  **1.08**  |
|         MsPacman |   1175  |    2215    |     1965    |     2158     |  **2350**  |
|     NameThisGame |   3915  |    4474    |     5178    |     5795     |  **6386**  |
|          Phoenix |   2909  |    8179    |    16345    |     13586    |  **30504** |
|          Pitfall |  -68.83 |   -73.65   |     -101    |  **-31.13**  |   -35.93   |
|             Pong |  18.48  |    20.5    |    19.31    |     19.56    |  **20.58** |
|           Pooyan |   1958  |    2741    |     2862    |     2531     |  **6799**  |
|       PrivateEye | **784** |     303    |    93.22    |     78.07    |    50.12   |
|            Qbert |   5494  |    11426   |    12405    |   **13590**  |    13460   |
|        Riverraid |   953   |  **10492** |     8308    |     7565     |    9636    |
|       RoadRunner |  15237  |    29047   |    30152    |     31030    |  **32956** |
|         Robotank |   3.43  |  **9.05**  |     2.98    |     2.27     |    2.27    |
|         Seaquest |   1185  |  **4405**  |     1070    |     1684     |    1715    |
|           Skiing |  -14094 | **-12883** |    -19481   |    -14234    |   -24713   |
|          Solaris |   612   |    1396    |     2115    |   **2236**   |    1892    |
|    SpaceInvaders |   451   |     670    |     733     |      750     |   **797**  |
|       StarGunner |   3565  |    38238   |    44816    |     48410    |  **60579** |
|           Tennis |  -23.78 | **-10.33** |    -22.42   |    -19.06    |   -11.52   |
|        TimePilot |   2819  |    1884    |     3331    |     3440     |  **4398**  |
|        Tutankham |  35.03  |     159    |     161     |      175     |   **211**  |
|          UpNDown |   2043  |    11632   |    89769    |     18878    | **262208** |
|          Venture |   4.56  |    9.61    |     0.0     |      0.0     |  **11.84** |
|     VideoPinball |   8056  |  **79730** |    35371    |     40423    |    58096   |
|      WizardOfWor |   869   |     328    |     1516    |     1247     |  **4283**  |
|      YarsRevenge |   5816  |    15698   |  **27097**  |     11742    |    10114   |
|           Zaxxon |   442   |    54.28   |    64.72    |     24.7     |   **641**  |

> The table above reports results for 62 Atari games. All agents were trained for 10M frames (40M including skipped frames). Each reported result is the episode score at the end of training: the mean over the last 100 evaluation checkpoints, where each checkpoint is itself averaged over 4 Sessions.
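
The averaging described in the note above can be sketched as follows. This is a minimal illustration, not SLM Lab's actual implementation: the function name `reported_score`, the `session_returns` layout (one list of evaluation returns per Session), and the frame-skip constant are assumptions made for the example.

```python
from statistics import fmean

# Assumption: a frame skip of 4 is what turns 10M agent frames into 40M emulator frames.
AGENT_FRAMES = 10_000_000
FRAME_SKIP = 4
EMULATOR_FRAMES = AGENT_FRAMES * FRAME_SKIP  # 40_000_000


def reported_score(session_returns, window=100):
    """Hypothetical helper: collapse per-Session evaluation returns into one score.

    session_returns: list of lists, one inner list per Session, each holding
    the evaluation return at every checkpoint (all the same length).
    """
    # Step 1: average across Sessions at each checkpoint.
    per_checkpoint = [fmean(values) for values in zip(*session_returns)]
    # Step 2: average over the last `window` checkpoints.
    return fmean(per_checkpoint[-window:])


# Toy data: 4 Sessions, 150 checkpoints each, returns rising by 1 per checkpoint.
toy_sessions = [[checkpoint + offset for checkpoint in range(150)] for offset in range(4)]
score = reported_score(toy_sessions)
```

With the toy data, only checkpoints 50 through 149 contribute to `score`, so early-training performance does not affect the reported number.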

</details>

