# Public Benchmark Data

## Overview

All SLM Lab benchmark results are publicly available on HuggingFace:

{% embed url="https://huggingface.co/datasets/SLM-Lab/benchmark" %}

Each experiment includes:

* **Trained models** — PyTorch checkpoints (`*_ckpt-best_net_model.pt`)
* **Training curves** — Full learning history (`*_session_df_{train,eval}.csv`)
* **Specs** — Exact configurations for reproduction (`*_spec.yaml`)
* **Graphs** — Plotly visualizations (PNG and HTML)

## Algorithm Coverage

The table below marks which algorithms are benchmarked in each environment category (✓ = benchmarked).

| Algorithm     | Classic Control | Box2D | MuJoCo | Atari | Playground |
| ------------- | :-------------: | :---: | :----: | :---: | :--------: |
| **REINFORCE** |        ✓        |       |        |       |            |
| **SARSA**     |        ✓        |       |        |       |            |
| **DQN**       |        ✓        |   ✓   |        |       |            |
| **DDQN+PER**  |        ✓        |   ✓   |        |       |            |
| **A2C**       |        ✓        |   ✓   |        |   ✓   |            |
| **PPO**       |        ✓        |   ✓   |    ✓   |   ✓   |      ✓     |
| **SAC**       |        ✓        |   ✓   |    ✓   |   ✓   |            |
| **CrossQ**    |        ✓        |   ✓   |    ✓   |   ✓   |            |

| Category            | Environments                                                         | Algorithms                                             |
| ------------------- | -------------------------------------------------------------------- | ------------------------------------------------------ |
| **Classic Control** | CartPole-v1, Acrobot-v1, Pendulum-v1                                 | REINFORCE, SARSA, DQN, DDQN+PER, A2C, PPO, SAC, CrossQ |
| **Box2D**           | LunarLander-v3 (discrete & continuous)                               | DQN, DDQN+PER, A2C, PPO, SAC, CrossQ                   |
| **MuJoCo**          | 11 environments (Hopper, HalfCheetah, Humanoid, etc.)                | PPO, SAC, CrossQ                                       |
| **Atari**           | 57 games (6 hard-exploration skipped)                                | A2C, PPO, SAC, CrossQ                                  |
| **Playground**      | 54 MuJoCo Playground environments (DM Control, Locomotion, Manipulation) | PPO                                                    |

### Detailed Results

| Benchmark       | Page                                                                       | Environments                                                     |
| --------------- | -------------------------------------------------------------------------- | ---------------------------------------------------------------- |
| Classic + Box2D | [Discrete Benchmark](/slm-lab/benchmark-results/discrete-benchmark.md)     | CartPole, Acrobot, Pendulum, LunarLander                         |
| MuJoCo          | [Continuous Benchmark](/slm-lab/benchmark-results/continuous-benchmark.md) | Hopper, HalfCheetah, Humanoid, etc.                              |
| Atari           | [Atari Benchmark](/slm-lab/benchmark-results/atari-benchmark.md)           | 57 games                                                         |
| Playground      | [Playground Benchmark](/slm-lab/benchmark-results/playground-benchmark.md) | 54 MuJoCo Playground envs (DM Control, Locomotion, Manipulation) |

## Accessing Results

### List and Download

```bash
# Set HuggingFace repo
echo 'HF_REPO=SLM-Lab/benchmark' >> .env

# List all experiments
source .env && slm-lab list

# Download an experiment
source .env && slm-lab pull ppo_hopper
```

{% hint style="info" %}
**No token needed for read-only access.** `HF_TOKEN` is only required for uploading to your own repo.
{% endhint %}

### Replay a Trained Agent

```bash
slm-lab run SPEC_FILE SPEC_NAME enjoy@data/FOLDER/SPEC_NAME_t0_spec.yaml
```

### Browse on HuggingFace

Example links to experiment folders:

* [ppo\_cartpole\_arc\_2026\_02\_11](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_cartpole_arc_2026_02_11_144029)
* [ppo\_hopper\_2026\_01\_31](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_hopper_2026_01_31_105438)
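The folder names in these URLs appear to follow a `{spec_name}_{YYYY_MM_DD}_{HHMMSS}` pattern, and the replay and reproducibility commands on this page load `{spec_name}_t0_spec.yaml` from inside that folder. The sketch below assumes that pattern holds for every experiment; `parse_folder`, `hf_tree_url`, and `enjoy_arg` are hypothetical helpers, not part of SLM Lab.

```python
import re

# Assumed pattern {spec_name}_{YYYY_MM_DD}_{HHMMSS}, inferred from the example links.
FOLDER_RE = re.compile(r"^(?P<spec_name>.+)_(?P<date>\d{4}_\d{2}_\d{2})_(?P<time>\d{6})$")


def parse_folder(folder):
    """Split an experiment folder name into (spec_name, date, time)."""
    match = FOLDER_RE.match(folder)
    if match is None:
        raise ValueError(f"unexpected folder name: {folder!r}")
    return match.group("spec_name"), match.group("date"), match.group("time")


def hf_tree_url(folder, repo="SLM-Lab/benchmark"):
    """Browse URL for an experiment folder in a HuggingFace dataset repo."""
    return f"https://huggingface.co/datasets/{repo}/tree/main/data/{folder}"


def enjoy_arg(folder):
    """Replay argument for `slm-lab run ... enjoy@...`, using the trial-0 spec."""
    spec_name, _, _ = parse_folder(folder)
    return f"enjoy@data/{folder}/{spec_name}_t0_spec.yaml"


print(enjoy_arg("ppo_cartpole_arc_2026_02_11_144029"))
```

For example, `enjoy_arg("ppo_cartpole_arc_2026_02_11_144029")` yields `enjoy@data/ppo_cartpole_arc_2026_02_11_144029/ppo_cartpole_arc_t0_spec.yaml`, which has the same shape as the `enjoy@data/FOLDER/SPEC_NAME_t0_spec.yaml` argument used for replay.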

## Methodology

### How Scores Are Reported

1. **Session** = One complete training run
2. **Trial** = 4 sessions with different random seeds
3. **Score** = Moving average of `total_reward` over the last 100 checkpoints, taken at the end of training (`total_reward_ma`)

The trial score is the mean of the 4 session scores.
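The scoring rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SLM Lab's code: it assumes each session yields one `total_reward` value per checkpoint, that the final moving average is the mean of the last 100 of those values, and that a `*_session_df_eval.csv` file has a `total_reward_ma` column. All function names are hypothetical.

```python
import csv
import io
from statistics import mean

MA_WINDOW = 100  # number of checkpoints in the moving average


def session_score(checkpoint_rewards, window=MA_WINDOW):
    """Final moving average: mean of the last `window` checkpoint rewards."""
    tail = checkpoint_rewards[-window:]
    return sum(tail) / len(tail)


def trial_score(sessions, window=MA_WINDOW):
    """Trial score: mean of the per-session scores (4 seeds in the benchmark)."""
    return mean(session_score(rewards, window) for rewards in sessions)


def final_ma_from_csv(csv_text, column="total_reward_ma"):
    """Last value of the moving-average column in a session CSV (column name assumed)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return float(rows[-1][column])


if __name__ == "__main__":
    # Four synthetic sessions whose last 100 checkpoints average 100, 200, 300, 400.
    sessions = [[float(r)] * 150 for r in (100, 200, 300, 400)]
    print(trial_score(sessions))  # 250.0
```

Averaging the final moving average per session, rather than pooling raw checkpoints, weights each seed equally regardless of how many checkpoints it logged.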

### Environment Settings

Standardized settings for fair comparison:

| Category        | num\_envs | max\_frame | log\_frequency | ASHA grace\_period |
| --------------- | --------- | ---------- | -------------- | ------------------ |
| Classic Control | 4         | 2e5-3e5    | 500            | 1e4                |
| Box2D           | 8         | 3e5        | 1000           | 5e4                |
| MuJoCo          | 16        | 1e6-10e6   | 1e4            | 1e5-1e6            |
| Atari           | 16        | 10e6       | 1e4            | 5e5                |
| Playground      | 2048      | 100e6      | 1e4            | —                  |
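These settings determine how many checkpoints a run logs and how much of the run the 100-checkpoint moving average spans. The sketch below does that arithmetic, assuming `log_frequency` counts environment frames between checkpoints and using the upper end of each `max_frame` range; treat it as a back-of-the-envelope aid, not a description of SLM Lab internals.

```python
MA_WINDOW = 100  # checkpoints averaged into total_reward_ma

# (max_frame, log_frequency) from the table above; ranges use their upper bound.
SETTINGS = {
    "Classic Control": (3e5, 500),
    "Box2D": (3e5, 1000),
    "MuJoCo": (10e6, 1e4),
    "Atari": (10e6, 1e4),
    "Playground": (100e6, 1e4),
}


def checkpoint_count(max_frame, log_frequency):
    """Number of checkpoints logged over a full run."""
    return int(max_frame // log_frequency)


def ma_span_frames(log_frequency, window=MA_WINDOW):
    """Frames covered by the final moving-average window."""
    return int(window * log_frequency)


for category, (max_frame, log_freq) in SETTINGS.items():
    n = checkpoint_count(max_frame, log_freq)
    span = ma_span_frames(log_freq)
    print(f"{category:16s} {n:6d} checkpoints, MA spans last {span:,} frames")
```

For Classic Control this works out to 600 checkpoints, so under these assumptions the reported score averages roughly the last 5e4 frames of a 3e5-frame run.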

### Hardware Requirements

| Category        | GPU Required | Typical Runtime | Recommendation         |
| --------------- | ------------ | --------------- | ---------------------- |
| Classic Control | No           | Minutes         | Local CPU is fine      |
| Box2D           | Optional     | 10-30 min       | Local or remote        |
| MuJoCo          | Yes          | 1-4 hours       | Use `run-remote --gpu` |
| Atari           | Yes          | 2-3 hours       | Use `run-remote --gpu` |
| Playground      | Yes (CUDA)   | 1-6 hours       | Use `run-remote --gpu` |

{% hint style="info" %}
**Cloud GPUs recommended for MuJoCo, Atari, and Playground.** Cloud L4/A10G GPUs via [dstack](https://dstack.ai) are faster and often cheaper than local training. See [Remote Training](/slm-lab/using-slm-lab/remote-training.md) for setup.
{% endhint %}

## Contributing Benchmarks

Follow these steps when adding or updating benchmark results.

### 1. Audit Spec Settings

Ensure your `spec.yaml` matches the **Settings** line in the [benchmark tables](https://github.com/kengz/SLM-Lab/blob/master/docs/BENCHMARKS.md). Example: `max_frame 3e5 | num_envs 4 | max_session 4 | log_frequency 500`.
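One way to audit is to parse the **Settings** line and compare it with the values in your spec. The sketch below assumes you have already pulled those values out of `spec.yaml` into a flat dictionary (where they sit in the YAML is not covered here); `parse_settings_line` and `audit` are hypothetical helpers.

```python
def parse_settings_line(line):
    """Parse 'max_frame 3e5 | num_envs 4 | ...' into {name: float}."""
    settings = {}
    for part in line.split("|"):
        name, value = part.split()
        settings[name] = float(value)
    return settings


def audit(spec_values, settings_line):
    """Return {name: (spec_value, expected)} for every setting that differs."""
    expected = parse_settings_line(settings_line)
    return {
        name: (spec_values.get(name), want)
        for name, want in expected.items()
        if spec_values.get(name) != want
    }


line = "max_frame 3e5 | num_envs 4 | max_session 4 | log_frequency 500"
spec_values = {"max_frame": 200000, "num_envs": 4, "max_session": 4}
print(audit(spec_values, line))
# {'max_frame': (200000, 300000.0), 'log_frequency': (None, 500.0)}
```

An empty result means every listed setting matches; a `None` spec value means the setting is missing from the values you extracted.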

### 2. Run Benchmark and Commit Specs

```bash
# Local (Classic Control: minutes)
slm-lab run SPEC_FILE SPEC_NAME train

# Remote (MuJoCo, Atari: hours)
source .env && slm-lab run-remote --gpu SPEC_FILE SPEC_NAME train -n NAME
```

Always commit the `spec.yaml` file to the repo after a successful run.

### 3. Record Scores and Plots

* Extract `total_reward_ma` from logs (`trial_metrics`)
* Add HuggingFace folder link to the benchmark table
* Generate plots:

```bash
slm-lab plot -t "CartPole-v1" -f ppo_cartpole_2026...,dqn_cartpole_2026...
```

### Hyperparameter Search

When an algorithm fails to reach its target score, run a hyperparameter search before the final validation:

```bash
slm-lab run SPEC_FILE SPEC_NAME search                                        # local
source .env && slm-lab run-remote --gpu SPEC_FILE SPEC_NAME search -n NAME    # remote
```

| Stage    | Mode     | Config                                      | Purpose                              |
| -------- | -------- | ------------------------------------------- | ------------------------------------ |
| ASHA     | `search` | `max_session=1`, `search_scheduler` enabled | Wide exploration with early stopping |
| Multi    | `search` | `max_session=4`, NO `search_scheduler`      | Robust validation with averaging     |
| Validate | `train`  | Final spec                                  | Confirmation run                     |

Search budget: \~3-4 trials per searched dimension (8 trials covers 2-3 dimensions; 16 trials covers 3-4).
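ASHA saves compute by ranking trials at increasing frame budgets (rungs) and promoting only the top fraction to longer training. The sketch below is a simplified, synchronous successive-halving loop that illustrates that promotion rule; real ASHA promotes asynchronously as trials report, and this is not SLM Lab's scheduler. `score_at`, `max_t`, and `eta` are illustrative parameters; `grace_period` plays the role of the first rung.

```python
def successive_halving(trials, score_at, grace_period, max_t, eta=2):
    """Keep the top 1/eta of trials at each rung, multiplying the budget by eta.

    score_at(trial, frames) returns the trial's score after `frames` frames.
    """
    budget = grace_period
    survivors = list(trials)
    while budget <= max_t and len(survivors) > 1:
        ranked = sorted(survivors, key=lambda t: score_at(t, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors


# Eight toy trials whose quality equals their id; the best one survives.
print(successive_halving(range(8), lambda t, frames: t, grace_period=1e4, max_t=1e5))
# [7]
```

A trial cut at the first rung has consumed only `grace_period` frames, which is where the early-stopping savings come from.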

{% hint style="warning" %}
Only use final validation runs (not search results) for benchmark tables. Search is for hyperparameter discovery; validation confirms with committed specs.
{% endhint %}

## Reproducibility

Every experiment can be exactly reproduced:

```bash
# 1. Download the experiment
source .env && slm-lab pull ppo_hopper

# 2. Check the spec for settings and git SHA
cat data/ppo_hopper_arc_*/ppo_hopper_arc_t0_spec.yaml

# 3. Replay the trained model
slm-lab run slm_lab/spec/benchmark_arc/ppo/ppo_hopper_arc.yaml ppo_hopper_arc enjoy@data/ppo_hopper_arc_*/ppo_hopper_arc_t0_spec.yaml
```

To get the exact code version, check out the git SHA recorded in the spec file.
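To script that step, the sketch below extracts a commit hash from the saved spec text. It assumes the SHA is stored under a key named `git_sha`; check your spec file for the actual key before relying on it. `find_git_sha` is a hypothetical helper; pass its result to `git checkout` in your SLM Lab clone.

```python
import re

# Assumption: the spec records the commit as `git_sha: <hex>` (key name not confirmed).
GIT_SHA_RE = re.compile(r"^\s*git_sha:\s*['\"]?([0-9a-f]{7,40})", re.MULTILINE)


def find_git_sha(spec_text):
    """Return the first git SHA found in the spec text, or None."""
    match = GIT_SHA_RE.search(spec_text)
    return match.group(1) if match else None


# Toy spec text for illustration only.
spec_text = """
meta:
  git_sha: 3f2a9c1d
  max_session: 4
"""
print(find_git_sha(spec_text))  # 3f2a9c1d
```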

## Using Your Own HuggingFace Repo

```bash
# .env
HF_TOKEN=hf_xxxxxxxxxxxx
HF_REPO=your-username/your-repo
```

```bash
source .env && slm-lab push data/my_experiment_2026_01_30_221924
```

## Historical Data (v4)

v4 benchmarks used OpenAI Gym and Roboschool (both now deprecated). The data remains available for historical reference:

* [All v4 benchmark data (Google Drive)](https://drive.google.com/drive/folders/1fUB3jRvXr8ySZMSW5w0GPWJe3QmM7tb3?usp=sharing)

{% hint style="warning" %}
**Not directly comparable:** v4 and v5 use different environment versions with different reward scales and physics. See [Changelog](/slm-lab/resources/changelog.md) for details.
{% endhint %}

## Terminology

| Abbreviation | Meaning                                   |
| ------------ | ----------------------------------------- |
| A2C          | Advantage Actor-Critic                    |
| ASHA         | Asynchronous Successive Halving Algorithm |
| CrossQ       | Cross-batch Normalized Q-learning         |
| DDQN         | Double Deep Q-Network                     |
| DQN          | Deep Q-Network                            |
| GAE          | Generalized Advantage Estimation          |
| MA           | Moving Average                            |
| MJWarp       | MuJoCo Warp (GPU physics via NVIDIA Warp) |
| PER          | Prioritized Experience Replay             |
| PPO          | Proximal Policy Optimization              |
| SAC          | Soft Actor-Critic                         |

