# Continuous Benchmark

## MuJoCo Benchmark Results

SLM Lab v5.2 validates PPO, SAC, and CrossQ on [Gymnasium MuJoCo environments](https://gymnasium.farama.org/environments/mujoco/). MuJoCo (Multi-Joint dynamics with Contact) provides physics simulation for continuous control tasks ranging from simple pendulums to complex humanoid locomotion.

Results below are from January–March 2026 benchmark runs using MuJoCo v5 environments.

All trained models and metrics are publicly available on [HuggingFace](https://huggingface.co/datasets/SLM-Lab/benchmark).

### Methodology

Results show **Trial-level** performance:

1. **Trial** = 4 Sessions with different random seeds
2. **Session** = One complete training run
3. **Score** = Final 100-checkpoint moving average (`total_reward_ma`)

The trial score is the mean across the 4 sessions, which averages out seed-to-seed variance.
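
The scoring rule above can be sketched in a few lines. This is an illustration only, not SLM Lab's own code: it assumes each session is available as a plain list of per-checkpoint returns, and the helper names `session_score` and `trial_score` are hypothetical. The data is synthetic.

```python
from statistics import mean


def session_score(checkpoint_returns, window=100):
    """Last value of a `window`-checkpoint moving average, i.e. the mean of the final `window` returns."""
    return mean(checkpoint_returns[-window:])


def trial_score(sessions, window=100):
    """Mean of the per-session scores (one session per random seed)."""
    return mean(session_score(returns, window) for returns in sessions)


# Synthetic data: 4 sessions of 150 checkpoints, each with a constant return.
sessions = [[float(seed)] * 150 for seed in range(4)]
score = trial_score(sessions)
print(score)
```

With real runs, the per-session value corresponds to the final `total_reward_ma` entry, and the trial score is the average of those four numbers.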

### Standardized Settings

| Category | num\_envs | max\_frame | log\_frequency | ASHA grace\_period |
| -------- | --------- | ---------- | -------------- | ------------------ |
| MuJoCo   | 16        | 1e6-10e6   | 10000          | 1e5-1e6            |

The `grace_period` is the minimum frames before ASHA early stopping can terminate underperforming trials.
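
To make the role of `grace_period` concrete, the sketch below lists the frame counts at which an ASHA-style scheduler would compare trials. It assumes the usual ASHA rung spacing (each rung is the previous one multiplied by a reduction factor) and a reduction factor of 4, which is the Ray Tune default; the factor actually used by these specs is not stated here, and `asha_rungs` is a hypothetical helper.

```python
def asha_rungs(grace_period, max_frame, reduction_factor=4):
    """Frame milestones at which underperforming trials may be stopped.

    No trial can be stopped before the first rung, which equals grace_period.
    reduction_factor=4 is an assumption, not a value taken from the specs.
    """
    rungs = []
    milestone = grace_period
    while milestone < max_frame:
        rungs.append(int(milestone))
        milestone *= reduction_factor
    return rungs


print(asha_rungs(1e5, 4e6))
print(asha_rungs(1e6, 10e6))
```

A larger `grace_period` gives slow-starting configurations more frames before they can be cut, at the cost of spending more compute on weak ones.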

**Algorithms**: PPO, SAC, and CrossQ. Network: MLP \[256,256], orthogonal init. PPO uses tanh activation; SAC and CrossQ use relu. CrossQ uses Batch Renormalization in critics (no target networks).

**Note on frame budgets**: SAC uses a higher update-to-data (UTD) ratio, which makes it more sample-efficient than PPO but slower per frame, so its frame budgets are smaller (1-4M frames vs PPO's 4-10M). CrossQ uses UTD=1 (like PPO) and eliminates target network overhead, reaching \~700 fps; its frame budgets (3-7.5M) reflect this speed advantage. Scores may still be improving at cutoff.

{% hint style="warning" %}
**v5 vs v4 Difficulty:** Gymnasium MuJoCo v5 environments are significantly harder than v4. Key changes include:

* Updated physics engine with more accurate contact dynamics
* Revised reward functions with stricter success criteria
* Termination conditions more closely match real-world failure modes

Expect **10-30% lower scores** compared to v4 benchmarks. See [Gymnasium Migration Guide](https://gymnasium.farama.org/content/migration-guide/) for details.
{% endhint %}

### Spec Files

One file per algorithm; all environments are defined in it via YAML anchors:

* **PPO**: [ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml)
* **SAC**: [sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml)
* **CrossQ**: [crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)

**Spec Variants**: Each file has a base config (shared via YAML anchors) with per-env overrides:

| SPEC\_NAME                    | Envs                                           | Key Config                                         |
| ----------------------------- | ---------------------------------------------- | -------------------------------------------------- |
| ppo\_mujoco\_arc              | HalfCheetah, Walker, Humanoid, HumanoidStandup | Base: gamma=0.99, lam=0.95, lr=3e-4                |
| ppo\_mujoco\_longhorizon\_arc | Reacher, Pusher                                | gamma=0.997, lam=0.97, lr=2e-4, entropy=0.001      |
| ppo\_{env}\_arc               | Ant, Hopper, Swimmer, IP, IDP                  | Per-env tuned (gamma, lam, lr)                     |
| sac\_mujoco\_arc              | (generic, use with -s flags)                   | Base: gamma=0.99, iter=4, lr=3e-4, \[256,256]      |
| sac\_{env}\_arc               | All 11 envs                                    | Per-env tuned (iter, gamma, lr, net size)          |
| crossq\_mujoco                | (generic base)                                 | Base: gamma=0.99, iter=1, lr=1e-3, policy\_delay=3 |
| crossq\_{env}                 | All 11 envs                                    | Per-env tuned (critic width, actor LN)             |

### Running Benchmarks

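**Reproduce**: Copy `ENV`, `SPEC_NAME`, and `MAX_FRAME` from the table below. The table lists CrossQ only for Ant; CrossQ spec names for the other environments appear in the Results tables.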

```bash
# PPO: env and max_frame are parameterized via -s flags
source .env && slm-lab run-remote --gpu -s env=ENV -s max_frame=MAX_FRAME \
  slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml SPEC_NAME train -n NAME

# SAC: env and max_frame are hardcoded per spec — no -s flags needed
source .env && slm-lab run-remote --gpu \
  slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml SPEC_NAME train -n NAME

# CrossQ: env and max_frame are hardcoded per spec — no -s flags needed
source .env && slm-lab run-remote --gpu \
  slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml SPEC_NAME train -n NAME
```

| ENV                       | SPEC\_NAME                           | MAX\_FRAME |
| ------------------------- | ------------------------------------ | ---------- |
| Ant-v5                    | ppo\_ant\_arc                        | 10e6       |
|                           | sac\_ant\_arc                        | 2e6        |
|                           | crossq\_ant                          | 3e6        |
| HalfCheetah-v5            | ppo\_mujoco\_arc                     | 10e6       |
|                           | sac\_halfcheetah\_arc                | 4e6        |
| Hopper-v5                 | ppo\_hopper\_arc                     | 4e6        |
|                           | sac\_hopper\_arc                     | 3e6        |
| Humanoid-v5               | ppo\_mujoco\_arc                     | 10e6       |
|                           | sac\_humanoid\_arc                   | 1e6        |
| HumanoidStandup-v5        | ppo\_mujoco\_arc                     | 4e6        |
|                           | sac\_humanoid\_standup\_arc          | 1e6        |
| InvertedDoublePendulum-v5 | ppo\_inverted\_double\_pendulum\_arc | 10e6       |
|                           | sac\_inverted\_double\_pendulum\_arc | 2e6        |
| InvertedPendulum-v5       | ppo\_inverted\_pendulum\_arc         | 4e6        |
|                           | sac\_inverted\_pendulum\_arc         | 2e6        |
| Pusher-v5                 | ppo\_mujoco\_longhorizon\_arc        | 4e6        |
|                           | sac\_pusher\_arc                     | 1e6        |
| Reacher-v5                | ppo\_mujoco\_longhorizon\_arc        | 4e6        |
|                           | sac\_reacher\_arc                    | 1e6        |
| Swimmer-v5                | ppo\_swimmer\_arc                    | 4e6        |
|                           | sac\_swimmer\_arc                    | 2e6        |
| Walker2d-v5               | ppo\_mujoco\_arc                     | 10e6       |
|                           | sac\_walker2d\_arc                   | 3e6        |

Remote setup: `cp .env.example .env` then set `HF_TOKEN`. See [Remote Training](/slm-lab/using-slm-lab/remote-training.md) for dstack config.
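
When launching many runs, the three command shapes above can be assembled programmatically. The helper below is a hypothetical convenience, not part of SLM Lab: it only reuses the spec paths and flags shown above, infers the algorithm from the spec-name prefix, and leaves out the `source .env &&` prefix.

```python
import shlex

# Spec file paths as listed in this page.
SPEC_FILES = {
    "ppo": "slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml",
    "sac": "slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml",
    "crossq": "slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml",
}


def build_command(spec_name, run_name, env=None, max_frame=None):
    """Return the `slm-lab run-remote` command line for one benchmark run."""
    algo = spec_name.split("_", 1)[0]  # "ppo", "sac" or "crossq"
    args = ["slm-lab", "run-remote", "--gpu"]
    if algo == "ppo":
        # PPO specs are parameterized, so env and max_frame are required.
        if env is None or max_frame is None:
            raise ValueError("PPO runs need env and max_frame")
        args += ["-s", f"env={env}", "-s", f"max_frame={max_frame}"]
    args += [SPEC_FILES[algo], spec_name, "train", "-n", run_name]
    return shlex.join(args)


print(build_command("ppo_ant_arc", "ppo-ant", env="Ant-v5", max_frame="10e6"))
print(build_command("sac_ant_arc", "sac-ant"))
```

The run names `ppo-ant` and `sac-ant` are placeholders for whatever you pass to `-n`.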

{% hint style="warning" %}
**GPU strongly recommended for MuJoCo.** These benchmarks run 1M-10M frames and take 1-4 hours on cloud GPU (L4/A10G). Local CPU training is not practical. Cloud GPUs via dstack are faster and often cheaper than running on local hardware.
{% endhint %}

### Download and Replay

```bash
# List all available experiments (requires HF_REPO=SLM-Lab/benchmark in .env)
source .env && slm-lab list

# Download a specific experiment
source .env && slm-lab pull ppo_ant_arc

# Replay the trained agent
slm-lab run slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml ppo_ant_arc enjoy@data/ppo_ant_arc_ant_2026_02_12_190644/ppo_ant_arc_t0_spec.json
```
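
Experiment folder names such as the one in the replay command end in a timestamp, and the HF Data links in the Results tables all follow the same URL pattern. The sketch below splits a folder name and rebuilds its dataset URL. It assumes the suffix reads as `YYYY_MM_DD_HHMMSS`, which matches every folder listed on this page; `parse_experiment` is a hypothetical helper, not an SLM Lab function.

```python
import re
from datetime import datetime

HF_DATA_ROOT = "https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data"
# Everything before the trailing date-time stamp is treated as the experiment prefix.
_FOLDER = re.compile(r"^(?P<prefix>.+)_(?P<stamp>\d{4}_\d{2}_\d{2}_\d{6})$")


def parse_experiment(folder):
    """Split a folder name into (prefix, timestamp, dataset URL)."""
    match = _FOLDER.match(folder)
    if match is None:
        raise ValueError(f"unexpected folder name: {folder!r}")
    stamp = datetime.strptime(match["stamp"], "%Y_%m_%d_%H%M%S")
    return match["prefix"], stamp, f"{HF_DATA_ROOT}/{folder}"


prefix, stamp, url = parse_experiment("ppo_ant_arc_ant_2026_02_12_190644")
print(prefix, stamp.isoformat(), url)
```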

***

## Results

### Ant-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/ant/) | State: Box(105) | Action: Box(8) | Target: >2000

**Settings**: max\_frame 10e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA      | SPEC\_FILE                                                                                                                                                | SPEC\_NAME    | HF Data                                                                                                                                        |
| --------- | ------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ✅      | 2138.28 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_ant\_arc | [ppo\_ant\_arc\_ant\_2026\_02\_12\_190644](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_ant_arc_ant_2026_02_12_190644) |
| SAC       | ✅      | 4942.91 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_ant\_arc | [sac\_ant\_arc\_2026\_02\_11\_225529](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_ant_arc_2026_02_11_225529)          |
| CrossQ    | ✅      | 4517.00 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_ant   | [crossq\_ant\_2026\_03\_01\_102428](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_ant_2026_03_01_102428)             |

![Ant-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Ant-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

### HalfCheetah-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/half_cheetah/) | State: Box(17) | Action: Box(6) | Target: >5000

**Settings**: max\_frame 10e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA      | SPEC\_FILE                                                                                                                                                | SPEC\_NAME            | HF Data                                                                                                                                                              |
| --------- | ------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ✅      | 6240.68 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_mujoco\_arc      | [ppo\_mujoco\_arc\_halfcheetah\_2026\_02\_12\_195553](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_mujoco_arc_halfcheetah_2026_02_12_195553) |
| SAC       | ✅      | 9815.16 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_halfcheetah\_arc | [sac\_halfcheetah\_4m\_i2\_arc\_2026\_02\_14\_185522](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_halfcheetah_4m_i2_arc_2026_02_14_185522)  |
| CrossQ    | ✅      | 8616.52 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_halfcheetah   | [crossq\_halfcheetah\_2026\_03\_01\_101317](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_halfcheetah_2026_03_01_101317)                   |

![HalfCheetah-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/HalfCheetah-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

### Hopper-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/hopper/) | State: Box(11) | Action: Box(3) | Target: \~2000

**Settings**: max\_frame 4e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA      | SPEC\_FILE                                                                                                                                                | SPEC\_NAME       | HF Data                                                                                                                                                    |
| --------- | ------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ⚠️     | 1653.74 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_hopper\_arc | [ppo\_hopper\_arc\_hopper\_2026\_02\_12\_222206](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_hopper_arc_hopper_2026_02_12_222206) |
| SAC       | ⚠️     | 1416.52 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_hopper\_arc | [sac\_hopper\_3m\_i4\_arc\_2026\_02\_14\_185434](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_hopper_3m_i4_arc_2026_02_14_185434)  |
| CrossQ    | ⚠️     | 1168.53 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_hopper   | [crossq\_hopper\_2026\_02\_21\_101148](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_hopper_2026_02_21_101148)                   |

![Hopper-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Hopper-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

### Humanoid-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/humanoid/) | State: Box(348) | Action: Box(17) | Target: >1000

**Settings**: max\_frame 10e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA      | SPEC\_FILE                                                                                                                                                | SPEC\_NAME         | HF Data                                                                                                                                                        |
| --------- | ------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ✅      | 2661.26 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_mujoco\_arc   | [ppo\_mujoco\_arc\_humanoid\_2026\_02\_12\_185439](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_mujoco_arc_humanoid_2026_02_12_185439) |
| SAC       | ✅      | 1989.65 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_humanoid\_arc | [sac\_humanoid\_arc\_2026\_02\_12\_020016](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_humanoid_arc_2026_02_12_020016)                |
| CrossQ    | ✅      | 1755.29 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_humanoid   | [crossq\_humanoid\_2026\_03\_01\_165208](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_humanoid_2026_03_01_165208)                   |

![Humanoid-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Humanoid-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

### HumanoidStandup-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/humanoid_standup/) | State: Box(348) | Action: Box(17) | Target: >100k

**Settings**: max\_frame 4e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA        | SPEC\_FILE                                                                                                                                                | SPEC\_NAME                  | HF Data                                                                                                                                                                      |
| --------- | ------ | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ✅      | 150104.59 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_mujoco\_arc            | [ppo\_mujoco\_arc\_humanoidstandup\_2026\_02\_12\_115050](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_mujoco_arc_humanoidstandup_2026_02_12_115050) |
| SAC       | ✅      | 137357.00 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_humanoid\_standup\_arc | [sac\_humanoid\_standup\_arc\_2026\_02\_12\_225150](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_humanoid_standup_arc_2026_02_12_225150)             |
| CrossQ    | ✅      | 150912.66 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_humanoid\_standup   | [crossq\_humanoid\_standup\_2026\_02\_28\_184305](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_humanoid_standup_2026_02_28_184305)                |

![HumanoidStandup-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/HumanoidStandup-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

### InvertedDoublePendulum-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/inverted_double_pendulum/) | State: Box(9) | Action: Box(1) | Target: \~8000

**Settings**: max\_frame 10e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA      | SPEC\_FILE                                                                                                                                                | SPEC\_NAME                           | HF Data                                                                                                                                                                                                                          |
| --------- | ------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ✅      | 8383.76 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_inverted\_double\_pendulum\_arc | [ppo\_inverted\_double\_pendulum\_arc\_inverteddoublependulum\_2026\_02\_12\_225231](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_inverted_double_pendulum_arc_inverteddoublependulum_2026_02_12_225231) |
| SAC       | ✅      | 9032.67 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_inverted\_double\_pendulum\_arc | [sac\_inverted\_double\_pendulum\_arc\_2026\_02\_12\_025206](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_inverted_double_pendulum_arc_2026_02_12_025206)                                                |
| CrossQ    | ✅      | 8027.38 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_inverted\_double\_pendulum   | [crossq\_inverted\_double\_pendulum\_2026\_03\_01\_101354](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_inverted_double_pendulum_2026_03_01_101354)                                                   |

![InvertedDoublePendulum-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/InvertedDoublePendulum-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

### InvertedPendulum-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/inverted_pendulum/) | State: Box(4) | Action: Box(1) | Target: \~1000

**Settings**: max\_frame 4e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA     | SPEC\_FILE                                                                                                                                                | SPEC\_NAME                   | HF Data                                                                                                                                                                                               |
| --------- | ------ | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ✅      | 949.94 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_inverted\_pendulum\_arc | [ppo\_inverted\_pendulum\_arc\_invertedpendulum\_2026\_02\_12\_062037](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_inverted_pendulum_arc_invertedpendulum_2026_02_12_062037) |
| SAC       | ✅      | 928.43 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_inverted\_pendulum\_arc | [sac\_inverted\_pendulum\_arc\_2026\_02\_12\_225503](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_inverted_pendulum_arc_2026_02_12_225503)                                    |
| CrossQ    | ⚠️     | 877.83 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_inverted\_pendulum   | [crossq\_inverted\_pendulum\_2026\_02\_28\_184348](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_inverted_pendulum_2026_02_28_184348)                                       |

![InvertedPendulum-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/InvertedPendulum-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

### Pusher-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/pusher/) | State: Box(23) | Action: Box(7) | Target: >-50

**Settings**: max\_frame 4e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA     | SPEC\_FILE                                                                                                                                                | SPEC\_NAME                    | HF Data                                                                                                                                                                             |
| --------- | ------ | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ✅      | -49.59 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_mujoco\_longhorizon\_arc | [ppo\_mujoco\_longhorizon\_arc\_pusher\_2026\_02\_12\_222228](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_mujoco_longhorizon_arc_pusher_2026_02_12_222228) |
| SAC       | ✅      | -43.00 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_pusher\_arc              | [sac\_pusher\_arc\_2026\_02\_12\_053603](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_pusher_arc_2026_02_12_053603)                                         |
| CrossQ    | ✅      | -37.08 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_pusher                | [crossq\_pusher\_2026\_02\_21\_134637](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_pusher_2026_02_21_134637)                                            |

![Pusher-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Pusher-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

### Reacher-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/reacher/) | State: Box(10) | Action: Box(2) | Target: >-10

**Settings**: max\_frame 4e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA    | SPEC\_FILE                                                                                                                                                | SPEC\_NAME                    | HF Data                                                                                                                                                                               |
| --------- | ------ | ----- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ✅      | -5.03 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_mujoco\_longhorizon\_arc | [ppo\_mujoco\_longhorizon\_arc\_reacher\_2026\_02\_12\_115033](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_mujoco_longhorizon_arc_reacher_2026_02_12_115033) |
| SAC       | ✅      | -6.31 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_reacher\_arc             | [sac\_reacher\_arc\_2026\_02\_12\_055200](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_reacher_arc_2026_02_12_055200)                                         |
| CrossQ    | ✅      | -5.65 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_reacher               | [crossq\_reacher\_2026\_02\_28\_184304](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_reacher_2026_02_28_184304)                                            |

![Reacher-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Reacher-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

### Swimmer-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/swimmer/) | State: Box(8) | Action: Box(2) | Target: >200

**Settings**: max\_frame 4e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA     | SPEC\_FILE                                                                                                                                                | SPEC\_NAME        | HF Data                                                                                                                                                        |
| --------- | ------ | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ✅      | 282.44 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_swimmer\_arc | [ppo\_swimmer\_arc\_swimmer\_2026\_02\_12\_100445](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_swimmer_arc_swimmer_2026_02_12_100445) |
| SAC       | ✅      | 301.34 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_swimmer\_arc | [sac\_swimmer\_arc\_2026\_02\_12\_054349](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_swimmer_arc_2026_02_12_054349)                  |
| CrossQ    | ✅      | 221.12 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_swimmer   | [crossq\_swimmer\_2026\_02\_21\_184204](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_swimmer_2026_02_21_184204)                     |

![Swimmer-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Swimmer-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

### Walker2d-v5

[Docs](https://gymnasium.farama.org/environments/mujoco/walker2d/) | State: Box(17) | Action: Box(6) | Target: >3500

**Settings**: max\_frame 10e6 | num\_envs 16 | max\_session 4 | log\_frequency 1e4

| Algorithm | Status | MA      | SPEC\_FILE                                                                                                                                                | SPEC\_NAME         | HF Data                                                                                                                                                        |
| --------- | ------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| PPO       | ✅      | 4378.62 | [slm\_lab/spec/benchmark\_arc/ppo/ppo\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/ppo/ppo_mujoco_arc.yaml) | ppo\_mujoco\_arc   | [ppo\_mujoco\_arc\_walker2d\_2026\_02\_12\_190312](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/ppo_mujoco_arc_walker2d_2026_02_12_190312) |
| SAC       | ⚠️     | 3123.66 | [slm\_lab/spec/benchmark\_arc/sac/sac\_mujoco\_arc.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark_arc/sac/sac_mujoco_arc.yaml) | sac\_walker2d\_arc | [sac\_walker2d\_3m\_i4\_arc\_2026\_02\_14\_185550](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/sac_walker2d_3m_i4_arc_2026_02_14_185550)  |
| CrossQ    | ✅      | 4389.62 | [slm\_lab/spec/benchmark/crossq/crossq\_mujoco.yaml](https://github.com/kengz/SLM-Lab/blob/master/slm_lab/spec/benchmark/crossq/crossq_mujoco.yaml)       | crossq\_walker2d   | [crossq\_walker2d\_2026\_02\_28\_184343](https://huggingface.co/datasets/SLM-Lab/benchmark/tree/main/data/crossq_walker2d_2026_02_28_184343)                   |

![Walker2d-v5](https://huggingface.co/datasets/SLM-Lab/benchmark/resolve/v5.2.0/docs/plots/Walker2d-v5_multi_trial_graph_mean_returns_ma_vs_frames.png)

**Legend:** ✅ Solved (score above target) | ⚠️ Close (>80% of target) | ❌ Failed

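The legend can be read as a simple threshold rule. The sketch below is one possible reading, not SLM Lab code: the function name `classify_status`, the strict `>` comparisons, and the restriction to positive targets are all assumptions made for illustration. It is applied to the Walker2d-v5 rows above.

```python
def classify_status(score: float, target: float, close_ratio: float = 0.8) -> str:
    """Map a trial score to a legend label (hypothetical helper).

    Assumes a positive target; targets that are zero or negative would need
    a different definition of "80% of target".
    """
    if target <= 0:
        raise ValueError("this sketch only handles positive targets")
    if score > target:
        return "solved"
    if score > close_ratio * target:
        return "close"
    return "failed"


# Walker2d-v5 moving-average scores from the table above (target > 3500).
walker2d_scores = {"PPO": 4378.62, "SAC": 3123.66, "CrossQ": 4389.62}
statuses = {algo: classify_status(ma, 3500) for algo, ma in walker2d_scores.items()}
print(statuses)
```

Under this reading, SAC's 3123.66 falls between 2800 (80% of 3500) and 3500, which matches its ⚠️ status in the table.
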
***

## CrossQ Wall-Clock Speedup vs SAC

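CrossQ eliminates target networks by applying Batch Renormalization in the critics, which lets it run at UTD=1 and reach \~700 fps on most environments (\~340–350 fps on the Humanoid tasks). This is 3.5–6.7x faster than SAC on the same hardware.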

| Env                | CrossQ FPS | SAC FPS | Speedup |
| ------------------ | ---------- | ------- | ------- |
| HalfCheetah-v5     | 705        | 200     | 3.5x    |
| Hopper-v5          | 693        | 104     | 6.7x    |
| Walker2d-v5        | \~700      | 104     | 6.7x    |
| Ant-v5             | \~700      | 200     | 3.5x    |
| Humanoid-v5        | \~350      | 53      | 6.6x    |
| HumanoidStandup-v5 | 340        | 53      | 6.4x    |

> Measured on RTX 3090. CrossQ achieves comparable scores at significantly lower wall-clock time.

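The Speedup column is the ratio of CrossQ FPS to SAC FPS, rounded to one decimal place. The following minimal sketch recomputes it from the table values; entries shown with `~` in the table are approximate, so their ratios are approximate as well. The helper `wall_clock_hours` and the idea of comparing both algorithms at an equal frame count are illustrative assumptions, not part of the benchmark setup.

```python
# (CrossQ FPS, SAC FPS) copied from the table above.
# Walker2d, Ant, and Humanoid CrossQ values are approximate ("~") in the table.
fps = {
    "HalfCheetah-v5": (705, 200),
    "Hopper-v5": (693, 104),
    "Walker2d-v5": (700, 104),
    "Ant-v5": (700, 200),
    "Humanoid-v5": (350, 53),
    "HumanoidStandup-v5": (340, 53),
}


def speedup(crossq_fps: float, sac_fps: float) -> float:
    """Throughput ratio, rounded to one decimal as in the table."""
    return round(crossq_fps / sac_fps, 1)


def wall_clock_hours(frames: float, frames_per_second: float) -> float:
    """Hypothetical helper: hours needed to process `frames` at a given FPS."""
    return frames / frames_per_second / 3600


speedups = {env: speedup(crossq, sac) for env, (crossq, sac) in fps.items()}
for env, ratio in speedups.items():
    print(f"{env}: {ratio}x")
```

Because the actual frame budgets differ per algorithm, the FPS ratio is not the same as the ratio of total training times; `wall_clock_hours` can be applied to each algorithm's own budget to estimate that.
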
***

## Historical Results (v4)

<details>

<summary>Roboschool Results (v4) - click to expand</summary>

{% hint style="warning" %}
**Deprecated:** Roboschool is no longer maintained, and MuJoCo has been freely available since late 2021 (it was open-sourced in 2022). These v4 results are preserved for historical reference only. Use Gymnasium MuJoCo environments for new work.

Environment mapping: `RoboschoolHopper-v1` → `Hopper-v5`, `RoboschoolHalfCheetah-v1` → `HalfCheetah-v5`, etc.
{% endhint %}

* [Upload PR #427](https://github.com/kengz/SLM-Lab/pull/427)
* [Google Drive data](https://drive.google.com/file/d/1_rVeXPuZoifXuJmyzY8vC5yregH4DTXo/view?usp=sharing)

| Env. \ Alg.                      | A2C (GAE) | A2C (n-step) | PPO   | SAC        |
| -------------------------------- | --------- | ------------ | ----- | ---------- |
| RoboschoolAnt                    | 787       | 1396         | 1843  | **2915**   |
| RoboschoolHalfCheetah            | 712       | 439          | 1960  | **2497**   |
| RoboschoolHopper                 | 710       | 285          | 2042  | **2045**   |
| RoboschoolInvertedDoublePendulum | 996       | 4410         | 8076  | **8085**   |
| RoboschoolInvertedPendulum       | **995**   | 978          | 986   | 941        |
| RoboschoolReacher                | 12.9      | 10.16        | 19.51 | **19.99**  |
| RoboschoolWalker2d               | 280       | 220          | 1660  | **1894**   |
| RoboschoolHumanoid               | 99.31     | 54.58        | 2388  | **2621**\* |

> Episode score at the end of training. Reported scores are the average over the last 100 checkpoints, averaged over 4 Sessions. Results marked with `*` required 50M-100M frames using async SAC.

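The aggregation described in the note above (average of the last 100 checkpoints per Session, then averaged over 4 Sessions) can be sketched as follows. The function names and the input layout (one list of checkpoint returns per Session) are assumptions for illustration and do not reflect SLM Lab's internal API; the numbers are synthetic placeholders, not benchmark data.

```python
from statistics import mean


def session_score(checkpoint_returns: list[float], window: int = 100) -> float:
    """Average of the last `window` checkpoints (all of them if fewer exist)."""
    if not checkpoint_returns:
        raise ValueError("a session needs at least one checkpoint")
    return mean(checkpoint_returns[-window:])


def trial_score(sessions: list[list[float]], window: int = 100) -> float:
    """Mean of the per-session scores across all Sessions in a Trial."""
    return mean([session_score(returns, window) for returns in sessions])


# Synthetic placeholder data: 4 Sessions with 150 constant-valued checkpoints each.
sessions = [[float(level)] * 150 for level in (1, 2, 3, 4)]
print(trial_score(sessions))
```
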
</details>

