🎯 Discrete Benchmark

Classic Control & Box2D Results (v5)

SLM Lab v5 validates algorithms on small Gymnasium environments. Most use discrete actions; Pendulum and the continuous LunarLander variant use Box action spaces. These benchmarks cover:

Classic Control: CartPole, Acrobot, Pendulum (simple physics tasks ideal for algorithm validation)
Box2D: LunarLander (2D physics with more complex dynamics)

Results below are from the January and February 2026 benchmark reruns of SLM Lab v5 on Gymnasium environments; the run date is part of each HuggingFace folder name.

All trained models and metrics are publicly available on HuggingFace.

Methodology

Results show Trial-level performance:

Trial = 4 Sessions with different random seeds
Session = One complete training run
Score = Final 100-checkpoint moving average (total_reward_ma)

The trial score is the mean across 4 sessions, which reduces the influence of any single random seed on the reported number.
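The scoring described above can be sketched in a few lines. This is only an illustration of the arithmetic, not SLM Lab's own implementation: the function names and the sample reward lists are hypothetical, and it assumes each session yields one total reward per checkpoint.

```python
from statistics import mean


def final_moving_average(checkpoint_rewards, window=100):
    """Mean of the last `window` checkpoint rewards (all of them if fewer exist)."""
    if not checkpoint_rewards:
        raise ValueError("need at least one checkpoint reward")
    return mean(checkpoint_rewards[-window:])


def trial_score(sessions, window=100):
    """Mean of the per-session final moving averages."""
    return mean(final_moving_average(rewards, window) for rewards in sessions)


# Four hypothetical sessions (one per random seed), 400 checkpoints each.
# Only the last 100 checkpoints of each session contribute to its score.
sessions = [
    [100.0] * 300 + [400.0] * 100,
    [100.0] * 300 + [420.0] * 100,
    [100.0] * 300 + [440.0] * 100,
    [100.0] * 300 + [460.0] * 100,
]

score = trial_score(sessions)
print(score)  # 430.0
```

Because only the final window counts, a slow start does not lower the score as long as the agent has converged by the end of training.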

Standardized Settings

Running Benchmarks

Local - runs on your machine (Classic Control completes in minutes on CPU):

slm-lab run slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole train
slm-lab run slm_lab/spec/benchmark/dqn/ddqn_per_lunar.json ddqn_per_concat_lunar train

Remote - cloud GPU via dstack, auto-syncs to HuggingFace:

source .env && slm-lab run-remote slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole train -n cartpole
source .env && slm-lab run-remote slm_lab/spec/benchmark/dqn/ddqn_per_lunar.json ddqn_per_concat_lunar train -n lunar

Remote setup: cp .env.example .env then set HF_TOKEN. See Remote Training for dstack config.
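Before launching a remote run it can help to confirm that the copied .env actually has the token filled in. The sketch below is a hypothetical helper, not part of SLM Lab; it assumes the file uses plain KEY=VALUE lines (optionally prefixed with `export`), which may not match every .env layout.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text, required=("HF_TOKEN",)):
    """Return the required keys that are absent or set to an empty value."""
    env = parse_env_text(text)
    return [key for key in required if not env.get(key)]


# Hypothetical .env contents right after `cp .env.example .env`:
example = """
# copied from .env.example
HF_TOKEN=
HF_REPO=SLM-Lab/benchmark
"""

print(missing_keys(example, required=("HF_TOKEN", "HF_REPO")))  # ['HF_TOKEN']
```

To check a real file, read it first, for example `missing_keys(open(".env").read())`.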

GPU not required for Classic Control. These environments train fast on CPU. Box2D (LunarLander) benefits from GPU but still runs fine locally.

Download and Replay

# List all available experiments (requires HF_REPO=SLM-Lab/benchmark in .env)
source .env && slm-lab list

# Download a specific experiment
source .env && slm-lab pull ppo_cartpole

# Replay the trained agent
slm-lab run slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole enjoy@data/ppo_cartpole_*/ppo_cartpole_t0_spec.json
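The replay command relies on a shell glob to find the downloaded folder. If more than one folder matches, it may be easier to resolve the path explicitly. The helper below is a hypothetical sketch: it assumes downloads land under `data/` with the layout shown in the command above, and that folder names end in a sortable date so the last match is the newest.

```python
import glob
import os


def build_enjoy_command(spec_file, spec_name, data_dir="data"):
    """Build the replay command, resolving the downloaded trial spec on disk."""
    pattern = os.path.join(data_dir, f"{spec_name}_*", f"{spec_name}_t0_spec.json")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no downloaded spec matches {pattern}")
    # Assumption: folder names sort chronologically, so the last match is the newest.
    return ["slm-lab", "run", spec_file, spec_name, f"enjoy@{matches[-1]}"]


if __name__ == "__main__":
    try:
        cmd = build_enjoy_command(
            "slm_lab/spec/benchmark/ppo/ppo_cartpole.json", "ppo_cartpole"
        )
        print(" ".join(cmd))
    except FileNotFoundError as err:
        print(f"Nothing to replay yet: {err}")
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the experiment has been pulled.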

Results

Classic Control

CartPole-v1

Docs | State: Box(4) | Action: Discrete(2) | Target: >400

Settings: max_frame 2e5 | num_envs 4 | max_session 4 | log_frequency 500

| Algorithm | Status | Score | Spec | HuggingFace |
| --- | --- | --- | --- | --- |
| REINFORCE | ✅ | 469.7 | reinforce_cartpole.json | reinforce_cartpole_2026_01_30 |
| SARSA | ✅ | 421.6 | sarsa_cartpole.json | sarsa_boltzmann_cartpole_2026_01_30 |
| DQN | ⚠️ | 188.1 | dqn_cartpole.json | dqn_boltzmann_cartpole_2026_01_30 |
| DDQN+PER | ✅ | 432.9 | dqn_cartpole.json | ddqn_per_boltzmann_cartpole_2026_01_30 |
| A2C | ✅ | 499.7 | a2c_gae_cartpole.json | a2c_gae_cartpole_2026_01_30 |
| PPO | ✅ | 495.6 | ppo_cartpole.json | ppo_cartpole_2026_02_08 |
| SAC | ✅ | 415.0 | sac_cartpole.json | sac_cartpole_2026_02_08 |

Acrobot-v1

Docs | State: Box(6) | Action: Discrete(3) | Target: >-100

Settings: max_frame 3e5 | num_envs 4 | max_session 4 | log_frequency 500

| Algorithm | Status | Score | Spec | HuggingFace |
| --- | --- | --- | --- | --- |
| DQN | ✅ | -94.8 | dqn_acrobot.json | dqn_boltzmann_acrobot_2026_01_30 |
| DDQN+PER | ✅ | -85.2 | ddqn_per_acrobot.json | ddqn_per_acrobot_2026_01_30 |
| A2C | ✅ | -83.8 | a2c_gae_acrobot.json | a2c_gae_acrobot_2026_01_30 |
| PPO | ✅ | -81.4 | ppo_acrobot.json | ppo_acrobot_2026_01_30 |
| SAC | ✅ | -90.3 | sac_acrobot.json | sac_acrobot_2026_02_08 |

Pendulum-v1

Docs | State: Box(3) | Action: Box(1) | Target: >-200

Settings: max_frame 3e5 | num_envs 4 | max_session 4 | log_frequency 500

| Algorithm | Status | Score | Spec | HuggingFace |
| --- | --- | --- | --- | --- |
| A2C | ❌ | -553 | a2c_gae_pendulum.json | a2c_gae_pendulum_2026_01_30 |
| PPO | ✅ | -168.3 | ppo_pendulum.json | ppo_pendulum_2026_01_30 |
| SAC | ✅ | -148.7 | sac_pendulum.json | sac_pendulum_2026_02_08 |

Box2D

LunarLander-v3 (Discrete)

Docs | State: Box(8) | Action: Discrete(4) | Target: >200

Settings: max_frame 3e5 | num_envs 8 | max_session 4 | log_frequency 1000

| Algorithm | Status | Score | Spec | HuggingFace |
| --- | --- | --- | --- | --- |
| DQN | ⚠️ | 183.6 | dqn_lunar.json | dqn_concat_lunar_2026_01_30 |
| DDQN+PER | ✅ | 261.5 | ddqn_per_lunar.json | ddqn_per_concat_lunar_2026_01_30 |
| A2C | ❌ | 9.5 | a2c_gae_lunar.json | a2c_gae_lunar_2026_01_30 |
| PPO | ⚠️ | 159.0 | ppo_lunar.json | ppo_lunar_2026_01_30 |
| SAC | ⚠️ | 134.5 | sac_lunar.json | sac_lunar_2026_02_08 |

LunarLander-v3 (Continuous)

Docs | State: Box(8) | Action: Box(2) | Target: >200

Settings: max_frame 3e5 | num_envs 8 | max_session 4 | log_frequency 1000

| Algorithm | Status | Score | Spec | HuggingFace |
| --- | --- | --- | --- | --- |
| A2C | ❌ | -38.2 | a2c_gae_lunar.json | a2c_gae_lunar_continuous_2026_01_30 |
| PPO | ⚠️ | 165.5 | ppo_lunar.json | ppo_lunar_continuous_2026_01_31 |
| SAC | ⚠️ | 179.4 | sac_lunar.json | sac_lunar_continuous_2026_02_08 |

Legend: ✅ Solved | ⚠️ Close (>80%) | ❌ Failed
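Every ✅ entry in the tables above is consistent with a simple rule: the trial score exceeds the environment's target. The sketch below checks that rule for a few rows copied from the tables. The function name is hypothetical, and the Close versus Failed split is deliberately left out, since the legend does not say how the 80% threshold applies to negative targets such as Acrobot's or Pendulum's.

```python
def is_solved(score, target):
    """Solved means the trial score is strictly above the environment target."""
    return score > target


# (environment, algorithm, trial score, target), copied from the result tables.
results = [
    ("CartPole-v1", "PPO", 495.6, 400),
    ("CartPole-v1", "DQN", 188.1, 400),
    ("Acrobot-v1", "DQN", -94.8, -100),
    ("Pendulum-v1", "A2C", -553, -200),
    ("LunarLander-v3 (Discrete)", "DDQN+PER", 261.5, 200),
]

solved = [(env, algo) for env, algo, score, target in results if is_solved(score, target)]
for env, algo in solved:
    print(f"{env}: {algo} solved")
```

Note that the comparison also works for negative targets: -94.8 is above -100, so DQN counts as solving Acrobot.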

Historical Results (v4)

OpenAI Gym Results (v4)

These results from SLM Lab v4 used OpenAI Gym environments (now deprecated). Environment versions differ from current Gymnasium versions. Unity environments are no longer included in the core package.

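| Env. \ Alg. | DQN | DDQN+PER | A2C (GAE) | A2C (n-step) | PPO | SAC |
| --- | --- | --- | --- | --- | --- | --- |
| Breakout | 80.88 | 182 | 377 | 398 | 443 | 3.51* |
| Pong | 18.48 | 20.5 | 19.31 | 19.56 | 20.58 | 19.87* |
| Qbert | 5494 | 11426 | 12405 | 13590 | 13460 | 923* |
| Seaquest | 1185 | 4405 | 1070 | 1684 | 1715 | 171* |
| LunarLander | 192 | 233 | 25.21 | 68.23 | 214 | 276 |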

Episode score at the end of training. Reported scores are the average over the last 100 checkpoints, averaged over 4 Sessions. Results marked with * used async SAC.

For the full Atari benchmark, see Atari Benchmark.
