Score = Final 100-checkpoint moving average (total_reward_ma)
The trial score is the mean of this value across 4 sessions (one per random seed), which averages out seed-to-seed variance for a more reliable estimate.
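As a rough illustration (not the framework's actual code), the score reduces to a mean of per-session tail averages. The array shapes and synthetic data below are assumptions for the sketch only:

```python
import numpy as np

def trial_score(session_rewards: list, window: int = 100) -> float:
    """Mean across sessions of the final `window`-checkpoint moving average.

    session_rewards: one 1-D array of per-checkpoint total_reward per session.
    """
    finals = [np.mean(np.asarray(r)[-window:]) for r in session_rewards]
    return float(np.mean(finals))

# Example with 4 synthetic sessions (different seeds in a real trial)
rng = np.random.default_rng(0)
sessions = [rng.normal(200, 10, size=400) for _ in range(4)]
print(round(trial_score(sessions), 2))
```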
Training Details
| Setting | Value |
| --- | --- |
| Sessions per trial | 4 (different random seeds) |
| Checkpoint frequency | Varies by env (see below) |
| Moving average window | 100 checkpoints |
| Hardware | Cloud GPUs (L4/A10G via dstack) |
Environment Settings
Standardized settings for fair comparison across environment categories:
| Category | num_envs | max_frame | log_frequency | ASHA grace_period |
| --- | --- | --- | --- | --- |
| Classic Control | 4 | 2e5-3e5 | 500 | 1e4 |
| Box2D | 8 | 3e5 | 1000 | 5e4 |
| MuJoCo | 16 | 4e6-10e6 | 10000 | 1e5-1e6 |
| Atari | 16 | 10e6 | 10000 | 5e5 |
The grace_period is the minimum number of frames before ASHA can terminate underperforming trials. Set it high enough to capture a meaningful learning signal (typically 5-10% of max_frame).
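As a hedged sketch of where these knobs usually live, the fragment below mirrors a Classic Control spec as a Python dict. The key names, their placement, and the CartPole-v1 env name are assumptions based on the classic SLM Lab spec layout; confirm against an existing spec under slm_lab/spec/benchmark/ before relying on them.

```python
# Illustrative Classic Control fragment mirroring spec.json (key placement is
# an assumption based on the classic SLM Lab layout; check a real benchmark spec).
classic_control_spec = {
    "env": [{
        "name": "CartPole-v1",  # example env
        "num_envs": 4,          # parallel vector envs (table above)
        "max_frame": 2e5,       # total training frames
    }],
    "meta": {
        "log_frequency": 500,   # checkpoint/log every 500 frames
        "max_session": 4,       # 4 sessions per trial, different seeds
    },
    # The ASHA grace_period (1e4 for Classic Control) belongs to the
    # search/scheduler settings; its exact key depends on how search is
    # configured in your spec.
}
```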
Hardware Requirements
| Category | GPU Required | Typical Runtime | Recommendation |
| --- | --- | --- | --- |
| Classic Control | No | Minutes | Local CPU is fine |
| Box2D | Optional | 10-30 min | Local or remote |
| MuJoCo | Yes | 1-4 hours | Use run-remote --gpu |
| Atari | Yes | 2-3 hours | Use run-remote --gpu |
Cloud GPUs are recommended for MuJoCo and Atari: an L4 or A10G via dstack is faster and often cheaper than local training. See Remote Training for setup.
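Before choosing between local and remote runs, a quick local hardware check can help. This is plain PyTorch, not an SLM Lab command, and is only a convenience sketch:

```python
import torch

# Check whether a CUDA GPU is available locally before committing to a long run.
if torch.cuda.is_available():
    print("CUDA GPU available:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU; prefer remote GPUs for MuJoCo/Atari benchmarks.")
```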
Contributing Benchmark Results
When adding or updating benchmarks:
1. Audit spec settings: ensure your spec.json matches the Settings line in the benchmark table.
2. Run and commit: execute the benchmark, then commit the spec file to the repo.
3. Record scores: extract total_reward_ma from the logs (see the sketch below) and add the HuggingFace folder link.
4. Generate plots: use slm-lab plot -t "EnvName" -f folder1,folder2,...
Only use final validation runs (not search results) for benchmark tables: search is for hyperparameter discovery, while validation runs confirm performance with the committed specs.
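For the "Record scores" step, here is a minimal sketch of pulling the final total_reward_ma out of per-session metrics. It assumes the metrics are saved as CSV files with a total_reward_ma column; the file pattern and schema are illustrative, not the framework's guaranteed log format, so adapt them to your actual logs.

```python
import glob
import pandas as pd

# Hypothetical extraction of the final total_reward_ma per session.
# The file pattern and column name are assumptions; adjust to your log files.
for path in sorted(glob.glob("data/ppo_hopper_*/session_metrics*.csv")):
    df = pd.read_csv(path)
    print(path, df["total_reward_ma"].iloc[-1])  # last checkpoint's moving average
```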
Reproducibility
Every experiment can be exactly reproduced:
```bash
# 1. Download the experiment
source .env && slm-lab pull ppo_hopper

# 2. Check the spec for settings and git SHA
cat data/ppo_hopper_*/ppo_hopper_t0_spec.json

# 3. Replay the trained model
slm-lab run slm_lab/spec/benchmark/ppo/ppo_hopper.json ppo_hopper enjoy@data/ppo_hopper_*/ppo_hopper_t0_spec.json
```

For the exact code version, check out the git SHA recorded in the spec file.

Historical Data

v4 Results (Google Drive)

v4 benchmarks used OpenAI Gym and Roboschool (both deprecated); the results remain available for historical reference.