This tutorial shows how to configure the env spec for continuous control environments. We'll train PPO on HalfCheetah, a MuJoCo locomotion task.
The Env Spec
The environment is specified using the env key in a spec file:
{"spec_name": {"agent":{...},"env":{ // Environment name (must be in gymnasium registry)"name":str, // Number of parallel environment instances"num_envs":int, // Maximum timesteps per episode (null = use environment default)"max_t":int|null, // Total training frames"max_frame":int, // Optional: Online state normalization (recommended for MuJoCo)"normalize_obs":bool, // Optional: Online reward normalization (recommended for MuJoCo)"normalize_reward":bool, // Optional: Clip observations to [-bound, bound] (default: 10.0 if normalize_obs)"clip_obs":float, // Optional: Clip rewards to [-bound, bound] (default: 10.0 if normalize_reward)"clip_reward":float},...}}
Supported Environments
SLM Lab uses Gymnasium (the maintained fork of OpenAI Gym):
Any gymnasium-compatible environment works; just specify its name in the spec.
Environment-Specific Settings
| Category | num_envs | max_frame | Normalization | GPU |
|---|---|---|---|---|
| Classic Control | 4 | 2e5-3e5 | No | Optional |
| Box2D | 8 | 3e5 | No | Optional |
| MuJoCo | 16 | 1e6-10e6 | normalize_obs, normalize_reward | Optional |
| Atari | 16 | 10e6 | No | Recommended |
See Benchmark Specs for complete spec files for each environment.
Example: PPO on HalfCheetah
HalfCheetah-v5 is a classic MuJoCo benchmark: a 2D cheetah robot that learns to run forward. It has a 17-dimensional observation space and 6-dimensional continuous action space.
To use a different environment, find a spec for that environment category and modify env.name. Spec files are organized by algorithm in slm_lab/spec/benchmark/:
| Category | Environments | Spec Files |
|---|---|---|
| MuJoCo | Hopper-v5, HalfCheetah-v5, Walker2d-v5, Ant-v5, Humanoid-v5, Swimmer-v5, etc. | ppo_mujoco.json, sac_mujoco.json |
| Atari | 54 games (ALE/Qbert-v5, ALE/MsPacman-v5, etc.) | ppo_atari.json, dqn_atari.json |
Switching Environments
1. Find a spec for your target environment category.
2. Change env.name to the Gymnasium environment name.
3. Adjust settings as needed (num_envs, max_frame, normalization).
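For instance, if a spec hard-codes its environment, switching it from HalfCheetah to Walker2d is a small edit to the env block (a sketch with illustrative values; the frame budget follows the Walker2d command shown at the end of this section):

"env": {
  "name": "Walker2d-v5",        // was "HalfCheetah-v5"
  "num_envs": 16,
  "max_t": null,
  "max_frame": 5e6,             // adjust the training budget per environment
  "normalize_obs": true,
  "normalize_reward": true
}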
Example: the same algorithm can be run across environment categories; see the PPO commands for CartPole, LunarLander, HalfCheetah, and Breakout at the end of this section.
Template Specs with Variable Substitution
Some specs use ${var} placeholders for flexibility. Use -s var=value to substitute:
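For example, a template spec's env block might look like the sketch below; the placeholder names here are assumed to match the -s env=... and -s max_frame=... flags used in the commands at the end of this section:

"env": {
  "name": "${env}",             // filled in by -s env=HalfCheetah-v5
  "num_envs": 16,
  "max_t": null,
  "max_frame": "${max_frame}",  // filled in by -s max_frame=4e6
  "normalize_obs": true,
  "normalize_reward": true
}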
Finding Environment Specs
Any Gymnasium environment works. Just set env.name to a valid Gymnasium environment ID. Use the benchmark specs as starting points for hyperparameters.
Standard Settings for Fair Comparison
When comparing algorithms, use consistent environment settings. Different num_envs or max_frame values make comparisons invalid.
Recommended Settings by Category
| Category | num_envs | max_frame | log_frequency | Notes |
|---|---|---|---|---|
| Classic Control | 4 | 2e5-3e5 | 500 | Fast training |
| Box2D | 8 | 3e5 | 1000 | Medium complexity |
| MuJoCo | 16 | 4e6-10e6 | 10000 | Use normalization |
| Atari | 16 | 10e6 | 10000 | GPU recommended |
What to Keep Consistent
When comparing algorithms on the same environment:
| Parameter | Keep Same? | Why |
|---|---|---|
| num_envs | Yes | Affects data collection rate and batch statistics |
| max_frame | Yes | Total training budget must match |
| max_t | Yes | Episode length affects learning signal |
| normalize_obs | Yes | Changes observation distribution |
| normalize_reward | Yes | Changes reward scale |
Example: Fair Algorithm Comparison
To compare DQN and PPO on LunarLander fairly, run both with the same env settings; the commands at the end of this section use num_envs=8, max_frame=3e5, and max_session=4 for both specs. Check that the env settings match before comparing results.
Benchmark specs are pre-configured. The specs in slm_lab/spec/benchmark/ use standardized settings for each environment category. When creating custom specs, match these settings for comparable results.
Advanced Env Options
Environment Kwargs
Any additional keys in the env spec are passed to gymnasium.make():
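For example, in the sketch below the two extra keys are forwarded to gymnasium.make() as keyword arguments; forward_reward_weight and ctrl_cost_weight are constructor kwargs of Gymnasium's HalfCheetah-v5, and the values shown are simply its defaults:

"env": {
  "name": "HalfCheetah-v5",
  "num_envs": 16,
  "max_frame": 4e6,
  "forward_reward_weight": 1.0,  // extra key, passed to gymnasium.make()
  "ctrl_cost_weight": 0.1        // extra key, passed to gymnasium.make()
}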
Normalization Details
The normalization wrappers maintain running statistics:
| Option | What It Does | When to Use |
|---|---|---|
| normalize_obs | Centers observations, scales to unit variance | MuJoCo, continuous control |
| normalize_reward | Scales rewards using running std | Environments with varying reward scales |
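Conceptually, the observation wrapper behaves like the Python sketch below (illustrative only, not SLM Lab's implementation): it keeps a running mean and variance, standardizes each observation, and clips the result to the clip_obs bound.

import numpy as np

class RunningObsNorm:
    """Conceptual sketch of online observation normalization."""
    def __init__(self, shape, clip=10.0, eps=1e-8):
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)   # sum of squared deviations (Welford)
        self.count = 0
        self.clip = clip
        self.eps = eps

    def __call__(self, obs):
        # update running statistics with the new observation
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)
        var = self.m2 / max(self.count, 1)
        # standardize and clip to [-clip, clip]
        return np.clip((obs - self.mean) / np.sqrt(var + self.eps), -self.clip, self.clip)

norm = RunningObsNorm(shape=(17,))  # e.g. HalfCheetah's 17-dim observations

normalize_reward is analogous, scaling rewards by a running standard deviation as noted in the table above.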
Gymnasium API: SLM Lab v5 uses Gymnasium's (obs, reward, terminated, truncated, info) return format. This correctly distinguishes task completion (terminated) from time limits (truncated), which is important for proper value estimation.
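A minimal Gymnasium loop illustrating the five-tuple (generic Gymnasium usage, not SLM Lab internals); the practical consequence is that a value target should only bootstrap from the next state on truncation, never on true termination:

import gymnasium as gym

env = gym.make("Hopper-v5")
obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()  # random action as a stand-in for the policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # terminated: the robot fell, the task itself ended -> next-state value is 0
        # truncated: the time limit was hit -> bootstrapping from obs is still valid
        obs, info = env.reset()
env.close()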
Next, we'll use GPU to train on Atari games where image processing is the bottleneck.
# Dev mode (quick test with rendering)
slm-lab run -s env=HalfCheetah-v5 -s max_frame=1e5 slm_lab/spec/benchmark/ppo/ppo_mujoco.json ppo_mujoco dev
# Full training (4M frames)
slm-lab run -s env=HalfCheetah-v5 -s max_frame=4e6 slm_lab/spec/benchmark/ppo/ppo_mujoco.json ppo_mujoco train
# PPO on CartPole (Classic Control)
slm-lab run slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole train
# PPO on LunarLander (Box2D)
slm-lab run slm_lab/spec/benchmark/ppo/ppo_lunar.json ppo_lunar train
# PPO on HalfCheetah (MuJoCo) - uses variable substitution
slm-lab run -s env=HalfCheetah-v5 -s max_frame=4e6 slm_lab/spec/benchmark/ppo/ppo_mujoco.json ppo_mujoco train
# PPO on Breakout (Atari) - uses variable substitution
slm-lab run -s env=ALE/Breakout-v5 slm_lab/spec/benchmark/ppo/ppo_atari.json ppo_atari train
# MuJoCo template - works for any MuJoCo environment
slm-lab run -s env=Hopper-v5 -s max_frame=2e6 slm_lab/spec/benchmark/ppo/ppo_mujoco.json ppo_mujoco train
slm-lab run -s env=Walker2d-v5 -s max_frame=5e6 slm_lab/spec/benchmark/ppo/ppo_mujoco.json ppo_mujoco train
# Atari template - works for any ALE game
slm-lab run -s env=ALE/Qbert-v5 slm_lab/spec/benchmark/ppo/ppo_atari.json ppo_atari train
slm-lab run -s env=ALE/MsPacman-v5 slm_lab/spec/benchmark/ppo/ppo_atari.json ppo_atari train
# List all benchmark specs
ls slm_lab/spec/benchmark/
# Find specs for a specific environment
grep -r "CartPole" slm_lab/spec/benchmark/
grep -r "LunarLander" slm_lab/spec/benchmark/
grep -r "HalfCheetah" slm_lab/spec/benchmark/
# Both use: num_envs=8, max_frame=3e5, max_session=4
slm-lab run slm_lab/spec/benchmark/dqn/dqn_lunar.json dqn_concat_lunar train
slm-lab run slm_lab/spec/benchmark/ppo/ppo_lunar.json ppo_lunar train