Core Concepts
After running your first training, you'll notice SLM Lab creates folders with names like `ppo_cartpole_t0_s0`. This page explains SLM Lab's experiment hierarchy.
Sessions, Trials, and Experiments
SLM Lab organizes training into three levels:

| Level | Definition | Example |
| --- | --- | --- |
| Session | One training run with a fixed random seed | Train PPO on CartPole once |
| Trial | Multiple sessions with different seeds | Train PPO on CartPole 4 times for reliable results |
| Experiment | Multiple trials with different hyperparameters | Find the best learning rate for PPO on CartPole |
Why multiple seeds? Deep RL results can vary significantly between runs due to random initialization, environment stochasticity, and exploration noise. Running 4 sessions with different random seeds gives you a reliable average rather than a lucky (or unlucky) single result.
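These levels are configured under the `meta` section of the spec file (see The Spec File below). A minimal sketch, assuming SLM Lab's spec format, with values matching the defaults described on this page:

```json
"meta": {
  "max_session": 4,
  "max_trial": 1
}
```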

Mapping to Lab Modes
When using the lab command, different lab modes correspond to different levels:
| Mode | Level | Sessions | Trials | Purpose |
| --- | --- | --- | --- | --- |
| `enjoy` | Session | 1 | 1 | Replay trained model |
| `dev` | Trial | 1 | 1 | Quick debugging with rendering |
| `train` | Trial | 4 (default) | 1 | Full training run |
| `search` | Experiment | 1-4 | N | Hyperparameter tuning |
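For example, a sketch assuming the classic `python run_lab.py {spec_file} {spec_name} {lab_mode}` entrypoint and the bundled spec path (both may differ in your version):

```bash
# Quick debugging run with rendering (1 trial, 1 session)
python run_lab.py slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole dev

# Full training run (1 trial, 4 sessions by default)
python run_lab.py slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole train

# Hyperparameter search across multiple trials
python run_lab.py slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole search
```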
Concrete Example
Let's trace what happens when you run:
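```bash
# train mode (entrypoint and spec path may differ in your version)
python run_lab.py slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole train
```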
1. Trial starts: Creates 4 Sessions (controlled by `meta.max_session`)
2. Each Session:
   - Gets a different random seed
   - Creates its own Agent and Env
   - Runs until `max_frame` (200,000 frames)
   - Saves checkpoints and metrics
3. Trial completes: Aggregates results across all Sessions
4. Output: Trial graph showing mean ± std across sessions
The output folder structure:
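A representative sketch, assuming the default `data/` output directory (the timestamp and exact file set vary by version):

```
data/ppo_cartpole_2024_01_01_000000/
├── ppo_cartpole_t0_spec.json            # trial spec: fixed hyperparameters, git SHA
├── ppo_cartpole_t0_s0_session_graph.png # per-session learning curve
├── ...                                  # s1-s3: same files for each session
├── ppo_cartpole_t0_trial_graph.png      # mean ± std across sessions
└── model/
    ├── ppo_cartpole_t0_s0_model.pt            # final checkpoint
    └── ppo_cartpole_t0_s0_ckpt-best_model.pt  # best checkpoint
```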
Naming Convention
Output files follow `{spec_name}_t{trial}_s{session}_{type}.{ext}`:

- `t0` = Trial 0, `s0` = Session 0
- `ckpt-best` = Best checkpoint (highest `total_reward_ma`)
- No prefix = final checkpoint
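For example, a file named `ppo_cartpole_t0_s0_ckpt-best_model.pt` would be the best-performing model checkpoint from Trial 0, Session 0 (the exact type suffix is illustrative and may vary by version).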
Reproducibility
Every experiment in SLM Lab can be reproduced exactly. When you run an experiment, SLM Lab saves:
- Trial spec file (`*_t0_spec.json`): all hyperparameters with fixed values (no search ranges)
- Git SHA: the exact code version, saved in the trial spec's `meta.git_sha`
- Random seeds: deterministic per session
Reproducing Results
The trial spec (`ppo_cartpole_t0_spec.json`) includes the git SHA for exact reproduction:
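A sketch of the relevant `meta` fields (the SHA shown is a placeholder, not a real commit):

```json
"meta": {
  "git_sha": "0123456789abcdef0123456789abcdef01234567",
  "max_session": 4,
  "max_trial": 1
}
```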
To reproduce:
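A sketch of the steps, assuming the classic `run_lab.py` entrypoint (the SHA and output path are placeholders):

```bash
# Check out the exact code version recorded in the trial spec
git checkout 0123456789abcdef0123456789abcdef01234567

# Re-run training from the saved trial spec (fixed hyperparameters, no search ranges)
python run_lab.py data/ppo_cartpole_2024_01_01_000000/ppo_cartpole_t0_spec.json ppo_cartpole train
```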
Published Benchmark Results
SLM Lab auto-uploads results to HuggingFace when env vars are configured (see Train: PPO on CartPole). You can download and replay any published experiment:
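A hedged sketch using the Hugging Face CLI (the repo id is hypothetical; substitute the one named in the published results, and note the `enjoy@` session syntax varies by SLM Lab version):

```bash
# Download a published experiment folder (hypothetical repo id)
huggingface-cli download some-user/slm-lab-ppo-cartpole --repo-type dataset --local-dir data/ppo_cartpole_published

# Replay the trained model from trial 0, session 0
python run_lab.py data/ppo_cartpole_published/ppo_cartpole_t0_spec.json ppo_cartpole enjoy@ppo_cartpole_t0_s0
```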
The Spec File
A spec file is a JSON file that completely defines an experiment. From `ppo_cartpole.json`:
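An abbreviated sketch following SLM Lab's spec format (values are illustrative; the real file contains more settings):

```json
{
  "ppo_cartpole": {
    "agent": [{
      "name": "PPO",
      "algorithm": {
        "name": "PPO",
        "gamma": 0.99,
        "lam": 0.95
      },
      "memory": {
        "name": "OnPolicyBatchReplay"
      },
      "net": {
        "type": "MLPNet",
        "hid_layers": [64, 64],
        "optim_spec": {"name": "Adam", "lr": 3e-4}
      }
    }],
    "env": [{
      "name": "CartPole-v0",
      "max_frame": 200000
    }],
    "meta": {
      "max_session": 4,
      "max_trial": 1
    }
  }
}
```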
Key Spec Sections
| Section | Purpose |
| --- | --- |
| `agent.algorithm` | RL algorithm settings (`gamma`, `lam`, etc.) |
| `agent.memory` | Experience storage configuration |
| `agent.net` | Neural network architecture and optimizer |
| `env` | Environment name and training budget |
| `meta` | Experiment settings (sessions, trials) |
What's Next
Continue with these tutorials:
- Agent Spec - Configure algorithms and networks
- Env Spec - Configure environments for MuJoCo
- GPU Training - Train on Atari with GPU
- Hyperparameter Search - Find optimal settings with ASHA