🎓 Core Concepts

After running your first training, you'll notice SLM Lab creates folders with names like ppo_cartpole_t0_s0. This page explains SLM Lab's experiment hierarchy.

Sessions, Trials, and Experiments

SLM Lab organizes training into three levels:

| Level | What It Is | Example |
| --- | --- | --- |
| Session | One training run with a fixed random seed | Train PPO on CartPole once |
| Trial | Multiple sessions with different seeds | Train PPO on CartPole 4 times for reliable results |
| Experiment | Multiple trials with different hyperparameters | Find the best learning rate for PPO on CartPole |

Why multiple seeds? Deep RL results can vary significantly between runs due to random initialization, environment stochasticity, and exploration noise. Running 4 sessions with different random seeds gives you a reliable average rather than a lucky (or unlucky) single result.

The graphs for Session, Trial, and Experiment.

Mapping to Lab Modes

When using the lab command, different lab modes correspond to different levels:

| Mode | Level | Sessions | Trials | Use Case |
| --- | --- | --- | --- | --- |
| `enjoy` | Session | 1 | 1 | Replay trained model |
| `dev` | Trial | 1 | 1 | Quick debugging with rendering |
| `train` | Trial | 4 (default) | 1 | Full training run |
| `search` | Experiment | 1-4 | N | Hyperparameter tuning |

Concrete Example

Let's trace what happens when you run the PPO CartPole spec in `train` mode:
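The exact command depends on your installation; a typical invocation, assuming the repository's `run_lab.py` entry point and the benchmark spec path, looks like this:

```bash
# Train PPO on CartPole: spec file, spec name, lab mode
# (entry point and spec path assumed; adjust to your setup)
python run_lab.py slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole train
```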

  1. Trial starts: Creates 4 Sessions (controlled by meta.max_session)

  2. Each Session:

    • Gets a different random seed

    • Creates its own Agent and Env

    • Runs until max_frame (200,000 frames)

    • Saves checkpoints and metrics

  3. Trial completes: Aggregates results across all Sessions

  4. Output: Trial graph showing mean ± std across sessions

The output folder structure:
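An illustrative sketch (timestamps and exact file names vary by version; treat this as an example of the naming pattern rather than an exact listing):

```
data/ppo_cartpole_<timestamp>/
├── ppo_cartpole_t0_spec.json              # trial spec with fixed values and git SHA
├── ppo_cartpole_t0_s0_spec.json           # per-session spec
├── ppo_cartpole_t0_s0_session_graph.png   # session learning curve
├── ppo_cartpole_t0_s1_session_graph.png   # sessions s1-s3 follow the same pattern
├── ppo_cartpole_t0_trial_graph.png        # mean ± std across the 4 sessions
└── model/
    ├── ppo_cartpole_t0_s0_model_net.pt            # final checkpoint
    └── ppo_cartpole_t0_s0_ckpt-best_model_net.pt  # best checkpoint
```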

Naming Convention

Output files follow: {spec_name}_t{trial}_s{session}_{type}.{ext}

  • t0 = Trial 0, s0 = Session 0

  • ckpt-best = Best checkpoint (highest total_reward_ma)

  • No prefix = final checkpoint

Reproducibility

Every experiment in SLM Lab can be reproduced exactly. When you run an experiment, SLM Lab saves:

  1. Trial spec file (*_t0_spec.json) - all hyperparameters with fixed values (no search ranges)

  2. Git SHA - the exact code version, saved in the trial spec's meta.git_sha

  3. Random seeds - deterministic per session

Reproducing Results

The trial spec (ppo_cartpole_t0_spec.json) includes the git SHA for exact reproduction:
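The exact contents vary by version, but the relevant portion looks roughly like this (values are illustrative, and the SHA is a placeholder):

```json
{
  "ppo_cartpole": {
    "agent": ["..."],
    "env": ["..."],
    "meta": {
      "max_session": 4,
      "max_trial": 1,
      "git_sha": "<commit SHA recorded when the trial was run>"
    }
  }
}
```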

To reproduce:
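A hedged sketch of the steps (entry point and paths assumed, as above):

```bash
# 1. Check out the exact code version recorded in meta.git_sha
git checkout <git_sha from ppo_cartpole_t0_spec.json>

# 2. Re-run the saved trial spec, which contains the resolved hyperparameters
python run_lab.py data/ppo_cartpole_<timestamp>/ppo_cartpole_t0_spec.json ppo_cartpole train
```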

Published Benchmark Results

SLM Lab auto-uploads results to HuggingFace when env vars are configured (see Train: PPO on CartPole). You can download and replay any published experiment:
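For example, using the `huggingface-cli` that ships with `huggingface_hub` (the repo id is a placeholder, and the `enjoy@` session syntax is assumed; check the published experiment page for the exact identifiers):

```bash
# Download a published experiment folder into data/ (repo id is a placeholder)
huggingface-cli download <org>/<results-repo> --repo-type dataset --local-dir data/ppo_cartpole_published

# Replay the trained model from a specific session in enjoy mode
python run_lab.py data/ppo_cartpole_published/ppo_cartpole_t0_spec.json ppo_cartpole enjoy@ppo_cartpole_t0_s0
```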

The Spec File

A spec file is a JSON file that completely defines an experiment. From ppo_cartpole.json:
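An abridged sketch of its shape (keys follow the table below; the numbers shown are representative rather than the exact benchmark values):

```json
{
  "ppo_cartpole": {
    "agent": [{
      "name": "PPO",
      "algorithm": {
        "name": "PPO",
        "gamma": 0.99,
        "lam": 0.95
      },
      "memory": {
        "name": "OnPolicyBatchReplay"
      },
      "net": {
        "type": "MLPNet",
        "hid_layers": [64, 64],
        "optim_spec": {"name": "Adam", "lr": 0.0003}
      }
    }],
    "env": [{
      "name": "CartPole-v1",
      "max_frame": 200000
    }],
    "meta": {
      "max_session": 4,
      "max_trial": 1
    }
  }
}
```

The `max_frame` and `max_session` values here match the Concrete Example above.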

Key Spec Sections

| Section | Purpose |
| --- | --- |
| `agent.algorithm` | RL algorithm settings (gamma, lam, etc.) |
| `agent.memory` | Experience storage configuration |
| `agent.net` | Neural network architecture and optimizer |
| `env` | Environment name and training budget |
| `meta` | Experiment settings (sessions, trials) |

What's Next

Continue with these tutorials:
