⚡ Quick Start

Test your installation with a quick demo.

Command Format

Check the run command options:

slm-lab run --help
Usage: slm-lab run [OPTIONS] [SPEC_FILE] [SPEC_NAME] [MODE]

Arguments:
  spec_file   JSON spec file path [default: slm_lab/spec/benchmark/ppo/ppo_cartpole.json]
  spec_name   Spec name within the file [default: ppo_cartpole]
  mode        Execution mode: dev|train|search|enjoy [default: dev]

Options:
  --set, -s   Set spec variables: KEY=VALUE (can be used multiple times)
  --render    Enable environment rendering

The full command format is:

slm-lab run spec.json spec_name mode

With all defaults filled in, a bare slm-lab run is equivalent to:

slm-lab run slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole dev
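
Spec variables can also be overridden at the command line with --set. A hypothetical example (the key meta.max_frame is illustrative; check the spec file for the actual keys):

slm-lab run --set meta.max_frame=10000

This runs the default PPO CartPole spec with the meta.max_frame value overridden.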

Run the Demo
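
Given the defaults above, a rendered dev run is started with the --render flag:

slm-lab run --render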

This runs PPO on CartPole with visualization (single session, slower due to rendering). CartPole is a classic RL benchmark: balance a pole on a cart by moving left or right. The agent receives +1 reward per timestep the pole stays upright (max 500 per episode).

What to Expect

Terminal output:

Training logs print to the terminal at regular checkpoints. Key metrics:

  • frame: Total environment steps processed

  • total_reward: Episode reward at this checkpoint

  • total_reward_ma: Moving average over 100 checkpoints (the primary success metric; see the sketch after this list)

  • loss: Training loss (should decrease over time)
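
As a point of reference, here is a minimal Python sketch of a 100-checkpoint moving average (illustrative only, not SLM-Lab's actual implementation):

from collections import deque

window = deque(maxlen=100)  # keep only the last 100 checkpoint rewards

def update_ma(total_reward):
    # Append the latest checkpoint reward and return the running average
    window.append(total_reward)
    return sum(window) / len(window)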

Rendering window:

A separate window renders the CartPole demo.

Early in training, the pole falls quickly. As total_reward_ma climbs toward 400-500, you'll see the agent balance for longer periods.

Success Criteria

Random actions score roughly 20-30 per episode. PPO solves CartPole (450+ reward) within 50,000-100,000 frames.
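
If you want to sanity-check the random baseline yourself, here is a short sketch using Gymnasium's CartPole-v1 (assumes the gymnasium package is installed; SLM-Lab manages its own environments, so this is purely illustrative):

import gymnasium as gym

env = gym.make("CartPole-v1")
returns = []
for _ in range(20):
    obs, info = env.reset()
    episode_return, done = 0.0, False
    while not done:
        # Sample a random action: push the cart left or right
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        episode_return += reward
        done = terminated or truncated
    returns.append(episode_return)

print(sum(returns) / len(returns))  # typically lands around 20-30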

Stopping the Demo

Press Ctrl+C to stop. In dev mode, partial results are not saved.


Next Steps

  1. Train: PPO on CartPole - Full training with saved results

  2. Understanding Experiments - Sessions, Trials, and Experiments
