# Quick Start

## Command Format

Check the run command options:

```bash
slm-lab run --help
```

```
Usage: slm-lab run [OPTIONS] [SPEC_FILE] [SPEC_NAME] [MODE]

Arguments:
  spec_file   JSON spec file path [default: slm_lab/spec/benchmark/ppo/ppo_cartpole.json]
  spec_name   Spec name within the file [default: ppo_cartpole]
  mode        Execution mode: dev|train|search|enjoy [default: dev]

Options:
  --set, -s   Set spec variables: KEY=VALUE (can be used multiple times)
  --render    Enable environment rendering
```

The full command format is:

```bash
slm-lab run spec.json spec_name mode
```

So running `slm-lab run` with no arguments is equivalent to:

```bash
slm-lab run slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole dev
```
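You can also pass each argument explicitly and override individual spec values with `--set`/`-s`. The example below is a sketch: the dot-separated key paths (`meta.max_session`, `agent.0.algorithm.gamma`) are assumptions for illustration, and the valid keys depend on how your spec file is structured:

```bash
# Illustrative only: key paths depend on the spec file's structure
slm-lab run slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole train \
  --set meta.max_session=2 \
  --set agent.0.algorithm.gamma=0.95
```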

## Run the Demo

```bash
slm-lab run --render
```

This runs PPO on CartPole with visualization (single session, slower due to rendering). **CartPole** is a classic RL benchmark: balance a pole on a cart by moving left or right. The agent receives +1 reward per timestep the pole stays upright (max 500 per episode).
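If you only care about the metrics, omit `--render` to run the same demo headless, which is faster:

```bash
# Same defaults (PPO on CartPole, dev mode), no rendering window
slm-lab run
```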

### What to Expect

**Terminal output:**

```
[2026-01-15 12:00:00] INFO: Starting ppo_cartpole trial t0
[2026-01-15 12:00:05] INFO: frame: 1000 | total_reward: 23.5 | total_reward_ma: 23.5 | loss: 0.012
[2026-01-15 12:00:10] INFO: frame: 2000 | total_reward: 45.2 | total_reward_ma: 34.3 | loss: 0.008
...
[2026-01-15 12:02:30] INFO: frame: 50000 | total_reward: 500.0 | total_reward_ma: 487.2 | loss: 0.002
```

Key metrics:

* **frame**: Total environment steps processed
* **total\_reward**: Episode reward at this checkpoint
* **total\_reward\_ma**: Moving average over 100 checkpoints (the primary success metric)
* **loss**: Training loss (should generally trend downward as training stabilizes)

**Rendering window:**

![CartPole demo](/files/d9aCWEVvaySsHjcf2sl7)

Early in training, the pole falls quickly. As `total_reward_ma` climbs toward 400-500, you'll see the agent balance for longer periods.

### Success Criteria

A random policy scores roughly 20-30 per episode. PPO typically solves CartPole (450+ reward) within 50,000-100,000 frames.

### Stopping the Demo

Press `Ctrl+C` to stop. In dev mode, partial results are not saved.
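To keep results, run in `train` mode instead of the default `dev` mode, for example:

```bash
# Full training run with saved results (covered on the next page)
slm-lab run slm_lab/spec/benchmark/ppo/ppo_cartpole.json ppo_cartpole train
```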

{% hint style="success" %}
If you see rewards climbing, SLM Lab is working correctly. Continue to [Train: PPO on CartPole](/slm-lab/using-slm-lab/train-ppo-cartpole.md) for a full training run.
{% endhint %}

{% hint style="warning" %}
**Troubleshooting:** If you encounter errors, see [Help](/slm-lab/resources/help.md) for common issues and solutions.
{% endhint %}

## Next Steps

1. [**Train: PPO on CartPole**](/slm-lab/using-slm-lab/train-ppo-cartpole.md) - Full training with saved results
2. [**Understanding Experiments**](/slm-lab/using-slm-lab/lab-organization.md) - Sessions, Trials, and Experiments

