โš™๏ธMeta Spec

The meta spec controls experiment-level settings: how many sessions to run, how often to checkpoint, and more.

The Meta Spec Structure

The meta spec is defined under the meta key in a spec file:

```json
{
  "spec_name": {
    "agent": {...},
    "env": {...},
    "meta": {
      "max_session": 4,           // Sessions per trial (different random seeds)
      "max_trial": 1,             // Trials per experiment (for search mode)
      "log_frequency": 1000,      // Log training metrics every N frames
      "eval_frequency": 1000,     // Evaluate agent every N frames
      "distributed": false,       // Hogwild! parallel training
      "rigorous_eval": null       // Separate eval environments (slower but precise)
    }
  }
}
```

Meta Spec Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| max_session | int | 4 | Sessions per trial (each with a different random seed) |
| max_trial | int | 1 | Trials per experiment (used in search mode) |
| log_frequency | int | 1000 | Frames between training checkpoints |
| eval_frequency | int | 1000 | Frames between evaluation checkpoints |
| distributed | bool/str | false | Enable Hogwild! distributed training |
| rigorous_eval | int/null | null | Spawn separate eval environments |

Session and Trial Counts

| Setting | Train Mode | Search Mode |
| --- | --- | --- |
| max_session: 1 | Single run (fast, less reliable) | Required for ASHA early stopping |
| max_session: 4 | Standard (4 seeds, statistically robust) | For validation after ASHA |
| max_trial: 1 | Single configuration | n/a |
| max_trial: 8-16 | n/a | Typical ASHA search budget |

Checkpoint Frequencies

| Environment | Typical log_frequency | Typical eval_frequency |
| --- | --- | --- |
| Classic Control | 500 | 500 |
| Box2D | 1000 | 1000 |
| MuJoCo | 10000 | 10000 |
| Atari | 10000 | 10000 |

Frequencies are measured in frames, not episodes. With num_envs=16, a frequency of 10000 means about 625 steps per environment between checkpoints.
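The arithmetic behind that figure can be written out as follows, assuming each step of each environment contributes exactly one frame to the count (frame skipping or other wrappers, if present, would change the ratio):

```latex
\text{steps per environment} = \frac{\text{frequency}}{\text{num\_envs}} = \frac{10000}{16} = 625
```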

Distributed Training (Hogwild!)

| Value | Behavior |
| --- | --- |
| false | Standard sequential training |
| "shared" | Sessions share network parameters continuously |
| "synced" | Sessions sync parameters after each training step |
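As an illustration of the values above, a meta block that turns on Hogwild! training with per-step syncing might look like the sketch below. Only the distributed value differs from the defaults listed earlier; the other values are simply those defaults, and the `//` annotations follow this page's convention rather than strict JSON.

```json
"meta": {
  "max_session": 4,
  "max_trial": 1,
  "log_frequency": 1000,
  "eval_frequency": 1000,
  "distributed": "synced",    // sessions sync parameters after each training step
  "rigorous_eval": null
}
```

Switching the value to "shared" would select the continuously shared variant described in the table.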

Rigorous Evaluation

| Value | Behavior | Use Case |
| --- | --- | --- |
| null or 0 | Infer eval scores from training (fast) | Default for most environments |
| 8 | Spawn 8 separate eval environments | When train/eval behavior differs |
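A sketch of a meta block that enables rigorous evaluation, using the value 8 from the table above. The remaining values are the documented defaults and are shown only for completeness; the `//` annotations follow this page's convention rather than strict JSON.

```json
"meta": {
  "max_session": 4,
  "max_trial": 1,
  "log_frequency": 1000,
  "eval_frequency": 1000,
  "distributed": false,
  "rigorous_eval": 8          // spawn 8 separate eval environments (slower but precise)
}
```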

Common Configurations

Quick Development

For fast iteration during development:
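A minimal sketch of such a configuration, assuming a single session as described under Session and Trial Counts. The frequencies of 500 are an assumption borrowed from the Classic Control row of the Checkpoint Frequencies table; adjust them to your environment.

```json
"meta": {
  "max_session": 1,           // single run: fast, less reliable
  "max_trial": 1,
  "log_frequency": 500,       // assumed value (Classic Control row)
  "eval_frequency": 500,      // assumed value (Classic Control row)
  "distributed": false,
  "rigorous_eval": null
}
```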

Standard Training

For reliable results with statistical significance:
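A sketch consistent with the defaults in Meta Spec Options: four sessions, each with a different random seed, and a single trial. The frequencies shown are the documented defaults, which also match the Box2D row of the Checkpoint Frequencies table; other environments may call for the values listed there.

```json
"meta": {
  "max_session": 4,           // 4 seeds for statistically robust results
  "max_trial": 1,
  "log_frequency": 1000,
  "eval_frequency": 1000,
  "distributed": false,
  "rigorous_eval": null
}
```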

Hyperparameter Search

For efficient search with early stopping:
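A sketch of the meta side of a search run, assuming one session per trial (required for ASHA early stopping, per the table above) and a trial budget at the upper end of the typical 8-16 range. The frequencies are the documented defaults, and the ASHA settings themselves are not part of the meta block and are not shown here.

```json
"meta": {
  "max_session": 1,           // required for ASHA early stopping
  "max_trial": 16,            // assumed budget within the typical 8-16 range
  "log_frequency": 1000,
  "eval_frequency": 1000,
  "distributed": false,
  "rigorous_eval": null
}
```

After the search, the best configuration can be rerun with max_session set to 4 for validation.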

See Search Spec for details on ASHA configuration.

Example: Atari Configuration

Atari games have multiple lives per episode. SLM Lab's TrackReward wrapper tracks true episodic rewards across lives, so you can use fast evaluation:
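A sketch of an Atari meta block, assuming the session count and frequencies from the Atari rows of the tables on this page, with rigorous_eval left at null so that eval scores are inferred from training:

```json
"meta": {
  "max_session": 4,
  "max_trial": 1,
  "log_frequency": 10000,
  "eval_frequency": 10000,
  "distributed": false,
  "rigorous_eval": null       // fast evaluation: scores inferred from training
}
```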

eval_frequency vs log_frequency: The two settings are independent. For Atari, they are typically set to the same value because TrackReward handles multi-life scoring automatically.

When to Adjust Meta Settings

| Scenario | Adjustment |
| --- | --- |
| Training too slow | Increase log_frequency and eval_frequency |
| Need more checkpoints | Decrease frequencies (more disk usage) |
| Results inconsistent | Increase max_session to 4 or 8 |
| Hyperparameter tuning | Set max_session: 1, increase max_trial |
| Very long training | Set high eval_frequency to reduce overhead |

Finding Meta Configurations

All benchmark specs in slm_lab/spec/benchmark/ have pre-configured meta settings; use them as references.

Meta Settings by Environment Type

| Environment | max_session | log_frequency | eval_frequency | Why |
| --- | --- | --- | --- | --- |
| Classic Control | 4 | 500 | 500 | Fast training, frequent checkpoints useful |
| Box2D | 4 | 1000 | 1000 | Medium training time |
| MuJoCo | 4 | 10000 | 10000 | Long training, reduce I/O overhead |
| Atari | 4 | 10000 | 10000 | Very long training (10M frames) |

Benchmark specs are pre-configured. The specs in slm_lab/spec/benchmark/ use appropriate meta settings for each environment. Copy and modify them for your experiments.
