🏗️ Architecture

This page describes SLM-Lab's architecture and control flow.

Control Hierarchy

SLM-Lab organizes training into a hierarchical structure:

CLI (slm-lab command)
 └── Experiment (hyperparameter search)
      └── Trial (one configuration)
           └── Session (one random seed)
                ├── Agent
                │    ├── Algorithm
                │    └── Memory
                └── Env
                     └── MetricsTracker
| Level | Purpose | Configured By |
| --- | --- | --- |
| Experiment | Orchestrates hyperparameter search via Ray Tune | meta.max_trial, search block |
| Trial | Runs multiple sessions with one configuration | meta.max_session |
| Session | Single training run, owns Agent and Env | Random seed |

In most cases, you run a single Trial (which creates multiple Sessions). Experiments are used for hyperparameter tuning.
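The sketch below illustrates how these levels nest. It is a conceptual sketch, not SLM-Lab's actual entry points: the function names (run_experiment, run_trial, run_session) are illustrative, and the only spec fields it relies on are meta.max_session and meta.max_trial from the table above.

```python
# Conceptual sketch of the control hierarchy (illustrative names, not SLM-Lab's source).
def run_session(spec, seed):
    # One training run with one random seed; owns the Agent and Env
    print(f"session: seed={seed}")

def run_trial(spec):
    # One configuration, repeated over meta.max_session random seeds
    for seed in range(spec["meta"]["max_session"]):
        run_session(spec, seed)

def run_experiment(spec, trial_specs):
    # Hyperparameter search: Ray Tune proposes spec variants, capped at meta.max_trial;
    # here the variants are passed in explicitly to keep the sketch self-contained
    for trial_spec in trial_specs[: spec["meta"]["max_trial"]]:
        run_trial(trial_spec)
```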

Agent Components

The Agent is a container that wires together three components:

Algorithm

Implements the RL algorithm: network architecture, action selection, and gradient updates.

Class Hierarchy:

Each algorithm extends its parent, adding only the differences:

| Algorithm | Parent | Key Difference |
| --- | --- | --- |
| VanillaDQN | SARSA | Neural network Q-function |
| DQNBase | VanillaDQN | Adds target network infrastructure |
| DQN | DQNBase | Periodic target updates |
| DoubleDQN | DQN | Uses online network for action selection |
| ActorCritic | Reinforce | Adds value function (critic), supports GAE and n-step |
| PPO | ActorCritic | Adds clipped surrogate objective, minibatch training |
| SAC | ActorCritic | Adds entropy regularization, twin Q-networks |

See Class Inheritance: A2C > PPO for a deep dive.

Key Algorithm Methods:
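The interface sketch below lists the methods this page refers to. init_algorithm_params, init_nets, act, and train are named in the extension guide later on this page; sample and update are assumptions about the remaining hooks, and the signatures are illustrative.

```python
# Sketch of the Algorithm interface (signatures are illustrative).
class Algorithm:
    def init_algorithm_params(self):
        '''Set hyperparameter defaults from the algorithm spec.'''

    def init_nets(self):
        '''Build the networks described by the net spec.'''

    def act(self, state):
        '''Select an action (or a batch of actions) for the given state.'''

    def sample(self):
        '''Fetch a batch of experience from the agent's Memory.'''

    def train(self):
        '''Run a gradient update when the training trigger fires.'''

    def update(self):
        '''Update internal variables such as exploration schedules.'''
```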

Memory

Stores and retrieves experience for training.

| Type | Algorithms | Behavior | Key Config |
| --- | --- | --- | --- |
| OnPolicyBatchReplay | PPO, A2C | Fixed-size buffer, cleared after use | training_frequency |
| Replay | DQN, SAC | Ring buffer with random sampling | batch_size, max_size |
| PrioritizedReplay | DDQN+PER | Samples by TD-error priority | alpha, epsilon |

Memory Interface:
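A sketch of the Memory interface implied above. The method names are assumptions chosen to match the description (store transitions, hand batches to the Algorithm); check slm_lab/agent/memory for the exact signatures.

```python
# Sketch of the Memory interface (method names are illustrative).
class Memory:
    def update(self, state, action, reward, next_state, done):
        '''Store one transition (batched across envs when num_envs > 1).'''

    def sample(self):
        '''Return a batch of experience for Algorithm.train().'''

    def reset(self):
        '''Clear the buffer; on-policy memories do this after each training step.'''
```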

Network

Neural network architectures are configured in the net spec:

| Type | Use Case | Input | Architecture |
| --- | --- | --- | --- |
| MLPNet | Low-dimensional states | Vector | Fully-connected layers |
| ConvNet | Image observations | Images | CNN + FC layers |
| RecurrentNet | Partial observability | Sequences | LSTM/GRU + FC |

Network Configuration Example:
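A minimal net spec sketch, shown as a Python dict (the real spec file is JSON with the same shape). The key names follow common SLM-Lab specs but should be verified against the spec files shipped with your version.

```python
# Illustrative net spec for an MLPNet (keys may differ slightly by version).
net_spec = {
    "type": "MLPNet",                 # MLPNet | ConvNet | RecurrentNet
    "hid_layers": [64, 64],           # two fully-connected hidden layers
    "hid_layers_activation": "relu",
    "optim_spec": {"name": "Adam", "lr": 0.001},
    "clip_grad_val": 0.5,
}
```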

Training Loop

A Session runs this loop until max_frame:
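The sketch below shows that loop conceptually, assuming the Agent exposes act and update methods as described in the Agent Components section; it is not SLM-Lab's actual source.

```python
# Conceptual sketch of the Session loop (not SLM-Lab's actual code).
def run_session_loop(agent, env, max_frame, num_envs=1):
    frame = 0
    state, info = env.reset()
    while frame < max_frame:
        action = agent.act(state)                              # Algorithm picks the action(s)
        next_state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        agent.update(state, action, reward, next_state, done)  # store in Memory, train when due
        state = next_state
        frame += num_envs                                       # vectorized envs advance by num_envs
        # (vector envs auto-reset; a single env would call env.reset() when done)
```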

Training Frequency

How often training happens depends on the algorithm:

| Algorithm Type | Training Trigger | Example |
| --- | --- | --- |
| On-policy (PPO, A2C) | Every time_horizon steps | PPO trains every 128 steps |
| Off-policy (DQN, SAC) | Every training_frequency steps | DQN trains every 4 steps |
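The difference between the two triggers boils down to a couple of lines; this is a conceptual sketch, not SLM-Lab's scheduling code.

```python
# Conceptual sketch of the two training triggers (illustrative only).
def on_policy_ready(frames_since_last_train, time_horizon=128):
    # On-policy: wait until a full horizon of fresh experience has been collected
    return frames_since_last_train >= time_horizon

def off_policy_ready(frame, training_frequency=4):
    # Off-policy: train every training_frequency frames, reusing the replay buffer
    return frame % training_frequency == 0
```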

Environment

SLM-Lab uses Gymnasium environments with automatic vectorization:

Environment Creation Flow
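A sketch of the flow using the standard Gymnasium constructors; SLM-Lab derives these steps from the env spec and chooses wrappers (next table) based on the environment type. The exact call chain in SLM-Lab may differ.

```python
# Illustrative env creation flow (the exact call chain in SLM-Lab may differ).
import gymnasium as gym

def make_env(env_spec):
    def make_single():
        env = gym.make(env_spec["name"])
        # SLM-Lab then applies wrappers per env type: ClockWrapper, AtariPreprocessing,
        # FrameStackObservation, TrackReward, NormalizeObservation/NormalizeReward, ...
        return env
    num_envs = env_spec.get("num_envs", 1)
    return gym.vector.SyncVectorEnv([make_single for _ in range(num_envs)])
```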

Key Wrappers

| Wrapper | Purpose | Applied To |
| --- | --- | --- |
| ClockWrapper | Track frame/episode counts | All envs |
| AtariPreprocessing | Grayscale, resize, frame skip | ALE envs |
| FrameStackObservation | Stack N frames | ALE envs |
| NormalizeObservation | Running mean/std normalization | MuJoCo (optional) |
| NormalizeReward | Running reward normalization | MuJoCo (optional) |
| TrackReward | Track true episode rewards | ALE envs |

Vectorized Environments

num_envs controls parallelization:

With num_envs=4 (see the sketch after this list):

  • Each env.step() returns batched data: (4, state_dim), (4,), etc.

  • Frame count increments by 4 per step

  • Useful for on-policy algorithms that need diverse samples
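The snippet below shows what those batched shapes look like using Gymnasium's vector API directly; SLM-Lab constructs the vectorized env for you from the spec.

```python
# Batched stepping with num_envs=4 (plain Gymnasium, for illustration).
import gymnasium as gym
import numpy as np

envs = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
states, infos = envs.reset(seed=0)
print(states.shape)   # (4, 4): (num_envs, state_dim) for CartPole
actions = np.stack([envs.single_action_space.sample() for _ in range(4)])
states, rewards, terminated, truncated, infos = envs.step(actions)
print(rewards.shape)  # (4,)
```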

Spec System

JSON specs fully configure experiments:
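A trimmed spec sketch, written here as a Python dict mirroring the JSON file's shape. The agent, env, and meta blocks follow common SLM-Lab specs, but the exact fields vary by algorithm and version.

```python
# Illustrative spec structure (the real file is JSON with the same shape).
spec = {
    "ppo_cartpole": {
        "agent": [{
            "name": "PPO",
            "algorithm": {"name": "PPO", "gamma": 0.99},
            "memory": {"name": "OnPolicyBatchReplay"},
            "net": {"type": "MLPNet", "hid_layers": [64, 64]},
        }],
        "env": [{"name": "CartPole-v1", "num_envs": 4, "max_frame": 100000}],
        "meta": {"max_session": 4, "max_trial": 1},
    }
}
```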

Variable Substitution

Specs support ${var} placeholders that are substituted at runtime:
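The example below only illustrates the placeholder idea with a plain string-template substitution; the actual mechanism SLM-Lab uses (CLI arguments or spec utilities) may differ.

```python
# Hypothetical illustration of ${var} substitution (not SLM-Lab's actual mechanism).
import json
from string import Template

spec_text = '{"env": [{"name": "${env}", "max_frame": ${max_frame}}]}'
resolved = Template(spec_text).substitute(env="CartPole-v1", max_frame=100000)
spec = json.loads(resolved)   # {'env': [{'name': 'CartPole-v1', 'max_frame': 100000}]}
```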

Search Blocks

Define hyperparameter ranges for Ray Tune:
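An illustrative search block, again shown as a Python dict mirroring the JSON. The key__sampler suffix convention shown here follows SLM-Lab's search specs, but confirm the supported sampler names against your version.

```python
# Illustrative search block for Ray Tune (sampler suffixes may differ by version).
search_spec = {
    "agent": [{
        "algorithm": {
            "gamma__uniform": [0.9, 0.999],     # sample gamma uniformly in this range
            "lam__choice": [0.9, 0.95, 0.99],   # pick lam from a discrete set
        },
        "net": {
            "optim_spec": {"lr__loguniform": [1e-4, 1e-2]},
        },
    }]
}
```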

Extending SLM-Lab

Adding a New Algorithm

  1. Create slm_lab/agent/algorithm/your_algo.py

  2. Inherit from an appropriate base class (Algorithm, ActorCritic, DQN, etc.)

  3. Override necessary methods:

    • init_algorithm_params() - Set hyperparameter defaults

    • init_nets() - Create networks

    • act() - Action selection

    • train() - Training step

  4. Register in slm_lab/agent/algorithm/__init__.py

  5. Create a spec file for testing

Example: Custom DQN Variant
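A minimal sketch following the steps above. The import path matches the directory named in step 1, but the calc_q_loss hook and attribute names are assumptions about the DQN base class; adapt them to the methods your version actually exposes.

```python
# Illustrative custom DQN variant (hook names are assumptions).
from slm_lab.agent.algorithm.dqn import DQN


class ClippedDQN(DQN):
    '''DQN variant that clips the Q-loss before the gradient update.'''

    def init_algorithm_params(self):
        super().init_algorithm_params()
        # Extra hyperparameter for this variant (would normally come from the algorithm spec)
        self.loss_clip = 1.0

    def calc_q_loss(self, batch):
        # Hypothetical hook: clamp the parent's loss to stabilize updates
        loss = super().calc_q_loss(batch)
        return loss.clamp(max=self.loss_clip)

# Remember to register ClippedDQN in slm_lab/agent/algorithm/__init__.py (step 4).
```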

Adding Environment Support

SLM-Lab works with any gymnasium-compatible environment:
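Pointing a spec at a Gymnasium environment is just a matter of the env block's name field; the sketch below uses the fields named elsewhere on this page (num_envs, max_frame) and mirrors the JSON spec as a Python dict.

```python
# Illustrative env block (Python dict mirroring the JSON spec).
env_spec = [{
    "name": "CartPole-v1",   # any Gymnasium env id
    "num_envs": 4,
    "max_frame": 100000,
}]
```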

For custom environments, ensure gymnasium API compliance:
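A minimal Gymnasium-compliant environment: reset returns (obs, info) and step returns the five-tuple (obs, reward, terminated, truncated, info). The class itself is illustrative.

```python
# Minimal custom environment following the Gymnasium API.
import gymnasium as gym
import numpy as np


class MyCustomEnv(gym.Env):
    def __init__(self):
        self.observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = np.zeros(3, dtype=np.float32)
        return obs, {}                                  # (obs, info)

    def step(self, action):
        obs = self.observation_space.sample()
        reward = 1.0
        terminated, truncated = False, False
        return obs, reward, terminated, truncated, {}   # (obs, reward, terminated, truncated, info)
```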

Register and use:
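Register the id with Gymnasium so gym.make can find it, then reference that id in the spec's env block. The module path my_envs below is a placeholder for wherever the class lives.

```python
# Register the custom env, then point the spec's env name at its id.
import gymnasium as gym

gym.register(id="MyCustomEnv-v0", entry_point="my_envs:MyCustomEnv")

env_spec = [{"name": "MyCustomEnv-v0", "num_envs": 1, "max_frame": 10000}]
```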
