SLM Lab

Modular Deep Reinforcement Learning framework in PyTorch.

Companion library of the book Foundations of Deep Reinforcement Learning. GitHub · Benchmark Results


v5.0 updates to Gymnasium, uv tooling, and modern dependencies with ARM support; see Changelog.

Book readers: use git checkout v4.1.1 for the book code. See book website and errata.

[Figure panels: PPO on Atari (BeamRider, Breakout, KungFuMaster, MsPacman, Pong, Qbert, Seaquest, SpaceInvaders); SAC on MuJoCo (Ant, HalfCheetah, Hopper, Humanoid, InvertedDoublePendulum, InvertedPendulum, Reacher, Walker)]

SLM Lab is a software framework for reinforcement learning (RL) research and application in PyTorch. RL trains agents to make decisions by learning from trial and error, like teaching a robot to walk or an AI to play games.

What SLM Lab Offers

| Feature | Description |
| --- | --- |
| Ready-to-use algorithms | PPO, SAC, DQN, A2C, REINFORCE; validated on 70+ environments |
| Easy configuration | JSON spec files fully define experiments; no code changes needed |
| Reproducibility | Every run saves its spec + git SHA for exact reproduction |
| Automatic analysis | Training curves, metrics, and TensorBoard logging out of the box |
| Cloud integration | dstack for GPU training, HuggingFace for sharing results |
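The configuration and reproducibility rows can be illustrated with a minimal sketch: a run record that stores the experiment spec next to the git commit of the code that produced it. This is not SLM Lab's implementation. The function names, the `run_record.json` file name, and the keys in `example_spec` are all hypothetical and do not reflect the library's actual spec schema.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def current_git_sha() -> str:
    """Return the current commit SHA, or "unknown" when git is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or we are not inside a repository with commits
        return "unknown"


def save_run_record(spec: dict, out_dir: Path) -> Path:
    """Write the spec and the code version together in one JSON file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {"spec": spec, "git_sha": current_git_sha()}
    path = out_dir / "run_record.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path


# Hypothetical spec: these keys are illustrative, not SLM Lab's schema.
example_spec = {
    "algorithm": "PPO",
    "env": "CartPole-v1",
    "gamma": 0.99,
    "max_frame": 100000,
}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        record_path = save_run_record(example_spec, Path(tmp))
        print(record_path.read_text())
```

Reloading the JSON file and checking out the recorded commit would, under these assumptions, restore both the configuration and the code version of a run.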

Algorithms

SLM Lab implements the canonical RL algorithms with a taxonomy-based inheritance design:

| Algorithm | Type | Best For | Validated Environments |
| --- | --- | --- | --- |
| REINFORCE | On-policy | Learning/teaching | Classic |
| SARSA | On-policy | Tabular-like | Classic |
| DQN/DDQN+PER | Off-policy | Discrete actions | Classic, Box2D, Atari |
| A2C | On-policy | Fast iteration | Classic, Box2D, Atari |
| PPO | On-policy | General purpose | Classic, Box2D, MuJoCo (11), Atari (54) |
| SAC | Off-policy | Continuous control | Classic, Box2D, MuJoCo |
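To make the idea of a taxonomy-based inheritance design concrete, here is a small sketch in which each algorithm subclasses a conceptually related one and inherits its behavior. The class names and parent-child links below are assumptions chosen for illustration, not SLM Lab's actual class hierarchy; only the on-policy and off-policy labels come from the table above.

```python
class Algorithm:
    """Hypothetical base class holding behavior shared by every algorithm."""

    on_policy = None  # set by subclasses

    def describe(self) -> str:
        kind = "on-policy" if self.on_policy else "off-policy"
        return f"{type(self).__name__} ({kind})"


class Reinforce(Algorithm):
    """Plain policy gradient; root of the policy-based branch."""
    on_policy = True


class A2C(Reinforce):
    """Adds a critic; inherits on_policy from Reinforce."""


class PPO(A2C):
    """Changes the policy loss; inherits everything else from A2C."""


class SAC(A2C):
    """Actor-critic that learns from replayed data, so it overrides the flag."""
    on_policy = False


class SARSA(Algorithm):
    """Root of the value-based branch."""
    on_policy = True


class DQN(SARSA):
    """Value-based like SARSA, but learns off-policy from a replay memory."""
    on_policy = False


if __name__ == "__main__":
    for cls in (Reinforce, SARSA, DQN, A2C, PPO, SAC):
        print(cls().describe())
```

In a design like this, a subclass overrides only what differs from its parent, so a new variant can reuse most of an existing algorithm's code.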

See Benchmark Results for detailed performance data.

Environments

SLM Lab uses Gymnasium (the maintained fork of OpenAI Gym):

| Category | Examples | Difficulty |
| --- | --- | --- |
| Classic Control | CartPole, Pendulum, Acrobot | Easy |
| Box2D | LunarLander, BipedalWalker | Medium |
| MuJoCo | Hopper, HalfCheetah, Humanoid | Hard |
| Atari | Qbert, MsPacman, and 54 more | Varied |

Any Gymnasium-compatible environment works; just specify its name in the spec.
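As a rough illustration of what selecting an environment by name means at the Gymnasium level, the sketch below creates an environment from its ID string and runs one episode with random actions. It uses only the public Gymnasium API and is independent of SLM Lab; the helper name `run_random_episode` and the `max_steps` cap are hypothetical, and `CartPole-v1` is assumed to be available in the installed Gymnasium version.

```python
def run_random_episode(env_name: str, seed: int = 0, max_steps: int = 500) -> float:
    """Run one episode with random actions and return the total reward."""
    # Imported inside the function so this file still loads without Gymnasium.
    import gymnasium as gym

    env = gym.make(env_name)            # look the environment up by its ID
    env.action_space.seed(seed)         # make the random actions repeatable
    obs, info = env.reset(seed=seed)

    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:     # episode ended or hit its time limit
            break

    env.close()
    return total_reward


if __name__ == "__main__":
    print(run_random_episode("CartPole-v1"))
```

Swapping the ID string for another registered environment is the only change needed here, which mirrors how a name in a spec can select the environment.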

Citation

If you use SLM Lab in your publication, please cite:

License

This project is licensed under the MIT License.
