Changelog

This page documents major framework releases.

For detailed code changes, see the GitHub releases and CHANGELOG.md in the code repository.


SLM-Lab v5.0.0

Modernization release for the current RL ecosystem. This release updates SLM Lab to work with the modern Python RL stack while maintaining backward compatibility with the book Foundations of Deep Reinforcement Learning.

Note for book readers: for the exact code used in Foundations of Deep Reinforcement Learning, check out the v4.1.1 tag with git checkout v4.1.1.

Critical: Atari v5 Sticky Actions

SLM-Lab uses the Gymnasium ALE v5 defaults. The v5 default repeat_action_probability=0.25 (sticky actions) randomly repeats the agent's previous action to simulate console stochasticity, making evaluation harder but more realistic than the v4 default of 0.0 used by most benchmarks (CleanRL, SB3, RL Zoo). This follows the best practices of Machado et al. (2018). See the ALE version history.
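A minimal sketch of how this default shows up when building the environment directly (assuming ale-py is installed alongside Gymnasium; the keyword below is the standard ALE option, not an SLM-Lab setting):

```python
import gymnasium as gym
import ale_py  # provides the ALE/* Atari environments

gym.register_envs(ale_py)  # newer Gymnasium versions need explicit registration of ALE ids

# Gymnasium ALE v5 default: sticky actions with repeat_action_probability=0.25
env = gym.make("ALE/Pong-v5")

# To reproduce the deterministic v4-style dynamics used by many benchmarks,
# the probability can be overridden explicitly (at the cost of less realistic evaluation):
env_v4_like = gym.make("ALE/Pong-v5", repeat_action_probability=0.0)
```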

Why v5?

The RL ecosystem has evolved significantly since SLM Lab v4:

  1. OpenAI Gym → Gymnasium: OpenAI deprecated Gym in 2022. Gymnasium (by the Farama Foundation) is the maintained fork with better API design

  2. Roboschool → MuJoCo: Roboschool was abandoned. MuJoCo became free in 2022 and is now the standard for continuous control

  3. Conda → uv: Modern Python dependency management is faster and more reliable with uv

  4. Simpler specs: Removed legacy multi-agent abstractions that added complexity without benefit

Key Changes

| Category | v4 | v5 |
| --- | --- | --- |
| Package manager | conda | uv |
| Environment library | OpenAI Gym | Gymnasium |
| Continuous control | Roboschool | MuJoCo |
| Entry point | python run_lab.py | slm-lab run |
| Spec format | Arrays with body | Simple objects |

Migration Summary

| v4 | v5 |
| --- | --- |
| conda activate lab && python run_lab.py | slm-lab run |
| CartPole-v0 | CartPole-v1 |
| Acrobot-v1 | Acrobot-v1 (unchanged) |
| Pendulum-v0 | Pendulum-v1 |
| LunarLander-v2 | LunarLander-v3 |
| PongNoFrameskip-v4 | ALE/Pong-v5 |
| BreakoutNoFrameskip-v4 | ALE/Breakout-v5 |
| RoboschoolHopper-v1 | Hopper-v5 (MuJoCo) |
| RoboschoolHalfCheetah-v1 | HalfCheetah-v5 (MuJoCo) |
| RoboschoolHumanoid-v1 | Humanoid-v5 (MuJoCo) |
| agent: [{...}], env: [{...}], body: {...} | agent: {...}, env: {...} |

Gymnasium API Change

The most significant change is how episode endings are handled. v5 uses the modern Gymnasium API, which separates episode endings into two distinct signals:

What's the difference?

  • terminated: Episode ended due to the task itself (goal reached, agent died, game over)

  • truncated: Episode ended due to external limits (time limit, max steps reached)

Why does this matter?

This distinction is critical for correct value bootstrapping in RL algorithms: on termination the future return is zero, while on truncation the episode would have continued, so the target should still bootstrap from the value of the next state.

In v4, algorithms had to guess whether done=True meant a real ending or just a time limit, which led to subtle bugs and inconsistent behavior. All SLM Lab v5 algorithms handle this distinction correctly, as in the sketch below.
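A minimal sketch of the Gymnasium step loop and the bootstrapping rule (the value function V and gamma here are stand-ins, not SLM-Lab code):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # placeholder for the agent's policy
    next_obs, reward, terminated, truncated, info = env.step(action)
    # terminated -> the task truly ended (goal reached, agent died, game over):
    #               the target is just the reward, with no future value
    # truncated  -> the episode was cut short by an external limit:
    #               bootstrap from the value of next_obs as if it continued
    # target = reward if terminated else reward + gamma * V(next_obs)
    done = terminated or truncated
    obs = next_obs
env.close()
```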

New v5 Features

Algorithm improvements:

  • PPO: normalize_v_targets for value-target normalization with running statistics and symlog_transform from DreamerV3 (both sketched after this list), plus clip_vloss (CleanRL-style value-loss clipping)

  • SAC: Discrete-action support uses the exact expectation over action probabilities (Christodoulou, 2019); the target entropy is auto-calculated.

  • Networks: Optional layer_norm for MLP hidden layers

  • life_loss_info: Proper Atari game-over handling (continue after life loss)
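As a rough sketch of what the two value-target tricks above refer to (the names symlog, symexp, and RunningNorm below are illustrative, not SLM-Lab's exact implementation):

```python
import torch

def symlog(x: torch.Tensor) -> torch.Tensor:
    # DreamerV3-style symlog: compresses large-magnitude targets symmetrically
    return torch.sign(x) * torch.log1p(torch.abs(x))

def symexp(x: torch.Tensor) -> torch.Tensor:
    # inverse of symlog: maps network predictions back to the original scale
    return torch.sign(x) * torch.expm1(torch.abs(x))

class RunningNorm:
    """Illustrative running mean/variance used to normalize value targets."""

    def __init__(self, eps: float = 1e-8):
        self.mean, self.var, self.count = 0.0, 1.0, eps
        self.eps = eps

    def update(self, x: torch.Tensor) -> None:
        batch_mean = x.mean().item()
        batch_var = x.var(unbiased=False).item()
        n = x.numel()
        delta = batch_mean - self.mean
        total = self.count + n
        # merge batch statistics into the running statistics
        self.mean += delta * n / total
        self.var = (self.var * self.count + batch_var * n
                    + delta ** 2 * self.count * n / total) / total
        self.count = total

    def normalize(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / (self.var ** 0.5 + self.eps)
```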

Infrastructure:

  • Ray Tune ASHA search for efficient hyperparameter tuning

  • dstack integration for cloud GPU training

  • HuggingFace integration for experiment storage and sharing

Benchmarks:

All algorithms validated on Gymnasium. Full results in Benchmark Results.

| Category | REINFORCE | SARSA | DQN | DDQN+PER | A2C | PPO | SAC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Classic Control | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Box2D | — | — | ✅ | ✅ | ⚠️ | ✅ | ✅ |
| MuJoCo (11 envs) | — | — | — | — | ⚠️ | ✅ All | ✅ All |
| Atari (54 games) | — | — | — | — | ✅ | ✅ | — |

Atari benchmarks use ALE v5 with sticky actions (repeat_action_probability=0.25). PPO is tested with GAE lambda variants (0.95, 0.85, 0.70) to optimize per-game performance; A2C uses GAE with lambda 0.95.

Trained models are available on HuggingFace.

Deprecations

  • Roboschool → Use Gymnasium MuJoCo (Hopper-v5, HalfCheetah-v5, etc.)

  • Unity ML-Agents / VizDoom → Removed from core; use their Gymnasium wrappers

  • Multi-agent specs → Simplified to single-agent, single-env

Upgrading Specs

v4 spec format:
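A minimal illustrative sketch (field names and values are abbreviated examples, not a complete spec; see the spec files in the repository for real options):

```json
{
  "ppo_cartpole": {
    "agent": [{
      "name": "PPO",
      "algorithm": {"name": "PPO"},
      "net": {"type": "MLPNet"}
    }],
    "env": [{
      "name": "CartPole-v0"
    }],
    "body": {"product": "outer", "num": 1},
    "meta": {"max_session": 4}
  }
}
```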

v5 spec format:
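The corresponding illustrative v5 sketch, with the same caveats:

```json
{
  "ppo_cartpole": {
    "agent": {
      "name": "PPO",
      "algorithm": {"name": "PPO"},
      "net": {"type": "MLPNet"}
    },
    "env": {
      "name": "CartPole-v1"
    },
    "meta": {"max_session": 4}
  }
}
```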

Key differences:

  1. Remove array wrappers [{...}] → {...}

  2. Remove body section entirely

  3. Update environment names to Gymnasium versions

See Installation for full setup instructions.
