Changelog
This page documents major framework releases.
For detailed code changes, see the GitHub releases and CHANGELOG.md in the code repository.
SLM-Lab v5.0.0
Modernization release for the current RL ecosystem: SLM Lab now works with the modern Python RL stack while maintaining backward compatibility with the book Foundations of Deep Reinforcement Learning.
Book readers: For the exact code used in Foundations of Deep Reinforcement Learning, use `git checkout v4.1.1`
Critical: Atari v5 Sticky Actions
SLM-Lab uses the Gymnasium ALE v5 defaults. The v5 default `repeat_action_probability=0.25` (sticky actions) randomly repeats the agent's previous action to simulate console stochasticity, making evaluation harder but more realistic than the v4 default of `0.0` used by most benchmarks (CleanRL, SB3, RL Zoo). This follows the research best practices of Machado et al. (2018). See ALE version history.
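To reproduce the deterministic v4-style setting, stickiness can be overridden at environment creation. A minimal sketch, assuming `gymnasium` and `ale-py` are installed (the explicit registration call is only needed on Gymnasium 1.0+):

```python
import ale_py
import gymnasium as gym

gym.register_envs(ale_py)  # explicit ALE registration (Gymnasium 1.0+)

# v5 default: sticky actions with repeat_action_probability=0.25
env = gym.make("ALE/Pong-v5")

# Override stickiness to mimic the deterministic v4-style benchmark setting
env_deterministic = gym.make("ALE/Pong-v5", repeat_action_probability=0.0)
```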
Why v5?
The RL ecosystem has evolved significantly since SLM Lab v4:
- OpenAI Gym → Gymnasium: OpenAI deprecated Gym in 2022. Gymnasium (by the Farama Foundation) is the maintained fork with better API design
- Roboschool → MuJoCo: Roboschool was abandoned. MuJoCo became free in 2022 and is now the standard for continuous control
- Conda → uv: Modern Python dependency management is faster and more reliable with uv
- Simpler specs: Removed legacy multi-agent abstractions that added complexity without benefit
Key Changes

|  | v4 | v5 |
| --- | --- | --- |
| Package manager | conda | uv |
| Environment library | OpenAI Gym | Gymnasium |
| Continuous control | Roboschool | MuJoCo |
| Entry point | `python run_lab.py` | `slm-lab run` |
| Spec format | Arrays with `body` | Simple objects |
Migration Summary

| v4 | v5 |
| --- | --- |
| `conda activate lab && python run_lab.py` | `slm-lab run` |
| `CartPole-v0` | `CartPole-v1` |
| `Acrobot-v1` | `Acrobot-v1` (unchanged) |
| `Pendulum-v0` | `Pendulum-v1` |
| `LunarLander-v2` | `LunarLander-v3` |
| `PongNoFrameskip-v4` | `ALE/Pong-v5` |
| `BreakoutNoFrameskip-v4` | `ALE/Breakout-v5` |
| `RoboschoolHopper-v1` | `Hopper-v5` (MuJoCo) |
| `RoboschoolHalfCheetah-v1` | `HalfCheetah-v5` (MuJoCo) |
| `RoboschoolHumanoid-v1` | `Humanoid-v5` (MuJoCo) |
| `agent: [{...}], env: [{...}], body: {...}` | `agent: {...}, env: {...}` |
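As a quick illustration of the renamed environment IDs, a minimal sketch using Gymnasium (assumes `gymnasium`, `ale-py`, and the MuJoCo extras are installed; the registration call is only needed on Gymnasium 1.0+):

```python
import ale_py
import gymnasium as gym

gym.register_envs(ale_py)  # needed for ALE/* IDs on Gymnasium 1.0+

cartpole = gym.make("CartPole-v1")  # was CartPole-v0
pong = gym.make("ALE/Pong-v5")      # was PongNoFrameskip-v4
hopper = gym.make("Hopper-v5")      # was RoboschoolHopper-v1 (now MuJoCo)
```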
Gymnasium API Change
The most significant change is how episode endings are handled: v5 uses the modern Gymnasium API, which splits the old `done` flag into two distinct signals.
What's the difference?
- `terminated`: Episode ended due to the task itself (goal reached, agent died, game over)
- `truncated`: Episode ended due to external limits (time limit, max steps reached)
Why does this matter?
This distinction is critical for correct value bootstrapping in RL algorithms: a `terminated` transition is a real terminal state, so the value target must not bootstrap from the next state, while a `truncated` transition should still bootstrap, because the episode was only cut off externally.
In v4, algorithms had to guess whether `done=True` meant a real ending or just a time limit, which led to subtle bugs and inconsistent behavior. All SLM Lab v5 algorithms handle this correctly, as in the sketch below.
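A minimal sketch of the resulting bootstrapping rule (illustrative names, not SLM Lab's internal API):

```python
import torch

def td_target(reward, next_value, terminated, gamma=0.99):
    """One-step TD target under the Gymnasium API.

    terminated=True -> real terminal state: do not bootstrap.
    truncated=True  -> externally cut off: still bootstrap from next_value,
                       so truncation simply never zeroes out the target.
    """
    # Only `terminated` removes the bootstrap term; `truncated` does not.
    not_terminal = 1.0 - terminated.float()
    return reward + gamma * not_terminal * next_value

# Old Gym API: done = terminated or truncated, so a time-limit cutoff was
# indistinguishable from a real terminal state and wrongly cut the bootstrap.
```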
New v5 Features
Algorithm improvements:
- PPO: `normalize_v_targets` for running-statistics normalization, `symlog_transform` (from DreamerV3), `clip_vloss` (CleanRL-style); see the sketch after this list
- SAC: Discrete action support uses the exact expectation (Christodoulou 2019). Target entropy auto-calculated.
- Networks: Optional `layer_norm` for MLP hidden layers
- `life_loss_info`: Proper Atari game-over handling (continue after life loss)
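For reference, minimal sketches of two of these pieces, the DreamerV3 symlog transform and the discrete-SAC exact expectation (illustrative code, not SLM Lab's exact implementation):

```python
import torch

def symlog(x: torch.Tensor) -> torch.Tensor:
    # DreamerV3 symlog: sign(x) * log(1 + |x|), compresses large value targets
    return torch.sign(x) * torch.log1p(torch.abs(x))

def symexp(x: torch.Tensor) -> torch.Tensor:
    # Inverse transform: sign(x) * (exp(|x|) - 1)
    return torch.sign(x) * torch.expm1(torch.abs(x))

def discrete_soft_value(q_values, log_probs, alpha):
    """Exact soft state value for discrete SAC (Christodoulou 2019):
    V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s)),
    computed in closed form over all actions instead of by sampling.
    """
    probs = log_probs.exp()
    return (probs * (q_values - alpha * log_probs)).sum(dim=-1)
```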
Infrastructure:
- Ray Tune ASHA search for efficient hyperparameter tuning
- dstack integration for cloud GPU training
- HuggingFace integration for experiment storage and sharing
Benchmarks:
All algorithms validated on Gymnasium. Full results in Benchmark Results.
- Classic Control: ✅ all algorithms
- Box2D: ✅ all algorithms except one ⚠️
- MuJoCo (11 envs): ✅ most algorithms, with one ⚠️; two algorithms validated on all 11 envs
- Atari (54 games): ✅ all algorithms
Atari benchmarks use ALE v5 with sticky actions (`repeat_action_probability=0.25`). PPO was tested with GAE lambda variants (0.95, 0.85, 0.70) to optimize per-game performance; A2C uses GAE with lambda 0.95.
Trained models available on HuggingFace.
Deprecations
- Roboschool → Use Gymnasium MuJoCo (`Hopper-v5`, `HalfCheetah-v5`, etc.)
- Unity ML-Agents / VizDoom → Removed from core; use their Gymnasium wrappers
- Multi-agent specs → Simplified to single-agent, single-env
Upgrading Specs
v4 spec format:
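A minimal sketch of the shape, shown as a Python dict (the keys and values inside `agent` and `env` are illustrative placeholders, not a complete spec):

```python
# v4 (sketch): agent/env wrapped in arrays, plus a separate body section
spec_v4 = {
    "agent": [{"name": "ppo", "algorithm": {"...": "..."}}],
    "env": [{"name": "CartPole-v0", "max_frame": 100000}],
    "body": {"product": "outer", "num": 1},
}
```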
v5 spec format:
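And the corresponding v5 shape, with the same caveat about placeholder fields:

```python
# v5 (sketch): plain objects, no array wrappers, no body section
spec_v5 = {
    "agent": {"name": "ppo", "algorithm": {"...": "..."}},
    "env": {"name": "CartPole-v1", "max_frame": 100000},
}
```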
Key differences:
- Remove array wrappers: `[{...}]` → `{...}`
- Remove the `body` section entirely
- Update environment names to their Gymnasium versions
See Installation for full setup instructions.