Deep RL research requires comparing algorithms fairly. If two implementations differ in subtle ways beyond the algorithm itself (network initialization, gradient clipping, learning rate schedules), performance differences become meaningless.
SLM Lab solves this with a modular design: algorithms share the same base code, networks, memory, and training loops. When you compare PPO to A2C in SLM Lab, the only differences are the algorithmic ones.
Benefits:

| Benefit | How SLM Lab Achieves It |
|---|---|
| Fair comparison | Algorithms share base classes; only algorithmic differences matter |
| Code reuse | New algorithms inherit tested components |
| Fewer bugs | Less code to review and maintain |
| Easy extension | Add new algorithms by overriding specific methods |
## Core Components
SLM Lab is built around three base classes:
```
Agent
├── Algorithm   Handles interaction with the env, computes loss, runs training
│   └── Net     Neural network function approximator
└── Memory      Data storage and retrieval for training
```

SLM Lab Class Diagram
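The composition above can be sketched in a few lines. This is a hedged, illustrative sketch of how an `Agent` wires an `Algorithm` to a `Memory`, not SLM Lab's actual source; method names like `to_train` follow the conventions described below.

```python
class Agent:
    '''Illustrative sketch: an Agent composes an Algorithm (which owns its Net)
    with a Memory, and coordinates acting, storing, and training.'''

    def __init__(self, algorithm, memory):
        self.algorithm = algorithm
        self.memory = memory

    def act(self, state):
        # Delegate action selection to the algorithm
        return self.algorithm.act(state)

    def update(self, state, action, reward, next_state, done):
        # Store the transition, then train if the memory signals it is time
        self.memory.update(state, action, reward, next_state, done)
        if self.memory.to_train():
            return self.algorithm.train()
```

Because the `Agent` only talks to these interfaces, swapping in a different `Algorithm` or `Memory` requires no changes to the surrounding training loop.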
### Algorithm

Implements the RL algorithm logic:

- Action selection (exploration vs. exploitation)
- Loss computation (policy loss, value loss)
- Training step (gradient updates)
Key methods: `act` (select an action for the current state), `sample` (fetch a training batch from memory), `train` (compute the loss and update the networks), and `update` (update internal variables such as the exploration rate).
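The interface can be sketched as an abstract base class. This is a hedged sketch: the method names follow SLM Lab's conventions, but the signatures and docstrings here are illustrative, not the library's exact code.

```python
from abc import ABC, abstractmethod

class Algorithm(ABC):
    '''Illustrative sketch of the Algorithm interface.'''

    @abstractmethod
    def act(self, state):
        '''Select an action for a state, balancing exploration and exploitation.'''

    @abstractmethod
    def sample(self):
        '''Fetch a batch of experience from memory for training.'''

    @abstractmethod
    def train(self):
        '''Compute the loss, perform a gradient update, and return the loss.'''

    @abstractmethod
    def update(self):
        '''Update internal variables, e.g. decay the exploration rate.'''
```

A concrete algorithm such as DQN fills in these methods; variants like DoubleDQN then override only what differs.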
### Memory

Handles experience storage:

- Store transitions (s, a, r, s', done)
- Sample batches for training
- Signal when it's time to train
Types:

- `Replay` - off-policy ring buffer
- `PrioritizedReplay` - priority-based sampling
- `OnPolicyBatchReplay` - on-policy with a fixed batch size
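The three responsibilities above can be sketched with a minimal replay buffer. This is a hedged sketch, not SLM Lab's implementation: the `update`/`sample`/`to_train` method names mirror the conventions used here, and the details (a `deque` ring buffer, uniform sampling) are illustrative.

```python
import random
from collections import deque

class Replay:
    '''Illustrative off-policy replay memory: a ring buffer with uniform sampling.'''

    def __init__(self, max_size, batch_size):
        # deque with maxlen evicts the oldest transitions automatically
        self.buffer = deque(maxlen=max_size)
        self.batch_size = batch_size

    def update(self, state, action, reward, next_state, done):
        '''Store one transition (s, a, r, s', done).'''
        self.buffer.append((state, action, reward, next_state, done))

    def to_train(self):
        '''Signal whether enough data is stored to sample a batch.'''
        return len(self.buffer) >= self.batch_size

    def sample(self):
        '''Uniformly sample a training batch.'''
        return random.sample(self.buffer, self.batch_size)
```

A prioritized variant would change only `sample` (and track priorities in `update`), which is exactly the kind of localized difference the modular design isolates.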
Components can be mixed and matched via the spec file:
Example compositions:

| Combination | What It Creates |
|---|---|
| DQN + Replay + MLPNet | Standard DQN |
| DoubleDQN + PrioritizedReplay + MLPNet | DDQN+PER |
| DQN + Replay + RecurrentNet | DRQN (recurrent DQN) |
| PPO + OnPolicyBatchReplay + ConvNet | PPO for Atari |
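A spec file expresses such a composition declaratively. The following is a hedged sketch of a DQN-on-CartPole spec in SLM Lab's JSON format; exact field names and defaults vary across versions, so treat the values here as illustrative.

```json
{
  "dqn_cartpole": {
    "agent": [{
      "name": "DQN",
      "algorithm": {
        "name": "DQN",
        "action_policy": "epsilon_greedy",
        "gamma": 0.99,
        "training_frequency": 4
      },
      "memory": {
        "name": "Replay",
        "max_size": 10000,
        "batch_size": 32
      },
      "net": {
        "type": "MLPNet",
        "hid_layers": [64],
        "optim_spec": {"name": "Adam", "lr": 0.001}
      }
    }],
    "env": [{"name": "CartPole-v0", "max_frame": 50000}]
  }
}
```

Swapping `"Replay"` for `"PrioritizedReplay"`, or `"MLPNet"` for `"RecurrentNet"`, changes the composition without touching any algorithm code.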
## Why This Design?

Henderson et al. (2017) showed that implementation details dramatically affect RL results. Two "correct" implementations of the same algorithm can have vastly different performance due to:

- Random seed selection
- Network initialization
- Hyperparameter defaults
- Gradient clipping values
- Frame preprocessing
SLM Lab addresses this by:

- Shared base code - algorithms inherit common functionality
- Explicit configuration - all hyperparameters live in spec files
- Reproducibility - a spec plus a git SHA fully defines an experiment
- Benchmarking - implementations are validated against published results
## Extending SLM Lab
To add a new algorithm:
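The pattern is to inherit from an existing algorithm and override only the methods that differ. The sketch below uses stand-in classes rather than SLM Lab's actual base classes, and names like `calc_q_targets` and the `batch` layout are hypothetical; the point is the shape of the extension, not the exact API.

```python
class DQN:
    '''Stand-in for an existing DQN algorithm class.'''

    def calc_q_targets(self, batch):
        # Vanilla DQN target: max over target-network Q-values (stand-in logic)
        return [max(qs) for qs in batch['next_qs']]

class DoubleDQN(DQN):
    '''New algorithm: override only the target computation.'''

    def calc_q_targets(self, batch):
        # Double DQN: pick the argmax action with the online net,
        # then evaluate that action with the target net
        targets = []
        for online_qs, target_qs in zip(batch['online_next_qs'], batch['next_qs']):
            a = max(range(len(online_qs)), key=lambda i: online_qs[i])
            targets.append(target_qs[a])
        return targets
```

Everything else (action selection, memory handling, the training loop) is inherited unchanged, so the two algorithms differ only where the papers say they should.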
Then register it in `slm_lab/agent/algorithm/__init__.py` and create a spec file.