Public Benchmark Data
SLM Lab provides a set of benchmark results that are periodically updated with new feature releases. All the result data is uploaded via Pull Requests and made public on Dropbox.
The data can be downloaded, unzipped into SLM Lab's `data/` folder, and rerun in enjoy mode.
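For example, a rerun might look like the following, assuming SLM Lab's standard `run_lab.py` usage; every name below is a placeholder for a trial you have actually downloaded:

```bash
# Unzip a downloaded benchmark result into SLM Lab's data/ folder
unzip a2c_gae_pong.zip -d data/

# Rerun a saved session in enjoy mode. The trial folder, spec file, and
# session suffix (t0_s0) are illustrative placeholders; substitute the
# names from the downloaded trial.
python run_lab.py data/a2c_gae_pong/a2c_gae_pong_spec.json a2c_gae_pong enjoy@a2c_gae_pong_t0_s0
```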
For reference, the image-based environment benchmarks are run on an AWS GPU box `p2.16xlarge`, and the non-image-based environments are run on an AWS CPU box `m5a.24xlarge`.
The benchmark tables on this page show the `Trial`-level `final_return_ma` from SLM Lab. This is the final value of the 100-checkpoint moving average of the return (total rewards) from evaluation. Each `Trial` is run with 4 `Session`s using different random seeds, and their `final_return_ma` values are averaged at the `Trial` level.
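Concretely, writing $R_{s,k}$ for the evaluation return of `Session` $s$ at checkpoint $k$ (notation introduced here for illustration), the reported value is approximately

$$
\texttt{final\_return\_ma} = \frac{1}{4}\sum_{s=1}^{4} \frac{1}{100}\sum_{k=K-99}^{K} R_{s,k}
$$

where $K$ is the index of the final evaluation checkpoint.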
The specs for these are contained in the `slm_lab/spec/benchmark` folder, descriptively named `{algorithm}_{environment}.json`. They can be exactly reproduced as described in Lab Organization.
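As a sketch, a benchmark trial can be reproduced with the standard `run_lab.py` train invocation; `a2c_gae_pong` below is just one illustrative instance of the `{algorithm}_{environment}` naming:

```bash
# Train the A2C (GAE) Pong benchmark from its spec file; any other spec
# in slm_lab/spec/benchmark can be run the same way.
python run_lab.py slm_lab/spec/benchmark/a2c_gae_pong.json a2c_gae_pong train
```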
SLM Lab's benchmark includes environments from the following offerings:
OpenAI Gym, whose Atari environments offer a wrapper for the Arcade Learning Environment (ALE)
Deep RL algorithms use a lot of abbreviations. Here's a list to help us navigate:
A2C (GAE): Advantage Actor-Critic with GAE as advantage estimation
A2C (n-step): Advantage Actor-Critic with n-step return as advantage estimation
A3C: Asynchronous Advantage Actor-Critic
Async: Asynchronous
CER: Combined Experience Replay
DDQN: Double Deep Q-Network
DQN: Deep Q-Network
GAE: Generalized Advantage Estimation
PER: Prioritized Experience Replay
PPO: Proximal Policy Optimization
SAC: Soft Actor-Critic
SIL: Self-Imitation Learning
Read on to see the benchmark result tables and plots.