Public Benchmark Data
📂 Public Data
SLM Lab provides a set of benchmark results that are periodically updated with new feature releases. All result data is uploaded via Pull Requests and made public on Google Drive.
Public benchmark data has been moved from Dropbox to Google Drive as of Aug 2022.
The data can be downloaded and unzipped into SLM Lab's data/ folder and rerun in enjoy mode.
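For example, here is a minimal sketch of unzipping a downloaded result archive into data/ and replaying it in enjoy mode. The archive, folder, and spec names used here (ppo_pong_benchmark.zip, ppo_pong, t0_s0) are hypothetical placeholders; substitute the names from the result you actually download.

```python
import subprocess
import zipfile

# Hypothetical archive/folder names; replace with the actual files
# downloaded from the public Google Drive folder.
with zipfile.ZipFile("ppo_pong_benchmark.zip") as zf:
    zf.extractall("data/")  # SLM Lab reads result folders from data/

# Replay the saved trial using SLM Lab's usual CLI pattern:
#   python run_lab.py {spec_file} {spec_name} {mode}
# where the mode enjoy@{prename} points at a saved session checkpoint.
subprocess.run(
    [
        "python", "run_lab.py",
        "data/ppo_pong_2022_08_01_000000/ppo_pong_spec.json",
        "ppo_pong",
        "enjoy@ppo_pong_t0_s0",
    ],
    check=True,
)
```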
📌 Benchmark Information
Hardware
For reference, the image-based environment benchmarks are run on an AWS GPU box (p2.16xlarge), and the non-image-based environments are run on an AWS CPU box (m5a.24xlarge).
Reproducibility
The benchmark tables on this page show the Trial-level final_return_ma from SLM Lab. This is the final value of the 100-ckpt (checkpoint) moving average of the return (total rewards) from evaluation. Each Trial is run with 4 Sessions using different random seeds, and their final_return_ma values are averaged at the Trial level.
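As a concrete sketch, the Trial-level figure is simply the mean of the per-Session values; the numbers below are made up for illustration.

```python
import numpy as np

# Made-up final_return_ma values, one per Session of a single Trial
# (each Session runs the same spec with a different random seed).
session_final_return_ma = [18.2, 19.1, 17.8, 18.5]

# The benchmark tables report the mean over the 4 Sessions.
trial_final_return_ma = np.mean(session_final_return_ma)
print(f"Trial-level final_return_ma: {trial_final_return_ma:.2f}")  # 18.40
```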
The specs for these are contained in the slm_lab/spec/benchmark folder, descriptively named {algorithm}_{environment}.json. They can be exactly reproduced as described in Lab Organization.
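As a rough sketch, a benchmark Trial can be launched with SLM Lab's usual command pattern, shown here with a hypothetical ppo_pong spec following the {algorithm}_{environment}.json naming convention; see Lab Organization for the exact reproduction steps.

```python
import subprocess

# Hypothetical spec file and spec name; train mode runs the full Trial
# with all of its Sessions.
subprocess.run(
    [
        "python", "run_lab.py",
        "slm_lab/spec/benchmark/ppo_pong.json",
        "ppo_pong",
        "train",
    ],
    check=True,
)
```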
Environments
SLM Lab's benchmark includes environments from the following offerings:
OpenAI Gym, whose Atari environments provide a wrapper for the Arcade Learning Environment (ALE)
Terminology
Deep RL algorithms use a lot of abbreviations. Here's a list to help us navigate:
A2C (GAE): Advantage Actor-Critic with GAE as advantage estimation
A2C (n-step): Advantage Actor-Critic with n-step return as advantage estimation
A3C: Asynchronous Advantage Actor-Critic
Async: Asynchronous
CER: Combined Experience Replay
DDQN: Double Deep Q-Network
DQN: Deep Q-Network
GAE: Generalized Advantage Estimation
PER: Prioritized Experience Replay
PPO: Proximal Policy Optimization
SAC: Soft Actor-Critic
SIL: Self-Imitation Learning
Read on to see the benchmark result tables and plots.