Public Benchmark Data

Overview

All SLM Lab benchmark results are publicly available on HuggingFace:

Each experiment includes:

  • Trained models — PyTorch checkpoints (`*_ckpt-best_net_model.pt`)

  • Training curves — Full learning history (`*_session_df_{train,eval}.csv`)

  • Specs — Exact configurations for reproduction (`*_spec.yaml`)

  • Graphs — Plotly visualizations (PNG and HTML)

Algorithm Coverage

Which algorithms are benchmarked in each environment category. ✓ = benchmarked.

| Algorithm | Classic Control | Box2D | MuJoCo | Atari | Playground |
|-----------|-----------------|-------|--------|-------|------------|
| REINFORCE | ✓ | | | | |
| SARSA | ✓ | | | | |
| DQN | ✓ | ✓ | | | |
| DDQN+PER | ✓ | ✓ | | | |
| A2C | ✓ | ✓ | | ✓ | |
| PPO | ✓ | ✓ | ✓ | ✓ | ✓ |
| SAC | ✓ | ✓ | ✓ | ✓ | |
| CrossQ | ✓ | ✓ | ✓ | ✓ | |

| Category | Environments | Algorithms |
|----------|--------------|------------|
| Classic Control | CartPole-v1, Acrobot-v1, Pendulum-v1 | REINFORCE, SARSA, DQN, DDQN+PER, A2C, PPO, SAC, CrossQ |
| Box2D | LunarLander-v3 (discrete & continuous) | DQN, DDQN+PER, A2C, PPO, SAC, CrossQ |
| MuJoCo | 11 environments (Hopper, HalfCheetah, Humanoid, etc.) | PPO, SAC, CrossQ |
| Atari | 57 games (6 hard-exploration skipped) | A2C, PPO, SAC, CrossQ |
| Playground | 54 MuJoCo Playground environments (DM Control, Robots, Manipulation) | PPO |

Detailed Results

| Benchmark | Environments |
|-----------|--------------|
| Classic + Box2D | CartPole, Acrobot, Pendulum, LunarLander |
| MuJoCo | Hopper, HalfCheetah, Humanoid, etc. |
| Atari | 57 games |
| Playground | 54 MuJoCo Playground envs (DM Control, Locomotion, Manipulation) |

Accessing Results

List and Download

> Note: No token needed for read-only access. `HF_TOKEN` is only required for uploading to your own repo.
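A minimal sketch of the read-only flow using the `huggingface_hub` client. The repo id below is a placeholder, not the actual benchmark repo; the filename patterns come from the artifact list above.

```python
# Read-only listing/download of benchmark artifacts. No token is needed
# for public repos; "org/slm-lab-benchmarks" is a placeholder repo id.
from huggingface_hub import HfApi, snapshot_download


def fetch_experiment(repo_id: str = "org/slm-lab-benchmarks") -> str:
    """List repo contents, then download specs and training curves."""
    api = HfApi()  # anonymous client: read-only access
    for path in api.list_repo_files(repo_id):
        print(path)
    # grab only the reproducibility artifacts, not the large checkpoints
    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=["*_spec.yaml", "*_session_df_*.csv"],
    )
```

`allow_patterns` keeps the download small by skipping model checkpoints and graphs.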

Replay a Trained Agent
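Replaying uses SLM Lab's own eval/enjoy mode; as a sketch of the checkpoint format itself, the `*_ckpt-best_net_model.pt` files are ordinary PyTorch state dicts. The two-layer MLP here is a stand-in for the trained policy/value network:

```python
import torch
import torch.nn as nn

# Stand-in network; real checkpoints store the trained agent's weights.
net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
torch.save(net.state_dict(), "demo_ckpt-best_net_model.pt")

# Load the checkpoint and restore the weights before running the agent.
state_dict = torch.load("demo_ckpt-best_net_model.pt", map_location="cpu")
net.load_state_dict(state_dict)
```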

Browse on HuggingFace

Direct links to experiment folders (example):

Methodology

How Scores Are Reported

  1. Trial = 4 sessions with different random seeds

  2. Session = one complete training run

  3. Score = final 100-checkpoint moving average (`total_reward_ma`)

The trial score is the mean across the 4 sessions.
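The score computation above can be sketched from a session's checkpoint rewards (the `total_reward` column name is an assumption; the CSVs store one row per checkpoint):

```python
import pandas as pd

# Stand-in for df["total_reward"] read from a *_session_df_eval.csv file.
rewards = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0])

# 100-checkpoint moving average; min_periods=1 so early checkpoints are defined.
ma = rewards.rolling(window=100, min_periods=1).mean()
score = ma.iloc[-1]  # final moving average = the reported session score
print(score)  # mean of all 5 stand-in values: 14.0
```

The trial score is then the plain mean of the 4 session scores.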

Environment Settings

Standardized settings for fair comparison:

| Category | num_envs | max_frame | log_frequency | ASHA grace_period |
|----------|----------|-----------|---------------|-------------------|
| Classic Control | 4 | 2e5-3e5 | 500 | 1e4 |
| Box2D | 8 | 3e5 | 1000 | 5e4 |
| MuJoCo | 16 | 1e6-10e6 | 1e4 | 1e5-1e6 |
| Atari | 16 | 10e6 | 1e4 | 5e5 |
| Playground | 2048 | 100e6 | 1e4 | — |

Hardware Requirements

| Category | GPU Required | Typical Runtime | Recommendation |
|----------|--------------|-----------------|----------------|
| Classic Control | No | Minutes | Local CPU is fine |
| Box2D | Optional | 10-30 min | Local or remote |
| MuJoCo | Yes | 1-4 hours | Use `run-remote --gpu` |
| Atari | Yes | 2-3 hours | Use `run-remote --gpu` |
| Playground | Yes (CUDA) | 1-6 hours | Use `run-remote --gpu` |

> Note: Cloud GPUs recommended for MuJoCo and Atari. Cloud L4/A10G via dstack is faster and often cheaper than local training. See Remote Training for setup.

Contributing Benchmarks

Follow these steps when adding or updating benchmark results.

1. Audit Spec Settings

Ensure your `spec.yaml` matches the Settings line in the benchmark tables. Example: `max_frame 3e5 | num_envs 4 | max_session 4 | log_frequency 500`.
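Such an audit can be scripted; the key paths inside the spec below are illustrative stand-ins, so adjust them to the real `spec.yaml` layout:

```python
# Audit a spec against the benchmark-table settings line
# "max_frame 3e5 | num_envs 4 | max_session 4 | log_frequency 500".
import yaml

expected = {"max_frame": 3e5, "num_envs": 4, "max_session": 4, "log_frequency": 500}

spec_text = """
env:
  num_envs: 4
  max_frame: 300000
meta:
  max_session: 4
  log_frequency: 500
"""  # stand-in for open("my_spec.yaml").read(); key paths are assumptions

spec = yaml.safe_load(spec_text)
found = {
    "max_frame": spec["env"]["max_frame"],
    "num_envs": spec["env"]["num_envs"],
    "max_session": spec["meta"]["max_session"],
    "log_frequency": spec["meta"]["log_frequency"],
}
mismatches = {k: (found[k], v) for k, v in expected.items() if found[k] != v}
assert not mismatches, f"spec drifted from benchmark settings: {mismatches}"
```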

2. Run Benchmark and Commit Specs

Always commit the spec.yaml file to the repo after a successful run.

3. Record Scores and Plots

  • Extract total_reward_ma from logs (trial_metrics)

  • Add HuggingFace folder link to the benchmark table

  • Generate plots:

When an algorithm fails to reach its target, run search before the final validation:

| Stage | Mode | Config | Purpose |
|-------|------|--------|---------|
| ASHA | search | `max_session=1`, `search_scheduler` enabled | Wide exploration with early stopping |
| Multi | search | `max_session=4`, no `search_scheduler` | Robust validation with averaging |
| Validate | train | Final spec | Confirmation run |

Search budget: ~3-4 trials per search dimension (8 trials for 2-3 dims, 16 for 3-4 dims).
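The staged settings might look like the following spec fragment. This is purely illustrative; consult SLM Lab's own spec files for the actual `search_scheduler` schema:

```yaml
# Stage 1 (ASHA): wide search, 1 session per trial, early stopping on.
# Field names and nesting here are assumptions, not the real schema.
meta:
  max_session: 1
  max_trial: 16
  search_scheduler:
    name: ASHA
    grace_period: 5e4   # see the ASHA grace_period column above

# Stage 2 (Multi): re-run top candidates with 4 seeds, no scheduler.
# meta:
#   max_session: 4
```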


Reproducibility

Every experiment can be exactly reproduced:

For the exact code version, check out the git SHA recorded in the spec file.
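A sketch of pulling that SHA out of a downloaded spec; the `git_sha` key name and its location under `meta` are assumptions, since the docs only say the SHA is recorded in the spec file:

```python
# Extract the recorded git SHA so the exact code version can be checked out.
import yaml

spec_text = """
meta:
  git_sha: 0a1b2c3d4e5f60718293a4b5c6d7e8f901234567
"""  # stand-in for open("downloaded_spec.yaml").read(); key name assumed

spec = yaml.safe_load(spec_text)
sha = spec["meta"]["git_sha"]
print(f"git checkout {sha}")  # run inside a clone of the SLM Lab repo
```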

Using Your Own HuggingFace Repo
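A sketch of the upload side with `huggingface_hub`. Unlike reads, this requires `HF_TOKEN` (see the note above); the repo id and folder path are placeholders:

```python
import os

from huggingface_hub import HfApi


def upload_experiment(folder: str,
                      repo_id: str = "your-name/slm-lab-benchmarks") -> None:
    """Push a finished experiment folder to your own HuggingFace repo."""
    api = HfApi(token=os.environ["HF_TOKEN"])  # write token required
    api.create_repo(repo_id, exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")
```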

Historical Data (v4)

v4 benchmarks used OpenAI Gym and Roboschool (both deprecated). Available for historical reference:


Terminology

| Abbreviation | Meaning |
|--------------|---------|
| A2C | Advantage Actor-Critic |
| CrossQ | Cross-batch Normalized Q-learning |
| DDQN | Double Deep Q-Network |
| DQN | Deep Q-Network |
| GAE | Generalized Advantage Estimation |
| MA | Moving Average |
| MJWarp | Warp-accelerated MJX (GPU physics) |
| PER | Prioritized Experience Replay |
| PPO | Proximal Policy Optimization |
| SAC | Soft Actor-Critic |
