Public Benchmark Data

SLM Lab provides a set of benchmark results that are periodically updated with new feature releases. All result data is uploaded via Pull Requests and made public on Google Drive.

Public benchmark data has been moved from Dropbox to Google Drive as of Aug 2022.

The data can be downloaded, unzipped into SLM Lab's data/ folder, and replayed in enjoy mode, as sketched below.
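For example, a downloaded trial can be replayed with the standard run_lab.py command in enjoy mode. This is a minimal sketch only; the data folder timestamp, spec file, and session prename below are hypothetical placeholders, so substitute the actual names from the trial folder you unzipped (see Lab Organization for the exact command syntax).

```bash
# Hypothetical example: replay a downloaded A2C CartPole trial in enjoy mode.
# The folder name, spec file, and prename are placeholders; use the actual
# names found inside the unzipped trial folder under data/.
python run_lab.py data/a2c_gae_cartpole_2022_08_01_000000/a2c_gae_cartpole_spec.json a2c_gae_cartpole enjoy@a2c_gae_cartpole_t0_s0
```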

Hardware

For reference, the image-based environment benchmarks are run on an AWS GPU box (p2.16xlarge), and the non-image-based environment benchmarks are run on an AWS CPU box (m5a.24xlarge).

Reproducibility

The benchmark tables on this page show the Trial-level final_return_ma from SLM Lab. This is the final value of the 100-checkpoint (ckpt) moving average of the return (total rewards) from evaluation. Each Trial is run with 4 Sessions using different random seeds, and their final_return_ma values are averaged at the Trial level.
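As a sketch (the notation here is ours, and the plain 100-checkpoint window is an assumption about the moving average): if $R^{(s)}_t$ denotes the evaluation return of Session $s$ at checkpoint $t$, and $T$ is the final checkpoint, then

$$
\mathrm{return\_ma}^{(s)}_T = \frac{1}{100} \sum_{t=T-99}^{T} R^{(s)}_t,
\qquad
\mathrm{final\_return\_ma} = \frac{1}{4} \sum_{s=1}^{4} \mathrm{return\_ma}^{(s)}_T .
$$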

The spec files for these benchmarks are located in the slm_lab/spec/benchmark folder and are descriptively named {algorithm}_{environment}.json. The results can be reproduced exactly as described in Lab Organization; a sketch of the command is shown below.
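As a rough illustration (the spec file and spec name here are hypothetical instances of the {algorithm}_{environment}.json naming pattern, not guaranteed to exist in the repository), a benchmark is reproduced by running its spec in train mode:

```bash
# Illustrative sketch: train a benchmark spec from the benchmark folder.
# Replace the spec file and spec name with the benchmark you want to reproduce.
python run_lab.py slm_lab/spec/benchmark/ppo_cartpole.json ppo_cartpole train
```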

Environments

SLM Lab's benchmark includes environments from the following offerings:

Terminology

Deep RL algorithm names use a lot of abbreviations. Here's a list to help navigate:

  • A2C (GAE): Advantage Actor-Critic with GAE as advantage estimation

  • A2C (n-step): Advantage Actor-Critic with n-step return as advantage estimation

  • A3C: Asynchronous Advantage Actor-Critic

  • Async: Asynchronous

  • CER: Combined Experience Replay

  • DDQN: Double Deep Q-Network

  • DQN: Deep Q-Network

  • GAE: Generalized Advantage Estimation

  • PER: Prioritized Experience Replay

  • PPO: Proximal Policy Optimization

  • SAC: Soft Actor-Critic

  • SIL: Self Imitation Learning

Read on to see the benchmark result tables and plots.
