Public Benchmark Data

SLM Lab provides a set of benchmark results that are periodically updated with new feature releases. All of the result data is submitted via Pull Request and made public on Dropbox.

The data can be downloaded, unzipped into SLM Lab's data/ folder, and rerun in enjoy mode, as sketched below.
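
For example, a downloaded trial can be replayed roughly as follows. This is a minimal sketch assuming the usual run_lab.py enjoy-mode invocation; the data folder, spec name, and session identifier are placeholders for whichever trial you unzip, not actual benchmark artifacts.

```python
import subprocess

# Placeholder paths and names: substitute the trial folder actually unzipped into data/.
# Enjoy mode replays a saved session from the downloaded benchmark data.
subprocess.run(
    [
        "python", "run_lab.py",
        "data/dqn_cartpole_2018_06_16_214527/dqn_cartpole_spec.json",  # spec saved with the trial (placeholder)
        "dqn_cartpole",                                                 # spec name (placeholder)
        "enjoy@dqn_cartpole_t0_s0",                                     # lab mode: replay trial 0, session 0
    ],
    check=True,
)
```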

Hardware

For reference, the image-based environment benchmarks are run on an AWS GPU instance (p2.16xlarge), and the non-image-based environment benchmarks are run on an AWS CPU instance (m5a.24xlarge).

Reproducibility

The benchmark tables on this page show the Trial-level final_return_ma from SLM Lab. This is the final value of the 100-checkpoint moving average of the return (total rewards) from evaluation. Each Trial is run with 4 Sessions using different random seeds, and their final_return_ma values are averaged at the Trial level.
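
For clarity, here is a minimal sketch of how this metric is defined. It illustrates the description above rather than SLM Lab's actual analysis code; since the metric is the final value of a trailing moving average, it reduces to the mean return over the last 100 evaluation checkpoints of each Session.

```python
import numpy as np

def final_return_ma(eval_returns, window=100):
    # Final value of the `window`-checkpoint moving average of evaluation returns,
    # i.e. the mean return over the last `window` evaluation checkpoints of one Session.
    returns = np.asarray(eval_returns, dtype=float)
    w = min(window, len(returns))
    return returns[-w:].mean()

def trial_final_return_ma(sessions, window=100):
    # Trial-level metric: average the per-Session final_return_ma across Sessions
    # (4 Sessions with different random seeds in the benchmark runs).
    return float(np.mean([final_return_ma(r, window) for r in sessions]))
```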

The spec files for these runs are contained in the slm_lab/spec/benchmark folder and are descriptively named {algorithm}_{environment}.json. The results can be reproduced exactly as described in Lab Organization.
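
As a rough sketch, reproducing a benchmark Trial amounts to running the corresponding spec in train mode; the spec file and spec name below are placeholders that follow the naming convention, not a specific benchmark entry.

```python
import subprocess

# Illustrative only: substitute the {algorithm}_{environment}.json spec you want to reproduce.
subprocess.run(
    [
        "python", "run_lab.py",
        "slm_lab/spec/benchmark/a2c_gae_cartpole.json",  # {algorithm}_{environment}.json (placeholder)
        "a2c_gae_cartpole",                              # spec name inside the file (placeholder)
        "train",                                         # lab mode: full training run
    ],
    check=True,
)
```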

Environments

SLM Lab's benchmark includes environments from the following offerings:

Terminology

Deep RL algorithms use a lot of abbreviations. Here's a list to help us navigate:

  • A2C (GAE): Advantage Actor-Critic with GAE as advantage estimation

  • A2C (n-step): Advantage Actor-Critic with n-step return as advantage estimation

  • A3C: Asynchronous Advantage Actor-Critic

  • Async: Asynchronous

  • CER: Combined Experience Replay

  • DDQN: Double Deep Q-Network

  • DQN: Deep Q-Network

  • GAE: Generalized Advantage Estimation

  • PER: Prioritized Experience Replay

  • PPO: Proximal Policy Optimization

  • SAC: Soft Actor-Critic

  • SIL: Self-Imitation Learning

Read on to see the benchmark result tables and plots.
