Algorithm

Algorithm API

Code: slm_lab/agent/algorithm

Algorithm is the main class which implements an RL algorithm. This includes declaring its networks and variables, acting, sampling from memory, and training. It initializes its networks and memory by simply calling the Memory and Net classes with their specs. The loss function of the algorithm is also implemented here.
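To make the shape of this class concrete, here is a minimal illustrative sketch of the methods an algorithm typically provides. The class and method names below loosely follow SLM Lab conventions but are simplified; refer to the algorithm API under slm_lab/agent/algorithm for the actual base class and method signatures.

```python
# Illustrative sketch only: simplified method names, no real logic.
class MyAlgorithm:  # in SLM Lab this would subclass the Algorithm base class
    def init_algorithm_params(self):
        """Read hyperparameters such as gamma from the algorithm spec."""

    def init_nets(self, global_nets=None):
        """Build the networks by calling the Net classes with the net spec."""

    def act(self, state):
        """Choose an action using the configured action_policy and action_pdtype."""

    def sample(self):
        """Sample a batch of experiences from the agent's Memory."""

    def train(self):
        """Compute the algorithm's loss and update the networks; return the loss."""

    def update(self):
        """Update internal variables, e.g. decay exploration parameters."""
```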

Each algorithm comes with a number of hyperparameters that can be specified through an agent spec file.

Algorithm Spec

{
    ...
    "agent": [{
      "name": str,
      "algorithm": {
        "name": str,
        "action_pdtype": str,
        "action_policy": str,
        "gamma": float,
        ...,
      },
      ...
    }],
    ...
}
  • gamma ∈ [0, 1]: how much to discount future rewards when computing returns (see the return formula after this list). 0 corresponds to complete myopia: the agent only cares about the current time step. 1 corresponds to no discounting: every future state matters as much as the current state.

  • name: name of an implemented algorithm class. This must be a class that conforms to the algorithm API and is saved in a .py file under slm_lab/agent/algorithm.

  • action_pdtype: specifies the probability distribution that actions are sampled from. For example, "Argmax" or "Categorical" for discrete action spaces, or "Normal", "MultivariateNormal", and "Gumbel" for continuous action spaces. These are declared in slm_lab/agent/algorithm/policy_util.py#L18-L24.

  • action_policy: specifies how the agent should act, e.g. "epsilon_greedy". These are declared in slm_lab/agent/algorithm/policy_util.py#L133.
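For reference, gamma is the discount factor in the standard definition of the return (this is the textbook formula, not code from SLM Lab):

```latex
% Discounted return from time step t, with discount factor gamma in [0, 1]
R_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k}
```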

Other algorithm spec hyperparameters are specific to particular algorithm implementations. For those, refer to the class documentation of the algorithms in slm_lab/agent/algorithm.

For more concrete examples of algorithm specs for specific algorithms, refer to the existing spec files.
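As a purely illustrative example (the values below are hypothetical and omit the algorithm-specific keys a real spec needs), such an algorithm block could be filled in as follows:

```json
{
  "agent": [{
    "name": "DQN",
    "algorithm": {
      "name": "DQN",
      "action_pdtype": "Argmax",
      "action_policy": "epsilon_greedy",
      "gamma": 0.99
    }
  }]
}
```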

To learn more about algorithms, check out Deep RL Resources.

The subpages that follow showcase a subset of the algorithms in SLM Lab; see here for the full list of implemented algorithms.
