SLM Lab v4.1.1
Class Inheritance: A2C > PPO

REINFORCE > A2C (Actor-Critic) > PPO

Proximal Policy Optimization (PPO) (Schulman et al., 2017) is a good example to showcase the taxonomy-based, class-inheritance implementation in SLM Lab. Considered as a standalone algorithm, PPO has a number of distinct components. However, it differs from the Actor-Critic algorithm only in how it computes the policy loss, how it runs the training loop, and in needing to maintain an additional actor network during training. The figure below shows how this similarity is reflected in the SLM Lab implementation of PPO:

The result is that the PPO class in SLM Lab has five overridden methods and contains only about 140 lines of code (excluding comments). Implementing it was straightforward once the ActorCritic class was already implemented and thoroughly tested. More importantly, we can be sure that any performance difference between Actor-Critic and PPO observed in experiments using SLM Lab is due to something in the 140 lines of code that differentiate ActorCritic and PPO, and not to other implementation differences.
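
The pattern is easiest to see in code. Below is a minimal sketch, not the actual SLM Lab source, of how a PPO class can inherit from an ActorCritic class and override only the policy-loss and training-loop methods while adding an extra "old" actor network. The method names (`calc_policy_loss`, `train`, `log_prob`), the hyperparameters, and the omission of the critic/value loss are all illustrative simplifications.

```python
import copy

import torch


class ActorCritic:
    '''Parent algorithm: owns the actor network and the basic policy-gradient update.
    Critic/value-loss machinery is omitted here for brevity.'''

    def __init__(self, actor_net, lr=1e-3):
        # actor_net is assumed to expose log_prob(states, actions) -> tensor of log-probabilities
        self.actor_net = actor_net
        self.optim = torch.optim.Adam(self.actor_net.parameters(), lr=lr)

    def calc_policy_loss(self, batch, advantages):
        # Vanilla policy-gradient loss: -E[log pi(a|s) * A]
        log_probs = self.actor_net.log_prob(batch['states'], batch['actions'])
        return -(log_probs * advantages).mean()

    def train(self, batch, advantages):
        # One gradient step on whatever calc_policy_loss returns
        loss = self.calc_policy_loss(batch, advantages)
        self.optim.zero_grad()
        loss.backward()
        self.optim.step()
        return loss.item()


class PPO(ActorCritic):
    '''Child algorithm: overrides only what differs from ActorCritic.'''

    def __init__(self, actor_net, lr=1e-3, clip_eps=0.2, epochs=4):
        super().__init__(actor_net, lr)
        self.clip_eps = clip_eps
        self.epochs = epochs
        # Additional actor network, frozen between updates, used to compute the PPO ratio
        self.old_actor_net = copy.deepcopy(actor_net)

    def calc_policy_loss(self, batch, advantages):
        # Clipped surrogate objective replaces the vanilla policy-gradient loss
        log_probs = self.actor_net.log_prob(batch['states'], batch['actions'])
        with torch.no_grad():
            old_log_probs = self.old_actor_net.log_prob(batch['states'], batch['actions'])
        ratio = torch.exp(log_probs - old_log_probs)
        clipped = torch.clamp(ratio, 1.0 - self.clip_eps, 1.0 + self.clip_eps)
        return -torch.min(ratio * advantages, clipped * advantages).mean()

    def train(self, batch, advantages):
        # PPO reuses each batch for several epochs, then syncs the old actor
        total_loss = 0.0
        for _ in range(self.epochs):
            total_loss += super().train(batch, advantages)
        self.old_actor_net.load_state_dict(self.actor_net.state_dict())
        return total_loss / self.epochs
```

Everything not overridden is inherited unchanged from the parent class, which is what keeps the PPO-specific diff small and makes any observed performance gap attributable to exactly those overrides.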
