GPU Usage: PPO on Pong

GPU for Network Training

This tutorial requires a machine with a CUDA-enabled GPU. The default PyTorch installation already supports GPU, so no additional setup is needed.

If you are installing the NVIDIA CUDA driver on your own hardware and encounter issues, consult the Help page.

Training a convolutional network is slow on a CPU primarily due to the large network size. When training a large network, we can use a GPU to speed up the process. In this simple tutorial we will train PPO on Pong using a GPU.

A GPU does not always accelerate training. For instance, if we use a GPU to train a 2-layer feedforward network for LunarLander, the speedup is not enough to offset the overhead of transferring data to the GPU, so training becomes slower overall. Use a GPU only for large networks.
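To get a feel for this trade-off, here is a small, self-contained timing sketch (illustrative only, not SLM Lab code; it assumes PyTorch is installed and a CUDA GPU is available). It compares a LunarLander-sized MLP with an Atari-sized ConvNet on CPU versus GPU; the small MLP gains little from the GPU, while the ConvNet gains a lot.

# Illustrative only: time forward+backward passes for a small MLP vs a ConvNet
# on CPU and GPU. Exact numbers depend on your hardware.
import time
import torch
import torch.nn as nn

def time_net(net, x, device, n_iters=100):
    net, x = net.to(device), x.to(device)
    for _ in range(5):  # warm-up
        net(x).sum().backward()
    if device.type == 'cuda':
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_iters):
        net(x).sum().backward()
    if device.type == 'cuda':
        torch.cuda.synchronize()
    return time.time() - start

# LunarLander-sized MLP vs Atari-sized ConvNet (4 stacked 84x84 frames)
mlp = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
conv = nn.Sequential(nn.Conv2d(4, 32, 8, 4), nn.ReLU(),
                     nn.Conv2d(32, 64, 4, 2), nn.ReLU(), nn.Flatten())
batches = {'MLP': (mlp, torch.randn(32, 8)), 'ConvNet': (conv, torch.randn(32, 4, 84, 84))}

devices = [torch.device('cpu')]
if torch.cuda.is_available():
    devices.append(torch.device('cuda'))
for name, (net, x) in batches.items():
    for device in devices:
        print(f'{name} on {device}: {time_net(net, x, device):.3f}s')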

GPU Monitoring

We can easily monitor CPU and RAM consumption using glances. To monitor GPU usage as well, simply install the additional plugin nvidia-ml-py3, as described in the glances GPU plugin documentation (https://glances.readthedocs.io/en/stable/aoa/gpu.html).
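Under the hood, nvidia-ml-py3 is a Python binding to NVIDIA's NVML, which is what the glances GPU plugin queries. If you just want a quick programmatic check of GPU utilization and memory, a minimal sketch using the same package (module pynvml) looks like this; it is not SLM Lab code.

import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent utilization
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
    print(f'GPU {i}: util {util.gpu}%, mem {mem.used / 1e9:.1f}/{mem.total / 1e9:.1f} GB')
pynvml.nvmlShutdown()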

Agent Spec for Network Using GPU

We now look at an example spec with GPU enabled for PPO on Pong from slm_lab/spec/benchmark/ppo/ppo_pong.json.

slm_lab/spec/benchmark/ppo/ppo_pong.json
{
  "ppo_pong": {
    "agent": [{
      "name": "PPO",
      "algorithm": {
        "name": "PPO",
        ...
      },
      "memory": {
        "name": "OnPolicyBatchReplay",
      },
      "net": {
        "type": "ConvNet",
        ...
        "gpu": true
      }
    }],
    "env": [{
      "name": "PongNoFrameskip-v4",
      ...
    }],
    ...
  }
}

Once your machine is set up for GPU, using it for training is as simple as specifying "gpu": true in the agent's net spec.
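As a rough mental model (a sketch of the typical PyTorch pattern, not SLM Lab's actual internals), the flag simply decides which torch device the network and its training batches are placed on:

import torch

net_spec = {'type': 'ConvNet', 'gpu': True}  # hypothetical excerpt of the agent's net spec
use_gpu = net_spec.get('gpu', False) and torch.cuda.is_available()
device = torch.device('cuda:0' if use_gpu else 'cpu')
# the model and each training batch are then moved with .to(device) before optimization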

Running PPO on Pong

Let's now run a Trial using the spec file above.

python run_lab.py slm_lab/spec/benchmark/ppo/ppo_pong.json ppo_pong train

We should now see a speedup in the fps (frames per second) logged in the terminal during training. The trial should take a few hours to finish. It will then save its data to data/ppo_pong_{ts}. The trial graphs should look like the following:

We can also smooth the trial graph by plotting its moving average over a window of 100 to obtain the graph below.

Using Multiple GPUs

Automatic GPU Rotation

If your hardware has multiple GPUs, SLM Lab will automatically cycle through the available GPU devices when running the sessions in each trial. For example, if a trial has 4 sessions and your machine has 2 GPUs, the sessions will be assigned as follows (a sketch of this round-robin logic follows the list):

  • session 0: GPU 0

  • session 1: GPU 1

  • session 2: GPU 0

  • session 3: GPU 1
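In other words, sessions are assigned to GPUs round-robin by session index. A minimal sketch of this logic (assumed for illustration, not copied from SLM Lab internals):

import torch

def session_device(session_index):
    num_gpus = torch.cuda.device_count()
    if num_gpus == 0:
        return torch.device('cpu')
    return torch.device(f'cuda:{session_index % num_gpus}')

# with 2 GPUs and 4 sessions: [session_device(i) for i in range(4)]
# -> [cuda:0, cuda:1, cuda:0, cuda:1]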

Using CUDA_OFFSET

Sometimes it is useful to offset the GPU index that a trial starts cycling from. This can be achieved by passing a shell environment variable, for example CUDA_OFFSET=4. Say a machine has 8 GPUs and we are running 2 trials of 4 sessions each; we'd want to utilize all the GPUs evenly. Suppose we are running PPO on Pong and PPO on QBert. Then we can do the following:

python run_lab.py slm_lab/spec/benchmark/ppo/ppo_pong.json ppo_pong train

This first trial will use GPUs 0, 1, 2, 3 for its four sessions. Next, we run the second trial using:

CUDA_OFFSET=4 python run_lab.py slm_lab/spec/benchmark/ppo/ppo_qbert.json ppo_qbert train

The second trial will then use GPUs 4, 5, 6, 7 for its four sessions. This way we fully utilize all 8 GPUs.
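Conceptually, the offset just shifts where the round-robin starts. A minimal sketch (assumed for illustration, not copied from SLM Lab internals):

import os
import torch

def session_device(session_index):
    num_gpus = torch.cuda.device_count()
    if num_gpus == 0:
        return torch.device('cpu')
    offset = int(os.environ.get('CUDA_OFFSET', 0))
    return torch.device(f'cuda:{(offset + session_index) % num_gpus}')

# with 8 GPUs: the plain trial maps sessions 0-3 to cuda:0-3,
# while the CUDA_OFFSET=4 trial maps its sessions 0-3 to cuda:4-7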

SLM Lab automatically cycles through GPUs within a single runtime. This means that when running a search or benchmark involving multiple trials, it will automatically cycle through the GPUs for all the trials and sessions, so we do not need to set CUDA_OFFSET manually.
