⚡ Async Training: Hogwild!

This tutorial covers asynchronous training using Hogwild!, a technique for parallelizing network training across multiple processes with shared parameters.

Educational Purpose: Hogwild! is included primarily for learning about async RL architectures. For production training, use PPO with vectorized environments (num_envs); it is simpler and more efficient.

How Hogwild! Works

Hogwild! enables lock-free parallel training by having multiple workers update shared network parameters simultaneously. SLM Lab implements this using PyTorch multiprocessing with shared memory.

Worker 1 ─┬─→ Shared Global Network ←─┬─ Worker 3
Worker 2 ─┘        (CPU)              └─ Worker 4

Each worker runs the following loop (a minimal sketch follows the list):

  1. Collects experience from its own environment

  2. Computes gradients on its local network

  3. Pushes gradients to the shared global network

  4. Pulls updated weights from global network
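
A minimal sketch of this loop using PyTorch shared memory (illustrative only, not SLM Lab's actual implementation; the linear network, random batch, and squared-output loss are placeholders):

import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(global_net, optimizer, steps=100):
    local_net = nn.Linear(4, 2)  # local copy of the (placeholder) network
    for _ in range(steps):
        # pull the latest shared weights (step 4 from the list above)
        local_net.load_state_dict(global_net.state_dict())
        batch = torch.randn(8, 4)              # stand-in for collected experience (step 1)
        loss = local_net(batch).pow(2).mean()  # stand-in for the RL loss (step 2)
        optimizer.zero_grad()
        loss.backward()
        # push local gradients onto the shared global parameters (step 3)
        for g_param, l_param in zip(global_net.parameters(), local_net.parameters()):
            g_param.grad = l_param.grad
        optimizer.step()  # updates the shared parameters in place, without locks

if __name__ == '__main__':
    global_net = nn.Linear(4, 2)
    global_net.share_memory()  # place parameters in shared memory
    # plain Adam shown for brevity; sharing optimizer state across processes is
    # what GlobalAdam/GlobalRMSprop provide (see Key Requirements below)
    optimizer = torch.optim.Adam(global_net.parameters(), lr=1e-3)
    procs = [mp.Process(target=worker, args=(global_net, optimizer)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()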

Meta Spec for Hogwild!

Enable distributed training in the meta spec:

{
  "meta": {
    "distributed": "synced",  // or "shared"
    "max_session": 4          // Number of parallel workers
  }
}

Distributed Modes

Mode        Behavior                                    Use Case
"synced"    Sync parameters after each training step    A3C (on-policy)
"shared"    Continuous parameter sharing                Async SAC (off-policy)
false       Disabled (default)                          Standard training

Key Requirements

  • GlobalAdam or GlobalRMSprop: optimizers that support shared state across processes (see the spec excerpt below)

  • max_session > 1: the number of parallel workers
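
For example, the optimizer is selected in the agent's net spec via optim_spec (an illustrative excerpt; the surrounding structure and learning rate are placeholders):

{
  "agent": [{
    "net": {
      "optim_spec": {
        "name": "GlobalAdam",
        "lr": 0.0001
      }
    }
  }]
}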

A3C on Pong

A3C (Mnih et al., 2016) uses "synced" mode for on-policy training.

Spec: slm_lab/spec/benchmark/a3c/a3c_gae_pong.json

Run:
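
A typical invocation (the spec name a3c_gae_pong is assumed to match the spec key inside the file):

python run_lab.py slm_lab/spec/benchmark/a3c/a3c_gae_pong.json a3c_gae_pong train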

Async SAC on Humanoid

For off-policy algorithms like SAC, use "shared" mode for continuous parameter sharing.

Spec: slm_lab/spec/benchmark/async_sac/async_sac_mujoco.json

Run:
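
A typical invocation (the spec key inside the file is assumed to be async_sac_humanoid; substitute the actual key if it differs):

python run_lab.py slm_lab/spec/benchmark/async_sac/async_sac_mujoco.json async_sac_humanoid train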

With 16 parallel sessions, a 50M frame run completes much faster than sequential training.

Frame counting: The x-axis shows per-session frames. Total frames = per-session frames × max_session.
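For example, with max_session set to 16, an x-axis reading of 3.125M per-session frames corresponds to 50M total frames.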

Historical Results (v4)

These graphs are from v4 async SAC training:

Figures: Async SAC Humanoid returns; Async SAC Humanoid moving average.

For validated v5 Humanoid results using synchronous PPO, see Continuous Benchmark, where PPO achieves 3774 on Humanoid-v5.

Comparison: Async vs Vectorized

For most use cases, vectorized environments are simpler and faster:

Aspect         Vectorized (num_envs)      Hogwild! (distributed)
Parallelism    Environment stepping       Network training
Complexity     Simple                     Complex (multiprocessing)
GPU            Full GPU acceleration      Global nets on CPU
Use case       Production                 Learning, CPU-bound
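
For comparison, vectorized environments are configured in the env spec with num_envs (an illustrative excerpt; the environment name and count are placeholders):

{
  "env": [{
    "name": "PongNoFrameskip-v4",
    "num_envs": 16
  }]
}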

When Hogwild! Helps

Hogwild! can help when:

  • Network training is the bottleneck (not environment stepping)

  • You have many CPU cores available

  • You want to learn about async RL architectures

For most RL workloads, environment stepping is the bottleneck, so vectorized environments (num_envs) are more effective.

Historical Context

A3C was groundbreaking when GPUs were expensive and CPU parallelism was the main scaling strategy. Today, GPU-accelerated vectorized training (PPO, A2C) is more practical for most use cases.

SLM Lab includes async training for:

  • Understanding async RL architectures

  • Reproducing classic papers

  • CPU-only training scenarios
