🧬 Net

Overview

Net classes implement neural network architectures used as function approximators in RL algorithms. SLM Lab provides flexible, swappable networks that work with any algorithm.

Code: slm_lab/agent/net

Network Types

| Type | Input | Use Case | Example Environments |
| --- | --- | --- | --- |
| `MLPNet` | Vectors | Low-dimensional states | CartPole, LunarLander, MuJoCo |
| `ConvNet` | Images | Pixel observations | Atari games |
| `RecurrentNet` | Sequences | Partial observability | POMDPs |
| `HydraMLPNet` | Multiple vectors | Multi-head architectures | Multi-task learning |
| `DuelingMLPNet` | Vectors | Q-learning | LunarLander (value decomposition) |
| `DuelingConvNet` | Images | Q-learning | Atari (value decomposition) |

Quick Selection Guide

Is your observation an image?
โ”œโ”€โ”€ Yes โ†’ ConvNet
โ””โ”€โ”€ No โ†’ Is there partial observability?
    โ”œโ”€โ”€ Yes โ†’ RecurrentNet
    โ””โ”€โ”€ No โ†’ MLPNet

For Q-learning algorithms (DQN family), consider Dueling variants for better value estimation.

Network Spec

Configure networks in the agent spec:
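A representative `net` block inside an agent spec might look like the following. This is a sketch assembled from the parameters documented in the tables below; the surrounding `"agent"` list nesting follows SLM Lab's usual JSON spec layout and should be checked against a real spec file:

```json
{
  "agent": [{
    "net": {
      "type": "MLPNet",
      "hid_layers": [64, 64],
      "hid_layers_activation": "relu",
      "init_fn": "orthogonal_",
      "clip_grad_val": 0.5,
      "loss_spec": {"name": "MSELoss"},
      "lr_scheduler_spec": null,
      "gpu": false
    }
  }]
}
```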

Common Parameters

Architecture

| Parameter | Description | Typical Values |
| --- | --- | --- |
| `type` | Network class | `"MLPNet"`, `"ConvNet"`, `"RecurrentNet"` |
| `hid_layers` | Hidden layer sizes | `[64, 64]` (simple), `[256, 256]` (complex) |
| `hid_layers_activation` | Activation function | `"relu"`, `"tanh"`, `"leaky_relu"` |
| `out_layer_activation` | Output activation | `null` (none), `"tanh"` |
| `init_fn` | Weight initialization | `"orthogonal_"`, `"xavier_uniform_"` |

Actor-Critic Networks

| Parameter | Description | Typical Values |
| --- | --- | --- |
| `shared` | Share weights between actor/critic | `true` (Atari), `false` (MuJoCo) |
| `use_same_optim` | Use same optimizer for both | `true`, `false` |
| `actor_optim_spec` | Actor optimizer | `{"name": "Adam", "lr": 3e-4}` |
| `critic_optim_spec` | Critic optimizer | `{"name": "Adam", "lr": 3e-4}` |

Training

| Parameter | Description | Typical Values |
| --- | --- | --- |
| `clip_grad_val` | Gradient clipping norm | `0.5`–`10.0` |
| `loss_spec` | Loss function | `{"name": "MSELoss"}`, `{"name": "SmoothL1Loss"}` |
| `lr_scheduler_spec` | Learning rate schedule | See Learning Rate Schedules below |

ConvNet Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| `normalize` | Normalize pixel input by dividing by 255 | `false` |
| `batch_norm` | Apply batch normalization after conv layers | `true` (ConvNet), `false` (DuelingConvNet) |

Device

| Parameter | Description | Values |
| --- | --- | --- |
| `gpu` | GPU usage | `"auto"` (detect), `true` (force), `false` (CPU only) |

Learning Rate Schedules

Decay learning rate during training:
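A hedged example, assuming `lr_scheduler_spec` passes its arguments through to the named scheduler (the `step_size` and `gamma` arguments below are the standard PyTorch `StepLR` ones, not verified against an SLM Lab benchmark spec):

```json
"lr_scheduler_spec": {
  "name": "StepLR",
  "step_size": 1000,
  "gamma": 0.9
}
```

Set `lr_scheduler_spec` to `null` to keep a constant learning rate.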

Available schedules:

  • "LinearToZero" - Linear decay from initial LR to 0

  • "StepLR" - Step decay at fixed intervals

  • "ExponentialLR" - Exponential decay

Example Specs

MLP for CartPole/MuJoCo
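A sketch of an actor-critic MLP spec in this style, built only from the parameters in the tables above; the values follow the "typical values" column rather than a verified SLM Lab benchmark spec:

```json
"net": {
  "type": "MLPNet",
  "shared": false,
  "hid_layers": [64, 64],
  "hid_layers_activation": "tanh",
  "init_fn": "orthogonal_",
  "clip_grad_val": 0.5,
  "use_same_optim": false,
  "loss_spec": {"name": "MSELoss"},
  "actor_optim_spec": {"name": "Adam", "lr": 3e-4},
  "critic_optim_spec": {"name": "Adam", "lr": 3e-4},
  "lr_scheduler_spec": null,
  "gpu": false
}
```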

ConvNet for Atari (Nature CNN)
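A sketch of a Nature-CNN-shaped spec. The filter sizes are the standard Nature CNN ones (32 filters of 8×8 stride 4, 64 of 4×4 stride 2, 64 of 3×3 stride 1, then a 512-unit dense layer); the parameter names `conv_hid_layers` (per-layer `[out_channels, kernel, stride, padding, dilation]`), `fc_hid_layers`, and `optim_spec` are assumptions about SLM Lab's ConvNet spec and should be checked against a real spec file:

```json
"net": {
  "type": "ConvNet",
  "shared": true,
  "conv_hid_layers": [[32, 8, 4, 0, 1], [64, 4, 2, 0, 1], [64, 3, 1, 0, 1]],
  "fc_hid_layers": [512],
  "hid_layers_activation": "relu",
  "init_fn": "orthogonal_",
  "normalize": true,
  "batch_norm": false,
  "clip_grad_val": 0.5,
  "loss_spec": {"name": "SmoothL1Loss"},
  "optim_spec": {"name": "Adam", "lr": 2.5e-4},
  "lr_scheduler_spec": null,
  "gpu": true
}
```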

RecurrentNet for POMDPs
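A sketch of a recurrent spec. The recurrent-specific parameter names here (`cell_type`, `fc_hid_layers`, `rnn_hidden_size`, `rnn_num_layers`, `seq_len`) are assumptions about SLM Lab's RecurrentNet spec, not documented in the tables above:

```json
"net": {
  "type": "RecurrentNet",
  "cell_type": "GRU",
  "fc_hid_layers": [256],
  "hid_layers_activation": "relu",
  "rnn_hidden_size": 64,
  "rnn_num_layers": 1,
  "seq_len": 4,
  "init_fn": "orthogonal_",
  "clip_grad_val": 0.5,
  "loss_spec": {"name": "MSELoss"},
  "lr_scheduler_spec": null,
  "gpu": false
}
```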

DQN Target Network

For DQN algorithms, a separate target network is created automatically:
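The mechanism can be sketched in plain Python. The class and helper names below are hypothetical stand-ins (SLM Lab's real networks are `torch.nn.Module`s and its update helpers live in its net utilities); the sketch only illustrates the two standard update rules, hard copy and Polyak averaging:

```python
import copy

class QNet:
    """Stand-in for a value network; real code would be a torch.nn.Module."""
    def __init__(self, weights):
        self.weights = list(weights)

def make_target(net):
    # The target network starts as an exact copy of the online network
    # and receives no gradient updates of its own.
    return copy.deepcopy(net)

def hard_update(target, online):
    # DQN-style: overwrite target weights every N training steps.
    target.weights = list(online.weights)

def polyak_update(target, online, beta=0.995):
    # Soft update: target <- beta * target + (1 - beta) * online.
    target.weights = [
        beta * t + (1 - beta) * o
        for t, o in zip(target.weights, online.weights)
    ]

online = QNet([1.0, 2.0])
target = make_target(online)
online.weights = [3.0, 4.0]   # online net trains and drifts away
polyak_update(target, online, beta=0.5)
print(target.weights)  # -> [2.0, 3.0]
```

The target network supplies the bootstrap values in the Q-learning loss, which stabilizes training by decoupling the regression target from the rapidly changing online weights.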

Network Architecture Tips

CartPole / Simple Control

  • Small networks work well

  • tanh activation is common for bounded outputs

MuJoCo / Continuous Control

  • Larger networks for complex dynamics

  • Orthogonal initialization helps with gradients

Atari / Image-Based

  • Nature CNN architecture is standard

  • shared: true for actor-critic

Box2D (LunarLander, BipedalWalker)

  • Medium-sized networks

  • relu works well

GPU Usage

SLM Lab handles GPU placement automatically:
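For example, using the values from the Device table above, device auto-detection is the default choice in a spec:

```json
"net": {
  "gpu": "auto"
}
```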

For multi-GPU setups, see GPU Usage: PPO on Pong.
