OnPolicyBatchReplay

Code: slm_lab/agent/memory/onpolicy.py

In on-policy training, an agent sometimes trains at a fixed frequency, e.g. every 200 steps, rather than at the end of an episode. The training signal still comes from the memory class, via memory.to_train. To support this batched training schedule, OnPolicyBatchReplay extends OnPolicyReplay and sets self.to_train to true based on the agent's training frequency and the number of experiences collected.
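The difference between the episodic and batched triggers can be illustrated with a minimal sketch. This is not the SLM Lab implementation: the class names OnPolicyReplaySketch and OnPolicyBatchReplaySketch, the update and sample methods, and the training_frequency constructor argument are all hypothetical stand-ins chosen for illustration.

```python
class OnPolicyReplaySketch:
    """Hypothetical stand-in for an episodic on-policy memory (not SLM Lab's class)."""

    def __init__(self):
        self.to_train = False
        self.experiences = []

    def reset(self):
        # Start a fresh buffer; previously returned batches are left untouched.
        self.experiences = []

    def update(self, experience, done):
        self.experiences.append(experience)
        if done:  # episodic trigger: signal training when the episode ends
            self.to_train = True

    def sample(self):
        # On-policy data is used once, so hand over everything and flush.
        batch = self.experiences
        self.reset()
        return batch


class OnPolicyBatchReplaySketch(OnPolicyReplaySketch):
    """Hypothetical batched variant: signals training every training_frequency experiences."""

    def __init__(self, training_frequency):
        super().__init__()
        self.training_frequency = training_frequency  # assumed to come from the algorithm

    def update(self, experience, done):
        self.experiences.append(experience)
        # Batched trigger: ignore episode boundaries, count experiences instead.
        if len(self.experiences) == self.training_frequency:
            self.to_train = True


memory = OnPolicyBatchReplaySketch(training_frequency=4)
train_steps = []
for step in range(1, 11):
    memory.update(experience=step, done=False)
    if memory.to_train:
        train_steps.append((step, memory.sample()))
        memory.to_train = False  # the training loop clears the signal after use
```

In this sketch the training signal fires at steps 4 and 8, each time with exactly four experiences, and the two leftover experiences wait in memory for the next batch.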

Suitable for on-policy algorithms.

Source Documentation

Refer to the class documentation and example memory spec from the source: slm_lab/agent/memory/onpolicy.py#L100-L110

Example Memory Spec

The spec for this memory takes no parameters, because the memory flushes automatically at the end of each batch, after training. The batch_size is the training_frequency provided by algorithm_spec.

{
    ...
    "agent": [{
      "memory": {
        "name": "OnPolicyBatchReplay"
      }
    }],
    ...
}
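To make the relationship between the spec and the batch size concrete, the following sketch derives the batch size from an agent spec and counts how often training would fire. The dict layout (an "algorithm" entry holding "training_frequency"), the value 200, and the helper names batch_size_from_spec and count_training_calls are assumptions for illustration; only the memory name comes from the example spec.

```python
# Hypothetical agent spec fragment; the "algorithm" layout and value are assumed.
agent_spec = {
    "algorithm": {"training_frequency": 200},
    "memory": {"name": "OnPolicyBatchReplay"},
}


def batch_size_from_spec(agent_spec):
    """Return the effective batch size, taken from the algorithm's training_frequency."""
    return agent_spec["algorithm"]["training_frequency"]


def count_training_calls(agent_spec, total_steps):
    """Simulate collecting total_steps experiences; return (training calls, leftover experiences)."""
    batch_size = batch_size_from_spec(agent_spec)
    buffer, calls = [], 0
    for step in range(total_steps):
        buffer.append(step)
        if len(buffer) == batch_size:
            calls += 1      # one training call per full batch
            buffer.clear()  # memory flushes after training
    return calls, len(buffer)


calls, leftover = count_training_calls(agent_spec, total_steps=500)
```

With these assumed values, 500 steps yield two full batches and leave 100 experiences in the buffer, which is why the memory itself needs no size parameter in its spec.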

For more concrete examples of memory spec specific to algorithms, refer to the existing spec files.

