OnPolicyBatchReplay
Sometimes for on-policy training, an agent trains at a fixed frequency, e.g. every 200 steps, instead of at the end of an episode. The signal for training still comes from the memory class, in memory.to_train. For a batched training schedule, OnPolicyBatchReplay extends OnPolicyReplay to set self.to_train to true based on the agent's training frequency and the number of experiences collected. It is suitable for on-policy algorithms.
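The difference between the episodic and the batched schedule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classes SketchOnPolicyReplay and SketchOnPolicyBatchReplay are hypothetical stand-ins, not SLM Lab's actual implementation, and the constructor argument training_frequency is passed in directly here rather than read from the agent.

```python
class SketchOnPolicyReplay:
    """Hypothetical, simplified on-policy memory; not SLM Lab's actual class."""

    def __init__(self):
        self.to_train = False
        self.experiences = []

    def add_experience(self, state, action, reward, next_state, done):
        self.experiences.append((state, action, reward, next_state, done))
        # Episodic schedule: signal training when the episode ends
        if done:
            self.to_train = True


class SketchOnPolicyBatchReplay(SketchOnPolicyReplay):
    """Hypothetical batched variant: signals training every N experiences."""

    def __init__(self, training_frequency):
        super().__init__()
        # Assumption: the frequency is handed in directly for this sketch
        self.training_frequency = training_frequency

    def add_experience(self, state, action, reward, next_state, done):
        self.experiences.append((state, action, reward, next_state, done))
        # Batched schedule: episode boundaries are ignored; only the
        # number of collected experiences decides when to train
        if len(self.experiences) == self.training_frequency:
            self.to_train = True


memory = SketchOnPolicyBatchReplay(training_frequency=4)
signals = []
for step in range(4):
    done = step == 1  # an episode ends here, but it does not trigger training
    memory.add_experience(step, 0, 1.0, step + 1, done)
    signals.append(memory.to_train)

print(signals)  # [False, False, False, True]
```

In this sketch the episode ending at the second step has no effect on the signal; training is only requested once four experiences have been collected.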
Refer to the class documentation and example memory spec from the source: slm_lab/agent/memory/onpolicy.py#L100-L110
The memory spec takes no parameters beyond the name, since the memory automatically flushes at the end of each batch after training. The batch_size is the training_frequency provided by algorithm_spec.
{
  ...
  "agent": [{
    "memory": {
      "name": "OnPolicyBatchReplay"
    }
  }],
  ...
}
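To illustrate how the batch size follows from training_frequency and why no extra memory parameters are needed, here is a minimal Python sketch. It assumes that training_frequency sits under an "algorithm" key next to "memory" in the agent spec, and it uses a plain list as a stand-in for the memory's storage; neither detail is taken from the SLM Lab source.

```python
# Hypothetical spec fragment; the placement of "training_frequency"
# under "algorithm" is an assumption made for this sketch
spec = {
    "agent": [{
        "algorithm": {"training_frequency": 200},
        "memory": {"name": "OnPolicyBatchReplay"},
    }],
}

training_frequency = spec["agent"][0]["algorithm"]["training_frequency"]

buffer = []       # stand-in for the memory's storage
batch_sizes = []  # size of every batch handed to training

for step in range(500):
    buffer.append(step)  # one experience per environment step
    if len(buffer) == training_frequency:
        batch = list(buffer)            # the whole buffer is the batch
        batch_sizes.append(len(batch))  # training would happen here
        buffer.clear()                  # flush so old data is not reused

print(batch_sizes)  # [200, 200]
print(len(buffer))  # 100
```

Every batch has exactly training_frequency experiences, and the 100 experiences left over after step 500 stay in the buffer until the next batch fills up.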