Comment on page
Sometimes for on-policy training, an agent trains at a fixed frequency, e.g. every 200 steps, instead of at the end of an episode. The signal for training also comes from the memory class in
memory.to_train. For batched training schedule, OnPolicyBatchReplay extends OnPolicyReplay to set
self.to_train = truebased on agent's training frequency and the number of experiences collected.
Suitable for on-policy algorithms.
The spec for this memory has no parameters, since it automatically flushes at the end of a batch after training. The
training_frequencyprovided by algorithm_spec.