OnPolicyReplay differs from the plain Replay in that when memory.sample() is called in a training step, it returns all of its stored elements and then flushes its storage. By default, on-policy training happens at the end of every episode.
Suitable for on-policy algorithms.
The spec for this memory has no parameters, since it automatically flushes at the end of an episode after training.
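The sample-then-flush behavior described above can be sketched in a few lines of Python. This is a minimal illustration only: the class name SketchOnPolicyMemory, its update() and reset() methods, and the tuple layout of a transition are all hypothetical and are not taken from the library's actual OnPolicyReplay implementation.

```python
class SketchOnPolicyMemory:
    """Hypothetical stand-in that illustrates on-policy sample-and-flush.

    Not the library's OnPolicyReplay class; names and layout are assumptions.
    """

    def __init__(self):
        self.reset()

    def reset(self):
        # Flush storage by starting over with an empty list
        self.transitions = []

    def update(self, state, action, reward, next_state, done):
        # Store one transition collected under the current policy
        self.transitions.append((state, action, reward, next_state, done))

    def sample(self):
        # Return every stored transition, then flush so that data from an
        # older policy is never reused in a later training step
        batch = self.transitions
        self.reset()
        return batch


memory = SketchOnPolicyMemory()
for t in range(3):
    memory.update(state=t, action=0, reward=1.0, next_state=t + 1, done=(t == 2))

# Called once per training step, here at the end of the episode
batch = memory.sample()
```

After the call to sample(), batch holds the three transitions of the episode and the memory is empty again, which is why a memory of this kind needs no size or batch parameters.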