OnPolicyReplay
OnPolicyReplay
Code: slm_lab/agent/memory/onpolicy.py
OnPolicyReplay differs from the plain Replay in that when memory.sample()
is called in a training step, it returns all of its stored elements, then flushes its storage clean. By default, on-policy training happens at the end of every episode.
Suitable for on-policy algorithms.
Source Documentation
Refer to the class documentation and example memory spec from the source: slm_lab/agent/memory/onpolicy.py#L14-L35
Example Memory Spec
The spec for this memory has no parameters, since it automatically flushes at the end of an episode after training.
For more concrete examples of memory spec specific to algorithms, refer to the existing spec files.
Last updated