SLM Lab
OnPolicyReplay

OnPolicyReplay differs from the plain Replay memory in that when memory.sample() is called in a training step, it returns all of its stored elements and then flushes its storage clean. By default, on-policy training happens at the end of every episode.
Suitable for on-policy algorithms.
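The sample-then-flush behavior described above can be sketched in a few lines. This is an illustrative toy, not the actual SLM Lab implementation; the class and method names below only mirror the memory interface, and the stored keys are assumed for the example.

```python
# Minimal sketch of OnPolicyReplay's sample-then-flush behavior.
# Not the real SLM Lab class; storage keys here are illustrative.

class OnPolicyReplaySketch:
    def __init__(self):
        self.reset()

    def reset(self):
        # one list per experience component; cleared after every sample
        self.states, self.actions, self.rewards, self.dones = [], [], [], []

    def update(self, state, action, reward, done):
        # append one step of experience
        self.states.append(state)
        self.actions.append(action)
        self.rewards.append(reward)
        self.dones.append(done)

    def sample(self):
        # return everything stored, then flush storage clean
        batch = {
            'states': self.states,
            'actions': self.actions,
            'rewards': self.rewards,
            'dones': self.dones,
        }
        self.reset()
        return batch


memory = OnPolicyReplaySketch()
memory.update(0, 1, 1.0, False)
memory.update(1, 0, 0.5, True)
batch = memory.sample()  # contains all stored steps; memory is now empty
```

Because sample() empties the storage, each training step sees only the experiences gathered since the last call, which is exactly the freshness on-policy algorithms require.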

Source Documentation

Refer to the class documentation and example memory spec from the source: slm_lab/agent/memory/onpolicy.py#L14-L35

Example Memory Spec

The spec for this memory has no parameters, since it automatically flushes at the end of an episode after training.
```json
{
    ...
    "agent": [{
        "memory": {
            "name": "OnPolicyReplay"
        }
    }],
    ...
}
```
For more concrete examples of memory spec specific to algorithms, refer to the existing spec files.