algorithm
action_policy
string specifying which policy to use to act. For example, "Categorical" (for discrete action spaces), "Normal" (for continuous action spaces with one dimension), or "default" to automatically switch between the two depending on the environment.
training_frequency
how many episodes of data to collect before each training iteration. A common value is 1.
entropy
whether to add entropy to the loss.
entropy_coef
coefficient to multiply the entropy of the distribution with when adding it to the loss.
memory
batch_size
number of examples to collect before training. Only relevant for batch on-policy memory: "OnPolicyBatchReplay"
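As a rough illustration of how the algorithm and memory parameters above might fit together, here is a minimal Python sketch. It is not the library's own spec format or loss code: the dictionary layout, the "name" key, the example values, and the helper functions are all hypothetical, and the convention of subtracting the scaled entropy from the loss is an assumption.

```python
import math

# Hypothetical spec fragment. Key names follow the parameters described above;
# the "name" key and all values are illustrative assumptions.
spec = {
    "algorithm": {
        "action_policy": "default",
        "training_frequency": 1,   # train after every episode
        "entropy": True,           # enable the entropy term
        "entropy_coef": 0.01,      # weight of the entropy term
    },
    "memory": {
        "name": "OnPolicyBatchReplay",
        "batch_size": 32,          # examples to collect before training
    },
}


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def regularized_loss(policy_loss, probs, algorithm_spec):
    """Subtract the scaled entropy from the loss when the entropy flag is set.

    Assumed convention: loss = policy_loss - entropy_coef * entropy.
    """
    if not algorithm_spec["entropy"]:
        return policy_loss
    return policy_loss - algorithm_spec["entropy_coef"] * categorical_entropy(probs)


# Example: a uniform two-action policy with a policy loss of 1.0.
loss = regularized_loss(1.0, [0.5, 0.5], spec["algorithm"])
```

Under this assumed convention, a larger entropy_coef lowers the loss more for high-entropy (more exploratory) action distributions, and setting the entropy flag to False leaves the loss unchanged.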
net