memory.to_train. For a batched training schedule, OnPolicyBatchReplay extends OnPolicyReplay to set self.to_train = true based on the agent's training frequency and the number of experiences collected. Here batch_size is the training_frequency provided by algorithm_spec.
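The batching rule described above can be illustrated with a minimal sketch. This is not the library's implementation: the class names SketchOnPolicyReplay and SketchOnPolicyBatchReplay, the add_experience and sample methods, and the dict form of algorithm_spec are all assumptions made for illustration. Only the idea comes from the text: the subclass sets to_train once the number of collected experiences reaches batch_size, which is taken from training_frequency.

```python
class SketchOnPolicyReplay:
    """Hypothetical stand-in for an on-policy memory (assumed to train at episode end)."""

    def __init__(self):
        self.experiences = []
        self.to_train = False

    def add_experience(self, experience, done):
        self.experiences.append(experience)
        # Assumption: the base class flags training when an episode finishes.
        if done:
            self.to_train = True

    def sample(self):
        # On-policy data is used once, so hand it over and clear the memory.
        batch, self.experiences = self.experiences, []
        self.to_train = False
        return batch


class SketchOnPolicyBatchReplay(SketchOnPolicyReplay):
    """Hypothetical batched variant: flags training every batch_size experiences."""

    def __init__(self, algorithm_spec):
        super().__init__()
        # batch_size is the training_frequency given in algorithm_spec
        # (assumed here to be a plain dict).
        self.batch_size = algorithm_spec["training_frequency"]

    def add_experience(self, experience, done):
        self.experiences.append(experience)
        # Episode boundaries are ignored; only the experience count matters.
        if len(self.experiences) >= self.batch_size:
            self.to_train = True


memory = SketchOnPolicyBatchReplay({"training_frequency": 3})
for step in range(3):
    memory.add_experience({"step": step}, done=False)
batch = memory.sample() if memory.to_train else []
```

In this sketch the flag stays unset for the first two experiences and is set on the third, after which sample() returns the three stored experiences and resets both the buffer and the flag.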