action_policy: string specifying which policy to use to act, for example "Categorical" (for discrete action spaces), "Normal" (for continuous action spaces with one dimension), or "default" to switch automatically between the two depending on the environment.
training_frequency: number of episodes of data to collect before each training iteration. A common value is 1.
entropy: whether to add entropy to the
entropy_coef: coefficient by which the entropy of the distribution is multiplied when adding it to
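
The parameters above can be illustrated with a minimal Python sketch. It is not the library's implementation: the helper names `resolve_action_policy`, `should_train`, and `scaled_entropy` are hypothetical, and the sketch assumes that "default" maps discrete action spaces to "Categorical" and continuous ones to "Normal". Because the text does not say what the entropy term is added to, the last helper only computes the scaled term itself.

```python
import math


def resolve_action_policy(action_policy, is_discrete):
    """Return a concrete policy name; "default" switches on the action space type."""
    if action_policy == "default":
        return "Categorical" if is_discrete else "Normal"
    if action_policy not in ("Categorical", "Normal"):
        raise ValueError(f"unknown action_policy: {action_policy!r}")
    return action_policy


def should_train(episodes_collected, training_frequency):
    """Return True each time another `training_frequency` episodes have been collected."""
    return episodes_collected > 0 and episodes_collected % training_frequency == 0


def scaled_entropy(probs, entropy_coef):
    """Return entropy_coef times the Shannon entropy (in nats) of a categorical distribution."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy_coef * entropy


# Hypothetical usage: a discrete environment, training after every episode.
policy_name = resolve_action_policy("default", is_discrete=True)
train_now = should_train(episodes_collected=1, training_frequency=1)
entropy_term = scaled_entropy([0.5, 0.5], entropy_coef=0.01)
```

With training_frequency set to 1, `should_train` returns True after every collected episode, which matches the common value mentioned above.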