REINFORCE
The agent then uses the updated policy to act in the environment, and the training process repeats.
REINFORCE is an on-policy algorithm: only data gathered using the current policy can be used to update the parameters. Once the policy parameters have been updated, all previously gathered data must be discarded and the collection process started again with the new policy.
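The discard-after-update rule can be illustrated with a small sketch. The class name OnPolicyBuffer and its methods are hypothetical and are not taken from the library; the point is only that sampling empties the buffer, so the next update sees data from the new policy alone.

```python
class OnPolicyBuffer:
    """Hypothetical on-policy buffer: data is handed out once, then dropped."""

    def __init__(self):
        self.transitions = []

    def add(self, state, action, reward):
        # Store one step gathered with the current policy.
        self.transitions.append((state, action, reward))

    def sample(self):
        # Return everything collected so far and clear the buffer,
        # because the data is stale once the policy has been updated.
        batch = self.transitions
        self.transitions = []
        return batch

    def __len__(self):
        return len(self.transitions)


buffer = OnPolicyBuffer()
for step in range(3):
    buffer.add(state=step, action=0, reward=1.0)

batch = buffer.sample()  # used for exactly one parameter update
```

After the call to sample(), the buffer is empty and collection starts again with the updated policy.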
Algorithm: REINFORCE with baseline
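As a rough illustration of the quantities involved, the sketch below computes discounted returns for one episode and a policy-gradient loss with a baseline subtracted. It is a minimal sketch, not the library's implementation: the function names are hypothetical, and using the mean return of the episode as the default baseline is an assumption made here for simplicity.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns, baseline=None):
    """Average of -log_prob * (return - baseline) over the episode.

    Assumption: when no baseline is given, the mean return is used.
    """
    if baseline is None:
        baseline = sum(returns) / len(returns)
    advantages = [g - baseline for g in returns]
    weighted = [lp * adv for lp, adv in zip(log_probs, advantages)]
    return -sum(weighted) / len(returns)


rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
loss = reinforce_loss([-1.0, -1.0, -1.0], returns, baseline=0.0)
```

In a real agent the log probabilities would come from the policy network and the loss would be minimized with the optimizer given in optim_spec.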
See reinforce.json for example specs of variations of the REINFORCE algorithm.
Basic Parameters
algorithm
  name: general param
  action_pdtype: general param
  action_policy: string specifying which policy to use to act. For example, "Categorical" (for discrete action spaces), "Normal" (for continuous action spaces with one dimension), or "default" to automatically switch between the two depending on the environment.
  gamma: general param
  training_frequency: how many episodes of data to collect before each training iteration. A common value is 1.
memory
  name: general param. Compatible types: "OnPolicyReplay", "OnPolicyBatchReplay"
  batch_size: number of examples to collect before training. Only relevant for batch on-policy memory: "OnPolicyBatchReplay"
net
  type: general param. Compatible types: all networks.
  hid_layers: general param
  hid_layers_activation: general param
  optim_spec: general param
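To show how the basic parameters fit together, here is a sketch of a spec written as a Python dict and serialized to JSON. Only the key names and the two memory types come from the list above; every value (including "Reinforce", "MLPNet", the layer sizes, and the optimizer settings) is an illustrative assumption, and check_memory is a hypothetical helper. See reinforce.json for the actual example specs.

```python
import json

# Memory types listed above as compatible with REINFORCE.
COMPATIBLE_MEMORY = ("OnPolicyReplay", "OnPolicyBatchReplay")

# Illustrative values only; these are assumptions, not recommended settings.
example_spec = {
    "algorithm": {
        "name": "Reinforce",
        "action_pdtype": "default",
        "action_policy": "default",
        "gamma": 0.99,
        "training_frequency": 1,
    },
    "memory": {
        "name": "OnPolicyReplay",
    },
    "net": {
        "type": "MLPNet",
        "hid_layers": [64],
        "hid_layers_activation": "selu",
        "optim_spec": {"name": "Adam", "lr": 0.002},
    },
}


def check_memory(spec):
    """Hypothetical helper: True if the memory type is an on-policy one."""
    return spec["memory"]["name"] in COMPATIBLE_MEMORY


spec_json = json.dumps(example_spec, indent=2)
```

batch_size is left out because it only applies when the memory is "OnPolicyBatchReplay".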
Advanced Parameters
net
  rnn_hidden_size: general param
  rnn_num_layers: general param
  seq_len: general param
  clip_grad: general param
  clip_grad_val: general param
  lr_decay: general param
  lr_decay_frequency: general param
  lr_decay_min_timestep: general param
  lr_anneal_timestep: general param
  gpu: general param