REINFORCE

REINFORCE is a policy gradient algorithm. The agent acts in the environment with its current policy, uses the resulting trajectories to estimate the gradient of the expected return with respect to the policy parameters, and updates those parameters. The agent then uses the updated policy to act in the environment, and the training process repeats.

REINFORCE is an on-policy algorithm: only data gathered using the current policy can be used to update the parameters. Once the policy parameters have been updated, all previously gathered data must be discarded and collection restarted with the new policy.
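
To make this loop concrete, here is a minimal REINFORCE-style episode-and-update sketch in PyTorch. It is written under assumptions, not SLM Lab's implementation: policy_net, env (a classic Gym-style environment with a 4-tuple step API), and train_one_episode are hypothetical names, and a discrete action space is assumed.

    import torch
    from torch.distributions import Categorical

    # Hypothetical sketch of one on-policy REINFORCE iteration (not SLM Lab's API).
    # policy_net: torch.nn.Module mapping a state to action logits.
    # env: classic Gym-style environment (reset() -> state, step() -> 4-tuple).
    def train_one_episode(env, policy_net, optimizer, gamma=0.99):
        log_probs, rewards = [], []
        state, done = env.reset(), False
        while not done:
            logits = policy_net(torch.as_tensor(state, dtype=torch.float32))
            dist = Categorical(logits=logits)  # discrete action space assumed
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            state, reward, done, _ = env.step(action.item())
            rewards.append(reward)
        # discounted return G_t for every timestep, computed backwards
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        returns = torch.tensor(list(reversed(returns)))
        # policy gradient loss: -sum_t log pi(a_t | s_t) * G_t
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # on-policy: the trajectory is discarded after this update
        return loss.item()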

Algorithm: REINFORCE with baseline
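
For reference, the per-timestep update in Sutton and Barto's REINFORCE-with-baseline formulation is:

    \delta_t = G_t - b(S_t)
    \theta \leftarrow \theta + \alpha \, \gamma^{t} \, \delta_t \, \nabla_{\theta} \ln \pi(A_t \mid S_t, \theta)

where G_t is the return from timestep t, b(S_t) is a learned baseline (commonly a state-value estimate), and \alpha is the learning rate. Subtracting the baseline reduces the variance of the gradient estimate without biasing it.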

See reinforce.json for example specs of variations of the REINFORCE algorithm.

Basic Parameters

    "agent": [{
      "name": str,
      "algorithm": {
        "name": str,
        "action_pdtype": str,
        "action_policy": str,
        "gamma": float,
        "training_frequency": int,
        "add_entropy": bool,
        "entropy_coef": float,
      },
      "memory": {
        "name": str,
        "max_size": int
        "batch_size": int
      },
      "net": {
        "type": str,
        "hid_layers": list,
        "hid_layers_activation": str,
        "optim_spec": dict,
      }
    }],
    ...
}
  • algorithm

    • action_pdtype general param

    • action_policy string specifying which policy to use to act. For example, "Categorical" (for discrete action spaces), "Normal" (for continuous action spaces with one dimension), or "default" to automatically switch between the two depending on the environment. See the sketch after this list for how such a switch can look in code.

    • training_frequency how many episodes of data to collect before each training iteration. A common value is 1.

  • memory

  • net
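
As a rough illustration of two of the settings above, the sketch below shows how a "default" policy might switch between Categorical and Normal distributions depending on the action space, and how add_entropy / entropy_coef typically enter the loss. The helper names and wiring are assumptions for illustration, not SLM Lab's internals.

    import torch
    from torch.distributions import Categorical, Normal

    # Illustrative only (hypothetical helpers, not SLM Lab internals).
    def make_action_dist(net_out, discrete):
        if discrete:
            return Categorical(logits=net_out)  # e.g. CartPole
        loc, log_scale = net_out.chunk(2, dim=-1)  # 1D continuous, e.g. Pendulum
        return Normal(loc, log_scale.exp())

    def policy_loss(dist, actions, returns, add_entropy=True, entropy_coef=0.01):
        loss = -(dist.log_prob(actions) * returns).sum()
        if add_entropy:
            # entropy bonus discourages premature collapse to a deterministic policy
            loss = loss - entropy_coef * dist.entropy().sum()
        return loss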

Advanced Parameters

    "agent": [{
      "net": {
        "rnn_hidden_size": int,
        "rnn_num_layers": int,
        "seq_len": int,
        "clip_grad": bool,
        "clip_grad_val": float,
        "lr_decay": str,
        "lr_decay_frequency": int,
        "lr_decay_min_timestep": int,
        "lr_anneal_timestep": int,
        "gpu": int

      }
    }],
    ...
}
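
To illustrate a few of these settings, the sketch below shows how gradient clipping (clip_grad, clip_grad_val) and a step-based learning-rate decay in the spirit of lr_decay_frequency / lr_decay_min_timestep could be wired up in PyTorch. It mirrors the spec keys but is an assumption for illustration, not SLM Lab's implementation.

    import torch

    # Hypothetical wiring for some advanced net settings (not SLM Lab internals).
    def apply_update(net, optimizer, loss, clip_grad=True, clip_grad_val=0.5):
        optimizer.zero_grad()
        loss.backward()
        if clip_grad:
            # cap the global gradient norm at clip_grad_val to stabilize training
            torch.nn.utils.clip_grad_norm_(net.parameters(), clip_grad_val)
        optimizer.step()

    # step-wise decay that only begins after a minimum timestep
    def decayed_lr(base_lr, t, decay_rate=0.9, frequency=10000, min_timestep=50000):
        if t < min_timestep:
            return base_lr
        steps = (t - min_timestep) // frequency
        return base_lr * decay_rate ** steps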
