Algorithm is the main class which implements an RL algorithm. This includes declaring its networks and variables, acting, sampling from memory, and training. It initializes its networks and memory by calling the Memory and Net classes with their specs. The loss functions for the algorithm are also implemented here.
- action_pdtype: specifies the probability distribution from which actions are sampled. For example, "Argmax" or "Categorical" for discrete action spaces, or "Normal", "MultivariateNormal", and "Gumbel" for continuous action spaces. These are declared in slm_lab/agent/algorithm/policy_util.py#L18-L24
- gamma: how much to discount future rewards when computing the returns. 0 corresponds to complete myopia: the agent only cares about the current time step. 1 corresponds to no discounting: each future time step matters as much as the current one.
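
To make the effect of gamma concrete, here is a minimal sketch of computing discounted returns in plain Python. The function name `discounted_returns` and the reward values are illustrative assumptions and are not taken from the library's own implementation.

```python
def discounted_returns(rewards, gamma):
    """Return the discounted return G_t for every time step t.

    Hypothetical helper for illustration only; not the library's code.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so that G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = [1.0, 1.0, 1.0]  # made-up rewards for a 3-step episode
print(discounted_returns(rewards, 0.0))  # [1.0, 1.0, 1.0]: only the current step counts
print(discounted_returns(rewards, 1.0))  # [3.0, 2.0, 1.0]: no discounting
print(discounted_returns(rewards, 0.5))  # [1.75, 1.5, 1.0]
```

With gamma set to 0 each return equals the immediate reward, while with gamma set to 1 each return is the plain sum of all remaining rewards. Values in between trade off the two.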