Last updated
In this tutorial, we will learn how to run an experiment to study the following example question:
What values of lambda of PPO provide the fastest, most stable solution for Atari Breakout, if the other variables are held constant?
In SLM Lab, we can easily run experiments to answer questions about deep RL. An Experiment in SLM Lab runs a number of Trials using a search spec by generating different sets of hyperparameters to search over (using ) and running a Trial for each one.
The search spec has the following format:
That is, the search spec can contain any components of a spec file. To perform search over a spec variable, simply:
mirror the spec for that variable
append a search mode to the variable key
define the config space for the search mode
For example:
"explore_anneal_epi__randint": [10, 60]
will sample integers uniformly from [10, 60) for explore_anneal_epi, using np.random.randint(10, 60), which excludes 60
"lr__uniform": [0.001, 0.1]
will sample lr using np.random.uniform(0.001, 0.1)
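To make the "mirror, append, define" steps concrete, here is a minimal Python sketch (not SLM Lab's actual code) that walks a nested search spec and splits each "{key}__{space_type}" key into a dotted variable path, a space type, and its config space. The function name flatten_search, the dotted-path convention, and the placement of explore_anneal_epi under algorithm and lr under net.optim_spec are assumptions made for illustration.

```python
# Hypothetical search spec mirroring a spec file; where each key lives
# (algorithm vs. net.optim_spec) is assumed for illustration only.
search = {
    "agent": [{
        "algorithm": {"explore_anneal_epi__randint": [10, 60]},
        "net": {"optim_spec": {"lr__uniform": [0.001, 0.1]}},
    }]
}


def flatten_search(obj, prefix=""):
    """Walk a nested search spec and return {dotted_path: (space_type, v)}.

    Dict keys and list indices become path segments; a key of the form
    "{key}__{space_type}" marks a leaf variable to search over.
    """
    out = {}
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return out
    for k, v in items:
        if "__" in k:
            key, space_type = k.rsplit("__", 1)
            out[prefix + key] = (space_type, v)
        else:
            out.update(flatten_search(v, prefix + k + "."))
    return out


for path, (space_type, v) in flatten_search(search).items():
    print(path, space_type, v)
```

Only the leaves carry a search mode; everything above them exists purely to mirror the structure of the spec, which is why the same nesting as the spec file is required.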
When constructing a new Trial, an Experiment samples an instance from the config space, then updates the original spec with the sampled values before passing it to the Trial constructor.
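The update step can be sketched in Python as follows, assuming sampled values are keyed by dotted paths such as "agent.0.algorithm.lam" (a hypothetical convention; SLM Lab's internal representation may differ). spec_for_trial and set_by_path are illustrative names, and the base spec is a trimmed placeholder.

```python
import copy


def set_by_path(spec, path, value):
    """Set a nested value for a dotted path such as "agent.0.algorithm.lam".

    Segments that index into a list are converted to int; all others are
    treated as dict keys.
    """
    *parents, last = path.split(".")
    node = spec
    for part in parents:
        node = node[int(part)] if isinstance(node, list) else node[part]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


def spec_for_trial(spec, sampled):
    """Return a copy of spec with every sampled {path: value} pair applied."""
    trial_spec = copy.deepcopy(spec)  # the original spec stays untouched
    for path, value in sampled.items():
        set_by_path(trial_spec, path, value)
    return trial_spec


# Trimmed, hypothetical base spec and one sampled value.
base_spec = {"agent": [{"algorithm": {"name": "PPO", "lam": 0.95}}]}
sampled = {"agent.0.algorithm.lam": 0.7}
trial_spec = spec_for_trial(base_spec, sampled)
print(trial_spec["agent"][0]["algorithm"])  # {'name': 'PPO', 'lam': 0.7}
print(base_spec["agent"][0]["algorithm"])   # {'name': 'PPO', 'lam': 0.95}
```

Deep-copying before the update keeps the original spec reusable, so every Trial starts from the same baseline and differs only in the sampled values.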
By default, an Experiment runs as many Trials as specified by "max_trial" in the meta spec, using random sampling from the full config space. If any key uses grid_search, its values are combined exhaustively with the random samples: every grid value (or combination of grid values) is run for each of the max_trial samples. For example, max_trial = 1 with one grid search of 4 elements yields 4 x 1 = 4 total trials.
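The trial count can be checked with a small Python sketch. It assumes Ray Tune-style semantics, where every combination of grid values is repeated for each of the max_trial samples and each generated trial draws its own random values; expand_trials, the dotted paths, and the four lam values are hypothetical placeholders.

```python
import itertools

import numpy as np


def expand_trials(max_trial, grid, random_space, seed=None):
    """Generate one config dict per trial.

    grid: {path: [values]} for grid_search keys.
    random_space: {path: (np.random function name, args)} for sampled keys.
    Each of the max_trial passes enumerates every grid combination, and each
    generated config draws fresh random values.
    """
    rng = np.random.RandomState(seed)
    paths = list(grid)
    configs = []
    for _ in range(max_trial):
        # itertools.product() with no grids yields a single empty combination
        for combo in itertools.product(*(grid[p] for p in paths)):
            config = dict(zip(paths, combo))
            for path, (fn_name, args) in random_space.items():
                config[path] = getattr(rng, fn_name)(*args)
            configs.append(config)
    return configs


grid = {"agent.0.algorithm.lam": [0.5, 0.7, 0.9, 0.95]}  # placeholder values
random_space = {"agent.0.net.optim_spec.lr": ("uniform", (0.001, 0.1))}
print(len(expand_trials(1, grid, {})))                     # 4 (4 x 1)
print(len(expand_trials(2, grid, random_space, seed=0)))   # 8 (4 x 2)
```

With no grid_search keys, the same function degenerates to plain random search with exactly max_trial trials.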
As an example, let's try to answer the question:
What values of lambda of PPO provide the fastest, most stable solution for Atari Breakout, if the other variables are held constant?
This file defines the spec for PPO and Breakout as usual. Corresponding to the question, we are interested in finding out the effect of different values of agent[0].algorithm.lam. The search spec specifies a grid search over it, and we set "meta.max_trial" to 1 since we are only doing a grid search.
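As a rough illustration of what this search produces, the Python sketch below uses a heavily trimmed dict in place of the spec file (it is not the actual file: the field values and the four lam values are placeholders) and builds one spec per grid value. The final check confirms the property the question relies on: apart from lam, every trial spec is identical.

```python
import copy

# Trimmed, hypothetical stand-in for the PPO/Breakout spec file.
spec = {
    "agent": [{"name": "PPO", "algorithm": {"name": "PPO", "gamma": 0.99, "lam": 0.95}}],
    "env": [{"name": "BreakoutNoFrameskip-v4"}],
    "meta": {"max_trial": 1},
    "search": {"agent": [{"algorithm": {"lam__grid_search": [0.5, 0.7, 0.9, 0.95]}}]},
}

lam_values = spec["search"]["agent"][0]["algorithm"]["lam__grid_search"]
trial_specs = []
for _ in range(spec["meta"]["max_trial"]):  # max_trial = 1: one pass over the grid
    for lam in lam_values:
        trial_spec = copy.deepcopy(spec)
        trial_spec["agent"][0]["algorithm"]["lam"] = lam
        trial_specs.append(trial_spec)


def without_lam(s):
    """Copy of a spec with lam removed, used to compare everything else."""
    c = copy.deepcopy(s)
    del c["agent"][0]["algorithm"]["lam"]
    return c


print(len(trial_specs))  # 4 trials, one per lam value
print(all(without_lam(t) == without_lam(trial_specs[0]) for t in trial_specs))  # True
```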
Let's run an Experiment with the spec file above, using the search lab mode:
At the end of the experiment, we obtain the usual trial graphs along with an experiment graph, as shown below:
Just as we can plot a moving-average version of a trial graph, we can do the same for the experiment graph:
From the experiment graph, we can observe that trial 1 (red) with lam: 0.7 performs the best on Breakout with the fastest convergence and the best final result.
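The moving-average view smooths a noisy return curve by averaging over a trailing window. The sketch below is a generic trailing moving average in NumPy, not SLM Lab's plotting code; the window size is a free parameter and the input is a toy sequence.

```python
import numpy as np


def moving_average(values, window):
    """Trailing moving average over `window` points.

    The first window - 1 outputs average only the points seen so far, so the
    result has the same length as the input.
    """
    values = np.asarray(values, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(values)))
    idx = np.arange(len(values))
    starts = np.maximum(0, idx - window + 1)
    return (csum[idx + 1] - csum[starts]) / (idx + 1 - starts)


print(moving_average([1, 2, 3, 4], window=2).tolist())  # [1.0, 1.5, 2.5, 3.5]
```

Using a cumulative sum makes each window average a single subtraction, so the cost stays linear in the length of the curve regardless of window size.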
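Essentially, a search spec uses entries of the form "{key}__{space_type}": {v}, where {space_type} is either grid_search from ray.tune or the name of any function in np.random. The supported space types and the expected form of {v} are listed in the table below.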
Let's look at the search spec for PPO on Breakout from .
This will spawn 6 trials in queue using , which are then dequeued to run . Since each trial is specified to run 4 sessions, it takes up 4 CPUs and 4 GPUs. On a machine with 32 CPUs and 8 GPUs, the GPUs are the limiting resource (8 / 4 = 2 trials, versus 32 / 4 = 8 for the CPUs), so the experiment will run 2 trials at any given time.
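The resource arithmetic can be written out as a small Python sketch. It assumes each session uses 1 CPU and 1 GPU (consistent with 4 sessions taking 4 CPUs and 4 GPUs) and that trials of equal length run in rounds; concurrent_trials is a hypothetical helper, not part of SLM Lab.

```python
import math


def concurrent_trials(total_cpus, total_gpus, cpus_per_trial, gpus_per_trial):
    """Number of trials that fit at once, limited by the scarcest resource."""
    limits = [total_cpus // cpus_per_trial]
    if gpus_per_trial > 0:  # CPU-only trials are not limited by GPUs
        limits.append(total_gpus // gpus_per_trial)
    return min(limits)


sessions_per_trial = 4  # assumed: 1 CPU and 1 GPU per session
parallel = concurrent_trials(32, 8, sessions_per_trial, sessions_per_trial)
print(parallel)                 # 2 (GPUs: 8 // 4 = 2; CPUs would allow 32 // 4 = 8)
print(math.ceil(6 / parallel))  # 3 rounds for 6 trials, if all trials take equal time
```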
| space_type | v | v type |
| --- | --- | --- |
| grid_search | [value1, value2, ...] | str, int, or float |
| choice | [value1, value2, ...] | str, int, or float |
| randint | [low, high) | int |
| uniform | [low, high) | float |
| normal | [mean, stdev] | float |
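The table maps onto a small sampler. The Python sketch below follows the stated rule (any space type other than grid_search and choice is looked up on np.random and called with v as positional arguments), but it is an illustration, not SLM Lab's source. Handling choice with Python's random.choice is an assumption chosen so that str values come back as plain Python strings; grid_search is rejected because its values are enumerated rather than sampled. The "adam"/"sgd" values are placeholders.

```python
import random

import numpy as np


def sample_value(space_type, v):
    """Draw one value for a sampled (non-grid) space type.

    - "choice": pick one element of v with Python's random.choice
    - anything else: call np.random.<space_type>(*v), for example
      ("randint", [10, 60])     -> np.random.randint(10, 60), an int in [10, 60)
      ("uniform", [0.001, 0.1]) -> np.random.uniform(0.001, 0.1)
      ("normal", [0.0, 1.0])    -> np.random.normal(0.0, 1.0), i.e. mean, stdev
    """
    if space_type == "grid_search":
        raise ValueError("grid_search values are enumerated, not sampled")
    if space_type == "choice":
        return random.choice(v)
    return getattr(np.random, space_type)(*v)


for space_type, v in [("choice", ["adam", "sgd"]), ("randint", [10, 60]),
                      ("uniform", [0.001, 0.1]), ("normal", [0.0, 1.0])]:
    print(space_type, sample_value(space_type, v))
```

Dispatching by name keeps the spec format open-ended: a new np.random distribution becomes usable as a space type without changing the sampler, at the cost of v having to match that function's positional arguments exactly.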