What values of lambda of PPO provide the fastest, most stable solution for Atari Breakout, if the other variables are held constant?

The search spec takes the format "{key}__{space_type}": {v}, where {space_type} is grid_search of ray.tune, or any function name of np.random. Depending on {space_type}, {v} takes one of these forms:

- [value1, value2, ...]: the sampled value is a str|int|float
- [low, high): the sampled value is an int or a float, depending on the function

For example, "explore_anneal_epi__randint": [10, 60] will sample integers uniformly from 10 (inclusive) to 60 (exclusive) for explore_anneal_epi, and "lr__uniform": [0.001, 0.1] will sample lr using np.random.uniform(0.001, 0.1).
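The naming convention can be illustrated with a minimal sketch. The helper `sample_value` below is hypothetical and is not the library's own implementation; it only assumes that the part after the double underscore names either grid_search or a function of np.random, and that np.random.choice is the function used when {v} is a list of candidate values.

```python
import numpy as np

def sample_value(spec_key, v):
    """Hypothetical helper: split "{key}__{space_type}" and draw one value."""
    # Split on the last double underscore so the key itself stays intact.
    key, space_type = spec_key.rsplit("__", 1)
    if space_type == "grid_search":
        # A grid search is enumerated rather than sampled: return every candidate.
        return key, list(v)
    # Look up the sampling function by name, e.g. np.random.randint.
    sampler = getattr(np.random, space_type)
    if space_type == "choice":
        # np.random.choice takes the candidate list as a single argument.
        return key, sampler(v)
    # Range-style functions take the list entries as positional arguments.
    return key, sampler(*v)

print(sample_value("explore_anneal_epi__randint", [10, 60]))
print(sample_value("lr__uniform", [0.001, 0.1]))
```

Looking the function up by name keeps the sketch open to other np.random functions without a separate branch for each one.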
If any key uses grid_search, it is combined exhaustively with the other random sampling, e.g. for max_trial = 1 with one grid search of 4 elements, this will yield 4 x 1 = 4 total trials.
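The trial count can be checked with a short sketch. The function `count_trials` is a hypothetical name, and the sketch assumes that several grid searches multiply together as a Cartesian product, with each grid point repeated max_trial times.

```python
from itertools import product

def count_trials(grid_specs, max_trial):
    """Hypothetical helper: total trials for a list of grid_search value lists."""
    # Every combination of grid values is one grid point.
    n_grid_points = len(list(product(*grid_specs)))
    # Each grid point is run max_trial times with fresh random samples.
    return n_grid_points * max_trial

# One grid search of 4 elements with max_trial = 1.
print(count_trials([[0.7, 0.8, 0.9, 0.95]], max_trial=1))
```

With no grid search at all, the product contains a single empty combination, so the count falls back to max_trial.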
To answer the question about PPO's lambda, the variable to search over is agent[0].algorithm.lam. The search spec specifies a grid search over it, and we set "meta.max_trial" to 1 since we are only doing a grid search.
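As a rough illustration, such a spec fragment might look like the Python dictionary below. The nesting under a "search" key is an assumption that mirrors the path agent[0].algorithm.lam, and the four lambda values are illustrative placeholders, not the values used in the actual experiment.

```python
# Hypothetical spec fragment; the layout and the lambda values are assumptions.
spec = {
    "meta": {"max_trial": 1},
    "search": {
        "agent": [
            {"algorithm": {"lam__grid_search": [0.7, 0.8, 0.9, 0.95]}},
        ],
    },
}

# Follow the path agent[0].algorithm.lam inside the search block.
grid = spec["search"]["agent"][0]["algorithm"]["lam__grid_search"]

# With only a grid search, the total is the grid size times max_trial.
total_trials = len(grid) * spec["meta"]["max_trial"]
print(total_trials)
```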