p2.16xlarge
, and the non-image based environments are run on AWS CPU box m5a.24xlarge
.Trial
level final_return_ma
from SLM Lab. This is final value of the 100-ckpt moving average of the return (total rewards) from evaluation. Each Trial
is ran with 4 Session
s with different random seeds, and their final_return_ma
are averaged on the Trial
level.slm_lab/spec/benchmark
folder, descriptively named {algorithm}_{environment}.json
. They can be exactly reproduced as described in Lab Organization.