Trial with 4 Sessions of different random seeds to average the results. Wait for it to run until completion, which should take about 10-20 minutes. Meanwhile, check the metrics logged in the terminal. In particular, the total_reward and its moving average total_reward_ma (with a window of 100 episodes) should climb up gradually.

total_reward_ma
reaches close to the maximum of 200, although a score of over 100 will do for this tutorial.

When the trial completes, the output is saved to a timestamped folder such as data/reinforce_cartpole_2020_04_13_232521/. Among other things, SLM Lab also automatically saves the final and the best model files in the model folder. The model files can be used for easy playback in enjoy mode.
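
To make the total_reward_ma metric mentioned above more concrete, here is a minimal sketch of a trailing moving average over per-episode rewards. It is not taken from SLM Lab's source: the function name moving_average is hypothetical, and averaging over however many episodes are available before the window fills is an assumption about the warm-up behavior.

```python
from collections import deque


def moving_average(rewards, window=100):
    """Return the trailing moving average of rewards, one value per episode.

    Assumption: before `window` episodes have been seen, the average is
    taken over the episodes available so far.
    """
    recent = deque(maxlen=window)  # holds at most the last `window` rewards
    running_sum = 0.0
    averages = []
    for reward in rewards:
        if len(recent) == window:
            # The oldest reward is about to be evicted by append().
            running_sum -= recent[0]
        recent.append(reward)
        running_sum += reward
        averages.append(running_sum / len(recent))
    return averages


if __name__ == "__main__":
    # Small window so the result is easy to check by hand.
    print(moving_average([10, 20, 30, 40], window=2))
```

With a window of 100, a single high-scoring episode moves the average very little, which is why total_reward_ma climbs more smoothly than total_reward itself.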