Continuous Environment Benchmark
| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---|---|---|---|
| RoboschoolAnt | 787 | 1396 | 1843 | 2915 |
| RoboschoolAtlasForwardWalk | 59.87 | 88.04 | 172 | 800 |
| RoboschoolHalfCheetah | 712 | 439 | 1960 | 2497 |
| RoboschoolHopper | 710 | 285 | 2042 | 2045 |
| RoboschoolInvertedDoublePendulum | 996 | 4410 | 8076 | 8085 |
| RoboschoolInvertedPendulum | 995 | 978 | 986 | 941 |
| RoboschoolReacher | 12.9 | 10.16 | 19.51 | 19.99 |
| RoboschoolWalker2d | 280 | 220 | 1660 | 1894 |
| RoboschoolHumanoid | 99.31 | 54.58 | 2388 | 2621* |
| RoboschoolHumanoidFlagrun | 73.57 | 178 | 2014 | 2056* |
| RoboschoolHumanoidFlagrunHarder | -429 | 253 | 680 | 280* |
| Unity3DBall | 33.48 | 53.46 | 78.24 | 98.44 |
| Unity3DBallHard | 62.92 | 71.92 | 91.41 | 97.06 |
Episode scores at the end of training attained by SLM Lab implementations on continuous control problems. Each reported score is the average over the last 100 checkpoints, then averaged over 4 Sessions. Results marked with * require 50M-100M frames, so we use the hybrid synchronous/asynchronous version of SAC to parallelize and speed up training.
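The two-stage averaging described in the caption can be illustrated with a minimal sketch. It assumes each Session's per-checkpoint episode scores are already available as a plain list; the function name `final_score` and the synthetic numbers are hypothetical and are not part of the SLM Lab API or its results.

```python
from statistics import mean


def final_score(session_scores, last_n=100):
    """Average the last `last_n` checkpoint scores within each Session,
    then average those per-Session means across Sessions.

    `session_scores` is a list of lists: one list of checkpoint scores
    per Session (hypothetical input layout, assumed for illustration).
    """
    per_session = [mean(scores[-last_n:]) for scores in session_scores]
    return mean(per_session)


# Synthetic data for illustration only: 4 Sessions with 150 checkpoints each.
# The first 50 checkpoints score 0; the last 100 hold a constant value.
sessions = [[0] * 50 + [10 * (i + 1)] * 100 for i in range(4)]

# Per-Session means over the last 100 checkpoints are 10, 20, 30 and 40.
score = final_score(sessions)
print(score)
```

Because every Session contributes the same number of checkpoints (100) to its mean, this mean of means equals the plain average over all 400 retained checkpoint scores.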