Continuous Environment Benchmark
🥇 Continuous Environment Benchmark Result
| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---|---|---|---|
| RoboschoolAnt | 787 | 1396 | 1843 | 2915 |
| RoboschoolAtlasForwardWalk | 59.87 | 88.04 | 172 | 800 |
| RoboschoolHalfCheetah | 712 | 439 | 1960 | 2497 |
| RoboschoolHopper | 710 | 285 | 2042 | 2045 |
| RoboschoolInvertedDoublePendulum | 996 | 4410 | 8076 | 8085 |
| RoboschoolInvertedPendulum | 995 | 978 | 986 | 941 |
| RoboschoolReacher | 12.9 | 10.16 | 19.51 | 19.99 |
| RoboschoolWalker2d | 280 | 220 | 1660 | 1894 |
| RoboschoolHumanoid | 99.31 | 54.58 | 2388 | 2621* |
| RoboschoolHumanoidFlagrun | 73.57 | 178 | 2014 | 2056* |
| RoboschoolHumanoidFlagrunHarder | -429 | 253 | 680 | 280* |
| Unity3DBall | 33.48 | 53.46 | 78.24 | 98.44 |
| Unity3DBallHard | 62.92 | 71.92 | 91.41 | 97.06 |
Episode scores at the end of training attained by SLM Lab implementations on continuous control problems. Each reported score is the average over the last 100 checkpoints, then averaged over 4 Sessions. Results marked with * require 50M-100M frames, so we use the hybrid synchronous/asynchronous version of SAC to parallelize and speed up training.
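The aggregation described above (mean over the last 100 checkpoints of each Session, then mean over Sessions) can be sketched as follows. This is an illustrative snippet, not SLM Lab's actual analysis code; the function name and data layout are assumptions.

```python
def benchmark_score(sessions, last_n=100):
    """Aggregate checkpoint episode scores as in the benchmark table:
    average the last `last_n` checkpoint scores within each Session,
    then average those means across Sessions.

    `sessions` is a list (one entry per Session) of lists of
    checkpoint episode scores, in checkpoint order.
    """
    session_means = [
        sum(scores[-last_n:]) / len(scores[-last_n:]) for scores in sessions
    ]
    return sum(session_means) / len(session_means)


# Toy example: 2 Sessions, 200 checkpoints each.
sessions = [[0] * 100 + [10] * 100, [0] * 100 + [20] * 100]
score = benchmark_score(sessions)  # mean of [10, 20] -> 15.0
```

Averaging over multiple Sessions (independent training runs with different random seeds) reduces the variance of the reported score.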
📈 Continuous Environment Benchmark Result Plots