๐ŸƒContinuous Benchmark

MuJoCo Benchmark Results

SLM Lab v5.2 validates PPO, SAC, and CrossQ on Gymnasium MuJoCo environments. MuJoCo (Multi-Joint dynamics with Contact) provides physics simulation for continuous control tasks ranging from simple pendulums to complex humanoid locomotion.

Results below are from January–March 2026 benchmark runs using MuJoCo v5 environments.

All trained models and metrics are publicly available on HuggingFace.

Methodology

Results show Trial-level performance:

  1. Trial = 4 Sessions with different random seeds

  2. Session = One complete training run

  3. Score = Final 100-checkpoint moving average (total_reward_ma)

The trial score is the mean over the 4 sessions' final moving averages, averaging out seed-dependent variance.
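As a sketch of this scoring scheme (function and variable names here are illustrative, not SLM Lab's API):

```python
import statistics

def session_score(checkpoint_rewards, window=100):
    """Final moving average over the last `window` checkpoints (total_reward_ma)."""
    return statistics.mean(checkpoint_rewards[-window:])

def trial_score(sessions, window=100):
    """Trial score = mean of the sessions' final moving averages."""
    return statistics.mean(session_score(s, window) for s in sessions)

# Toy example: 4 seeds, each with a slowly rising reward trace
sessions = [[100 + i + 0.1 * t for t in range(500)] for i in range(4)]
print(trial_score(sessions))
```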

Standardized Settings

| Category | num_envs | max_frame | log_frequency | ASHA grace_period |
| --- | --- | --- | --- | --- |
| MuJoCo | 16 | 4e6–10e6 | 10000 | 1e5–1e6 |

The grace_period is the minimum number of frames a trial must run before ASHA early stopping may terminate it for underperforming.
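For intuition, a minimal ASHA-style stopping rule might look like the following (a simplification for illustration, not SLM Lab's scheduler; `cull_fraction` is a hypothetical parameter):

```python
def should_stop(frames, score, rung_scores, grace_period=1e5, cull_fraction=0.5):
    """ASHA-style check (simplified): never stop before grace_period;
    afterwards, stop if score falls in the bottom fraction of trials at this rung."""
    if frames < grace_period:
        return False  # still within the grace period, keep training
    cutoff = sorted(rung_scores)[int(len(rung_scores) * cull_fraction)]
    return score < cutoff

# A below-median trial is kept before grace_period, culled after it
print(should_stop(5e4, score=10, rung_scores=[10, 50, 90, 130]))  # False: grace period
print(should_stop(2e5, score=10, rung_scores=[10, 50, 90, 130]))  # True: bottom half
```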

Algorithms: PPO, SAC, and CrossQ. Network: MLP [256,256] with orthogonal initialization. PPO uses tanh activations; SAC and CrossQ use relu. CrossQ applies Batch Renormalization in its critics and has no target networks.

Note on frame budgets: SAC uses a higher update-to-data ratio, making it more sample-efficient but slower per frame than PPO (1-4M frames vs. PPO's 4-10M). CrossQ uses UTD=1 (like PPO) but eliminates target-network overhead, reaching ~700 fps; its frame budgets (3-7.5M) reflect this speed advantage. Scores may still be improving at the frame cutoff.


Spec Files

There is one spec file per algorithm, covering all environments: each file defines a base config shared via YAML anchors, with per-env overrides:

| SPEC_NAME | Envs | Key Config |
| --- | --- | --- |
| ppo_mujoco_arc | HalfCheetah, Walker, Humanoid, HumanoidStandup | Base: gamma=0.99, lam=0.95, lr=3e-4 |
| ppo_mujoco_longhorizon_arc | Reacher, Pusher | gamma=0.997, lam=0.97, lr=2e-4, entropy=0.001 |
| ppo_{env}_arc | Ant, Hopper, Swimmer, IP, IDP | Per-env tuned (gamma, lam, lr) |
| sac_mujoco_arc | (generic, use with -s flags) | Base: gamma=0.99, iter=4, lr=3e-4, [256,256] |
| sac_{env}_arc | All 11 envs | Per-env tuned (iter, gamma, lr, net size) |
| crossq_mujoco | (generic base) | Base: gamma=0.99, iter=1, lr=1e-3, policy_delay=3 |
| crossq_{env} | All 11 envs | Per-env tuned (critic width, actor LN) |
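To see how the anchor-plus-override pattern behaves, here is a toy YAML spec resolved with PyYAML (the keys below are simplified stand-ins, not the exact spec schema):

```python
import yaml  # PyYAML; safe_load supports YAML merge keys

SPEC = """
base: &base
  gamma: 0.99
  lam: 0.95
  lr: 3.0e-4
ppo_halfcheetah:
  <<: *base                 # inherit the shared base unchanged
ppo_reacher:
  <<: *base
  gamma: 0.997              # per-env override wins over the anchor
  lr: 2.0e-4
"""

spec = yaml.safe_load(SPEC)
print(spec["ppo_halfcheetah"]["gamma"])  # 0.99 (inherited)
print(spec["ppo_reacher"]["gamma"])      # 0.997 (overridden)
```

Explicit keys always take precedence over merged (`<<`) keys, so only the values listed under a variant differ from the base.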

Running Benchmarks

Reproduce: Copy SPEC_NAME and MAX_FRAME from the table below.

| ENV | SPEC_NAME | MAX_FRAME |
| --- | --- | --- |
| Ant-v5 | ppo_ant_arc | 10e6 |
| Ant-v5 | sac_ant_arc | 2e6 |
| Ant-v5 | crossq_ant | 3e6 |
| HalfCheetah-v5 | ppo_mujoco_arc | 10e6 |
| HalfCheetah-v5 | sac_halfcheetah_arc | 4e6 |
| Hopper-v5 | ppo_hopper_arc | 4e6 |
| Hopper-v5 | sac_hopper_arc | 3e6 |
| Humanoid-v5 | ppo_mujoco_arc | 10e6 |
| Humanoid-v5 | sac_humanoid_arc | 1e6 |
| HumanoidStandup-v5 | ppo_mujoco_arc | 4e6 |
| HumanoidStandup-v5 | sac_humanoid_standup_arc | 1e6 |
| InvertedDoublePendulum-v5 | ppo_inverted_double_pendulum_arc | 10e6 |
| InvertedDoublePendulum-v5 | sac_inverted_double_pendulum_arc | 2e6 |
| InvertedPendulum-v5 | ppo_inverted_pendulum_arc | 4e6 |
| InvertedPendulum-v5 | sac_inverted_pendulum_arc | 2e6 |
| Pusher-v5 | ppo_mujoco_longhorizon_arc | 4e6 |
| Pusher-v5 | sac_pusher_arc | 1e6 |
| Reacher-v5 | ppo_mujoco_longhorizon_arc | 4e6 |
| Reacher-v5 | sac_reacher_arc | 1e6 |
| Swimmer-v5 | ppo_swimmer_arc | 4e6 |
| Swimmer-v5 | sac_swimmer_arc | 2e6 |
| Walker2d-v5 | ppo_mujoco_arc | 10e6 |
| Walker2d-v5 | sac_walker2d_arc | 3e6 |
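When copying MAX_FRAME, note that together with log_frequency it determines how many checkpoints a run logs and how much of training the final 100-checkpoint moving average covers (illustrative arithmetic; function name is not part of the lab):

```python
def checkpoint_stats(max_frame, log_frequency=1e4, ma_window=100):
    """Number of logged checkpoints, and the fraction of training
    spanned by the final 100-checkpoint moving average."""
    n_checkpoints = int(max_frame / log_frequency)
    return n_checkpoints, ma_window / n_checkpoints

# e.g. a 10e6-frame PPO run vs. a 1e6-frame SAC run
print(checkpoint_stats(10e6))  # (1000, 0.1): MA covers the last 10% of training
print(checkpoint_stats(1e6))   # (100, 1.0): MA spans the entire run
```

Short SAC budgets therefore average over the whole run, while long PPO budgets score only the final stretch.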

Remote setup: `cp .env.example .env`, then set `HF_TOKEN`. See Remote Training for the dstack config.


Download and Replay


Results

Ant-v5

Docs | State: Box(105) | Action: Box(8) | Target: >2000

Settings: max_frame 10e6 | num_envs 16 | max_session 4 | log_frequency 1e4

HalfCheetah-v5

Docs | State: Box(17) | Action: Box(6) | Target: >5000

Settings: max_frame 10e6 | num_envs 16 | max_session 4 | log_frequency 1e4

Hopper-v5

Docs | State: Box(11) | Action: Box(3) | Target: ~2000

Settings: max_frame 4e6 | num_envs 16 | max_session 4 | log_frequency 1e4

Humanoid-v5

Docs | State: Box(348) | Action: Box(17) | Target: >1000

Settings: max_frame 10e6 | num_envs 16 | max_session 4 | log_frequency 1e4

HumanoidStandup-v5

Docs | State: Box(348) | Action: Box(17) | Target: >100k

Settings: max_frame 4e6 | num_envs 16 | max_session 4 | log_frequency 1e4

InvertedDoublePendulum-v5

Docs | State: Box(9) | Action: Box(1) | Target: ~8000

Settings: max_frame 10e6 | num_envs 16 | max_session 4 | log_frequency 1e4

InvertedPendulum-v5

Docs | State: Box(4) | Action: Box(1) | Target: ~1000

Settings: max_frame 4e6 | num_envs 16 | max_session 4 | log_frequency 1e4

Pusher-v5

Docs | State: Box(23) | Action: Box(7) | Target: >-50

Settings: max_frame 4e6 | num_envs 16 | max_session 4 | log_frequency 1e4

Reacher-v5

Docs | State: Box(10) | Action: Box(2) | Target: >-10

Settings: max_frame 4e6 | num_envs 16 | max_session 4 | log_frequency 1e4

Swimmer-v5

Docs | State: Box(8) | Action: Box(2) | Target: >200

Settings: max_frame 4e6 | num_envs 16 | max_session 4 | log_frequency 1e4

Walker2d-v5

Docs | State: Box(17) | Action: Box(6) | Target: >3500

Settings: max_frame 10e6 | num_envs 16 | max_session 4 | log_frequency 1e4

Legend: ✅ Solved | ⚠️ Close (>80%) | ❌ Failed


CrossQ Wall-Clock Speedup vs SAC

CrossQ eliminates target networks via Batch Renormalization in its critics, enabling UTD=1 at ~700 fps: 3.5–6.7x faster than SAC on the same hardware.

| Env | CrossQ FPS | SAC FPS | Speedup |
| --- | --- | --- | --- |
| HalfCheetah-v5 | 705 | 200 | 3.5x |
| Hopper-v5 | 693 | 104 | 6.7x |
| Walker2d-v5 | ~700 | 104 | 6.7x |
| Ant-v5 | ~700 | 200 | 3.5x |
| Humanoid-v5 | ~350 | 53 | 6.6x |
| HumanoidStandup-v5 | 340 | 53 | 6.4x |

Measured on RTX 3090. CrossQ achieves comparable scores at significantly lower wall-clock time.
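Given these fps figures, wall-clock training time is roughly frames / fps. For example (illustrative arithmetic only; real runs add evaluation and logging overhead):

```python
def wall_clock_hours(frames, fps):
    """Approximate training time in hours, ignoring eval/logging overhead."""
    return frames / fps / 3600

# Hopper-v5 over a 3e6-frame budget: CrossQ vs SAC
print(round(wall_clock_hours(3e6, 693), 1))  # 1.2 h for CrossQ
print(round(wall_clock_hours(3e6, 104), 1))  # 8.0 h for SAC
```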


Historical Results (v4)

Roboschool Results (v4)
| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | SAC |
| --- | --- | --- | --- | --- |
| RoboschoolAnt | 787 | 1396 | 1843 | 2915 |
| RoboschoolHalfCheetah | 712 | 439 | 1960 | 2497 |
| RoboschoolHopper | 710 | 285 | 2042 | 2045 |
| RoboschoolInvertedDoublePendulum | 996 | 4410 | 8076 | 8085 |
| RoboschoolInvertedPendulum | 995 | 978 | 986 | 941 |
| RoboschoolReacher | 12.9 | 10.16 | 19.51 | 19.99 |
| RoboschoolWalker2d | 280 | 220 | 1660 | 1894 |
| RoboschoolHumanoid | 99.31 | 54.58 | 2388 | 2621* |

Episode score at the end of training. Reported scores are the average over the last 100 checkpoints, averaged over 4 Sessions. Results marked with * required 50M-100M frames using async SAC.
