🎮 Playground Benchmark
MuJoCo Playground PPO Benchmark Results
SLM Lab v5.3 validates PPO on MuJoCo Playground, Google DeepMind's GPU-accelerated simulation platform. MuJoCo Playground uses the MJWarp backend (Warp-accelerated MJX) for physics, which enables massively parallel training with 2048 environments on GPU.
SLM Lab wraps Playground environments as gymnasium.VectorEnv, with zero-copy DLPack transfer between JAX and PyTorch. All 54 environments use the playground/ prefix in specs.
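The zero-copy hand-off mentioned above can be sketched with the public DLPack entry points of both libraries. This is a minimal illustration and not SLM Lab's wrapper code: the helper names jax_to_torch and torch_to_jax are hypothetical, and it assumes recent JAX and PyTorch releases in which arrays and tensors implement the __dlpack__ protocol.

```python
def jax_to_torch(x):
    """View a JAX array as a PyTorch tensor through DLPack (no host copy)."""
    import torch  # imported lazily so this module loads without the GPU stack

    # torch.from_dlpack accepts any object that implements __dlpack__,
    # which JAX arrays do.
    return torch.from_dlpack(x)


def torch_to_jax(t):
    """View a PyTorch tensor as a JAX array through DLPack."""
    import jax.dlpack

    # Tensors that require grad cannot be exported, so detach first.
    return jax.dlpack.from_dlpack(t.detach())
```

In a vectorized environment wrapper, the first helper would typically be applied to observations and rewards coming out of the simulator, and the second to actions produced by the PyTorch policy. Both sides share the same device memory, so in-place writes on one side are visible on the other.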
Results below are from March 2026 benchmark runs. All trained models and metrics are publicly available on HuggingFace.
Methodology
Results show Trial-level performance:
Trial = 4 Sessions with different random seeds
Session = One complete training run
Score = Final 100-checkpoint moving average (total_reward_ma)
The trial score is the mean across 4 sessions.
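The scoring rule above can be sketched in a few lines. This is an illustration, not SLM Lab's implementation: the function names and the toy reward series are hypothetical, and it assumes each session logs one total reward value per checkpoint.

```python
from statistics import mean

WINDOW = 100  # checkpoints in the moving-average window, per the methodology


def session_score(checkpoint_rewards, window=WINDOW):
    """Final moving average: mean of the last `window` checkpoint rewards."""
    if not checkpoint_rewards:
        raise ValueError("session has no checkpoints")
    return mean(checkpoint_rewards[-window:])


def trial_score(sessions, window=WINDOW):
    """Trial score: mean of the per-session final moving averages."""
    return mean(session_score(rewards, window) for rewards in sessions)


# Hypothetical data: 4 sessions (seeds), each with 300 checkpoints.
# Rewards ramp up for 200 checkpoints, then plateau at a seed-specific level.
plateaus = [900.0, 950.0, 1000.0, 970.0]
sessions = [
    [p * min(i / 200, 1.0) for i in range(300)]
    for p in plateaus
]

score = trial_score(sessions)
print(f"trial score: {score:.2f}")  # trial score: 955.00
```

Because only the last 100 checkpoints count, the ramp-up phase does not affect the score; each session contributes its plateau value, and the trial score is their mean.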
Standardized Settings
| Env Category | num_envs | max_frame | log_frequency |
|---|---|---|---|
| Playground | 2048 | 100e6 | 10000 |
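As a rough sanity check, the sketch below works out what these settings imply per environment. It reads the three values as the number of parallel environments (2048, as stated in the introduction), a total frame budget (100e6), and a logging interval in frames (10000); treat the last two readings as assumptions. The variable names are illustrative.

```python
NUM_ENVS = 2048          # parallel environments (stated in the introduction)
MAX_FRAME = int(100e6)   # assumed: total frames summed over all environments
LOG_FREQUENCY = 10_000   # assumed: frames between logged checkpoints
MA_WINDOW = 100          # checkpoints in the score's moving average

steps_per_env = MAX_FRAME // NUM_ENVS         # vectorized steps per environment
num_checkpoints = MAX_FRAME // LOG_FREQUENCY  # logged checkpoints per session
ma_frames = MA_WINDOW * LOG_FREQUENCY         # frames covered by the final window

print(f"steps per environment: {steps_per_env}")
print(f"checkpoints per session: {num_checkpoints}")
print(f"final moving-average window: {ma_frames} frames "
      f"({100 * ma_frames / MAX_FRAME:.0f}% of training)")
```

Under these assumptions, each environment advances about 48.8 thousand steps, and the reported score averages over roughly the last 1% of training.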
Spec File
Spec file: ppo_playground.yaml. All environments run from this file, selected with -s env=playground/ENV.
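To make the override pattern concrete, the sketch below builds the -s env=playground/ENV argument pair for a few environments from this page. It only constructs the override tokens; the rest of the command line is not shown here and is deliberately left out. The helper name env_override is hypothetical.

```python
ENV_PREFIX = "playground/"  # prefix used by all Playground environments


def env_override(env_name):
    """Return the `-s env=...` tokens for one Playground environment.

    Accepts the name with or without the `playground/` prefix.
    """
    if not env_name.startswith(ENV_PREFIX):
        env_name = ENV_PREFIX + env_name
    return ["-s", f"env={env_name}"]


for name in ["CartpoleBalance", "playground/FingerSpin", "Go1Handstand"]:
    print(" ".join(env_override(name)))
```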
Running Benchmarks
Installation
Installing Playground support adds the JAX, MuJoCo Playground, and MJWarp dependencies. A CUDA GPU is required.
Phase 5.1: DM Control Suite (25 envs)
Classic control and locomotion tasks from the DeepMind Control Suite, ported to MJWarp GPU simulation.
| Environment | Score (MA) | Spec name |
|---|---|---|
| playground/AcrobotSwingupSparse | 146.98 | ppo_playground_vnorm |
| playground/CartpoleBalance | 968.23 | ppo_playground_vnorm |
| playground/CartpoleBalanceSparse | 995.34 | ppo_playground_constlr |
| playground/CartpoleSwingup | 729.09 | ppo_playground_constlr |
| playground/CartpoleSwingupSparse | 521.98 | ppo_playground_constlr |
| playground/FingerSpin | 713.35 | ppo_playground_fingerspin |
| playground/FingerTurnHard | 590.43 | ppo_playground_vnorm_constlr |
| playground/FishSwim | 580.57 | ppo_playground_vnorm_constlr_clip03 |
| playground/HumanoidRun | 18.83 | ppo_playground_humanoid |
| playground/HumanoidStand | 114.86 | ppo_playground_humanoid |
| playground/HumanoidWalk | 47.01 | ppo_playground_humanoid |
| playground/PendulumSwingup | 637.46 | ppo_playground_pendulum |
| playground/PointMass | 868.09 | ppo_playground_vnorm_constlr |
| playground/SwimmerSwimmer6 | 591.13 | ppo_playground_vnorm_constlr |

Phase 5.2: Locomotion Robots (19 envs)
Real-world robot locomotion: quadrupeds (Go1, Spot, Barkour) and humanoids (H1, G1, T1, Op3, Apollo, BerkeleyHumanoid) on flat and rough terrain.
| Environment | Score (MA) | Spec name |
|---|---|---|
| playground/ApolloJoystickFlatTerrain | 17.44 | ppo_playground_loco_precise |
| playground/BarkourJoystick | 0.0 | ppo_playground_loco |
| playground/BerkeleyHumanoidJoystickFlatTerrain | 32.29 | ppo_playground_loco_precise |
| playground/BerkeleyHumanoidJoystickRoughTerrain | 21.25 | ppo_playground_loco_precise |
| playground/G1JoystickFlatTerrain | 1.85 | ppo_playground_loco_precise |
| playground/G1JoystickRoughTerrain | -2.75 | ppo_playground_loco_precise |
| playground/Go1Footstand | 23.48 | ppo_playground_loco_precise |
| playground/Go1Handstand | 17.88 | ppo_playground_loco_precise |
| playground/Go1JoystickFlatTerrain | 0.0 | ppo_playground_loco |
| playground/Go1JoystickRoughTerrain | 0.00 | ppo_playground_loco |
| playground/H1InplaceGaitTracking | 11.95 | ppo_playground_loco_precise |
| playground/H1JoystickGaitTracking | 31.11 | ppo_playground_loco_precise |
| playground/SpotFlatTerrainJoystick | 48.58 | ppo_playground_loco_precise |
| playground/SpotJoystickGaitTracking | 36.90 | ppo_playground_loco |
| playground/T1JoystickFlatTerrain | 13.42 | ppo_playground_loco_precise |
| playground/T1JoystickRoughTerrain | 2.58 | ppo_playground_loco_precise |

Phase 5.3: Manipulation (10 envs)
Robotic manipulation: Panda arm pick/place, Aloha bimanual, Leap dexterous hand, and AeroCube orientation tasks.
| Environment | Score (MA) | Spec name |
|---|---|---|
| playground/AeroCubeRotateZAxis | -3.09 | ppo_playground_loco |
| playground/AlohaSinglePegInsertion | 220.93 | ppo_playground_manip_aloha_peg |
| playground/LeapCubeReorient | 74.68 | ppo_playground_loco |
| playground/LeapCubeRotateZAxis | 91.65 | ppo_playground_loco |
| playground/PandaOpenCabinet | 11081.51 | ppo_playground_loco |
| playground/PandaPickCube | 4586.13 | ppo_playground_loco |
| playground/PandaPickCubeCartesian | 10.58 | ppo_playground_loco |
| playground/PandaPickCubeOrientation | 4281.66 | ppo_playground_loco |
| playground/PandaRobotiqPushCube | 1.31 | ppo_playground_loco |
