🎮 Playground Benchmark

MuJoCo Playground PPO Benchmark Results

SLM Lab v5.3 validates PPO on MuJoCo Playground, Google DeepMind's GPU-accelerated simulation platform. MuJoCo Playground uses the MJWarp backend (Warp-accelerated MJX) for physics, enabling massively parallel training with 2048 environments on GPU.

SLM Lab wraps Playground environments as gymnasium.VectorEnv with DLPack zero-copy JAX→PyTorch transfer. All 54 environments use the playground/ prefix in specs.
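To illustrate what "zero-copy" means here, the sketch below exchanges a small batch through the DLPack protocol and checks that producer and consumer share one buffer. It is not SLM Lab's wrapper code: NumPy stands in for both frameworks so the example runs on CPU, and the helper name to_consumer and the batch shape are hypothetical. With JAX and PyTorch installed, the analogous import on the PyTorch side is torch.from_dlpack; whether the wrapper uses exactly that call is an assumption.

```python
import numpy as np


def to_consumer(batch):
    """Import `batch` through the DLPack protocol (hypothetical helper).

    NumPy plays the role of the consuming framework here; no data is copied.
    """
    return np.from_dlpack(batch)


# A tiny stand-in for a batch of observations (4 envs, 3 features each).
obs = np.zeros((4, 3), dtype=np.float32)
view = to_consumer(obs)

# Writing through the producer is visible to the consumer,
# because both arrays point at the same memory.
obs[0, 0] = 1.0
shared = np.shares_memory(obs, view)
print(shared, float(view[0, 0]))
```

Because no buffer is duplicated, the cost of handing a batch from the simulator to the learner does not grow with the number of parallel environments.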

Results below are from March 2026 benchmark runs. All trained models and metrics are publicly available on HuggingFace.

Methodology

Results show Trial-level performance:

  1. Trial = 4 Sessions with different random seeds

  2. Session = One complete training run

  3. Score = Final 100-checkpoint moving average (total_reward_ma)

The trial score is the mean across 4 sessions.
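The scoring rule above can be sketched in a few lines. This is a minimal illustration, assuming that the final 100-checkpoint moving average equals the mean of the last 100 logged checkpoint rewards; the function names and the synthetic reward lists are hypothetical and are not taken from SLM Lab.

```python
from statistics import mean


def session_score(checkpoint_rewards, window=100):
    """Mean of the last `window` checkpoint rewards of one session.

    If the session logged fewer than `window` checkpoints, all are used.
    """
    return mean(checkpoint_rewards[-window:])


def trial_score(sessions, window=100):
    """Mean of the session scores across all sessions of a trial."""
    return mean(session_score(s, window) for s in sessions)


# Synthetic data only: 4 sessions (seeds), 150 checkpoints each.
sessions = [[1.0] * 150, [2.0] * 150, [3.0] * 150, [4.0] * 150]
print(trial_score(sessions))  # 2.5
```

Averaging the tail of each run before averaging across seeds keeps a single noisy final checkpoint from dominating the reported number.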

Standardized Settings

| Category | num_envs | max_frame | log_frequency |
|---|---|---|---|
| Playground | 2048 | 100e6 | 10000 |
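As a rough guide to what these settings imply, the arithmetic below derives the per-environment step budget and the number of log points. It assumes that max_frame counts frames summed over all parallel environments and that log_frequency is measured in the same frames; both are readings of the settings, not confirmed by this page, and the function name is hypothetical.

```python
def budget(num_envs, max_frame, log_frequency):
    """Derive per-env steps and log points from the benchmark settings.

    Assumes frames are counted across all parallel environments.
    """
    steps_per_env = max_frame / num_envs
    log_points = max_frame // log_frequency
    return steps_per_env, log_points


steps_per_env, log_points = budget(
    num_envs=2048,
    max_frame=int(100e6),
    log_frequency=10_000,
)
print(steps_per_env)  # 48828.125
print(log_points)     # 10000
```

Under these assumptions each environment advances roughly 48.8k steps per run, and the 100-checkpoint score window covers the last 1% of training.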

Spec File

Spec file: ppo_playground.yaml (all environments are selected via -s env=playground/ENV)

Running Benchmarks

Installation

Installing Playground support adds the JAX, MuJoCo Playground, and MJWarp dependencies. A CUDA GPU is required.


Phase 5.1: DM Control Suite (25 envs)

Classic control and locomotion tasks from the DeepMind Control Suite, ported to MJWarp GPU simulation.

| ENV | MA | SPEC_NAME |
|---|---|---|
| playground/AcrobotSwingup | 253.24 | ppo_playground_vnorm |
| playground/AcrobotSwingupSparse | 146.98 | ppo_playground_vnorm |
| playground/BallInCup | 942.44 | ppo_playground_vnorm |
| playground/CartpoleBalance | 968.23 | ppo_playground_vnorm |
| playground/CartpoleBalanceSparse | 995.34 | ppo_playground_constlr |
| playground/CartpoleSwingup | 729.09 | ppo_playground_constlr |
| playground/CartpoleSwingupSparse | 521.98 | ppo_playground_constlr |
| playground/CheetahRun | 883.44 | ppo_playground_vnorm |
| playground/FingerSpin | 713.35 | ppo_playground_fingerspin |
| playground/FingerTurnEasy | 663.58 | ppo_playground_vnorm |
| playground/FingerTurnHard | 590.43 | ppo_playground_vnorm_constlr |
| playground/FishSwim | 580.57 | ppo_playground_vnorm_constlr_clip03 |
| playground/HopperHop | 22.00 | ppo_playground_vnorm |
| playground/HopperStand | 237.15 | ppo_playground_vnorm |
| playground/HumanoidRun | 18.83 | ppo_playground_humanoid |
| playground/HumanoidStand | 114.86 | ppo_playground_humanoid |
| playground/HumanoidWalk | 47.01 | ppo_playground_humanoid |
| playground/PendulumSwingup | 637.46 | ppo_playground_pendulum |
| playground/PointMass | 868.09 | ppo_playground_vnorm_constlr |
| playground/ReacherEasy | 955.08 | ppo_playground_vnorm |
| playground/ReacherHard | 946.99 | ppo_playground_vnorm |
| playground/SwimmerSwimmer6 | 591.13 | ppo_playground_vnorm_constlr |
| playground/WalkerRun | 759.71 | ppo_playground_vnorm |
| playground/WalkerStand | 948.35 | ppo_playground_vnorm |
| playground/WalkerWalk | 945.31 | ppo_playground_vnorm |

[Figures: one per environment for the 25 DM Control Suite environments listed above, in the same order.]


Phase 5.2: Locomotion Robots (19 envs)

Real-world robot locomotion: quadrupeds (Go1, Spot, Barkour) and humanoids (H1, G1, T1, Op3, Apollo, BerkeleyHumanoid) on flat and rough terrain.

| ENV | MA | SPEC_NAME |
|---|---|---|
| playground/ApolloJoystickFlatTerrain | 17.44 | ppo_playground_loco_precise |
| playground/BarkourJoystick | 0.0 | ppo_playground_loco |
| playground/BerkeleyHumanoidJoystickFlatTerrain | 32.29 | ppo_playground_loco_precise |
| playground/BerkeleyHumanoidJoystickRoughTerrain | 21.25 | ppo_playground_loco_precise |
| playground/G1JoystickFlatTerrain | 1.85 | ppo_playground_loco_precise |
| playground/G1JoystickRoughTerrain | -2.75 | ppo_playground_loco_precise |
| playground/Go1Footstand | 23.48 | ppo_playground_loco_precise |
| playground/Go1Getup | 18.16 | ppo_playground_loco_go1 |
| playground/Go1Handstand | 17.88 | ppo_playground_loco_precise |
| playground/Go1JoystickFlatTerrain | 0.0 | ppo_playground_loco |
| playground/Go1JoystickRoughTerrain | 0.00 | ppo_playground_loco |
| playground/H1InplaceGaitTracking | 11.95 | ppo_playground_loco_precise |
| playground/H1JoystickGaitTracking | 31.11 | ppo_playground_loco_precise |
| playground/Op3Joystick | 0.00 | ppo_playground_loco |
| playground/SpotFlatTerrainJoystick | 48.58 | ppo_playground_loco_precise |
| playground/SpotGetup | 19.39 | ppo_playground_loco |
| playground/SpotJoystickGaitTracking | 36.90 | ppo_playground_loco |
| playground/T1JoystickFlatTerrain | 13.42 | ppo_playground_loco_precise |
| playground/T1JoystickRoughTerrain | 2.58 | ppo_playground_loco_precise |

[Figures: ApolloJoystickFlatTerrain, BarkourJoystick, BerkeleyHumanoidJoystickFlatTerrain, G1JoystickFlatTerrain, Go1Footstand, Go1Handstand, H1InplaceGaitTracking, H1JoystickGaitTracking, Op3Joystick, SpotFlatTerrainJoystick, SpotGetup, SpotJoystickGaitTracking, BerkeleyHumanoidJoystickRoughTerrain, Go1Getup, Go1JoystickFlatTerrain, Go1JoystickRoughTerrain, T1JoystickFlatTerrain, T1JoystickRoughTerrain]


Phase 5.3: Manipulation (10 envs)

Robotic manipulation: Panda arm pick/place, Aloha bimanual, Leap dexterous hand, and AeroCube orientation tasks.

| ENV | MA | SPEC_NAME |
|---|---|---|
| playground/AeroCubeRotateZAxis | -3.09 | ppo_playground_loco |
| playground/AlohaHandOver | 3.65 | ppo_playground_loco |
| playground/AlohaSinglePegInsertion | 220.93 | ppo_playground_manip_aloha_peg |
| playground/LeapCubeReorient | 74.68 | ppo_playground_loco |
| playground/LeapCubeRotateZAxis | 91.65 | ppo_playground_loco |
| playground/PandaOpenCabinet | 11081.51 | ppo_playground_loco |
| playground/PandaPickCube | 4586.13 | ppo_playground_loco |
| playground/PandaPickCubeCartesian | 10.58 | ppo_playground_loco |
| playground/PandaPickCubeOrientation | 4281.66 | ppo_playground_loco |
| playground/PandaRobotiqPushCube | 1.31 | ppo_playground_loco |

[Figures: one per environment for the 10 manipulation environments listed above, in the same order.]
