The search spec enables hyperparameter optimization using Ray Tune with ASHA (Asynchronous Successive Halving Algorithm). ASHA terminates underperforming trials early, focusing compute on promising configurations.
The Search Spec Structure
Add a search section to your spec with {key}__{space_type} syntax:
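For example, a minimal search block that combines dot-notation paths with space types might look like this (a sketch; the ranges are illustrative, not tuned values):

```json
"search": {
  "agent.algorithm.gamma__uniform": [0.9, 0.999],
  "agent.algorithm.lam__uniform": [0.7, 0.99],
  "agent.net.optim_spec.lr__loguniform": [1e-5, 1e-3]
}
```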
You can search any spec field using dot notation. Common hyperparameters:
Algorithm Hyperparameters
| Parameter | Path | Typical Range | Impact |
|---|---|---|---|
| Discount factor | `agent.algorithm.gamma` | 0.9-0.999 | High |
| GAE lambda | `agent.algorithm.lam` | 0.7-0.99 | High |
| Learning rate | `agent.net.optim_spec.lr` | 1e-5 to 1e-3 | High |
| Entropy coefficient | `agent.algorithm.entropy_coef_spec.start_val` | 0.001-0.1 | Medium |
| Clip epsilon (PPO) | `agent.algorithm.clip_eps_spec.start_val` | 0.1-0.3 | Medium |
| Time horizon | `agent.algorithm.time_horizon` | 64-2048 | Medium |
| Minibatch size | `agent.algorithm.minibatch_size` | 32-512 | Low |
| Training epochs | `agent.algorithm.training_epoch` | 3-10 | Low |
Network Hyperparameters
| Parameter | Path | Typical Range | Impact |
|---|---|---|---|
| Hidden layers | `agent.net.hid_layers` | [64, 64] to [512, 512] | Medium |
| Activation | `agent.net.hid_layers_activation` | relu, tanh | Low |
| Gradient clip | `agent.net.clip_grad_val` | 0.5-10 | Low |
Focus on high-impact parameters first. Learning rate, gamma, and lambda typically have the largest effect on performance.
Search Space Types
| Type | Syntax | Description | Best For |
|---|---|---|---|
| `uniform` | `[low, high]` | Uniform distribution | Bounded continuous values (gamma, lam) |
| `loguniform` | `[low, high]` | Log-uniform distribution | Learning rates, small values |
| `choice` | `[v1, v2, ...]` | Sample from a list | Discrete options, architectures |
| `randint` | `[low, high]` | Random integer | Batch sizes, layer counts |
Note: grid_search is not supported with Optuna. Use choice with the values you want to compare instead.
Examples
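A sketch showing each space type in use; the paths come from the tables above and the ranges are illustrative only:

```json
"search": {
  "agent.algorithm.gamma__uniform": [0.9, 0.999],
  "agent.net.optim_spec.lr__loguniform": [1e-5, 1e-3],
  "agent.net.hid_layers__choice": [[64, 64], [128, 128], [256, 256]],
  "agent.algorithm.minibatch_size__randint": [32, 512]
}
```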
Prefer continuous distributions (uniform, loguniform) over choice when possible. Continuous distributions let the search explore intermediate values, while choice only samples from a fixed list.
ASHA Configuration
ASHA (Asynchronous Successive Halving Algorithm) terminates underperforming trials early. Configure it in the meta spec:
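A sketch of the relevant meta settings, assuming the ASHA options live under a search_scheduler key (the exact key names and nesting may differ in your SLM Lab version):

```json
"meta": {
  "max_session": 1,
  "max_trial": 16,
  "search_scheduler": {
    "grace_period": 100000,
    "reduction_factor": 3
  },
  "search_resources": {
    "gpu": 0.125
  }
}
```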
ASHA Settings
| Setting | Description | Typical Value |
|---|---|---|
| `max_session` | Sessions per trial | 1 (required for fair comparison) |
| `max_trial` | Total trials to run | 8-30 |
| `grace_period` | Minimum frames before first eval | 5-10% of max_frame |
| `reduction_factor` | Keep top 1/N of trials at each rung | 3 (keeps top 1/3) |
| `search_resources.gpu` | GPU fraction per trial | 0.125 (8 trials per GPU) |
How ASHA works: at each rung, with reduction_factor=3, ASHA keeps the top 1/3 of trials and terminates the rest. A 16-trial search might only run 5-6 trials to completion.
Grace Period by Environment
| Environment | grace_period | Reasoning |
|---|---|---|
| Classic Control | 10000-50000 | Fast learning, quick signal |
| Box2D | 50000-100000 | Medium complexity |
| MuJoCo | 100000-1000000 | Slower learning curves |
| Atari | 500000-1000000 | Need significant training for signal |
Search Budget Sizing
Rule of thumb: budget at least 3-4 trials per search dimension.
| max_trial | Max Dimensions | Use Case |
|---|---|---|
| 8 | 2-3 | Focused refinement |
| 12-16 | 3-4 | Typical search |
| 20 | 5 | Wide exploration |
| 30 | 6-7 | Broad ASHA search |
Common mistake: searching too many dimensions at once wastes trials on under-sampled combinations. Search 2-3 parameters at a time, then fix the best values and search the others.
Three-Stage Search Process
For robust hyperparameter tuning:
| Stage | Mode | Config | Purpose |
|---|---|---|---|
| 1. ASHA | search | max_session=1, search_scheduler | Wide exploration |
| 2. Multi | search | max_session=4, no scheduler | Validate top configs |
| 3. Final | train | Best hyperparameters | Confirmation run |
- ASHA stage: quick exploration across many configurations
- Multi stage: run the top 3-5 configs with multiple seeds (no early stopping)
- Final stage: update spec defaults with the best hyperparameters
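One way to set up stage 2 is to narrow the search block to the top values from stage 1 using choice, raise max_session to 4, and drop the scheduler so no trial is stopped early. A sketch, assuming the same key names as above (the lambda values are placeholders):

```json
"meta": {
  "max_session": 4,
  "max_trial": 3
},
"search": {
  "agent.algorithm.lam__choice": [0.7, 0.8, 0.9]
}
```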
For benchmarks: Do not use search results directly. Always run a final validation with train mode and the committed spec file.
Example: PPO Lambda Search on Breakout
Breakout is a classic Atari benchmark: break bricks by bouncing a ball with a paddle.
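A sketch of what the spec additions for this search might look like; the scheduler nesting and the lambda range are assumptions, and the grace_period follows the Atari guidance above:

```json
"meta": {
  "max_session": 1,
  "max_trial": 16,
  "search_scheduler": {
    "grace_period": 500000,
    "reduction_factor": 3
  },
  "search_resources": {
    "gpu": 0.125
  }
},
"search": {
  "agent.algorithm.lam__uniform": [0.5, 0.99]
}
```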
Running the Search
Ray Tune queues trials and runs them as resources free up. With gpu: 0.125, 8 trials run in parallel on a single GPU.
Analyzing Results
Results are saved to data/ppo_breakout_{timestamp}/:
| File | Contents |
|---|---|
| `info/experiment_df.csv` | All trial results, sorted best-first |
| `t{N}/` subdirectories | Per-trial session data and models |
Reading experiment_df.csv
After Search
1. Check experiment_df.csv for top-performing configurations
2. Narrow the search range around the best values (if needed)
3. Run validation with max_session=4 (no ASHA)
4. Update spec defaults with the final values
Results
After a search run, SLM Lab generates experiment-level graphs that help you analyze which hyperparameters work best.
Multi-Trial Graph
The multi-trial graph overlays all trials, showing how different hyperparameter configurations compare:
[Figure: multi-trial graph]
Each color represents a different trial (hyperparameter configuration). This quickly shows which settings learn fastest and achieve the highest scores.
Experiment Variable Graph
The experiment variable graph plots final performance against hyperparameter values:
[Figure: experiment variable graph]
- X-axis: hyperparameter value (e.g., lambda)
- Y-axis: performance metric (strength)
- Color: overall trial quality (darker = better)
This reveals the relationship between hyperparameters and performance, which is useful for narrowing search ranges.
Best Configuration
From this lambda search, PPO achieves 327 MA on Breakout-v5 with λ=0.70.
For full benchmarking methodology and results across 54 Atari games, see Atari Benchmark.
Algorithm-Specific Search Recommendations
Different algorithms have different sensitive hyperparameters:
| Algorithm | High-Impact Parameters | Typical Search |
|---|---|---|
| DQN/DDQN | `lr`, `gamma`, `explore_var_spec.end_step` | Learning rate and exploration schedule |
| A2C | `lr`, `gamma`, `lam`, `entropy_coef_spec.start_val` | GAE parameters and entropy |
| PPO | `lr`, `gamma`, `lam`, `clip_eps_spec.start_val` | GAE parameters and clipping |
| SAC | `lr`, `gamma`, `alpha` (entropy) | Learning rate and entropy coefficient |
Example Search Blocks by Algorithm
PPO (policy gradient):
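A sketch covering PPO's high-impact parameters (ranges are illustrative):

```json
"search": {
  "agent.net.optim_spec.lr__loguniform": [1e-5, 1e-3],
  "agent.algorithm.lam__uniform": [0.7, 0.99],
  "agent.algorithm.clip_eps_spec.start_val__uniform": [0.1, 0.3]
}
```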
DQN (value-based):
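A sketch for DQN/DDQN, covering the learning rate and exploration schedule (the end_step values are placeholders; scale them to your max_frame):

```json
"search": {
  "agent.net.optim_spec.lr__loguniform": [1e-5, 1e-3],
  "agent.algorithm.gamma__uniform": [0.9, 0.999],
  "agent.algorithm.explore_var_spec.end_step__choice": [10000, 50000, 100000]
}
```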
SAC (off-policy):
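A sketch for SAC; the spec path for the entropy coefficient (alpha) varies, so only the learning rate and gamma are shown:

```json
"search": {
  "agent.net.optim_spec.lr__loguniform": [1e-5, 1e-3],
  "agent.algorithm.gamma__uniform": [0.9, 0.999]
}
```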
Finding Search Examples
Benchmark specs often include search blocks from tuning. Check existing specs:
Search blocks don't affect train mode. You can leave a search block in a spec; it is only used when running slm-lab run ... search, and train mode ignores it.
```bash
# Find specs with search blocks
grep -r '"search"' slm_lab/spec/benchmark/

# See a complete search example
cat slm_lab/spec/benchmark/ppo/ppo_cartpole.json

# Run ASHA search
slm-lab run spec.json spec_name search

# Check results
cat data/spec_name_*/info/experiment_df.csv

# Validate best config (update spec first, then)
slm-lab run spec.json spec_name train
```