GPyOpt-ml-agents

Gaussian process optimization using GPyOpt for Unity ML-Agents Toolkit.

If you are fed up with tuning your ml-agents training parameters by hand, this repository is for you!

Requirements

Follow the ml-agents installation documentation to install it as a Python module.

If you plan to use a GPU, you need to modify the ml-agents code to prevent the TensorFlow session from allocating all of the GPU memory: set gpu_options.allow_growth = True in the session options when creating the session.
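
For reference, a minimal sketch of that change, assuming ml-agents creates its session through the TensorFlow 1.x API:

import tensorflow as tf

# Let the session grow GPU memory on demand instead of
# reserving all of it up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)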

Then install the repository dependencies:

pip install -r requirements.txt

Usage

Since Unity ML-Agents uses gRPC, make sure you do not have a proxy set.

The parameters in trainer_config.yaml are used as defaults for any parameters you do not specify. Make sure max_steps is greater than both summary_freq and the number of steps required to finish an episode; otherwise the run may end before a usable reward summary is written.

The code uses a slightly modified ml-agents learn.py which accepts an additional parameter to specify the trainer config file.

Gaussian process optimization

Modify the config file hyperopt_conf.py to fit your environment and run

python hyperopt.py <env>

The optimization algorithm explores the parameter space and tries to maximize the final reward at max_steps. The reward is read from the TensorBoard summary event file.
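
As an illustration of that last step, here is a minimal sketch of reading the final reward with TensorBoard's EventAccumulator; the helper and the scalar tag name are assumptions, not the repository's actual code:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

def final_mean_reward(summary_dir):
    # Load the scalar events written during training.
    acc = EventAccumulator(summary_dir)
    acc.Reload()
    # ml-agents logs the cumulative reward as a scalar; the exact
    # tag name used here is an assumption.
    rewards = acc.Scalars('Environment/Cumulative Reward')
    return rewards[-1].value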

The batch_size variable specifies the number of instances launched in parallel. The total number of training runs will be batch_size * max_iter + batch_size, since the algorithm runs an initial batch to collect points before starting the optimization.
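
As a rough sketch of what this corresponds to in GPyOpt (the train_and_score objective is hypothetical; the repository's actual wiring lives in hyperopt.py):

import GPyOpt

# Hypothetical objective: launch one ml-agents training run per row
# of candidate parameters and return the final rewards.
def train_and_score(params):
    ...

opt = GPyOpt.methods.BayesianOptimization(
    f=train_and_score,
    domain=definition,  # the parameter space list, see the Example section
    batch_size=8,       # parallel instances per iteration
    num_cores=8,
    evaluator_type='local_penalization',  # assumption: batch acquisition strategy
    maximize=True,
)
opt.run_optimization(max_iter=16)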

For more details, see the GPyOpt documentation.

Grid search

If your training is too long to run multiple optimization iterations, you can always use the grid search instead.

Specify your hyperparameter space dict as params_grid in grid_search_conf.py and run

python grid_search.py <env>

The grid search currently runs all the combinations of the parameter space in parallel without taking the number of CPU cores into account, so be careful not to specify too large a search space.
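
To get a feel for how fast the space grows, here is a sketch of how such a grid expands into combinations (the params_grid keys are illustrative and may not match what grid_search_conf.py expects):

from itertools import product

params_grid = {
    'learning_rate': [1e-4, 1e-3],
    'num_layers': [1, 2, 3],
    'hidden_units': [64, 128],
}

# Every combination is launched in parallel: 2 * 3 * 2 = 12 runs here.
combinations = [dict(zip(params_grid, values))
                for values in product(*params_grid.values())]
print(len(combinations))  # 12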

You can stop training whenever you want with a simple CTRL+C, which generates the graph for each instance. It is also possible to reload a saved model and continue training using the --load option.

Example

ml-agents 3DBall environment

We search for the hyperparameters that maximize the reward at 10k steps.

Config and parameter space:

definition = [{
  'name': 'learning_rate',
  'type': 'continuous',
  'domain': [1e-5, 1e-3]
}, {
  'name': 'epsilon',
  'type': 'continuous',
  'domain': [0.1, 0.3]
}, {
  'name': 'gamma',
  'type': 'continuous',
  'domain': [0.8, 0.995]
}, {
  'name': 'lambd',
  'type': 'continuous',
  'domain': [0.9, 0.95]
}, {
  'name': 'num_epoch',
  'type': 'discrete',
  'domain': [3, 10]
}, {
  'name': 'beta',
  'type': 'continuous',
  'domain': [1e-4, 1e-2]
}, {
  'name': 'num_layers',
  'type': 'discrete',
  'domain': [1, 3]
}, {
  'name': 'hidden_units',
  'type': 'discrete',
  'domain': [32, 64, 128, 256, 512]
}]

batch_size = 8
num_cores = 32
max_iter = 16
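
With these settings, the search performs batch_size * max_iter + batch_size = 8 * 16 + 8 = 136 training runs in total.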

Optimal parameters found, achieving a mean reward of 96.8:

learning_rate: 1.0e-03
epsilon: 0.3
gamma: 0.995
lambd: 0.95
num_epoch: 3
beta: 1.0e-02
num_layers: 3
hidden_units: 256

