## Higher order benchmarking

The above solves the `CartPole-v0` env. However, we want to benchmark more envs, and a single solve isn't representative, especially given the randomness inherent in many of these environments. A more informative measure is to repeat the benchmarking above many times and build a distribution of solve times.

This is easily done with the `Evolve` class and the functions in `Benchmark.py`. Briefly, an `Evolve` object creates an `Agent` object for a given env, and `Evolve.evolve()` runs RWG to try to solve that env, returning the solve time (in number of generations).
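
For reference, using `Evolve` directly might look like the following minimal sketch. This assumes the class lives in a module named `Evolve`, takes the env name in its constructor, and that `evolve()` accepts a generation cap; these are assumptions based on the description above, not confirmed signatures.

```
import path_utils
import Evolve

# Hypothetical usage sketch: the constructor argument and the N_gen keyword are assumptions.
e = Evolve.Evolve('CartPole-v0')   # creates the Agent for this env internally
solve_gen = e.evolve(N_gen=1000)   # RWG until solved (or the generation cap); returns the solve time
print(solve_gen)
```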

`Benchmark.benchmark_envs()` takes a list of envs. For each env, it builds a distribution of solve times by repeatedly (a specified number of times) creating a new `Evolve` object and solving the env.

Here's a simple usage, from `scripts/benchmark_example.py`:

```
import path_utils
import Benchmark

Benchmark.benchmark_envs(['CartPole-v0'], N_dist=100, N_gen=1000)
```

This will benchmark only `CartPole-v0`. It creates a distribution of solve times from `N_dist` independent runs on that env. Each run is allowed at most `N_gen` generations; if it doesn't solve within that limit, its solve time is recorded as `N_gen` itself, which underestimates these outliers.
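
As a toy illustration of that recording rule (this is not the repo's actual code; the `raw_results` values are made up), unsolved runs simply enter the distribution at the cap:

```
N_gen = 1000
# toy results: the generation each run solved at, or None if it never solved within N_gen
raw_results = [12, 87, None, 340, None]
# unsolved runs are recorded as the cap itself, which clips the tail of the distribution
solve_gens = [g if g is not None else N_gen for g in raw_results]
print(solve_gens)  # [12, 87, 1000, 340, 1000]
```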

This produces:

<p align="center">
  <img width="640" height="480" src="misc/CartPole-v0_solve_gen_dist.png">
</p>

Something curious: even though the distribution has a well-defined, Gamma-like (?) shape, there are always some runs at the maximum `N_gen` (meaning they didn't solve). That's odd, since every iteration of `evolve()` is independent. However, since we only test for `mean_score > best_score`, a run can hit a "lucky" set of weights that scores well over its 3-episode trial but can't actually solve the env; later weight sets that score lower over 3 episodes, but *would* solve it, then never get tested (sketched below). This needs to be looked at more.
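
Here is a minimal toy sketch of that acceptance logic (not the repo's actual code; the scoring and solve checks are stand-ins), just to show how a lucky non-solving weight set can shadow later solving ones:

```
import numpy as np

rng = np.random.default_rng(0)

def three_episode_mean(weights):
    # stand-in for running 3 noisy episodes with a given weight set
    return float(weights.sum() + rng.normal(scale=50.0))

def solves_env(weights):
    # stand-in for the actual solve criterion
    return weights.sum() > 100.0

best_score = -np.inf
for gen in range(1000):
    weights = rng.normal(size=8) * 20.0   # RWG: fresh random weights each generation
    mean_score = three_episode_mean(weights)
    if mean_score > best_score:           # only sets that beat the running best...
        best_score = mean_score
        if solves_env(weights):           # ...ever reach the solve check
            print(f'solved at generation {gen}')
            break
```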

In addition, it creates a timestamped directory in the `outputs` directory for the benchmarking run. Within that, it creates:

* For each env benchmarked, a directory with the FF plot for each run

Here's a fuller example in the same style. It benchmarks several envs with a linear (no hidden layer) `FFNN_multilayer` network, and, after the `exit()`, contains a `Benchmark.benchmark_vary_params()` call that sweeps over combinations of network hyperparameters:

```
'''
For testing various benchmarking examples.
'''

import path_utils
import Benchmark
import os

# Benchmark a set of envs with a linear (0-hidden-layer) feedforward network.
Benchmark.benchmark_envs(
    ['CartPole-v0', 'MountainCar-v0', 'MountainCarContinuous-v0', 'Pendulum-v0'],
    N_gen=2000,
    N_dist=25,
    NN='FFNN_multilayer',
    N_hidden_layers=0,
    N_hidden_units=0,
    act_fn='linear')

exit()  # stop here; remove this line to also run the parameter sweep below

# Benchmark MountainCar-v0 while varying the network architecture and activation.
Benchmark.benchmark_vary_params(
    {
        'env_name': 'MountainCar-v0',
        'NN': 'FFNN_multilayer'
    },
    {
        'N_hidden_units': [2, 4],
        'N_hidden_layers': [0, 1],
        'act_fn': ['tanh', 'relu']
    },
    N_gen=5)
```