# -

# Loads all the logs from the above time range
bulk = slio.load_a_list_of_logs(logs)

# ## Parse logs and visualize
#
# You will notice that reward graphs are missing here, as are many others from the training notebook. These have been trimmed down for clarity.
#
# Do not be misled though - this notebook provides features that the training one doesn't have, such as batch visualisation of race submission laps.
#
# Side note: Evaluation/race logs contain a reward field, but it is not connected to your reward function. It is most likely there to keep the log structure consistent and easier to parse. The value appears to depend on the distance of the car from the centre of the track. As such it provides no useful information and is not visualised in this notebook.

# +
simulation_agg = au.simulation_agg(bulk,
                                   'stream',
                                   add_timestamp=True,
                                   is_eval=True)
complete_ones = simulation_agg[simulation_agg['progress'] == 100]

# This gives a deprecation warning about the ptp method. The library code looks as if np.ptp is already used, so I do not know how to fix it from within this notebook.
au.scatter_aggregates(simulation_agg, is_eval=True)
if complete_ones.shape[0] > 0:
    au.scatter_aggregates(complete_ones, "Complete ones", is_eval=True)
# -

# ## Data in tables

# View fifteen most progressed attempts
simulation_agg.nlargest(15, 'progress')

# View fifteen fastest complete laps
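# (Presumed counterpart to the cell above; it assumes the complete_ones frame
# defined earlier and that lap time is stored in the 'time' column.)
complete_ones.nsmallest(15, 'time')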
# #### Total reward per episode
#
# This graph has been taken from the original notebook and can show progress on certain groups of behaviours. It usually forms something like a triangle; sometimes you can see a clear line of progress where a new behaviour is first learned and then perfected.
#
# #### Mean completed lap times per iteration
#
# Once we have a model that completes laps reasonably often, we might want to know how fast the car gets around the track. This graph will show you that. I use it quite often when looking for a model to shave off a couple more milliseconds. That said, it has to be read together with the next one:
#
# #### Completion rate per iteration
#
# It represents what fraction of all the episodes in an iteration are full laps. The value is in the range [0, 1] and is obtained by dividing the number of full laps in an iteration by the number of all episodes in that iteration. I say it has to be read together with the previous one because you not only need a fast lapper, you also want a race completer.
#
# The higher the value, the more stable the model is on a given track. (A minimal manual sketch of these per-iteration numbers follows the training progress cell below.)

# +
simulation_agg = au.simulation_agg(df)

au.analyze_training_progress(simulation_agg, title='Training progress')
# -
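
# For the curious, below is a minimal manual sketch of the per-iteration numbers behind
# the graphs above (mean reward, completion rate and mean complete lap time). It assumes
# `simulation_agg` has `iteration`, `progress`, `time` and `reward` columns, which is what
# `au.simulation_agg` typically produces; adjust the names if your columns differ.

# +
per_iteration = simulation_agg.groupby('iteration').agg(
    mean_reward=('reward', 'mean'),
    # Fraction of episodes in the iteration that reached 100% progress
    completion_rate=('progress', lambda p: (p == 100).mean()),
)

# Mean lap time computed only over complete laps so incomplete episodes don't skew it
complete_laps = simulation_agg[simulation_agg['progress'] == 100]
per_iteration['mean_complete_lap_time'] = complete_laps.groupby('iteration')['time'].mean()

per_iteration.tail(10)
# -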
# ### Stats for all laps
#
# The previous graphs were mainly focused on the state of training with regard to training progress. This, however, will not tell you much about how well your reward function is doing overall.
#
# In such a case `scatter_aggregates` may come in handy. It provides three types of graphs:
# * progress/steps/reward depending on the time of an episode - of these I find reward/time and new_reward/time especially useful for checking that I am rewarding good behaviours - I expect the reward-to-time scatter to look roughly triangular
# * histograms of time and progress - for all episodes the progress one is usually quite handy for getting an idea of the model's stability
# * progress/time_if_complete/reward against the closest waypoint at start - these are really useful during training as they show potentially problematic spots on the track. It can turn out that the car gets the best reward (and performance) when starting at a point that simply cannot be reached if it starts elsewhere, or that there is a section of the track the car struggles to get past, perhaps because of an aggressive action space or undesirable behaviour just before that place
#
# Side note: `time_if_complete` is not very accurate and will almost always look better for episodes closer to 100% progress than for those at 50% and below.
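#
# As a rough illustration (not necessarily the library's exact implementation),
# `time_if_complete` can be thought of as a linear extrapolation of episode time
# to a full lap, which is why low-progress episodes produce optimistic estimates:

# Hypothetical sketch of such an extrapolation; the column produced by
# au.simulation_agg may differ in detail. Episodes with zero progress are skipped.
with_progress = simulation_agg[simulation_agg['progress'] > 0]
estimated_full_lap_time = with_progress['time'] / with_progress['progress'] * 100
estimated_full_lap_time.describe()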

au.scatter_aggregates(simulation_agg, 'Stats for all laps')
EPISODES_PER_ITERATION = 20  # Set to the value of your hyperparameter in training

data = slio.load_data(fname)
df = slio.convert_to_pandas(data,
                            episodes_per_iteration=EPISODES_PER_ITERATION)

df = df.sort_values(['episode', 'steps'])
# Personally I think normalizing can mask excessively high rewards, so I am leaving it commented out,
# but you might want it.
# slio.normalize_rewards(df)

# Uncomment the line of code below to evaluate a different reward function
#nr.new_reward(df, l_center_line, 'reward.reward_sample') #, verbose=True)

# + jupyter={"source_hidden": true}
simulation_agg = au.simulation_agg(df)
au.analyze_training_progress(simulation_agg, title='Training progress')

# + jupyter={"source_hidden": true}
au.scatter_aggregates(simulation_agg, 'Stats for all laps')

# + jupyter={"source_hidden": true}
complete_ones = simulation_agg[simulation_agg['progress'] == 100]

if complete_ones.shape[0] > 0:
    au.scatter_aggregates(complete_ones, 'Stats for complete laps')
else:
    print('No complete laps yet.')
# -

# View the five best rewarded complete laps (according to new_reward if you are using it)