Results

In each graph, the green line marks the expert benchmark, the blue dots mark the learner's average performance at each iteration, and the red lines mark the standard deviation at that iteration.

Behavioral Cloning

Run ./make_clone_results.bash to recreate the graphs.

Dagger

DAgger (Dataset Aggregation) samples a larger distribution of states so the learner also learns how to react when its observations deviate from the optimum. Instead of learning only by observing the expert, run the learner model in the environment and perform the learner's actions, but record the action the expert would have taken in each visited state. Then train the learner on the recorded expert actions in batches. In theory, the learner model should converge to the expert model.

Run ./make_dagger_results.bash to recreate the graphs below.
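The loop described above can be sketched on a toy problem. This is a minimal illustration, not the code in dagger.py: the one-dimensional environment, the linear expert in `expert_action`, the least-squares learner in `fit`, and every parameter value are assumptions made up for the example. It also starts from an untrained learner (weight 0.0) rather than a behaviorally cloned one.

```python
import numpy as np


def expert_action(state):
    # Hypothetical expert: pushes the state toward zero.
    return -0.5 * state


def rollout(weight, rng, steps=20):
    """Run the learner policy a = weight * s and return the states it visits."""
    state = rng.uniform(-1.0, 1.0)
    states = []
    for _ in range(steps):
        states.append(state)
        action = weight * state  # the learner's action is the one performed
        state = state + action + rng.normal(scale=0.05)
    return np.array(states)


def fit(states, actions):
    # Least-squares fit of a single weight w so that actions ~= w * states.
    return float(states @ actions / (states @ states))


def dagger(iterations=5, seed=0):
    rng = np.random.default_rng(seed)
    weight = 0.0  # assumed starting point: an untrained learner
    all_states = np.empty(0)
    all_actions = np.empty(0)
    for _ in range(iterations):
        states = rollout(weight, rng)    # states visited by the learner
        labels = expert_action(states)   # expert actions recorded for them
        # Aggregate the new pairs with everything collected so far.
        all_states = np.concatenate([all_states, states])
        all_actions = np.concatenate([all_actions, labels])
        # Retrain the learner on the aggregated expert labels.
        weight = fit(all_states, all_actions)
    return weight
```

Because the toy expert is exactly linear, the learner's weight matches the expert's after the first retraining step. With a neural network policy and a real environment, the aggregated dataset matters more, since each iteration adds states that the expert alone would rarely visit.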

tunamonster/expert_imitation
