Teaching several MuJoCo models to imitate expert behavior with neural networks, behavioral cloning and the DAGGER algorithm. Assignment 1 for UC Berkeley's deep RL course

tunamonster/expert_imitation

Results

In each graph, the green line marks the expert benchmark, the blue dots mark the learner's average performance at each iteration, and the red lines mark the standard deviation at each iteration.

Behavioral Cloning

Run ./make_clone_results.bash to recreate the graphs.
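For readers new to the idea, behavioral cloning is plain supervised learning on expert (observation, action) pairs. The sketch below is not the code in this repository: it assumes a linear policy fit by least squares as a stand-in for the neural network, and the data shapes, `fit_linear_policy`, and `act` are hypothetical names chosen for illustration.

```python
import numpy as np

def fit_linear_policy(observations, expert_actions):
    """Fit actions ~= [obs, 1] @ W by least squares.

    Assumption: a linear model stands in for the neural network policy.
    """
    X = np.hstack([observations, np.ones((len(observations), 1))])
    W, *_ = np.linalg.lstsq(X, expert_actions, rcond=None)
    return W

def act(W, observation):
    """Predict an action for a single observation."""
    return np.append(observation, 1.0) @ W

# Hypothetical expert data: 11-dimensional observations, 3-dimensional actions.
rng = np.random.default_rng(0)
true_W = rng.normal(size=(12, 3))          # made-up "expert" mapping
obs = rng.normal(size=(500, 11))
acts = np.hstack([obs, np.ones((500, 1))]) @ true_W

W = fit_linear_policy(obs, acts)
bc_error = float(np.max(np.abs(act(W, obs[0]) - acts[0])))
```

Because the toy expert is itself linear, the fit recovers it almost exactly. A real MuJoCo expert would need a nonlinear model, and the clone only ever sees states the expert visited.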

Dagger

DAgger samples a wider distribution of states so that the learner also sees how to react when its observations drift away from the expert's trajectories. Instead of learning only from expert demonstrations, the learner model acts in the environment: at each visited state the expert's action is recorded, but the learner's action is the one performed. The learner is then trained on the recorded expert actions in batches. In theory, the learner model should converge to the expert model.

Run ./make_dagger_results.bash to recreate the graphs.
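The loop described in this section can be sketched on a toy problem. This is a minimal illustration, not the repository's implementation: the 1-D dynamics in `step`, the linear `expert_action`, and the single-weight learner are all assumptions made so the example runs with NumPy alone.

```python
import numpy as np

def expert_action(state):
    # Hypothetical expert: push the state toward zero.
    return -0.5 * state

def step(state, action, rng):
    # Hypothetical 1-D dynamics with a little noise.
    return state + action + 0.1 * rng.normal()

def rollout(weight, rng, horizon=20):
    """Run the learner policy (action = weight * state); return visited states."""
    state = rng.normal()
    states = []
    for _ in range(horizon):
        states.append(state)
        state = step(state, weight * state, rng)  # the learner's action is executed
    return np.array(states)

def dagger(iterations=5, seed=0):
    rng = np.random.default_rng(seed)
    weight = 0.0                      # untrained learner
    all_states, all_labels = [], []
    for _ in range(iterations):
        states = rollout(weight, rng)             # states the learner visits
        all_states.append(states)
        all_labels.append(expert_action(states))  # expert labels for those states
        s = np.concatenate(all_states)            # aggregate every batch so far
        a = np.concatenate(all_labels)
        weight = float(s @ a / (s @ s))           # least-squares refit
    return weight

learned_weight = dagger()
```

In this toy setting the expert lies inside the learner's model class, so the refit matches it after the first batch. With a neural network and a MuJoCo environment, several iterations are usually needed before the aggregated data covers the states the learner actually reaches.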
