This is a variation of the Value Iteration Network (NIPS 2016) [arxiv].
The main idea, building upon the original VIN, is to feed a generated step-wise reward map into the value-iteration loop, so that the network learns to plan in a dynamic scene. This work can be combined with video-prediction techniques and is still in progress. Currently, it is trained on the ground-truth state from the simulator.
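To make the planning step concrete, below is a minimal sketch of a VI module that consumes one reward map per iteration instead of a single static one. The names, shapes, and kernel size (`vi_module`, `reward_maps`, the 3x3 convolution) are illustrative assumptions, not the repo's actual API:

```python
import tensorflow as tf

def vi_module(reward_maps, num_actions):
    """Value iteration with a step-wise reward map (illustrative sketch).

    reward_maps: list of K tensors, each [batch, H, W, 1] -- one generated
    reward map per planning step, rather than one static map as in VIN.
    """
    q_kernel = tf.get_variable(
        'q_kernel', [3, 3, 2, num_actions],
        initializer=tf.truncated_normal_initializer(stddev=0.01))
    v = tf.zeros_like(reward_maps[0])                # V_0 = 0
    for r in reward_maps:
        rv = tf.concat([r, v], axis=3)               # stack reward and value channels
        q = tf.nn.conv2d(rv, q_kernel, [1, 1, 1, 1], 'SAME')  # Q(s, a) via convolution
        v = tf.reduce_max(q, axis=3, keep_dims=True)          # V(s) = max_a Q(s, a)
    return v                                         # final value map fed to the policy
```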
We use A3C + curriculum learning as the RL training scheme, similar to [Wu et al., ICLR 2017]. Because of the way Pygame rendering works, instead of multiple threads training at the same time, we use multiple processes to generate experience from the simulator for the agents to learn from.
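The pattern looks roughly like the toy sketch below, in which worker processes push rollouts into a queue that the learner consumes. `experience_worker` and its fake rollout payload are placeholders for the repo's simulator code:

```python
import multiprocessing as mp
import random

def experience_worker(worker_id, queue):
    # Stand-in for a simulator process: the real worker steps the Pygame
    # environment and pushes whole trajectories instead of random numbers.
    rng = random.Random(worker_id)
    while True:
        queue.put((worker_id, [rng.random() for _ in range(5)]))

if __name__ == '__main__':
    queue = mp.Queue(maxsize=32)
    workers = [mp.Process(target=experience_worker, args=(i, queue))
               for i in range(4)]
    for w in workers:
        w.daemon = True   # workers die with the learner process
        w.start()
    for _ in range(10):   # learner side: consume experience, apply A3C updates
        worker_id, rollout = queue.get()
        print('rollout from worker', worker_id)
```

Each process owns its own simulator (and Pygame context), so rendering is never shared across threads.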
- `a3c.py` defines the policy/value network with a shared structure (A3C) embedded with a VI module; a sketch of the structure is given after this list.
- `agent.py` implements a single agent and its interaction with the environment during the reinforcement-learning stage, including synchronization with the global model and the training methods.
- `thread.py` contains the high-level distributed training with `tf.train.ClusterSpec`, and the curriculum settings.
- `constants.py` defines all the hyper-parameters.
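As a rough illustration of that shared structure (the layer sizes and names below are assumptions, not copied from `a3c.py`), the VI module's output value map feeds a common trunk with separate policy and value heads:

```python
import tensorflow as tf

def policy_value_heads(value_map, num_actions):
    # `value_map` is the [batch, H, W, 1] output of the VI module.
    flat = tf.contrib.layers.flatten(value_map)
    hidden = tf.layers.dense(flat, 256, activation=tf.nn.relu)  # shared trunk
    logits = tf.layers.dense(hidden, num_actions)               # policy head
    policy = tf.nn.softmax(logits)
    value = tf.layers.dense(hidden, 1)                          # value head
    return policy, value
```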
- Start training: `bash train_scipt.sh`
- Open tmux for monitoring: `tmux a -t a3c` (you can monitor each thread by switching the tmux control pane: `ctrl + b, w`)
- Open TensorBoard: `**.**.**.**:15000`
- Check the log: `less Curriculum log`
- Stop training: `ctrl + c`
- TensorFlow 1.1
- Pygame
- NumPy
I completed this code while I was an intern at Horizon Robotics. Many thanks to my mentor Penghong Lin, and to Lisen Mu, for helpful discussions.