Distributional Policy Optimization for Multi-Goal Reinforcement Learning

In this repository, we implemented our proposed distributional policy optimization model for continuous control tasks. The environments used for evaluation are listed in the Experimental Settings section.

Our distributional policy optimization model consists of an actor network and a distributional value network. It keeps the actor-critic structure but optimizes it with a different algorithm: instead of estimating only the expected return, the critic estimates the value distribution, that is, the distribution of the sum of discounted rewards, for robotic control problems. A trajectory is taken into account when constructing the value distribution used for policy optimization. We also compare the results with other algorithms such as proximal policy optimization (PPO) and actor-critic with experience replay (ACER).
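
To make the idea of a value distribution concrete, the following minimal sketch computes the discounted return at each step of a trajectory and projects a set of sampled returns onto a fixed support of atoms. The categorical support, the atom range, and the function names are assumptions made for illustration; this README does not state how the repository parameterizes the distribution.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one trajectory."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def categorical_distribution(samples, v_min, v_max, n_atoms):
    """Spread sampled returns over n_atoms evenly spaced atoms (hypothetical support)."""
    delta = (v_max - v_min) / (n_atoms - 1)
    probs = [0.0] * n_atoms
    for g in samples:
        g = min(max(g, v_min), v_max)  # clip to the support
        pos = (g - v_min) / delta
        lower = int(pos)
        upper = min(lower + 1, n_atoms - 1)
        frac = pos - lower
        # Split the sample's mass between its two neighbouring atoms.
        probs[lower] += (1.0 - frac) / len(samples)
        probs[upper] += frac / len(samples)
    return probs


if __name__ == "__main__":
    returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
    print(returns)
    print(categorical_distribution(returns, v_min=0.0, v_max=2.0, n_atoms=5))
```

A scalar critic would keep only the mean of these samples, whereas a distributional critic keeps the whole probability vector.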

Preliminary

python 3
- numpy
- scipy
  - to store .mat file
- pytorch
  - to train model
- OpenAI baselines
  - to run environments in parallel
- OpenAI gym
  - environment
- mujoco
  - prerequisite of gym for continuous control
- sacred
  - hyperparameter management
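
Before training, it can help to confirm that the packages above are importable. The following is a small sketch using only the standard library; the import names (for example `torch` for pytorch and `mujoco_py` for mujoco) are the usual ones for these packages and are assumed here, not taken from the repository.

```python
import importlib.util

# Import names assumed for the packages listed above.
REQUIRED_MODULES = ["numpy", "scipy", "torch", "baselines", "gym", "mujoco_py", "sacred"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```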

How to train the model

Simply train the model by typing the following command

$ python train.py

in this directory. If you want to change the settings for the model, you can directly edit the configuration file

./common/config.py

or you can use one of the experimental settings in the config directory, for example

$ python train.py with ./config/InvertedPendulum-v2/DPPO.json

We recommend writing your own configuration file to train the model if you want to change the settings.

$ python train.py with /path/to/your/config/file.json
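
A configuration file of this kind is plain JSON, so it can be generated from a short script. The sketch below is only an illustration: the keys `env_name` and `seed` are hypothetical placeholders and should be replaced with names that actually exist in ./common/config.py.

```python
import json
from pathlib import Path

# Hypothetical keys: replace them with names defined in ./common/config.py.
overrides = {
    "env_name": "InvertedPendulum-v2",
    "seed": 0,
}


def write_config(path, settings):
    """Write settings as JSON so the file can be passed after `with`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2, sort_keys=True))
    return path


if __name__ == "__main__":
    print(write_config("my_config.json", overrides))
```

The resulting file can then be passed to train.py in place of /path/to/your/config/file.json.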

You can print the settings with the following command.

$ python train.py print_config with /path/to/your/config/file.json

How to test the model

Simply test the model by typing the command

$ python test.py with /path/to/your/config/file.json

in this directory. It will report the statistics of the return over 100 epochs.
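
The reported statistics match the columns of the result tables (min, max, median, mean, std). As a rough sketch of how such a summary could be computed from a list of returns, assuming a population standard deviation (the README does not say which estimator test.py uses) and a hypothetical function name:

```python
import statistics


def summarize_returns(returns):
    """Summarize returns with the statistics shown in the result tables."""
    return {
        "min": min(returns),
        "max": max(returns),
        "median": statistics.median(returns),
        "mean": statistics.mean(returns),
        "std": statistics.pstdev(returns),  # population std is an assumption
    }


if __name__ == "__main__":
    print(summarize_returns([1.0, 2.0, 3.0, 4.0]))
```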

How to get record from the model

Simply record the model by typing the command

$ python record.py with /path/to/your/config/file.json

in this directory. Then use MATLAB to get the result.

Experimental Settings

Framework
- python 3.5.2
- pytorch 1.0.0
- numpy 1.15.4
- scipy 1.2.0
- sacred 0.7.4
- baselines git@f7fcc80
- gym git@f6ff898
- mujoco-py git@f7b4191
Hardware
- CPU: Intel Core i7-7700 @3.60 GHz
- RAM: DDR3 2400MHz 64GB
- GPU: Geforce GTX 1080Ti, Geforce GTX 980
- CUDA Version: 9.0.176
- CuDNN Version: 7.1.2
Environment
- InvertedPendulum-v2
- Reacher-v2

Experimental Result

On-policy comparison in inverted pendulum

The reward curve is smoothed by averaging the returns of the most recent 500 episodes, and the shaded area shows one standard deviation. The table below gives the final performance statistics of the return over 100 epochs.

Algorithm	min	max	median	mean±std
PPO	377.00	1000.00	1000.00	989.98±72.23
Distri. PPO	1000.00	1000.00	1000.00	1000.00±0.00
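
The smoothing used for the reward curves can be sketched as a running mean and standard deviation over the most recent episodes. This is an illustrative version, not the repository's plotting script; the function name and the use of a population standard deviation are assumptions.

```python
from collections import deque
import math


def smooth_returns(returns, window=500):
    """Running mean and standard deviation over the most recent `window` returns."""
    recent = deque(maxlen=window)  # old returns drop out automatically
    means, stds = [], []
    for g in returns:
        recent.append(g)
        mean = sum(recent) / len(recent)
        var = sum((x - mean) ** 2 for x in recent) / len(recent)
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds


if __name__ == "__main__":
    means, stds = smooth_returns([1.0, 3.0, 5.0], window=2)
    print(means, stds)
```

Plotting `means` as the curve and the band from `mean - std` to `mean + std` gives a shaded area of one standard deviation.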

Off-policy comparison in reacher

The reward curve is smoothed by averaging the returns of the most recent 500 episodes, and the shaded area shows one standard deviation. The table below gives the final performance statistics of the return over 100 epochs.

Algorithm	min	max	median	mean±std
ACER	-21.01	-2.02	-10.33	-11.02±4.67
Distri. ACER	-16.28	-3.47	-9.67	-9.79±3.10

NCTUMLlab/Yong-Guan-Liao-Distributional-Policy-Optimization-for-Multi-Goal-Reinforcement-Learning
