A pytorch implementation of the advantage actor-critic (A2C) algorithm (Mnih et al. 2016). Used as a demo for NEU330 Computational Modeling of Psychological Function, Spring 2019.

qihongl/demo-advantage-actor-critic

demo-A2C

A demo of the discrete action space advantage actor critic (A2C) (Mnih et al. 2016).
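For readers new to A2C, here is a rough, dependency-free sketch of the two loss terms. It is not the code in this repo (which uses pytorch); the function names `discounted_returns` and `a2c_losses` are made up for illustration, and it assumes full-episode Monte Carlo returns with no entropy bonus, which is a simplification.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute R_t = r_t + gamma * R_{t+1} for one finished episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a2c_losses(log_probs, values, rewards, gamma=0.99):
    """Return (policy_loss, value_loss) for one episode (illustrative only).

    log_probs[t]: log pi(a_t | s_t) of the action that was taken
    values[t]:    the critic's estimate V(s_t)
    rewards[t]:   the reward received after taking a_t
    """
    returns = discounted_returns(rewards, gamma)
    policy_loss, value_loss = 0.0, 0.0
    for log_p, v, ret in zip(log_probs, values, returns):
        # Advantage: how much better the outcome was than the critic expected.
        advantage = ret - v
        # Actor term: raise the log-probability of actions with positive advantage.
        # In an autograd framework the advantage would be treated as a constant here.
        policy_loss += -log_p * advantage
        # Critic term: squared error between predicted value and observed return.
        value_loss += (ret - v) ** 2
    return policy_loss, value_loss
```

For instance, `a2c_losses([-1.0], [0.5], [1.0])` has an advantage of 0.5, so the policy loss is 0.5 and the value loss is 0.25.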

The animation below shows the learned behavior on CartPole-v0. The goal is to keep the pole upright. For comparison, here's a random policy.

Here's the learning curve:

How to use:

The dependencies are pytorch, gym, numpy, matplotlib, and seaborn. The latest versions should work.

For training (the default environment is CartPole-v0):

python train.py

For rendering the learned behavior:

python render.py

The agent should run on any environment with a discrete action space. To run the agent on some other environment, run python train.py -env ENVIRONMENT_NAME.
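As a rough idea of how a flag like `-env` can be wired up, here is a minimal argparse sketch. This is an assumption about the general pattern, not a copy of train.py; the `parse_args` helper and the help text are hypothetical, and only the `-env` flag and the CartPole-v0 default come from this README.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(description="Train an A2C agent (illustrative)")
    # The default mirrors the README: CartPole-v0 unless another name is given.
    parser.add_argument("-env", default="CartPole-v0",
                        help="name of a gym environment with a discrete action space")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("training on", args.env)
```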

For example, the same architecture can also solve Acrobot-v1:

... and LunarLander-v2:

Directory structure:

.
├── LICENSE
├── README.md
├── figs                            # figs           
├── log                             # pre-trained weights 
├── requirements.txt
└── src
    ├── models
    │   ├── _A2C_continuous.py      # gaussian A2C
    │   ├── _A2C_discrete.py        # multinomial A2C
    │   ├── _A2C_helper.py          # some helper funcs 
    │   ├── __init__.py
    │   └── utils.py                
    ├── render.py                   # render the trained policy 
    ├── train.py                    # train a model 
    └── utils.py
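The "multinomial" and "gaussian" labels above refer to how the policy picks an action. Here is a small standard-library sketch of the difference. It is only an illustration and not the contents of `_A2C_discrete.py` or `_A2C_continuous.py`; the function names are hypothetical, and it assumes the network outputs one logit per action in the discrete case and a mean and standard deviation in the continuous case.

```python
import math
import random


def softmax(logits):
    """Turn unnormalized scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_discrete_action(logits, rng=random):
    """Multinomial policy: sample an action index, return it with its log-probability."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])


def sample_continuous_action(mean, std, rng=random):
    """Gaussian policy: sample a real-valued action, return it with its log-density."""
    action = rng.gauss(mean, std)
    log_prob = (-0.5 * ((action - mean) / std) ** 2
                - math.log(std) - 0.5 * math.log(2 * math.pi))
    return action, log_prob
```

In both cases the returned log-probability is the quantity that gets weighted by the advantage in the actor's loss.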

Reference:

[1] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., … Kavukcuoglu, K. (2016). Asynchronous Methods for Deep Reinforcement Learning. Retrieved from http://arxiv.org/abs/1602.01783

[2] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. Retrieved from http://arxiv.org/abs/1606.01540

[3] pytorch/examples/reinforcement_learning/actor_critic

[4] Slides from Deep Reinforcement Learning, CS294-112 at UC Berkeley
