
wesenu/Interpolated-Policy-Gradient-with-PPO-for-Robotics-Control-

 
 


  • Some results comparing IPG and PPO on the FetchReach-v0 environment:

A slide with initial results and a comparison of PPO, IPG, and HER+IPG on a multi-goal RL environment (FetchReach-v0 from the Robotics environments of Gym).

(./Results/Apr-12_23:16-Seed1234.png)

  • TODO:

  • Combine with Experience Replay!!

  • Clean up the current neural network settings and

  • Check that the implementation follows the correct formula from the paper

  • Combine with basic IPG

  • More debugging: verify that the algorithm works, since it was written entirely from my understanding of the algorithm diagram in the original paper (it works)

  • More experiments on pushing, sliding, and pick-and-place

  • Generalize to hindsight experience replay and compare with experience replay

  • Change tanh to ReLU

  • Change the mini-batch size to 256 (currently 500)

  • For the pushing and sliding tasks, train for more episodes. In the original paper, they train for 50 epochs (one epoch consists of 19 · 2 · 50 = 1,900 full episodes), which amounts to a total of 4.75 · 10^6 timesteps. Also increase the time steps to 2500 per episode for these tasks. (The time step configuration in Gym should not be changed, but training needs many more episodes; in Multi-goal Reinforcement Learning (reference no. 5), they train for more than 50,000 episodes before seeing a small improvement.)

  • Try removing the coefficient (1/ET) from the on-policy loss

  • Add a stochastic target policy for the IPG critic
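The hindsight experience replay item in the TODO list can be illustrated with a minimal sketch of goal relabelling using the "final" strategy, where every transition in an episode is replayed as if the goal had been the state actually reached at the end. This is not the repository's code: the transition layout, the function names, and the 0.05 success tolerance are all assumptions made for illustration, and the sparse 0 / -1 reward is assumed to match the Fetch-style convention.

```python
import numpy as np

# Assumed success tolerance; not taken from this repository.
DISTANCE_THRESHOLD = 0.05


def sparse_reward(achieved_goal, goal, threshold=DISTANCE_THRESHOLD):
    """Return 0.0 if the achieved goal is within `threshold` of `goal`, else -1.0."""
    distance = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(goal))
    return 0.0 if distance <= threshold else -1.0


def relabel_with_final_goal(episode):
    """Relabel an episode with the goal that was actually reached at its end.

    `episode` is a list of dicts with the (hypothetical) keys 'obs', 'action',
    'achieved_goal' and 'goal', where 'achieved_goal' is the goal reached
    after taking the action. The input episode is left unmodified.
    """
    final_goal = episode[-1]["achieved_goal"]
    relabelled = []
    for step in episode:
        new_step = dict(step)  # shallow copy so the original stays intact
        new_step["goal"] = final_goal
        new_step["reward"] = sparse_reward(step["achieved_goal"], final_goal)
        relabelled.append(new_step)
    return relabelled
```

Both the original and the relabelled transitions would then be stored in the replay buffer, so that even an episode that missed its goal contributes at least one transition with a non-negative reward.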

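The training budget quoted in the TODO list above can be checked with a few lines of arithmetic. The variable names are made up for this sketch; the numbers are the ones quoted from the paper.

```python
# Figures quoted in the TODO item on pushing and sliding tasks.
EPISODES_PER_EPOCH = 19 * 2 * 50   # full episodes in one epoch
EPOCHS = 50
TOTAL_TIMESTEPS = 4_750_000        # 4.75 · 10^6

total_episodes = EPISODES_PER_EPOCH * EPOCHS
steps_per_episode = TOTAL_TIMESTEPS // total_episodes

print(EPISODES_PER_EPOCH)  # 1900
print(total_episodes)      # 95000
print(steps_per_episode)   # 50
```

The quoted totals therefore correspond to 95,000 episodes of 50 timesteps each.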
  • Reference:

 "CODE IS FAR AWAY FROM BUG WITH THE ANIMAL PROTECTING"
 
 *          ##2      ##2
 *-##1   ┏-##1
 *_┛ ┻---━┛_┻━━┓
 *    ┃           ┃     
 *    ┃   ━       ┃    
 *    ┃ @^   @^    ┃   
 *    ┃        ┃
 *    ┃   ┻    ┃
 *_      _*     ┗━┓   ┏━┛
 *      ┃   ┃ Blessed by the mythical beast
 *      ┃   ┃ Never a bug
 *      ┃   ┗━━━┓----|
 *      ┃         ┣┓}}}
 *      ┃         ┏┛
 *      ┗┓&&&-&&&┓┏┛-|
 *       ┃┫┫  ┃┫┫
 *       ┗┻┛  ┗┻┛
 *
 *
 "CODE IS FAR AWAY FROM BUG WITH THE ANIMAL PROTECTING"

Jianing Sun

Last modified: April 23rd, 2018

About

Reinforcement learning for continuous robotic control, based mainly on Proximal Policy Optimization (PPO) and extended with Interpolated Policy Gradient (IPG) and Hindsight Experience Replay (HER)
