GitHub - northtiger/Deep-Reinforcement-Learning-Udacity: Projects and algorithms in the framework of Deep Reinforcement Learning

Deep Reinforcement Learning Nanodegree Udacity

Monte-Carlo Methods
In Monte Carlo (MC) methods, we play an episode of the game until it ends, collect the rewards received along the way,
and then move backward to the start of the episode to compute the return from each state. We repeat this for a sufficient
number of episodes and average the returns observed for each state to estimate its value.
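The backward pass and averaging described above can be sketched in a few lines. This is a minimal illustration rather than the repository's code: the function name `mc_state_values`, the episode format (a list of `(state, reward)` pairs) and the choice of every-visit averaging are assumptions made for the example.

```python
from collections import defaultdict

def mc_state_values(episodes, gamma=1.0):
    """Every-visit Monte Carlo estimate of state values (illustrative sketch).

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received after leaving that state. This format is an assumption.
    """
    returns_sum = defaultdict(float)
    visit_count = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk backward from the end of the episode, accumulating the return.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            returns_sum[state] += g
            visit_count[state] += 1
    # Average the returns observed for each state.
    return {s: returns_sum[s] / visit_count[s] for s in returns_sum}

# Two hand-written toy episodes (hypothetical data).
episodes = [
    [("A", 1.0), ("B", 2.0)],
    [("A", 0.0), ("B", 0.0)],
]
values = mc_state_values(episodes)
```

With these two toy episodes, state `A` sees returns 3 and 0, and state `B` sees returns 2 and 0, so the averages are 1.5 and 1.0.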
Temporal Difference Methods and Q-learning
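As a rough sketch of the Q-learning named in this heading, the tabular update moves an action value toward the one-step temporal-difference target. The function name `q_learning_update`, the dictionary-based table and the state and action labels are hypothetical choices for illustration, not taken from the repository.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the TD target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)          # all Q-values start at 0.0
actions = ["left", "right"]     # hypothetical action set
first = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
second = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
```

Unlike the Monte Carlo approach, the update uses a single transition and does not wait for the episode to end.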
Reinforcement Learning in Continuous Space (Deep Q-Network)
Function Approximation and Neural Network
The Universal Approximation Theorem (UAT) states that feed-forward neural networks containing a
single hidden layer with a finite number of nodes can approximate any continuous function on a compact domain
to arbitrary accuracy, provided rather mild assumptions about the form of the activation function are satisfied.
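To make the statement concrete, the sketch below builds a single-hidden-layer ReLU network by hand (no training) that interpolates a continuous function at evenly spaced points. It only illustrates the idea and is not a proof; the helper names `build_relu_network` and `predict`, the target function `sin` and the choice of 50 hidden units are assumptions for the example.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_network(f, lo, hi, hidden_units):
    """Return (bias, weights, knots) for g(x) = bias + sum_i w_i * relu(x - knot_i).

    Each hidden unit is relu(x - knot_i); the output weights are chosen so
    that g is the piecewise-linear interpolant of f on [lo, hi].
    """
    h = (hi - lo) / hidden_units
    knots = [lo + i * h for i in range(hidden_units)]
    # Slope of f over each segment [knot_i, knot_i + h].
    slopes = [(f(k + h) - f(k)) / h for k in knots]
    # Each unit contributes the change in slope at its knot.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1]
                             for i in range(1, hidden_units)]
    return f(lo), weights, knots

def predict(network, x):
    bias, weights, knots = network
    return bias + sum(w * relu(x - k) for w, k in zip(weights, knots))

network = build_relu_network(math.sin, 0.0, math.pi, hidden_units=50)
grid = [i * math.pi / 1000 for i in range(1001)]
max_error = max(abs(predict(network, x) - math.sin(x)) for x in grid)
```

Increasing `hidden_units` shrinks the worst-case error, which is the behaviour the theorem describes for a fixed continuous target on a bounded interval.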
Policy-Based Methods, Hill Climbing, Simulated Annealing
Random-restart hill climbing is a surprisingly effective algorithm in many cases. Simulated annealing is a probabilistic technique that occasionally accepts worse solutions, which makes it less likely to mistake a local extremum for the global extremum.
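The effect of restarts can be seen on a toy one-dimensional objective with two peaks. This is a sketch under stated assumptions: the objective, the step-halving climber and the names `hill_climb` and `random_restart_hill_climb` are invented for illustration and do not come from the repository's CartPole code.

```python
import random

def objective(x):
    # Toy function with a lower peak near x = -0.96 and a higher one near x = 1.04.
    return -(x * x - 1.0) ** 2 + 0.3 * x

def hill_climb(f, x, lo=-2.0, hi=2.0, step=0.1, min_step=1e-6):
    """Greedy local search: move to the better neighbor, else halve the step."""
    while step > min_step:
        neighbors = [max(lo, x - step), min(hi, x + step)]
        best = max(neighbors, key=f)
        if f(best) > f(x):
            x = best
        else:
            step /= 2.0
    return x

def random_restart_hill_climb(f, restarts=20, lo=-2.0, hi=2.0, seed=0):
    """Run hill climbing from several random starts and keep the best result."""
    rng = random.Random(seed)
    starts = [rng.uniform(lo, hi) for _ in range(restarts)]
    return max((hill_climb(f, s, lo, hi) for s in starts), key=f)

stuck_x = hill_climb(objective, -1.5)            # single run from the left side
best_x = random_restart_hill_climb(objective)    # best of 20 random starts
```

A single climb started at -1.5 settles on the lower left peak, while the restarted search also reaches the higher right peak and returns it.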
Policy-Gradient Methods, REINFORCE, PPO
Define a performance measure J(\theta) to maximize. Learn the policy parameter \theta through approximate gradient ascent on J.
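A minimal sketch of this idea is REINFORCE on a two-armed bandit, where J(\theta) is the expected reward of a softmax policy and each step follows the sampled gradient estimate, reward times the gradient of log \pi(a). The bandit setting, the function name `reinforce_bandit` and the hyperparameters are assumptions chosen to keep the example small.

```python
import math
import random

def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(rewards, steps=2000, alpha=0.1, seed=0):
    """Gradient ascent on expected reward using the REINFORCE estimator."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)      # one preference per action
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        for a in range(len(theta)):
            # d/d theta_a of log pi(action) for a softmax policy.
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += alpha * reward * grad_log_pi
    return softmax(theta)

# Hypothetical bandit: arm 0 always pays 0, arm 1 always pays 1.
final_probs = reinforce_bandit(rewards=[0.0, 1.0])
```

After training, the policy puts most of its probability on the arm that pays more, since only pulls of that arm produce a non-zero update.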
Actor-Critic Methods, A3C, A2C, DDPG, SAC
The key difference between A3C and A2C is the asynchronous part. A3C consists of multiple independent agents (networks) with
their own weights, each interacting with its own copy of the environment in parallel. Together, they can explore
a larger part of the state-action space in much less time.
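The asynchronous pattern alone can be sketched with threads. This toy is not an actor-critic: each worker stands in for an agent, draws noisy samples from its own random generator (standing in for its own environment copy), and applies gradients to a shared parameter. It assumes the common setup in which workers copy the shared weights before each step; the name `run_async_workers` and all numbers are hypothetical.

```python
import random
import threading

def run_async_workers(num_workers=4, steps_per_worker=200, lr=0.05, target=3.0):
    """Workers asynchronously minimize (theta - sample)^2 on a shared theta."""
    shared = {"theta": 0.0}
    lock = threading.Lock()

    def worker(worker_id):
        rng = random.Random(worker_id)       # stand-in for a private environment copy
        for _ in range(steps_per_worker):
            local_theta = shared["theta"]    # copy the shared weights locally
            sample = rng.gauss(target, 0.1)  # noisy observation from "the environment"
            grad = 2.0 * (local_theta - sample)
            with lock:
                # Apply the (possibly slightly stale) gradient to the shared weights.
                shared["theta"] -= lr * grad

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["theta"]

theta = run_async_workers()
```

No worker waits for the others before updating, which is the asynchronous part; a synchronous variant in the spirit of A2C would collect every worker's gradient and apply one combined update per step.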

Repository contents
BipedalWalker-A2C-VectorizedEnv
BipedalWalker-PPO-VectorizedEnv
BipedalWalker-Soft-Actor-Critic
BipedalWalker-TwinDelayed-DDPG (TD3)
CarRacing-From-Pixels-PPO
CartPole-Policy-Based-Hill-Climbing
CartPole-Policy-Gradient-Reinforce
Cartpole-Deep-Q-Learning
Cartpole-Double-Deep-Q-Learning
LunarLander-v2-DQN
Markov-Decision-Process_6x6
Pong-Policy-Gradient-PPO
Pong-Policy-Gradient-REINFORCE
Project-1_Navigation-DQN
Project-2_Continuous-Control-Crawler-PPO
Project-2_Continuous-Control-Reacher-DDPG
Project-3_Collaboration_Competition-Tennis-Maddpg
Udacity Certificate
README.md
policy-gradient-methods-2.jpg