Thesis duration:
- start date: 05.02.2018, end date: 04.08.2018
- DP (Dynamic Programming)
- TD (Temporal Difference Learning)
- MC (Monte Carlo)
- PG (Policy Gradient)
- EM (Expectation Maximization)
- REPS (Relative Entropy Policy Search)
- fREPS (f-divergence Relative Entropy Policy Search)
- TRPO (Trust Region Policy Optimization)
- PPO (Proximal Policy Optimization)
Code:
Lecture: