GitHub - amathsow/TAMER-Summer-2018-Lab: Summer Research

Week 3

Berkeley Grid World
- Setup TAMER on berkely gridworld
- Make it able to save Tamer (Q table, log) on every trial
Experiment Creater and Resumer
- Modify the original experiment launcher and resumer
- Redirect all the output (print) to log files and make a copy of related files

Reading Note
- RL Reading Note - Chapter 4 DP: https://guansuns.github.io/blog/09/RL-Note-Chapter-4-DP/
Berkeley Grid World
- Add Evaluation metric: number of episode/step to reach optimal policy
- Add instant feedback: pause and wait until human feedback
- Random starting initial starting position
- Softmax action selection for exploration with overflow protection

Reading Note
- RL Reading Note - Chapter 12 Eligibility Traces: https://guansuns.github.io/blog/08/RL-Note-Chapter-12-Eligibility-Traces/
Berkeley Grid World
- Make environment deterministic
- Value Iteration Experiment: use value iteration to find the optimal (converged) values (Q-Values) in the Gridworld
- QValue Saver and Loader: save the computed q-values to json files, as well as read q-values from json files
- Record and display policy agreement ratio
- Add temperature control into Softmax action selection
- Visualize last state and action
- Statistics Module: visualize and compare experiment results

Reading Note
- RL Reading Note - Covergent Actor-Critic by Humans (COACH): https://guansuns.github.io/blog/10/RL-Reading-Note-Covergent-Actor-Critic-by-Humans-COACH/
- RL Reading Note - Deep COACH: https://guansuns.github.io/blog/11/RL-Reading-Note-Deep-COACH/
Berkeley Grid World
- Use distinct color to display 'STAY' move
- Add flags to control how many details printed on the screen as log during the experiment
- Experiment on TAMER with epsilon-greedy
- Evaluate TAMER's performance in noise environment

Reading Note
- Double DQN: https://guansuns.github.io/blog/21/RL-Reading-Note-Double-DQN/
- Deep RL from Human Preferences: https://guansuns.github.io/blog/22/Deep-RL-from-Human-Preferences/
Berkeley Grid World
- Use VDBE as epsilon annealing policy
- Use both global-wise epsilon annealing and episode-wise epsilon annealing

Reading Note
- Preference-Based RL Relevant Paper: https://guansuns.github.io/blog/23/Relevant-Papers/
Berkeley Grid World
- Record and visualize VDBE values

Reading Note
- Relevant Papers: learning through human feedback: https://guansuns.github.io/blog/02/Papers-RL-with-Human-Feedback/
Berkeley Grid World
- Add supports for preferences-based agents

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
RL Introduction		RL Introduction
Reading Notes		Reading Notes
Week1/ALE Examples		Week1/ALE Examples
Week10/BerkeleyGridWorld		Week10/BerkeleyGridWorld
Week11/FirstOpenAIGym		Week11/FirstOpenAIGym
Week12/Experiment		Week12/Experiment
Week2/ALE Tamer		Week2/ALE Tamer
Week3		Week3
Week4/BerkeleyGridWorld		Week4/BerkeleyGridWorld
Week5		Week5
Week6		Week6
Week7		Week7
Week8		Week8
Week9/BerkeleyGridWorld		Week9/BerkeleyGridWorld
experiment-logs		experiment-logs
Commands.md		Commands.md
README.md		README.md