HW1

Command

$ python3 main.py [testcaseNum] [vi/pi]

Here testcaseNum selects the test case, and vi or pi selects value iteration or policy iteration.
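As a rough illustration of how a script could accept these two arguments, here is a minimal sketch using Python's standard argparse module. It is not the repository's main.py: the function name `parse_args`, the attribute name `mode`, and the assumption that testcaseNum is an integer are all hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse `[testcaseNum] [vi/pi]` (hypothetical layout, not the repo's code)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Assumption: test cases are identified by an integer such as 1..4.
    parser.add_argument("testcaseNum", type=int, help="which test case to load")
    # vi = value iteration, pi = policy iteration.
    parser.add_argument("mode", choices=["vi", "pi"], help="algorithm to run")
    return parser.parse_args(argv)


args = parse_args(["1", "vi"])
print(args.testcaseNum, args.mode)
```

Restricting the second argument with `choices` makes argparse reject anything other than vi or pi before any computation starts.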

Policy Iteration

1. Initialize the policy to a random policy and load the initial value function.
2. Evaluate the policy to derive its value function.
3. Improve the policy by acting greedily with respect to that value function.
4. If the new value function matches the old one to three decimal places, terminate. Otherwise, repeat steps 2 and 3.
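The steps above can be sketched as follows. This is a minimal illustration, not the repository's code: the three-state MDP `P`, the discount `GAMMA = 0.9`, the zero initial values, and every function name are assumptions made for the example. One further assumption is that the convergence check compares the value functions of two consecutive policies, so the first round, which has no previous policy value to compare against, never terminates the loop.

```python
import random

GAMMA = 0.9   # assumed discount factor
TOL = 1e-3    # "same to three decimal places"

# Hypothetical MDP: P[state][action] -> list of (probability, next_state, reward).
# Action 0 moves left, action 1 moves right; state 2 is absorbing.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, policy, V, gamma=GAMMA, tol=TOL):
    """Step 2: iterative policy evaluation with synchronous sweeps."""
    while True:
        new_V = {s: q_value(P, V, s, policy[s], gamma) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            return V


def improve_policy(P, V, gamma=GAMMA):
    """Step 3: pick the greedy action in every state."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


def policy_iteration(P, gamma=GAMMA, tol=TOL, seed=0):
    rng = random.Random(seed)
    policy = {s: rng.choice(list(P[s])) for s in P}   # step 1: random policy
    V = {s: 0.0 for s in P}                           # step 1: initial value
    prev_V = None
    steps = 0
    while True:
        steps += 1
        V = evaluate_policy(P, policy, V, gamma, tol)
        policy = improve_policy(P, V, gamma)
        # Step 4: stop once two consecutive policies have the same value.
        if prev_V is not None and max(abs(V[s] - prev_V[s]) for s in P) < tol:
            return policy, V, steps
        prev_V = V


policy, V, steps = policy_iteration(P)
print(policy, V, steps)
```

On this toy chain the greedy policy moves right in states 0 and 1, whatever random policy the run starts from.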

Value Iteration

1. Load the initial value function.
2. Update the value function using the value iteration equation from the lecture.
3. If the new value function matches the old one to three decimal places, terminate. Otherwise, repeat step 2.
4. Update the policy once by choosing greedily.
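A minimal sketch of these steps, assuming the lecture's equation is the standard Bellman optimality backup, V(s) = max over a of the sum of p * (r + gamma * V(s')). The three-state MDP `P`, the discount `GAMMA = 0.9`, the zero initial values, and the function names are hypothetical and are not taken from the repository.

```python
GAMMA = 0.9   # assumed discount factor
TOL = 1e-3    # "same to three decimal places"

# Hypothetical MDP: P[state][action] -> list of (probability, next_state, reward).
# Action 0 moves left, action 1 moves right; state 2 is absorbing.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=GAMMA, tol=TOL):
    V = {s: 0.0 for s in P}                  # step 1: initial value
    steps = 0
    while True:
        steps += 1
        # Step 2: synchronous sweep; every backup reads the old V only.
        new_V = {s: max(q_value(P, V, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:                      # step 3: converged
            break
    # Step 4: extract the greedy policy once, after convergence.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return policy, V, steps


policy, V, steps = value_iteration(P)
print(policy, V, steps)
```

Unlike policy iteration, the policy is computed only once at the end; the loop itself works purely on the value function.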

Testing

I used Python's unittest library to test the correctness of each function. This way, we avoid printing out the value or policy function and checking it manually. The code below is one of the function tests.
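The author's own test is not reproduced in this copy. As a stand-in, here is a hedged sketch of what a unittest case for a single helper might look like. The helper `bellman_backup` and the test class are hypothetical names invented for this example, and `assertAlmostEqual` is used because floating-point arithmetic rarely gives bit-exact results.

```python
import unittest


def bellman_backup(transitions, V, gamma):
    """Expected return of one action: sum of p * (r + gamma * V[next_state])."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions)


class TestBellmanBackup(unittest.TestCase):
    def test_deterministic_transition(self):
        V = [0.0, 2.0]
        # 1.0 * (1.0 + 0.9 * 2.0) = 2.8
        self.assertAlmostEqual(bellman_backup([(1.0, 1, 1.0)], V, 0.9), 2.8)

    def test_stochastic_transition(self):
        V = [1.0, 3.0]
        # 0.5 * (0.9 * 1.0) + 0.5 * (0.9 * 3.0) = 1.8
        transitions = [(0.5, 0, 0.0), (0.5, 1, 0.0)]
        self.assertAlmostEqual(bellman_backup(transitions, V, 0.9), 1.8)
```

Saved to a file, a test case like this can be run with `python3 -m unittest <file>`.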

Test Result

case1

takes 5 steps of policy iteration to converge
takes 3 steps of value iteration to converge

case2

takes 8 steps of policy iteration to converge
takes 6 steps of value iteration to converge

case3

takes 8 steps of policy iteration to converge
takes 6 steps of value iteration to converge

case4

takes 6 steps of policy iteration to converge
takes 4 steps of value iteration to converge

Problem and Discussion

Value iteration seems to be faster than policy iteration: it converged in fewer steps in all four test cases (for example, 3 versus 5 in case1).
At first I forgot to update the values synchronously, which led to strange results. To fix this, I create a new value array and store the newly updated values in it, then copy it over the old value array once every state has been processed.
At first I compared float values with "assertEqual", which failed every time I ran the tests. I later figured out that floating-point numbers should be compared with "assertAlmostEqual", which by default checks equality only to seven decimal places.
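To illustrate the synchronous-update problem described above, here is a small sketch with a toy backup rule. The rule, the constant `GAMMA = 0.5`, and the function names are assumptions made for the example, not the repository's code. In the in-place sweep, state 2 reads the value of state 1 that was written earlier in the same sweep, so one sweep gives a different result from the synchronous version.

```python
GAMMA = 0.5  # assumed discount, chosen so the numbers are exact


def backup(V, s):
    """Toy rule: state 0 is fixed; every other state looks at its left neighbour."""
    return V[s] if s == 0 else GAMMA * V[s - 1]


def sweep_in_place(V):
    """Overwrites V while sweeping, so later states see already-updated values."""
    for s in range(len(V)):
        V[s] = backup(V, s)
    return V


def sweep_synchronous(V):
    """Writes into a separate array and copies it back after the full sweep."""
    new_V = [0.0] * len(V)
    for s in range(len(V)):
        new_V[s] = backup(V, s)   # reads only old values
    V[:] = new_V                  # replace the old array in one go
    return V


print(sweep_in_place([1.0, 0.0, 0.0]))     # state 2 already sees the new V[1]
print(sweep_synchronous([1.0, 0.0, 0.0]))  # state 2 still sees the old V[1]
```

In-place sweeps are a legitimate variant of dynamic programming in general, but their intermediate values differ from the synchronous ones, which matters when tests or expected outputs assume synchronous updates.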

bobcheng15/RL_DP
