find the optimal value for each grid cell
Here we store a value for each state-action pair.
This value measures how good an action is: what we get by taking the action plus how much we can get from the state we land in.
In this case, instead of taking the max value over all actions, we take the value of the action chosen by the defined policy.
So the only difference in the equation is the absence of max().
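
To make the difference concrete, here is a minimal sketch of policy evaluation on a small grid. It is not the code in policy_evaluation.py. The 4x4 grid, the -1 reward per move, the terminal cell at the top-left, the discount of 1.0, the "up, then left" policy, and the names `step`, `q_value`, and `policy_evaluation` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical setup: 4x4 grid, every move costs -1, top-left cell is terminal.
N = 4
GAMMA = 1.0
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
TERMINAL = (0, 0)


def step(state, action):
    """Deterministic transition: move one cell, stay in place at a wall."""
    if state == TERMINAL:
        return state, 0.0  # no further reward once the episode has ended
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), N - 1)
    col = min(max(state[1] + dc, 0), N - 1)
    return (row, col), -1.0


def q_value(V, state, action, gamma=GAMMA):
    """Value of a state-action pair: immediate reward plus value of the landing state."""
    next_state, reward = step(state, action)
    return reward + gamma * V[next_state]


def policy_evaluation(policy, gamma=GAMMA, tol=1e-6, max_iters=10_000):
    """Sweep the grid until the values under the fixed policy stop changing."""
    V = np.zeros((N, N))
    for _ in range(max_iters):
        new_V = np.zeros_like(V)
        for row in range(N):
            for col in range(N):
                s = (row, col)
                # No max() here: we use the action the policy prescribes.
                new_V[s] = q_value(V, s, policy[s], gamma)
        if np.max(np.abs(new_V - V)) < tol:
            return new_V
        V = new_V
    return V


# Assumed fixed policy: move up until the top row, then move left.
policy = {(r, c): ("up" if r > 0 else "left") for r in range(N) for c in range(N)}
V = policy_evaluation(policy)
print(V)
```

Under these assumptions each cell's value is minus the number of moves the policy needs to reach the terminal cell. Replacing `q_value(V, s, policy[s], gamma)` with `max(q_value(V, s, a, gamma) for a in ACTIONS)` would turn the same loop into the max-based update described earlier.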
python policy_evaluation.py