    (0, 1): 'R',
    (0, 2): 'R',
    (1, 2): 'R',
    (2, 1): 'R',
    (2, 2): 'R',
    (2, 3): 'U',
}

V = {}
returns = {}  # dictionary of state -> list of returns we've received
states = grid.all_states()
for s in states:
    if s in grid.actions:
        # non-terminal state: we will collect sampled returns here
        returns[s] = []
    else:
        # terminal state: its value is 0 by definition
        V[s] = 0

# first-visit Monte Carlo: play episodes and average the return that
# follows the first visit to each state
for t in range(100):
    states_and_returns = play_game(grid, policy)
    seen_states = set()
    for s, G in states_and_returns:
        if s not in seen_states:
            returns[s].append(G)
            V[s] = np.mean(returns[s])
            seen_states.add(s)

print("values:")
print_values(V, grid)
print("policy:")
print_policy(policy, grid)
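The loop above relies on a helper play_game(grid, policy) that plays one episode and returns a list of (state, return) pairs. If that helper is defined elsewhere, that version is the one that matters; the sketch below only illustrates what it is assumed to do. The grid methods it calls (set_state, current_state, move, game_over), the random start state, and the discount factor GAMMA are assumptions, not part of the code shown here.

import numpy as np

GAMMA = 0.9  # discount factor; the actual value used is an assumption

def play_game(grid, policy):
    # Start each episode from a randomly chosen non-terminal state so that
    # every state is eventually visited under the fixed, deterministic policy.
    start_states = list(grid.actions.keys())
    start_idx = np.random.choice(len(start_states))
    grid.set_state(start_states[start_idx])

    s = grid.current_state()
    states_and_rewards = [(s, 0)]  # no reward for simply starting in s
    while not grid.game_over():
        a = policy[s]
        r = grid.move(a)
        s = grid.current_state()
        states_and_rewards.append((s, r))

    # Work backwards so each state is paired with the return that followed it:
    # G_t = r_{t+1} + GAMMA * G_{t+1}
    G = 0
    first = True
    states_and_returns = []
    for s, r in reversed(states_and_rewards):
        if first:
            first = False  # terminal state: its value is 0, record no return
        else:
            states_and_returns.append((s, G))
        G = r + GAMMA * G
    states_and_returns.reverse()
    return states_and_returns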
for s in states:
    if len(actions[states.index(s)]) != 0:
        # Choose randomly one of the allowed next positions
        random_index = np.random.choice(np.arange(len(actions[states.index(s)])))
        policy_list.append(actions[states.index(s)][random_index])
    else:
        # Terminal or unreachable positions have no further action
        policy_list.append(' ')

# Create a dictionary, keys: position, value: next position
policy = dict(zip(states, policy_list))

print("The initial random policy is:")
print_policy(policy, grid)  # Print the initial policy
print("")

#######################################
### initialize the values V(s) randomly ####
V = {}
for s in states:
    if len(actions[states.index(s)]) != 0:
        # Non-terminal, reachable position: random initial value
        V[s] = np.random.random()
    else:
        # Terminal and unreachable positions have no further action, so value 0
        V[s] = 0

print("The values are initialized randomly, terminal and unreachable positions have value 0:")
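Both loops above index a separate actions list with states.index(s), so they assume states and actions are parallel lists prepared earlier: entry i of actions holds the allowed next positions from the position stored at entry i of states, with an empty list for terminal or unreachable positions, and policy_list starts out empty. A small illustrative setup is sketched below; the positions and moves shown are assumptions, not the actual grid.

# Parallel lists assumed by the loops above; the contents are illustrative only.
states = [(0, 0), (0, 1), (0, 2), (0, 3)]
actions = [
    [(0, 1)],          # from (0, 0) the only allowed next position is (0, 1)
    [(0, 0), (0, 2)],  # from (0, 1) you may step left or right
    [(0, 1), (0, 3)],  # from (0, 2) you may step left or right
    [],                # (0, 3) is terminal: no further action
]
policy_list = []  # initialized empty before the policy loop above fills it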