Example #1
        terminal = False

        state = game.initialise_state()
        action = epsilon_greedy(state)

        E_matrix = np.zeros_like(theta)

        while not terminal:
            # take action a, observe r, s'
            next_state, reward = game.step(state, action)
            # choose a' from s' using policy from Q

            terminal = next_state.terminal

            if not terminal:
                next_action = epsilon_greedy(next_state)
                delta = reward + Q(next_state, next_action) - Q(state, action)
            else:
                delta = reward - Q(state, action)

            # decay all traces by lambda, then accumulate the features of (s, a)
            E_matrix = lmd * E_matrix + psi(state, action)

            # move the weights along the traces in the direction of the TD error
            theta += alpha * delta * E_matrix

            if not terminal:
                state = next_state
                action = next_action

    game.visualise(V(generate_Q()))
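Example #1 updates a weight vector theta along eligibility traces, relying on a feature map psi, a linear Q, and an epsilon-greedy policy that are not shown here. Below is a minimal sketch of how those helpers could fit together, assuming two discrete actions, a fixed-length binary feature encoding, and a constant exploration rate; ACTIONS, N_FEATURES, the stand-in encoding, and the 0.05 epsilon are illustrative assumptions rather than the original definitions.

import numpy as np

# Illustrative placeholders; Example #1 only requires that theta and
# psi(state, action) share the same shape.
ACTIONS = [0, 1]
N_FEATURES = 36
theta = np.zeros(N_FEATURES)

def psi(state, action):
    """Binary feature vector for (state, action); the encoding is a stand-in."""
    features = np.zeros(N_FEATURES)
    features[hash((state, action)) % N_FEATURES] = 1
    return features

def Q(state, action):
    """Linear action-value estimate: features dotted with the weight vector."""
    return psi(state, action).dot(theta)

def epsilon_greedy(state, epsilon=0.05):
    """Behaviour policy: random action with probability epsilon, else greedy."""
    if np.random.rand() < epsilon:
        return np.random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q(state, a))

With helpers shaped like these, E_matrix and theta share the feature dimension, so the trace update and the weight update in the loop stay element-wise.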
Example #2
        terminal = False

        E_matrix = np.zeros_like(Q_matrix)

        state = game.initialise_state()
        action = epsilon_greedy(allQ(state), allN(state))

        while not terminal:
            next_state, reward = game.step(state, action)

            terminal = next_state.terminal

            if not terminal:
                next_action = epsilon_greedy(allQ(next_state), allN(next_state))
                delta = reward + Q(next_state, next_action) - Q(state, action)
            else:
                delta = reward - Q(state, action)

            # accumulate the eligibility trace and the visit count for (s, a)
            allE(state)[int(action)] += 1
            allN(state)[int(action)] += 1

            # step size decays with the number of visits to (s, a)
            alpha = 1 / N(state, action)

            # move every entry toward the TD target in proportion to its trace,
            # then decay all traces by lambda
            Q_matrix += alpha * delta * E_matrix
            E_matrix *= lmd

            if not terminal:
                state = next_state
                action = next_action

    game.visualise(V(Q_matrix))
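Example #2 keeps its values, visit counts, and eligibility traces in tables and reads per-state rows through allQ, allN, and allE, none of which are shown. One possible wiring is sketched below, assuming each state exposes a row index and that exploration decays with the visit count; state.index, the table shape, and the N0 schedule are assumptions for illustration only.

import numpy as np

# Illustrative table shapes and exploration constant (assumptions).
N_STATES, N_ACTIONS, N0 = 210, 2, 100
Q_matrix = np.zeros((N_STATES, N_ACTIONS))
N_matrix = np.zeros((N_STATES, N_ACTIONS))
E_matrix = np.zeros_like(Q_matrix)

def allQ(state):
    """Row of action values for this state (a view into Q_matrix)."""
    return Q_matrix[state.index]

def allN(state):
    """Row of visit counts for this state (a view into N_matrix)."""
    return N_matrix[state.index]

def allE(state):
    """Row of eligibility traces for this state (a view into E_matrix)."""
    return E_matrix[state.index]

def Q(state, action):
    """Single table entry for (state, action)."""
    return Q_matrix[state.index, int(action)]

def N(state, action):
    """Visit count for a single (state, action) pair."""
    return N_matrix[state.index, int(action)]

def epsilon_greedy(q_row, n_row):
    """Count-based exploration: epsilon shrinks as the state is visited more."""
    epsilon = N0 / (N0 + n_row.sum())
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_row))
    return int(np.argmax(q_row))

Because the accessors return views into the underlying arrays, the in-place increments in the loop write straight into the shared tables.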
Example #3
        while not terminal:
            # take action a, observe r and s', then sample a' from the softmax policy
            state, reward = game.step(state, action)
            action = softmax_policy(state, theta)

            terminal = state.terminal

            if terminal:
                # episode over: pair up the stored states and actions, record
                # the final transition, and compute the total return Gt
                state_action_pairs = zip(history[0::3], history[1::3])

                history.append(reward)
                history.append(state)

                Gt = sum(history[2::3])

                for s, a in state_action_pairs:
                    increment_n(s, a)
                    # step size decays with the visit count; the return minus the
                    # value estimate acts as the advantage for the policy update
                    alpha = 1 / N(s, a)
                    advantage = Gt - Q(s, a, theta)
                    theta += alpha * score_function(s, a, theta) * advantage

            else:
                history.append(reward)
                history.append(state)
                history.append(action)

        if k % 10000 == 0:
            print("MSE: " +
                  str(round(np.sum((Q_star - generate_Q(theta))**2), 2)))

    game.visualise(V(generate_Q(theta)))
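Example #3 samples actions from softmax_policy and moves theta along score_function, neither of which is shown. The sketch below assumes a softmax over linear preferences phi(s, a) . theta; phi, ACTIONS, and the helper action_probabilities are placeholders rather than the original code.

import numpy as np

# Illustrative placeholders for the policy parameterisation (assumptions).
ACTIONS = [0, 1]
N_FEATURES = 36

def phi(state, action):
    """Feature vector for (state, action); the encoding is a stand-in."""
    features = np.zeros(N_FEATURES)
    features[hash((state, action)) % N_FEATURES] = 1
    return features

def action_probabilities(state, theta):
    """Softmax over the linear preferences phi(s, a) . theta."""
    prefs = np.array([phi(state, a).dot(theta) for a in ACTIONS])
    prefs -= prefs.max()  # numerical stability
    exp_prefs = np.exp(prefs)
    return exp_prefs / exp_prefs.sum()

def softmax_policy(state, theta):
    """Sample an action from the softmax distribution."""
    return np.random.choice(ACTIONS, p=action_probabilities(state, theta))

def score_function(state, action, theta):
    """Gradient of log pi(a|s): chosen features minus their expectation."""
    probs = action_probabilities(state, theta)
    expected = sum(p * phi(state, a) for p, a in zip(probs, ACTIONS))
    return phi(state, action) - expected

For this parameterisation the score function is phi(s, a) minus the policy-weighted average of the feature vectors, which is exactly what the advantage-weighted update in the loop multiplies by.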