Python PSRONashResponse.update_rolling_winrates usage examples

Programming language: Python

Namespace/Package: regym.training_schemes

Class/Type: PSRONashResponse

Method/Function: update_rolling_winrates

Examples on hotexamples.com: 2

Python PSRONashResponse.update_rolling_winrates: 2 examples found. These are Python code examples for regym.training_schemes.PSRONashResponse.update_rolling_winrates, taken from open source projects.

Frequently used methods

PSRONashResponse(11)

menagerie(2)

update_rolling_winrates(2)

fill_meta_game_missing_entries(1)

has_policy_converged(1)

match_outcome_rolling_window(1)

meta_game(1)

update_meta_game(1)

Example #1

File: test_psro.py Project: Mark-F10/Regym

import numpy as np
from regym.training_schemes import PSRONashResponse


def test_can_keep_track_of_window_of_winrate_for_learning_policy(RPS_task):
    # RPS_task is a pytest fixture supplied by the project's test suite.
    psro = PSRONashResponse(task=RPS_task, match_outcome_rolling_window_size=3)
    training_agent_indices = [1, 1, 0, 1]
    # Player at index 1 always wins, so the outcomes are 1, 1, 0, 1.
    # A window of size 3 keeps only the last three of them.
    expected_rolling_window = [1, 0, 1]

    # TODO this is very ugly. It always chooses player 2 (1-index) as winner
    # We should really find a way of mocking this.
    sample_trajectory = [([], [], [0, 1], [])]  # (s, a, r, s')
    for i in training_agent_indices:
        psro.update_rolling_winrates(episode_trajectory=sample_trajectory,
                                     training_agent_index=i)

    np.testing.assert_array_equal(expected_rolling_window,
                                  psro.match_outcome_rolling_window)

Example #2

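import numpy as np
from regym.training_schemes import PSRONashResponse
# Assumed import paths for Trajectory and EnvType; they may differ
# between Regym versions.
from regym.environments import EnvType
from regym.rl_loops.trajectory import Trajectory


def test_can_keep_track_of_window_of_winrate_for_learning_policy(RPS_task):
    # RPS_task is a pytest fixture supplied by the project's test suite.
    psro = PSRONashResponse(task=RPS_task, match_outcome_rolling_window_size=3)
    training_agent_indices = [1, 1, 0, 1]
    # Player at index 1 always wins, so the outcomes are 1, 1, 0, 1.
    # A window of size 3 keeps only the last three of them.
    expected_rolling_window = [1, 0, 1]

    # TODO this is very ugly. It always chooses player 2 (1-index) as winner
    # We should really find a way of mocking this.
    sample_trajectory = Trajectory(
        env_type=EnvType.MULTIAGENT_SIMULTANEOUS_ACTION, num_agents=2)
    # Single terminal timestep; the arguments appear to be (s, a, r, s', done).
    sample_trajectory.add_timestep(None, None, [0, 1], None, True)

    for i in training_agent_indices:
        psro.update_rolling_winrates(episode_trajectory=sample_trajectory,
                                     training_agent_index=i)

    np.testing.assert_array_equal(expected_rolling_window,
                                  psro.match_outcome_rolling_window)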
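
Both tests check the same behaviour: the window keeps only the most recent match outcomes, storing 1 when the training agent won and 0 otherwise. The sketch below reproduces that behaviour in isolation, without Regym. It is not the library's implementation. RollingWinrateTracker, update and winrate are hypothetical names, and it assumes that the winner is the player with the highest reward in the final timestep.

```python
from collections import deque
from typing import Deque, Sequence


class RollingWinrateTracker:
    """Hypothetical stand-in for the rolling window; not part of Regym."""

    def __init__(self, window_size: int) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # A deque with maxlen discards the oldest entry once it is full.
        self.window: Deque[int] = deque(maxlen=window_size)

    def update(self, final_rewards: Sequence[float],
               training_agent_index: int) -> None:
        # Assumption: the winner is the player with the highest final reward.
        winner_index = max(range(len(final_rewards)),
                           key=lambda i: final_rewards[i])
        self.window.append(int(winner_index == training_agent_index))

    def winrate(self) -> float:
        # Fraction of wins among the outcomes currently in the window.
        return sum(self.window) / len(self.window) if self.window else 0.0


tracker = RollingWinrateTracker(window_size=3)
for agent_index in [1, 1, 0, 1]:
    # Player at index 1 always wins, as in the tests above.
    tracker.update(final_rewards=[0, 1], training_agent_index=agent_index)

print(list(tracker.window))  # [1, 0, 1]
```

The fourth update pushes the first outcome out of the window, which is why the expected result has three entries rather than four.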