from tf_agents.environments import tf_py_environment
from tf_agents.policies import tf_policy
from tf_agents.replay_buffers.replay_buffer import ReplayBuffer
from tf_agents.trajectories import trajectory

def collect_steps(env: tf_py_environment.TFPyEnvironment,
                  policy: tf_policy.TFPolicy, buffer: ReplayBuffer):
    # Step the environment once with the given policy.
    time_step = env.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = env.step(action_step.action)
    # Package the transition as a Trajectory and store it in the buffer.
    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    buffer.add_batch(traj)
Example #2
from tf_agents.environments import tf_py_environment
from tf_agents.trajectories import trajectory

def collect_step(env: tf_py_environment.TFPyEnvironment, policy, buffer):
    time_step = env.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = env.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)

    # Add trajectory to the replay buffer
    buffer.add_batch(traj)
Example #3
import typing

from tf_agents.environments.tf_py_environment import TFPyEnvironment
from tf_agents.policies import tf_policy
from tf_agents.replay_buffers.replay_buffer import ReplayBuffer
from tf_agents.trajectories import trajectory

def step(
    environment: TFPyEnvironment, policy: tf_policy.TFPolicy, replay_buffer: ReplayBuffer
) -> typing.Tuple[float, bool]:
    time_step = environment.current_time_step()
    action_step = policy.action(time_step)
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(time_step, action_step, next_time_step)
    replay_buffer.add_batch(traj)
    # Unbatch the reward and episode-end flag so plain Python values are returned.
    return next_time_step.reward.numpy()[0], bool(next_time_step.is_last().numpy()[0])
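
All three collectors follow the same TF-Agents pattern: read the current batched time step, query the policy for an action, step the environment, and store the resulting transition as a Trajectory. The sketch below shows how such a collector is typically wired into a setup before training; the CartPole environment, DQN agent, and hyperparameters are illustrative assumptions, not taken from the examples above.

import tensorflow as tf

from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.replay_buffers import tf_uniform_replay_buffer

# Wrap a Gym environment so time steps and actions are batched tensors.
env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

q_net = q_network.QNetwork(env.observation_spec(), env.action_spec())
agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
agent.initialize()

# The buffer's data spec must match the trajectories the agent emits.
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=env.batch_size,
    max_length=10000)

# Gather initial experience with the agent's collect policy,
# using the collect_step helper from Example #2 above.
for _ in range(100):
    collect_step(env, agent.collect_policy, replay_buffer)

Using agent.collect_data_spec for the buffer is what makes buffer.add_batch(traj) inside the collectors type-check against the trajectories produced by trajectory.from_transition.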