for i in range(3):
    next_frame, _, _, _ = env.step(0)  # Take an arbitrary action
    frame_list.append(transform(next_frame))
current_state = torch.cat(frame_list, dim=0).to(device)  # Stack the frames into shape (C, H, W); unsqueeze below adds the batch dimension to give (N, C, H, W)

# Obtain action, log probability, and value estimate for the initial state
# Move the outputs to cpu to save memory
action, log_prob, ex_val = actor_critic(current_state.unsqueeze(dim=0))
action = action.squeeze().cpu()
log_prob = log_prob.squeeze().cpu()
ex_val = ex_val.squeeze().cpu()

# Store the first state and value estimate in memory
memory.set_initial_state(current_state.clone().detach().cpu(),
                         initial_ex_val_est=ex_val)

for t in count():
    # Interact with the environment
    next_frame, reward, done, _ = env.step(action.item())
    running_reward += reward

    # Pop the oldest frame, append the new frame, and stack to form the next state
    frame_list.pop(0)
    frame_list.append(transform(next_frame))
    next_state = torch.cat(frame_list, dim=0).to(device)  # Stack the frames

    # Obtain action, log probability, and value estimate for the next state in a single forward pass
    # Move the outputs to cpu to save memory
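# A minimal sketch of what `transform` above is assumed to do: convert a raw
# RGB Atari frame into a single grayscale channel so that the stacked frames
# form a compact (C, H, W) state. The 84x84 size and the exact steps are
# assumptions in the style of the standard DQN preprocessing, not taken from
# this file.
import torchvision.transforms as T

transform = T.Compose([
    T.ToPILImage(),       # raw frame arrives as an (H, W, C) uint8 array
    T.Grayscale(),        # collapse RGB to one channel
    T.Resize((84, 84)),   # downsample to the standard DQN input size
    T.ToTensor(),         # float tensor of shape (1, 84, 84) with values in [0, 1]
])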
# TODO: Change the code below
# Estimate the value of the initial state
ex_val = value_net_ex(
    torch.tensor([current_state], dtype=torch.float32,
                 device=device)).squeeze()  # squeeze the batch dimension
in_val = value_net_in(
    torch.tensor([np.concatenate((current_state, [i_episode]), axis=0)],
                 dtype=torch.float32,
                 device=device)).squeeze()  # provide i_episode as an additional input

# Store the first state and both value estimates in memory
memory.set_initial_state(current_state,
                         initial_ex_val_est=ex_val,
                         initial_in_val_est=in_val)

# Obtain the hash code of the current state
current_state_hash = simhash.hash(current_state)

for t in count():
    # Sample an action given the current state
    action, log_prob = policy_net(
        torch.tensor([current_state], dtype=torch.float32, device=device))
    log_prob = log_prob.squeeze()

    # Interact with the environment
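# A minimal sketch of the SimHash assumed above (after Tang et al., 2017,
# "#Exploration"): project the flattened state through a fixed random Gaussian
# matrix and keep only the signs, yielding a k-bit code that similar states
# are likely to share. The class name, k, and seed are assumptions. The code
# can index a visit-count table; a common intrinsic reward is then
# beta / sqrt(n(hash)).
import numpy as np

class SimHash:
    def __init__(self, state_dim, k=32, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, state_dim))  # fixed random projection

    def hash(self, state):
        # Return a hashable k-bit code for the (flattened) state
        bits = (self.A @ np.asarray(state, dtype=np.float64).ravel()) > 0
        return tuple(bits.astype(int).tolist())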
load_checkpoint(ckpt_dir, i_epoch, layer_sizes, input_size, device=device)

# To record episode stats
episode_durations = []
episode_rewards = []

for i_episode in range(batch_size):
    # Keep track of the running reward
    running_reward = 0

    # Initialize the environment and state
    current_state = env.reset()

    # Store the first state in memory
    memory.set_initial_state(current_state)

    for t in count():
        # Make sure that the policy net is in training mode
        policy_net.train()

        # Sample an action given the current state
        action, log_prob = policy_net(
            torch.tensor([current_state], device=device))
        log_prob = log_prob.squeeze()

        # Interact with the environment
        next_state, reward, done, _ = env.step(action.item())
        running_reward += reward

        # Render this episode
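# A minimal sketch of the policy_net interface assumed above: the forward pass
# produces action logits, samples from the resulting categorical distribution,
# and returns both the sampled action and its log-probability so the policy
# gradient update can weight it by the return. The class name, layer sizes,
# and architecture are placeholders, not taken from this file.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class PolicyNet(nn.Module):
    def __init__(self, input_size, n_actions, hidden_size=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(input_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, n_actions),
        )

    def forward(self, state):
        # state: (N, input_size) -> sampled action and its log-probability, each (N,)
        dist = Categorical(logits=self.body(state))
        action = dist.sample()
        return action, dist.log_prob(action)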