Example #1
    # Every save_ckpt_interval epochs, check whether a checkpoint file exists.
    # If one is found, load it and resume training.
    # The i_epoch of the intended checkpoint must be specified.
    # if i_epoch % save_ckpt_interval == 0 and os.path.isfile(os.path.join(ckpt_dir, "ckpt_eps%d.pt" % i_epoch)):
    #     policy_net, value_net_in, value_net_ex, valuenet_in_optimizer, valuenet_ex_optimizer,\
    #     simhash, training_info = \
    #         load_checkpoint(ckpt_dir, i_epoch, layer_sizes, input_size, device=device)
    #     print("\n\tCheckpoint successfully loaded!\n")

    # To record episode stats
    episode_durations = []
    episode_rewards = []

    # Put the value nets in evaluation mode while collecting trajectories
    value_net_in.eval()
    value_net_ex.eval()
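    # (.eval() disables dropout and makes batch norm use its running
    # statistics, so value estimates stay deterministic during rollouts)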

    ###################################################################
    # Collect trajectories

    print("\n\n\tCollecting %d episodes: " % (batch_size))

    for i_episode in tqdm(range(batch_size)):  # Use tqdm to show progress bar

        # Keep track of the running reward
        running_reward = 0

        # Initialize the environment and state
        current_state = env.reset()
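
For context, a loader consistent with the calls in these examples might look like the following minimal sketch. Only the file-name pattern ("ckpt_eps%d.pt") and the general call shape come from the examples; the checkpoint key names, the restore-in-place design, and the helper's name are illustrative assumptions, not the original repository's API.

import os

import torch


def load_checkpoint_sketch(ckpt_dir, i_epoch, policy_net, value_net,
                           valuenet_optimizer, device="cpu"):
    """Hypothetical loader sketch: restore state dicts in place and return
    the saved training_info. All key names are assumptions."""
    path = os.path.join(ckpt_dir, "ckpt_eps%d.pt" % i_epoch)
    ckpt = torch.load(path, map_location=device)
    policy_net.load_state_dict(ckpt["policy_state"])
    value_net.load_state_dict(ckpt["value_state"])
    valuenet_optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["training_info"]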
Example #2
    finished_rendering_this_epoch = False

    # Every save_ckpt_interval epochs, check whether a checkpoint file exists.
    # If one is found, load it and resume training.
    # The i_epoch of the intended checkpoint must be specified.
    if i_epoch % save_ckpt_interval == 0 and os.path.isfile(
            os.path.join(ckpt_dir, "ckpt_eps%d.pt" % i_epoch)):
        policy_net, value_net, valuenet_optimizer, training_info = \
            load_checkpoint(ckpt_dir, i_epoch, layer_sizes, input_size, device=device)
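        # Confirm that the checkpoint was restored
        print("\n\tCheckpoint successfully loaded!\n")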

    # To record episode stats
    episode_durations = []
    episode_rewards = []

    # Put the value net in evaluation mode while collecting trajectories
    value_net.eval()

    ###################################################################
    # Collect trajectories

    print("\n\n\tCollecting %d episodes: " % (batch_size))

    for i_episode in tqdm(range(batch_size)):  # Use tqdm to show progress bar

        # Keep track of the running reward
        running_reward = 0

        # Initialize the environment and state
        current_state = env.reset()
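        # (Assumes the classic Gym API, where reset() returns only the
        # observation; gym >= 0.26 returns an (observation, info) tuple)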

        # Estimate the value of the initial state