def averaged_particle_returns(reward: Tensor, discount: Tensor, number_of_particles: int) -> Tensor:
    """Monte-Carlo estimate of the return of each action trajectory.

    :param reward: Batch of step-reward trajectories. The batch size is the
        number of action trajectories multiplied by ``number_of_particles``
        (the Monte-Carlo rollouts per action trajectory).
    :param discount: Batch of step-discount trajectories with the same batch
        layout as ``reward``.
    :param number_of_particles: Number of Monte-Carlo rollouts of each action
        trajectory.
    :return: Monte-carlo estimate of the returns from each action trajectory.
    """
    # Looks weird but is correct! At first sight it evokes the impression that
    # the last reward signal is missed; however, tf's cumprod (inside
    # get_contiguous_sub_episodes) is called with exclusive=True, so the last
    # reward signal is in fact included.
    sub_episode_mask = get_contiguous_sub_episodes(discount)
    returns_per_particle = tf.reduce_sum(reward * sub_episode_mask, axis=1)  # shape = (batch_size,)
    grouped_returns = reshape_create_particle_axis(returns_per_particle, number_of_particles)
    return tf.reduce_mean(grouped_returns, axis=1)
def testNumSteps(self):
    # Discount trajectories; a 0.0 discount marks an episode termination.
    discounts = [
        [0.9, 0.9, 0.9, 0.9],  # No episode termination.
        [0.0, 0.9, 0.9, 0.9],  # Episode terminates on first step.
        [0.9, 0.9, 0.0, 0.9],  # Episode terminates on third step.
    ]
    discount_tensor = tf.constant(discounts, dtype=tf.float32)
    mask = common.get_contiguous_sub_episodes(discount_tensor)
    # Steps up to and including the first zero discount stay active (1.0);
    # everything after is masked out (0.0).
    expected = [
        [1.0, 1.0, 1.0, 1.0],
        [1.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 1.0, 0.0],
    ]
    self.assertAllClose(expected, self.evaluate(mask))