def get_exploration_action(
    self,
    action_distribution: "ActionDistribution",
    timestep: "Union[int, TensorType]",
    explore: bool = True,
):
    """Return an action (plus all-zero log-probs) for the given distribution.

    The incoming distribution is re-instantiated with this exploration's
    temperature before sampling, so the temperature setting actually
    takes effect. Log-probabilities are not computed here; a zero tensor
    of batch size is returned in their place.

    Args:
        action_distribution: The (slate q-value) distribution to draw from.
        timestep: Current sampling timestep; stored on `self.last_timestep`.
        explore: If True, draw a stochastic sample; otherwise take the
            argmax ("deterministic sample") over the (q-value) logits.

    Returns:
        Tuple of (action tensor, zero log-prob tensor of shape [batch]).
    """
    assert (
        self.framework == "torch"
    ), "ERROR: SlateSoftQ only supports torch so far!"

    dist_class = type(action_distribution)
    # Rebuild the distribution so our temperature setting is applied.
    action_distribution = dist_class(
        action_distribution.inputs, self.model, temperature=self.temperature
    )

    # Dummy (all-zero) log-probs, one per batch row.
    num_rows = action_distribution.inputs.size()[0]
    logp = torch.zeros(num_rows, dtype=torch.float)

    self.last_timestep = timestep

    # Stochastic sample over the (q-value) logits when exploring,
    # otherwise the deterministic argmax "sample".
    action = (
        action_distribution.sample()
        if explore
        else action_distribution.deterministic_sample()
    )
    return action, logp
def _get_torch_exploration_action(self, action_dist: ActionDistribution, timestep: Union[TensorType, int], explore: Union[TensorType, bool]): # Set last timestep or (if not given) increase by one. self.last_timestep = timestep if timestep is not None else \ self.last_timestep + 1 # Apply exploration. if explore: # Random exploration phase. if self.last_timestep < self.random_timesteps: action, logp = \ self.random_exploration.get_torch_exploration_action( action_dist, explore=True) # Take a sample from our distribution. else: action = action_dist.sample() logp = action_dist.sampled_action_logp() # No exploration -> Return deterministic actions. else: action = action_dist.deterministic_sample() logp = torch.zeros_like(action_dist.sampled_action_logp()) return action, logp
def get_exploration_action(
    self,
    *,
    action_distribution: "ActionDistribution",
    timestep: int,
    explore: bool = True,
) -> "Tuple[torch.Tensor, Optional[torch.Tensor]]":
    """Sample an action, deferring to the parent during pure exploration.

    For the first `self._pure_exploration_steps` timesteps (while
    exploring), the parent class's exploration is used; afterwards a
    plain stochastic sample is drawn. With `explore=False`, the
    deterministic (argmax) sample is returned.

    NOTE(review): two branches return a bare sample although the
    annotation declares a tuple — kept as-is since callers may rely on
    this; confirm against the parent class's return convention.
    """
    if not explore:
        # Greedy path: no exploration at all.
        return action_distribution.deterministic_sample()
    if timestep < self._pure_exploration_steps:
        # Pure-exploration phase: delegate to the parent exploration.
        return super().get_exploration_action(
            action_distribution=action_distribution,
            timestep=timestep,
            explore=explore,
        )
    return action_distribution.sample()