Python GreedyActionSampler.sample_action Examples

Programming language: Python

Namespace/package name: reagent.gym.policies.samplers.discrete_sampler

Method/function: sample_action

Code examples on hotexamples.com: 2

Python GreedyActionSampler.sample_action - 2 code examples found. These are the top-rated real-world Python examples of reagent.gym.policies.samplers.discrete_sampler.GreedyActionSampler.sample_action, extracted from open source projects.


Example #1

File: predictor_policies.py Project: t-triobox/ReAgent

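from typing import Optional, Tuple, Union

import numpy as np
import torch
from reagent.gym.policies.samplers.discrete_sampler import GreedyActionSampler

# Policy, RLParameters, rlt, SoftmaxActionSampler, discrete_dqn_serving_scorer and
# DiscreteDqnPredictorUnwrapper also come from the ReAgent package; their import
# lines are not part of this excerpt.


class DiscreteDQNPredictorPolicy(Policy):
    def __init__(self, wrapped_dqn_predictor,
                 rl_parameters: Optional[RLParameters]):
        # Sample from a softmax over scores if requested, otherwise act greedily.
        if rl_parameters and rl_parameters.softmax_policy:
            self.sampler = SoftmaxActionSampler(
                temperature=rl_parameters.temperature)
        else:
            self.sampler = GreedyActionSampler()
        self.scorer = discrete_dqn_serving_scorer(
            q_network=DiscreteDqnPredictorUnwrapper(wrapped_dqn_predictor))

    # pyre-fixme[56]: Decorator `torch.no_grad(...)` could not be called, because
    #  its type `no_grad` is not callable.
    @torch.no_grad()
    def act(
        self,
        obs: Union[rlt.ServingFeatureData, Tuple[torch.Tensor, torch.Tensor]],
        possible_actions_mask: Optional[np.ndarray],
    ) -> rlt.ActorOutput:
        """Input is either state_with_presence, or
        ServingFeatureData (in the case of sparse features)"""
        # Both accepted input forms must be tuples for this assert to pass.
        assert isinstance(obs, tuple)
        if isinstance(obs, rlt.ServingFeatureData):
            state: rlt.ServingFeatureData = obs
        else:
            # Wrap dense (value, presence) tensors; no sparse features supplied.
            state = rlt.ServingFeatureData(
                float_features_with_presence=obs,
                id_list_features={},
                id_score_list_features={},
            )
        # Score every action, then let the sampler choose one.
        scores = self.scorer(state, possible_actions_mask)
        return self.sampler.sample_action(scores).cpu().detach()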
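
The policy above switches between a greedy and a softmax sampler and passes a possible_actions_mask to the scorer. The following is a minimal plain-Python sketch of those three ideas, not ReAgent's implementation: the function names (greedy_one_hot, softmax_probs, mask_scores) are hypothetical, and it assumes that a greedy sampler picks the highest-scoring action and that masking makes disallowed actions unselectable.

```python
import math
from typing import List, Sequence


def greedy_one_hot(scores: Sequence[float]) -> List[int]:
    """Return a one-hot list marking the highest-scoring action (first on ties)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1 if i == best else 0 for i in range(len(scores))]


def softmax_probs(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn scores into sampling probabilities; a lower temperature is greedier."""
    scaled = [s / temperature for s in scores]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mask_scores(scores: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Set scores of disallowed actions (mask == 0) to -inf (an assumed convention)."""
    return [s if m else float("-inf") for s, m in zip(scores, mask)]


scores = [1.0, 3.0, 2.0]
print(greedy_one_hot(scores))                          # action 1 has the top score
print(greedy_one_hot(mask_scores(scores, [1, 0, 1])))  # action 1 is masked out
print(softmax_probs(scores, temperature=0.5))          # sharper than temperature=1.0
```

With the mask applied, the greedy choice moves to the best remaining action, and a masked action receives zero probability under the softmax because exp(-inf) evaluates to 0.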

Example #2

File: predictor_policies.py Project: hermes2k/ReAgent

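from typing import Tuple, Union

import torch
from reagent.gym.policies.samplers.discrete_sampler import GreedyActionSampler

# Policy, rlt, discrete_dqn_serving_scorer and DiscreteDqnPredictorUnwrapper also
# come from the ReAgent package; their import lines are not part of this excerpt.


class DiscreteDQNPredictorPolicy(Policy):
    def __init__(self, wrapped_dqn_predictor):
        # This version always acts greedily on the predicted scores.
        self.sampler = GreedyActionSampler()
        self.scorer = discrete_dqn_serving_scorer(
            q_network=DiscreteDqnPredictorUnwrapper(wrapped_dqn_predictor)
        )

    @torch.no_grad()
    def act(
        self, obs: Union[rlt.ServingFeatureData, Tuple[torch.Tensor, torch.Tensor]]
    ) -> rlt.ActorOutput:
        """Input is either state_with_presence, or
        ServingFeatureData (in the case of sparse features)"""
        # Both accepted input forms must be tuples for this assert to pass.
        assert isinstance(obs, tuple)
        if isinstance(obs, rlt.ServingFeatureData):
            state: rlt.ServingFeatureData = obs
        else:
            # Wrap dense (value, presence) tensors; no sparse features supplied.
            state = rlt.ServingFeatureData(
                float_features_with_presence=obs,
                id_list_features={},
                id_score_list_features={},
            )
        scores = self.scorer(state)
        return self.sampler.sample_action(scores).cpu().detach()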
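
Both examples first assert that obs is a tuple and only then check whether it is already a ServingFeatureData. That ordering works only if the feature-data type is itself a tuple subclass. The sketch below illustrates the pattern under that assumption with a hypothetical stand-in class (FeatureData) and helper (normalize_obs); it does not use ReAgent's own types.

```python
from typing import Any, Dict, NamedTuple, Tuple, Union


class FeatureData(NamedTuple):
    """Hypothetical stand-in for a serving feature container."""
    float_features_with_presence: Tuple[Any, Any]
    id_list_features: Dict[int, Any]
    id_score_list_features: Dict[int, Any]


def normalize_obs(obs: Union[FeatureData, Tuple[Any, Any]]) -> FeatureData:
    """Accept either a ready-made FeatureData or a bare (values, presence) pair."""
    # A NamedTuple instance is also a tuple, so this admits both input forms.
    assert isinstance(obs, tuple)
    if isinstance(obs, FeatureData):
        return obs
    # Bare pair: treat it as dense features with no sparse features attached.
    return FeatureData(
        float_features_with_presence=obs,
        id_list_features={},
        id_score_list_features={},
    )


dense = ([0.5, 1.5], [1.0, 1.0])   # (values, presence) as plain lists
wrapped = normalize_obs(dense)
print(wrapped.float_features_with_presence is dense)
print(normalize_obs(wrapped) is wrapped)
```

Passing an already wrapped object returns it unchanged, while a bare pair is wrapped with empty sparse-feature dictionaries, mirroring the branch in act.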