def rollout(self, timestep_limit, add_noise=True):
    """Run one episode with this worker's policy in its environment.

    Args:
        timestep_limit: Maximum number of environment steps for the episode.
        add_noise: Whether the module-level ``rollout`` helper should inject
            exploration noise (defaults to True for training workers).

    Returns:
        Tuple of (per-step rewards, rollout fragment length).
    """
    episode = rollout(
        self.policy,
        self.env,
        timestep_limit=timestep_limit,
        add_noise=add_noise,
    )
    rewards, fragment_length = episode
    return rewards, fragment_length
def rollout(self, timestep_limit):
    """Run one noise-free (evaluation) episode with this worker's policy.

    Args:
        timestep_limit: Maximum number of environment steps for the episode.

    Returns:
        Tuple of (per-step rewards, rollout length).
    """
    # Exploration noise is always disabled for this worker variant.
    rewards, episode_length = rollout(
        self.policy,
        self.env,
        timestep_limit=timestep_limit,
        add_noise=False,
    )
    return rewards, episode_length
def rollout(self, timestep_limit, add_noise=False):
    """Run one episode, forwarding this worker's configured reward offset.

    Args:
        timestep_limit: Maximum number of environment steps for the episode.
        add_noise: Whether the module-level ``rollout`` helper should inject
            exploration noise (defaults to False here).

    Returns:
        Tuple of (per-step rewards, rollout fragment length).
    """
    # ``offset`` comes from this worker's config; its semantics live in the
    # module-level rollout helper.
    rewards, fragment_length = rollout(
        self.policy,
        self.env,
        timestep_limit=timestep_limit,
        add_noise=add_noise,
        offset=self.config["offset"],
    )
    return rewards, fragment_length
def evaluate(self, candidate):
    """Score a perturbation candidate with one noise-free rollout.

    Args:
        candidate: Pair of (noise_index, multiplier) identifying a weight
            perturbation held by the shared model keeper.

    Returns:
        Tuple of (total episode reward, episode length).
    """
    noise_index, multiplier = candidate
    perturbed_weights = self.common.model_keeper.get_perturbed_weights(
        noise_index, multiplier)
    # Load the perturbed parameters into the shared policy before rolling out.
    self.common.policy.set_flat_weights(perturbed_weights)
    episode_rewards, episode_length = rollout(
        self.common.policy,
        self.common.env,
        timestep_limit=self.timestep_limit,
        add_noise=False,
    )
    return episode_rewards.sum(), episode_length
def evaluate(self, candidate):
    """Score an optimizer candidate with one noise-free rollout.

    Expands ``candidate`` into a flat weight vector via the shared
    optimizer, loads it into the shared policy, and runs a single
    evaluation episode without exploration noise.

    Args:
        candidate: Compressed candidate representation understood by
            ``self.common.optimizer.expand``.

    Returns:
        Tuple of (total episode reward, episode length).
    """
    weights = self.common.optimizer.expand(candidate)
    self.common.policy.set_flat_weights(weights)
    rewards, length = rollout(
        self.common.policy,
        self.common.env,
        timestep_limit=self.timestep_limit,
        add_noise=False,
    )
    # Compute the episode total once; it is both logged and returned.
    total_reward = rewards.sum()
    # Lazy %-style args: the message is only formatted if INFO is enabled.
    logger.info(
        'candidate %s %s %s %s %s',
        candidate,
        weights[0],
        self.common.policy.get_flat_weights()[0],
        total_reward,
        length,
    )
    return total_reward, length