Code example #1
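These three snippets read as test methods for ALF's RandomAlfEnvironment. To run them standalone they need roughly the scaffolding below; the module paths, the hypothetical class name RandomAlfEnvironmentTest, and the choice of absl's absltest.TestCase as a base class (which provides the assertSequenceAlmostEqual used in example #2) are assumptions, not taken from the original:

 import numpy as np
 import torch
 from absl.testing import absltest

 from alf.environments.random_alf_environment import RandomAlfEnvironment
 from alf.tensor_specs import BoundedTensorSpec

 class RandomAlfEnvironmentTest(absltest.TestCase):
     # The test methods from the three examples go here.
     ...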
 def testRewardCheckerBatchSizeOne(self):
     # Ensure batch size 1 with scalar reward works
     obs_spec = BoundedTensorSpec((2, 3), torch.int32, -10, 10)
     action_spec = BoundedTensorSpec((1, ), torch.int64)
     env = RandomAlfEnvironment(obs_spec,
                                action_spec,
                                reward_fn=lambda *_: np.array([1.0]),
                                batch_size=1)
     env._done = False
     env.reset()
     action = torch.tensor([0], dtype=torch.int64)
     time_step = env.step(action)
     self.assertEqual(time_step.reward, 1.0)
Code example #2
 def testCustomRewardFn(self):
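     # Ensure a custom reward_fn producing a batched reward works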
     obs_spec = BoundedTensorSpec((2, 3), torch.int32, -10, 10)
     action_spec = BoundedTensorSpec((1, ), torch.int64)
     batch_size = 3
     env = RandomAlfEnvironment(obs_spec,
                                action_spec,
                                reward_fn=lambda *_: np.ones(batch_size),
                                batch_size=batch_size)
     env._done = False
     env.reset()
     action = torch.ones(batch_size)
     time_step = env.step(action)
     self.assertSequenceAlmostEqual([1.0] * 3, time_step.reward)
Code example #3
 def testRewardCheckerSizeMismatch(self):
     # Ensure custom scalar reward with batch_size greater than 1 raises
     # ValueError
     obs_spec = BoundedTensorSpec((2, 3), torch.int32, -10, 10)
     action_spec = BoundedTensorSpec((1, ), torch.int64)
     env = RandomAlfEnvironment(obs_spec,
                                action_spec,
                                reward_fn=lambda *_: np.array([1.0]),
                                batch_size=5)
     env.reset()
     env._done = False
     action = torch.tensor(0, dtype=torch.int64)
     with self.assertRaises(ValueError):
         env.step(action)
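The ValueError in example #3 comes from RandomAlfEnvironment checking the reward returned by reward_fn against the environment's batch size. A minimal sketch of such a check, consistent with all three tests above (the helper name _check_reward and its error message are hypothetical, not ALF's actual implementation):

 import numpy as np

 def _check_reward(reward, batch_size):
     """Hypothetical reward-shape check mirroring what the tests exercise."""
     reward = np.asarray(reward)
     # Accept only a reward whose leading dimension matches the batch size:
     # np.array([1.0]) passes for batch_size=1 (example #1), np.ones(3)
     # passes for batch_size=3 (example #2), but np.array([1.0]) with
     # batch_size=5 (example #3) is rejected.
     if reward.shape != (batch_size, ):
         raise ValueError('Expected reward shape (%d,), got %s' %
                          (batch_size, reward.shape))
     return reward

Under a rule like this, a length-1 reward is accepted only as the batch_size=1 special case of the general shape requirement, which is exactly the asymmetry examples #1 and #3 pin down.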