def test_save_load_batch_norm(self):
    state_dim = 8
    action_dim = 4
    model = FullyConnectedParametricDQN(
        state_dim,
        action_dim,
        sizes=[8, 4],
        activations=["relu", "relu"],
        use_batch_norm=True,
    )
    # Freezing batch_norm
    model.eval()
    expected_num_params, expected_num_inputs, expected_num_outputs = 21, 2, 1
    check_save_load(
        self, model, expected_num_params, expected_num_inputs, expected_num_outputs
    )

def test_basic(self):
    state_dim = 8
    action_dim = 4
    model = FullyConnectedParametricDQN(
        state_dim,
        action_dim,
        sizes=[8, 4],
        activations=["relu", "relu"],
        use_batch_norm=True,
    )
    input = model.input_prototype()
    self.assertEqual((1, state_dim), input.state.float_features.shape)
    self.assertEqual((1, action_dim), input.action.float_features.shape)
    # Batch norm requires more than one example in training; switch to eval
    # mode to avoid that
    model.eval()
    single_q_value = model(input)
    self.assertEqual((1, 1), single_q_value.q_value.shape)

def test_slate_q_trainer(self):
    recsim = RecSim(num_users=10)

    # Build memory pool with random policy
    memory_pool = OpenAIGymMemoryPool(10000000)
    random_reward = recsim.rollout_policy(random_policy, memory_pool)

    # Build a Q-network and measure the reward of its untrained top-k policy
    q_network = FullyConnectedParametricDQN(
        state_dim=memory_pool.state_dim,
        action_dim=memory_pool.action_dim,
        sizes=[64, 32],
        activations=["relu", "relu"],
    )
    q_network = q_network.eval()
    recsim.reset()
    untrained_policy_reward = recsim.rollout_policy(partial(top_k_policy, q_network))
    q_network = q_network.train()

    # Train the model with SlateQ
    q_network_target = q_network.get_target_network()
    parameters = SlateQTrainerParameters()
    trainer = SlateQTrainer(q_network, q_network_target, parameters)
    for _i in range(1000):
        tdp = memory_pool.sample_memories(
            128, model_type=ModelType.PYTORCH_PARAMETRIC_DQN.value
        )
        training_batch = tdp.as_slate_q_training_batch()
        trainer.train(training_batch)

    # Evaluate the trained top-k policy
    q_network = q_network.eval()
    recsim.reset()
    trained_policy_reward = recsim.rollout_policy(partial(top_k_policy, q_network))

    print(
        f"Reward; random: {random_reward}; untrained: {untrained_policy_reward}; "
        f"trained: {trained_policy_reward}"
    )

    self.assertGreater(trained_policy_reward, untrained_policy_reward)
    self.assertGreater(trained_policy_reward, random_reward)
    self.assertEqual(random_reward, 1384.0)
    self.assertEqual(untrained_policy_reward, 1200.0)
    self.assertEqual(trained_policy_reward, 1432.0)