def setup_mixins(policy: Policy, obs_space: gym.spaces.Space,
                 action_space: gym.spaces.Space,
                 config: TrainerConfigDict) -> None:
    """Call mixin constructors before the tf Policy's own initialization.

    Sets up the value-function branch and, unless running pure behavior
    cloning, a tf variable holding the moving average of the squared
    advantage norm ("c^2" in the MARWIL paper).

    Args:
        policy: The Policy object being initialized.
        obs_space: The observation space of the environment.
        action_space: The action space of the environment.
        config: The Trainer's config dict.
    """
    # Setup Value branch of our NN.
    ValueNetworkMixin.__init__(policy, obs_space, action_space, config)
    # Not needed for pure BC (beta == 0.0 disables advantage weighting);
    # guard added for consistency with the torch implementation.
    if policy.config["beta"] != 0.0:
        # Set up a tf-var for the moving avg (do this here to make it work
        # with eager mode); "c^2" in the paper. Read the start value from
        # the config (was hard-coded to 100.0) so it stays consistent with
        # the other framework implementations.
        policy._moving_average_sqd_adv_norm = get_variable(
            policy.config["moving_average_sqd_adv_norm_start"],
            framework="tf",
            tf_name="moving_average_of_advantage_norm",
            trainable=False)
def setup_mixins(policy: Policy, obs_space: gym.spaces.Space,
                 action_space: gym.spaces.Space,
                 config: TrainerConfigDict) -> None:
    """Initialize mixins for the torch MARWIL policy.

    Builds the value-function branch of the model and, unless running
    pure behavior cloning, a torch tensor tracking the squared moving
    average of the advantage norm.

    Args:
        policy: The Policy object being initialized.
        obs_space: The observation space of the environment.
        action_space: The action space of the environment.
        config: The Trainer's config dict.
    """
    # Setup Value branch of our NN.
    ValueNetworkMixin.__init__(policy, obs_space, action_space, config)
    # Pure BC (beta == 0.0) does not weight by advantages, so the moving
    # average is not needed.
    if policy.config["beta"] == 0.0:
        return
    # Set up a torch-var for the squared moving avg. advantage norm.
    start_value = policy.config["moving_average_sqd_adv_norm_start"]
    policy._moving_average_sqd_adv_norm = torch.tensor(
        [start_value],
        dtype=torch.float32,
        requires_grad=False).to(policy.device)
def setup_mixins(policy: Policy, obs_space: gym.spaces.Space,
                 action_space: gym.spaces.Space,
                 config: TrainerConfigDict) -> None:
    """Initialize mixins for the tf MARWIL policy.

    Builds the value-function branch of the model and, unless running
    pure behavior cloning, a tf variable tracking the squared moving
    average of the advantage norm ("c^2" in the MARWIL paper).

    Args:
        policy: The Policy object being initialized.
        obs_space: The observation space of the environment.
        action_space: The action space of the environment.
        config: The Trainer's config dict.
    """
    # Setup Value branch of our NN.
    ValueNetworkMixin.__init__(policy, obs_space, action_space, config)
    # Pure BC (beta == 0.0) does not weight by advantages — skip the var.
    if policy.config["beta"] == 0.0:
        return
    # Create the variable here (rather than lazily in the loss) so it also
    # works in eager mode; this is "c^2" in the paper.
    start_value = policy.config["moving_average_sqd_adv_norm_start"]
    policy._moving_average_sqd_adv_norm = get_variable(
        start_value,
        framework="tf",
        tf_name="moving_average_of_advantage_norm",
        trainable=False)