Code Example #1
# Imports assuming the Ray RLlib 1.x module layout:
from typing import List, Type, Union

import torch

from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.torch.torch_action_dist import TorchDistributionWrapper
from ray.rllib.policy.policy import Policy
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.typing import TensorType


def spl_torch_loss(
        policy: Policy, model: ModelV2,
        dist_class: Type[TorchDistributionWrapper],
        train_batch: SampleBatch) -> Union[TensorType, List[TensorType]]:
    """The basic policy gradients loss function.

    Args:
        policy (Policy): The Policy to calculate the loss for.
        model (ModelV2): The Model to calculate the loss for.
        dist_class (Type[TorchDistributionWrapper]): The action distribution class.
        train_batch (SampleBatch): The training data.

    Returns:
        Union[TensorType, List[TensorType]]: A single loss tensor or a list
            of loss tensors.
    """
    # Pass the training data through our model to get distribution parameters.
    dist_inputs, _ = model.from_batch(train_batch)
    # Create an action distribution object.
    predictions = dist_class(dist_inputs, model)

    targets = []
    if policy.config["learn_action"]:
        targets.append(train_batch[SampleBatch.ACTIONS])
    if policy.config["learn_reward"]:
        targets.append(train_batch[SampleBatch.REWARDS])
    assert len(targets) > 0
    targets = torch.cat(targets, dim=0)

    # Save the loss in the policy object for the spl_stats below.
    policy.spl_loss = policy.config["loss_fn"](predictions.dist.probs, targets)
    policy.entropy = predictions.dist.entropy().mean()

    return policy.spl_loss
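
The comment "for the spl_stats below" refers to a stats function that is not shown in these examples. Below is a minimal sketch of what it could look like, assuming RLlib's torch policy-template convention of a stats_fn(policy, train_batch) callback; the name spl_stats and the reported keys are illustrative, not part of the original.

from typing import Dict


def spl_stats(policy: Policy,
              train_batch: SampleBatch) -> Dict[str, TensorType]:
    # Report the values stashed on the policy object by spl_torch_loss().
    return {
        "spl_loss": policy.spl_loss,
        "entropy": policy.entropy,
    }
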
Code Example #2
# Same imports as in Code Example #1.
def spl_torch_loss(
    policy: Policy,
    model: ModelV2,
    dist_class: Type[TorchDistributionWrapper],
    train_batch: SampleBatch,
) -> Union[TensorType, List[TensorType]]:
    """The basic policy gradients loss function.

    Args:
        policy (Policy): The Policy to calculate the loss for.
        model (ModelV2): The Model to calculate the loss for.
        dist_class (Type[TorchDistributionWrapper]): The action distribution class.
        train_batch (SampleBatch): The training data.

    Returns:
        Union[TensorType, List[TensorType]]: A single loss tensor or a list
            of loss tensors.
    """
    # Pass the training data through our model to get distribution parameters.
    dist_inputs, _ = model.from_batch(train_batch)
    # Create an action distribution object.
    action_dist = dist_class(dist_inputs, model)
    if policy.config["explore"]:
        # The exploration call below can modify dist_inputs through
        # action_dist, due to a bug in TorchCategorical, so the
        # distribution is re-created afterwards:
        _, _ = policy.exploration.get_exploration_action(
            action_distribution=action_dist,
            timestep=policy.global_timestep,
            explore=policy.config["explore"],
        )
        action_dist = dist_class(dist_inputs, policy.model)

    targets = []
    if policy.config["learn_action"]:
        targets.append(train_batch[SampleBatch.ACTIONS])
    if policy.config["learn_reward"]:
        targets.append(train_batch[SampleBatch.REWARDS])
    assert len(targets) > 0, (
        "In config, use learn_action=True and/or learn_reward=True to "
        "specify which target to use in supervised learning.")
    targets = torch.cat(targets, dim=0)

    # Save the loss in the policy object for the spl_stats below.
    policy.spl_loss = policy.config["loss_fn"](action_dist.dist.probs, targets)
    policy.entropy = action_dist.dist.entropy().mean()

    return policy.spl_loss
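
For context, this is one way such a loss could be plugged into a policy class. The sketch below assumes the older RLlib 1.x policy-template API (build_torch_policy, with_common_config) and reuses the imports and functions defined above; the policy name SPLTorchPolicy, the default config values, and the MSELoss placeholder are illustrative assumptions, not taken from the original.

from ray.rllib.agents.trainer import with_common_config
from ray.rllib.policy.torch_policy_template import build_torch_policy

# Hypothetical defaults for the keys read by spl_torch_loss() above.
SPL_DEFAULT_CONFIG = with_common_config({
    "learn_action": True,
    "learn_reward": False,
    # Any callable taking (probs, targets) and returning a loss tensor;
    # MSELoss is only a placeholder here.
    "loss_fn": torch.nn.MSELoss(),
})

SPLTorchPolicy = build_torch_policy(
    name="SPLTorchPolicy",
    get_default_config=lambda: SPL_DEFAULT_CONFIG,
    loss_fn=spl_torch_loss,
    stats_fn=spl_stats,
)
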