import numpy as np

# `apply_grad_clipping` is RLlib's standard global-norm gradient clipper
# (`ray.rllib.utils.torch_utils` in recent Ray versions, `torch_ops` in older
# ones); `apply_grad_clipping_elementwise` is a user-defined helper, sketched
# below.


def my_apply_grad_clipping(policy, optimizer, loss):
    # Apply the gradient clipping elementwise first to prevent the larger
    # gradients at the end of the network from dominating after clipping the
    # gradients by the global norm.
    info = apply_grad_clipping_elementwise(policy, optimizer, loss)

    # Update the grad clip value depending on the mode.
    if policy.config["grad_clip_options"]["mode"] == "adaptive":
        assert policy.config.get("grad_clip_elementwise", None) is not None
        opts = policy.config["grad_clip_options"]
        if len(policy.prev_gradient_norms) == opts["adaptive_buffer_size"]:
            # Compute the grad clip value as a percentile of the previous
            # buffer_size gradient norms.
            grad_clip = np.percentile(
                policy.prev_gradient_norms, q=opts["adaptive_percentile"]
            )
            # Clip the grad clip value to a reasonable range.
            grad_clip = np.clip(
                grad_clip, opts["adaptive_min"], opts["adaptive_max"]
            )
            # Update the grad clip value on the policy. This will take effect
            # below.
            policy.config["grad_clip"] = grad_clip
            # Track the effective grad clip value as a metric.
            info["effective_grad_clip"] = grad_clip

        # Update the buffer of gradient norms.
        current_gradient_norm = info["after_ele_clip_global_grad_norm"]
        policy.prev_gradient_norms.append(current_gradient_norm)

    # Apply gradient clipping as usual, possibly using the updated grad clip
    # value.
    global_info = apply_grad_clipping(policy, optimizer, loss)
    if "grad_gnorm" in global_info:
        info["final_grad_global_norm"] = global_info["grad_gnorm"].to("cpu")
    return info
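
# A minimal sketch of the two pieces `my_apply_grad_clipping` assumes but does
# not define: the elementwise clipper and the rolling buffer of gradient
# norms. The helper name, the `grad_clip_elementwise` config key, and the
# `after_ele_clip_global_grad_norm` info key mirror the usage above; the torch
# calls and the deque-based setup are illustrative assumptions, not RLlib API.
from collections import deque

import torch


def apply_grad_clipping_elementwise(policy, optimizer, loss):
    # Clamp every gradient entry into [-c, c], where c is the per-element
    # clip value assumed to live in the config.
    clip_value = policy.config["grad_clip_elementwise"]
    params = [
        p
        for param_group in optimizer.param_groups
        for p in param_group["params"]
        if p.grad is not None
    ]
    torch.nn.utils.clip_grad_value_(params, clip_value)
    # Report the global norm of the already elementwise-clipped gradients;
    # the adaptive branch above reads this key to feed its buffer.
    global_norm = torch.norm(
        torch.stack([torch.norm(p.grad.detach()) for p in params])
    )
    return {"after_ele_clip_global_grad_norm": global_norm.to("cpu")}


def setup_adaptive_grad_clip_buffer(policy):
    # Hypothetical one-time setup (e.g. in a `before_loss_init` hook): a
    # bounded buffer, so old norms fall out as new ones are appended.
    policy.prev_gradient_norms = deque(
        maxlen=policy.config["grad_clip_options"]["adaptive_buffer_size"]
    )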

def grad_process_and_td_error_fn(policy, optimizer, loss):
    # Clip grads if configured.
    info = apply_grad_clipping(policy, optimizer, loss)
    # Add td-error to info dict.
    info["td_error"] = policy.q_loss.td_error
    return info

def grad_process_and_td_error_fn(
    policy: Policy, optimizer: "torch.optim.Optimizer", loss: TensorType
) -> Dict[str, TensorType]:
    # Clip grads if configured.
    return apply_grad_clipping(policy, optimizer, loss)
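
# For context, a sketch of how a `(policy, optimizer, loss)` grad-process
# function like the ones above gets wired into a policy. In the 1.x-era RLlib
# policy-template API this is the `extra_grad_process_fn` argument (which is
# how DQN's torch policy used `grad_process_and_td_error_fn`); newer Ray
# versions expose the same hook via
# `ray.rllib.policy.policy_template.build_policy_class(framework="torch", ...)`.
# `build_q_losses` stands in for the algorithm's actual loss builder.
from ray.rllib.policy.torch_policy_template import build_torch_policy

MyDQNTorchPolicy = build_torch_policy(
    name="MyDQNTorchPolicy",
    loss_fn=build_q_losses,
    extra_grad_process_fn=grad_process_and_td_error_fn,
)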