import random

import numpy as np


def test_per(capacity):
    # test the implementation of the prioritized replay buffer
    p_buffer = PrioritizedReplayBuffer(capacity)

    # populate the buffer to half capacity
    for _ in range(capacity // 2):
        p_buffer.add(Experience())

    # update priorities for batches of experience
    n_batches = 10
    batch_size = 100
    for _ in range(n_batches):
        # sample batch_size distinct leaf (tree) indices
        idx = random.sample(range(capacity - 1, 2 * capacity - 1), batch_size)
        td_errors = np.random.uniform(0, 10, batch_size)
        p_buffer.batch_update(idx, td_errors)
        # the cached maximum priority must match the largest leaf value
        assert p_buffer.tree.max_priority == np.max(p_buffer.tree.tree[-capacity:])

    # test sampling
    for _ in range(10):
        p_buffer.sample(batch_size)
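
# The index range used above comes from the flat-array layout of a binary
# sum tree: with `capacity` leaves there are 2 * capacity - 1 nodes in total,
# internal nodes occupy [0, capacity - 1), and the leaves that hold the
# priorities occupy [capacity - 1, 2 * capacity - 1). A minimal sketch of
# that layout, assuming the buffer's tree follows this common scheme
# (SumTreeSketch is a hypothetical illustration, not the class under test):
class SumTreeSketch:
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)  # internal nodes, then leaves

    def update(self, leaf_idx, priority):
        # write the leaf, then propagate the change up to the root
        change = priority - self.tree[leaf_idx]
        self.tree[leaf_idx] = priority
        while leaf_idx != 0:
            leaf_idx = (leaf_idx - 1) // 2  # parent index
            self.tree[leaf_idx] += change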
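
# The training step below scales the squared TD error by importance
# sampling weights. For reference, PER assigns each transition a priority
#   p_i = (|delta_i| + eps) ** alpha,   P(i) = p_i / sum_k p_k,
# and corrects the resulting sampling bias with
#   w_i = (N * P(i)) ** (-b) / max_j w_j,
# where b is annealed toward 1 over training (here via increment_b()).
# A hypothetical standalone sketch of that computation; eps and alpha are
# assumed hyperparameters, and in the real buffer P(i) is normalized over
# all stored priorities, not just the sampled batch:
def per_priorities_and_weights(td_errors, n_total, b, alpha=0.6, eps=1e-2):
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    weights = (n_total * probs) ** (-b)
    return priorities, weights / weights.max()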
    Loss = weights * MSE
    '''
    # compute the MSE adjusted by the importance sampling weights
    # and backprop
    weights = torch.tensor(weights, dtype=torch.float32)
    loss = torch.mean(weights * torch.pow(td_loss, 2))
    loss.backward()
    grad_norm = nn.utils.clip_grad_norm_(agent.parameters(), max_grad_norm)
    opt.step()
    opt.zero_grad()

    # update the priorities of the sampled experiences
    # with their new absolute TD errors
    exp_replay.batch_update(b_idx, np.abs(td_loss.detach().cpu().numpy()))

    # gradually anneal the importance sampling hyperparameter b toward 1
    exp_replay.increment_b()

    if step % loss_freq == 0:
        # log the MSE without importance sampling weights
        loss = torch.mean(torch.pow(td_loss, 2))
        td_loss_history.append(loss.cpu().item())

    if step % refresh_target_network_freq == 0:
        # sync the target network with the online agent
        target_network.load_state_dict(agent.state_dict())

    if step % eval_freq == 0:
        mean_rw_history.append(
            evaluate(make_env(clip_rewards=True, seed=step),