Example #1
import random

import numpy as np


def test_per(capacity):
    # test the implementation of the prioritized replay buffer
    p_buffer = PrioritizedReplayBuffer(capacity)

    # populate half of the buffer with placeholder experiences
    for _ in range(capacity // 2):
        p_buffer.add(Experience())

    # update priorities for several batches of experiences
    n_batches = 10
    batch_size = 100
    for _ in range(n_batches):
        # randomly sample batch_size leaf (tree) indices
        idx = random.sample(range(capacity - 1, 2 * capacity - 1), batch_size)

        td_errors = np.random.uniform(0, 10, batch_size)

        p_buffer.batch_update(idx, td_errors)

        assert p_buffer.tree.max_priority == np.max(
            p_buffer.tree.tree[-capacity:])

    # test sampling from the updated buffer
    for _ in range(10):
        p_buffer.sample(batch_size)

    return
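
The test above assumes a SumTree-backed PrioritizedReplayBuffer exposing add, sample, batch_update, an inner tree with a flat tree array and a max_priority attribute, and (in Example #2) increment_b. The class below is a minimal sketch of that assumed interface, not the original implementation; the alpha, b and b_increment defaults, the (|TD error| + 1e-5) ** alpha priority rule, and the weight normalization are illustrative assumptions.

import random

import numpy as np


class SumTree:
    # Flat binary sum tree: internal nodes in tree[:capacity - 1],
    # leaf priorities in tree[capacity - 1:] (i.e. tree[-capacity:]).
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)
        self.data = np.empty(capacity, dtype=object)
        self.write = 0            # next leaf slot to overwrite
        self.max_priority = 1.0   # priority assigned to fresh experiences

    def add(self, priority, data):
        tree_idx = self.write + self.capacity - 1
        self.data[self.write] = data
        self.update(tree_idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, tree_idx, priority):
        # write the new priority and propagate the change up to the root
        change = priority - self.tree[tree_idx]
        self.tree[tree_idx] = priority
        self.max_priority = max(self.max_priority, priority)
        while tree_idx != 0:
            tree_idx = (tree_idx - 1) // 2
            self.tree[tree_idx] += change

    def get(self, value):
        # descend from the root to the leaf whose cumulative priority covers `value`
        idx = 0
        while idx < self.capacity - 1:          # stop once a leaf is reached
            left, right = 2 * idx + 1, 2 * idx + 2
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = right
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]


class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6, b=0.4, b_increment=1e-4):
        self.tree = SumTree(capacity)
        self.alpha = alpha              # how strongly TD errors shape sampling
        self.b = b                      # importance-sampling exponent, annealed to 1
        self.b_increment = b_increment

    def add(self, experience):
        # new experiences get the current max priority so they are sampled at least once
        self.tree.add(self.tree.max_priority, experience)

    def sample(self, batch_size):
        # stratified sampling: one draw per equal slice of the total priority mass
        idx, priorities, batch = [], [], []
        segment = self.tree.tree[0] / batch_size
        for i in range(batch_size):
            value = random.uniform(segment * i, segment * (i + 1))
            tree_idx, priority, data = self.tree.get(value)
            idx.append(tree_idx)
            priorities.append(priority)
            batch.append(data)
        probs = np.array(priorities) / self.tree.tree[0]
        weights = (self.tree.capacity * probs) ** (-self.b)
        weights /= weights.max()        # normalize so the largest weight is 1
        return np.array(idx), batch, weights

    def batch_update(self, idx, td_errors):
        # priority = (|TD error| + eps) ** alpha
        priorities = (np.abs(td_errors) + 1e-5) ** self.alpha
        for tree_idx, priority in zip(idx, priorities):
            self.tree.update(tree_idx, priority)

    def increment_b(self):
        # anneal the importance-sampling exponent toward 1
        self.b = min(1.0, self.b + self.b_increment)

With this sketch, Example #1 holds: batch_update refreshes the leaf priorities in tree[-capacity:], and max_priority tracks their maximum, while Example #2 uses sample's weights and indices for the weighted loss and priority refresh.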
Example #2
        Loss = weights * MSE
    
    '''

    # compute MSE adjusted by importance sampling weights
    # and backprop
    weights = torch.tensor(weights, dtype=torch.float32)
    loss = torch.mean(weights * torch.pow(td_loss, 2))
    loss.backward()
    grad_norm = nn.utils.clip_grad_norm_(agent.parameters(), max_grad_norm)
    opt.step()
    opt.zero_grad()

    # update the priorities of the sampled experiences
    exp_replay.batch_update(b_idx, np.abs(td_loss.detach().cpu().numpy()))

    # gradually anneal the importance-sampling exponent b toward 1
    exp_replay.increment_b()

    if step % loss_freq == 0:
        # log the unweighted MSE (without importance-sampling weights)
        loss = torch.mean(torch.pow(td_loss, 2))
        td_loss_history.append(loss.cpu().item())

    if step % refresh_target_network_freq == 0:
        target_network.load_state_dict(agent.state_dict())

    if step % eval_freq == 0:
        mean_rw_history.append(
            evaluate(make_env(clip_rewards=True, seed=step),