def categorical_q_loss(agent: DQN, batch: collections.namedtuple):
    """Categorical DQN loss function to calculate the loss of the Q-function

    Args:
        agent (:obj:`DQN`): The agent
        batch (:obj:`collections.namedtuple` of :obj:`torch.Tensor`): Batch of experiences

    Returns:
        loss (:obj:`torch.Tensor`): Calculated loss of the Q-function
    """
    q_values = agent.get_q_values(batch.states, batch.actions)
    target_q_values = agent.get_target_q_values(
        batch.next_states, batch.rewards, batch.dones
    )

    # Cross-entropy between the target distribution and the predicted
    # distribution over the value atoms (summed over the atom axis, then
    # averaged over the batch) — not a difference of Q-values.
    # NOTE(review): q_values.log() yields -inf for zero-probability atoms;
    # presumably the network outputs strictly positive softmax probabilities
    # — confirm upstream clamping if that is not guaranteed.
    loss = -(target_q_values * q_values.log()).sum(1).mean()
    return loss
def prioritized_q_loss(agent: DQN, batch: collections.namedtuple):
    """Function to calculate the loss of the Q-function for a prioritized replay buffer

    Also updates the replay buffer's priorities for the sampled transitions
    and appends the scalar loss to ``agent.logs["value_loss"]``.

    Args:
        agent (:obj:`DQN`): The agent
        batch (:obj:`collections.namedtuple` of :obj:`torch.Tensor`): Batch of experiences

    Returns:
        loss (:obj:`torch.Tensor`): Calculated loss of the Q-function
    """
    q_values = agent.get_q_values(batch.states, batch.actions)
    target_q_values = agent.get_target_q_values(
        batch.next_states, batch.rewards, batch.dones
    )

    # Importance-sampling-weighted MSE; the target is detached so gradients
    # flow only through the online network's predictions.
    loss = batch.weights * (q_values - target_q_values.detach()) ** 2

    # Priorities are taken as the per-sample losses plus a small epsilon so
    # every transition keeps a non-zero probability of being resampled.
    # NOTE(review): these priorities include the IS weights (weighted squared
    # error), not the raw |td-error| — confirm this is intentional.
    priorities = loss + 1e-5

    loss = loss.mean()
    agent.replay_buffer.update_priorities(
        batch.indices, priorities.detach().cpu().numpy()
    )
    agent.logs["value_loss"].append(loss.item())
    return loss