def init_v_agent(state_shape, action_shape, action_max, batch_size):
    # NAF decomposition networks: policy mean mu(s), advantage-matrix entries P(s) and state value V(s).
    mu_model = Seq_Network([state_shape, 16, 16, action_shape], nn.ReLU(), nn.Tanh())
    p_model = Seq_Network([state_shape, 16, 16, action_shape ** 2], nn.ReLU())
    v_model = Seq_Network([state_shape, 16, 16, 1], nn.ReLU())
    # Ornstein-Uhlenbeck exploration noise with a decaying threshold.
    noise = OUNoise(1, threshold=action_max, threshold_min=0.001, threshold_decrease=0.001 * action_max)
    agent = NAFAgent(mu_model, p_model, v_model, noise, state_shape, action_shape, action_max,
                     batch_size, gamma=1)
    return agent
def init_agent(state_shape, action_shape, u_max, v_max, batch_size):
    # Build the per-player NAF agents and a shared state-value network,
    # then wrap them in a centralized NAF agent.
    u_model = init_u_agent(state_shape, action_shape, u_max)
    v_model = init_v_agent(state_shape, action_shape, v_max)
    v_network = Seq_Network([state_shape, 16, 16, 1], nn.ReLU())
    noise = OUNoise(1, threshold=1, threshold_min=0.002, threshold_decrease=0.002)
    agent = models.centrilized_naf.CentralizedNafAgent(u_model, v_model, v_network, noise, state_shape,
                                                       action_shape, u_max, v_max, batch_size)
    return agent
def init_agent(state_shape, action_shape, u_max, v_max, batch_size):
    # Centralized double-NAF variant with a wider shared state-value network.
    u_model = init_u_agent(state_shape, action_shape, u_max)
    v_model = init_v_agent(state_shape, action_shape, v_max)
    v_network = Seq_Network([state_shape, 50, 50, 1], nn.ReLU())
    noise = OUNoise(2, threshold=1, threshold_min=0.002, threshold_decrease=0.003)
    agent = CentralizedDoubleNafAgent(u_model, v_model, v_network, noise, state_shape, action_shape,
                                      u_max, v_max, batch_size)
    return agent
def init_u_agent(state_shape, action_shape, action_max, batch_size):
    # Double-NAF agent for the u action; discounted updates with gamma = 0.999.
    mu_model = Seq_Network([state_shape, 32, 32, action_shape], nn.ReLU(), nn.Tanh())
    p_model = Seq_Network([state_shape, 32, 32, action_shape ** 2], nn.ReLU())
    v_model = Seq_Network([state_shape, 32, 32, 1], nn.ReLU())
    noise = OUNoise(1, threshold=1, threshold_min=0.001, threshold_decrease=0.003)
    agent = DoubleNAFAgent(mu_model, p_model, v_model, noise, state_shape, action_shape, action_max,
                           batch_size, gamma=0.999)
    return agent
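# Usage sketch (illustrative only; the state/action shapes, control bounds and batch size below
# are assumptions, not values taken from the functions above):
#
#     agent = init_agent(state_shape=2, action_shape=1, u_max=1.0, v_max=0.5, batch_size=128)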
import torch.nn as nn

from problems.nonlinear_problem.optimal_agent import OptimalAgent
from utilities.noises import OUNoise
from utilities.sequentialNetwork import Seq_Network
# The import paths for NonlinearProblem and SimpleNaf are assumed; adjust them to the actual repo layout.
from problems.nonlinear_problem.nonlinear_problem import NonlinearProblem
from models.simple_naf import SimpleNaf

env = NonlinearProblem()
state_shape = 2
action_shape = 1
episodes_n = 250

mu_model = Seq_Network([state_shape, 100, 100, 100, action_shape], nn.Sigmoid())
p_model = Seq_Network([state_shape, 100, 100, 100, action_shape ** 2], nn.Sigmoid())
v_model = Seq_Network([state_shape, 100, 100, 100, 1], nn.Sigmoid())
noise = OUNoise(action_shape, threshold=1, threshold_min=0.001, threshold_decrease=0.004)
batch_size = 200
agent = SimpleNaf(mu_model, v_model, noise, state_shape, action_shape, batch_size, 1)


def play_and_learn(env):
    # Roll out one episode, letting the agent act until the environment signals done.
    total_reward = 0
    state = env.reset()
    done = False
    step = 0
    while not done:
        action = agent.get_action(state)
        next_state, reward, done, _ = env.step(action)
        total_reward += reward
        state = next_state
        step += 1
    return total_reward
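
# A minimal training-loop sketch (not part of the original script): it assumes play_and_learn
# returns the accumulated episode reward and simply repeats it for episodes_n episodes.
rewards = []
for episode in range(episodes_n):
    total_reward = play_and_learn(env)
    rewards.append(total_reward)
    print('episode %d: total reward %.3f' % (episode, total_reward))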