# Wrappers
env = ResizeWrapper(env)
env = NormalizeWrapper(env)
env = ImgWrapper(env)  # reorder image observations from 160x120x3 (HWC) to 3x160x120 (CHW)
env = ActionWrapper(env)
# env = DtRewardWrapper(env)  # reward shaping wrapper; not used during testing
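# For reference, a minimal sketch of what an observation wrapper such as
# ImgWrapper presumably does: transpose HWC images to CHW so a PyTorch CNN can
# consume them (an assumption for illustration; the actual wrapper in this repo
# may resize or normalise differently).
import gym
import numpy as np
from gym import spaces

class ChannelFirstWrapper(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        h, w, c = env.observation_space.shape
        # Assumes observations were already scaled to [0, 1] by NormalizeWrapper.
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=(c, h, w), dtype=np.float32)

    def observation(self, obs):
        # (H, W, C) -> (C, H, W)
        return np.transpose(obs, (2, 0, 1))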

state_dim = env.observation_space.shape
action_dim = env.action_space.shape[0]
max_action = float(env.action_space.high[0])
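# With the wrappers above, state_dim is typically a channel-first image shape
# such as (3, 160, 120), action_dim is 2 (left/right wheel velocities) and
# max_action is 1.0; the exact values depend on the environment configuration.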

# Initialize the DDPG policy; net_type="cnn" selects the convolutional (image-based) networks
policy = DDPG(state_dim, action_dim, max_action, net_type="cnn")

policy.load(file_name, directory="./pytorch_models")

with torch.no_grad():
    while True:
        obs = env.reset()
        env.render()
        rewards = []
        while True:
            action = policy.predict(np.array(obs))
            obs, rew, done, misc = env.step(action)
            rewards.append(rew)
            env.render()
            if done:
                break
        print("mean episode reward:", np.mean(rewards))

# Collecting logs with the expert policy
obs = env.get_features()
EPISODES, STEPS = 20, 1000
DEBUG = False

# Note: the log file name encodes the total number of logged steps (EPISODES * STEPS) in thousands
logger = Logger(env, log_file=f'train-{int(EPISODES*STEPS/1000)}k.log')
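# For reference, a minimal sketch of what a Logger like this might provide
# (assumption: it buffers one (observation, action, reward, done, info) tuple
# per step and serialises each finished episode to log_file; the repo's Logger
# may record different fields or use another format).
import pickle

class SimpleLogger:
    def __init__(self, env, log_file):
        self.env = env
        self._file = open(log_file, 'wb')
        self._episode = []

    def log(self, observation, action, reward, done, info):
        # Buffer one step of experience.
        self._episode.append((observation, action, reward, done, info))

    def on_episode_done(self):
        # Flush the buffered episode to disk and start a new one.
        pickle.dump(self._episode, self._file)
        self._episode = []

    def close(self):
        self._file.close()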

start_time = time.time()
print(f"[INFO] Starting to collect logs for {EPISODES} episodes "
      f"of {STEPS} steps each...")
with torch.no_grad():
    # let's collect our samples
    for episode in range(EPISODES):
        for steps in range(STEPS):
            # we use our 'expert' to predict the next action.
            action = expert.predict(np.array(obs))
            # Apply the action
            observation, reward, done, info = env.step(action)
            # Get features (the state representation) for the RL agent
            obs = env.get_features()

            if done:
                print(f"#Episode: {episode}\t | #Step: {steps}")
                break

            # If there is no closest curve point, the robot has left the
            # drivable area, so end the episode early.
            closest_point, _ = env.closest_curve_point(env.cur_pos, env.cur_angle)
            if closest_point is None:
                done = True
                break
            # Cut the horizon: obs.shape = (480,640,3) --> (300,640,3)
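            # One plausible way to perform that crop (an assumption for
            # illustration; the actual slice used in the full script may differ):
            # observation = observation[180:, :, :]  # keep the lower 300 rows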