def test_worker(args, shared_model, total_steps, optimizer):
    # Evaluation worker: periodically syncs with the shared model, plays an
    # episode in an unclipped-reward environment and logs progress.
    args.environment.clip_rewards = False
    env = make_env(args.environment)

    log_path = '{}/{}'.format(args.train.experiment_folder, 'log.txt')
    logging.basicConfig(filename=log_path, level=logging.INFO)
    logging.info("STARTED TRAINING PROCESS {}".format(
        time.strftime("%Y.%m.%d_%H:%M", time.localtime())))

    # Base actor-critic model plus the auxiliary-task wrappers enabled in the
    # config (pixel control, reward prediction, value replay).
    model = ActorCritic(env.observation_space.shape, env.action_space.n)
    model = BaseWrapper(model)
    if args.train.use_pixel_control or args.train.use_reward_prediction:
        model = ExperienceWrapper(model)
    if args.train.use_pixel_control:
        model = PixelControlWrapper(model, args.train.gamma, args.train.pc_coef)
    if args.train.use_reward_prediction:
        model = RewardPredictionWrapper(model, args.train.rp_coef)
    if args.train.use_value_replay:
        model = ValueReplayWrapper(model)
    model.config = args
    model.eval()

    start_time = time.time()
    reward_history = []

    while True:
        # Pull the latest weights from the shared (training) model.
        model.load_state_dict(shared_model.state_dict())
        if (len(reward_history) + 1) % args.train.save_frequency == 0:
            save_progress(args, model, optimizer, total_steps.value)

        stats = play_game(model, env)
        reward_history.append(stats['total_reward'])

        log_message = (
            'Time {}, num steps {}, FPS {:.0f}, '
            'curr episode reward {:.2f}, mean episode reward: {:.2f}, '
            'mean policy loss {:.2f}, mean value loss {:.2f}, '
            'mean entropy percentage {:.2f}'
        ).format(
            time.strftime("%Hh %Mm %Ss", time.gmtime(time.time() - start_time)),
            total_steps.value,
            total_steps.value / (time.time() - start_time),
            stats['total_reward'],
            np.mean(reward_history[-60:]),
            stats['policy_loss'],
            stats['value_loss'],
            stats['entropy'],
        )
        if args.train.use_pixel_control:
            log_message += ', pixel control loss {:.2f}'.format(stats['pc_loss'])
        if args.train.use_reward_prediction:
            log_message += ', reward prediction loss {:.2f}'.format(stats['rp_loss'])
        if args.train.use_value_replay:
            log_message += ', value replay loss {:.2f}'.format(stats['vr_loss'])

        print(log_message)
        logging.info(log_message)
        time.sleep(60)
config.environment.episode_length_sec, 60)
config.environment.prev_frame_h = config.environment.frame_h
config.environment.prev_frame_w = config.environment.frame_w
# Record at a resolution of at least 256x256.
config.environment.frame_h = max(config.environment.frame_h, 256)
config.environment.frame_w = max(config.environment.frame_w, 256)
env = make_env(config.environment, recording=True)

# Same model construction as in training: base actor-critic plus the
# auxiliary-task wrappers enabled in the config.
model = ActorCritic(env.observation_space.shape, env.action_space.n)
model = BaseWrapper(model)
if config.train.use_pixel_control or config.train.use_reward_prediction:
    model = ExperienceWrapper(model)
if config.train.use_pixel_control:
    model = PixelControlWrapper(model, config.train.gamma, config.train.pc_coef)
if config.train.use_reward_prediction:
    model = RewardPredictionWrapper(model, config.train.rp_coef)
if config.train.use_value_replay:
    model = ValueReplayWrapper(model)
model.config = config

if cmd_args.pretrained_weights is not None:
    model.load_state_dict(torch.load(cmd_args.pretrained_weights))
else:
    print("You have not specified a path to model weights; "
          "random play will be performed")
model.eval()

results = record_video(model, env)
log_message = "evaluated on pretrained weights: {}, results: {}".format(
    cmd_args.pretrained_weights, results)
print(log_message)
logging.info(log_message)