def test_unflatten(self):
    env = TheanoEnv(
        normalize(
            gym.make('Blackjack-v0'),
            normalize_reward=True,
            normalize_obs=True,
            flatten_obs=False))
    for i in range(10):
        env.reset()
        for e in range(5):
            action = env.action_space.sample()
            next_obs, reward, done, info = env.step(action)
            # flatten() returns a 1-D array, so its shape must be compared
            # against the tuple (flat_dim,), not the bare int flat_dim.
            assert (env.observation_space.flatten(next_obs).shape
                    == (env.observation_space.flat_dim,))  # yapf: disable
            if done:
                break
    env.close()
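# A minimal sketch (not part of the test suite) of the flattening convention
# the assertion above relies on. Blackjack-v0 observations live in
# Tuple(Discrete(32), Discrete(11), Discrete(2)); in rllab-style spaces each
# Discrete component flattens to a one-hot vector and the Tuple concatenates
# them, so flat_dim == 32 + 11 + 2 == 45. The helper names below are
# hypothetical and only illustrate that convention.
import numpy as np

def one_hot(index, n):
    # One-hot encode a single Discrete value of cardinality n.
    vec = np.zeros(n)
    vec[index] = 1.0
    return vec

def flatten_blackjack_obs(obs):
    # obs is a 3-tuple: (player_sum, dealer_card, usable_ace).
    sizes = (32, 11, 2)
    return np.concatenate(
        [one_hot(component, n) for component, n in zip(obs, sizes)])

assert flatten_blackjack_obs((14, 5, 1)).shape == (45,)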
def test_flatten(self):
    env = TheanoEnv(
        normalize(
            gym.make('Pendulum-v0'),
            normalize_reward=True,
            normalize_obs=True,
            flatten_obs=True))
    for i in range(10):
        env.reset()
        for e in range(5):
            env.render()
            action = env.action_space.sample()
            next_obs, reward, done, info = env.step(action)
            assert next_obs.shape == env.observation_space.low.shape
            if done:
                break
    env.close()
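# A minimal sketch (assumed behavior, not the actual rllab implementation) of
# what normalize(..., normalize_obs=True) does: keep exponentially-weighted
# running estimates of the observation mean and variance, and rescale every
# observation with them. The class and parameter names are hypothetical.
import numpy as np

class RunningObsNormalizer:
    def __init__(self, shape, alpha=0.001):
        self.alpha = alpha            # update rate of the running estimates
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)

    def update(self, obs):
        self.mean = (1 - self.alpha) * self.mean + self.alpha * obs
        self.var = (1 - self.alpha) * self.var \
            + self.alpha * np.square(obs - self.mean)

    def normalize(self, obs):
        self.update(obs)
        return (obs - self.mean) / (np.sqrt(self.var) + 1e-8)

# Pendulum-v0 observations are 3-dimensional: (cos(theta), sin(theta), theta_dot).
normalizer = RunningObsNormalizer(shape=(3,))
print(normalizer.normalize(np.array([0.5, -0.2, 1.0])))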
f_train = theano.function(
    inputs=[observations_var, actions_var, advantages_var],
    outputs=None,
    updates=adam(grads, params, learning_rate=learning_rate),
    allow_input_downcast=True)

for _ in range(n_itr):
    paths = []

    for _ in range(N):
        observations = []
        actions = []
        rewards = []

        observation = env.reset()

        for _ in range(T):
            # policy.get_action() returns a pair of values: the action itself
            # and a dictionary whose values contain sufficient statistics for
            # the action distribution. It should at least contain the entries
            # that would be returned by calling policy.dist_info(), which is
            # the non-symbolic analog of policy.dist_info_sym(). Storing these
            # statistics is useful, e.g., when forming importance sampling
            # ratios. In our case it is not needed.
            action, _ = policy.get_action(observation)

            # Recall that the last entry of the tuple stores diagnostic
            # information about the environment. In our case it is not needed.
            next_observation, reward, terminal, _ = env.step(action)
            observations.append(observation)
            actions.append(action)
            rewards.append(reward)
            observation = next_observation
            if terminal:
                # Finish the rollout if we reach a terminal state.
                break
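# A short, self-contained illustration (not from the tutorial) of the
# importance sampling ratios mentioned in the comment above. Suppose the
# stored agent_info dicts hold the mean and log_std of a diagonal Gaussian
# action distribution; the per-step ratio between a new policy and the
# behavior policy is then the ratio of their action likelihoods.
import numpy as np

def gaussian_log_likelihood(action, mean, log_std):
    # Log density of a diagonal Gaussian evaluated at the sampled action.
    var = np.exp(2 * log_std)
    return np.sum(-0.5 * np.log(2 * np.pi) - log_std
                  - np.square(action - mean) / (2 * var))

def importance_ratio(action, old_info, new_info):
    # exp(log pi_new(a|s) - log pi_old(a|s))
    return np.exp(
        gaussian_log_likelihood(action, new_info['mean'], new_info['log_std'])
        - gaussian_log_likelihood(action, old_info['mean'], old_info['log_std']))

# Example: the new policy shifted its mean slightly, so the action sampled
# under the old policy gets re-weighted accordingly.
a = np.array([0.3])
old = {'mean': np.array([0.0]), 'log_std': np.array([0.0])}
new = {'mean': np.array([0.1]), 'log_std': np.array([0.0])}
print(importance_ratio(a, old, new))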