noise = np.clip(noise, -0.5, 0.5)
action += noise
action = np.clip(action, -1, 1)
action = np.reshape(action, newshape=(2,))

# Perform action, and get new information:
new_state, reward, done, info = env.step(action)

# Save reward:
episodic_reward += reward

# Store new values in buffer:
action = np.squeeze(action)
buffer.record((state, action, reward, new_state))

# Update state with the new one:
state = new_state

""" Update / Learn """
# Sample from the buffer:
s_batch, a_batch, r_batch, ns_batch = buffer.batch_sample()
s_batch = tf.convert_to_tensor(s_batch)
a_batch = tf.convert_to_tensor(a_batch)
r_batch = tf.convert_to_tensor(r_batch)
ns_batch = tf.convert_to_tensor(ns_batch)

# Select action according to the target actor/policy:
next_action = actor_target(ns_batch)
next_action = np.clip(next_action, -1, 1)

with tf.GradientTape(persistent=True) as tape:
    # Target Q values via target critic networks (next state, next action):
    q1_ = critic_target([ns_batch, next_action])
    # Choose minimum from these two values for the double Q update rule
    # Calculate actual Q value:
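    # Note: the double Q update rule takes the elementwise minimum of the two
    # target critics' estimates when forming the Bellman target, roughly
    #   y = r_batch + gamma * min(q1_, q2_)
    # (gamma and a second target-critic output q2_ are assumed here; this
    # excerpt shows only q1_ before it truncates)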