Example #1

                # update the model-based dynamics module: encode the action as a
                # one-hot vector, concatenate it with the current state, and
                # train the BBN dynamics model on the observed transition
                action_np_vec = np.zeros([1, 6])
                action_np_vec[0, action - 1] = 1.
                action_vec = torch.from_numpy(action_np_vec).float().cuda()
                current_state_action = torch.cat([state, action_vec], 1)
                BBN_dynamic.train(current_state_action, next_state)
                
                # info gain: compare the updated hyperparameters of the BBN
                # against the ones saved before this training step
                hyperparameters = BBN_dynamic.dump_hyparameters()
                info_gain = BBN_dynamic.get_info_gain(hyperparameters, pre_hyperparameters)
                
             
            # Store the transition in memory, shaping the reward with an
            # information-gain bonus annealed to zero over training
            agent.store_transition(state, action - 1, reward + info_gain * ratio * (1 - epoch / epochs))

            print('epoch: %d, image: %d, step: %d, reward: %d' %
                  (epoch, i, step, reward))

            # Move to the next state
            state = next_state

            # Perform the optimization 
            if done:
                print("updating model !")
                agent.REINFORCE()
                print("finish updating model !")
                break
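
Note: the shaped reward above adds an information-gain bonus that decays linearly as training progresses. A minimal sketch of that schedule, assuming scalar inputs and the ratio coefficient from the snippet (illustrative only, not code from the source repository):

def shaped_reward(reward, info_gain, ratio, epoch, epochs):
    """Extrinsic reward plus an information-gain bonus annealed
    linearly from full strength (epoch 0) to zero (final epoch)."""
    return reward + info_gain * ratio * (1 - epoch / epochs)

print(shaped_reward(1.0, 0.5, 0.1, 0, 100))    # ~1.05: bonus fully applied
print(shaped_reward(1.0, 0.5, 0.1, 100, 100))  # 1.0: bonus fully annealed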
Example #2

            else:
                offset, region_image, size_mask, region_mask = get_crop_image_and_mask(
                    original_shape, offset, region_image, size_mask, action)
                # update history vector and get next state
                history_vector = update_history_vector(history_vector, action)
                next_state = get_state(region_image, history_vector, model_vgg)

                # find the max bounding box in the region image
                new_iou = find_max_bounding_box(gt_masks, region_mask,
                                                classes_gt_objects,
                                                CLASS_OBJECT)
                reward = get_reward_movement(iou, new_iou)
                iou = new_iou

            # Store the transition in memory
            agent.store_transition(state, action - 1, reward)

            print('epoch: %d, image: %d, step: %d, reward: %d' %
                  (epoch, i, step, reward))

            # Move to the next state
            state = next_state

            # Perform the optimization
            if done:
                print("updating model !")
                agent.REINFORCE()
                print("finish updating model !")
                break

        #==================== loop of training procedure ==========================================#
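
Each example ends an episode by calling agent.REINFORCE(). The body of that method is not shown in these snippets; the sketch below is a generic Monte Carlo policy-gradient (REINFORCE) update, under the assumption that the agent buffers per-step log-probabilities and rewards and exposes hypothetical attributes log_probs, rewards, gamma, and optimizer:

import torch

def reinforce_update(agent):
    """Generic REINFORCE update: discounted returns weighted by the
    negative log-probabilities of the actions that were taken."""
    returns, G = [], 0.0
    for r in reversed(agent.rewards):      # discounted return per step
        G = r + agent.gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    # normalize returns to reduce gradient variance
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    loss = torch.stack([-log_p * G
                        for log_p, G in zip(agent.log_probs, returns)]).sum()

    agent.optimizer.zero_grad()
    loss.backward()
    agent.optimizer.step()

    del agent.log_probs[:]                 # reset the episode buffers
    del agent.rewards[:]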
Example #3

                # train the TUC encoder-decoder dynamics model on the transition
                TUC_dynamic.train_enc_dec(state, next_state, action_vec)

                if step > 0:
                    # intrinsic reward: how much the latent z-distribution
                    # moved relative to the previous step
                    mean, std = TUC_dynamic.dump_z_mean_std(state, action_vec)
                    intrinsic_reward = TUC_dynamic.dump_exploration_reward(
                        pre_mean, pre_std, mean, std)

            if i > 0:
                # regret penalty for the chosen action
                penalty = TUC_dynamic.dump_regret(state, action - 1)

            #print(intrinsic_reward, penalty)
            #print(ratio_1*((epochs-epoch)/epochs)*intrinsic_reward - ratio_2*(epoch/epochs)*penalty)

            # Store the transition in memory: the exploration bonus is
            # annealed down over training while the regret penalty is
            # annealed up
            agent.store_transition(
                state, action - 1, reward + ratio_1 *
                ((epochs - epoch) / epochs) * intrinsic_reward - ratio_2 *
                (epoch / epochs) * penalty)

            print('epoch: %d, image: %d, step: %d, reward: %d' %
                  (epoch, i, step, reward))

            # Move to the next state
            state = next_state

            # Perform the optimization
            if done:
                states = torch.cat(agent.states)
                values = torch.tensor(np.expand_dims(np.array(
                    agent.get_values()),
                                                     axis=1),
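
Example #3 shapes the reward in two directions at once: the intrinsic exploration bonus fades out as training progresses while the regret penalty fades in. A minimal sketch of that schedule, assuming scalar inputs and the ratio_1/ratio_2 coefficients from the snippet (illustrative only):

def shaped_reward(reward, intrinsic_reward, penalty,
                  ratio_1, ratio_2, epoch, epochs):
    """Anneal the exploration bonus down and the regret penalty up,
    both linearly over the course of training."""
    explore_weight = (epochs - epoch) / epochs   # 1 -> 0
    penalty_weight = epoch / epochs              # 0 -> 1
    return (reward
            + ratio_1 * explore_weight * intrinsic_reward
            - ratio_2 * penalty_weight * penalty)

print(shaped_reward(1.0, 0.5, 0.5, 0.2, 0.2, 0, 100))    # ~1.1: bonus only
print(shaped_reward(1.0, 0.5, 0.5, 0.2, 0.2, 100, 100))  # ~0.9: penalty only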