Python TUC.train_critic Examples

Programming language: Python

Namespace/package name: Agent

Class/type: TUC

Method/function: train_critic

Number of examples on hotexamples.com: 2

Python TUC.train_critic: 2 code examples found. These are real-world Python examples of Agent.TUC.train_critic extracted from open-source projects, selected from the top-rated ones.

Frequently used methods

TUC(3)

load_model(3)

dump_regret(2)

dump_z_mean_std(2)

train_critic(2)

dump_exploration_reward(1)

dump_z(1)

save_model(1)

train_enc_dec(1)

train_tuc(1)

Code example #1

File: Run_Cart_Pole_TUC_KL_penalty_5.py  Project: johanesn/Wei-Lin-Liao

            done = 1

        if done:
            # Record the total reward of the finished episode
            EPs_total_reward.append(EP_reward_sum)

            # Stack the per-step lists into arrays for training
            states = np.vstack(states)
            next_states = np.vstack(next_states)
            actions = np.vstack(actions)
            values = get_return(values, 0.95)
            values = np.vstack(values)

            # Train the critic for CRITIC_EPOCHS passes over this episode
            for _ in range(CRITIC_EPOCHS):
                critic_loss = tuc.train_critic(states, values, actions)

            # Reset the buffers for the next episode
            states = []
            next_states = []
            actions = []
            values = []

            # Print the total reward
            print("PG episode : {0: <5} , total reward : {1: <5}".format(EP, EP_reward_sum))

            # Episode finished: train the policy-gradient agent
            PG_agent.agent_REINFORCE()

            break
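Example #1 passes `values` through `get_return(values, 0.95)` before training the critic, but that helper is not part of the excerpt. Assuming it turns per-step rewards into discounted returns G_t = r_t + γ·G_{t+1} with γ = 0.95 (an assumption based only on the call site), a minimal pure-Python sketch looks like this; the name `discounted_returns` is hypothetical.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.95) -> List[float]:
    """Hypothetical stand-in for get_return: G_t = r_t + gamma * G_{t+1}.

    Assumes `rewards` holds the per-step rewards of one finished episode.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Usage: three rewards of 1.0 with gamma = 0.5
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Iterating backwards keeps the computation linear in the episode length. In the excerpt, the resulting list is then stacked with `np.vstack` into an (N, 1) column before being handed to `train_critic`.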

Code example #2

File: Run_PG_TUC.py  Project: johanesn/Wei-Lin-Liao

            # Move to the next state
            state = next_state

            # Perform the optimization
            if done:

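                states = torch.cat(agent.states)
                # Per-step values from the agent as an (N, 1) float32 tensor
                values = torch.tensor(
                    np.expand_dims(np.array(agent.get_values()), axis=1),
                    dtype=torch.float32)

                actions_matrix = torch.cat(actions_matrix, 0)

                # Move the batch to the GPU once, then train the critic for 3 epochs
                states, values, actions_matrix = states.cuda(), values.cuda(), actions_matrix.cuda()
                for _ in range(3):
                    TUC_dynamic.train_critic(states, values, actions_matrix)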

                print("updating agent !")
                agent.REINFORCE()

                break

        #==================== loop of training procedure ==========================================#

    time_cost = time.time() - now
    print('epoch = %d, time_cost = %.4f' % (epoch, time_cost))

# save the whole model
agent.save_model("./model_final/pg_TUC_agent_2")
TUC_dynamic.save_model("./model_final/pg_TUC_2")
print('Complete')
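Both examples call `train_critic` several times on the same episode batch (`CRITIC_EPOCHS` times in Example #1, three times in Example #2). The TUC implementation is not shown, so the following is only a generic sketch of that pattern, assuming the critic is a regressor fit to the returns by mean squared error. The linear model, the function name `train_linear_critic`, and the learning rate are all hypothetical; the feature rows stand in for the state and action data passed to `train_critic`.

```python
from typing import List, Sequence, Tuple


def train_linear_critic(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    epochs: int = 3,
    lr: float = 0.1,
) -> Tuple[List[float], float, float]:
    """Hypothetical stand-in for a critic update; not the project's TUC code.

    Fits V(x) = w . x + b to the targets with full-batch gradient descent
    on the mean squared error. Returns (weights, bias, final MSE).
    """
    n, dim = len(features), len(features[0])
    w = [0.0] * dim
    b = 0.0
    # One gradient step per epoch, mirroring one train_critic call per epoch
    for _ in range(epochs):
        preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features]
        errors = [p - y for p, y in zip(preds, targets)]
        # Gradient of mean((pred - y)^2) with respect to each weight and the bias
        grad_w = [2.0 / n * sum(e * x[j] for e, x in zip(errors, features)) for j in range(dim)]
        grad_b = 2.0 / n * sum(errors)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features]
    mse = sum((p - y) ** 2 for p, y in zip(preds, targets)) / n
    return w, b, mse


# Usage: targets that are exactly linear in the features
w, b, mse = train_linear_critic([[0, 1], [1, 0], [1, 1]], [1.0, 2.0, 3.0], epochs=2000)
print(w, b, mse)
```

With these inputs the targets satisfy 2·x1 + 1·x2 + 0 exactly, so enough epochs drive the weights toward [2, 1], the bias toward 0, and the error toward zero. A handful of epochs, as in the examples, only moves the critic part of the way on each episode.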