Example #1
import os

import gym
import numpy as np

# DQN, model_path, epsilon, batch_size, and episode are assumed to be
# defined elsewhere in the project.
def main():
    # Create the CartPole environment and the network
    env = gym.make('CartPole-v0').unwrapped
    if not os.path.exists(model_path):
        raise Exception("You should train the DQN first!")
    net = DQN(n_state=env.observation_space.shape[0],
              n_action=env.action_space.n,
              epsilon=epsilon,
              batch_size=batch_size,
              model_path=model_path)
    net.load()
    net.cuda()
    reward_list = []
    for i in range(episode):
        s = env.reset()
        total_reward = 0
        while True:
            # env.render()

            # Select action and obtain the reward
            a = net.chooseAction(s)
            s_, r, finish, _ = env.step(a)

            total_reward += r
            if finish:
                print("Episode: %d \t Total reward: %d \t Eps: %f" %
                      (i, total_reward, net.epsilon))
                reward_list.append(total_reward)
                break
            s = s_
    env.close()
    print("Testing average reward: ", np.mean(reward_list))
Example #2

def main():
    # Create the CartPole environment and the network
    env = gym.make('CartPole-v0').unwrapped
    net = DQN(n_state=env.observation_space.shape[0],
              n_action=env.action_space.n,
              epsilon=epsilon,
              epsilon_decay=epsilon_decay,
              update_iter=update_iter,
              batch_size=batch_size,
              gamma=gamma,
              model_path=model_path)
    net.cuda()
    net.load()
    reward_list = []
    for i in range(episode):
        s = env.reset()
        total_reward = 0
        while True:
            # env.render()
            # Select action and obtain the reward
            a = net.chooseAction(s)
            s_, r, finish, info = env.step(a)

            # Record the total reward
            total_reward += r

            # Revise the reward
            if finish:
                # If the episode has ended, set the reward to 0 so the network can converge
                r = 0
            else:
                # ----------------------------------------------------
                #   Decompose the reward to give the agent more precise
                #   feedback about the environment
                # 1.  r1 captures the cart-position (distance) information;
                #     the -abs term encourages the agent not to move the cart,
                #     since staying near the center yields a high reward!