num_bins = [3, 20, 3, 6, 6, 6, 3, 3]
num_pos_actions = len(actions)

env = gym.make('LunarLander-v2')  # Create the environment before passing it in
q_learning = QLearning(env=env, num_bins=num_bins, num_pos_actions=num_pos_actions,
                       env_ranges=env_ranges, discount=0, episodes=0,
                       epsilon=None, lr=None, USE=True)
q_learning.q_table = np.load('./data_lunarlander/0_9000.npy')  # Load the trained q-table

for _ in range(10):
    obs = q_learning.reset_state()  # Reset the environment and get the initial observation
    done = False
    while not done:
        action = q_learning.action_to_maximise_q(obs)  # Greedy action from the q-table
        obs, reward, done = q_learning.perform_sim_step(action)
        print(obs, reward, done)
        q_learning.env.render()

env.close()
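For the q-table lookup above to work, the continuous LunarLander observation has to be mapped onto discrete bin indices. The QLearning class's internals aren't shown here, so the following is only a minimal sketch of how that binning could work, assuming each dimension is split linearly between the limits in env_ranges; the function name discretise is illustrative, not the class's actual method.

# Sketch only: assumes env_ranges is a list of (low, high) pairs, one per
# observation dimension, and that binning is linear. Not the verified internals.
import numpy as np

def discretise(obs, env_ranges, num_bins):
    """Map each continuous observation value to an integer bin index."""
    indices = []
    for value, (low, high), bins in zip(obs, env_ranges, num_bins):
        value = np.clip(value, low, high)  # Out-of-range values still get a valid bin
        fraction = (value - low) / (high - low)  # Scale to [0, 1]
        indices.append(min(int(fraction * bins), bins - 1))
    return tuple(indices)  # A tuple indexes directly into the q-table

With num_bins = [3, 20, 3, 6, 6, 6, 3, 3], an 8-dimensional observation would map to a tuple such as (1, 7, 0, 3, 2, 5, 1, 2), giving a q-table of shape (3, 20, 3, 6, 6, 6, 3, 3, num_pos_actions).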
epsilon = [0.5, 1.0, episodes // 2]  # Epsilon start value, start decay episode, stop decay episode
lr = [0.5, 1.0, episodes // 2]  # Learning rate start value, start decay episode, stop decay episode

q_learning = QLearning(env, num_bins, num_pos_actions, env_ranges, discount, episodes, epsilon, lr)
print('q-table shape', q_learning.q_table.shape)

obs = q_learning.reset_state()  # Reset the environment and get the initial observation
obs = [obs[i] for i in obs_to_use]  # Keep only the observation dimensions we train on
print('\nInitial observation:', obs)

action_to_maximise_q = q_learning.action_to_maximise_q(obs)  # Find the optimal action
action = q_learning.decide_on_action(action_to_maximise_q)  # Choose the optimal or a random action
observation, reward_current, done = q_learning.perform_sim_step(action)  # Wraps env.step(action); perform the first action

NUM_TO_SHOW = 5
rewards = []
while q_learning.episode < q_learning.episodes:
    reward_sum = 0
    if not q_learning.episode % (episodes // NUM_TO_SHOW):
        render = True
        print('episode, learning_rate, epsilon',
              q_learning.episode, q_learning.lr, q_learning.epsilon)  # Attribute names assumed
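The [start value, start decay episode, stop decay episode] convention above suggests a linear decay for both epsilon and the learning rate. The sketch below shows one plausible reading of that schedule; it is an assumption about what QLearning does internally, not its verified code.

# Sketch only: linear decay from the start value to zero across the decay
# window, held constant outside it. decayed_value is a hypothetical helper.
def decayed_value(schedule, episode):
    """Linearly decay a value toward zero over the decay window."""
    start_value, decay_start, decay_stop = schedule
    if episode <= decay_start:
        return start_value
    if episode >= decay_stop:
        return 0.0
    progress = (episode - decay_start) / (decay_stop - decay_start)
    return start_value * (1.0 - progress)

# e.g. with episodes = 10000, decayed_value([0.5, 1.0, 5000], 2500) is roughly
# 0.25: halfway through the decay window, half the starting value remains.

Decaying epsilon this way shifts the agent from exploration toward exploitation as training progresses, while the matching learning-rate decay stabilises the q-table values in the later episodes.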