def train():
    # `squ` and the RL module are assumed to be defined elsewhere in the project.
    nn = RL.RL([squ, 10 * squ, 10 * squ, 10 * squ, squ])
    RL.train(False, nn)
# Set rewards
R[:, 15] = 100   # goal state
R[:, 9] = -70    # bad state
R[:, 16] = 0     # end state

# Discount factor: scalar in [0, 1)
discount = 0.95

# MDP object
mdp = MDP.MDP(T, R, discount)

# RL problem
rlProblem = RL.RL(mdp, np.random.normal)

# Test Q-learning
print("\nepsilon = 0.05")
Q = np.zeros([mdp.nActions, mdp.nStates])
policy = np.zeros(mdp.nStates, int)
c_reward = np.zeros(200)
for i in range(100):
    [Q_t, policy_t, cum_reward_t] = rlProblem.qLearning(
        s0=0,
        initialQ=np.zeros([mdp.nActions, mdp.nStates]),
        nEpisodes=200,
        nSteps=100,
        epsilon=0.05)
    Q += Q_t
    c_reward += cum_reward_t
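# --- A minimal follow-up sketch (an editorial addition, not part of the ---
# --- original script). The loop above accumulates Q_t and cum_reward_t  ---
# --- over 100 runs but never averages them or uses policy_t. Assuming   ---
# --- Q is indexed [action, state] as initialized above, the averaged    ---
# --- results and the induced greedy policy could be read off as:        ---
Q /= 100
c_reward /= 100
policy = np.argmax(Q, axis=0)  # greedy action per state
print("Averaged Q-values:\n", Q)
print("Greedy policy:", policy)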
from gridworld import GridWorld1
import RL
import gridrender as gui
import matplotlib.pyplot as plt
import numpy as np
import time

################################################################################
# Initialization
################################################################################
env = GridWorld1
n_states = env.n_states
n_actions = len(env.action_names)
model = RL.RL(env)

# Estimate the initial state distribution
n_start = 10000
model.estimate_start_distribution(n_start)
print(f'Estimated start state distribution is {model.mu} after {n_start} throws')

# Compute Tmax such that the discounted truncated sum of rewards is
# delta-close to the infinite sum: since log(1/gamma) >= 1 - gamma,
# T = -log(delta) / (1 - gamma) guarantees gamma^T <= delta.
delta = 0.01
tmax = -int(np.log(delta) / (1 - env.gamma))
print(f'Tmax (max number of iterations in an episode) is chosen as: {tmax}')

################################################################################
# Q4: Policy evaluation
################################################################################
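################################################################################
# Aside: numeric check of the Tmax bound (an editorial sketch, not part of the
# original script; gamma = 0.95 is an assumed value for illustration).
# The tail of the discounted return satisfies
#     sum_{t >= T} gamma^t * r_t  <=  gamma^T * r_max / (1 - gamma),
# so once gamma^T <= delta the truncated sum is delta-close to the infinite
# sum (up to the r_max / (1 - gamma) factor).
################################################################################
import numpy as np

gamma = 0.95                                 # assumed discount for illustration
delta = 0.01
tmax = -int(np.log(delta) / (1 - gamma))     # same formula as in the script above
print(gamma ** tmax <= delta)                # True: the tail weight is below delta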