Python RL.finiteMDP Examples

Programming language: Python

Class/Type: RL

Method/Function: finiteMDP

Examples on hotexamples.com: 2

Python RL.finiteMDP - 2 examples found. These are top-rated real-world Python examples of RL.finiteMDP, extracted from open source projects.
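
Both examples on this page call RL.finiteMDP with a state count, an action count, the value 0.9, and three arrays (Pl, Rl, absorv). The sketch below checks arrays of the shapes those calls use before they are handed to such a constructor. It is an illustration only: `check_mdp_arrays` is a hypothetical helper, not part of the RL module, and the `[state, action, next_state]` index order is inferred from how the examples fill Pl.

```python
import numpy as np


def check_mdp_arrays(n_states, n_actions, P, R, absorbing):
    """Return a list of problems found in the MDP arrays (empty if none).

    Hypothetical helper, not part of the RL module used on this page.
    Assumes P is indexed as P[state, action, next_state].
    """
    problems = []
    if P.shape != (n_states, n_actions, n_states):
        problems.append(f"P has shape {P.shape}")
    else:
        # Each P[s, a, :] should be a probability distribution over next states.
        row_sums = P.sum(axis=2)
        for s, a in np.argwhere(~np.isclose(row_sums, 1.0)):
            problems.append(f"P[{s}, {a}, :] sums to {row_sums[s, a]:.3f}, not 1")
    if R.shape != (n_states, n_actions):
        problems.append(f"R has shape {R.shape}")
    if absorbing.shape != (n_states, 1):
        problems.append(f"absorbing has shape {absorbing.shape}")
    return problems


# Toy 3-state chain: action 0 moves right, action 1 moves left.
P = np.zeros((3, 2, 3))
P[0, 0, 1] = 1
P[1, 0, 2] = 1
P[2, 0, 2] = 1
P[0, 1, 0] = 1
P[1, 1, 0] = 1
P[2, 1, 1] = 1

R = np.zeros((3, 2))
R[[0, 2], :] = 1

absorbing = np.zeros((3, 1))
absorbing[[0, 2]] = 1

problems = check_mdp_arrays(3, 2, P, R, absorbing)

# Break one row on purpose to see the check fire.
P_bad = P.copy()
P_bad[1, 0, 2] = 0.5
bad_problems = check_mdp_arrays(3, 2, P_bad, R, absorbing)
print(problems, bad_problems)
```

A check like this catches a row of Pl that was left at zero or only partly filled, which is easy to do when transitions are assigned one entry at a time.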

Example #1

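import numpy as np
import RL  # project module that defines finiteMDP

# Excerpt: the earlier lines of the file, including the creation of Pl, are not shown.
# Pl[s, a, s2] is the probability of moving from state s to state s2 under action a.
Pl[5, 0, 5] = 0.1
Pl[6, 0, 6] = 1
Pl[0, 1, 0] = 1
Pl[1, 1, 1] = 0
Pl[1, 1, 0] = 1
Pl[2, 1, 1] = 1
Pl[3, 1, 2] = 1
Pl[4, 1, 3] = 1
Pl[5, 1, 4] = 1
Pl[6, 1, 5] = 1

# Reward 1 for every action in the two end states (0 and 6), 0 elsewhere.
Rl = np.zeros((7, 2))
Rl[[0, 6], :] = 1
# absorv flags states 0 and 6 (presumably the absorbing states).
absorv = np.zeros((7, 1))
absorv[[0, 6]] = 1
# 7 states, 2 actions; 0.9 is presumably the discount factor.
fmdp = RL.finiteMDP(7, 2, 0.9, Pl, Rl, absorv)

# Collect a trajectory under the exploration policy, then estimate Q from it.
J, traj = fmdp.runPolicy(10000, 3, poltype="exploration")  # choose this value
data = np.load("Q1.npz")
Qr = fmdp.traces2Q(traj)
# Pass if the Frobenius distance to the reference Q is below 1.
if np.sqrt(np.sum((data['Q1'] - Qr) ** 2)) < 1:
    print("Aproximação de Q dentro do previsto. OK\n")  # "Q approximation within the expected range."
else:
    print("Aproximação de Q fora do previsto. FAILED\n")  # "Q approximation outside the expected range."

# Run the exploitation policy with the estimated Q and compare with the reference trajectory.
J, traj = fmdp.runPolicy(3, 3, poltype="exploitation", polpar=Qr)
if np.sqrt(np.sum((data['traj2'] - traj) ** 2)) < 1:
    print("Trajectória óptima. OK\n")  # "Optimal trajectory."
else:
    print("Trajectória não óptima. FAILED\n")  # "Non-optimal trajectory."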
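
The example above turns a recorded trajectory into a Q table with fmdp.traces2Q, whose implementation is not shown on this page. As a rough illustration of what such a step can look like, here is a minimal sketch of tabular Q-learning replayed over stored transitions. It is not the RL module's code: `q_from_traces` is a hypothetical function, and the row layout `(state, action, next_state, reward)` is an assumption about the trace format.

```python
import numpy as np


def q_from_traces(traces, n_states, n_actions, gamma=0.9, alpha=0.5, sweeps=500):
    """Estimate Q by replaying recorded transitions with the Q-learning update.

    Hypothetical sketch. Assumes each row of `traces` is
    (state, action, next_state, reward).
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, s_next, r in traces:
            s, a, s_next = int(s), int(a), int(s_next)
            # One-step bootstrapped target: reward plus discounted best next value.
            target = r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])
    return Q


# Toy deterministic MDP with 2 states: action 0 stays, action 1 switches state.
# The reward is 1 when the current state is 1, otherwise 0.
toy_traces = np.array([
    [0, 0, 0, 0.0],
    [0, 1, 1, 0.0],
    [1, 0, 1, 1.0],
    [1, 1, 0, 1.0],
])

Q_est = q_from_traces(toy_traces, n_states=2, n_actions=2)
greedy = Q_est.argmax(axis=1)  # greedy action per state
print(Q_est)
print(greedy)
```

For this toy problem the Bellman optimality equations can be solved by hand (Q(1,0) = 10, Q(0,1) = 9, Q(0,0) = 8.1, Q(1,1) = 9.1), so the estimate can be checked against a known answer, in the same spirit as the comparison with the stored reference Q in the example.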

Example #2

File: RL-TestSet2.py Project: Keyaku/ist-ia-projects

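import numpy as np
import RL  # project module that defines finiteMDP

# Excerpt: the earlier lines of the file, including the creation of Pl, are not shown.
# Row s of each Pl[:, a, :] matrix is the next-state distribution for state s under action a.
Pl[:, 1, :] = np.array([[0, 0, 1, 0],
                        [0, 0, 0, 1],
                        [0, 0, 1, 0],
                        [0, 0, 0, 1]])

Pl[:, 2, :] = np.array([[1, 0, 0, 0],
                        [1, 0, 0, 0],
                        [0, 0, 1, 0],
                        [0, 0, 1, 0]])

Pl[:, 3, :] = np.array([[0, 1, 0, 0],
                        [0, 1, 0, 0],
                        [0, 0, 0, 1],
                        [0, 0, 0, 1]])

# Rewards indexed as Rl[state, action].
Rl = np.array([[-1, -1, -1, 0],
               [-1, 0, -1, -1],
               [-1, -1, -1, 0],
               [-1, 0, -1, 0]])

# absorv flags only the last state (presumably the absorbing state).
absorv = np.zeros((4, 1))
absorv[-1] = 1

# 4 states, 4 actions; 0.9 is presumably the discount factor.
fmdp = RL.finiteMDP(4, 4, 0.9, Pl, Rl, absorv)

# Collect a trajectory under the exploration policy, then estimate Q from it.
J, traj = fmdp.runPolicy(3000, 0, poltype="exploration")
data = np.load("Q2.npz")
Qr = fmdp.traces2Q(traj)
# Pass if the Frobenius distance to the reference Q is below 1.
result = np.sqrt(np.sum((data['Q1'] - Qr) ** 2))
if result < 1:
    print("Aproximação de Q dentro do previsto. OK\n")  # "Q approximation within the expected range."
else:
    print("Aproximação de Q fora do previsto. FAILED\n")  # "Q approximation outside the expected range."

# Run the exploitation policy with the estimated Q and compare with the reference trajectory.
J, traj = fmdp.runPolicy(3, 1, poltype="exploitation", polpar=Qr)
result = np.sqrt(np.sum((data['traj2'] - traj) ** 2))
if result < 1:
    print("Trajectória óptima. OK\n")  # "Optimal trajectory."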
else: