Python LSPI, rlpy Examples

Programming language: Python

Class/Type: LSPI

Examples at hotexamples.com: 3

Python LSPI - 3 examples found. These are the top-rated real-world Python examples of LSPI from rlpy, extracted from open source projects. You can rate examples to help improve their quality.

Frequently Used Methods

process(1)

representationExpansionLSPI(1)

run_LSPI(1)

Example #1

File: LSPI_SARSA.py Project: irenge/RLPy

# Agent, SARSA and LSPI come from the RLPy project; the excerpt omits the imports.
class LSPI_SARSA(Agent):
    def __init__(self, representation, policy, domain, logger,
                 lspi_iterations=5, sample_window=100, epsilon=1e-3,
                 re_iterations=100, initial_alpha=.1, lambda_=0,
                 alpha_decay_mode='dabney', boyan_N0=1000):
        self.SARSA = SARSA(representation, policy, domain, logger,
                           initial_alpha, lambda_, alpha_decay_mode, boyan_N0)
        self.LSPI = LSPI(representation, policy, domain, logger,
                         lspi_iterations, sample_window, epsilon, re_iterations)
        super(LSPI_SARSA, self).__init__(representation, policy, domain, logger)

    def learn(self, s, a, r, ns, na, terminal):
        # Hand the transition to LSPI first.
        self.LSPI.process(s, a, r, ns, na, terminal)
        # `%` binds tighter than `+`, so the sum must be parenthesized.
        if (self.LSPI.samples_count + 1) % self.LSPI.steps_between_LSPI == 0:
            self.LSPI.representationExpansionLSPI()
            if terminal:
                self.episodeTerminated()
        else:
            # Between LSPI runs, fall back to an ordinary SARSA update.
            self.SARSA.learn(s, a, r, ns, na, terminal)
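
The periodic check in learn depends on operator precedence: in Python, % binds tighter than +, so n + 1 % k is evaluated as n + (1 % k). The sketch below is a minimal illustration and not part of the RLPy source; the function names and the period of 100 are hypothetical stand-ins for samples_count and steps_between_LSPI.

```python
# Hypothetical stand-ins for LSPI.samples_count and LSPI.steps_between_LSPI.
def fires_unparenthesized(samples_count, period):
    # Parsed as: samples_count + (1 % period) == 0
    return samples_count + 1 % period == 0


def fires_parenthesized(samples_count, period):
    # Parsed as intended: every `period`-th sample triggers the check.
    return (samples_count + 1) % period == 0


period = 100
unparen = [n for n in range(300) if fires_unparenthesized(n, period)]
paren = [n for n in range(300) if fires_parenthesized(n, period)]

print(unparen)  # → []
print(paren)    # → [99, 199, 299]
```

With a period of 100, the parenthesized form fires at counts 99, 199 and 299, while the unparenthesized form never fires for a non-negative count.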

Example #2

File: LSPI_SARSA.py Project: irenge/RLPy

def __init__(self, representation, policy, domain, logger,
             lspi_iterations=5, sample_window=100, epsilon=1e-3, re_iterations=100,
             initial_alpha=.1, lambda_=0, alpha_decay_mode='dabney', boyan_N0=1000):
    self.SARSA = SARSA(representation, policy, domain, logger,
                       initial_alpha, lambda_, alpha_decay_mode, boyan_N0)
    self.LSPI = LSPI(representation, policy, domain, logger,
                     lspi_iterations, sample_window, epsilon, re_iterations)
    super(LSPI_SARSA, self).__init__(representation, policy, domain, logger)

Example #3

File: test.py Project: canmanietp/approx_pol

# import sampler
# 
# samples = sampler.sample(50)
# print(samples)

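# LSPI and sampler appear to be modules local to the approx_pol project.
import LSPI
import matplotlib.pyplot as plt
import sampler

# Run LSPI; it returns a policy and a sequence of distances.
pi, distances = LSPI.run_LSPI()

# Plot the distances returned by the run.
plt.plot(distances)
plt.show()

# Hand the resulting policy back to the sampler.
sampler.use_policy(pi)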