Python PolicyGradient.ep Examples

Programming language: Python

Namespace/package name: policy_gradient

Class/type: PolicyGradient

Method/function: ep

Examples at hotexamples.com: 2

Python PolicyGradient.ep - 2 examples found. These are the top-rated real-world Python examples of policy_gradient.PolicyGradient.ep, extracted from open source projects.

Frequently used methods

PolicyGradient(30)

learn(14)

store_transition(13)

choose_action(11)

plot_cost(3)

choose_action1(2)

ep(2)

get_distribution(2)

solve_environment(1)

run_simulation(1)

run(1)

quiet(1)

learning(1)

plot(1)

paper(1)

multi_solve_environment(1)

game_rewards(1)

episode_rewards(1)

discount_rewards(1)

costs(1)

train(1)

Example #1

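# Excerpt: env, PG, stepIdx, ax, nOfenb and nOfchannel are not defined in this
# fragment and must come from elsewhere in the original project.
import matplotlib.pyplot as plt
import numpy as np

currIt = 0
rd = []
plt.ion()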
try:
    while True:
        print("Start iteration: ", currIt)
        obs = env.reset()
        print("Step: ", stepIdx)
        print("---obs:", obs)
        while True:
            reward = 0
            matrixOfChanAlloc = np.zeros((nOfenb, nOfchannel))

            stepIdx += 1
            if stepIdx % 100 == 0:
                PG.ep = PG.ep * 0.7
            ax.append(stepIdx)
            print("stepIdx: ", stepIdx)
            # ax.append(stepIdx)
            # ---------------------------------------------------------------------------------------
            observation = []  # environment observations, i.e. the state
            for j in range(len(obs) // 4):
                # state
                observation.append([
                    obs[4 * j], obs[4 * j + 1], obs[4 * j + 2], obs[4 * j + 3]
                ])
            action_list = []
            print("obs: ", obs)
            if len(observation) == 0:
                observation_step = [0, 0, 0, 0]
                ss = observation[k].copy()
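
The loop in Example #1 that slices `obs` into groups of four values can be sketched as a small standalone helper. This is only an illustration: the name `group_observation` and the `width` parameter are hypothetical and do not come from the original project, and the sketch assumes, as the loop does, that any leftover values that do not fill a complete group are dropped.

```python
def group_observation(obs, width=4):
    """Split a flat sequence into consecutive rows of `width` values.

    Hypothetical helper mirroring the loop in Example #1; trailing values
    that do not fill a complete row are ignored.
    """
    n_rows = len(obs) // width
    return [list(obs[width * j: width * (j + 1)]) for j in range(n_rows)]


if __name__ == "__main__":
    flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(group_observation(flat))  # two complete rows; the trailing 9 is dropped
    print(group_observation([]))    # an empty observation gives an empty list
```

Returning an empty list for an empty observation lets the caller test `len(observation) == 0` before indexing into it, which is the case the excerpt checks for.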

Example #2

plt.ion()

try:
    while True:
        print("Start iteration: ", currIt)
        obs = env.reset()
        print("Step: ", stepIdx)
        print("---obs:", obs)
        flag = False
        while True:
            reward = 0
            matrixOfChanAlloc = np.zeros((nOfenb, nOfchannel))

            stepIdx += 1
            if stepIdx % 100 == 0:
                PG.ep = PG.ep * 0.95

            ax.append(stepIdx)
            print("stepIdx: ", stepIdx)
            print("obs: ", obs)
            observation = []  # environment observations, i.e. the state
            observation, numue = getObservation(observation,
                                                obs)  # convert the ns3 observation into a form gym can use

            action_list = []  # list that stores the actions

            if numue == 0:  # if there are no valid requests, return an empty action
                addaction(0, 0, 0, action_list)
                action_tuple = listTotuple(action_list)
                obs, reward_step, done, info = env.step(
                    action_tuple)  # get the reward for this step
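
Both examples multiply `PG.ep` by a constant every 100 steps (0.7 in Example #1, 0.95 in Example #2). The sketch below compares how quickly the two factors shrink the value. It assumes that `ep` is an exploration rate and that it starts at 1.0; neither is confirmed by the excerpts, and the function name `decayed_ep` is hypothetical.

```python
def decayed_ep(initial_ep, factor, step, every=100):
    """Value of ep after `step` steps when it is multiplied by `factor`
    once every `every` steps (hypothetical helper, not the project's API)."""
    return initial_ep * factor ** (step // every)


if __name__ == "__main__":
    # initial_ep = 1.0 is an assumed starting value, chosen only for illustration.
    for factor in (0.7, 0.95):
        values = [round(decayed_ep(1.0, factor, s), 4) for s in (0, 100, 500, 1000)]
        print(f"factor={factor}: {values}")
```

With a factor of 0.7 the value falls below 5% of its starting point within roughly 900 steps, while 0.95 keeps it above half of the starting point for over 1,000 steps, so the choice of factor largely determines how long the early behaviour persists.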