Python EpsGreedyPolicy.act Examples

Programming language: Python

Namespace/package name: pyreinforce.acting

Class/type: EpsGreedyPolicy

Method/function: act

Examples at hotexamples.com: 2

Python EpsGreedyPolicy.act - 2 examples found. These are top-rated real-world Python examples of pyreinforce.acting.EpsGreedyPolicy.act, extracted from open source projects.
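For readers unfamiliar with the technique, here is a minimal sketch of an epsilon-greedy policy exposing the same two method names exercised by the examples, `seed` and `act`. The class `SketchEpsGreedyPolicy` is hypothetical and is not the pyreinforce implementation; in particular, the choice of a per-instance `np.random.RandomState` and of exploring uniformly over all actions are assumptions made for illustration.

```python
import numpy as np


class SketchEpsGreedyPolicy:
    """Hypothetical epsilon-greedy policy; NOT the pyreinforce implementation."""

    def __init__(self, eps):
        self._eps = eps
        # Assumption: each instance owns its generator, so seeding one
        # policy does not affect another.
        self._rng = np.random.RandomState()

    def seed(self, seed=None):
        self._rng.seed(seed)

    def act(self, q):
        q = np.asarray(q)
        if self._rng.uniform() < self._eps:
            # Explore: pick any action index uniformly at random.
            return int(self._rng.randint(q.size))
        # Exploit: pick the action with the highest Q-value.
        return int(np.argmax(q))


policy_a = SketchEpsGreedyPolicy(0.7)
policy_b = SketchEpsGreedyPolicy(0.7)
policy_a.seed(123)
policy_b.seed(123)

q = np.array([[1.0, 5.0, 3.0]])
actions_a = [policy_a.act(q) for _ in range(20)]
actions_b = [policy_b.act(q) for _ in range(20)]
same = actions_a == actions_b  # identical seeds give identical action sequences
```

With `eps` set to 0 this sketch always returns the arg-max action, and with `eps` set to 1 it always explores. Both are convenient edge cases to check when testing a policy of this kind.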

Frequently used methods

EpsGreedyPolicy(5)

act(2)

seed(1)

Example #1

File: test_acting.py Project: aserhiychuk/pyreinforce

import unittest

import numpy as np
from pyreinforce.acting import EpsGreedyPolicy


class EpsGreedyPolicyTest(unittest.TestCase):
    def setUp(self):
        self._eps = 0.1
        self._acting = EpsGreedyPolicy(self._eps)

    def test_seed(self):
        seed = 123
        eps = 0.7
        lowest_q = 1
        highest_q = 10
        n_actions = 10
        n_qs = 1000

        acting1 = EpsGreedyPolicy(eps)
        acting1.seed(seed)

        acting2 = EpsGreedyPolicy(eps)
        acting2.seed(seed)

        qs = [
            np.random.uniform(lowest_q, highest_q, size=(1, n_actions))
            for _ in range(n_qs)
        ]

        for q in qs:
            a1 = acting1.act(q)
            a2 = acting2.act(q)

            self.assertEqual(a1, a2)

    def test_act(self):
        n_total = 10000
        lowest_q = 1
        highest_q = 10
        n_actions = 100

        n_max_q = 0
        n_random = 0

        for _ in range(n_total):
            q = np.random.uniform(lowest_q, highest_q, size=(1, n_actions))
            arg_max = np.argmax(q)
            action = self._acting.act(q)

            if arg_max == action:
                n_max_q += 1
            else:
                n_random += 1

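        # Fraction of steps where the chosen action was not the greedy one.
        actual = n_random / n_total

        # Allow the observed fraction to differ from eps by at most 10% (relative).
        max_deviation = 0.1
        actual_deviation = abs((self._eps - actual) / self._eps)

        self.assertLess(actual_deviation, max_deviation)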
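The 10% tolerance in `test_act` can be sanity-checked with a short calculation. The sketch below assumes that the policy explores by picking uniformly among all `n_actions`. This is an assumption about the implementation and is not confirmed by the source. Under it, a random pick coincides with the arg-max action 1 / `n_actions` of the time, so the expected fraction of non-greedy actions is slightly below `eps`. The helper names `expected_mismatch_rate` and `standard_error` are hypothetical.

```python
import math


def expected_mismatch_rate(eps, n_actions):
    # A uniformly random pick equals the greedy action 1 / n_actions of the time.
    return eps * (1.0 - 1.0 / n_actions)


def standard_error(p, n_trials):
    # Standard deviation of an observed fraction over n_trials independent draws.
    return math.sqrt(p * (1.0 - p) / n_trials)


# Values taken from test_act: eps = 0.1, n_actions = 100, n_total = 10000.
eps, n_actions, n_total = 0.1, 100, 10000
p = expected_mismatch_rate(eps, n_actions)
relative_bias = abs(eps - p) / eps                  # systematic offset from eps
relative_noise = standard_error(p, n_total) / eps   # sampling noise, relative to eps
```

Under these assumptions the systematic offset is about 1% of `eps` and one standard error is about 3% of `eps`. Both are well inside the 10% bound, although an unseeded run can still exceed it on rare occasions.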

Example #2

File: test_acting.py Project: aserhiychuk/pyreinforce

    def test_seed(self):
        seed = 123
        eps = 0.7
        lowest_q = 1
        highest_q = 10
        n_actions = 10
        n_qs = 1000

        acting1 = EpsGreedyPolicy(eps)
        acting1.seed(seed)

        acting2 = EpsGreedyPolicy(eps)
        acting2.seed(seed)

        qs = [
            np.random.uniform(lowest_q, highest_q, size=(1, n_actions))
            for _ in range(n_qs)
        ]

        for q in qs:
            a1 = acting1.act(q)
            a2 = acting2.act(q)

            self.assertEqual(a1, a2)