Code example #1
    def allocate_arms(self):
        r"""Compute the allocation to each arm given ``historical_info``, running bandit ``subtype`` endpoint with hyperparameters in ``hyperparameter_info``.

        Computes the allocation to each arm based on the given subtype, historical info, and hyperparameter info.

        Works with k-armed bandits (k >= 1).

        The Algorithm: http://en.wikipedia.org/wiki/Multi-armed_bandit#Approximate_solutions

        This method starts with a pure exploration phase, followed by a pure exploitation phase.
        If we have a total of T trials, the first :math:`\epsilon` T trials, we only explore.
        After that, we only exploit (t = :math:`\epsilon` T + 1, :math:`\epsilon` T + 2, ..., T).

        This method will pull a random arm in the exploration phase.
        Then this method will pull the optimal arm (best expected return) in the exploitation phase.

        In case of a tie in the exploitation phase, the method will split the allocation among the optimal arms.

        For example, suppose we have three arms: two arms (arm1 and arm2) with an average payoff of 0.5
        (``{win:10, lose:10, total:20}``)
        and a new arm (arm3, average payoff is 0 and total is 0).

        Let the epsilon :math:`\epsilon` be 0.1.

        The allocation depends on which phase we are in:

        *Case 1: T = 50*

        Recall that T = number to sample + number sampled. The number sampled is :math:`20 + 20 + 0 = 40`,
        so we are on trial #41. We explore only the first :math:`\epsilon T = 0.1 * 50 = 5` trials,
        so we are in the exploitation phase. We split the allocation between the optimal arms arm1 and arm2.

        ``{arm1: 0.5, arm2: 0.5, arm3: 0.0}``

        *Case 2: T = 500*

        We explore the first :math:`\epsilon T = 0.1 * 500 = 50` trials.
        Since we are on trial #41, we are in the exploration phase. We choose arms randomly:

        ``{arm1: 0.33, arm2: 0.33, arm3: 0.33}``

        :return: the dictionary of (arm, allocation) key-value pairs
        :rtype: a dictionary of (str, float64) pairs
        :raise: ValueError when ``sample_arms`` are empty.

        """
        arms_sampled = self._historical_info.arms_sampled

        if not arms_sampled:
            raise ValueError('sample_arms is empty!')

        num_sampled = sum([sampled_arm.total for sampled_arm in arms_sampled.values()])
        # Exploration phase, trials 1,2,..., epsilon * T
        # Allocate equal probability to all arms
        if num_sampled < self._total_samples * self._epsilon:
            return get_equal_arm_allocations(arms_sampled)

        # Exploitation phase, trials epsilon * T+1, ..., T
        return get_equal_arm_allocations(arms_sampled, self.get_winning_arm_names(arms_sampled))
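The two cases worked through in the docstring above can be reproduced with a small standalone sketch. It is not the library's code: `Arm`, `_split`, and `epsilon_first_sketch` are hypothetical names, and the rule for picking winners (highest average payoff, with unsampled arms counted as 0) is an assumption taken from the docstring's example rather than from `get_winning_arm_names`, which is not shown here.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's sampled-arm record.
Arm = namedtuple("Arm", ["win", "loss", "total"])


def _split(arms, winners=None):
    """Split allocation evenly among winners, or among all arms if none are given."""
    chosen = winners or frozenset(arms)
    share = 1.0 / len(chosen)
    return {name: (share if name in chosen else 0.0) for name in arms}


def epsilon_first_sketch(arms, total_samples, epsilon):
    """Explore uniformly for the first epsilon * T trials, then exploit."""
    if not arms:
        raise ValueError("arms is empty")
    num_sampled = sum(arm.total for arm in arms.values())
    # Exploration phase: every arm gets the same probability.
    if num_sampled < total_samples * epsilon:
        return _split(arms)

    # Exploitation phase. Assumption: an unsampled arm has average payoff 0.
    def average(arm):
        return arm.win / arm.total if arm.total else 0.0

    best = max(average(arm) for arm in arms.values())
    winners = frozenset(name for name, arm in arms.items() if average(arm) == best)
    return _split(arms, winners)


arms = {"arm1": Arm(10, 10, 20), "arm2": Arm(10, 10, 20), "arm3": Arm(0, 0, 0)}
print(epsilon_first_sketch(arms, total_samples=50, epsilon=0.1))   # Case 1: exploitation
print(epsilon_first_sketch(arms, total_samples=500, epsilon=0.1))  # Case 2: exploration
```

With T = 50 the 40 samples already taken exceed the 5 exploration trials, so the allocation goes to the tied arms arm1 and arm2; with T = 500 the threshold is 50, so each arm receives one third.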
Code example #2
File: epsilon_first.py Project: Allensmile/MOE
    def allocate_arms(self):
        r"""Compute the allocation to each arm given ``historical_info``, running bandit ``subtype`` endpoint with hyperparameters in ``hyperparameter_info``.

        Computes the allocation to each arm based on the given subtype, historical info, and hyperparameter info.

        Works with k-armed bandits (k >= 1).

        The Algorithm: http://en.wikipedia.org/wiki/Multi-armed_bandit#Approximate_solutions

        This method starts with a pure exploration phase, followed by a pure exploitation phase.
        If we have a total of T trials, the first :math:`\epsilon` T trials, we only explore.
        After that, we only exploit (t = :math:`\epsilon` T + 1, :math:`\epsilon` T + 2, ..., T).

        This method will pull a random arm in the exploration phase.
        Then this method will pull the optimal arm (best expected return) in the exploitation phase.

        In case of a tie in the exploitation phase, the method will split the allocation among the optimal arms.

        For example, suppose we have three arms: two arms (arm1 and arm2) with an average payoff of 0.5
        (``{win:10, lose:10, total:20}``)
        and a new arm (arm3, average payoff is 0 and total is 0).

        Let the epsilon :math:`\epsilon` be 0.1.

        The allocation depends on which phase we are in:

        *Case 1: T = 50*

        Recall that T = number to sample + number sampled. The number sampled is :math:`20 + 20 + 0 = 40`,
        so we are on trial #41. We explore only the first :math:`\epsilon T = 0.1 * 50 = 5` trials,
        so we are in the exploitation phase. We split the allocation between the optimal arms arm1 and arm2.

        ``{arm1: 0.5, arm2: 0.5, arm3: 0.0}``

        *Case 2: T = 500*

        We explore the first :math:`\epsilon T = 0.1 * 500 = 50` trials.
        Since we are on trial #41, we are in the exploration phase. We choose arms randomly:

        ``{arm1: 0.33, arm2: 0.33, arm3: 0.33}``

        :return: the dictionary of (arm, allocation) key-value pairs
        :rtype: a dictionary of (str, float64) pairs
        :raise: ValueError when ``sample_arms`` are empty.

        """
        arms_sampled = self._historical_info.arms_sampled

        if not arms_sampled:
            raise ValueError('sample_arms is empty!')

        # dict.values() works on both Python 2 and 3; itervalues() exists only on Python 2.
        num_sampled = sum(sampled_arm.total for sampled_arm in arms_sampled.values())
        # Exploration phase, trials 1,2,..., epsilon * T
        # Allocate equal probability to all arms
        if num_sampled < self._total_samples * self._epsilon:
            return get_equal_arm_allocations(arms_sampled)

        # Exploitation phase, trials epsilon * T+1, ..., T
        return get_equal_arm_allocations(arms_sampled, self.get_winning_arm_names(arms_sampled))
Code example #3
    def allocate_arms(self):
        r"""Compute the allocation to each arm given ``historical_info``, running bandit ``subtype`` endpoint.

        Computes the allocation to each arm based on the given subtype and historical info.

        Works with k-armed bandits (k >= 1).

        The Algorithm is from the paper: A Generic Solution to Multi-Armed Bernoulli Bandit Problems, Norheim, Bradland, Granmo, Oommen (2010) ICAART.
        The original algorithm handles k = 2. We extended the algorithm naturally to handle k >= 1.

        This method will pull the optimal arm (best BLA payoff).

        See :func:`moe.bandit.bla.BLA.get_bla_payoff` for details on how to compute the BLA payoff

        In case of a tie, the method will split the allocation among the optimal arms.
        For example, if we have three arms (arm1, arm2, and arm3) with expected BLA payoffs of 0.5, 0.5, and 0.1 respectively,
        we split the allocation between the optimal arms arm1 and arm2.

        ``{arm1: 0.5, arm2: 0.5, arm3: 0.0}``

        :return: the dictionary of (arm, allocation) key-value pairs
        :rtype: a dictionary of (str, float64) pairs
        :raise: ValueError when ``sample_arms`` are empty.

        """
        arms_sampled = self._historical_info.arms_sampled
        if not arms_sampled:
            raise ValueError('sample_arms are empty!')

        return get_equal_arm_allocations(
            arms_sampled, self.get_winning_arm_names(arms_sampled))
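The docstring defers the payoff computation to `get_bla_payoff`, which is not shown in this excerpt. As a rough illustration only, the sketch below assumes the usual Bayesian Learning Automaton rule: each arm's payoff is one draw from a Beta(wins + 1, losses + 1) posterior, and the arm with the largest draw wins. The function name `bla_allocations_sketch` and the `(wins, losses)` tuple layout are hypothetical.

```python
import random


def bla_allocations_sketch(arms, rng=None):
    """Allocate to the arm(s) with the largest sampled Beta payoff.

    ``arms`` maps an arm name to a hypothetical ``(wins, losses)`` tuple.
    """
    if not arms:
        raise ValueError("arms is empty")
    rng = rng or random.Random()
    # Assumption: the BLA payoff is a sample from Beta(wins + 1, losses + 1).
    payoffs = {
        name: rng.betavariate(wins + 1, losses + 1)
        for name, (wins, losses) in arms.items()
    }
    best = max(payoffs.values())
    winners = frozenset(name for name, payoff in payoffs.items() if payoff == best)
    # Ties are split evenly; with continuous samples a tie is very unlikely.
    share = 1.0 / len(winners)
    return {name: (share if name in winners else 0.0) for name in arms}


allocation = bla_allocations_sketch(
    {"arm1": (10, 10), "arm2": (10, 10), "arm3": (1, 9)},
    rng=random.Random(0),  # seeded only to make the run repeatable
)
print(allocation)
```

Because the payoffs are random draws, repeated calls can favor different arms; an arm with few observations has a wide posterior and is therefore still explored occasionally.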
Code example #4
File: bla.py Project: Recmo/MOE
    def allocate_arms(self):
        r"""Compute the allocation to each arm given ``historical_info``, running bandit ``subtype`` endpoint.

        Computes the allocation to each arm based on the given subtype and historical info.

        Works with k-armed bandits (k >= 1).

        The Algorithm is from the paper: A Generic Solution to Multi-Armed Bernoulli Bandit Problems, Norheim, Bradland, Granmo, Oommen (2010) ICAART.
        The original algorithm handles k = 2. We extended the algorithm naturally to handle k >= 1.

        This method will pull the optimal arm (best BLA payoff).

        See :func:`moe.bandit.bla.BLA.get_bla_payoff` for details on how to compute the BLA payoff

        In case of a tie, the method will split the allocation among the optimal arms.
        For example, if we have three arms (arm1, arm2, and arm3) with expected BLA payoffs of 0.5, 0.5, and 0.1 respectively,
        we split the allocation between the optimal arms arm1 and arm2.

        ``{arm1: 0.5, arm2: 0.5, arm3: 0.0}``

        :return: the dictionary of (arm, allocation) key-value pairs
        :rtype: a dictionary of (str, float64) pairs
        :raise: ValueError when ``sample_arms`` are empty.

        """
        arms_sampled = self._historical_info.arms_sampled
        if not arms_sampled:
            raise ValueError('sample_arms are empty!')

        return get_equal_arm_allocations(arms_sampled, self.get_winning_arm_names(arms_sampled))
Code example #5
    def allocate_arms(self):
        r"""Compute the allocation to each arm given ``historical_info``, running bandit ``subtype`` endpoint.

        Computes the allocation to each arm based on the given subtype and historical info.

        Works with k-armed bandits (k >= 1).

        The Algorithm: http://moodle.technion.ac.il/pluginfile.php/192340/mod_resource/content/0/UCB.pdf

        If there is at least one unsampled arm, this method will choose to pull the unsampled arm
        (randomly choose an unsampled arm if there are multiple unsampled arms).
        If all arms are pulled at least once, this method will pull the optimal arm
        (best expected upper confidence bound payoff).

        See :func:`moe.bandit.ucb.ucb_interface.UCBInterface.get_ucb_payoff` for details on how to compute the expected upper confidence bound payoff (expected UCB payoff)

        In case of a tie, the method will split the allocation among the optimal arms.
        For example, if we have three arms (arm1, arm2, and arm3) with expected UCB payoffs of 0.5, 0.5, and 0.1 respectively,
        we split the allocation between the optimal arms arm1 and arm2.

        ``{arm1: 0.5, arm2: 0.5, arm3: 0.0}``

        :return: the dictionary of (arm, allocation) key-value pairs
        :rtype: a dictionary of (str, float64) pairs
        :raise: ValueError when ``sample_arms`` are empty.

        """
        arms_sampled = self._historical_info.arms_sampled
        if not arms_sampled:
            raise ValueError('sample_arms are empty!')

        return get_equal_arm_allocations(
            arms_sampled, self.get_winning_arm_names(arms_sampled))
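The selection rule described in the docstring (unsampled arms first, then the best upper confidence bound) can be sketched as follows. The payoff formula is an assumption: it uses the standard UCB1 bound, average payoff plus `sqrt(2 ln n / n_i)`, because `get_ucb_payoff` itself is not shown in this excerpt. The name `ucb1_allocations_sketch` and the `(wins, total)` tuple layout are hypothetical.

```python
import math


def ucb1_allocations_sketch(arms):
    """Allocate to unsampled arms first, otherwise to the best UCB1 payoff.

    ``arms`` maps an arm name to a hypothetical ``(wins, total)`` tuple.
    """
    if not arms:
        raise ValueError("arms is empty")
    unsampled = frozenset(name for name, (_, total) in arms.items() if total == 0)
    if unsampled:
        # Any unsampled arm is chosen with equal probability.
        winners = unsampled
    else:
        num_sampled = sum(total for _, total in arms.values())
        # Assumption: standard UCB1 bound (mean plus exploration bonus).
        payoffs = {
            name: wins / total + math.sqrt(2.0 * math.log(num_sampled) / total)
            for name, (wins, total) in arms.items()
        }
        best = max(payoffs.values())
        winners = frozenset(name for name, payoff in payoffs.items() if payoff == best)
    share = 1.0 / len(winners)
    return {name: (share if name in winners else 0.0) for name in arms}


# arm3 has never been pulled, so it receives the whole allocation.
print(ucb1_allocations_sketch({"arm1": (10, 20), "arm2": (10, 20), "arm3": (0, 0)}))
# All arms sampled equally often: the two arms with the best average tie.
print(ucb1_allocations_sketch({"arm1": (10, 20), "arm2": (10, 20), "arm3": (2, 20)}))
```

The exploration bonus shrinks as an arm's own sample count grows, so a rarely pulled arm can outrank one with a higher average until it has been tried enough.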
Code example #6
File: ucb_interface.py Project: Allensmile/MOE
    def allocate_arms(self):
        r"""Compute the allocation to each arm given ``historical_info``, running bandit ``subtype`` endpoint.

        Computes the allocation to each arm based on the given subtype and historical info.

        Works with k-armed bandits (k >= 1).

        The Algorithm: http://moodle.technion.ac.il/pluginfile.php/192340/mod_resource/content/0/UCB.pdf

        If there is at least one unsampled arm, this method will choose to pull the unsampled arm
        (randomly choose an unsampled arm if there are multiple unsampled arms).
        If all arms are pulled at least once, this method will pull the optimal arm
        (best expected upper confidence bound payoff).

        See :func:`moe.bandit.ucb.ucb_interface.UCBInterface.get_ucb_payoff` for details on how to compute the expected upper confidence bound payoff (expected UCB payoff)

        In case of a tie, the method will split the allocation among the optimal arms.
        For example, if we have three arms (arm1, arm2, and arm3) with expected UCB payoffs of 0.5, 0.5, and 0.1 respectively,
        we split the allocation between the optimal arms arm1 and arm2.

        ``{arm1: 0.5, arm2: 0.5, arm3: 0.0}``

        :return: the dictionary of (arm, allocation) key-value pairs
        :rtype: a dictionary of (str, float64) pairs
        :raise: ValueError when ``sample_arms`` are empty.

        """
        arms_sampled = self._historical_info.arms_sampled
        if not arms_sampled:
            raise ValueError('sample_arms are empty!')

        return get_equal_arm_allocations(arms_sampled, self.get_winning_arm_names(arms_sampled))
Code example #7
File: utils_test.py Project: sai-nirish/MOE
 def test_get_equal_arm_allocations_no_winner(self):
     """Test allocations split among all sample arms when there is no winner."""
     T.assert_dicts_equal(
         get_equal_arm_allocations(
             self.two_unsampled_arms_test_case.arms_sampled), {
                 "arm1": 0.5,
                 "arm2": 0.5
             })
Code example #8
File: utils_test.py Project: sai-nirish/MOE
 def test_get_equal_arm_allocations_one_winner(self):
     """Test all allocation given to the winning arm."""
     T.assert_dicts_equal(
         get_equal_arm_allocations(self.three_arms_test_case.arms_sampled,
                                   frozenset(["arm1"])), {
                                       "arm1": 1.0,
                                       "arm2": 0.0,
                                       "arm3": 0.0
                                   })
Code example #9
File: utils_test.py Project: sai-nirish/MOE
 def test_get_equal_arm_allocations_two_winners(self):
     """Test allocations split between two winning arms."""
     T.assert_dicts_equal(
         get_equal_arm_allocations(
             self.three_arms_two_winners_test_case.arms_sampled,
             frozenset(["arm1", "arm2"])), {
                 "arm1": 0.5,
                 "arm2": 0.5,
                 "arm3": 0.0
             })
Code example #10
File: utils_test.py Project: Recmo/MOE
 def test_get_equal_arm_allocations_two_winners(self):
     """Test allocations split between two winning arms."""
     T.assert_dicts_equal(
         get_equal_arm_allocations(self.three_arms_two_winners_test_case.arms_sampled, frozenset(["arm1", "arm2"])),
         {"arm1": 0.5, "arm2": 0.5, "arm3": 0.0},
     )
Code example #11
File: utils_test.py Project: Recmo/MOE
 def test_get_equal_arm_allocations_one_winner(self):
     """Test all allocation given to the winning arm."""
     T.assert_dicts_equal(
         get_equal_arm_allocations(self.three_arms_test_case.arms_sampled, frozenset(["arm1"])),
         {"arm1": 1.0, "arm2": 0.0, "arm3": 0.0},
     )
Code example #12
File: utils_test.py Project: Recmo/MOE
 def test_get_equal_arm_allocations_no_winner(self):
     """Test allocations split among all sample arms when there is no winner."""
     T.assert_dicts_equal(
         get_equal_arm_allocations(self.two_unsampled_arms_test_case.arms_sampled), {"arm1": 0.5, "arm2": 0.5}
     )
Code example #13
File: utils_test.py Project: jdc08161063/qKG
 def test_get_equal_arm_allocations_empty_arm_invalid(self):
     """Test empty ``arms_sampled`` causes a ValueError."""
     with pytest.raises(ValueError):
         get_equal_arm_allocations({})
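Taken together, the tests above pin down four behaviors of `get_equal_arm_allocations`: an even split over all arms when no winners are given, the full allocation to a single winner, an even split between several winners, and a `ValueError` for empty input. The function's body is not part of this excerpt, so the following is only a minimal sketch consistent with those tests; `equal_arm_allocations_sketch` is a hypothetical name, and the arm values are placeholders because only the keys matter here.

```python
def equal_arm_allocations_sketch(arms_sampled, winning_arm_names=None):
    """Split allocation evenly among the winners, or among all arms if none are given."""
    if not arms_sampled:
        raise ValueError("arms_sampled is empty")
    # With no winners, every arm counts as a winner.
    winners = winning_arm_names or frozenset(arms_sampled)
    share = 1.0 / len(winners)
    return {name: (share if name in winners else 0.0) for name in arms_sampled}


three_arms = {"arm1": None, "arm2": None, "arm3": None}  # placeholder arm records
print(equal_arm_allocations_sketch({"arm1": None, "arm2": None}))
print(equal_arm_allocations_sketch(three_arms, frozenset(["arm1"])))
print(equal_arm_allocations_sketch(three_arms, frozenset(["arm1", "arm2"])))
```

Treating "no winners" as "all arms win" lets the exploration and exploitation paths share one code path, which matches how `allocate_arms` calls the helper with and without a second argument.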