def process(b: int):
    # sample bootstrap data from the batch logged bandit feedback
    boot_bandit_feedback = obd.sample_bootstrap_bandit_feedback(
        test_size=test_size, is_timeseries_split=True, random_state=b
    )
    # train the evaluation policy on the training set of the logged bandit feedback data
    counterfactual_policy.fit(
        context=boot_bandit_feedback["context"],
        action=boot_bandit_feedback["action"],
        reward=boot_bandit_feedback["reward"],
        pscore=boot_bandit_feedback["pscore"],
        position=boot_bandit_feedback["position"],
    )
    # make action selections (predictions) on the test set
    action_dist = counterfactual_policy.predict(
        context=boot_bandit_feedback["context_test"]
    )
    # estimate the policy value of the counterfactual policy with the IPW estimator
    ipw = InverseProbabilityWeighting()
    return ipw.estimate_policy_value(
        reward=boot_bandit_feedback["reward_test"],
        action=boot_bandit_feedback["action_test"],
        position=boot_bandit_feedback["position_test"],
        pscore=boot_bandit_feedback["pscore_test"],
        action_dist=action_dist,
    )
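# The script presumably runs `process` once per bootstrap sample; a minimal,
# illustrative driver under that assumption is sketched below. joblib is used
# here for parallelism, and `n_boot_samples` / `n_jobs` are placeholder values
# rather than settings taken from the original script.
import numpy as np
from joblib import Parallel, delayed

n_boot_samples, n_jobs = 10, 4
boot_ipw_estimates = np.array(
    Parallel(n_jobs=n_jobs)(
        delayed(process)(b) for b in np.arange(n_boot_samples)
    )
)  # one IPW policy-value estimate per bootstrap sample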
def test_ipw_using_invalid_input_data(
    action_dist: np.ndarray,
    action: np.ndarray,
    reward: np.ndarray,
    pscore: np.ndarray,
    position: np.ndarray,
    use_estimated_pscore: bool,
    estimated_pscore: np.ndarray,
    description: str,
) -> None:
    # prepare ipw instances
    ipw = InverseProbabilityWeighting(use_estimated_pscore=use_estimated_pscore)
    snipw = SelfNormalizedInverseProbabilityWeighting(
        use_estimated_pscore=use_estimated_pscore
    )
    sgipw = SubGaussianInverseProbabilityWeighting(
        use_estimated_pscore=use_estimated_pscore
    )
    ipw_tuning = InverseProbabilityWeightingTuning(
        lambdas=[10, 1000], use_estimated_pscore=use_estimated_pscore
    )
    sgipw_tuning = SubGaussianInverseProbabilityWeightingTuning(
        lambdas=[0.01, 0.1], use_estimated_pscore=use_estimated_pscore
    )
    # every estimator must reject the invalid inputs with a ValueError,
    # both when estimating the policy value and when estimating its interval
    for estimator in [ipw, snipw, ipw_tuning, sgipw, sgipw_tuning]:
        with pytest.raises(ValueError, match=f"{description}*"):
            _ = estimator.estimate_policy_value(
                action_dist=action_dist,
                action=action,
                reward=reward,
                pscore=pscore,
                position=position,
                estimated_pscore=estimated_pscore,
            )
        with pytest.raises(ValueError, match=f"{description}*"):
            _ = estimator.estimate_interval(
                action_dist=action_dist,
                action=action,
                reward=reward,
                pscore=pscore,
                position=position,
                estimated_pscore=estimated_pscore,
            )
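# For context, a test like the one above is typically driven by a
# `pytest.mark.parametrize` fixture that pairs each invalid input with the
# expected error-message fragment (`description`). The case below is a
# hypothetical illustration, not the project's actual fixture: it passes a
# 2D `action_dist` where a 3D array is expected, so the estimator should
# raise a ValueError whose message mentions "action_dist".
import numpy as np
import pytest

from obp.ope import InverseProbabilityWeighting

hypothetical_invalid_input = [
    (
        np.ones((5, 4)) / 4,           # action_dist: 2D instead of the required 3D
        np.array([0, 1, 2, 3, 0]),     # action
        np.zeros(5),                   # reward
        np.ones(5) * 0.25,             # pscore
        np.zeros(5, dtype=int),        # position
        False,                         # use_estimated_pscore
        None,                          # estimated_pscore
        "action_dist",                 # description: expected error-message fragment
    ),
]


@pytest.mark.parametrize(
    "action_dist, action, reward, pscore, position, use_estimated_pscore, estimated_pscore, description",
    hypothetical_invalid_input,
)
def test_ipw_raises_on_invalid_action_dist(
    action_dist, action, reward, pscore, position, use_estimated_pscore, estimated_pscore, description
):
    ipw = InverseProbabilityWeighting(use_estimated_pscore=use_estimated_pscore)
    with pytest.raises(ValueError, match=f"{description}*"):
        _ = ipw.estimate_policy_value(
            action_dist=action_dist,
            action=action,
            reward=reward,
            pscore=pscore,
            position=position,
            estimated_pscore=estimated_pscore,
        )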
# train the evaluation policy on the training set of the logged bandit feedback data
evaluation_policy.fit(
    context=boot_bandit_feedback["context"],
    action=boot_bandit_feedback["action"],
    reward=boot_bandit_feedback["reward"],
    pscore=boot_bandit_feedback["pscore"],
    position=boot_bandit_feedback["position"],
)
# make action selections (predictions) on the test set
action_dist = evaluation_policy.predict(
    context=boot_bandit_feedback["context_test"]
)
# estimate the policy value of the evaluation policy with the IPW estimator,
# relative to the ground-truth value of the behavior policy
ipw = InverseProbabilityWeighting()
ope_results[b] = (
    ipw.estimate_policy_value(
        reward=boot_bandit_feedback["reward_test"],
        action=boot_bandit_feedback["action_test"],
        position=boot_bandit_feedback["position_test"],
        pscore=boot_bandit_feedback["pscore_test"],
        action_dist=action_dist,
    )
    / ground_truth
)
print(f"{b+1}-th iteration: {np.round((time.time() - start) / 60, 2)}min")

# summarize the bootstrapped relative policy values
ope_results_dict = estimate_confidence_interval_by_bootstrap(
    samples=ope_results, random_state=random_state
)
ope_results_dict["mean(no-boot)"] = ope_results.mean()
ope_results_dict["std"] = np.std(ope_results, ddof=1)
ope_results_df = pd.DataFrame(ope_results_dict, index=["ipw"])
print("=" * 70)
print(f"random_state={random_state}: evaluation policy={policy_name}")
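# After the bootstrap loop, the summary DataFrame built above is typically
# printed and written to disk; a minimal sketch of that step follows. The
# ./logs directory layout and file name are illustrative assumptions, not
# taken from the original script.
from pathlib import Path

print(ope_results_df)  # relative policy value of the evaluation policy (IPW / ground truth)
log_path = Path("./logs") / policy_name
log_path.mkdir(parents=True, exist_ok=True)
ope_results_df.to_csv(log_path / "relative_ope_results.csv")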