from time import time

import numpy as np
import theano
from scipy import optimize

# Assumes s, a, nexts, r (sampled transitions), gamma, theta, check_v, lqr_reg,
# empirical_bop, LQRRegressor and EmpiricalBellmanResidualMinimization are
# defined or imported earlier in the script.

discrete_actions = np.array([1., 2., 3.9], dtype=theano.config.floatX).reshape(-1, 1)  # discretization of the actions
# to be used for maximum estimate
# print(s, a, nexts, r, discrete_actions)

q_model = LQRRegressor(theta)  # q-function

pfpo = EmpiricalBellmanResidualMinimization(q_model=q_model,
                                            discrete_actions=discrete_actions,
                                            gamma=gamma,
                                            optimizer="adam",
                                            state_dim=1,
                                            action_dim=1)
start = time()
pfpo._make_additional_functions()  # compile the Theano functions
print('compilation time: {}'.format(time() - start))

# check that the compiled Q-function matches the numpy LQR regressor
check_v(pfpo.F_q(s, a), lqr_reg(s, a, [q_model.theta.eval()]))

print('\n--- checking bellman error')
berr = pfpo.F_bellman_err(s, a, nexts, r, discrete_actions)
tv = empirical_bop(s, a, r, nexts, discrete_actions, gamma, lqr_reg,
                   [q_model.theta.eval()])
check_v(berr, tv, 1)

print('\n--- checking gradient of the bellman error')
# finite-difference check of the compiled gradient against the numpy Bellman error
berr_grad = pfpo.F_grad_bellman_berr(s, a, nexts, r, discrete_actions)
eps = np.sqrt(np.finfo(float).eps)
f = lambda x: empirical_bop(s, a, r, nexts, discrete_actions, gamma, lqr_reg, [x])
approx_grad = optimize.approx_fprime(q_model.theta.eval().ravel(),
                                     f, eps).reshape(berr_grad[0].shape)
check_v(berr_grad, approx_grad, 1)

print()
print('--' * 30)
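
# The gradient check above differentiates `empirical_bop` by finite differences.
# For reference, a minimal sketch (not from the original script) of the quantity
# such an empirical Bellman residual is assumed to compute: the mean squared
# distance between Q(s, a) and the sampled Bellman target
# r + gamma * max_{a'} Q(s', a'), with the max taken over the discretized
# actions. All names below are illustrative.
def empirical_bellman_residual_sketch(s, a, r, nexts, actions, gamma, q, params):
    q_sa = q(s, a, params).ravel()                    # Q(s_t, a_t), shape (N,)
    # Q(s_{t+1}, a') for each discretized action a', stacked to shape (N, M)
    q_next = np.column_stack([q(nexts, np.full_like(a, ai), params)
                              for ai in actions.ravel()])
    target = r.ravel() + gamma * q_next.max(axis=1)   # empirical Bellman target
    return np.mean((q_sa - target) ** 2)              # mean squared residual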