Python Policy.weights Examples

Programming Language: Python

Namespace/Package Name: policy

Class/Type: Policy

Method/Function: weights

Examples at hotexamples.com: 1

Python Policy.weights - 1 example found. This is the top-rated real-world Python example of policy.Policy.weights, extracted from an open source project.

Example #1

File: lspi.py Project: notokay/eecs_491_project

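import numpy as np
from numpy import linalg as LA

from policy import Policy
# lstdq is defined elsewhere in the project; its import is not shown in this excerpt.


def lspi(maxiter, epsilon, samples, basis, discount, initial_policy):
    """
    Runs the LSPI algorithm.

    Iterates until the L2 distance between successive weight vectors is
    no larger than epsilon, or until the iteration counter reaches maxiter.
    Returns the final policy and the list of all policies visited.
    basis and discount are accepted but not used in this excerpt.
    """

    iteration = -1
    distance = float('inf')
    policy = initial_policy
    all_policies = [initial_policy]

    while (iteration < maxiter) and (distance > epsilon):

        # Advance and report the iteration counter
        iteration = iteration + 1
        print('============================')
        print('LSPI iteration: %i' % iteration)

        # Build a new Policy from the previous one
        policy = Policy(policy=policy)

        # The first element returned by lstdq is used as the new weight vector
        policy.weights = lstdq(samples, all_policies[iteration], policy)[0]

        # Change in weights relative to the previous policy
        diff = policy.weights - all_policies[iteration].weights
        LMAXnorm = LA.norm(diff, np.inf)  # computed but not used in the stopping test
        L2norm = LA.norm(diff)

        # The L2 norm is the convergence measure
        distance = L2norm

        all_policies.append(policy)

    print('================================')
    if distance > epsilon:
        print('LSPI finished in %i iterations WITHOUT convergence to a fixed point' % iteration)
    else:
        print('LSPI converged in %i iterations' % iteration)
    print()
    print('weights')
    print(policy.weights)
    print()

    return policy, all_policies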
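
The stopping test in the example above compares successive weight vectors by the norm of their difference. The sketch below isolates that idea so it can be run without the project's Policy class or lstdq function. The names weight_distance and toy_update are hypothetical and do not come from the project; toy_update is only a stand-in for the lstdq step, chosen because it has a known fixed point.

```python
import numpy as np
from numpy import linalg as LA


def weight_distance(new_weights, old_weights):
    """Return (L2 norm, max norm) of the change between two weight vectors."""
    diff = np.asarray(new_weights, dtype=float) - np.asarray(old_weights, dtype=float)
    return LA.norm(diff), LA.norm(diff, np.inf)


def toy_update(weights):
    # Hypothetical stand-in for the lstdq step: a contraction
    # whose fixed point is the vector [2, 2].
    return 0.5 * weights + 1.0


epsilon = 1e-6
maxiter = 100

weights = np.zeros(2)
distance = float('inf')
iterations = 0

# Same loop shape as the example: stop on small change or on the iteration cap.
while iterations < maxiter and distance > epsilon:
    new_weights = toy_update(weights)
    distance, max_change = weight_distance(new_weights, weights)
    weights = new_weights
    iterations += 1

print('stopped after %i iterations, distance %.2e' % (iterations, distance))
print(weights)
```

Because each toy update halves the change in weights, the loop stops well before the iteration cap. With a real lstdq step there is no such guarantee, which is why the example reports the non-converged case separately.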