Example #1
'''
Note: here the states in full_traj are continuous 2-D points, e.g.

    array([[-0.25, -0.75],
           [-0.25, -0.75],
           [-0.25, -0.75],
           [ 0.75, -0.75],
           [-0.25,  0.25],
           [ 0.75, -0.75]])

whereas it is a discrete one-hot array([1., 0., 0., 0.]) for the other
experiment1.py code.
'''
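# For contrast, the two state encodings mentioned above, side by side. The
# values come from the docstring; the one_hot helper itself is just an
# illustrative sketch, not part of either experiment's code.
import numpy as np

continuous_state = np.array([-0.25, -0.75])  # this experiment: a raw 2-D point

def one_hot(index, size=4):
    """Discrete encoding of a state, as used by the other experiment1.py code."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

print(one_hot(0))  # -> [1. 0. 0. 0.]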
# Debug: inspect the shape and contents of full_traj.
# print(np.shape(full_traj), full_traj)

demonstrations = 10
super_iterations = 3000  #10000
sub_iterations = 0
learning_rate = 10

# k = 4 in this case: the number of primitive options.
m = GridWorldModel(4, statedim=(12, 2))
m.sess.run(tf.global_variables_initializer())  # initialize_all_variables() is deprecated

with tf.variable_scope("optimizer"):
    opt = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    # Define the optimizer, then fit on the full trajectory for
    # super_iterations outer steps and sub_iterations inner steps.
    m.train(opt, full_traj, super_iterations, sub_iterations)
'''So how do we generate the visualised options?
We can look at a state and then apply the respective option policy from that state.
How is this done for the gridworld data? It just computes the max of the action
probabilities over the entire gridworld. Instead of doing that, we need to follow
the same option policy over continuous states until it actually terminates.
How do we do this?

1. Find a few good start states in the state space.
2. Iterate over the number of options and do the same thing as before.
3. Until the termination policy fires, repeatedly evaluate evalpi for the same
   state (see the sketch below).
'''
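# A minimal sketch of the rollout described above. Assumptions (not from the
# code above): m.evalpi(option, [(state, action)]) returns the probability of
# `action` in `state` under that option's policy; a termination model
# m.evalpsi(option, [(state, action)]) returns the probability of terminating;
# env_step(state, action) is a user-supplied transition function.
import numpy as np

def rollout_option(m, env_step, start_state, option,
                   num_actions=4, max_steps=50, term_thresh=0.5):
    """Follow one option's policy from start_state until it terminates."""
    state = start_state
    visited = [state]
    for _ in range(max_steps):
        # Greedy action: max of the action probabilities, as in the gridworld case.
        probs = [m.evalpi(option, [(state, a)]) for a in range(num_actions)]
        action = int(np.argmax(probs))
        state = env_step(state, action)
        visited.append(state)
        # Stop once the (assumed) termination model says the option has ended.
        if m.evalpsi(option, [(state, action)]) > term_thresh:
            break
    return visited

# Usage: visualise each of the k = 4 options from a few hand-picked start states.
# for option in range(4):
#     for s0 in good_start_states:
#         print(option, rollout_option(m, env_step, s0, option))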
Example #2
'''
Note: here the states in full_traj are continuous 2-D points, e.g.

    array([[-0.25, -0.75],
           [-0.25, -0.75],
           [ 0.75, -0.75],
           [-0.25,  0.25],
           [ 0.75, -0.75]])

whereas it is a discrete one-hot array([1., 0., 0., 0.]) for the other
experiment1.py code.
'''
# Debug: inspect the shape and contents of full_traj.
# print(np.shape(full_traj), full_traj)

demonstrations = 10
super_iterations = 1000  # 3000 # 10000
sub_iterations = 0
learning_rate = 10


# k = 4 in this case: the number of primitive options.
m = GridWorldModel(4, statedim=(12, 2))
m.sess.run(tf.global_variables_initializer())  # initialize_all_variables() is deprecated

with tf.variable_scope("optimizer"):
    opt = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    # Define the optimizer, then fit on the full trajectory for
    # super_iterations outer steps and sub_iterations inner steps.
    m.train(opt, full_traj, super_iterations, sub_iterations)

'''So how do we generate the visualised options?
We can look at a state and then apply the respective option policy from that state.
How is this done for the gridworld data? It just computes the max of the action
probabilities over the entire gridworld. Instead of doing that, we need to follow
the same option policy over continuous states until it actually terminates.
How do we do this?

1. Find a few good start states in the state space.
2. Iterate over the number of options and do the same thing as before.