# Obtain the hash code of the next state
next_state_hash = simhash.hash(next_state)

# Update the action counter
act_counter[action.item()] += 1

# If the next state hashed to a different code than the current state, infer the
# dominating action, update the causal link, and clear the action counter
if next_state_hash != current_state_hash:
    main_action = np.argmax(act_counter)
    graph.update_transition(current_state_hash, main_action, next_state_hash)
    act_counter = np.zeros((output_size,), dtype=np.int32)

# Use the action confidence at the current state's hash code as the intrinsic reward
in_reward = graph.action_confidence(current_state_hash, action.item())
in_reward = curiosity_weight * np.sqrt(in_reward)  # Take the square root of the confidence value

# Record the transition in memory
memory.add_transition(action, log_prob, next_state,
                      extrinsic_reward=reward, extrinsic_value_estimate=ex_val,
                      intrinsic_reward=in_reward, intrinsic_value_estimate=in_val)

# Update the current state and its hash code
current_state = next_state
current_state_hash = next_state_hash
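
# The `simhash` helper above is not shown in this excerpt. It is assumed to
# implement SimHash-style state discretization (as in count-based exploration
# a la Tang et al., 2017): project the state onto k random hyperplanes and
# keep only the signs, so nearby states collapse to the same hash code. This
# is a minimal illustrative sketch, not the actual implementation; the
# `granularity` parameter, the seed, and the tuple packing are assumptions.
import numpy as np

class SimHash:
    def __init__(self, state_dim, granularity=32, seed=0):
        rng = np.random.default_rng(seed)
        # Random projection matrix A with one row per hash bit
        self.projection = rng.standard_normal((granularity, state_dim))

    def hash(self, state):
        # sign(A @ s) yields a k-bit code; pack it into a hashable tuple.
        # Assumes `state` is a flat, NumPy-compatible vector.
        bits = self.projection @ np.asarray(state, dtype=np.float64).ravel() >= 0
        return tuple(bits.astype(np.int8))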
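
# Likewise, `graph` is assumed to be a causal transition graph that counts,
# for every (state_hash, action) pair, how often the pair has been exercised,
# and exposes a confidence score that the loop above scales by
# `curiosity_weight` after a square root. The exact confidence definition is
# not shown in this excerpt; the inverse-count score below is an illustrative
# assumption, chosen so that sqrt(confidence) recovers the familiar
# 1/sqrt(N) count-based exploration bonus.
from collections import defaultdict

class TransitionGraph:
    def __init__(self):
        # counts[(state_hash, action)][next_hash] -> number of observations
        self.counts = defaultdict(lambda: defaultdict(int))

    def update_transition(self, state_hash, action, next_hash):
        # Record one observation of the causal link (state, action) -> next state
        self.counts[(state_hash, int(action))][next_hash] += 1

    def action_confidence(self, state_hash, action):
        # Illustrative novelty score: 1 for a never-tried pair, decaying with
        # visit count so heavily exercised links yield little intrinsic reward
        n = sum(self.counts[(state_hash, int(action))].values())
        return 1.0 / (1.0 + n)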