Python MancalaEnv.compute_end_game_reward examples

Programming language: Python

Namespace/package name: magent.mancala

Class/type: MancalaEnv

Method/function: compute_end_game_reward

Examples at hotexamples.com: 2

Python MancalaEnv.compute_end_game_reward - 2 code examples found. These are the top-rated real-world Python examples of magent.mancala.MancalaEnv.compute_end_game_reward, extracted from open source projects.

Frequently used methods

MancalaEnv(10)

clone(7)

get_legal_moves(4)

perform_move(3)

compute_end_game_reward(2)

get_action_mask_with_no_pie(2)

is_game_over(2)

is_legal(2)

next_states(2)

get_player_utility(1)

get_winner(1)

is_legal_action(1)

make_move(1)

Example #1

File: rollout_policy.py Project: crisbodnar/KalahAI

 def backpropagate(self, root: Node, final_state: MancalaEnv):
     """
     Backpropagate the reward (pay/visits) from the given node through its parents up to the tree root.
     :param root: node to start backpropagating from; the walk follows parent links until it reaches the tree root
     :param final_state: the state of the final node (holds the final reward from the simulation)
     """
     node = root
     # Propagate the reward to each node on the path, one parent at a time
     while node is not None:
         side = node.parent.state.side_to_move if node.parent is not None else node.state.side_to_move  # root node
         node.update(final_state.compute_end_game_reward(side))
         node = node.parent
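
The excerpt above depends on `Node` and `MancalaEnv`, which are not shown on this page. The following is a minimal, self-contained sketch of the same loop using hypothetical stand-ins (`StubState`, `StubNode`, `StubFinalState`); the win = 1.0 / otherwise = 0.0 reward convention is an assumption for illustration and may not match the real `compute_end_game_reward`. It shows that each node is scored from the point of view of the side that was to move in its parent, and the parentless node from its own side to move.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class StubState:
    """Hypothetical stand-in for a game state: only tracks who moves next."""
    side_to_move: str


class StubFinalState:
    """Hypothetical stand-in for a finished MancalaEnv."""

    def __init__(self, winner: str) -> None:
        self.winner = winner

    def compute_end_game_reward(self, side: str) -> float:
        # Assumed convention: 1.0 if `side` won, 0.0 otherwise.
        return 1.0 if side == self.winner else 0.0


class StubNode:
    """Hypothetical stand-in for a search-tree node."""

    def __init__(self, state: StubState, parent: Optional[StubNode] = None) -> None:
        self.state = state
        self.parent = parent
        self.visits = 0
        self.reward = 0.0

    def update(self, reward: float) -> None:
        self.visits += 1
        self.reward += reward


def backpropagate(start: StubNode, final_state: StubFinalState) -> None:
    """Walk parent links from `start` to the tree root, updating every node."""
    node: Optional[StubNode] = start
    while node is not None:
        # Score the node for the side that chose the move leading into it;
        # the tree root has no parent, so fall back to its own side to move.
        if node.parent is not None:
            side = node.parent.state.side_to_move
        else:
            side = node.state.side_to_move
        node.update(final_state.compute_end_game_reward(side))
        node = node.parent


# Three-node chain: tree_root -> child -> leaf.
tree_root = StubNode(StubState("south"))
child = StubNode(StubState("north"), parent=tree_root)
leaf = StubNode(StubState("south"), parent=child)
backpropagate(leaf, StubFinalState(winner="south"))
```

Reading the side from the parent rather than assuming strict alternation keeps the perspective correct in games where a player can move twice in a row.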

Example #2

File: rollout_policy.py Project: crisbodnar/KalahAI

 def backpropagate(self, root: Node, final_state: MancalaEnv, lmbd=1):
     """Backpropagate the reward (pay/visits) along the path from the given node to the tree root, applying the updates from the tree root downward.
     :param root: node to start backpropagating from; the walk follows parent links until it reaches the tree root
     :param final_state: the state of the final node (holds the final reward from the simulation)
     :param lmbd: a parameter to control the weight of the value network
     """
     path_stack = []
     node = root
     # Collect the path from the starting node up to the tree root
     while node is not None:
         path_stack.append(node)
         node = node.parent
     # Update from root downward so the exploration bonus calculation is correct
     while len(path_stack) > 0:
         node = path_stack.pop()
         side = node.parent.state.side_to_move if node.parent is not None else node.state.side_to_move  # root node
         game_reward = final_state.compute_end_game_reward(side)
         # _, value = self.network.evaluate_state(final_state)
         # game_reward = (1 - lmbd) * value + (lmbd * side_final_reward) # value from network + value from actionNet
         node.update(game_reward)
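
Two details of this excerpt can be sketched in isolation: the order in which the stack is drained, and the blend hinted at by the commented-out lines. The helpers below (`PathNode`, `path_root_first`, `blend_reward`) are hypothetical names introduced for illustration and are not part of the KalahAI project. The blend assumes the intended formula is `(1 - lmbd) * value + lmbd * game_reward`, which is how the commented-out line reads if its undefined `side_final_reward` is taken to mean the end-game reward.

```python
from __future__ import annotations

from typing import List, Optional


class PathNode:
    """Hypothetical minimal node: a name and a parent link."""

    def __init__(self, name: str, parent: Optional[PathNode] = None) -> None:
        self.name = name
        self.parent = parent


def path_root_first(start: PathNode) -> List[PathNode]:
    """Return the nodes from the tree root down to `start`."""
    path_stack: List[PathNode] = []
    node: Optional[PathNode] = start
    # Push nodes while walking up; the tree root ends up on top of the stack.
    while node is not None:
        path_stack.append(node)
        node = node.parent
    # Popping reverses the walk, so the tree root comes out first.
    ordered: List[PathNode] = []
    while path_stack:
        ordered.append(path_stack.pop())
    return ordered


def blend_reward(game_reward: float, value: float, lmbd: float = 1.0) -> float:
    """Mix the end-game reward with a value estimate (assumed formula)."""
    if not 0.0 <= lmbd <= 1.0:
        raise ValueError("lmbd is expected to lie in [0, 1]")
    return (1.0 - lmbd) * value + lmbd * game_reward


a = PathNode("a")
b = PathNode("b", parent=a)
c = PathNode("c", parent=b)
order = [n.name for n in path_root_first(c)]
```

With the default `lmbd=1` the value estimate is ignored and only the end-game reward is used, which matches the behavior of the excerpt as it stands with the network lines commented out.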