def main():
    print('cuda:', use_cuda)
    # g_evaluator = evaluator

    env = game.GameState('text')
    result = {'Player': 0, 'Enemy': 0, 'Draw': 0}
    turn = 0
    enemy_turn = 1
    gi.enemy_turn = enemy_turn
    player_elo = 1500
    enemy_elo = 1500

    print('Player ELO: {:.0f}, Enemy ELO: {:.0f}'.format(
        player_elo, enemy_elo))

    # i = 0
    for i in range(N_MATCH):
        board = np.zeros([BOARD_SIZE, BOARD_SIZE])
        root_id = (0, )
        win_index = 0
        action_index = None

        if i % 2 == 0:
            print('Player Color: Black')
        else:
            print('Player Color: White')

        while win_index == 0:
            utils.render_str(board, BOARD_SIZE, action_index)
            action, action_index = evaluator.get_action(
                root_id, board, turn, enemy_turn)
            p, v = evaluator.get_pv(root_id, turn, enemy_turn)

            if turn != enemy_turn:
                # player turn
                root_id = evaluator.player.root_id + (action_index, )
            else:
                # enemy turn
                root_id = evaluator.enemy.root_id + (action_index, )

            board, check_valid_pos, win_index, turn, _ = env.step(action)

            # WebAPI
            gi.game_board = board
            gi.action_index = int(action_index)
            gi.win_index = win_index
            gi.curr_turn = turn

            move = np.count_nonzero(board)

            if evaluator.get_player_visit() is not None:
                player_agent_info.visit = evaluator.get_player_visit()
            if evaluator.get_enemy_visit() is not None:
                enemy_agent_info.visit = evaluator.get_enemy_visit()

            if turn == enemy_turn:
                evaluator.enemy.del_parents(root_id)
                player_agent_info.add_value(move, v)
                player_agent_info.p = p
            else:
                evaluator.player.del_parents(root_id)
                enemy_agent_info.add_value(move, v)
                enemy_agent_info.p = p

            # used for debugging
            if not check_valid_pos:
                raise ValueError('no legal move!')

            if win_index != 0:
                player_agent_info.clear_values()
                enemy_agent_info.clear_values()

                if turn == enemy_turn:
                    if win_index == 3:
                        result['Draw'] += 1
                        print('\nDraw!')
                        player_elo, enemy_elo = elo(
                            player_elo, enemy_elo, 0.5, 0.5)
                    else:
                        result['Player'] += 1
                        print('\nPlayer Win!')
                        player_elo, enemy_elo = elo(
                            player_elo, enemy_elo, 1, 0)
                else:
                    if win_index == 3:
                        result['Draw'] += 1
                        print('\nDraw!')
                        player_elo, enemy_elo = elo(
                            player_elo, enemy_elo, 0.5, 0.5)
                    else:
                        result['Enemy'] += 1
                        print('\nEnemy Win!')
                        player_elo, enemy_elo = elo(
                            player_elo, enemy_elo, 0, 1)

                utils.render_str(board, BOARD_SIZE, action_index)

                # Change turn
                enemy_turn = abs(enemy_turn - 1)
                gi.enemy_turn = enemy_turn
                turn = 0

                pw, ew, dr = result['Player'], result['Enemy'], result['Draw']
                winrate = (pw + 0.5 * dr) / (pw + ew + dr) * 100
                print('')
                print('=' * 20, " {} Game End ".format(i + 1), '=' * 20)
                print('Player Win: {}'
                      ' Enemy Win: {}'
                      ' Draw: {}'
                      ' Winrate: {:.2f}%'.format(pw, ew, dr, winrate))
                print('Player ELO: {:.0f}, Enemy ELO: {:.0f}'.format(
                    player_elo, enemy_elo))
                evaluator.reset()
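# The elo() helper called in main() is not shown in this excerpt. Below is a
# minimal sketch of what such a helper could look like, assuming the standard
# Elo update rule (logistic expected score, K-factor 32); the project's actual
# implementation may differ.
def elo(player_elo, enemy_elo, p_score, e_score, k=32):
    # Expected scores from the 400-point logistic curve.
    expected_p = 1 / (1 + 10 ** ((enemy_elo - player_elo) / 400))
    expected_e = 1 / (1 + 10 ** ((player_elo - enemy_elo) / 400))
    # Move each rating toward its actual result (1 = win, 0.5 = draw, 0 = loss).
    player_elo += k * (p_score - expected_p)
    enemy_elo += k * (e_score - expected_e)
    return player_elo, enemy_elo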
def self_play(n_selfplay):
    global cur_memory, rep_memory
    global Agent

    state_black = deque()
    state_white = deque()
    pi_black = deque()
    pi_white = deque()

    if RESIGN_MODE:
        resign_val_black = []
        resign_val_white = []
        resign_val = []
        resign_v = -1.0
        n_resign_thres = N_SELFPLAY // 4

    for episode in range(n_selfplay):
        if (episode + 1) % 10 == 0:
            logging.warning('Playing Episode {:3}'.format(episode + 1))

        env = game.GameState('text')
        board = np.zeros((BOARD_SIZE, BOARD_SIZE), 'float')
        turn = 0
        root_id = (0, )
        win_index = 0
        time_steps = 0
        action_index = None

        if RESIGN_MODE:
            resign_index = 0

        while win_index == 0:
            if PRINT_SELFPLAY:
                utils.render_str(board, BOARD_SIZE, action_index)

            # ====================== start MCTS ============================ #
            if time_steps < TAU_THRES:
                tau = 1
            else:
                tau = 0

            pi = Agent.get_pi(root_id, tau)

            # ===================== collect samples ======================== #
            state = utils.get_state_pt(root_id, BOARD_SIZE, IN_PLANES)

            if turn == 0:
                state_black.appendleft(state)
                pi_black.appendleft(pi)
            else:
                state_white.appendleft(state)
                pi_white.appendleft(pi)

            # ======================== get action ========================== #
            action, action_index = utils.get_action(pi)
            root_id += (action_index, )

            # ====================== print evaluation ====================== #
            if PRINT_SELFPLAY:
                Agent.model.eval()
                with torch.no_grad():
                    state_input = torch.tensor([state]).to(device).float()
                    p, v = Agent.model(state_input)
                    p = p.cpu().numpy()[0]
                    v = v.item()

                print('\nPi:\n{}'.format(
                    pi.reshape(BOARD_SIZE, BOARD_SIZE).round(decimals=2)))
                print('\nPolicy:\n{}'.format(
                    p.reshape(BOARD_SIZE, BOARD_SIZE).round(decimals=2)))

                if turn == 0:
                    print("\nBlack's win%: {:.2f}%".format((v + 1) / 2 * 100))
                    if RESIGN_MODE:
                        if episode < n_resign_thres:
                            resign_val_black.append(v)
                        elif v < resign_v:
                            resign_index = 2
                            if PRINT_SELFPLAY:
                                print('"Black Resign!"')
                else:
                    print("\nWhite's win%: {:.2f}%".format((v + 1) / 2 * 100))
                    if RESIGN_MODE:
                        if episode < n_resign_thres:
                            resign_val_white.append(v)
                        elif v < resign_v:
                            resign_index = 1
                            if PRINT_SELFPLAY:
                                print('"White Resign!"')

            # =========================== step ============================= #
            board, _, win_index, turn, _ = env.step(action)
            time_steps += 1

            # ========================== result ============================ #
            if RESIGN_MODE:
                if resign_index != 0:
                    win_index = resign_index
                    result['Resign'] += 1

            if win_index != 0:
                if win_index == 1:
                    reward_black = 1.
                    reward_white = -1.
                    result['Black'] += 1

                    if RESIGN_MODE:
                        if episode < n_resign_thres:
                            for val in resign_val_black:
                                resign_val.append(val)
                        resign_val_black.clear()
                        resign_val_white.clear()

                elif win_index == 2:
                    reward_black = -1.
                    reward_white = 1.
                    result['White'] += 1

                    if RESIGN_MODE:
                        if episode < n_resign_thres:
                            for val in resign_val_white:
                                resign_val.append(val)
                        resign_val_white.clear()
                        resign_val_black.clear()

                else:
                    reward_black = 0.
                    reward_white = 0.
                    result['Draw'] += 1

                    if RESIGN_MODE:
                        if episode < n_resign_thres:
                            for val in resign_val_black:
                                resign_val.append(val)
                            for val in resign_val_white:
                                resign_val.append(val)
                        resign_val_black.clear()
                        resign_val_white.clear()

                if RESIGN_MODE:
                    if episode + 1 == n_resign_thres:
                        resign_v = min(resign_val)
                        resign_val.clear()

                        if PRINT_SELFPLAY:
                            print('Resign win%: {:.2f}%'.format(
                                (resign_v + 1) / 2 * 100))

                # ====================== store in memory ======================= #
                while state_black or state_white:
                    if state_black:
                        cur_memory.append(
                            (state_black.pop(), pi_black.pop(), reward_black))
                    if state_white:
                        cur_memory.append(
                            (state_white.pop(), pi_white.pop(), reward_white))

                # ========================= result =========================== #
                if PRINT_SELFPLAY:
                    utils.render_str(board, BOARD_SIZE, action_index)

                    bw, ww, dr, rs = result['Black'], result['White'], \
                        result['Draw'], result['Resign']
                    print('')
                    print('=' * 20,
                          " {:3} Game End ".format(episode + 1), '=' * 20)
                    print('Black Win: {:3} '
                          'White Win: {:3} '
                          'Draw: {:2} '
                          'Win%: {:.2f}%'
                          '\nResign: {:2}'.format(
                              bw, ww, dr,
                              (bw + 0.5 * dr) / (bw + ww + dr) * 100, rs))
                    print('current memory size:', len(cur_memory))

                Agent.reset()

    rep_memory.extend(utils.augment_dataset(cur_memory, BOARD_SIZE))
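# utils.augment_dataset(cur_memory, BOARD_SIZE) is called above but not shown.
# This is a sketch of the usual AlphaZero-style 8-fold symmetry augmentation
# (4 rotations x 2 reflections), under the assumption that each sample is
# (state planes of shape [C, N, N], flat pi of length N*N, scalar reward);
# the project's actual helper may differ in detail.
import numpy as np


def augment_dataset(memory, board_size):
    augmented = []
    for state, pi, reward in memory:
        state = np.asarray(state)
        pi_board = np.asarray(pi).reshape(board_size, board_size)
        for flip in (False, True):
            # Flip the board width axis; apply the same flip to state and pi.
            s = state[:, :, ::-1] if flip else state
            p = pi_board[:, ::-1] if flip else pi_board
            for k in range(4):
                # Rotate the (H, W) plane of every feature plane and of pi together.
                s_rot = np.rot90(s, k, axes=(1, 2))
                p_rot = np.rot90(p, k)
                augmented.append((s_rot.copy(), p_rot.flatten(), reward))
    return augmented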
def set_agents(self, model_path_a, model_path_b, model_path_m):
    if model_path_a == 'human' or model_path_b == 'human':
        game_mode = 'pygame'
    else:
        game_mode = 'text'

    self.env = game.GameState(game_mode)

    if model_path_a == 'random':
        print('load player model:', model_path_a)
        self.player = agents.RandomAgent(BOARD_SIZE)
    elif model_path_a == 'puct':
        print('load player model:', model_path_a)
        self.player = agents.PUCTAgent(BOARD_SIZE, N_MCTS_PLAYER)
    elif model_path_a == 'uct':
        print('load player model:', model_path_a)
        self.player = agents.UCTAgent(BOARD_SIZE, N_MCTS_PLAYER)
    elif model_path_a == 'human':
        print('load player model:', model_path_a)
        self.player = agents.HumanAgent(BOARD_SIZE, self.env)
    elif model_path_a == 'web':
        print('load player model:', model_path_a)
        self.player = agents.WebAgent(BOARD_SIZE)
    else:
        print('load player model:', model_path_a)
        self.player = agents.ZeroAgent(BOARD_SIZE,
                                       N_MCTS_PLAYER,
                                       IN_PLANES_PLAYER,
                                       noise=False)
        self.player.model = model.PVNet(N_BLOCKS_PLAYER,
                                        IN_PLANES_PLAYER,
                                        OUT_PLANES_PLAYER,
                                        BOARD_SIZE).to(device)
        state_a = self.player.model.state_dict()
        my_state_a = torch.load(
            model_path_a, map_location='cuda:0' if use_cuda else 'cpu')
        for k, v in my_state_a.items():
            if k in state_a:
                state_a[k] = v
        self.player.model.load_state_dict(state_a)

    if model_path_b == 'random':
        print('load enemy model:', model_path_b)
        self.enemy = agents.RandomAgent(BOARD_SIZE)
    elif model_path_b == 'puct':
        print('load enemy model:', model_path_b)
        self.enemy = agents.PUCTAgent(BOARD_SIZE, N_MCTS_ENEMY)
    elif model_path_b == 'uct':
        print('load enemy model:', model_path_b)
        self.enemy = agents.UCTAgent(BOARD_SIZE, N_MCTS_ENEMY)
    elif model_path_b == 'human':
        print('load enemy model:', model_path_b)
        self.enemy = agents.HumanAgent(BOARD_SIZE, self.env)
    elif model_path_b == 'web':
        print('load enemy model:', model_path_b)
        self.enemy = agents.WebAgent(BOARD_SIZE)
    else:
        print('load enemy model:', model_path_b)
        self.enemy = agents.ZeroAgent(BOARD_SIZE,
                                      N_MCTS_ENEMY,
                                      IN_PLANES_ENEMY,
                                      noise=False)
        self.enemy.model = model.PVNet(N_BLOCKS_ENEMY,
                                       IN_PLANES_ENEMY,
                                       OUT_PLANES_ENEMY,
                                       BOARD_SIZE).to(device)
        state_b = self.enemy.model.state_dict()
        my_state_b = torch.load(
            model_path_b, map_location='cuda:0' if use_cuda else 'cpu')
        for k, v in my_state_b.items():
            if k in state_b:
                state_b[k] = v
        self.enemy.model.load_state_dict(state_b)

    # monitor agent
    self.monitor = agents.ZeroAgent(BOARD_SIZE,
                                    N_MCTS_MONITOR,
                                    IN_PLANES_ENEMY,
                                    noise=False)
    self.monitor.model = model.PVNet(N_BLOCKS_ENEMY,
                                     IN_PLANES_ENEMY,
                                     OUT_PLANES_ENEMY,
                                     BOARD_SIZE).to(device)
    state_b = self.monitor.model.state_dict()
    my_state_b = torch.load(
        model_path_m, map_location='cuda:0' if use_cuda else 'cpu')
    for k, v in my_state_b.items():
        if k in state_b:
            state_b[k] = v
    self.monitor.model.load_state_dict(state_b)
def set_agents(self, model_path_a, model_path_b, model_path_m):
    # If either player is human, run the game in a pygame window;
    # otherwise only print text
    if model_path_a == 'human' or model_path_b == 'human':
        game_mode = 'pygame'
    else:
        game_mode = 'text'

    # set the game mode of the env module
    self.env = game.GameState(game_mode)

    # set up the player's model (human)
    if model_path_a == 'random':
        print('load player model:', model_path_a)
        self.player = agents.RandomAgent(BOARD_SIZE)
    elif model_path_a == 'puct':
        print('load player model:', model_path_a)
        self.player = agents.PUCTAgent(BOARD_SIZE, N_MCTS_PLAYER)
    elif model_path_a == 'uct':
        print('load player model:', model_path_a)
        self.player = agents.UCTAgent(BOARD_SIZE, N_MCTS_PLAYER)
    elif model_path_a == 'human':
        print('load player model:', model_path_a)
        self.player = agents.HumanAgent(BOARD_SIZE, self.env)
    elif model_path_a == 'web':
        print('load player model:', model_path_a)
        self.player = agents.WebAgent(BOARD_SIZE)
    else:
        print('load player model:', model_path_a)
        self.player = agents.ZeroAgent(BOARD_SIZE,
                                       N_MCTS_PLAYER,
                                       IN_PLANES_PLAYER,
                                       noise=False)
        self.player.model = model.PVNet(N_BLOCKS_PLAYER,
                                        IN_PLANES_PLAYER,
                                        OUT_PLANES_PLAYER,
                                        BOARD_SIZE).to(device)
        state_a = self.player.model.state_dict()
        my_state_a = torch.load(
            model_path_a, map_location='cuda:0' if use_cuda else 'cpu')
        for k, v in my_state_a.items():
            if k in state_a:
                state_a[k] = v
        self.player.model.load_state_dict(state_a)

    # set up the enemy player's model
    if model_path_b == 'random':
        print('load enemy model:', model_path_b)
        self.enemy = agents.RandomAgent(BOARD_SIZE)
    elif model_path_b == 'puct':
        print('load enemy model:', model_path_b)
        self.enemy = agents.PUCTAgent(BOARD_SIZE, N_MCTS_ENEMY)
    elif model_path_b == 'uct':
        print('load enemy model:', model_path_b)
        self.enemy = agents.UCTAgent(BOARD_SIZE, N_MCTS_ENEMY)
    elif model_path_b == 'human':
        print('load enemy model:', model_path_b)
        self.enemy = agents.HumanAgent(BOARD_SIZE, self.env)
    elif model_path_b == 'web':
        print('load enemy model:', model_path_b)
        self.enemy = agents.WebAgent(BOARD_SIZE)
    else:
        # this branch runs when a previously trained model file is used
        print('load enemy model:', model_path_b)
        # set up the enemy agent
        self.enemy = agents.ZeroAgent(BOARD_SIZE,
                                      N_MCTS_ENEMY,
                                      IN_PLANES_ENEMY,
                                      noise=False)
        # build the enemy network model, move it to the device (GPU),
        # and store it in agents.ZeroAgent().model
        self.enemy.model = model.PVNet(N_BLOCKS_ENEMY,
                                       IN_PLANES_ENEMY,
                                       OUT_PLANES_ENEMY,
                                       BOARD_SIZE).to(device)
        # dict of the network's parameter tensors
        state_b = self.enemy.model.state_dict()
        # load the saved parameter file
        my_state_b = torch.load(
            model_path_b, map_location='cuda:0' if use_cuda else 'cpu')
        # state_b maps layer keys (weight, bias, etc.) to their value tensors
        for k, v in my_state_b.items():
            if k in state_b:
                state_b[k] = v
        # set the parameters on the deep learning model
        self.enemy.model.load_state_dict(state_b)

    # monitor agent: same as above
    self.monitor = agents.ZeroAgent(BOARD_SIZE,
                                    N_MCTS_MONITOR,
                                    IN_PLANES_ENEMY,
                                    noise=False)
    self.monitor.model = model.PVNet(N_BLOCKS_ENEMY,
                                     IN_PLANES_ENEMY,
                                     OUT_PLANES_ENEMY,
                                     BOARD_SIZE).to(device)
    state_b = self.monitor.model.state_dict()
    my_state_b = torch.load(
        model_path_m, map_location='cuda:0' if use_cuda else 'cpu')
    for k, v in my_state_b.items():
        if k in state_b:
            state_b[k] = v
    self.monitor.model.load_state_dict(state_b)
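# All three loaders above copy checkpoint tensors key-by-key into the model's
# own state_dict instead of passing the loaded file straight to
# load_state_dict(). A minimal self-contained sketch of that pattern follows;
# nn.Linear stands in for model.PVNet, and the 'source' network plays the role
# of the torch.load()-ed checkpoint.
import torch
import torch.nn as nn

target = nn.Linear(4, 2)
source = nn.Linear(4, 2)
checkpoint = source.state_dict()           # in place of torch.load(model_path, ...)

state = target.state_dict()
for k, v in checkpoint.items():
    if k in state:                         # copy only keys the target model knows
        state[k] = v
target.load_state_dict(state)              # extra or missing checkpoint keys never raise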
def self_play(agent, cur_memory, rank=0):
    agent.model.eval()

    state_black = deque()
    state_white = deque()
    pi_black = deque()
    pi_white = deque()

    episode = 0
    while True:
        if (episode + 1) % 10 == 0:
            logging.info('Playing Episode {:3}'.format(episode + 1))

        env = game.GameState('text')
        board = np.zeros((BOARD_SIZE, BOARD_SIZE), 'float')
        turn = 0
        root_id = (0, )
        win_index = 0
        time_steps = 0
        action_index = None

        while win_index == 0:
            if PRINT_SELFPLAY and rank == 0:
                utils.render_str(board, BOARD_SIZE, action_index)

            # ====================== start MCTS ============================ #
            if time_steps < TAU_THRES:
                tau = 1
            else:
                tau = 0

            pi = agent.get_pi(root_id, tau, rank)

            # ===================== collect samples ======================== #
            state = utils.get_state_pt(root_id, BOARD_SIZE, IN_PLANES)

            if turn == 0:
                state_black.appendleft(state)
                pi_black.appendleft(pi)
            else:
                state_white.appendleft(state)
                pi_white.appendleft(pi)

            # ======================== get action ========================== #
            action, action_index = utils.get_action(pi)
            root_id += (action_index, )

            # ====================== print evaluation ====================== #
            if PRINT_SELFPLAY and rank == 0:
                with torch.no_grad():
                    state_input = torch.tensor([state]).to(device).float()
                    p, v = agent.model(state_input)
                    p = p.cpu().numpy()[0]
                    v = v.item()

                print('\nPi:\n{}'.format(
                    pi.reshape(BOARD_SIZE, BOARD_SIZE).round(decimals=2)))
                print('\nPolicy:\n{}'.format(
                    p.reshape(BOARD_SIZE, BOARD_SIZE).round(decimals=2)))

                if turn == 0:
                    print("\nBlack's win%: {:.2f}%".format((v + 1) / 2 * 100))
                else:
                    print("\nWhite's win%: {:.2f}%".format((v + 1) / 2 * 100))

            # =========================== step ============================= #
            board, _, win_index, turn, _ = env.step(action)
            time_steps += 1

            # ========================== result ============================ #
            if win_index != 0:
                if win_index == 1:
                    reward_black = 1.
                    reward_white = -1.
                    result['Black'] += 1
                elif win_index == 2:
                    reward_black = -1.
                    reward_white = 1.
                    result['White'] += 1
                else:
                    reward_black = 0.
                    reward_white = 0.
                    result['Draw'] += 1

                # ====================== store in memory ======================= #
                while state_black or state_white:
                    if state_black:
                        cur_memory.append(
                            (state_black.pop(), pi_black.pop(), reward_black))
                    if state_white:
                        cur_memory.append(
                            (state_white.pop(), pi_white.pop(), reward_white))

                # ========================= result =========================== #
                if PRINT_SELFPLAY and rank == 0:
                    utils.render_str(board, BOARD_SIZE, action_index)

                    bw, ww, dr = result['Black'], result['White'], \
                        result['Draw']
                    print('')
                    print('=' * 20,
                          " {:3} Game End ".format(episode + 1), '=' * 20)
                    print('Black Win: {:3} '
                          'White Win: {:3} '
                          'Draw: {:2} '
                          'Win%: {:.2f}%'.format(
                              bw, ww, dr,
                              (bw + 0.5 * dr) / (bw + ww + dr) * 100))
                    print('current memory size:', len(cur_memory))

                episode += 1
                agent.reset()

                if len(cur_memory) >= MEMORY_SIZE:
                    return utils.augment_dataset(cur_memory, BOARD_SIZE)
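# The rank argument suggests this self_play() variant is meant to run in
# several worker processes. The driver below is only a sketch of how it could
# be launched with torch.multiprocessing; N_PROCESS and make_agent() are
# hypothetical stand-ins and not part of the original code.
import torch.multiprocessing as mp
from collections import deque

N_PROCESS = 4                                # assumed number of self-play workers


def worker(rank, out_queue):
    agent = make_agent()                     # hypothetical: build a ZeroAgent + PVNet
    cur_memory = deque()
    data = self_play(agent, cur_memory, rank)
    out_queue.put(data)                      # send augmented samples back to the trainer


if __name__ == '__main__':
    mp.set_start_method('spawn', force=True)
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(rank, queue))
             for rank in range(N_PROCESS)]
    for p in procs:
        p.start()
    samples = [queue.get() for _ in procs]   # drain the queue before joining the workers
    for p in procs:
        p.join()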