Python Arena.add_bandits Examples

Programming language: Python

Namespace/Package: arena

Class/Type: Arena

Method/Function: add_bandits

Examples at hotexamples.com: 2

Python Arena.add_bandits - 2 examples found. These are the top-rated real-world examples of arena.Arena.add_bandits extracted from open source projects. You can rate the examples to help us improve their quality.

Frequently used methods

Arena(30)

add_character(6)

fight(4)

arena_matrix(4)

add_player(2)

clear(2)

get_arena(2)

add_bandits(2)

corner_location(2)

add_players(2)

from_mdf_strings(1)

down_tiles_can_be_exploded(1)

draw(1)

draw_arena(1)

feed(1)

focus_on_next_ship(1)

AddGroups(1)

from_name(1)

getFitness(1)

getHeight(1)

get_destructible_walls(1)

get_non_destructible_walls(1)

handleNextMarbleMove(1)

left_tiles_can_be_exploded(1)

position_valid(1)

right_tiles_can_be_exploded(1)

up_tiles_can_be_exploded(1)

doMarbleMove(1)

compete(1)

delete_character(1)

_update_players_state(1)

Turn(1)

_can_place_bomb(1)

_set_dice(1)

_update_matrix_explosion_center(1)

_update_matrix_explosion_down(1)

_update_matrix_explosion_left(1)

_update_matrix_explosion_right(1)

_update_matrix_explosion_up(1)

add_obstacle(1)

copy(1)

agents(1)

aktualisiere(1)

animate_universe_history(1)

battles(1)

build_team_one(1)

build_team_two(1)

clearPoint(1)

coords_have_class(1)

update_explosion_in_matrix(1)

Example #1

File: main.py Project: leeykang/reinforcement_learning_files

def example_1():
    """
    Example 1: Compares rewards and percentage of optimum action selection
    between various methods, using the pit run_mode of Arena. Produces the
    results in the form of two plots.
    """
    # Initialises the Arena and all required inputs.
    arena = Arena('base_problem')
    actions_list = [10]
    timesteps_list = [1000]
    runs_list = [2000]
    init_mean_list = [0]
    init_stddev_list = [1]
    action_stddev_list = [1]
    delta_mean_list = [0]
    delta_stddev_list = [0]
    first_considered_reward_step_list = [0]

    # Creates and adds Bandits to the Arena.
    arena.add_bandits([Bandit(*val) for val in zip(
        actions_list, timesteps_list, runs_list,
        first_considered_reward_step_list, init_mean_list, init_stddev_list,
        action_stddev_list, delta_mean_list, delta_stddev_list)])

    # Creates and adds Players to the Arena.
    arena.add_players([
        RandomPlayer(),
        QPlayer(initial_Q=0, epsilon=0.1),
        QPlayer(initial_Q=5, epsilon=0.1),
        UCBQPlayer(initial_Q=0, confidence_level=2),
        UCBQPlayer(initial_Q=5, confidence_level=2),
        GradientPlayer(step_size_parameter=0.1, use_baseline_reward=True),
        GradientPlayer(step_size_parameter=0.1, use_baseline_reward=False)
    ])

    # Run the Arena in pit mode.
    arena.run('pit')

Example #2

File: main.py Project: leeykang/reinforcement_learning_files

import numpy as np  # needed for np.logspace below


def example_2():
    """
    Example 2: Parameter study of various Players on a nonstationary Bandit.
    Produces the results in the form of a single plot. For a stationary
    Bandit, set all values of delta_mean_list and delta_stddev_list to 0.
    """
    # Initialises the Arena and all required inputs.
    arena = Arena('base_problem')
    actions_list = [10]
    timesteps_list = [1000]
    runs_list = [2000]
    init_mean_list = [0]
    init_stddev_list = [1]
    action_stddev_list = [1]
    delta_mean_list = [0]
    delta_stddev_list = [0.01]
    first_considered_reward_step_list = [0]

    # Initialises the study ranges for all Players.
    epsilon_study_range = np.logspace(-7, -1, num=7, base=2.0, dtype=float).tolist()
    initial_Q_study_range = np.logspace(-2, 3, num=6, base=2.0, dtype=float).tolist()
    confidence_level_study_range = np.logspace(-4, 3, num=8, base=2.0, dtype=float).tolist()
    step_size_parameter_study_range = np.logspace(-5, 2, num=8, base=2.0, dtype=float).tolist()
    parameter_range = np.logspace(-8, 4, num=2, base=2.0, dtype=float).tolist()

    # Creates and adds Bandits to the Arena.
    arena.add_bandits([Bandit(*val) for val in zip(
        actions_list, timesteps_list, runs_list,
        first_considered_reward_step_list, init_mean_list, init_stddev_list,
        action_stddev_list, delta_mean_list, delta_stddev_list)])

    # Creates and adds Players to the Arena.
    arena.add_players([
        # epsilon greedy, initial_Q = 0 (study epsilon)
        QPlayer(0, epsilon_study_range[0],
                study_variable='epsilon',
                study_range=epsilon_study_range),
        # epsilon greedy with alpha 0.1, initial_Q = 0 (study epsilon)
        QPlayer(0, epsilon_study_range[0], 0.1,
                study_variable='epsilon',
                study_range=epsilon_study_range),
        # greedy with alpha 0.1 (study initial_Q)
        QPlayer(initial_Q_study_range[0], 0, 0.1,
                study_variable='initial_Q',
                study_range=initial_Q_study_range),
        # UCB, initial_Q = 0 (study ucb_c)
        UCBQPlayer(0, confidence_level_study_range[0],
                   study_variable='confidence_level',
                   study_range=confidence_level_study_range),
        # UCB, initial_Q = 0, alpha = 0.1 (study ucb_c)
        UCBQPlayer(0, confidence_level_study_range[0], 0.1,
                   study_variable='confidence_level',
                   study_range=confidence_level_study_range),
        # gradient bandit with baseline (study alpha)
        GradientPlayer(step_size_parameter_study_range[0],
                       study_variable='step_size_parameter',
                       study_range=step_size_parameter_study_range)
    ])

    # Run the Arena in parameter study mode.
    arena.run('parameter_study', parameter_range)
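The study ranges in this example are built with `np.logspace(start, stop, num, base=2.0)`, which returns `num` values of the form `2**e` with the exponent `e` evenly spaced from `start` to `stop` inclusive. As a rough illustration of what those lists contain, here is a plain-Python sketch of the same spacing. The helper `power_of_two_range` is hypothetical, written only for this illustration, and is not part of NumPy or of the project.

```python
def power_of_two_range(start_exp, stop_exp, num):
    """Return `num` values 2**e, with e evenly spaced over [start_exp, stop_exp]."""
    if num == 1:
        return [2.0 ** start_exp]
    step = (stop_exp - start_exp) / (num - 1)
    return [2.0 ** (start_exp + i * step) for i in range(num)]


# Same arguments as epsilon_study_range in the example: 2**-7 up to 2**-1.
epsilon_study_range = power_of_two_range(-7, -1, 7)

# Same arguments as parameter_range: with num=2 only the two endpoints remain.
parameter_range = power_of_two_range(-8, 4, 2)

print(epsilon_study_range)
print(parameter_range)
```

With `num=2`, the range passed to `arena.run('parameter_study', ...)` consists only of the endpoints `2**-8` and `2**4`. How the Arena uses those two values is not shown in this excerpt.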