blackbox

A specialized optimization toolkit for the BlackBox Challenge competition.

The problem, in short, is to teach an agent to play a game with unknown rules. The game consists of a number of steps; at each step the agent sees an undescribed state (a vector of floats), chooses one of the available actions (a small integer), and gets a score update in return.
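
For concreteness, here is a minimal sketch of one playthrough. The interface module's function names are assumptions made for illustration; the competition's specification defines the actual API.

    # A hypothetical playthrough; the interface function names are
    # assumptions for illustration, not necessarily the actual API.
    import interface

    def choose_action(state):
        # A trivial agent: always pick the first action.
        return 0

    interface.load_level("levels/train_level.data")
    has_next = True
    while has_next:
        state = interface.get_state()          # a bunch of floats
        has_next = interface.do_action(choose_action(state))
    print(interface.get_score())               # the final score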

Such a formulation covers a wide subset of reinforcement learning problems, allowing the toolkit to be potentially used for various optimization tasks.

Cons

Just a short excerpt from a few-pages-long list of notes and todos:

  • In its current form, the toolkit offers just a simplistic, evolutionary approach: all of the coded trainers treat the black box opaquely, not even looking at intermediate level scores.
  • The optimization process still requires a human to choose the results for further extension and refinement, and humans are not that easy to come by ;-)
  • Parts of the toolkit are over-optimized; for instance, trainers that could benefit from statically compiled code are quite imaginable, but none of the implemented ones really needs it.
  • The bot interface should be extended to allow more sophisticated trainers.

Pros

Although the framework is a simple Monte Carlo simulator at its heart, it has some nice attributes:

  • A simple interface between bots and trainers: each trainer can train every bot.
  • C-speed bots, trainers, collectors, and processors; writing a new bot only requires describing its parameters and coding a single function (see the sketch after this list).
  • Extensive training infrastructure: seed trainers with previous results, move parts of parameter sets between bots, use different parameter sets for parts of levels, or emphasize components of the state.
  • Most of the training configuration is done using probability distributions, rather than fixed constants, giving a lot of flexibility.
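
As an illustration of that interface, a new pure-Python bot might look roughly like this. The function names and the parameter description format are assumptions; bots.pyx and the existing bot_*.pyx modules define the actual conventions.

    # A hypothetical minimal bot; the names and the parameter
    # description format are illustrative, see bots.pyx and the
    # bot_*.pyx modules for the actual conventions.
    import numpy as np

    def describe_params(state_size, actions=4):
        # One weight per state component for each of the 4 actions.
        return {'coefs': (actions, state_size)}

    def act(state, coefs):
        # Score each action as a linear function of the state and
        # pick the highest-valued one.
        return int(np.argmax(np.dot(coefs, state)))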

Installation

Requires Python >= 3.4, NumPy >= 1.11, and Cython >= 0.24. Currently the only way to use the toolkit is to clone the source from GitHub and get the game interface and levels from the competition's site.

Copy the levels folder, interface.pxd (from the fast_bot folder), and the proper interface-X.so for your environment from the package into the folder containing this file.

Alternatively, code an interface for your own problem according to the specification! Both the Python and the C API have to be provided, but there are no requirements on the levels' content or format, other than that the interface can use them.

The file and directory structure should look as follows:

    blackbox
    ├── data                (records gathered by collectors)
    ├── levels              (should hold train/test_level.data)
    ├── packs               (self-contained bots for sharing)
    ├── params              (stored parameters, NumPy's format)
    ├ bot_*.pyx             (bot implementations)
    ├ bots.pyx              (add new bots here)
    ├ trainer_*.pyx         (trainer implementations)
    ├ trainers.pyx          (add new trainers here)
    ...
    ├ interface.[pxd,so]    (the game interface should be here)

Commands

The toolkit consists of six command-line utilities. To see a full list of options for any of them, invoke it with --help. Expanded descriptions and more examples can be found in their module docstrings.

train.py takes a bot name, a trainer name, and a configuration for it (plus a ton of optional parameters). For example:

    ./train.py linear local 100 1

starts a random local search for a set of coefficients for a linear regression bot.

play.py evaluates a bot on selected levels and prints out the scores. For instance:

    ./play.py linear 0

would play the bot with the coefficients found by the previous command on all available levels.

view.py lets you see a set of coefficients and their history. To review the coefficients generated above, you would type:

    ./view.py linear 0

collect.py plays a bot on a level and stores some data, such as consecutive states, actions taken, or intermediate rewards:

    ./collect.py srs --bot linear 0

process.py analyzes data sets gathered by a collector to display statistics or preprocess them for a trainer:

    ./process.py stats srs 0

The remaining utility creates a standalone bot package (one that still requires the game interface and levels); packages end up in the packs folder.

Bots

The agents choosing the actions.

All current bot implementations assume 4 actions; some can only play a particular number of steps on each act() call.

  • A simple regression bot, optimized.
  • Bots that try to simulate an entry, or a few, of the hidden state, and evaluate actions using the state augmented with beliefs. The present bots simply update the beliefs linearly, based on their previous values and the visible state, and use linear regression for choosing actions.
  • Linear regression using the current and the last state.
  • A bot that approximates the action value as a linear function of the state components and their derivatives (with respect to level time); in practice, it uses finite backward differences to approximate the derivatives (see the sketch after this list).
  • A bot that adds a coefficient for each pair of state components.
  • Bots that can use different sets of coefficients at different parts of the level. The "parts" may be phases (first half, second half), congruences (even step, odd step), or some other temporal patterns.
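
To make the backward-difference idea concrete, here is a sketch; the function and parameter names are hypothetical, only the technique comes from the bot described above.

    # Action values from the state and its approximate derivative;
    # the function and parameter names here are hypothetical.
    import numpy as np

    def act(states, state_coefs, diff_coefs):
        current, previous = states[-1], states[-2]
        # A first-order backward difference approximates the
        # derivative of the state with respect to level time.
        derivative = current - previous
        values = np.dot(state_coefs, current) + np.dot(diff_coefs, derivative)
        return int(np.argmax(values))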

Trainers

The problem solvers, or optimization algorithms.

  • Randomly changes a number of parameter entries at each step; it can either redraw values anew or make slight adjustments, according to a given variation scale.
  • Almost a random local search, but sometimes accepts score drops; useful when you feel you are stuck in a local maximum (see the sketch after this list).
  • Instead of taking a single variation scale, these variants are configured with a high and a low value, and gradually diminish parameter variations during the course of a training session.
  • Chooses the best from the seeds provided.
  • Combines several parameter sets into a single multi-parameter set (for an "_m" bot).
  • Tries all phase assignments, printing their evaluations.
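
A sketch of the random local search idea with occasional drop acceptance; the evaluate() callback and all names are placeholders, not the toolkit's API.

    # Random local search over a flat parameter vector; names and the
    # evaluate() callback are placeholders, not the toolkit's API.
    import numpy as np

    def local_search(params, evaluate, steps=1000, changes=3, scale=0.1,
                     drop_probability=0.0):
        current, current_score = params.copy(), evaluate(params)
        best, best_score = current, current_score
        for _ in range(steps):
            candidate = current.copy()
            # Slightly adjust a few randomly chosen entries,
            # according to the variation scale.
            indices = np.random.randint(len(candidate), size=changes)
            candidate[indices] += np.random.normal(scale=scale, size=changes)
            score = evaluate(candidate)
            # Move on improvements, and occasionally on drops, to
            # help escape local maxima.
            if score > current_score or np.random.random() < drop_probability:
                current, current_score = candidate, score
                if score > best_score:
                    best, best_score = candidate, score
        return best, best_score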

Collectors

Collectors run a bot through a level (possibly moving back in time by using checkpoints), observe various things, such as states, rewards, or optimal paths, and save them for later processing.

  • Checks possible immediate rewards after each state.
  • Stores states encountered, scores seen, and actions taken during a playthrough (see the sketch after this list).
  • Finds all states that could have come after each state.
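
For instance, a state-score-action record of a playthrough might be gathered along these lines; the interface function names are the same assumptions as in the first sketch, and choose_action() stands for any bot's policy.

    # Recording states, scores, and actions during one playthrough;
    # interface function names are assumptions, as in earlier sketches.
    import numpy as np
    import interface

    def choose_action(state):
        return 0  # stand-in for any bot's policy

    states, actions, scores = [], [], []
    interface.load_level("levels/train_level.data")
    has_next = True
    while has_next:
        state = interface.get_state()
        action = choose_action(state)
        has_next = interface.do_action(action)
        states.append(np.copy(state))
        actions.append(action)
        scores.append(interface.get_score())

    # Save in NumPy's format for later processing.
    np.savez("data/playthrough.npz", states=states, actions=actions,
             scores=scores)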

Processors

Processors are meant to analyze the data gathered by collectors. Such a split of responsibilities lets you quickly set up new investigations.

  • How do the first few states look? How do they depend on the first few actions?
  • How does the state look on average? What are typical rewards for actions?
  • Are the state components independent from each other? (See the sketch after this list.)
  • Is the level composed of smaller sublevels?
  • What was going on during that playthrough?
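
As an example of how little code such an investigation takes once records exist, a rough independence check can boil down to a correlation matrix; the data file name is hypothetical.

    # Strongly correlated state components are clearly not
    # independent; the data file name is hypothetical.
    import numpy as np

    states = np.load("data/playthrough.npz")["states"]
    correlations = np.corrcoef(states, rowvar=False)
    # Report the most correlated pair of distinct components.
    np.fill_diagonal(correlations, 0)
    i, j = np.unravel_index(np.abs(correlations).argmax(),
                            correlations.shape)
    print("components", i, "and", j, "correlate at", correlations[i, j])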
