Friendly machine learning for the LHCb experiment. The project lets you train and compare classifiers on a training dataset.
The programming language is Python, and the analysis is performed in IPython notebooks. IPython is an interactive shell for Python that is commonly used in machine learning and is well suited to development, analysis and presenting results (plots, histograms and so on).
- Dalitz Demo (several uniform classifiers on the dataset from the uBoost paper)
- Decay of tau into three muons
- Generation of toy Monte-Carlo
- Any other notebook from the repository can be viewed as well: just paste its link into nbviewer
- working on uniform classifiers - classifiers whose predictions have low correlation with mass (or some other variable(s))
- MSE - a measure of uniformity
- an optimized implementation of uBoost
- uniformGradientBoosting (with different losses; FlatnessLoss is especially interesting)
- parameter optimization
  See the `grid_search` module: it provides a simulated annealing-like optimization of parameters on a dataset, and this optimization can be run on a cluster.
- plots, plots, plots
  See the `reports` module: it is a good way to visualize learning curves, ROC curves and flatness of predictions on variables.
- there is also a procedure to generate toy Monte-Carlo in the `toymc` module (it generates a new set of events with the same distribution as the set of events we already have) and a special notebook 'ToyMonteCarlo' to demonstrate and analyze its results.
- parallelism
  `ClassifiersDict` from `reports` can train classifiers on an IPython cluster.
  uBoost is quite slow, so it has a built-in parallelism option: the different BDTs inside uBoost can be trained in parallel on a cluster.
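
To illustrate what "uniform" means in the list above, here is a minimal sketch of an MSE-style uniformity measure: cut on the classifier prediction, compute the fraction of events passing the cut in each mass bin, and take the weighted mean squared deviation of these local efficiencies from the global one. This is only one plausible reading of the idea and not the project's own implementation; the function name `efficiency_mse`, the equal-width binning and the threshold are hypothetical choices. It uses only the standard library so that it runs without any of the dependencies.

```python
import random


def efficiency_mse(predictions, masses, threshold, n_bins=10):
    """Weighted mean squared deviation of per-bin efficiency from global efficiency.

    Returns 0 when the fraction of events passing the cut is identical
    in every mass bin, i.e. when the cut is perfectly uniform in mass.
    """
    if len(predictions) != len(masses) or not predictions:
        raise ValueError("predictions and masses must be non-empty and of equal length")

    passed = [p > threshold for p in predictions]
    n_events = len(passed)
    global_eff = sum(passed) / n_events

    # Equal-width bins in mass; fall back to width 1.0 if all masses coincide.
    lo, hi = min(masses), max(masses)
    width = (hi - lo) / n_bins or 1.0
    totals = [0] * n_bins
    passes = [0] * n_bins
    for mass, ok in zip(masses, passed):
        bin_index = min(int((mass - lo) / width), n_bins - 1)
        totals[bin_index] += 1
        passes[bin_index] += ok  # True counts as 1, False as 0

    mse = 0.0
    for total, n_pass in zip(totals, passes):
        if total:  # skip empty bins
            local_eff = n_pass / total
            mse += (total / n_events) * (local_eff - global_eff) ** 2
    return mse


# Toy comparison: predictions independent of mass vs. predictions that grow with mass.
rng = random.Random(0)
masses = [rng.uniform(0.0, 1.0) for _ in range(5000)]
flat = [rng.random() for _ in masses]
correlated = [0.5 * m + 0.5 * rng.random() for m in masses]

flat_mse = efficiency_mse(flat, masses, threshold=0.5)
corr_mse = efficiency_mse(correlated, masses, threshold=0.5)
print("mass-independent predictions:", flat_mse)
print("mass-correlated predictions: ", corr_mse)
```

The mass-correlated predictions should give a clearly larger value, because the cut keeps mostly high-mass events and the per-bin efficiencies drift away from the global one.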
###Getting this to work
To run most of the notebooks, only IPython and some Python libraries are needed.
To run example notebooks on some machine, one should have
- IPython
- Some Python libraries, which can be installed with any Python package manager (`apt-get` will work too, but the Ubuntu repo contains quite old versions of the libraries, so better use `pip`)

The libraries you need are `numpy`, `scipy`, `pandas`, `scikit-learn`, `rootpy`, `root-numpy` and maybe something else. The packages are installed from the command line:

    sudo pip install numpy scipy pandas scikit-learn rootpy root-numpy

IPython can be installed via pip as well:

    sudo pip install ipython

To start IPython, use the shell script in the `IpythonWorkflow/` subfolder.
To work with ROOT files you need CERN ROOT. Check that it is installed by typing `root` in the console.
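
The requirements above can be checked with a small script such as the following. It is only a sketch: the import names are an assumption derived from the package names (`scikit-learn` is imported as `sklearn`, `root-numpy` as `root_numpy`), and the helper names `find_missing_modules` and `root_is_on_path` are hypothetical, not part of this repository.

```python
import importlib.util
import shutil

# Import names for the libraries listed above (assumed; pip names can differ).
REQUIRED_MODULES = [
    "IPython", "numpy", "scipy", "pandas", "sklearn", "rootpy", "root_numpy",
]


def find_missing_modules(module_names):
    """Return the top-level modules from module_names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


def root_is_on_path():
    """Return True if a `root` executable is found on the PATH."""
    return shutil.which("root") is not None


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing Python modules:", ", ".join(missing))
else:
    print("All required Python modules were found.")

if root_is_on_path():
    print("Found a `root` executable on the PATH.")
else:
    print("No `root` executable found; ROOT files cannot be read.")
```

The script only checks that the modules can be located; it does not import them or verify their versions.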
###Roadmap
We are going to publish the notebooks on a server to provide easy access from any machine.
Some tests with different decays will be published soon.