LHCb Downstream Tracking study

This repository contains the source code for studies aimed at improving the LHCb Downstream Tracking algorithm.

Problem description

I am working on improving the performance of the Downstream Tracking algorithm, where performance is measured as a reduction of the ghost track rate.
The downstream algorithm is seeded by tracks reconstructed in the T stations (T seeds). These T seeds are combined with hits in the TT to form downstream candidates. By default, the algorithm uses all T seeds and all TT hits. A schematic view of the LHCb detector with the track types is attached below.

Apart from classifier performance, the most important aspect of this study is classifier evaluation time. Each of these models runs in the LHCb High Level Trigger 2 (HLT2), so I cannot afford any loss of speed, and the chosen model must be implemented in C++. The source code of this algorithm, as well as the place to implement the final model, can be found in the BrunelCode directory. This code is "forked" from CERN's GitLab repository.

The SeedClassifier directory contains notebooks documenting the study of the first classifier, the Seed classifier, depicted in red on the flowchart above.
The TrackClassifier directory contains the study of the second, final classifier.
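The two-stage flow described above (seed classifier, then combination of the surviving T seeds with TT hits, then the track classifier) can be sketched as follows. This is only an illustration under stated assumptions, not the Brunel C++ implementation: the `Seed` and `TTHit` types, the `x_at_tt` field and the simple one-dimensional matching window are hypothetical placeholders chosen to show why rejecting ghost seeds early reduces the number of candidates that have to be built and scored.

```python
"""Sketch of the two-stage selection: seed classifier -> TT matching -> track classifier.

All names, fields and the matching rule are hypothetical placeholders.
"""
from dataclasses import dataclass


@dataclass
class Seed:
    seed_id: int
    x_at_tt: float      # hypothetical: seed position extrapolated to the TT
    seed_score: float   # output of the (first) seed classifier


@dataclass
class TTHit:
    x: float            # hypothetical: hit position in the TT


def build_candidates(seeds, hits, seed_cut, window):
    """Pair every seed that passes the seed classifier with nearby TT hits."""
    candidates = []
    for seed in seeds:
        if seed.seed_score < seed_cut:
            continue  # stage 1: drop likely ghost seeds before any combinatorics
        matched = [h for h in hits if abs(h.x - seed.x_at_tt) < window]
        if matched:
            candidates.append((seed, matched))
    return candidates


def select_tracks(candidates, track_score, track_cut):
    """Stage 2: keep only candidates accepted by the (second) track classifier."""
    return [c for c in candidates if track_score(c) >= track_cut]


seeds = [Seed(0, x_at_tt=10.0, seed_score=0.9), Seed(1, x_at_tt=20.0, seed_score=0.1)]
hits = [TTHit(10.5), TTHit(19.8)]
candidates = build_candidates(seeds, hits, seed_cut=0.5, window=1.0)
# Hypothetical stand-in for the track classifier: reuse the seed score.
tracks = select_tracks(candidates, track_score=lambda c: c[0].seed_score, track_cut=0.5)
print(len(candidates), len(tracks))  # 1 1
```

With `seed_cut=0.0` (the default behaviour of using all T seeds) the second seed would also be paired with a TT hit, so every ghost seed removed by the first classifier is one less candidate for the combinatorics and the second classifier.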

### The first classifier can be treated as a spam-detection-like problem.

This means that we don't want to lose any positive signal (good downstream tracks). At the same time, the goal is to reject as many ghost seeds as possible, ideally all of them. This is how every spam filter needs to work: losing a good email, e.g. an internship acceptance from company XYZ, is a serious problem, while letting through an unwanted one, such as a free-holiday advert, is not so painful for the user.
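In practice, "don't lose any good seed" translates into choosing the classifier cut from the signal distribution first and only then reading off how many ghosts it removes. The sketch below assumes that classifier scores are already computed, that a higher score means "more signal-like", and that a seed passes if `score >= cut`; the function names and the toy scores are hypothetical and only illustrate the idea.

```python
"""Sketch: choose a cut that keeps (almost) all good seeds, then measure ghost rejection.

Assumptions: higher score = more signal-like; a seed passes if score >= cut.
"""
import math


def cut_keeping_all_signal(signal_scores):
    # The lowest signal score is the tightest cut that still keeps 100 % of the signal.
    return min(signal_scores)


def cut_at_signal_efficiency(signal_scores, efficiency):
    # Tightest cut that keeps at least `efficiency` (e.g. 0.99) of the signal.
    ordered = sorted(signal_scores, reverse=True)
    n_keep = max(1, math.ceil(efficiency * len(ordered)))
    return ordered[n_keep - 1]


def ghost_rejection(ghost_scores, cut):
    # Fraction of ghost seeds that fall below the cut and are therefore removed.
    rejected = sum(1 for s in ghost_scores if s < cut)
    return rejected / len(ghost_scores)


signal = [0.9, 0.8, 0.75, 0.4]        # toy scores of good seeds
ghosts = [0.1, 0.2, 0.5, 0.35, 0.05]  # toy scores of ghost seeds
cut = cut_keeping_all_signal(signal)
print(cut, ghost_rejection(ghosts, cut))  # 0.4 0.8
```

A cut taken as the minimum over the training sample is sensitive to outliers, so one would typically fix it on a held-out sample, or relax it slightly with `cut_at_signal_efficiency`.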

### To choose the best model, I will train and tune various available machine learning models. I will focus on:

- Baseline kNN, just to get some intuition about the datasets and the classification score.
- Boosted Decision Trees (BDT) based on sklearn GradientBoostingClassifier.
- BDT based on the xgboost library.

  This is the most important model: it has the best performance, measured as the area under the ROC curve.
  I also focused on improving the classifier evaluation time by implementing the idea of bonsai Boosted Decision Trees (bBDT). The concept of bBDT is to convert the base classifier (BDT) into a lookup table, so that the classifier evaluation time is ~O(1), independent of the number of trees!

- Linear model: Logistic Regression.
- Deep Neural Network based on Lasagne and Theano.
- Deep Neural Network based on Keras and Theano.
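The bBDT idea mentioned in the xgboost item can be illustrated with a minimal sketch. It is not the repository's implementation: it assumes two input features scaled to [0, 1), uniform binning, and a hypothetical `toy_bdt` (a sum of two decision stumps) standing in for the trained sklearn/xgboost BDT. The table is filled once, offline, by evaluating the model at every bin centre; at evaluation time only the bin indices are computed and a single array element is read, so the cost does not depend on the number or depth of the trees (it grows only with the number of features).

```python
"""Minimal bonsai-BDT (bBDT) sketch: replace a tree ensemble by a lookup table.

Assumptions: features scaled to [0, 1), uniform bins, and a hypothetical
two-stump ensemble standing in for the trained BDT.
"""
from itertools import product


def toy_bdt(features):
    """Hypothetical stand-in for a trained BDT: a sum of decision stumps."""
    x, y = features
    score = 0.0
    score += 0.6 if x > 0.5 else -0.6  # stump on feature 0
    score += 0.4 if y > 0.3 else -0.4  # stump on feature 1
    return score


class BonsaiLookupTable:
    """Precompute a classifier on every cell of a uniform feature grid."""

    def __init__(self, model, n_bins, lows, highs):
        self.n_bins = n_bins
        self.lows = lows
        self.widths = [(hi - lo) / n_bins for lo, hi in zip(lows, highs)]
        n_features = len(lows)
        self.table = [0.0] * (n_bins ** n_features)
        # Offline cost: evaluate the model once per cell, at the cell centre.
        for bins in product(range(n_bins), repeat=n_features):
            centre = [lo + (b + 0.5) * w for lo, b, w in zip(self.lows, bins, self.widths)]
            self.table[self._flat_index(bins)] = model(centre)

    def _flat_index(self, bins):
        # Row-major flattening, as one would index a plain C++ array.
        index = 0
        for b in bins:
            index = index * self.n_bins + b
        return index

    def _bin(self, value, lo, width):
        b = int((value - lo) / width)
        return min(max(b, 0), self.n_bins - 1)  # clamp out-of-range inputs

    def predict(self, features):
        # Online cost: compute bin indices and read one table entry.
        bins = [self._bin(v, lo, w) for v, lo, w in zip(features, self.lows, self.widths)]
        return self.table[self._flat_index(bins)]


table = BonsaiLookupTable(toy_bdt, n_bins=10, lows=[0.0, 0.0], highs=[1.0, 1.0])
print(table.predict([0.72, 0.15]) == toy_bdt([0.72, 0.15]))  # True
```

The price of the constant-time evaluation is discretization: all inputs in the same cell get the same score (so results can differ from the base BDT near its split values), and the table size grows as `n_bins ** n_features`, which is why the binning has to be chosen per feature with care.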

akucia/DownstreamTracking

Folders and files

Latest commit

History

Repository files navigation

LHCb Downstream Tracking study

Problem description

About

Resources

Stars

Watchers

Forks

Languages