Geeksw
Python package to facilitate High Energy Physics analysis work with focus on the CMS experiment.

Introduction

In the CMS experiment, battle-tested workflows create user-friendly datasets for all collaborators to analyze. A well-maintained software release, called CMSSW, provides all the (mostly C++) code to do so. However, once the event data is available in columnar data formats like NanoAOD, there is much less consensus within the collaboration on which framework to use for analyzing it. Many groups have their own C++ based analysis frameworks which plug into CMSSW, but at a time when the Python ecosystem for data analysis is so vast and powerful, some people believe that all analysis work should be done with Python libraries.

Analyzing CMS data in Python is made possible by powerful standard libraries like numpy, matplotlib, scipy and pandas, machine learning libraries like sklearn, keras or pytorch, and more specialized libraries from the HEP community like uproot, uproot-methods and awkward-array. On top of all that, jupyter-notebook provides an interactive environment for the actual work, which makes it possible to write analysis code interleaved with markdown comments that reads almost as easily as English, enabling easier analysis review at the code level. In fact, analysis with Python can even be faster than basic analyses with C++, since the columnar rather than row-oriented (event-oriented) paradigm allows for many optimizations such as parallelism. The geeksw package builds on this existing ecosystem and offers functionality that is often used in HEP analyses in general, as well as functionality that is more specific to analysis work within the CMS collaboration.
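To illustrate the difference between the row-oriented and the columnar paradigm, here is a minimal sketch using only numpy. It does not use geeksw itself; the column names, the cut values and the randomly generated data are assumptions made for illustration.

```python
import numpy as np

# Toy columns with one entry per event (made-up values, not CMS data).
rng = np.random.default_rng(seed=0)
pt = rng.exponential(scale=30.0, size=10_000)  # transverse momentum in GeV
eta = rng.uniform(-3.0, 3.0, size=10_000)      # pseudorapidity


def select_event_loop(pt, eta):
    """Row-oriented selection: visit the events one by one."""
    selected = []
    for i in range(len(pt)):
        if pt[i] > 25.0 and abs(eta[i]) < 2.4:
            selected.append(pt[i])
    return np.array(selected)


def select_columnar(pt, eta):
    """Columnar selection: one boolean mask computed on whole columns."""
    mask = (pt > 25.0) & (np.abs(eta) < 2.4)
    return pt[mask]


loop_result = select_event_loop(pt, eta)
columnar_result = select_columnar(pt, eta)
```

Both functions return the same selected values. The columnar version expresses the cut as array operations, which numpy executes in compiled code rather than in a Python loop.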

The following sections give an overview of the features of geeksw. The aim is to keep as much documentation as possible within the code, but features are mainly explained through example notebooks in the dedicated examples directory.

Installation

You should use Python 3 with geeksw. If you work on llruicm01, please consider adding the following aliases to your .bashrc:

alias python="/opt/exp_soft/llr/python/3.7.0/el7/bin/python"
alias pip="/opt/exp_soft/llr/python/3.7.0/el7/bin/pip3"

The geeksw framework can be installed like any other Python package:

git clone git@github.com:guitargeek/geeksw.git
cd geeksw
pip install --user .

Submodules

Plotting

Analysis framework

Physics tools

Utilities

NanoAOD data loading

Fitting

A few tools are provided to make the most out of scipy.optimize.curve_fit. The Jacobian of any fitting function can be obtained automatically using pytorch; this is wrapped by the wrap_jac function. An example of this can be found in a jupyter notebook.
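For context, the sketch below shows how a Jacobian is passed to scipy.optimize.curve_fit through its jac argument. Here the Jacobian is written out by hand, whereas geeksw's wrap_jac obtains it automatically via pytorch; the signature of wrap_jac is not shown in this README, so it is not used here. The model function, the data and the starting values are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, a, b):
    """Exponential decay used as a toy fitting function."""
    return a * np.exp(-b * x)


def model_jac(x, a, b):
    """Hand-written Jacobian of the model with respect to (a, b).

    Returns an array of shape (len(x), 2): one row per data point,
    one column per fit parameter.
    """
    e = np.exp(-b * x)
    return np.column_stack([e, -a * x * e])


# Toy data generated from known parameters plus a little noise.
x = np.linspace(0.0, 4.0, 50)
rng = np.random.default_rng(seed=1)
y = model(x, 2.5, 1.3) + rng.normal(scale=0.01, size=x.size)

# Without jac, curve_fit estimates the derivatives by finite differences.
popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0], jac=model_jac)
```

Writing such a Jacobian by hand becomes tedious and error-prone for more complicated models, which is the motivation for deriving it automatically.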

HGCal

The HGCal package includes tools to load and regroup HGCal testbeam ntuples. The environment variable HGCAL_TESTBEAM_NTUPLE_DIR needs to be set to the path of the testbeam data ntuples you want to analyze. A number of examples can be found in examples/hgcal.
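If you prefer to set the variable from within Python, for example at the top of a notebook, a minimal sketch could look like this. The path and the helper check_ntuple_dir are hypothetical and not part of geeksw. Whether geeksw reads the variable at import time or later is not stated here, so setting it before importing the package is the safer assumption.

```python
import os
from pathlib import Path

# Hypothetical location; replace it with the directory holding your ntuples.
ntuple_dir = Path("/path/to/hgcal/testbeam/ntuples")

# Set the variable for the current process before importing the HGCal tools.
os.environ["HGCAL_TESTBEAM_NTUPLE_DIR"] = str(ntuple_dir)


def check_ntuple_dir():
    """Return the configured ntuple directory, or raise if it is unset."""
    value = os.environ.get("HGCAL_TESTBEAM_NTUPLE_DIR")
    if not value:
        raise RuntimeError("HGCAL_TESTBEAM_NTUPLE_DIR is not set")
    return Path(value)
```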

Developer guide

Please format the edited Python sources with black before making any pull request, passing the --line-length 120 argument.
