ntuples generation with DaVinci and in-house offline components

umd-lhcb/lhcb-ntuples-gen

lhcb-ntuples-gen

Ntuple generation with DaVinci and in-house offline components. Please refer to the project wiki for details on installation, usage, and the data sources of this project.

Quick set up

Run the following in a terminal:

git clone git@github.com:umd-lhcb/lhcb-ntuples-gen
cd lhcb-ntuples-gen
git remote add julian git@lhcb.physics.umd.edu:lhcb-ntuples-gen 
git remote add glacier git@10.229.60.85:lhcb-ntuples-gen
git annex init --version=7
git submodule update --init  # Do this before git annex sync to avoid potential mess-up of submodule pointers!
git annex sync

nix develop  ## Can take an hour
make install-dep
make install-dep-pip ## To install packages needed for JpsiK reweighting, including zfit
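
Before starting the setup, it can help to confirm that the command-line tools this README relies on are installed. The following is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper, and the tool list is simply the set of programs named in this README. Note also that `nix develop` requires a Nix installation with flakes enabled.

```shell
# Print each required tool that is not found on PATH (one per line).
# Hypothetical helper, not part of this repository.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (commented out so the definition can be sourced safely):
# missing_tools git git-annex nix make docker tmux
```

An empty output means every listed tool was found; anything printed needs to be installed first.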

Generation of step-1 ntuples (DaVinci)

The DaVinci scripts can be developed locally on your laptop by running our Docker image of DaVinci. Install Docker as described in the wiki and pull the image with

docker pull umdlhcb/lhcb-stack-cc7:DaVinci-v45r6-SL

For instance, to test the standard data script, first pull the example .dst files, then enter the Docker container and run the script:

git annex get run2-rdx/data/data-2016-md/00102837*
make docker-dv
cd run2-rdx
./run.sh conds/cond-std-2016.py
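
If the script fails because it cannot open its input, one possible cause is that the annexed files were never fetched. The sketch below assumes the annexed files are locked, in which case they appear as symlinks into .git/annex and a file whose content is absent is a dangling symlink. Unlocked files are stored differently and would not be detected. `count_missing_content` is a hypothetical helper, not part of the repository.

```shell
# Count symlinks under a directory whose target does not exist.
# For locked git-annex files this is the number of files whose
# content has not been fetched locally.
count_missing_content() {
    find "$1" -type l ! -exec test -e {} \; -print | wc -l | tr -d ' '
}

# Example call, run from the repository root before `make docker-dv`:
# count_missing_content run2-rdx/data/data-2016-md
```

A result of 0 means every symlink in the directory resolves to real content.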

Once your script does what you want, you can submit ganga jobs to the LHCb grid as detailed in the wiki.

Generation of step-2 ntuples (babies)

The step-1 ntuples coming out of DaVinci are processed with the babymaker, a script that supports branch renaming and deletion, cut selection, and the calculation of new branches. It is configured with YAML files.

For instance, the tracker-only MC ntuples used to produce the fit templates use postprocess/rdx-run2/rdx-run2_oldcut.yml. These ntuples are currently produced by first downloading the step-1 ntuples from the annex. Since these are over 1 TB, this is typically done on glacier inside a tmux session:

tmux
git annex get ntuples/0.9.6-2016_production/Dst_D0-mc-tracker_only
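
Because the download is over 1 TB, it may be worth checking the free disk space before starting it. This is a minimal sketch: `has_free_gb` is a hypothetical helper, and the 1100 GB threshold in the example call is an assumed safety margin, since the text above only says "over 1 TB".

```shell
# Succeed if the filesystem holding DIR has at least MIN_GB gigabytes free.
# Usage: has_free_gb DIR MIN_GB
has_free_gb() {
    dir=$1
    min_gb=$2
    # `df -P` gives the stable POSIX layout; with -k, column 4 of the
    # second line is the available space in KiB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example call (the threshold is an assumption, adjust as needed):
# has_free_gb . 1100 && git annex get ntuples/0.9.6-2016_production/Dst_D0-mc-tracker_only
```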

Generating the step-2 babies is slow, currently taking about two days, mainly because of the normalization (and probably because the HAMMER FF weights are recalculated on every run). TODO: avoid this by caching the weights, saving them to the subfolders in ntuples/0.9.6-2016_production/Dst_D0-mc-tracker_only. The ntupling is run with the following commands (specific options can be found inside workflows/rdx.py):

tmux
cd workflows
## Takes 37 hours, output is 422GB
./rdx.py rdx-ntuple-run2-mc-to-sig-norm    | tee step2-ntuple_mc-to-sig-norm.log 
## Takes 75 min, output is 58GB
./rdx.py rdx-ntuple-run2-mc-to-ddx         | tee step2-ntuple_mc-to-ddx.log
## Takes 11 hours, output is 81GB
./rdx.py rdx-ntuple-run2-mc-to-dstst       | tee step2-ntuple_mc-to-dstst.log 
## Takes 45 min, output is 2.7GB
./rdx.py rdx-ntuple-run2-mc-to-d_s         | tee step2-ntuple_mc-to-d_s.log
## Takes 45 min, output is 23GB
./rdx.py rdx-ntuple-run2-mc-to-dstst-heavy | tee step2-ntuple_mc-to-dstst-heavy.log 
## Takes ??, output is 10GB
./rdx.py rdx-ntuple-run2-data              | tee step2-ntuple_data.log
## Takes ??, output is 22GB
./rdx.py rdx-ntuple-run2-mu_misid          | tee step2-ntuple_mu_misid.log
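
Since the modes run for many hours, they can also be chained in one unattended bash loop. The sketch below is an illustration rather than a script shipped with the repository: `log_name` and `run_modes` are hypothetical helpers, and the log names follow the pattern used in the commands above. It sets `pipefail` because, without it, a pipeline ending in `tee` reports the exit status of `tee` and a failed run would go unnoticed.

```shell
#!/usr/bin/env bash
# Make "cmd | tee log" fail when cmd fails, not only when tee fails.
set -o pipefail

# Derive the log file name used above from a workflow mode name,
# e.g. rdx-ntuple-run2-mc-to-ddx -> step2-ntuple_mc-to-ddx.log
log_name() {
    printf 'step2-ntuple_%s.log\n' "${1#rdx-ntuple-run2-}"
}

# Run each mode in turn with the given runner, stopping at the first failure.
# Usage: run_modes RUNNER MODE [MODE...]
run_modes() {
    local runner=$1
    shift
    local mode
    for mode in "$@"; do
        echo "== $mode: started $(date)"
        if ! "$runner" "$mode" 2>&1 | tee "$(log_name "$mode")"; then
            echo "== $mode: FAILED, stopping" >&2
            return 1
        fi
    done
}

# Example call from inside workflows/ (commented out, as it runs for hours):
# run_modes ./rdx.py rdx-ntuple-run2-mc-to-ddx rdx-ntuple-run2-mc-to-d_s
```

Stopping at the first failure is a design choice of this sketch. Dropping the `return 1` would let the remaining modes run regardless.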

This generation relies on various auxiliary ntuples and weights. Some aux ntuples need to be generated prior to running the above commands. Namely:

  • B occupancy/kinematic MC correction weights (from B -> J/psi K events), described in run2-JpsiK/README.md, are stored in run2-rdx/reweight/JpsiK/root-run2-JpsiK
  • Long-track reconstruction efficiency MC correction weights (from J/psi -> mu mu events), described a bit more in this comment and produced with LHCb's TrackCalib package, are stored in run2-rdx/reweight/tracking/root-run2-general
  • PID weights, which implement for our tracker-only MC the PID cuts (DLLK, DLLmu, DLLe, isMuon, uBDT) and skim PID selections (NNK, NNghost) present in data, are stored in run2-rdx/reweight/pid/root-run2-rdx_oldcut-shifted. They make use of LHCb's PIDCalib (we also have a local fork to incorporate uBDT) and are generated with these shell scripts for mu PID, K/pi PID, and skim sel PID, with all efficiencies shifted positive
  • Vertex smearing weights, which compensate for the incomplete MC final reweighting of vertex resolution by smearing the B flight vector according to data-driven corrections, are stored in run2-rdx/reweight/vertex/smearing_vec.root. Run 1 corrections are currently used (weights calculated in our vertex-resolution repo)
  • misID efficiencies and DiF smearing weights, used in misID unfolding (calculated in our misid-unfold repo and then applied using a script from it), are stored in run2-rdx/reweight/misid/histos
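
A quick way to confirm that the locations listed above are present before launching a multi-hour run is sketched below. `missing_inputs` and `AUX_INPUTS` are hypothetical names, not part of the repository, and the paths are copied from the list above. The check only tests that each path exists. It does not verify that the directories are complete or that the weights are up to date.

```shell
# Print every expected auxiliary input that does not exist under ROOT.
# Usage: missing_inputs ROOT PATH [PATH...]
missing_inputs() {
    root=$1
    shift
    for rel in "$@"; do
        [ -e "$root/$rel" ] || printf '%s\n' "$rel"
    done
}

# Locations taken from the list above, relative to the repository root.
AUX_INPUTS="
run2-rdx/reweight/JpsiK/root-run2-JpsiK
run2-rdx/reweight/tracking/root-run2-general
run2-rdx/reweight/pid/root-run2-rdx_oldcut-shifted
run2-rdx/reweight/vertex/smearing_vec.root
run2-rdx/reweight/misid/histos
"

# Example call from the repository root; $AUX_INPUTS is left unquoted on
# purpose so that the shell splits it into one argument per path:
# missing_inputs . $AUX_INPUTS
```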

The other auxiliary ntuples are calculated on the fly if not cached:

  • Form-factor weights, calculated in Hammer (via code in our hammer-reweight repo) and applied to signal, normalization, and D**(s)
  • Trigger emulation weights to implement L0Hadron TOS, L0Global TIS, HLT1 triggers for our tracker-only MC, calculated in our TrackerOnlyEmu repo

The step-2 ntuples (written to ntuple_merged folders) can then be copied to rdx-run2-analysis/ntuples and annexed. That repository uses them to produce the fit templates and for other studies.