
Gamma + jets analysis

Presentation

This is the documentation for the gamma + jets framework.

New instructions (13 TeV)

Get a working area

export SCRAM_ARCH=slc6_amd64_gcc530
cmsrel CMSSW_8_0_5_patch1
cd CMSSW_8_0_5_patch1/src
git clone https://github.com/FPreiato/GammaJet.git JetMETCorrections/GammaJetFilter/
cd ..
scram b -j 9

The new analysis code starts from the MINIAOD format instead of AOD. MINIAOD contains PAT objects: this allows us to reuse basically the old structure, skipping the former Step 1. For data, there is a flag in the code that allows running on AOD and producing MiniAODs on-the-fly.

General notes (work in progress)

  • Energy regression for the photon is now directly implemented in photon->energy()

  • For flavor studies (QG-tagging, b-tagging), re-clustering is needed and still has to be implemented. For now, the AK4 PAT jets stored in the MINIAOD are used directly, changing the JEC for jets and MET on-the-fly.

Step 1

This step will convert MiniAOD tuples to root trees, performing a simple selection:

  • Select events with exactly one good photon: the photon ID is applied at this step

  • Choose the first and second jet of the event, with a loose delta phi cut

  • Additionally, if requested, the JEC can be redone at this step, as well as the TypeI MET corrections. More details about that later.

The CMSSW python configuration files can be found in 'analysis/2ndLevel/': 'runFilter_MC.py' to run on MC and 'runFilter.py' to run on Data. All they do is run the GammaJetFilter, responsible for the miniAOD → trees conversion.

runFilter[_MC].py

These configuration files configure the GammaJetFilter. A list of options with their meaning is given below; a sketch of how they might look follows the list.

  • isMC: If True, indicates we are running on MC.

  • photons: The input tag of the photons collection.

  • Input tags for the other collections, implemented since 76X.

  • filterData (only for data): If True, the json parameter file will be used to filter runs and lumisections according to the content of the file. The JSON file is currently not read in the code, so set it to False.

  • runOn[Non]CHS: If True, run the filter on the (non-)CHS collection.

  • runPFAK4: If True, run the filter on PF AK4 jets.

  • runPFAK8: If True, run the filter on PF AK8 jets.

  • runCaloAK4: If True, run the filter on calo AK4 jets. These are not saved in miniAOD, so you need to run on AOD.

  • runCaloAK8: If True, run the filter on calo AK8 jets. These are not saved in miniAOD, so you need to run on AOD.

  • doJetCorrection: If True, redo the jet correction from scratch. The jet correction factors will be read from the global tag (by default), or from an external database if configured accordingly.

  • correctJecFromRaw: If True, the new JEC factors are computed starting from the raw jet. Turn this off only if you know what you are doing.

  • correctorLabel: The corrector label to use for computing the new JEC. The default should be fine for PF AK4 CHS jets.

  • redoTypeIMETCorrection: If True, TypeI MET is recomputed. Automatically True if doJetCorrection is True.

  • doFootprintMETCorrection: If True, the MET is recalculated from the PF candidates, removing the photon footprint PF candidates and adding back the reconstructed photon.
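
For orientation, the flags above might be set in the config roughly as follows. This is only a sketch: the exact parameter syntax and the corrector label value are assumptions to be checked against runFilter[_MC].py.

# Sketch of the filter flags (assumed syntax; see runFilter_MC.py for the real config)
isMC = True                     # running on an MC sample
filterData = False              # JSON file is not read in the code, keep False
runOnCHS = True                 # process the CHS collection
runOnNonCHS = False
runPFAK4 = True                 # PF AK4 jets from miniAOD
runPFAK8 = False
runCaloAK4 = False              # calo jets would require AOD input
runCaloAK8 = False
doJetCorrection = True          # redo the JEC from the global tag
correctJecFromRaw = True        # start from the raw jet
correctorLabel = "ak4PFCHSL1FastL2L3Corrector"  # assumed default for PF AK4 CHS
redoTypeIMETCorrection = True   # implied when doJetCorrection is True
doFootprintMETCorrection = False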

You can find the code for the GammaJetFilter in 'src/GammaJetFilter.cc'. If an event does not pass the preselection, it is dropped. The resulting root trees contain only potential gamma + jets events, with exactly one good photon.

Running crab

In the 'analysis/2ndLevel/submitJobWithCrab3' folder, you’ll find the scripts createAndSubmit[Data][MC]. These scripts read a txt file with the samples to use. The format is:

[for Data] dataset_DASformat processed_events files_per_job GlobalTag

/SinglePhoton/Run2015D-PromptReco-v4/MINIAOD -1 300 74X_dataRun2_Prompt_v2

[for MC] dataset_DASformat processed_events cross_section ptHatMin ptHatMax files_per_job GlobalTag

/GJet_Pt-15To6000_TuneCUETP8M1-Flat_13TeV_pythia8/RunIISpring15MiniAODv2-74X_mcRun2_asymptotic_v2-v1/MINIAODSIM 1 1 0 10000 10 74X_mcRun2_asymptotic_v2

The cross section is not really used at this step (a relic from the old code). Don’t forget to also edit the template files in the 'Inputs' directory, 'crab3_template_data.py' and 'crab3_template_mc.py', to change your storage element and, for data, the JSON file and possibly the run range.
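
To make the field layout concrete, here is a minimal sketch of how one MC line could be parsed; the field names follow the format described above.

# Sketch: split one line of the MC sample list into its fields
line = ("/GJet_Pt-15To6000_TuneCUETP8M1-Flat_13TeV_pythia8/"
        "RunIISpring15MiniAODv2-74X_mcRun2_asymptotic_v2-v1/MINIAODSIM "
        "1 1 0 10000 10 74X_mcRun2_asymptotic_v2")
(dataset, processed_events, cross_section,
 pt_hat_min, pt_hat_max, files_per_job, global_tag) = line.split()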

Once crab is done, the only remaining step is to merge the outputs in order to have one file per dataset. For that, you can use 'mergeAndAddWeights.py' and 'mergeData.py' in the folder 'scripts'. You can create the list of files to merge with the script 'createList_T2.py', passing it the path of the crab output.

Merging

For MC, the outputs are merged and the weight for the total normalization is added. The weight to be applied to MC is defined as evtWeightTot = xsec / sum_of_generatorWeights. This has to be done in a separate step because the full dataset must be processed once in order to calculate the sum of generator weights. In the output of Step 1 we store a histogram filled with generator weights, so that the sum of weights can be extracted at the end with Integral(). The cross section is given from the outside (it is an option of the code).

python mergeAndAddWeights.py -i [list_to_merge.txt] -o [output_directory] --xsec [number_from_DAS]

The merging will update the tree "analysis" with a new branch called "evtWeightTot". This number is used in the following steps to fill histograms and to draw plots.
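
To make the weight definition concrete, here is a minimal PyROOT sketch of the computation mergeAndAddWeights.py has to perform; the file and histogram names are assumptions, and the real script may differ.

import ROOT

# Sketch: compute the per-event MC normalization weight
xsec = 1.0  # placeholder: the cross section in pb, passed via --xsec
f = ROOT.TFile.Open("PhotonJet_2ndLevel_GJet_merged.root")  # assumed file name
h_gen_weights = f.Get("h_generatorWeights")  # assumed name of the Step 1 histogram
sum_gen_weights = h_gen_weights.Integral()   # sum of generator weights over the dataset
evtWeightTot = xsec / sum_gen_weights        # stored as a new branch of the "analysis" tree
print("evtWeightTot =", evtWeightTot)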

For data, the outputs are merged and the integrated luminosity from BrilCalc is added. To calculate the integrated luminosity, the official BrilCalc recipe is followed: http://cms-service-lumi.web.cern.ch/cms-service-lumi/brilwsdoc.html

1) Produce lumiSummary.json from crab

crab report -d crab_folder

2) Execute brilcalc

Command:

brilcalc lumi --normtag /afs/cern.ch/user/c/cmsbril/public/normtag_json/OfflineNormtagV1.json -u /pb -i lumiSummary.json

In the end you can merge the output for the data with the command:

python mergeData.py -i [list_to_merge.txt] -o [output_directory] --lumi_tot [integrated_luminosity]

You should now have a root file for each MC dataset and one for each data dataset, with a prefix PhotonJet_2ndLevel_. Copy those files somewhere else. A good place could be the folder 'analysis/tuples/'.

Step 2 - PileUp

The MC is reweighted according to data, based on the number of vertices in the event, in order to take into account the differences between the simulated and actual PU scenarios. All the utilities to do that are available in the folder 'analysis/PUReweighting'. The relevant scripts are 'generatePUProfileForData.py' and 'generate_mc_pileup.c'.

First you have to create a list of the MC samples for which you want to calculate the PU reweighting. This list contains the MC files produced in Step 1. For example, you can create a list files_GJet_plus_QCD.list which contains:

[path]/PhotonJet_2ndLevel_GJet_Pythia_25ns_ReReco_2016-02-15.root
[path]/PhotonJet_2ndLevel_QCD_Pt-20toInf_2016-02-26.root

Then, to execute the program 'generate_mc_pileup.c', compile it with the Makefile and type the command followed by the list name (only the central part of the name):

./generate_mc_pileup.exe GJet_plus_QCD

Pile-up in Data

The pile-up in data is calculated following the official recipe, implemented in 'generatePUProfileForData.py', which uses pileupCalc.py. This script must be given the JSON file for which you want to calculate the PU reweighting.

./generatePUProfileForData.py pileup_latest.txt
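
For reference, once the data and MC profiles exist, the per-event PU weight is simply the ratio of the two normalized distributions. A minimal PyROOT sketch follows; the file and histogram names are assumptions.

import ROOT

# Sketch: build the PU weights as data(n) / MC(n) from normalized profiles
f_data = ROOT.TFile.Open("pu_profile_data.root")          # assumed name
f_mc = ROOT.TFile.Open("pu_profile_GJet_plus_QCD.root")   # assumed name
h_data = f_data.Get("pileup")                             # assumed histogram name
h_mc = f_mc.Get("pileup")
h_data.Scale(1.0 / h_data.Integral())   # normalize both profiles to unit area
h_mc.Scale(1.0 / h_mc.Integral())
h_weights = h_data.Clone("pu_weights")
h_weights.Divide(h_mc)                  # weight for an event with pile-up n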
Trigger selection

To avoid any bias in the selection, we explicitly require that, for each bin in pt_gamma, only one trigger was active. For that, we use an XML description of the triggers of the analysis, which you can find in the 'bin/' folder. The description is a file named 'triggers.xml'.

The format should be straightforward: you have a separation in run ranges, as well as in triggers. The weight of each HLT is used to reweight various distributions for the prescale. The prescale is saved in the miniAOD and stored in the ntuples from step 1.

You have a similar file for MC, named 'triggers_mc.xml'. In this file there is no run range, only a list of HLT paths. This list is used to know which HLT path the event should have fired if it were data. 2012 note: you can also specify multiple HLT paths for one pt bin if there were multiple active triggers during the data-taking period. In this case, you’ll need to provide a weight for each trigger (of course, the weights must sum to 1). Each trigger is then chosen randomly in order to respect these probabilities, as in the sketch below.
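
As an illustration of that last point, the weighted random choice could look like this; the HLT names and weights below are hypothetical examples, not taken from triggers_mc.xml.

import random

# Sketch: pick one HLT path per MC event according to the weights
# given in triggers_mc.xml (paths and weights below are hypothetical)
triggers = ["HLT_Photon50", "HLT_Photon75"]  # multiple active paths for one pt bin
weights = [0.3, 0.7]                         # must sum to 1
chosen = random.choices(triggers, weights=weights, k=1)[0]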

Step 3 - Finalization

For this step, I’ll assume you have the following folder structure:

+ analysis
|- tuples
 |- toFinalize (containing root files produced at step 1, with prefix PhotonJet_2ndLevel_)
 |- finalized (containing root files we will produce at this step)

The main utility here is the executable named 'gammaJetFinalizer'. It will produce root files containing a set of histograms for the important variables like the balancing or the MPF. You can find its sources in the folder 'bin/', in the file 'gammaJetFinalizer.cc'. Let’s have a look at the possible options:

gammaJetFinalizer  {-i <string> ... |--input-list <string>}
                      [--chs] [--alpha <float>]
                      [--mc-comp] [--mc] --algo <ak4|ak8> --type <pf|calo>
                      -d <string>

Here’s a brief description of each option:

  • -i (can be given multiple times): the input root files

  • --input-list: A text file containing a list of input root files

  • --mc: Tell the finalizer you are running on an MC sample

  • --mc-comp: Apply a cut pt_gamma > 165 GeV to get rid of the trigger prescales. Useful for data/MC comparisons

  • --alpha: The alpha cut to apply (alpha is the standard second-jet activity variable, typically the second-jet pT over the photon pT). 0.2 by default

  • --chs: Tell the finalizer you ran on a CHS sample

  • --algo ak4 or ak8: Tell the finalizer if we run on AK4 or AK8 jets

  • --type pf or calo: Tell the finalizer if we run on PF or Calo jets

  • -d: The output dataset name. This will create an output file named 'PhotonJet_<name>.root'

An example command line could be:

gammaJetFinalizer -i PhotonJet_2ndLevel_Data_file.root -d SinglePhoton_Run2015 --type pf --algo ak4 --chs --alpha 0.30

This will process the input file 'PhotonJet_2ndLevel_Data_file.root', looking for PF AK4 CHS jets, using alpha = 0.30, and producing an output file named 'PhotonJet_SinglePhoton_Run2015_PFlowAK4chs.root'.

Note

When you have multiple input files (GJet MC for example), the easiest way is to create an input list and then use the --input-list option of the finalizer. For example, suppose you have some files named 'PhotonJet_2ndLevel_GJet_Pt-30to50.root', 'PhotonJet_2ndLevel_GJet_Pt-50to80.root', 'PhotonJet_2ndLevel_GJet_Pt-80to120.root', … You can create the input file list with

ls PhotonJet_2ndLevel_GJet_* > mc_GJet.list

Then pass the 'mc_GJet.list' file to the --input-list option.

Note

You cannot use the --input-list option when running on data, for file structure reasons. If you have multiple data files, you first need to merge them with hadd into a single file (e.g. hadd Data_merged.root PhotonJet_2ndLevel_Data_*.root), and then use the -i option.

You should now have at least two files (three if you have run on QCD): 'PhotonJet_SinglePhoton_Run2015_PFlowAK4chs.root', 'PhotonJet_GJet_PFlowAK4chs.root', and optionally 'PhotonJet_QCD_PFlowAK4chs.root'. You are now ready to produce some plots!

Step 4 - The plots

First of all, you need to build the drawing utilities: go into 'analysis/draw' and run make all. You should now have everything built. In order to produce the full set of plots, you’ll have to run four different utilities, from the folder where the files produced at step 3 are. None of these programs take the full name of the root file, only the name assigned by the user. For example, for the full name 'PhotonJet_SinglePhoton_Run2015_PFlowAK4chs.root', the name to pass to the program (assigned by the user in the previous steps) is 'SinglePhoton_Run2015'.

  • drawPhotonJet_2bkg produces some comparison plots and the most important plots, the balancing and the MPF in each pt and eta bin. The plots of these quantities vs pT are also produced. To run the program:

drawPhotonJet_2bkg [Data_file] [GJet_file] [QCD_file] [jet type] [algorithm] [Normalization]

For the normalization you can choose between:

  • LUMI: normalize the MC to the integrated luminosity

  • SHAPE: normalize to unit area

drawPhotonJet_2bkg [Data_file] [GJet_file] [QCD_file] pf ak4 LUMI

  • Then, you need to perform the 2nd jet extrapolation using drawPhotonJetExtrap, like this:

drawPhotonJetExtrap --type pf --algo ak4 [Data_file] [GJet_file] [QCD_file]

  • Finally, to produce the final plots and the file for the global fit:

draw_ratios_vs_pt Data_file GJet_file QCD_file pf ak4
draw_all_methods_vs_pt Data_file GJet_file QCD_file pf ak4

If everything went fine, you should now have a lot of plots in the folder 'PhotonJetPlots_Data_file_vs_GJet_file_plus_QCD_file_PFlowAK4_LUMI', and some more useful ones in the subfolder 'PhotonJetPlots_Data_file_vs_GJet_file_plus_QCD_file_PFlowAK4_LUMI/vs_pt'.

Step 5 - File for the global fit

The finalizer (step 3) and the drawers (step 4) have to be repeated for different alpha cuts: 0.10, 0.15, 0.20, 0.25. The last drawer produces a root file named plots.root in the directory 'PhotonJetPlots…/vs_pt/'. You will therefore have one plots.root for each alpha cut; these four files have to be added (a simple hadd) and sent to Mikko in order to perform the global fit. A scripted sketch of this repetition follows.
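
For example, the repetition over the alpha cuts could be scripted along these lines; the input file name and the -d naming scheme are assumptions for illustration.

import subprocess

# Sketch: rerun the finalizer for each alpha cut; the drawers (step 4)
# then have to be rerun as well before hadd-ing the four plots.root files
for alpha in ["0.10", "0.15", "0.20", "0.25"]:
    subprocess.run(
        ["gammaJetFinalizer", "-i", "PhotonJet_2ndLevel_Data_file.root",
         "-d", "SinglePhoton_Run2015_alpha" + alpha,  # hypothetical naming
         "--type", "pf", "--algo", "ak4", "--chs", "--alpha", alpha],
        check=True)

# Afterwards, merge the four plots.root files, e.g.:
# hadd plots_all_alphas.root PhotonJetPlots_*alpha*/vs_pt/plots.root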

Any other business

Other drawers can be found in the 'draw' directory, for example draw_vs_run, which draws the time-dependence study: response vs run number (data only).

../../draw/draw_vs_run Data_file pf ak4

Have fun!
