
hbprosper/TheNtupleMaker


TheNtupleMaker

Contents

  1. Introduction
  2. Documentation
  3. Installation
  4. Tutorial

"Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius, and a lot of courage, to move in the opposite direction."
E. F. Schumacher

Introduction

TheNtupleMaker (TNM) is a tool that automates the creation of simple ROOT ntuples from data in the Event Data Model (EDM) format developed and used by the CMS Collaboration. In particular, TNM can be run on CMS mini Analysis Object Data (miniAOD) files. It also automatically generates skeleton C++ and Python analyzer programs that can serve as the basis of code for analyzing the contents of the ntuples. This version of TNM works with miniAODs built with ROOT 6; it is therefore compatible with all versions of CMSSW (https://github.com/cms-sw/cmssw), the CMS Collaboration's codebase, that depend on ROOT 6. (The older ROOT 5 version is under the master branch.)

In spite of the complexity of the data formats used by collaborations such as CMS, and our quarter-century infatuation with object-oriented programming and C++ objects in particular, the data that are ultimately used in a physics analysis are simply a collection of numbers, each of which is in one-to-one correspondence with an access function that returns the datum as a simple type, typically a floating-point number or an integer. This may require indirection. For example, the reco::PFTau class in CMSSW has a method called jetRef() that returns a C++ object, which in turn has a method that returns the charged hadron energy. Consequently, we can access that number using the compound method jetRef()->chargedHadronEnergy(). Sekmen therefore argued that a tool should be built that makes it possible for a user to call automatically any combination of these access functions, which ultimately return simple types, and thereby create the desired combination of data packaged in a ROOT file. In CMSSW these access functions number in the thousands.
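The compound-accessor idea can be illustrated with a small Python sketch. The classes Tau and Jet below are hypothetical stand-ins for reco::PFTau and its associated jet (they are not the CMSSW classes), and call_compound is an illustrative helper, not part of TNM. It splits a string such as jetRef()->chargedHadronEnergy() at each ->, calls each named zero-argument method in turn, and checks that the chain ends in a simple type.

```python
class Jet:
    """Hypothetical stand-in for the jet object returned by jetRef()."""

    def __init__(self, charged_hadron_energy):
        self._che = charged_hadron_energy

    def chargedHadronEnergy(self):
        return self._che


class Tau:
    """Hypothetical stand-in for reco::PFTau: jetRef() returns an object, not a number."""

    def __init__(self, jet):
        self._jet = jet

    def jetRef(self):
        return self._jet

    def pt(self):
        return 42.0


def call_compound(obj, expression):
    """Evaluate an accessor chain such as 'jetRef()->chargedHadronEnergy()'.

    Each link must be a zero-argument method call written as name().
    """
    value = obj
    for link in expression.split("->"):
        name = link.strip()
        if not name.endswith("()"):
            raise ValueError(f"expected a method call, got {name!r}")
        value = getattr(value, name[:-2])()  # call the method, then move on to its result
    if not isinstance(value, (int, float)):
        raise TypeError(f"{expression} did not end in a simple type")
    return value


tau = Tau(Jet(charged_hadron_energy=12.5))
print(call_compound(tau, "jetRef()->chargedHadronEnergy()"))  # 12.5
print(call_compound(tau, "pt()"))  # 42.0
```

A tool like TNM has to do the equivalent for thousands of such chains, which is why automating the calls, rather than writing them by hand, is the point.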

TNM, which was developed by Harrison Prosper and Sezen Sekmen starting in 2009, is the first realization of this idea and the first step towards the ultimate goal of creating an online portal, with something like TNM as a backend, in which access to particle physics data would be a matter of making intuitive queries about what data are available, learning their provenance and meaning, selecting them, pressing a button and creating an ntuple that can be transparently accessed using ROOT or whatever ROOT evolves into.

TODO: UPDATE DOCUMENTATION

Documentation

Detailed documentation of TNM, including installation instructions and simple and advanced use cases, is provided in TheNtupleMaker.pdf (also found under docs/).

Installation

Here, instructions are provided for installing TNM within a docker container (a secure software environment isolated from the host), specifically one that provides access to the CMS open data consistent with CMSSW version CMSSW_5_3_32. (Note that TNM can be used with any version of CMSSW built using ROOT 5.) The instructions are for a Mac, which, in addition to docker, requires an installation of XQuartz. When active, XQuartz makes it possible to use graphical user interfaces within a docker container (that is, it provides X11 forwarding).

1. Configure XQuartz

Run XQuartz (located in Applications/Utilities). Click on the XQuartz menu item (generally at the left of the menu bar), then select Preferences. Under the Security tab, check Allow connections from network clients. Exit XQuartz and re-run it to ensure that the settings have taken effect. Now open a terminal window. In that window, make the host name of your laptop known to X11 using the command

xhost + `hostname`

If that does not work, try restarting your Mac, then restart XQuartz and the docker daemon (if the daemon does not start at boot time), and try again. (Note that the host name may already be known to X11. You can check this by executing the command xhost and inspecting the listing. The host name, if listed, may be in lowercase, but this does not seem to matter.)
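Checking the xhost listing by eye is usually enough, but the case-insensitive comparison described above can be sketched in Python. This assumes the listing has a status line followed by entries such as INET:myhost; the host name in the sample below is hypothetical. On the Mac, the real listing could be captured with subprocess.run(["xhost"], capture_output=True, text=True).stdout.

```python
def host_listed(xhost_output, hostname):
    """Return True if hostname appears in an xhost listing, ignoring case.

    Assumes the first line is a status message and the remaining lines
    look like 'INET:myhost' (address family, colon, name).
    """
    for line in xhost_output.splitlines()[1:]:
        _family, sep, name = line.strip().partition(":")
        if sep and name.lower() == hostname.lower():
            return True
    return False


# Hypothetical listing; the real one comes from running `xhost` with no arguments.
sample = (
    "access control enabled, only authorized clients can connect\n"
    "INET:MyMacBook.local\n"
)
print(host_listed(sample, "mymacbook.local"))  # True
```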

2. Create and run a docker container

In the new terminal window, create a container, here called testme, using the image cmsopendata/cmssw_5_3_32. (You can choose any name you like for the container. To remove a container, use docker rm <container-name>.)

docker run -it -v $HOME/.ssh:/home/cmsusr/.ssh -v $HOME/.Xauthority:/home/cmsusr/.Xauthority -v $HOME:/home/cmsusr/hosthome --net=host --env="DISPLAY=`hostname`:0" --name testme cmsopendata/cmssw_5_3_32 /bin/bash

The table below briefly describes the switches used with the docker command. Note the use of backticks around the command hostname.

switch                                          description
-it                                             run the container in interactive mode, with a terminal attached
-v $HOME/.ssh:/home/cmsusr/.ssh                 mount the host's .ssh folder at the container mount point of the same name
-v $HOME/.Xauthority:/home/cmsusr/.Xauthority   mount the host's .Xauthority file at the container location of the same name
-v $HOME:/home/cmsusr/hosthome                  mount the host's home folder at the container mount point hosthome
--net=host                                      allow network connections via the host
--env="DISPLAY=`hostname`:0"                    set the environment variable DISPLAY in the container to display 0 of the host
--name testme                                   name of the container
cmsopendata/cmssw_5_3_32                        image to run
/bin/bash                                       shell to be used in the container
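As a cross-check of the table, the same docker run invocation can be assembled as an argument list in Python, for example to pass to subprocess.run. This is a sketch, not part of TNM; it assumes the container user's home is /home/cmsusr, as in the other mount points. Because no shell is involved, the host name is inserted directly rather than through backticks.

```python
import shlex
import socket
from pathlib import Path


def docker_run_args(name, image, home, hostname):
    """Build the argument list for the `docker run` command described above."""
    container_home = "/home/cmsusr"  # assumed home of the container user
    return [
        "docker", "run", "-it",
        "-v", f"{home}/.ssh:{container_home}/.ssh",
        "-v", f"{home}/.Xauthority:{container_home}/.Xauthority",
        "-v", f"{home}:{container_home}/hosthome",
        "--net=host",
        # No shell here, so the host name is substituted directly (no backticks).
        f"--env=DISPLAY={hostname}:0",
        "--name", name,
        image,
        "/bin/bash",
    ]


args = docker_run_args("testme", "cmsopendata/cmssw_5_3_32",
                       str(Path.home()), socket.gethostname())
print(shlex.join(args))  # a shell-quoted version of the command, for inspection
```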

You may want to add the following commands to .bash_profile in your container

alias ls="ls --color"
PS1="docker/\W> "

and run source ~/.bash_profile to tidy up the command-line prompt. You should already be in $HOME/CMSSW_5_3_32/src, and the command cmsenv may already have been executed in that folder. If not, cd to that folder and execute cmsenv. Then, to check that X11 forwarding is working, execute the command root. If the ROOT splash screen appears, X11 forwarding is working.
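cmsenv exports environment variables such as CMSSW_BASE and CMSSW_VERSION, so one quick way to confirm that it has been run for the expected release is to inspect them. The helper below is an illustrative sketch, not part of CMSSW or TNM.

```python
import os


def cmssw_ready(env, expected_version="CMSSW_5_3_32"):
    """Return True if the environment looks like cmsenv was run for expected_version."""
    return env.get("CMSSW_VERSION") == expected_version and "CMSSW_BASE" in env


print(cmssw_ready(os.environ))  # False outside a CMSSW environment
```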

Download and build TheNtupleMaker

Make sure you are in the folder $HOME/CMSSW_5_3_32/src before executing the command cmsenv in order to set up the CMSSW environment. Then do

mkdir PhysicsTools
cd PhysicsTools
git clone https://github.com/hbprosper/TheNtupleMaker.git
cd TheNtupleMaker

CMSSW data formats are slightly version-dependent, but TNM is designed to be version-independent. This is achieved by running the command

scripts/initTNM.py

This script makes a valiant attempt to guess which of the hundreds of C++ classes are most likely to be of interest to those doing physics analysis. TNM can now be built using the command below

scram b -j K

where K should be replaced with the number of cores at your disposal. If you don't know, just omit the -j switch. The build should take from a few minutes to about 10 minutes; if it succeeds, you are ready to use TNM.
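If you are unsure of the core count, Python's os.cpu_count() reports the number of logical CPUs visible to the container. The sketch below (a hypothetical helper, not part of TNM) builds the corresponding scram command and falls back to a plain scram b when the count is unknown.

```python
import os


def scram_build_command(cores=None):
    """Return the scram build command, adding -j only when the core count is known."""
    if cores is None:
        cores = os.cpu_count()  # may be None if the count cannot be determined
    return f"scram b -j {cores}" if cores else "scram b"


print(scram_build_command())
```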

Tutorial

To configure the ntuple contents, you will need a sample of the EDM data from which you intend to make the ntuple. The ROOT file must either be in your local area or be reachable through a soft link (e.g., created with the command ln -s path-to-root-file myEDMsample.root). If you do not already have a sample locally, it is easy to copy a sample with a small number of events from data in CMS storage locations using this configuration file.
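The soft link can also be made from Python. The helper below mirrors ln -s path-to-root-file myEDMsample.root; it is an illustrative sketch, and the file name edm.root used in the demonstration is hypothetical. It refreshes an existing link but refuses to overwrite a regular file.

```python
import tempfile
from pathlib import Path


def link_sample(target, link_name="myEDMsample.root"):
    """Create a soft link to an EDM file, like `ln -s target link_name`.

    An existing soft link is replaced; an existing regular file raises
    FileExistsError rather than being overwritten.
    """
    link = Path(link_name)
    if link.is_symlink():
        link.unlink()  # refresh a stale or outdated link
    link.symlink_to(target)
    return link


# Demonstration in a throwaway directory (edm.root is a hypothetical file name).
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "edm.root").touch()
demo_link = link_sample(demo_dir / "edm.root", demo_dir / "myEDMsample.root")
print(demo_link.resolve() == (demo_dir / "edm.root").resolve())  # True
```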

The first thing to do is to create, either by hand or, better still, using the script mkntuplecfi.py, a configuration specifying which methods are to be called to extract the desired data from the EDM file. The script mkntuplecfi.py allows you to make a first pass at building the configuration file. Run the command

mkntuplecfi.py

and, in the GUI that appears, open myEDMsample.root using "File --> Open" or the dedicated file-open button at the top left.
The GUI should look like this:

The methods to be called by TNM are selected (or deselected) from the Methods tab, while the Selected Methods tab can be used to check which methods have been selected. To select a method, first select a class from the list of Classes, then one or more methods, and then one or more categories from the Category list. For example, in the figure above, we have selected the class vector<reco::PFJet>, the methods pt(), eta(), and phi(), and the category ak7PFJets, that is, jets created with the anti-kT algorithm with a cone size of 0.7. Repeat the selection for all the methods of interest, then save the file with the default name ntuple_cfi.py to the local python folder. Once a configuration file has been created, it can be edited by hand. Note: the GUI is just an aid; it does not list every possible method known to TNM, only the ones most likely to be of interest. You are free to add methods by hand to the configuration file created using the GUI. If you add a method that is not known to TNM, TNM will warn you at runtime.

When mkntuplecfi.py runs for the first time, it creates three folders: methods, txt, and html. The methods folder lists the accessor methods of a subset of the available CMSSW classes, namely those most likely to be of interest. The folders txt and html provide similar information in different formats. Here is an exhaustive listing of all access methods of the CMSSW class reco::PFJet. (Tip: use Command + click to open any link in another tab.)

You can look at python/ntuple_cfi.py to see the ntuple content. As mentioned earlier, the GUI is just a tool to automate the creation of this configuration. Once you have a starter python/ntuple_cfi.py, you can modify it by hand to extend its content. You can also save ntuple_cfi.py with a different name. However, you must make sure that the name change is propagated to TheNtupleMaker_cfg.py in order for TNM to know which ntuple content configuration to work with.
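If you rename the fragment, every reference to the old name in TheNtupleMaker_cfg.py must change too. A minimal sketch of that edit in Python follows. The new name myntuple_cfi and the process.load line are hypothetical; check how your TheNtupleMaker_cfg.py actually refers to ntuple_cfi. Word boundaries keep names that merely contain ntuple_cfi untouched.

```python
import re


def rename_cfi(cfg_text, old="ntuple_cfi", new="myntuple_cfi"):
    """Replace whole-word references to the old fragment name in a cfg file's text."""
    return re.sub(rf"\b{re.escape(old)}\b", new, cfg_text)


# Hypothetical line from TheNtupleMaker_cfg.py.
cfg = 'process.load("PhysicsTools.TheNtupleMaker.ntuple_cfi")\n'
print(rename_cfi(cfg), end="")
```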

Running TNM

Simply do

cmsRun TheNtupleMaker_cfg.py

after appropriate editing of python/ntuple_cfi.py and TheNtupleMaker_cfg.py. Upon completion of the run, you will see a ROOT file called, by default, ntuple.root, and a folder called, by default, analyzer (these names can be changed from within python/ntuple_cfi.py). The analyzer folder contains the skeleton analysis programs in C++ and Python. Check that all is well by doing the following

cd analyzer
source setup.sh
make
echo ../ntuple.root > filelist.txt
./analyzer  

and also try the Python version

./analyzer.py

If all goes well, you will find the file analyzer_histograms.root, which of course will be empty since you've not done anything yet!
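The echo command above suggests that filelist.txt lists one ntuple per line. Assuming that layout, a list covering several ntuples could be generated as follows; the second file name is hypothetical.

```python
def filelist_text(ntuples):
    """Return file-list contents with one ntuple path per line."""
    return "".join(f"{path}\n" for path in ntuples)


# Hypothetical second ntuple, e.g. from another cmsRun job.
ntuples = ["../ntuple.root", "../ntuple_2.root"]
print(filelist_text(ntuples), end="")
# To write the file: pathlib.Path("filelist.txt").write_text(filelist_text(ntuples))
```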

Interface with ADL

We are currently taking analysis code generation one step further by developing a tool that automatically produces complete analysis code from a description of the analysis written in the domain-specific analysis description language ADL. A prototype transpiler, adl2tnm, has been developed that can produce complete, executable analysis code, given an ntuple.root file and the analysis description written in ADL, without any need for programming. More information can be found in the adl2tnm GitHub repository. Note that this tool is still very much at the proof-of-principle stage!
