To use the necessary Python packages on lxplus, use conda!
Log onto a suitable Eddie node
$ ssh -Y <UUN>@eddie3.ecdf.ed.ac.uk
$ qlogin -pe sharedmem 4 -l h_vmem=10G
Clone the xbbtagger code
$ cd <YOUR-WORKING-DIRECTORY>
$ git clone git@github.com:asogaard/xbbtagger.git
$ cd xbbtagger
In the future, you can use the provided script to log onto Eddie:
$ source login.sh --help
$ source login.sh gpu 20gb
Set up the environment
$ # Install conda...
$ conda env create -f Environments/xbbtagger.yml
$ source activate xbbtagger
Go into the Preprocessing folder
$ cd Preprocessing
Copy over the data
$ mkdir -p /exports/eddie/scratch/<UUN>/xbbtagger/preprocessing/{input,output}
$ ln -s /exports/eddie/scratch/<UUN>/xbbtagger/preprocessing/input input
$ ln -s /exports/eddie/scratch/<UUN>/xbbtagger/preprocessing/output output
$ scp -r <USERNAME>@lxplus.cern.ch:/afs/cern.ch/work/a/asogaard/public/xbbtagger/input/* /exports/eddie/scratch/<UUN>/xbbtagger/preprocessing/input/
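Before running the preprocessing, it can be worth confirming that the `input` symlink resolves and that the copy actually delivered files. The following is a minimal sketch, not part of the repository: `check_nonempty_dir` is a hypothetical helper name, and it assumes you run it from inside the Preprocessing folder.

```shell
# Hypothetical helper: succeed only if the given path resolves to an existing,
# non-empty directory (a dangling symlink fails the -d test).
check_nonempty_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing or dangling: $dir" >&2
        return 1
    fi
    if [ -z "$(ls -A "$dir")" ]; then
        echo "empty: $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}

# Intended use from inside Preprocessing/, after the scp step:
# check_nonempty_dir input
```

The same check can be applied to `output` once the preprocessing has run.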
Run the preprocessing code
$ python preprocessing.py --masscut --ttbar | tee log_preprocessing.out
$ python reweighting.py --ttbar | tee log_reweighting_0.out
$ python reweighting.py --pt-flat --ttbar | tee log_reweighting_1.out
$ python preparing.py --ttbar | tee log_preparing.out
$ ls -lrt output/
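Note that when a command is piped through `tee`, the shell reports the exit status of `tee`, so a failed Python step would not stop a chain of commands on its own. The sketch below shows one way to keep the logging and still stop at the first failure. `run_logged` is a hypothetical helper, not something provided by the repository, and it relies on `mktemp` being available.

```shell
# Hypothetical helper: run a command, mirror its stdout to a log file via tee,
# and return the command's own exit status instead of tee's.
run_logged() {
    log_file=$1
    shift
    status_file=$(mktemp)
    # Record the command's exit status in a temporary file, because the
    # pipeline itself only reports the status of tee.
    { "$@"; echo "$?" > "$status_file"; } | tee "$log_file"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}

# Intended use from inside Preprocessing/ (stops at the first failing step):
# run_logged log_preprocessing.out python preprocessing.py --masscut --ttbar &&
# run_logged log_reweighting_0.out python reweighting.py --ttbar &&
# run_logged log_reweighting_1.out python reweighting.py --pt-flat --ttbar &&
# run_logged log_preparing.out python preparing.py --ttbar
```

In bash specifically, `set -o pipefail` is a shorter alternative; the helper above is written so that it also works in a plain POSIX shell.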
Log onto a suitable Eddie node
$ ssh -Y <UUN>@eddie3.ecdf.ed.ac.uk
$ qlogin -pe gpu 2 -l h_vmem=40G
Set up the environment
$ conda env create -f Environments/xbbtagger-gpu.yml
$ module load cuda
$ source activate xbbtagger-gpu
Go into the Training folder
$ cd ../Training
Run the training code using TensorFlow (GPU should automatically be inferred if available)
$ KERAS_BACKEND=tensorflow python btagging_nn.py --input_file ../Preprocessing/output/prepared_sample_v2.h5 --batch_size=8192
or using Theano on GPU
$ MKL_THREADING_LAYER=GNU THEANO_FLAGS=device=cuda,floatX=float32 python btagging_nn.py --input ../Preprocessing/output/prepared_sample_v2.h5 --batch_size=8192
If you want to train multiple classifiers on individual pT-slices, please read Parameterized Machine Learning for High-Energy Physics by Baldi et al. (2016). If you still want to do it, you can run e.g.
$ KERAS_BACKEND=tensorflow python btagging_nn.py --pt-slice 200 300
which saves the trained model in a unique directory (KerasFiles/*__pT_200_300GeV/), which allows you to easily distinguish different models.
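If you do go down this route with several slices, the per-slice commands can be generated from a list of pT edges. This is only a sketch: `print_slice_commands` is a hypothetical helper, the edges below are illustrative values rather than anything taken from the repository, and the function prints the commands instead of running them so that you can inspect them first.

```shell
# Hypothetical helper: print one training command per adjacent pair of pT
# edges (in GeV). It only prints; nothing is executed.
print_slice_commands() {
    lower=$1
    shift
    for upper in "$@"; do
        echo "KERAS_BACKEND=tensorflow python btagging_nn.py --pt-slice $lower $upper"
        lower=$upper
    done
}

# Example edges; the values are illustrative only.
print_slice_commands 200 300 400 500
```

Each printed command follows the single-slice form shown above, so every model should end up in its own KerasFiles/ subdirectory.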
To run everything, from reweighting to training, in one go, you can use the provided script
$ source run.sh
1. Launch a qlogin session, e.g.
$ qlogin -pe gpu 2
or
$ source login.sh gpu 20g
2. Once on the qlogin node, run
$ ssh -NR localhost:8882:localhost:8888 login04 &
3. Run
$ jupyter notebook --no-browser
This will print some output that ends with a line like:
... [C 11:59:59.468 NotebookApp] Copy/paste this URL into your browser when you connect for the first time, to login with a token: http://localhost:8888/?token=e325bef2289fc2ce991f61a28e36b66c38314f596af30a5f
You can ignore all of it apart from the last line (the URL), which you will need to paste into a web browser at a later step.
4. Leave that terminal open and in a different terminal window on your local computer run:
$ ssh -NL localhost:8888:localhost:8882 <UUN>@login04-ext.ecdf.ed.ac.uk
This will prompt you for your Eddie password - enter it there.
5. Open a web browser and paste in the URL that was generated in step 3. This will start a Jupyter notebook in your browser, showing the contents of the directory you launched it from on Eddie.
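The two tunnels in the steps above form a chain: the reverse tunnel makes port 8882 on login04 reach the notebook's port 8888 on the qlogin node, and the local tunnel maps port 8888 on your computer to port 8882 on login04, which is why the printed `localhost:8888` URL works unchanged in your local browser. If the notebook reports a different port, or 8882 is already taken on login04, both commands need the same adjustment. The sketch below prints a matching pair; `print_tunnel_commands` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: print the two tunnel commands for a given notebook
# port, relay port on login04, and Eddie user name.
print_tunnel_commands() {
    jupyter_port=$1   # port the notebook listens on (8888 in the steps above)
    relay_port=$2     # port opened on login04 (8882 in the steps above)
    uun=$3            # your Eddie user name
    echo "on the qlogin node: ssh -NR localhost:$relay_port:localhost:$jupyter_port login04 &"
    echo "on your computer:   ssh -NL localhost:$jupyter_port:localhost:$relay_port $uun@login04-ext.ecdf.ed.ac.uk"
}

# Reproduces the commands from the steps above.
print_tunnel_commands 8888 8882 '<UUN>'
```

Keeping the local port equal to the notebook port means the token URL can be pasted as is.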