real data and weights #39

Open · wants to merge 19 commits into base: master
Conversation

davidwalter2

New PR for the domain adaptation investigation.
Compared to the old PR, I cleaned up a lot of things and simplified the processing steps.

  • Include a preselection module for the ttbar selection, "DeepNtuplizer_tt_dilep.py". It can either run alone, in which case it creates files in MiniAOD format (these can be useful to check whether the fit of MC to data is good), or run directly with the DeepNtuplizer to extract the jets.
    In this file you also have the option to run on data or MC
  • Add a new branch event_weight with the weight information of the event (including cross section, luminosity, efficiency, pileup weight and LHE weight)
  • Add a new branch isRealData to flag data events
  • I also added some smaller programs to compute the pileup weights and the effective event number for events with LHE weights
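For clarity, the ingredients listed above might combine into a single per-event weight roughly as follows. This is only an illustrative sketch with assumed names and a conventional normalisation; it is not the PR's actual code:

```python
# Hypothetical sketch: combine cross section, luminosity, effective event
# count, pileup weight and LHE weight sign into one event weight.
# All names here are illustrative, not the branch/module names of the PR.
def event_weight(xsec_pb, lumi_invpb, n_eff, pu_weight, lhe_sign):
    """Normalise MC to the data luminosity, then apply per-event corrections.

    xsec_pb    -- process cross section in pb
    lumi_invpb -- integrated luminosity in pb^-1
    n_eff      -- effective number of generated events
                  (e.g. sum of LHE weight signs over the sample)
    pu_weight  -- pileup reweighting factor for this event
    lhe_sign   -- sign (+1 or -1) of the generator (LHE) weight
    """
    norm = xsec_pb * lumi_invpb / n_eff   # per-sample normalisation
    return norm * pu_weight * lhe_sign    # per-event corrections
```

With this convention, the sum of `event_weight` over all selected MC events approximates the expected event yield in data for the given luminosity.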

David Walter and others added 3 commits February 19, 2018 10:10
…akes a preselection for ttbar events; Add branches isRealData and event_weight; Add small programs PupMCHistProd.py, PupWeightMaker.py and LHEcounter.py for computation of pileup weights and the effective event number for events with lhe weights
@mverzett

@davidwalter2 I went through the PR in more detail and there is quite some work to do before merging, but nothing that can't be solved with 1 or 1.5 weeks of dedicated work on your part.

Please update this PR during this time, once each key item is done; do not delete it and create a new one, as it is quite useful for me to keep track of where we stand.

There are no striking mistakes, but rather a lot of design choices that could be harmful in the future. Do not worry though; I can recognise them only because I made them myself in the past :). I would say we can start from the following:

  • put all the event-related variables for MC (LHE weights, PU info and all these things) in a single class that gets run only once per event by DeepNtuplizer.cc and gets access to the full event content
  • the previous point should allow you to revert back to the old signature for all the other ntuple classes
  • efficiencies and cross sections should not be stored on a jet-by-jet basis, but rather applied as a reweighting in a second step; this makes the system more flexible and saves us some space
  • you should compute at the beginning of your cfg how many events you process and then dump it in the root file (not the tree, somewhere else), so that we can keep track of the efficiency
  • there should be only one executable cfg; avoid duplicating it into tt_dilep_selector.py and DeepNtuplizer_tt_dilep.py, as that will make the life of whoever maintains the package (me) a living hell :). One solution is to make a set of customisation functions that can be steered from the cfg itself. It's quite convoluted to explain, but rather easy to implement. When you get to this point feel free to contact me.
  • try to avoid hardcoding values and paths as much as possible. E.g. hardcoding the global tag might be a bad idea in the future; it's much better to add another command-line option
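As a rough illustration of the last two points, CMSSW's standard VarParsing utility can expose the global tag (and a data/MC switch) as command-line options, and a single cfg can then be specialised through customisation functions. This is a sketch only; the process name and the customisation function are assumptions, not existing code in this package:

```python
# Sketch only: expose the global tag as a command-line option instead of
# hardcoding it. VarParsing is standard CMSSW; the process name and the
# customise_for_data function below are assumptions for illustration.
import FWCore.ParameterSet.Config as cms
from FWCore.ParameterSet.VarParsing import VarParsing

options = VarParsing('analysis')
options.register('globalTag', '',
                 VarParsing.multiplicity.singleton,
                 VarParsing.varType.string,
                 'conditions global tag')
options.register('isData', False,
                 VarParsing.multiplicity.singleton,
                 VarParsing.varType.bool,
                 'run on real data instead of MC')
options.parseArguments()

process = cms.Process('DNNFiller')
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
process.GlobalTag.globaltag = options.globalTag

# One executable cfg, steered by options, instead of two near-copies:
# if options.isData:
#     customise_for_data(process)   # hypothetical customisation function
```

It would then be invoked as, e.g., `cmsRun DeepNtuplizer.py globalTag=<tag> isData=True`, with no tag hardcoded anywhere.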

That should be it for the moment, once the code is in a better shape I will have a second look around to see if anything else is missing or can be done better.

Thanks!

@davidwalter2
Author

@mverzett "you should compute at the beginning of your cfg how many events you process and then dump it in the root file (not the tree, somewhere else), so that we can keep track of the efficiency"
I have some problems with this point:

  • When I use crab, the files get split across the different jobs, so I see no way to count the total event number. Is there one?
  • I can, however, count the events in each job, but then we have to be careful when merging the output files.
  • Is there a disadvantage if I increment the event number as a first process and then run the other processes event by event, instead of computing at the beginning of my cfg how many events I process?
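On the merging concern: if each crab job stores its own bookkeeping (events processed and, for LHE-weighted samples, the sum of weight signs), the merged totals are just sums over the job outputs. A minimal illustrative sketch, not the actual crab/ROOT machinery:

```python
# Illustrative sketch: per-job bookkeeping makes merging trivial.
# How each job stores its pair (histogram bin, one-entry TTree, ...) is
# an implementation detail; here we just model it as a list of tuples.
def merge_job_bookkeeping(jobs):
    """jobs: list of (n_events, sum_lhe_signs) pairs, one per crab job.

    Returns the total number of processed events and the effective
    event number (sum of LHE weight signs) for the merged sample.
    """
    n_total = sum(n for n, _ in jobs)
    n_eff = sum(s for _, s in jobs)
    return n_total, n_eff
```

The only requirement is that whatever object holds the per-job numbers adds up correctly when the output files are merged (which a one-entry-per-job TTree or a histogram does automatically under `hadd`).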

Thanks!

@mverzett

@davidwalter2 sorry, I do not quite understand your statement:

Is there a disadvantage if I increment the event number as a first process and then run the other processes event by event, instead of computing at the beginning of my cfg how many events I process?

Can you please clarify?

What I would suggest you do is create a very simple analyzer that counts how many events it sees and, at the end of the job, drops that number into a TTree with one branch and one entry. The file merging then becomes trivial. You need to put such an analyzer before everything else, in a separate place, because later on you might (and you will) apply a selection on the events.
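A hedged sketch of how such a counter could be wired into the cfg so that it sees every event before any filter runs; all module labels and plugin names here are assumptions for illustration, not the package's actual ones:

```python
# Sketch only: the counter must sit in front of any EDFilter, otherwise
# it only sees events that pass the selection. Plugin and label names
# below are assumptions.
import FWCore.ParameterSet.Config as cms

process = cms.Process('NTUPLES')

process.eventCounter     = cms.EDAnalyzer('EventCounter')   # counts all events; writes a one-entry TTree at end of job
process.ttDilepSelection = cms.EDFilter('TtDilepSelector')  # ttbar preselection; may reject events
process.deepNtuplizer    = cms.EDAnalyzer('DeepNtuplizer')  # only sees events passing the filter

process.p = cms.Path(
    process.eventCounter +       # first: increments its counter for every event
    process.ttDilepSelection +   # then: selection
    process.deepNtuplizer        # last: ntuple filling
)
```

Because modules in a `cms.Path` run in order for each event, placing the counter first guarantees its final count equals the number of events read, independent of the selection downstream.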

@davidwalter2
Author

@mverzett Maybe we mean the same thing :D but let me explain my thinking.
I thought you meant I should compute the total event number first and save it before doing anything else.
What I meant, and what I did, is that one can add several modules to the cms.Path. So I added an analyzer that counts events first, before all other selections. Each event goes first through the analyser that counts it, then through the selections; then the next event comes in and goes through all modules, and so on. The result would be the same.

Anyway, I have finished all points to the best of my ability; maybe you can look through it when you have the time.
Many thanks! :)
