real data and weights #39

Open · wants to merge 19 commits into base: master
Conversation

davidwalter2

New PR for the domain adaptation investigation.
Compared to the old PR, I cleaned up a lot of things and simplified the processing steps.

  • Include a preselection module for the ttbar selection, "DeepNtuplizer_tt_dilep.py". It can either run alone, in which case it creates files in MiniAOD format (these can be useful to check whether the fit of MC to data is good), or run directly with the DeepNtuplizer to extract the jets.
    In this file you also have the option to run on data or MC
  • Add a new branch event_weight with the weight information of the event (including cross section, luminosity, efficiency, pileup weight and LHE weight)
  • Add a new branch isRealData to flag data events
  • I also added some smaller programs to compute the pileup weights and the effective event number for events with LHE weights
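For clarity, the ingredients listed above might combine into a single per-event weight roughly as follows. This is only an illustrative sketch with assumed names and a conventional normalisation; it is not the PR's actual code:

```python
# Hypothetical sketch: combine cross section, luminosity, effective event
# count, pileup weight and LHE weight sign into one event weight.
# All names here are illustrative, not the branch/module names of the PR.
def event_weight(xsec_pb, lumi_invpb, n_eff, pu_weight, lhe_sign):
    """Normalise MC to the data luminosity, then apply per-event corrections.

    xsec_pb    -- process cross section in pb
    lumi_invpb -- integrated luminosity in pb^-1
    n_eff      -- effective number of generated events
                  (e.g. sum of LHE weight signs over the sample)
    pu_weight  -- pileup reweighting factor for this event
    lhe_sign   -- sign (+1 or -1) of the generator (LHE) weight
    """
    norm = xsec_pb * lumi_invpb / n_eff   # per-sample normalisation
    return norm * pu_weight * lhe_sign    # per-event corrections
```

With this convention, the sum of `event_weight` over all selected MC events approximates the expected event yield in data for the given luminosity.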

David Walter and others added 3 commits February 19, 2018 10:10
…akes a preselection for ttbar events; Add branches isRealData and event_weight; Add small programs PupMCHistProd.py, PupWeightMaker.py and LHEcounter.py for computation of pileup weights and the effective event number for events with lhe weights
@mverzett

@davidwalter2 I went through the PR in more detail and there is quite some work to do before merging, but nothing that can't be solved with 1 or 1.5 weeks of dedicated work on your part.

Please update this PR during this time, once each key item is done; do not delete it and create a new one, as it is quite useful for me to keep track of where we stand.

There are no striking mistakes, but rather a lot of design choices that could be harmful in the future. Do not worry though; I can recognise them only because I made them myself in the past :). I would say we can start from the following:

  • put all the event-related variables for MC (LHE weights, PU info and all these things) in a single class that gets run only once per event by DeepNtuplizer.cc and gets access to the full event content
  • the previous point should allow you to revert back to the old signature for all the other ntuple classes
  • efficiencies and cross sections should not be stored on a jet-by-jet basis, but rather applied as a reweighting in a second step; this makes the system more flexible and saves us some space
  • you should compute at the beginning of your cfg how many events you process and then dump it in the root file (not the tree, somewhere else), so that we can keep track of the efficiency
  • there should be only one executable cfg; avoid duplicating it into tt_dilep_selector.py and DeepNtuplizer_tt_dilep.py, as that will make the life of whoever maintains the package (me) a living hell :). One solution is to make a set of customisation functions that can be steered from the cfg itself. It's quite convoluted to explain, but rather easy to implement. When you get to this point feel free to contact me.
  • try to avoid hardcoding values and paths as much as possible. E.g. hardcoding the global tag might be a bad idea in the future; it's much better to add another command-line option
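As a rough illustration of the last two points, CMSSW's standard VarParsing utility can expose the global tag (and a data/MC switch) as command-line options, and a single cfg can then be specialised through customisation functions. This is a sketch only; the process name and the customisation function are assumptions, not existing code in this package:

```python
# Sketch only: expose the global tag as a command-line option instead of
# hardcoding it. VarParsing is standard CMSSW; the process name and the
# customise_for_data function below are assumptions for illustration.
import FWCore.ParameterSet.Config as cms
from FWCore.ParameterSet.VarParsing import VarParsing

options = VarParsing('analysis')
options.register('globalTag', '',
                 VarParsing.multiplicity.singleton,
                 VarParsing.varType.string,
                 'conditions global tag')
options.register('isData', False,
                 VarParsing.multiplicity.singleton,
                 VarParsing.varType.bool,
                 'run on real data instead of MC')
options.parseArguments()

process = cms.Process('DNNFiller')
process.load('Configuration.StandardSequences.FrontierConditions_GlobalTag_cff')
process.GlobalTag.globaltag = options.globalTag

# One executable cfg, steered by options, instead of two near-copies:
# if options.isData:
#     customise_for_data(process)   # hypothetical customisation function
```

It would then be invoked as, e.g., `cmsRun DeepNtuplizer.py globalTag=<tag> isData=True`, with no tag hardcoded anywhere.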

That should be it for the moment, once the code is in a better shape I will have a second look around to see if anything else is missing or can be done better.

Thanks!

@davidwalter2
Author

@mverzett "you should compute at the beginning of your cfg how many events you process and then dump it in the root file (not the tree, somewhere else), so that we can keep track of the efficiency"
I have some problems with this point:

  • When I use crab, the files get split across the different jobs, so I see no way to count the total event number. Is there one?
  • I can, however, count the events in each job, but then we have to be careful when merging the output files.
  • Is there a disadvantage if I increment the event number as a first process and then run the other processes event by event, instead of computing at the beginning of my cfg how many events I process?
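On the merging concern: if each crab job stores its own bookkeeping (events processed and, for LHE-weighted samples, the sum of weight signs), the merged totals are just sums over the job outputs. A minimal illustrative sketch, not the actual crab/ROOT machinery:

```python
# Illustrative sketch: per-job bookkeeping makes merging trivial.
# How each job stores its pair (histogram bin, one-entry TTree, ...) is
# an implementation detail; here we just model it as a list of tuples.
def merge_job_bookkeeping(jobs):
    """jobs: list of (n_events, sum_lhe_signs) pairs, one per crab job.

    Returns the total number of processed events and the effective
    event number (sum of LHE weight signs) for the merged sample.
    """
    n_total = sum(n for n, _ in jobs)
    n_eff = sum(s for _, s in jobs)
    return n_total, n_eff
```

The only requirement is that whatever object holds the per-job numbers adds up correctly when the output files are merged (which a one-entry-per-job TTree or a histogram does automatically under `hadd`).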

Thanks!

@mverzett

@davidwalter2 sorry, I do not quite understand your statement:

Is there a disadvantage if I increment the event number as a first process and then run the other processes event by event, instead of computing at the beginning of my cfg how many events I process?

Can you please clarify?

What I would suggest you do is create a very simple analyzer that counts how many events it sees and, at the end of the job, drops that number into a TTree with one branch and one entry. The file merging then becomes trivial. You need to put such an analyzer before everything else, in a separate place, because later on you might (and you will) apply a selection on the events.
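A hedged sketch of how such a counter could be wired into the cfg so that it sees every event before any filter runs; all module labels and plugin names here are assumptions for illustration, not the package's actual ones:

```python
# Sketch only: the counter must sit in front of any EDFilter, otherwise
# it only sees events that pass the selection. Plugin and label names
# below are assumptions.
import FWCore.ParameterSet.Config as cms

process = cms.Process('NTUPLES')

process.eventCounter     = cms.EDAnalyzer('EventCounter')   # counts all events; writes a one-entry TTree at end of job
process.ttDilepSelection = cms.EDFilter('TtDilepSelector')  # ttbar preselection; may reject events
process.deepNtuplizer    = cms.EDAnalyzer('DeepNtuplizer')  # only sees events passing the filter

process.p = cms.Path(
    process.eventCounter +       # first: increments its counter for every event
    process.ttDilepSelection +   # then: selection
    process.deepNtuplizer        # last: ntuple filling
)
```

Because modules in a `cms.Path` run in order for each event, placing the counter first guarantees its final count equals the number of events read, independent of the selection downstream.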

@davidwalter2
Author

@mverzett Maybe we mean the same thing :D but let me explain my thinking.
I thought you meant I should compute the total event number first and save it before doing anything else.
What I meant, and what I did, is that one can add several modules to the cms.Path. So I added an analyzer that counts events first, before all other selections. Each event goes first through the analyser that counts it, then through the selections; then the next event comes in and goes through all modules, and so on. The result would be the same.

Anyway, I have finished all points to the best of my ability; maybe you can look through it when you have the time.
Many thanks! :)
