Skip to content

Code repository for the paper "Background Modeling for Double Higgs Boson Production: Density Ratios and Optimal Transport".

Notifications You must be signed in to change notification settings

tmanole/HH4bsim

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

70 Commits
 
 
 
 
 
 
 
 

Repository files navigation

HH4bsim

Data-Driven Background Modeling for Double Higgs Boson Production in the 4b Final State.

Requirements

Python 3.5, ROOT, pyROOT, numpy, pandas, sklearn, pytorch, scipy, pot , energyflow, PlotTools.

Overview

This repository is based on the methodologies developed in the preprint

[1] Manole, T., Bryant, P., Alison, J., Kuusela, M., Wasserman, L. (2022). Background Modeling for Double Higgs Boson Production: Density Ratios and Optimal Transport. arXiv preprint arXiv:2208.02807. [PDF]

The following three background modeling methods are currently implemented.

  • FvT: Train the FvT (Four vs. Three) classifier to discriminate 4b from 3b events in a Control Region, and extrapolate to the Signal Region.
  • OT-kNN: Estimate an optimal transport coupling between the Control and Signal Regions of the 3b events, and extrapolate to 4b events using nearest neighbor methods.
  • OT-FvT: Estimate an optimal transport coupling between the Control and Signal Regions of the 3b events, and extrapolate to 4b events using the FvT classifier in the Control Region

This repository also contains a simulated dataset of events, which is described in Section 6 of [1], and was generated using this script.

Getting Started

To set up the repository for use, run the preprocess_dataset.sh script:

cd python/event_scripts
source preprocess_dataset.sh

The main script for fitting and plotting background models is background_analysis.py. In order to reproduce the plots and validation results in Section 6 of [1], run the following commands.

cd python
python background_analysis.py -m method_name -p True -rh True -v True

Here, -m refers to the method name, which can be one of HH-FvT, HH-Comb-FvT, HH-OT, or benchmark which simply refers to a rescaling of the 3b dataset. The -p switch indicates that histograms of the fit should produced. These plots require .hist ROOT files which are reproduced whenever the -rh switch is specified. The -v switch calls the FvT classifier to be trained between the fit and true 4b data as a proxy for their discrepancy. This leads to Figure X of paper [1]. To produce plots which summarize the three methods, run

python background_analysis.py -sp True

To refit any of the methods, run

python background_analysis.py -m method_name -f True  

Note that for the OT-kNN and OT-FvT methods, the above command will require couplings between the 3b Control and Signal Region distributions. These large files are available upon request, or can be reproduced using the script python/make_large_coupling.py. For the FvT method, the FvT classifier can be retrained using the following commands:

cd python/classifier/FvT
python train.py
cd ..
python make_tree_weights.py -c FvT

About

Code repository for the paper "Background Modeling for Double Higgs Boson Production: Density Ratios and Optimal Transport".

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published