
Kevin Pedro

Papers by Kevin Pedro, available to download and read on OA.mg.

DOI: 10.23731/cyrm-2019-007.1
2019
Cited 78 times
Report from Working Group 1 : Standard Model Physics at the HL-LHC and HE-LHC
The successful operation of the Large Hadron Collider (LHC) and the excellent performance of the ATLAS, CMS, LHCb and ALICE detectors in Run-1 and Run-2 with $pp$ collisions at center-of-mass energies of 7, 8 and 13 TeV as well as the giant leap in precision calculations and modeling of fundamental interactions at hadron colliders have allowed an extraordinary breadth of physics studies including precision measurements of a variety of physics processes. The LHC results have so far confirmed the validity of the Standard Model of particle physics up to unprecedented energy scales and with great precision in the sectors of strong and electroweak interactions as well as flavour physics, for instance in top quark physics. The upgrade of the LHC to a High Luminosity phase (HL-LHC) at 14 TeV center-of-mass energy with 3 ab$^{-1}$ of integrated luminosity will probe the Standard Model with even greater precision and will extend the sensitivity to possible anomalies in the Standard Model, thanks to a ten-fold larger data set, upgraded detectors and expected improvements in the theoretical understanding. This document summarises the physics reach of the HL-LHC in the realm of strong and electroweak interactions and top quark physics, and provides a glimpse of the potential of a possible further upgrade of the LHC to a 27 TeV $pp$ collider, the High-Energy LHC (HE-LHC), assumed to accumulate an integrated luminosity of 15 ab$^{-1}$.
DOI: 10.1088/2632-2153/aba042
2020
Cited 60 times
Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml
We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits with field-programmable gate arrays (FPGA) firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.
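As a rough illustration of the ternary strategy described above, the sketch below maps trained floating-point weights to {-1, 0, +1} with a magnitude threshold. It is a minimal numpy example, not the hls4ml API; the `ternarize` helper and the threshold value are assumptions for illustration.

```python
# Minimal sketch of ternary weight quantization (illustrative, not hls4ml code).
import numpy as np

def ternarize(w: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Map each weight to -1, 0, or +1; weights with |w| < threshold become 0."""
    q = np.sign(w)
    q[np.abs(w) < threshold] = 0.0
    return q

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(16, 8))  # stand-in for a trained layer's weights
w_q = ternarize(w)
print("nonzero fraction:", np.mean(w_q != 0))  # resource use scales with nonzeros
```

Binary quantization is the same idea without the zero band: `np.sign(w)` alone.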
DOI: 10.1088/2632-2153/ac0ea1
2021
Cited 53 times
Fast convolutional neural networks on FPGAs with hls4ml
Abstract We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.
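The pruning step mentioned above can be sketched as simple magnitude-based pruning: zero the smallest weights, then fine-tune. A hedged numpy illustration (the 90% sparsity target is an arbitrary example, not the paper's setting):

```python
# Illustrative magnitude pruning; real deployments retrain after pruning.
import numpy as np

def prune_by_magnitude(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the smallest `sparsity` fraction of |w|, returning a pruned copy."""
    cutoff = np.quantile(np.abs(w), sparsity)
    out = w.copy()
    out[np.abs(out) < cutoff] = 0.0
    return out

rng = np.random.default_rng(1)
w = rng.normal(size=(32, 32))
w_sparse = prune_by_magnitude(w, sparsity=0.9)  # keep roughly 10% of weights
print("kept fraction:", np.mean(w_sparse != 0))
```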
DOI: 10.3389/fdata.2020.598927
2021
Cited 41 times
Distance-Weighted Graph Neural Networks on FPGAs for Real-Time Particle Reconstruction in High Energy Physics
Graph neural networks have been shown to achieve excellent performance for several crucial tasks in particle physics, such as charged particle tracking, jet tagging, and clustering. An important domain for the application of these networks is the FPGA-based first layer of real-time data filtering at the CERN Large Hadron Collider, which has strict latency and resource constraints. We discuss how to design distance-weighted graph networks that can be executed with a latency of less than 1$\mu\mathrm{s}$ on an FPGA. To do so, we consider a representative task associated with particle reconstruction and identification in a next-generation calorimeter operating at a particle collider. We use a graph network architecture developed for such purposes, and apply additional simplifications to match the computing constraints of Level-1 trigger systems, including weight quantization. Using the $\mathtt{hls4ml}$ library, we convert the compressed models into firmware to be implemented on an FPGA. Performance of the synthesized models is presented both in terms of inference accuracy and resource usage.
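The core operation of a distance-weighted graph network can be sketched as follows: each node aggregates features from the others with weights that fall off with distance. This is a schematic numpy illustration under assumed shapes and an exp(-d²) kernel, not the paper's architecture:

```python
# Schematic distance-weighted aggregation (GarNet-like in spirit).
import numpy as np

def distance_weighted_aggregate(x: np.ndarray, pos: np.ndarray) -> np.ndarray:
    """x: (N, F) node features; pos: (N, D) coordinates. Returns (N, F)."""
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)  # (N, N) squared distances
    w = np.exp(-d2)                       # nearer nodes contribute more
    w /= w.sum(axis=1, keepdims=True)     # normalize weights per receiving node
    return w @ x                          # weighted average over all nodes (incl. self)

x, pos = np.random.rand(5, 3), np.random.rand(5, 2)
print(distance_weighted_aggregate(x, pos).shape)  # (5, 3)
```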
DOI: 10.1007/s41781-019-0027-2
2019
Cited 43 times
FPGA-Accelerated Machine Learning Inference as a Service for Particle Physics Computing
Large-scale particle physics experiments face challenging demands for high-throughput computing resources both now and in the future. New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) ms with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600–700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
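The as-a-service pattern itself is simple from the client side: the experiment's CPU process ships preprocessed inputs to a remote accelerator and blocks briefly for the prediction. The sketch below is hypothetical; the URL and JSON schema are invented placeholders, not the Brainwave interface:

```python
# Hedged sketch of an inference-as-a-service client; endpoint and schema are invented.
import numpy as np
import requests

def classify_remote(image: np.ndarray, url: str) -> list:
    payload = {"inputs": image.astype(np.float32).tolist()}
    resp = requests.post(url, json=payload, timeout=5.0)
    resp.raise_for_status()
    return resp.json()["outputs"]  # e.g. class scores from the served ResNet-50

# scores = classify_remote(np.zeros((224, 224, 3)), "http://inference.example/v1/resnet50")
```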
DOI: 10.48550/arxiv.2003.11603
2020
Cited 39 times
Graph Neural Networks for Particle Reconstruction in High Energy Physics detectors
Pattern recognition problems in high energy physics are notably different from traditional machine learning applications in computer vision. Reconstruction algorithms identify and measure the kinematic properties of particles produced in high energy collisions and recorded with complex detector systems. Two critical applications are the reconstruction of charged particle trajectories in tracking detectors and the reconstruction of particle showers in calorimeters. These two problems have unique challenges and characteristics, but both have high dimensionality, high degree of sparsity, and complex geometric layouts. Graph Neural Networks (GNNs) are a relatively new class of deep learning architectures which can deal with such data effectively, allowing scientists to incorporate domain knowledge in a graph structure and learn powerful representations leveraging that structure to identify patterns of interest. In this work we demonstrate the applicability of GNNs to these two diverse particle reconstruction problems.
DOI: 10.1140/epjc/s10052-022-11048-8
2022
Cited 17 times
Theory, phenomenology, and experimental avenues for dark showers: a Snowmass 2021 report
Abstract In this work, we consider the case of a strongly coupled dark/hidden sector, which extends the Standard Model (SM) by adding an additional non-Abelian gauge group. These extensions generally contain matter fields, much like the SM quarks, and gauge fields similar to the SM gluons. We focus on the exploration of such sectors where the dark particles are produced at the LHC through a portal and undergo rapid hadronization within the dark sector before decaying back, at least in part and potentially with sizeable lifetimes, to SM particles, giving a range of possibly spectacular signatures such as emerging or semi-visible jets. Other, non-QCD-like scenarios leading to soft unclustered energy patterns or glueballs are also discussed. After a review of the theory, existing benchmarks and constraints, this work addresses how to build consistent benchmarks from the underlying physical parameters and presents new developments for the pythia Hidden Valley module, along with jet substructure studies. Finally, a series of improved search strategies is presented in order to pave the way for a better exploration of the dark showers at the LHC.
DOI: 10.1007/jhep02(2022)074
2022
Cited 15 times
Autoencoders for semivisible jet detection
Abstract The production of dark matter particles from confining dark sectors may lead to many novel experimental signatures. Depending on the details of the theory, dark quark production in proton-proton collisions could result in semivisible jets of particles: collimated sprays of dark hadrons of which only some are detectable by particle collider experiments. The experimental signature is characterised by the presence of reconstructed missing momentum collinear with the visible components of the jets. This complex topology is sensitive to detector inefficiencies and mis-reconstruction that generate artificial missing momentum. With this work, we propose a signal-agnostic strategy to reject ordinary jets and identify semivisible jets via anomaly detection techniques. A deep neural autoencoder network with jet substructure variables as input proves highly useful for analyzing anomalous jets. The study focuses on the semivisible jet signature; however, the technique can apply to any new physics model that predicts signatures with anomalous jets from non-SM particles.
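A minimal PyTorch sketch of the strategy: train an autoencoder on ordinary jets' substructure variables and flag jets with large reconstruction error as anomalous. Layer sizes, the input dimension, and the threshold are illustrative assumptions, not the paper's configuration:

```python
# Autoencoder anomaly scoring on jet substructure inputs (schematic).
import torch
import torch.nn as nn

n_features = 8  # e.g. a handful of jet substructure observables

autoencoder = nn.Sequential(
    nn.Linear(n_features, 4), nn.ReLU(),
    nn.Linear(4, 2), nn.ReLU(),            # information bottleneck
    nn.Linear(2, 4), nn.ReLU(),
    nn.Linear(4, n_features),
)
# ... train with MSE reconstruction loss on background (QCD) jets only ...

def anomaly_score(x: torch.Tensor) -> torch.Tensor:
    """Per-jet mean squared reconstruction error; larger = more anomalous."""
    with torch.no_grad():
        return ((autoencoder(x) - x) ** 2).mean(dim=1)

jets = torch.randn(100, n_features)    # stand-in for substructure inputs
flagged = anomaly_score(jets) > 1.5    # threshold would be tuned on background
```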
DOI: 10.1088/2632-2153/acca5f
2023
Cited 5 times
DeepAstroUDA: Semi-Supervised Universal Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection
Abstract Artificial intelligence methods show great promise in increasing the quality and speed of work with large astronomical datasets, but the high complexity of these methods leads to the extraction of dataset-specific, non-robust features. Therefore, such methods do not generalize well across multiple datasets. We present a universal domain adaptation method, DeepAstroUDA, as an approach to overcome this challenge. This algorithm performs semi-supervised domain adaptation (DA) and can be applied to datasets with different data distributions and class overlaps. Non-overlapping classes can be present in either of the two datasets (the labeled source domain, or the unlabeled target domain), and the method can even be used in the presence of unknown classes. We apply our method to three examples of galaxy morphology classification tasks of different complexities (three-class and ten-class problems), with anomaly detection: (1) datasets created after different numbers of observing years from a single survey (Legacy Survey of Space and Time mock data of one and ten years of observations); (2) data from different surveys (Sloan Digital Sky Survey (SDSS) and DECaLS); and (3) data from observing fields with different depths within one survey (wide field and Stripe 82 deep field of SDSS). For the first time, we demonstrate the successful use of DA between very discrepant observational datasets. DeepAstroUDA is capable of bridging the gap between two astronomical surveys, increasing classification accuracy in both domains (up to 40% on the unlabeled data), and making model performance consistent across datasets. Furthermore, our method also performs well as an anomaly detection algorithm and successfully clusters unknown class samples even in the unlabeled target dataset.
DOI: 10.1088/2632-2153/abec21
2021
Cited 16 times
GPU coprocessors as a service for deep learning inference in high energy physics
In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two issues will confront one another as the collider is upgraded for high luminosity running. Alternative processors such as graphics processing units (GPUs) can resolve this confrontation provided that algorithms can be sufficiently accelerated. In many cases, algorithmic speedups are found to be largest through the adoption of deep learning algorithms. We present a comprehensive exploration of the use of GPU-based hardware acceleration for deep learning inference within the data reconstruction workflow of high energy physics. We present several realistic examples and discuss a strategy for the seamless integration of coprocessors so that the LHC can maintain, if not exceed, its current performance throughout its running.
DOI: 10.3389/fdata.2020.604083
2021
Cited 15 times
GPU-Accelerated Machine Learning Inference as a Service for Computing in Neutrino Experiments
Machine learning algorithms are becoming increasingly prevalent and performant in the reconstruction of events in accelerator-based neutrino experiments. These sophisticated algorithms can be computationally expensive. At the same time, the data volumes of such experiments are rapidly increasing. The demand to process billions of neutrino events with many machine learning algorithm inferences creates a computing challenge. We explore a computing model in which heterogeneous computing with GPU coprocessors is made available as a web service. The coprocessors can be efficiently and elastically deployed to provide the right amount of computing for a given processing task. With our approach, Services for Optimized Network Inference on Coprocessors (SONIC), we integrate GPU acceleration specifically for the ProtoDUNE-SP reconstruction chain without disrupting the native computing workflow. With our integrated framework, we accelerate the most time-consuming task, track and particle shower hit identification, by a factor of 17. This results in a factor of 2.7 reduction in the total processing time when compared with CPU-only production. For this particular task, only 1 GPU is required for every 68 CPU threads, providing a cost-effective solution.
DOI: 10.1088/2632-2153/ac7f1a
2022
Cited 9 times
DeepAdversaries: examining the robustness of deep learning models for galaxy morphology classification
Abstract With increased adoption of supervised deep learning methods for work with cosmological survey data, the assessment of data perturbation effects (that can naturally occur in the data processing and analysis pipelines) and the development of methods that increase model robustness are increasingly important. In the context of morphological classification of galaxies, we study the effects of perturbations in imaging data. In particular, we examine the consequences of using neural networks when training on baseline data and testing on perturbed data. We consider perturbations associated with two primary sources: (a) increased observational noise as represented by higher levels of Poisson noise and (b) data processing noise incurred by steps such as image compression or telescope errors as represented by one-pixel adversarial attacks. We also test the efficacy of domain adaptation techniques in mitigating the perturbation-driven errors. We use classification accuracy, latent space visualizations, and latent space distance to assess model robustness in the face of these perturbations. For deep learning models without domain adaptation, we find that processing pixel-level errors easily flip the classification into an incorrect class and that higher observational noise makes the model trained on low-noise data unable to classify galaxy morphologies. On the other hand, we show that training with domain adaptation improves model robustness and mitigates the effects of these perturbations, improving the classification accuracy up to 23% on data with higher observational noise. Domain adaptation also increases the latent space distance between the baseline and the incorrectly classified one-pixel perturbed image by up to a factor of ≈2.3, making the model more robust to inadvertent perturbations. Successful development and implementation of methods that increase model robustness in astronomical survey pipelines will help pave the way for many more uses of deep learning for astronomy.
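The one-pixel perturbation used above as a proxy for processing errors is straightforward to reproduce. A toy numpy version (the classifier call is left abstract):

```python
# Toy one-pixel perturbation; `classifier` is any image classifier callable.
import numpy as np

def one_pixel_perturb(img: np.ndarray, x: int, y: int, value: float) -> np.ndarray:
    out = img.copy()
    out[y, x] = value  # overwrite a single pixel
    return out

img = np.random.rand(64, 64)
perturbed = one_pixel_perturb(img, x=10, y=20, value=img.max())
# robust model: classifier(perturbed).argmax() == classifier(img).argmax()
```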
DOI: 10.1088/1742-6596/2438/1/012090
2023
GNN-based end-to-end reconstruction in the CMS Phase 2 High-Granularity Calorimeter
Abstract We present the current stage of research progress towards a one-pass, completely Machine Learning (ML) based imaging calorimeter reconstruction. The model used is based on Graph Neural Networks (GNNs) and directly analyzes the hits in each HGCAL endcap. The ML algorithm is trained to predict clusters of hits originating from the same incident particle by labeling the hits with the same cluster index. We impose simple criteria to assess whether the hits associated as a cluster by the prediction are matched to those hits resulting from any particular individual incident particles. The algorithm is studied by simulating two tau leptons in each of the two HGCAL endcaps, where each tau may decay according to its measured standard model branching probabilities. The simulation includes the material interaction of the tau decay products which may create additional particles incident upon the calorimeter. Using this varied multiparticle environment we can investigate the application of this reconstruction technique and begin to characterize energy containment and performance.
DOI: 10.1103/physrevd.108.072014
2023
Denoising diffusion models with geometry adaptation for high fidelity calorimeter simulation
Simulation is crucial for all aspects of collider data analysis, but the available computing budget in the High Luminosity LHC era will be severely constrained. Generative machine learning models may act as surrogates to replace physics-based full simulation of particle detectors, and diffusion models have recently emerged as the state of the art for other generative tasks. We introduce CaloDiffusion, a denoising diffusion model trained on the public CaloChallenge datasets to generate calorimeter showers. Our algorithm employs 3D cylindrical convolutions, which take advantage of symmetries of the underlying data representation. To handle irregular detector geometries, we augment the diffusion model with a new geometry latent mapping (GLaM) layer to learn forward and reverse transformations to a regular geometry that is suitable for cylindrical convolutions. The showers generated by our approach are nearly indistinguishable from the full simulation, as measured by several different metrics.
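A single denoising-diffusion training step can be sketched in a few lines: corrupt a shower at a random timestep and train the network to predict the injected noise. The linear noise schedule and the `model` signature below are generic placeholders; CaloDiffusion's actual choices (and the GLaM layer) differ in detail:

```python
# Generic denoising-diffusion training step (schematic, PyTorch).
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal-retention schedule

def diffusion_loss(model, shower: torch.Tensor) -> torch.Tensor:
    t = torch.randint(0, T, (shower.shape[0],))            # random timestep per shower
    eps = torch.randn_like(shower)                         # Gaussian noise to inject
    a = alpha_bar[t].view(-1, *([1] * (shower.dim() - 1)))
    noisy = a.sqrt() * shower + (1 - a).sqrt() * eps       # forward (noising) process
    return torch.nn.functional.mse_loss(model(noisy, t), eps)  # predict the noise
```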
DOI: 10.48550/arxiv.2404.02100
2024
Analysis Facilities White Paper
This white paper presents the current status of the R&D for Analysis Facilities (AFs) and attempts to summarize the views on the future direction of these facilities. These views have been collected through the High Energy Physics (HEP) Software Foundation's (HSF) Analysis Facilities forum, established in March 2022, the Analysis Ecosystems II workshop, that took place in May 2022, and the WLCG/HSF pre-CHEP workshop, that took place in May 2023. The paper attempts to cover all the aspects of an analysis facility.
DOI: 10.1051/epjconf/202429509032
2024
Refining fast simulation using machine learning
At the CMS experiment, a growing reliance on the fast Monte Carlo application (FastSim) will accompany the high luminosity and detector granularity expected in Phase 2. The FastSim chain is roughly 10 times faster than the application based on the Geant4 detector simulation and full reconstruction referred to as FullSim. However, this advantage comes at the price of decreased accuracy in some of the final analysis observables. In this contribution, a machine learning-based technique to refine those observables is presented. We employ a regression neural network trained with a sophisticated combination of multiple loss functions to provide post-hoc corrections to samples produced by the FastSim chain. The results show considerably improved agreement with the FullSim output and an improvement in correlations among output observables and external parameters. This technique is a promising replacement for existing correction factors, providing higher accuracy and thus contributing to the wider usage of FastSim.
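The refinement idea reduces to a regression network trained on paired FastSim/FullSim events. The sketch below uses a single MSE loss and a residual correction as stand-ins for the paper's more sophisticated loss combination; sizes are illustrative:

```python
# Schematic FastSim-to-FullSim refinement via residual regression (PyTorch).
import torch
import torch.nn as nn

refiner = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(refiner.parameters(), lr=1e-3)

def train_step(fast: torch.Tensor, full: torch.Tensor) -> float:
    """fast/full: (batch, 10) paired observables from the two simulation chains."""
    opt.zero_grad()
    refined = fast + refiner(fast)                   # learn a post-hoc correction
    loss = nn.functional.mse_loss(refined, full)     # paper combines multiple losses
    loss.backward()
    opt.step()
    return loss.item()
```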
DOI: 10.1051/epjconf/202429503017
2024
Full Simulation of CMS for Run-3 and Phase-2
In this contribution we report the status of the CMS Geant4 simulation and the prospects for Run-3 and Phase-2. First, we report on our experience during the start of Run-3 with Geant4 10.7.2, the common software package DD4hep for geometry description, and the VecGeom runtime geometry library, together with the FTFP_BERT_EMM Physics List and the CMS configuration for tracking in magnetic field. This combination of components is used for the first time for the Grid mass production of Monte Carlo samples. Further simulation improvements targeting Run-3 are under development, such as the switch to the new Geant4 11.1 in production, which provides several features important for the optimization of simulation, for example the new transportation process with built-in multiple scattering, the neutron general process, a custom tracking manager, the G4HepEm sub-library, and others. We will present the evaluation of various options, validation results, and the final choice of simulation configuration for 2023 production and beyond. The performance of the CMS full simulation for Run-2 and Run-3 will also be discussed. The CMS development plan for the Phase-2 Geant4-based simulation is very ambitious, and it includes a new geometry description, physics, and simulation configurations. The progress on new detector descriptions and full simulation will be presented, as well as the R&D in progress to reduce compute capacity needs.
DOI: 10.1051/epjconf/202024506012
2020
Cited 12 times
Coffea Columnar Object Framework For Effective Analysis
The coffea framework provides a new approach to High-Energy Physics analysis, via columnar operations, that improves time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language, the scientific python package ecosystem, and commodity big data technologies. To achieve this suite of improvements across many use cases, coffea takes a factorized approach, separating the analysis implementation and data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages which are wrapped to yield user code whose purpose is quickly intuited. Various data delivery schemes are wrapped into a common front-end which accepts user inputs and code, and returns user defined outputs. We will discuss our experience in implementing analysis of CMS data using the coffea framework along with a discussion of the user experience and future directions.
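The columnar style is easiest to see in a toy awkward-array example: cuts apply to all events at once, with no explicit event loop. The jagged array below is a stand-in for real CMS event records:

```python
# Columnar jet selection with awkward-array, the substrate coffea builds on.
import awkward as ak

jets = ak.Array([
    [{"pt": 52.1}, {"pt": 18.3}],  # event 1: two jets
    [{"pt": 35.0}],                # event 2: one jet
    [],                            # event 3: no jets
])
good_jets = jets[jets.pt > 30]     # one vectorized cut over all events
print(ak.num(good_jets))           # [1, 1, 0] passing jets per event
```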
DOI: 10.48550/arxiv.1812.07638
2018
Cited 14 times
Opportunities in Flavour Physics at the HL-LHC and HE-LHC
Motivated by the success of the flavour physics programme carried out over the last decade at the Large Hadron Collider (LHC), we characterize in detail the physics potential of its High-Luminosity and High-Energy upgrades in this domain of physics. We document the extraordinary breadth of the HL/HE-LHC programme enabled by a putative Upgrade II of the dedicated flavour physics experiment LHCb and the evolution of the established flavour physics role of the ATLAS and CMS general purpose experiments. We connect the dedicated flavour physics programme to studies of the top quark, Higgs boson, and direct high-$p_T$ searches for new particles and force carriers. We discuss the complementarity of their discovery potential for physics beyond the Standard Model, affirming the necessity to fully exploit the LHC's flavour physics potential throughout its upgrade eras.
DOI: 10.48550/arxiv.2203.08806
2022
Cited 5 times
New directions for surrogate models and differentiable programming for High Energy Physics detector simulation
The computational cost for high energy physics detector simulation in future experimental facilities is going to exceed the current available resources. To overcome this challenge, new ideas on surrogate models using machine learning methods are being explored to replace computationally expensive components. Additionally, differentiable programming has been proposed as a complementary approach, providing controllable and scalable simulation routines. In this document, new and ongoing efforts for surrogate models and differentiable programming applied to detector simulation are discussed in the context of the 2021 Particle Physics Community Planning Exercise (`Snowmass').
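A toy example of the differentiable-programming idea: when a (simplified) simulation step lives inside an autodiff framework, a detector parameter can be tuned by gradient descent against a target. Entirely schematic; the one-line "simulation" below is a toy, not a real detector model:

```python
# Gradient-based tuning through a toy differentiable simulation (PyTorch).
import torch

thickness = torch.tensor(1.0, requires_grad=True)  # hypothetical detector parameter
opt = torch.optim.SGD([thickness], lr=0.1)

for _ in range(200):
    energy_in = torch.rand(256) * 10.0
    response = energy_in * (1 - torch.exp(-thickness))  # toy differentiable response
    loss = ((response - 0.8 * energy_in) ** 2).mean()   # target: 80% response
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(thickness))  # converges near ln(5) ≈ 1.609, where 1 - e^{-t} = 0.8
```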
DOI: 10.1109/h2rc51942.2020.00010
2020
Cited 10 times
FPGAs-as-a-Service Toolkit (FaaST)
Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs are an extremely promising option as well. A series of workflows are developed to establish the performance capabilities of FPGAs as a service. Multiple different devices and a range of algorithms for use in high energy physics are studied. For a small, dense network, the throughput can be improved by an order of magnitude with respect to GPUs as a service. For large convolutional networks, the throughput is found to be comparable to GPUs as a service. This work represents the first open-source FPGAs-as-a-service toolkit.
DOI: 10.1088/1742-6596/2438/1/012079
2023
Denoising Convolutional Networks to Accelerate Detector Simulation
Abstract The high accuracy of detector simulation is crucial for modern particle physics experiments. However, this accuracy comes with a high computational cost, which will be exacerbated by the large datasets and complex detector upgrades associated with next-generation facilities such as the High Luminosity LHC. We explore the viability of regression-based machine learning (ML) approaches using convolutional neural networks (CNNs) to “denoise” faster, lower-quality detector simulations, augmenting them to produce a higher-quality final result with a reduced computational burden. The denoising CNN works in concert with classical detector simulation software rather than replacing it entirely, increasing its reliability compared to other ML approaches to simulation. We obtain promising results from a prototype based on photon showers in the CMS electromagnetic calorimeter. Future directions are also discussed.
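The setup can be sketched as paired image-to-image regression: a small CNN learns to map low-quality (fast) shower images toward high-quality (full) ones. Architecture and shapes below are illustrative assumptions:

```python
# Schematic denoising CNN mapping fast-sim showers toward full-sim quality.
import torch
import torch.nn as nn

denoiser = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)

fast_shower = torch.randn(8, 1, 32, 32)  # low-quality inputs (paired)
full_shower = torch.randn(8, 1, 32, 32)  # high-quality Geant4 targets
loss = nn.functional.mse_loss(denoiser(fast_shower), full_shower)
loss.backward()  # one training step's backward pass
```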
DOI: 10.21468/scipostphyscore.6.4.067
2023
Optimal mass variables for semivisible jets
Strongly coupled hidden sector theories predict collider production of invisible, composite dark matter candidates mixed with standard model hadrons in the form of semivisible jets. Classical mass reconstruction techniques may not be optimal for these unusual topologies, in which the missing transverse momentum comes from massive particles and has a nontrivial relationship to the visible jet momentum. We apply the artificial event variable network, a semisupervised, interpretable machine learning technique that uses an information bottleneck, to derive superior mass reconstruction functions for several cases of resonant semivisible jet production. We demonstrate that the technique can extrapolate to unknown signal model parameter values. We further demonstrate the viability of conducting an actual search for new physics using this method, by applying the learned functions to standard model background events from quantum chromodynamics.
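For context, the classical baseline such techniques aim to beat is the transverse mass of the dijet plus missing-momentum system, a standard variable in resonant semivisible-jet searches. A numpy version with toy inputs (the formula is the usual MT for a massive system combined with massless MET):

```python
# Classical transverse mass baseline for a dijet + MET system (toy numbers).
import numpy as np

def mt(m_jj, pt_jj, phi_jj, met, phi_met):
    """MT^2 = M^2 + 2*(ET_jj*MET - pT_jj . MET), all in GeV."""
    et_jj = np.sqrt(m_jj**2 + pt_jj**2)
    return np.sqrt(m_jj**2 + 2 * (et_jj * met - pt_jj * met * np.cos(phi_jj - phi_met)))

print(mt(m_jj=1800.0, pt_jj=200.0, phi_jj=0.3, met=150.0, phi_met=0.1))
```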
DOI: 10.1051/epjconf/201921402036
2019
Cited 7 times
Current and Future Performance of the CMS Simulation
The CMS full simulation using Geant4 has delivered billions of simulated events for analysis during Runs 1 and 2 of the LHC. However, the HL-LHC dataset will be an order of magnitude larger, with a similar increase in occupancy per event. In addition, the upgraded CMS detector will be considerably more complex, with an extended silicon tracker and a high granularity calorimeter in the endcap region. Increases in conventional computing resources are subject to both technological and budgetary limitations, so novel approaches are needed to improve software efficiency and to take advantage of new architectures and heterogeneous resources. Several projects are in development to address these needs, including the vectorized geometry library VecGeom and the GeantV transport engine, which uses track-level parallelization. The current computing performance of the CMS simulation will be presented as a baseline, along with an overview of the various optimizations already available for Geant4. Finally, the progress and outlook for integrating VecGeom and GeantV in the CMS software framework will be discussed.
DOI: 10.48550/arxiv.2005.00949
2020
Cited 4 times
GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP
Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider (LHC). In the early 2010s, the projections were that simulation demands would scale linearly with luminosity increase, compensated only partially by an increase of computing resources. The extension of fast simulation approaches to more use cases, covering a larger fraction of the simulation budget, is only part of the solution due to intrinsic precision limitations. The remainder corresponds to speeding up the simulation software by several factors, which is out of reach using simple optimizations on the current code base. In this context, the GeantV R&D project was launched, aiming to redesign the legacy particle transport codes in order to make them benefit from fine-grained parallelism features such as vectorization, but also from increased code and data locality. This paper presents extensively the results and achievements of this R&D, as well as the conclusions and lessons learnt from the beta prototype.
DOI: 10.48550/arxiv.2008.13636
2020
Cited 4 times
HL-LHC Computing Review: Common Tools and Community Software
Common and community software packages, such as ROOT, Geant4 and event generators, have been a key part of the LHC's success so far, and continued development and optimisation will be critical in the future. The challenges are driven by an ambitious physics programme, notably the LHC accelerator upgrade to high-luminosity, HL-LHC, and the corresponding detector upgrades of ATLAS and CMS. In this document we address the issues for software that is used in multiple experiments (usually even more widely than ATLAS and CMS) and maintained by teams of developers who are either not linked to a particular experiment or who contribute to common software within the context of their experiment activity. We also give space to general considerations for future software and projects that tackle upcoming challenges, no matter who writes it, which is an area where community convergence on best practice is extremely useful.
DOI: 10.1051/epjconf/202024502020
2020
Cited 3 times
Integration and Performance of New Technologies in the CMS Simulation
The HL-LHC and the corresponding detector upgrades for the CMS experiment will present extreme challenges for the full simulation. In particular, increased precision in models of physics processes may be required for accurate reproduction of particle shower measurements from the upcoming High Granularity Calorimeter. The CPU performance impacts of several proposed physics models will be discussed. There are several ongoing research and development efforts to make efficient use of new computing architectures and high performance computing systems for simulation. The integration of these new R&D products in the CMS software framework and corresponding CPU performance improvements will be presented.
DOI: 10.1007/s41781-020-00048-6
2021
Cited 3 times
GeantV
Abstract Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks for the first two runs of the Large Hadron Collider. In the early 2010s, it was projected that simulation demands would scale linearly with increasing luminosity, with only partial compensation from increasing computing resources. The extension of fast simulation approaches to cover more use cases that represent a larger fraction of the simulation budget is only part of the solution, because of intrinsic precision limitations. The remainder corresponds to speeding up the simulation software by several factors, which is not achievable by just applying simple optimizations to the current code base. In this context, the GeantV R&D project was launched, aiming to redesign the legacy particle transport code in order to benefit from features of fine-grained parallelism, including vectorization and increased locality of both instruction and data. This paper provides an extensive presentation of the results and achievements of this R&D project, as well as the conclusions and lessons learned from the beta version prototype.
DOI: 10.1007/s41781-023-00101-0
2023
Accelerating Machine Learning Inference with GPUs in ProtoDUNE Data Processing
We study the performance of a cloud-based GPU-accelerated inference server to speed up event reconstruction in neutrino data batch jobs. Using detector data from the ProtoDUNE experiment and employing the standard DUNE grid job submission tools, we attempt to reprocess the data by running several thousand concurrent grid jobs, a rate we expect to be typical of current and future neutrino physics experiments. We process most of the dataset with the GPU version of our processing algorithm and the remainder with the CPU version for timing comparisons. We find that a 100-GPU cloud-based server is able to easily meet the processing demand, and that using the GPU version of the event processing algorithm is two times faster than processing these data with the CPU version when compared with the newest CPUs in our sample. The amount of data transferred to the inference server during the GPU runs can overwhelm even the highest-bandwidth network switches, however, unless care is taken to observe network facility limits or otherwise distribute the jobs to multiple sites. We discuss the lessons learned from this processing campaign and several avenues for future improvements.
DOI: 10.2172/1915406
2023
Semi-Supervised Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection
For the first time, we demonstrate the successful use of domain adaptation on two very different observational datasets (from SDSS and DECaLS). We show that our method is capable of bridging the gap between two astronomical surveys, and also performs well for anomaly detection and clustering of unknown data in the unlabeled dataset. We apply our model to two examples of galaxy morphology classification tasks with anomaly detection: 1) classifying spiral and elliptical galaxies with detection of merging galaxies (three classes including one unknown anomaly class); 2) a more granular problem where the classes describe more detailed morphological properties of galaxies, with the detection of gravitational lenses (ten classes including one unknown anomaly class).
DOI: 10.48550/arxiv.2303.16253
2023
Optimal Mass Variables for Semivisible Jets
Strongly coupled hidden sector theories predict collider production of invisible, composite dark matter candidates mixed with standard model hadrons in the form of semivisible jets. Classical mass reconstruction techniques may not be optimal for these unusual topologies, in which the missing transverse momentum comes from massive particles and has a nontrivial relationship to the visible jet momentum. We apply the artificial event variable network, a semisupervised, interpretable machine learning technique that uses an information bottleneck, to derive superior mass reconstruction functions for several cases of resonant semivisible jet production. We demonstrate that the technique can extrapolate to unknown signal model parameter values. We further demonstrate the viability of conducting an actual search for new physics using this method, by applying the learned functions to standard model background events from quantum chromodynamics.
DOI: 10.2172/1975521
2023
AI and Beyond: New Techniques for Simulation and Design in HEP
DOI: 10.48550/arxiv.2306.08106
2023
Applications of Deep Learning to physics workflows
Modern large-scale physics experiments create datasets with sizes and streaming rates that can exceed those from industry leaders such as Google Cloud and Netflix. Fully processing these datasets requires both sufficient compute power and efficient workflows. Recent advances in Machine Learning (ML) and Artificial Intelligence (AI) can either improve or replace existing domain-specific algorithms to increase workflow efficiency. Not only can these algorithms improve the physics performance of current algorithms, but they can often be executed more quickly, especially when run on coprocessors such as GPUs or FPGAs. In the winter of 2023, MIT hosted the Accelerating Physics with ML at MIT workshop, which brought together researchers from gravitational-wave physics, multi-messenger astrophysics, and particle physics to discuss and share current efforts to integrate ML tools into their workflows. The following white paper highlights examples of algorithms and computing frameworks discussed during this workshop and summarizes the expected computing needs for the immediate future of the involved fields.
DOI: 10.2172/1988513
2023
Fast & Accurate Calorimeter Simulation With Diffusion Models
As the luminosity of the LHC increases, the scintillator tiles used in the CMS Hadronic Endcap calorimeter will lose their efficiency. This report outlines two possible radiation-hard upgrade scenarios based on replacing the HE scintillators with quartz plates.
DOI: 10.48550/arxiv.2308.03876
2023
Denoising diffusion models with geometry adaptation for high fidelity calorimeter simulation
Simulation is crucial for all aspects of collider data analysis, but the available computing budget in the High Luminosity LHC era will be severely constrained. Generative machine learning models may act as surrogates to replace physics-based full simulation of particle detectors, and diffusion models have recently emerged as the state of the art for other generative tasks. We introduce CaloDiffusion, a denoising diffusion model trained on the public CaloChallenge datasets to generate calorimeter showers. Our algorithm employs 3D cylindrical convolutions, which take advantage of symmetries of the underlying data representation. To handle irregular detector geometries, we augment the diffusion model with a new geometry latent mapping (GLaM) layer to learn forward and reverse transformations to a regular geometry that is suitable for cylindrical convolutions. The showers generated by our approach are nearly indistinguishable from the full simulation, as measured by several different metrics.
DOI: 10.48550/arxiv.2309.12919
2023
Refining fast simulation using machine learning
At the CMS experiment, a growing reliance on the fast Monte Carlo application (FastSim) will accompany the high luminosity and detector granularity expected in Phase 2. The FastSim chain is roughly 10 times faster than the application based on the GEANT4 detector simulation and full reconstruction referred to as FullSim. However, this advantage comes at the price of decreased accuracy in some of the final analysis observables. In this contribution, a machine learning-based technique to refine those observables is presented. We employ a regression neural network trained with a sophisticated combination of multiple loss functions to provide post-hoc corrections to samples produced by the FastSim chain. The results show considerably improved agreement with the FullSim output and an improvement in correlations among output observables and external parameters. This technique is a promising replacement for existing correction factors, providing higher accuracy and thus contributing to the wider usage of FastSim.
DOI: 10.48550/arxiv.2312.06838
2023
Optimizing High Throughput Inference on Graph Neural Networks at Shared Computing Facilities with the NVIDIA Triton Inference Server
With machine learning applications now spanning a variety of computational tasks, multi-user shared computing facilities are devoting a rapidly increasing proportion of their resources to such algorithms. Graph neural networks (GNNs), for example, have provided astounding improvements in extracting complex signatures from data and are now widely used in a variety of applications, such as particle jet classification in high energy physics (HEP). However, GNNs also come with an enormous computational penalty that requires the use of GPUs to maintain reasonable throughput. At shared computing facilities, such as those used by physicists at Fermi National Accelerator Laboratory (Fermilab), methodical resource allocation and high throughput at the many-user scale are key to ensuring that resources are being used as efficiently as possible. These facilities, however, primarily provide CPU-only nodes, which proves detrimental to time-to-insight and computational throughput for workflows that include machine learning inference. In this work, we describe how a shared computing facility can use the NVIDIA Triton Inference Server to optimize its resource allocation and computing structure, recovering high throughput while scaling out to multiple users by massively parallelizing their machine learning inference. To demonstrate the effectiveness of this system in a realistic multi-user environment, we use the Fermilab Elastic Analysis Facility augmented with the Triton Inference Server to provide scalable and high throughput access to a HEP-specific GNN and report on the outcome.
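Client-side, the pattern looks like the sketch below, based on the standard `tritonclient` package; the model name, tensor names, and shapes are hypothetical and must match the deployed model's `config.pbtxt`:

```python
# Hedged Triton gRPC client sketch; names/shapes are placeholders.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

inp = grpcclient.InferInput("INPUT__0", [1, 128], "FP32")  # hypothetical input tensor
inp.set_data_from_numpy(np.random.rand(1, 128).astype(np.float32))

result = client.infer(model_name="my_gnn", inputs=[inp])   # hypothetical model name
scores = result.as_numpy("OUTPUT__0")                      # hypothetical output tensor
```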
DOI: 10.2172/1436702
2018
HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation
At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.
DOI: 10.1051/epjconf/201921402031
2019
Electromagnetic physics vectorization in the GeantV transport framework
The development of the GeantV Electromagnetic (EM) physics package has evolved following two necessary paths towards code modernization. A first phase required the revision of the main electromagnetic physics models and their implementation. The main objectives were to improve their accuracy, extend them to the new high-energy frontier posed by the Future Circular Collider (FCC) programme, and allow a better adaptation to a multi-particle flow. Most of the EM physics models in GeantV have been reviewed from a theoretical perspective and rewritten with vector-friendly implementations, which are now available in scalar mode in the alpha release. The second phase consists of a thorough investigation of the possibility of vectorising the most CPU-intensive physics code parts, such as final state sampling. We have shown the feasibility of implementing electromagnetic physics models that take advantage of SIMD/SIMT architectures, thus obtaining gains in performance. After this phase, the time has come for the GeantV project to take a step forward towards the final proof of concept, which takes shape through the testing of the full simulation chain (transport + physics + geometry) running in vectorized mode. In this paper we present the first benchmark results obtained after vectorizing a full set of electromagnetic physics models.
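The scalar-versus-vector contrast at the heart of this programme can be caricatured in numpy: the same sampling done one particle at a time versus over a whole basket of tracks in one call. The toy inverse-CDF sampling below is illustrative, not a GeantV physics model:

```python
# Scalar vs. vectorized final-state sampling (toy exponential sampling).
import numpy as np

rng = np.random.default_rng(42)

def sample_scalar(n):   # one particle at a time: branch-heavy, SIMD-hostile
    return [-np.log(rng.random()) for _ in range(n)]

def sample_vector(n):   # whole basket at once: one SIMD-friendly array operation
    return -np.log(rng.random(n))

energies = sample_vector(10_000)  # same distribution, far fewer per-call overheads
```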
DOI: 10.1051/epjconf/202125103016
2021
CMS Full Simulation for Run 3
We report the status of the CMS full simulation for Run 3. During the long shutdown of the LHC, a significant update was introduced to the CMS simulation code. The CMS geometry description has been reviewed and several important modifications were needed; the CMS detector description software has been migrated to the community-developed DD4hep tool. We report on our experience obtained during this migration. Geant4 10.7 is the CMS choice for Run 3 simulation productions. We discuss the arguments for this choice and the strategy of adaptation of a new Geant4 version, and report on the physics performance of the CMS simulation. A special Geant4 Physics List configuration, FTFP_BERT_EMM, is described, which provides a compromise between simulation accuracy and CPU performance. A significant fraction of the time for simulation of CMS events is spent on tracking of charged particles in a magnetic field; in the CMS simulation, a dynamic choice of Geant4 parameters for tracking in field is implemented. A new method is introduced into the simulation of electromagnetic components of hadronic showers in the electromagnetic calorimeter of CMS: for low-energy electrons and positrons, a parametrization of GFlash type is applied. Results of tests of this method are discussed. In summary, we expect about a 25% speedup of the CMS simulation production for Run 3 compared to the Run 2 simulations.
DOI: 10.2172/1437300
2018
HEP Software Foundation Community White Paper Working Group - Detector Simulation
A working group on detector simulation was formed as part of the high-energy physics (HEP) Software Foundation's initiative to prepare a Community White Paper that describes the main software challenges and opportunities to be faced in the HEP field over the next decade. The working group met over a period of several months in order to review the current status of the Full and Fast simulation applications of HEP experiments and the improvements that will need to be made in order to meet the goals of future HEP experimental programmes. The scope of the topics covered includes the main components of a HEP simulation application, such as MC truth handling, geometry modeling, particle propagation in materials and fields, physics modeling of the interactions of particles with matter, the treatment of pileup and other backgrounds, as well as signal processing and digitisation. The resulting work programme described in this document focuses on the need to improve both the software performance and the physics of detector simulation. The goals are to increase the accuracy of the physics models and expand their applicability to future physics programmes, while achieving large factors in computing performance gains consistent with projections on available computing resources.
DOI: 10.2172/1633739
2019
COFFEA - Columnar Object Framework For Effective Analysis [Slides]
The COFFEA Framework provides a new approach to HEP analysis, via columnar operations, that improves time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language and commodity big data technologies such as Apache Spark and NoSQL databases. To achieve this suite of improvements across many use cases, COFFEA takes a factorized approach, separating the analysis implementation and data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages which are wrapped to yield user code whose purpose is quickly intuited. Various data delivery schemes are wrapped into a common front-end which accepts user inputs and code, and returns user defined outputs. We will present published results from analysis of CMS data using the COFFEA framework along with a discussion of metrics and the user experience of arriving at those results with columnar analysis.
DOI: 10.3389/fphy.2022.913510
2022
Detector Simulation Challenges for Future Accelerator Experiments
Detector simulation is a key component for studies on prospective future high-energy colliders, the design, optimization, testing and operation of particle physics experiments, and the analysis of the data collected to perform physics measurements. This review starts from the current state of the art technology applied to detector simulation in high-energy physics and elaborates on the evolution of software tools developed to address the challenges posed by future accelerator programs beyond the HL-LHC era, into the 2030–2050 period. New accelerator, detector, and computing technologies set the stage for an exercise in how detector simulation will serve the needs of the high-energy physics programs of the mid 21st century, and its potential impact on other research domains.
DOI: 10.2172/1895409
2022
CompF2: Theoretical Calculations and Simulation Topical Group Report
This report summarizes the work of the Computational Frontier topical group on theoretical calculations and simulation for Snowmass 2021. We discuss the challenges, potential solutions, and needs facing six diverse but related topical areas that span the subject of theoretical calculations and simulation in high energy physics (HEP): cosmic calculations, particle accelerator modeling, detector simulation, event generators, perturbative calculations, and lattice QCD (quantum chromodynamics). The challenges arise from the next generations of HEP experiments, which will include more complex instruments, provide larger data volumes, and perform more precise measurements. Calculations and simulations will need to keep up with these increased requirements. The other aspect of the challenge is the evolution of computing landscape away from general-purpose computing on CPUs and toward special-purpose accelerators and coprocessors such as GPUs and FPGAs. These newer devices can provide substantial improvements for certain categories of algorithms, at the expense of more specialized programming and memory and data access patterns.
DOI: 10.2172/1592156
2018
Response to NITRD, NCO, NSF Request for Information on "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan"
We present a response to the 2018 Request for Information (RFI) from the NITRD, NCO, NSF regarding the "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan." Through this document, we provide a response to the question of whether and how the National Artificial Intelligence Research and Development Strategic Plan (NAIRDSP) should be updated from the perspective of Fermilab, America's premier national laboratory for High Energy Physics (HEP). We believe the NAIRDSP should be extended in light of the rapid pace of development and innovation in the field of Artificial Intelligence (AI) since 2016, and present our recommendations below. AI has profoundly impacted many areas of human life, promising to dramatically reshape society (e.g., economy, education, science) in the coming years. We are still early in this process. It is critical to invest now in this technology to ensure it is safe and deployed ethically. Science and society both have a strong need for accuracy, efficiency, transparency, and accountability in algorithms, making investments in scientific AI particularly valuable. Thus far the US has been a leader in AI technologies, and we believe as a national laboratory it is crucial to help maintain and extend this leadership. Moreover, investments in AI will be important for maintaining US leadership in the physical sciences.
DOI: 10.1051/epjconf/201921402007
2019
Recent progress with the top to bottom approach to vectorization in GeantV
SIMD acceleration can potentially boost application throughput by large factors. However, achieving efficient SIMD vectorization for scalar code with complex data flow and branching logic goes well beyond breaking some loop dependencies and relying on the compiler. Since the refactoring effort scales with the number of lines of code, it is important to understand what kind of performance gains can be expected in such complex cases. A couple of years ago we started to investigate a top to bottom vectorization approach to particle transport simulation. Percolating vector data to algorithms was mandatory, since not all the components can internally vectorize. Vectorizing low-level algorithms is certainly necessary, but not sufficient to achieve relevant SIMD gains. In addition, the overheads for maintaining the concurrent vector data flow and copying data have to be minimized. In the context of a vectorization R&D for simulation, we developed a framework to allow different categories of scalar and vectorized components to co-exist, dealing with data flow management and real-time heuristic optimizations. The paper describes our approach to coordinating SIMD vectorization at the framework level, with a detailed quantitative analysis of the SIMD gain versus overheads, broken down by component in terms of geometry, physics, and magnetic field propagation. We also present the more general context of this R&D work and goals for 2018.
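A back-of-envelope model makes the gain-versus-overhead tradeoff concrete: Amdahl's law with an extra term for the vector data-flow bookkeeping. The numbers below are illustrative, not measurements from the paper:

```python
# Amdahl-style model of SIMD gain eroded by data-marshaling overhead.
def effective_speedup(vector_fraction: float, simd_gain: float, overhead: float) -> float:
    """`vector_fraction` of runtime speeds up by `simd_gain`;
    `overhead` is extra time (fraction of original runtime) for vector data flow."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / simd_gain + overhead)

print(effective_speedup(0.6, 4.0, 0.05))  # ~1.67: a raw 4x SIMD gain shrinks quickly
```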
DOI: 10.2172/1570210
2019
FPGAs as a Service to Accelerate Machine Learning Inference [PowerPoint]
Large-scale particle physics experiments face challenging demands for high-throughput computing resources both now and in the future. New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC, and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) milliseconds with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600–700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
2020
FPGAs-as-a-Service Toolkit (FaaST)
Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs are an extremely promising option as well. A series of workflows are developed to establish the performance capabilities of FPGAs as a service. Multiple different devices and a range of algorithms for use in high energy physics are studied. For a small, dense network, the throughput can be improved by an order of magnitude with respect to GPUs as a service. For large convolutional networks, the throughput is found to be comparable to GPUs as a service. This work represents the first open-source FPGAs-as-a-service toolkit.
DOI: 10.2172/1633741
2019
Integration and Performance of New Technologies in the CMS Simulation [Slides]
CMS upgrade simulation for the HL-LHC may be 2–3× slower than Run 2. CMS integration tests of GeantV met all goals: 1) co-development ensured compatible threading models and interfaces, 2) a similar speedup was measured in the full experiment software framework, and 3) an efficient path to integration was established. The GeantV prototype is a useful demonstrator. The next step is a GeantX project for HPCs with GPUs.
DOI: 10.13016/m26c9q
2014
Search for Pair Production of Third-Generation Scalar Leptoquarks and R-Parity Violating Top Squarks in Proton-Proton Collisions at $\sqrt{s}$ = 8 TeV
DOI: 10.1088/1748-0221/11/11/p11018
2016
Liquid scintillator tiles for calorimetry
Future experiments in high energy and nuclear physics may require large, inexpensive calorimeters that can continue to operate after receiving doses of 50 Mrad or more. The light output of liquid scintillators suffers little degradation under irradiation. However, many challenges exist before liquids can be used in sampling calorimetry, especially in developing a packaging that provides sufficient efficiency and uniformity of light collection, as well as suitable mechanical properties. We present the results of a study, using cosmic rays and a test beam, of the light collection efficiency and uniformity of a scintillator tile based on the EJ-309 liquid scintillator, along with some preliminary results on radiation hardness.
2013
CMS HCAL Endcap Simulations for the High Luminosity LHC
DOI: 10.5281/zenodo.1133438
2017
delphes/delphes: Delphes-3.4.2pre11
Updated DenseTrackFilter workflow; updated muon ID (pT > 2); added card with Run 2 calorimeter end-caps.
DOI: 10.2172/1835859
2022
Denoising Convolutional Networks to Accelerate Detector Simulation
The high accuracy of detector simulation is crucial for modern particle physics experiments. However, this accuracy comes with a high computational cost, which will be exacerbated by the large datasets and complex detector upgrades associated with next-generation facilities such as the High Luminosity LHC. We explore the viability of regression-based machine learning (ML) approaches using convolutional neural networks (CNNs) to "denoise" faster, lower-quality detector simulations, augmenting them to produce a higher-quality final result with a reduced computational burden. The denoising CNN works in concert with classical detector simulation software rather than replacing it entirely, increasing its reliability compared to other ML approaches to simulation. We obtain promising results from a prototype based on photon showers in the CMS electromagnetic calorimeter. Future directions are also discussed.
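A minimal sketch of the denoising idea in PyTorch: a small CNN learns a residual correction that maps a fast, lower-quality shower image toward its high-quality target. The layer counts, image size, and residual design here are illustrative assumptions, not the architecture used in the study.

# Minimal denoising-CNN sketch, assuming PyTorch; shapes are illustrative.
import torch
import torch.nn as nn

class DenoisingCNN(nn.Module):
    """Maps a fast, low-quality shower image to a refined one."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Residual connection: the network learns a correction to the
        # fast simulation rather than the full image from scratch.
        return x + self.net(x)

model = DenoisingCNN()
fast_sim = torch.randn(8, 1, 51, 51)  # batch of low-quality ECAL-like images
full_sim = torch.randn(8, 1, 51, 51)  # corresponding high-quality targets
loss = nn.functional.mse_loss(model(fast_sim), full_sim)  # regression objective
loss.backward()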
DOI: 10.48550/arxiv.2203.07614
2022
Detector and Beamline Simulation for Next-Generation High Energy Physics Experiments
The success of high energy physics programs relies heavily on accurate detector simulations and beam interaction modeling. The increasingly complex detector geometries and beam dynamics require sophisticated techniques in order to meet the demands of current and future experiments. Common software tools used today are unable to fully utilize modern computational resources, while data-recording rates are often orders of magnitude larger than what can be produced via simulation. In this paper, we describe the current state and future needs of high energy physics detector and beamline simulations, discuss the related challenges, and propose a number of possible ways to address them.
DOI: 10.48550/arxiv.2203.16255
2022
Physics Community Needs, Tools, and Resources for Machine Learning
Machine learning (ML) is becoming an increasingly important component of cutting-edge physics research, but its computational requirements present significant challenges. In this white paper, we discuss the needs of the physics community regarding ML across latency and throughput regimes, the tools and resources that offer the possibility of addressing these needs, and how these can be best utilized and accessed in the coming years.
2022
New directions for surrogate models and differentiable programming for High Energy Physics detector simulation
2022
GNN-based end-to-end reconstruction in the CMS Phase 2 High-Granularity Calorimeter
We present the current stage of research progress towards a one-pass, completely Machine Learning (ML) based imaging calorimeter reconstruction. The model used is based on Graph Neural Networks (GNNs) and directly analyzes the hits in each HGCAL endcap. The ML algorithm is trained to predict clusters of hits originating from the same incident particle by labeling the hits with the same cluster index. We impose simple criteria to assess whether the hits associated as a cluster by the prediction are matched to those hits resulting from any particular individual incident particles. The algorithm is studied by simulating two tau leptons in each of the two HGCAL endcaps, where each tau may decay according to its measured standard model branching probabilities. The simulation includes the material interaction of the tau decay products which may create additional particles incident upon the calorimeter. Using this varied multiparticle environment we can investigate the application of this reconstruction technique and begin to characterize energy containment and performance.
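As a sketch of the first step such an approach typically requires, the following Python snippet builds a k-nearest-neighbor graph over calorimeter hits; a GNN would then pass messages along these edges and predict a cluster index per hit. The hit format and choice of k are illustrative assumptions; the actual HGCAL model is not reproduced here.

# Sketch of graph construction from hits; purely illustrative.
import numpy as np

def knn_edges(hits: np.ndarray, k: int = 4) -> np.ndarray:
    """hits: (N, 3) array of (x, y, layer); returns a (2, N*k) edge index."""
    d2 = ((hits[:, None, :] - hits[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # no self-edges
    nbrs = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbors per hit
    src = np.repeat(np.arange(len(hits)), k)
    return np.stack([src, nbrs.ravel()])

hits = np.random.rand(100, 3)
edges = knn_edges(hits)
# A GNN would pass messages along these edges and label each hit with a
# cluster index, grouping hits from the same incident particle.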
DOI: 10.48550/arxiv.2202.02194
2022
HL-LHC Computing Review Stage 2, Common Software Projects: Data Science Tools for Analysis
This paper was prepared by the HEP Software Foundation (HSF) PyHEP Working Group as input to the second phase of the LHCC review of High-Luminosity LHC (HL-LHC) computing, which took place in November 2021. It describes the adoption of Python and data science tools in HEP, discusses the likelihood of future scenarios, and makes recommendations for action by the HEP community.
DOI: 10.48550/arxiv.2207.00122
2022
Snowmass '21 Community Engagement Frontier 6: Public Policy and Government Engagement: Congressional Advocacy for HEP Funding (The "DC Trip'')
This document has been prepared as a Snowmass contributed paper by the Public Policy & Government Engagement topical group (CEF06) within the Community Engagement Frontier. The charge of CEF06 is to review all aspects of how the High Energy Physics (HEP) community engages with government at all levels and how public policy impacts members of the community and the community at large, and to assess and raise awareness within the community of direct community-driven engagement of the U.S. federal government (i.e., advocacy). The focus of this paper is the advocacy undertaken by the HEP community that pertains directly to the funding of the field by the U.S. federal government.
DOI: 10.48550/arxiv.2207.00124
2022
Snowmass '21 Community Engagement Frontier 6: Public Policy and Government Engagement: Congressional Advocacy for Areas Beyond HEP Funding
This document has been prepared as a Snowmass contributed paper by the Public Policy & Government Engagement topical group (CEF06) within the Community Engagement Frontier. The charge of CEF06 is to review all aspects of how the High Energy Physics (HEP) community engages with government at all levels and how public policy impacts members of the community and the community at large, and to assess and raise awareness within the community of direct community-driven engagement of the US federal government (i.e., advocacy). The focus of this paper is the potential for HEP community advocacy on topics other than funding for basic research.
DOI: 10.48550/arxiv.2207.00125
2022
Snowmass '21 Community Engagement Frontier 6: Public Policy and Government Engagement: Non-Congressional Government Engagement
This document has been prepared as a Snowmass contributed paper by the Public Policy & Government Engagement topical group (CEF06) within the Community Engagement Frontier. The charge of CEF06 is to review all aspects of how the High Energy Physics (HEP) community engages with government at all levels and how public policy impacts members of the community and the community at large, and to assess and raise awareness within the community of direct community-driven engagement of the US federal government (i.e., advocacy). The focus of this paper is HEP community engagement of government entities other than the U.S. federal legislature (i.e., Congress).
DOI: 10.48550/arxiv.2202.05320
2022
Denoising Convolutional Networks to Accelerate Detector Simulation
The high accuracy of detector simulation is crucial for modern particle physics experiments. However, this accuracy comes with a high computational cost, which will be exacerbated by the large datasets and complex detector upgrades associated with next-generation facilities such as the High Luminosity LHC. We explore the viability of regression-based machine learning (ML) approaches using convolutional neural networks (CNNs) to "denoise" faster, lower-quality detector simulations, augmenting them to produce a higher-quality final result with a reduced computational burden. The denoising CNN works in concert with classical detector simulation software rather than replacing it entirely, increasing its reliability compared to other ML approaches to simulation. We obtain promising results from a prototype based on photon showers in the CMS electromagnetic calorimeter. Future directions are also discussed.
DOI: 10.48550/arxiv.2209.08177
2022
CompF2: Theoretical Calculations and Simulation Topical Group Report
This report summarizes the work of the Computational Frontier topical group on theoretical calculations and simulation for Snowmass 2021. We discuss the challenges, potential solutions, and needs facing six diverse but related topical areas that span the subject of theoretical calculations and simulation in high energy physics (HEP): cosmic calculations, particle accelerator modeling, detector simulation, event generators, perturbative calculations, and lattice QCD (quantum chromodynamics). The challenges arise from the next generations of HEP experiments, which will include more complex instruments, provide larger data volumes, and perform more precise measurements. Calculations and simulations will need to keep up with these increased requirements. The other aspect of the challenge is the evolution of the computing landscape away from general-purpose computing on CPUs and toward special-purpose accelerators and coprocessors such as GPUs and FPGAs. These newer devices can provide substantial improvements for certain categories of algorithms, at the expense of more specialized programming and memory and data access patterns.
DOI: 10.2172/1881236
2022
CompF2 Recommendations [Slides]
Calculations and simulations will need to keep up with increased requirements from the next generations of HEP experiments. The other aspect of the challenge is the evolution of the computing landscape away from general-purpose computing on CPUs and toward special-purpose accelerators and coprocessors such as GPUs and FPGAs. These newer devices can provide substantial improvements for certain categories of algorithms, at the expense of more specialized programming and memory and data access patterns.
DOI: 10.48550/arxiv.2211.00677
2022
Semi-Supervised Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection
In the era of big astronomical surveys, our ability to leverage artificial intelligence algorithms simultaneously for multiple datasets will open new avenues for scientific discovery. Unfortunately, simply training a deep neural network on images from one data domain often leads to very poor performance on any other dataset. Here we develop DeepAstroUDA, a Universal Domain Adaptation method capable of performing semi-supervised domain alignment that can be applied to datasets with different types of class overlap. Extra classes can be present in either of the two datasets, and the method can even be used in the presence of unknown classes. For the first time, we demonstrate the successful use of domain adaptation on two very different observational datasets (from SDSS and DECaLS). We show that our method is capable of bridging the gap between two astronomical surveys, and also performs well for anomaly detection and clustering of unknown data in the unlabeled dataset. We apply our model to two examples of galaxy morphology classification tasks with anomaly detection: 1) classifying spiral and elliptical galaxies with detection of merging galaxies (three classes including one unknown anomaly class); 2) a more granular problem where the classes describe more detailed morphological properties of galaxies, with the detection of gravitational lenses (ten classes including one unknown anomaly class).
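The DeepAstroUDA method itself is more involved; the sketch below only illustrates the basic mechanism of domain alignment, penalizing the distance between feature distributions of a labeled and an unlabeled survey. The embedding dimensions and the simple mean-feature discrepancy are illustrative assumptions.

# Minimal domain-alignment sketch, assuming PyTorch; not the paper's method.
import torch

def feature_discrepancy(source_feats: torch.Tensor,
                        target_feats: torch.Tensor) -> torch.Tensor:
    """Distance between mean embeddings of the labeled (source) and
    unlabeled (target) domains; added to the classification loss."""
    return (source_feats.mean(0) - target_feats.mean(0)).pow(2).sum()

sdss_feats = torch.randn(64, 128)    # embeddings of labeled SDSS images
decals_feats = torch.randn(64, 128)  # embeddings of unlabeled DECaLS images
alignment_loss = feature_discrepancy(sdss_feats, decals_feats)
# total_loss = classification_loss + lambda_align * alignment_loss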
DOI: 10.1109/escience.2018.00090
2018
Strategies for Modeling Extreme Luminosities in the CMS Simulation
The LHC simulation frameworks are already confronting the High Luminosity LHC (HL-LHC) era. In order to design and evaluate the performance of the HL-LHC detector upgrades, realistic simulations of the future detectors and the extreme luminosity conditions they may encounter must be produced now. The use of many individual minimum-bias interactions to model the pileup poses several challenges to the CMS Simulation framework, including huge memory consumption, increased computation time, and the necessary handling of large numbers of event files during Monte Carlo production. Simulating a single hard scatter at an instantaneous luminosity corresponding to 200 pileup interactions per crossing can involve the input of thousands of individual minimum-bias events. Brute-force Monte Carlo production requires the overlay of these events for each hard-scatter event simulated.
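A minimal Python sketch of the brute-force overlay described above, assuming a pre-simulated pool of minimum-bias events and HL-LHC-like conditions with an average of 200 pileup interactions per crossing; all names and sizes are illustrative.

# Sketch of brute-force pileup overlay; numbers are illustrative.
import numpy as np

rng = np.random.default_rng(42)
MEAN_PILEUP = 200
N_MINBIAS_LIBRARY = 100_000  # pre-simulated minimum-bias event pool

def overlay_pileup(hard_scatter_id: int) -> np.ndarray:
    """Pick the minimum-bias events to mix into one hard-scatter event."""
    n_pu = rng.poisson(MEAN_PILEUP)
    return rng.choice(N_MINBIAS_LIBRARY, size=n_pu, replace=False)

# For 1000 hard scatters, roughly 200,000 minimum-bias reads are needed,
# illustrating the I/O and memory pressure the text describes.
total = sum(len(overlay_pileup(i)) for i in range(1000))
print(f"minimum-bias events read: {total}")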
DOI: 10.2172/1570209
2019
The Case for Columnar Analysis (a Two-Part Series) [PowerPoint]
This talk opens with a prologue covering terminology and technology. Part I covers the analyzer experience, including user experience, code samples, domain of applicability, and scalability. Part II discusses technical underpinnings, including theoretical motivation, the Coffea framework, factorized data delivery, the package ecosystem, performance, and future directions.
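As a flavor of the columnar style the talk advocates, the following Python sketch applies one vectorized selection to whole arrays of events instead of looping event by event; the quantities and cut values are made up for illustration and are not taken from the talk.

# Columnar vs. event-loop selection; data and cuts are made up.
import numpy as np

n_events = 1_000_000
muon_pt = np.random.exponential(20.0, n_events)   # one column per quantity
muon_eta = np.random.uniform(-2.4, 2.4, n_events)

# Event-loop style (slow in Python):
#   selected = [pt for pt, eta in zip(muon_pt, muon_eta)
#               if pt > 25 and abs(eta) < 2.1]
# Columnar style: one vectorized mask over all events.
mask = (muon_pt > 25) & (np.abs(muon_eta) < 2.1)
selected_pt = muon_pt[mask]
print(f"selected {mask.sum()} of {n_events} events")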
DOI: 10.2172/1592124
2019
Accelerated Machine Learning as a Service for Particle Physics Computing
Accelerated Machine Learning as a Service for Particle Physics Computing:
• Amount and complexity of high-energy physics data increases dramatically from 2025 onward
• Traditional algorithms will require too much CPU time
• Machine learning can solve combinatorially-scaling problems in constant time, but must be fast enough
DOI: 10.2172/1570203
2019
Searches for new physics with unconventional signatures at ATLAS and CMS [PowerPoint]
Selected results from searches for new physics with unconventional signatures using the ATLAS and CMS detectors are presented. Such signatures include emerging jets, heavy charged particles, displaced or delayed objects, and disappearing tracks. These signatures may arise from hidden sectors or supersymmetric models. The searches use proton-proton collision data from Run 2 of the LHC with a center-of-mass energy of 13 TeV.
DOI: 10.5281/zenodo.2565840
2019
delphes/delphes: Delphes-3.4.2pre17
2019
FPGA-Accelerated Machine Learning Inference as a Service for Particle Physics Computing
2019
COFFEA - Columnar Object Framework For Effective Analysis [Slides]
DOI: 10.5281/zenodo.3895029
2019
Accelerated Machine Learning as a Service for Particle Physics Computing
DOI: 10.48550/arxiv.1912.04180
2019
Searches for new physics with unconventional signatures at ATLAS and CMS
Selected results from searches for new physics with unconventional signatures using the ATLAS and CMS detectors are presented. Such signatures include emerging jets, heavy charged particles, displaced or delayed objects, and disappearing tracks. These signatures may arise from hidden sectors or supersymmetric models. The searches use proton-proton collision data from Run 2 of the LHC with a center-of-mass energy of 13 TeV.
2020
Coffea -- Columnar Object Framework For Effective Analysis
DOI: 10.48550/arxiv.1804.03983
2018
HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation
At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.
DOI: 10.2172/1781074
2021
Scaling Inference Using Triton to Accelerate Particle Physics at the LHC and DUNE
DUNE and the LHC experiments consist of unique and cutting-edge particle detectors that create massive, complex, and rich datasets with billions of events. They require sophisticated algorithms to reconstruct and interpret the data. Modern machine learning algorithms provide a powerful toolset to detect and classify particles, from familiar image processing convolutional neural networks to newer graph neural network architectures. A full reconstruction of these particle collisions requires novel approaches to handle the computing challenge of processing so much raw data. In a series of studies, physicists from Fermilab, CERN, and university groups explored how to accelerate their data processing using the Triton Inference Server.
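A minimal client sketch using the tritonclient Python package's gRPC interface, assuming a Triton server running locally; the model name and input/output tensor names ("resnet50", "input", "output") are assumptions, not the configurations used in these studies.

# Minimal Triton gRPC client sketch; tensor/model names are assumed.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

batch = np.zeros((1, 3, 224, 224), dtype=np.float32)  # one image
inp = grpcclient.InferInput("input", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)
out = grpcclient.InferRequestedOutput("output")

result = client.infer(model_name="resnet50", inputs=[inp], outputs=[out])
scores = result.as_numpy("output")
print(scores.shape)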
DOI: 10.2172/1825304
2021
AI Acceleration of HEP Collider Simulation
Energy frontier HEP research is currently focused on the LHC and its high-luminosity upgrade (HL-LHC), while intensity frontier HEP research is focused on studies of neutrinos at MW-scale beam power accelerator facilities, such as the Fermilab Main Injector with the planned PIP-II SRF linac project. A number of next generation accelerator facilities have been proposed and are currently under consideration for the medium- and long-term future programs of accelerator-based HEP research. In this paper, we briefly review the post-LHC energy frontier options, both for lepton and hadron colliders in various regions of the world, as well as possible future intensity frontier accelerator facilities.
DOI: 10.2172/1838484
2021
Robustness of deep learning algorithms in astronomy - galaxy morphology studies
We investigate the impact of a one-pixel attack, used as a proxy for compression or telescope errors, on the performance of ResNet18 trained to distinguish between galaxies of different morphologies in LSST mock data. We also explore how domain adaptation techniques can help improve model robustness against this type of naturally occurring attack and help scientists build more trustworthy and stable models.
DOI: 10.48550/arxiv.2112.14299
2021
DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification
With increased adoption of supervised deep learning methods for processing and analysis of cosmological survey data, the assessment of data perturbation effects (that can naturally occur in the data processing and analysis pipelines) and the development of methods that increase model robustness are increasingly important. In the context of morphological classification of galaxies, we study the effects of perturbations in imaging data. In particular, we examine the consequences of using neural networks when training on baseline data and testing on perturbed data. We consider perturbations associated with two primary sources: 1) increased observational noise as represented by higher levels of Poisson noise and 2) data processing noise incurred by steps such as image compression or telescope errors as represented by one-pixel adversarial attacks. We also test the efficacy of domain adaptation techniques in mitigating the perturbation-driven errors. We use classification accuracy, latent space visualizations, and latent space distance to assess model robustness. Without domain adaptation, we find that processing pixel-level errors easily flip the classification into an incorrect class and that higher observational noise makes the model trained on low-noise data unable to classify galaxy morphologies. On the other hand, we show that training with domain adaptation improves model robustness and mitigates the effects of these perturbations, improving the classification accuracy by 23% on data with higher observational noise. Domain adaptation also increases the latent space distance between the baseline and the incorrectly classified one-pixel perturbed image by a factor of ~2.3, making the model more robust to inadvertent perturbations.
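A sketch of the one-pixel perturbation used as a proxy for processing errors: overwrite a single pixel and check whether the predicted class changes. The image size and the model object are hypothetical placeholders.

# One-pixel perturbation sketch; `model` is a hypothetical classifier.
import numpy as np

def one_pixel_perturb(image: np.ndarray, x: int, y: int,
                      value: float) -> np.ndarray:
    """Return a copy of the image with one pixel overwritten."""
    perturbed = image.copy()
    perturbed[y, x] = value
    return perturbed

rng = np.random.default_rng(0)
image = rng.random((64, 64))
attacked = one_pixel_perturb(image, x=10, y=20, value=1.0)
# flipped = model.predict(attacked).argmax() != model.predict(image).argmax()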
DOI: 10.2172/1854798
2021
Denoising Convolutional Networks to Accelerate Detector Simulation [Poster]
The high accuracy of detector simulation is crucial for modern particle physics experiments. However, this accuracy comes with a high computational cost, which will be exacerbated by the large datasets and complex detector upgrades associated with next-generation facilities such as the High Luminosity LHC. We explore the viability of regression-based machine learning (ML) approaches using convolutional neural networks (CNNs) to "denoise" faster, lower-quality detector simulations, augmenting them to produce a higher-quality final result with a reduced computational burden. The denoising CNN works in concert with classical detector simulation software rather than replacing it entirely, increasing its reliability compared to other ML approaches to simulation. We obtain promising results from a prototype based on photon showers in the CMS electromagnetic calorimeter. Future directions are also discussed.