
Nhan Viet Tran

Here are all the papers by Nhan Viet Tran that you can download and read on OA.mg.

DOI: 10.1088/1748-0221/13/07/p07027
2018
Cited 269 times
Fast inference of deep neural networks in FPGAs for particle physics
Recent results at the Large Hadron Collider (LHC) have pointed to enhanced physics capabilities through improvements in real-time event processing techniques. Machine learning methods are ubiquitous and have proven to be very powerful in LHC physics, and particle physics as a whole. However, exploration of the use of such techniques in low-latency, low-power FPGA hardware has only just begun. FPGA-based trigger and data acquisition (DAQ) systems have extremely low, sub-microsecond latency requirements that are unique to particle physics. We present a case study for neural network inference in FPGAs focusing on a classifier for jet substructure which would enable, among many other physics scenarios, searches for new dark sector particles and novel measurements of the Higgs boson. While we focus on a specific example, the lessons are far-reaching. We develop a package based on High-Level Synthesis (HLS) called hls4ml to build machine learning models in FPGAs. The use of HLS increases accessibility across a broad user community and allows for a drastic decrease in firmware development time. We map out FPGA resource usage and latency versus neural network hyperparameters to identify the problems in particle physics that would benefit from performing neural network inference with FPGAs. For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.
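As an illustration of the workflow this abstract describes, here is a minimal sketch using hls4ml's public Python API; the model file, FPGA part, precision, and reuse factor are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal hls4ml flow: convert a trained Keras classifier to HLS firmware.
# The model file name and the settings below are illustrative only.
import hls4ml
from tensorflow import keras

model = keras.models.load_model("jet_tagger.h5")  # hypothetical trained model

# Derive a baseline HLS configuration; fixed-point precision and the reuse
# factor trade FPGA resources against latency.
config = hls4ml.utils.config_from_keras_model(model, granularity="model")
config["Model"]["Precision"] = "ap_fixed<16,6>"
config["Model"]["ReuseFactor"] = 1  # fully parallel, lowest latency

hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="hls4ml_prj",
    part="xcku115-flvb2104-2-i",  # a Kintex UltraScale part, as in tutorials
)
hls_model.compile()          # build a C++ emulation library for validation
hls_model.build(csim=False)  # run HLS synthesis to estimate latency/resources
```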
DOI: 10.1103/physrevd.81.075022
2010
Cited 244 times
Spin determination of single-produced resonances at hadron colliders
We study the production of a single resonance at the LHC and its decay into a pair of Z bosons. We demonstrate how full reconstruction of the final states allows us to determine the spin and parity of the resonance and restricts its coupling to vector gauge bosons. Full angular analysis is illustrated with the simulation of the production and decay chain including all spin correlations and the most general couplings of spin-zero, -one, and -two resonances to Standard Model matter and gauge fields. We note implications for analysis of a resonance decaying to other final states.
DOI: 10.1007/jhep10(2014)059
2014
Cited 237 times
Pileup per particle identification
We propose a new method for pileup mitigation by implementing “pileup per particle identification” (PUPPI). For each particle we first define a local shape α which probes the collinear versus soft diffuse structure in the neighborhood of the particle. The former is indicative of particles originating from the hard scatter and the latter of particles originating from pileup interactions. The distribution of α for charged pileup, assumed as a proxy for all pileup, is used on an event-by-event basis to calculate a weight for each particle. The weights describe the degree to which particles are pileup-like and are used to rescale their four-momenta, superseding the need for jet-based corrections. Furthermore, the algorithm flexibly allows combination with other, possibly experimental, probabilistic information associated with particles such as vertexing and timing performance. We demonstrate the algorithm improves over existing methods by looking at jet pT and jet mass. We also find an improvement on non-jet quantities like missing transverse energy.
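To make the weighting scheme concrete, here is a toy numpy sketch of the idea, not the authors' code: α is computed per particle as the log of a pT-weighted sum over its neighborhood, and the weight comes from a signed-χ² comparison against the event's charged-pileup α distribution. The cone size and the use of scipy's χ² CDF are illustrative choices.

```python
import numpy as np
from scipy.stats import chi2

def alpha(i, pt, eta, phi, r0=0.3):
    """alpha_i = log sum_{j in cone, j != i} (pT_j / dR_ij)^2; r0 illustrative."""
    dphi = np.angle(np.exp(1j * (phi - phi[i])))  # wrap to (-pi, pi]
    dr = np.hypot(eta - eta[i], dphi)
    sel = (dr > 0) & (dr < r0)
    s = np.sum((pt[sel] / dr[sel]) ** 2)
    return np.log(s) if s > 0 else -np.inf

def puppi_weights(pt, eta, phi, is_charged_pu):
    """Per-particle weights from the event's charged-pileup alpha distribution."""
    a = np.array([alpha(i, pt, eta, phi) for i in range(len(pt))])
    ref = a[is_charged_pu & np.isfinite(a)]        # event-by-event pileup proxy
    med, rms = np.median(ref), np.std(ref)
    signed = np.sign(a - med) * (a - med) ** 2 / rms ** 2
    w = chi2.cdf(np.clip(signed, 0, None), df=1)   # pileup-like -> w ~ 0
    return np.where(np.isfinite(a), w, 0.0)

# Each particle's four-momentum is then rescaled by its weight (pT' = w * pT),
# which is what supersedes jet-level pileup corrections.
```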
DOI: 10.1103/physrevd.86.095031
2012
Cited 206 times
Spin and parity of a single-produced resonance at the LHC
The experimental determination of the properties of the newly discovered boson at the Large Hadron Collider is currently the most crucial task in high-energy physics. We show how information about the spin, parity, and, more generally, the tensor structure of the boson couplings can be obtained by studying angular and mass distributions of events in which the resonance decays to pairs of gauge bosons, $ZZ$, $WW$, and $\gamma\gamma$. A complete Monte Carlo simulation of the process $pp \to X \to VV \to 4f$ is performed and verified by comparing it to an analytic calculation of the decay amplitudes $X \to VV \to 4f$. Our studies account for all spin correlations and include general couplings of a spin $J=0$, 1, 2 resonance to Standard Model particles. We also discuss how to use angular and mass distributions of the resonance decay products for optimal background rejection. It is shown that by the end of the 8 TeV run of the LHC, it might be possible to separate extreme hypotheses of the spin and parity of the new boson with a confidence level of 99% or better for a wide range of models. We briefly discuss the feasibility of testing scenarios where the resonance is not a parity eigenstate.
DOI: 10.1103/revmodphys.91.045003
2019
Cited 171 times
Jet substructure at the Large Hadron Collider
Jet substructure has emerged to play a central role at the Large Hadron Collider, where it has provided numerous innovative ways to search for new physics and to probe the Standard Model, particularly in extreme regions of phase space. In this article we focus on a review of the development and use of state-of-the-art jet substructure techniques by the ATLAS and CMS experiments.
DOI: 10.1140/epjc/s10052-014-2792-8
2014
Cited 143 times
Boosted objects and jet substructure at the LHC. Report of BOOST2012, held at IFIC Valencia, 23rd–27th of July 2012
This report of the BOOST2012 workshop presents the results of four working groups that studied key aspects of jet substructure. We discuss the potential of first-principle QCD calculations to yield a precise description of the substructure of jets and study the accuracy of state-of-the-art Monte Carlo tools. Limitations of the experiments' ability to resolve substructure are evaluated, with a focus on the impact of additional (pile-up) proton-proton collisions on jet substructure performance in future LHC operating scenarios. A final section summarizes the lessons learnt from jet substructure analyses in searches for new physics in the production of boosted top quarks.
DOI: 10.1103/physrevd.89.035007
2014
Cited 116 times
Constraining anomalous HVV interactions at proton and lepton colliders
In this paper, we study the extent to which CP parity of a Higgs boson, and more generally its anomalous couplings to gauge bosons, can be measured at the LHC and a future electron-positron collider. We consider several processes, including Higgs boson production in gluon and weak boson fusion and production of a Higgs boson in association with an electroweak gauge boson. We consider decays of a Higgs boson including $ZZ, WW, \gamma \gamma$, and $Z \gamma$. A matrix element approach to the three production and decay topologies is developed and applied in the analysis. A complete Monte Carlo simulation of the above processes at proton and $e^+e^-$ colliders is performed and verified by comparing it to an analytic calculation. Prospects for measuring various tensor couplings at existing and proposed facilities are compared.
DOI: 10.1140/epjc/s10052-015-3587-2
2015
Cited 112 times
Towards an understanding of the correlations in jet substructure
Over the past decade, a large number of jet substructure observables have been proposed in the literature, and explored at the LHC experiments. Such observables attempt to utilize the internal structure of jets in order to distinguish those initiated by quarks, gluons, or by boosted heavy objects, such as top quarks and W bosons. This report, originating from and motivated by the BOOST2013 workshop, presents original particle-level studies that aim to improve our understanding of the relationships between jet substructure observables, their complementarity, and their dependence on the underlying jet properties, particularly the jet radius and jet transverse momentum. This is explored in the context of quark/gluon discrimination, boosted W boson tagging and boosted top quark tagging.
DOI: 10.1007/jhep05(2016)156
2016
Cited 102 times
Thinking outside the ROCs: Designing Decorrelated Taggers (DDT) for jet substructure
We explore the scale-dependence and correlations of jet substructure observables to improve upon existing techniques in the identification of highly Lorentz-boosted objects. Modified observables are designed to remove correlations from existing theoretically well-understood observables, providing practical advantages for experimental measurements and searches for new phenomena. We study such observables in $W$ jet tagging and provide recommendations for observables based on considerations beyond signal and background efficiencies.
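A minimal sketch of the decorrelation idea, assuming arrays from a background (QCD) sample and an observable such as the N-subjettiness ratio τ21: fit and subtract the linear trend against ρ' = log(m²/(pT·μ)), so the tagger response no longer sculpts the jet-mass spectrum. Variable names and the fit procedure are illustrative.

```python
import numpy as np

def ddt_transform(tau21, m, pt, mu=1.0):
    """Toy designed-decorrelated-tagger (DDT) transform: remove the linear
    dependence of a substructure observable on rho' = log(m^2 / (pT * mu)).
    Inputs are numpy arrays from a background sample."""
    rho_p = np.log(m ** 2 / (pt * mu))
    slope, intercept = np.polyfit(rho_p, tau21, 1)  # fit trend on background
    return tau21 - slope * rho_p  # decorrelated observable (up to a constant)
```

A fixed cut on the transformed observable then yields an approximately mass-independent background efficiency, which is the practical advantage the abstract refers to.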
DOI: 10.1007/jhep09(2018)153
2018
Cited 86 times
M3: a new muon missing momentum experiment to probe (g − 2)μ and dark matter at Fermilab
A bstract New light, weakly-coupled particles are commonly invoked to address the persistent ∼ 4 σ anomaly in ( g −2) μ and serve as mediators between dark and visible matter. If such particles couple predominantly to heavier generations and decay invisibly, much of their best-motivated parameter space is inaccessible with existing experimental techniques. In this paper, we present a new fixed-target, missing-momentum search strategy to probe invisibly decaying particles that couple preferentially to muons. In our setup, a relativistic muon beam impinges on a thick active target. The signal consists of events in which a muon loses a large fraction of its incident momentum inside the target without initiating any detectable electromagnetic or hadronic activity in downstream veto systems. We propose a two-phase experiment, M 3 (Muon Missing Momentum), based at Fermilab. Phase 1 with ∼ 10 10 muons on target can test the remaining parameter space for which light invisibly-decaying particles can resolve the ( g − 2) μ anomaly, while Phase 2 with ∼ 10 13 muons on target can test much of the predictive parameter space over which sub-GeV dark matter achieves freeze-out via muon-philic forces, including gauged U (1) Lμ − Lτ .
DOI: 10.1088/2632-2153/aba042
2020
Cited 60 times
Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml
We present the implementation of binary and ternary neural networks in the hls4ml library, designed to automatically convert deep neural network models to digital circuits implemented as field-programmable gate array (FPGA) firmware. Starting from benchmark models trained with floating point precision, we investigate different strategies to reduce the network's resource consumption by reducing the numerical precision of the network parameters to binary or ternary. We discuss the trade-off between model accuracy and resource consumption. In addition, we show how to balance between latency and accuracy by retaining full precision on a selected subset of network components. As an example, we consider two multiclass classification tasks: handwritten digit recognition with the MNIST data set and jet identification with simulated proton-proton collisions at the CERN Large Hadron Collider. The binary and ternary implementation has similar performance to the higher precision implementation while using drastically fewer FPGA resources.
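As a rough illustration of the precision reduction involved (not the paper's training procedure, which trains the networks at low precision), here is a numpy sketch of binarization and one common ternarization rule; the threshold choice is an assumption.

```python
import numpy as np

def binarize(w):
    """Binary-network weights: keep only the sign, {-1, +1}."""
    return np.where(w >= 0, 1.0, -1.0)

def ternarize(w, t=0.7):
    """Ternary weights {-1, 0, +1}: small-magnitude weights become exact
    zeros, which is free sparsity on an FPGA. Threshold rule illustrative."""
    thr = t * np.mean(np.abs(w))
    return np.where(w > thr, 1.0, np.where(w < -thr, -1.0, 0.0))
```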
DOI: 10.1088/2632-2153/ac0ea1
2021
Cited 53 times
Fast convolutional neural networks on FPGAs with hls4ml
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.
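A minimal sketch of quantization-aware training in the style commonly paired with hls4ml, written with the QKeras library; the architecture, input shape, and 6-bit widths are illustrative assumptions, not the paper's configuration.

```python
from tensorflow import keras
from qkeras import QConv2D, QDense, QActivation, quantized_bits, quantized_relu

# Each layer trains against its quantized weights, so accuracy survives the
# move to low precision. Bit widths (6 total bits, 0 integer bits) are
# illustrative, as is the tiny architecture.
model = keras.Sequential([
    QConv2D(16, (3, 3), input_shape=(32, 32, 3),
            kernel_quantizer=quantized_bits(6, 0, alpha=1),
            bias_quantizer=quantized_bits(6, 0, alpha=1)),
    QActivation(quantized_relu(6)),
    keras.layers.Flatten(),
    QDense(10, kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0, alpha=1)),
    keras.layers.Activation("softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```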
DOI: 10.3389/fdata.2020.598927
2021
Cited 41 times
Distance-Weighted Graph Neural Networks on FPGAs for Real-Time Particle Reconstruction in High Energy Physics
Graph neural networks have been shown to achieve excellent performance for several crucial tasks in particle physics, such as charged particle tracking, jet tagging, and clustering. An important domain for the application of these networks is the FPGA-based first layer of real-time data filtering at the CERN Large Hadron Collider, which has strict latency and resource constraints. We discuss how to design distance-weighted graph networks that can be executed with a latency of less than 1 μs on an FPGA. To do so, we consider a representative task associated with particle reconstruction and identification in a next-generation calorimeter operating at a particle collider. We use a graph network architecture developed for such purposes, and apply additional simplifications to match the computing constraints of Level-1 trigger systems, including weight quantization. Using the hls4ml library, we convert the compressed models into firmware to be implemented on an FPGA. Performance of the synthesized models is presented both in terms of inference accuracy and resource usage.
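A toy numpy sketch of the distance-weighted aggregation at the heart of such networks; in real models the coordinate space and feature transforms are learned, so the shapes and the Gaussian kernel below are illustrative.

```python
import numpy as np

def distance_weighted_aggregate(x, pos):
    """Toy distance-weighted message passing: each node aggregates features
    from all nodes, weighted by exp(-d^2) in a coordinate space (here given,
    in a trained network learned). x: (N, F) features, pos: (N, S) coords."""
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)   # (N, N) distances
    w = np.exp(-d2)                                           # edge weights
    msg = w @ x / w.sum(axis=1, keepdims=True)                # weighted mean
    return np.concatenate([x, msg], axis=1)                   # augmented features
```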
DOI: 10.3389/fdata.2022.787421
2022
Cited 24 times
Applications and Techniques for Fast Machine Learning in Science
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science: the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.
DOI: 10.1088/1361-6668/acbb34
2023
Cited 12 times
Roadmap on artificial intelligence and big data techniques for superconductivity
This paper presents a roadmap to the application of AI techniques and big data (BD) for different modelling, design, monitoring, manufacturing and operation purposes of different superconducting applications. To help superconductivity researchers, engineers, and manufacturers understand the viability of using AI and BD techniques as future solutions for challenges in superconductivity, a series of short articles are presented to outline some of the potential applications and solutions. These potential futuristic routes and their materials/technologies are considered for a 10–20 yr time-frame.
DOI: 10.1016/s2215-0366(23)00370-x
2024
Cited 3 times
The WHO Mental Health Gap Action Programme for mental, neurological, and substance use conditions: the new and updated guideline recommendations
The WHO Mental Health Gap Action Programme (mhGAP) guideline update reflects 15 years of investment in reducing the treatment gap and scaling up care for people with mental, neurological, and substance use (MNS) conditions. It was produced by a guideline development group and steering group, with support from topic experts, using quantitative and qualitative evidence and a systematic review of use of mhGAP. 90 recommendations from the 2015 guideline update were validated and endorsed for use in their current format. These are joined by 30 revised recommendations and 18 new recommendations, including a new module on anxiety. Psychological interventions are emphasised as treatments and digitally delivered interventions feature across many modules, as well as updated recommendations for psychotropic medicines. Research gaps identified include the need for evidence from low-resource settings and on the views of people with lived experience of MNS conditions. The revised recommendations ensure that mhGAP continues to offer high-quality, timely, transparent, and evidence-based guidance to support non-specialist health workers in low-income and middle-income countries in providing care to individuals with MNS conditions.
2013
Cited 74 times
Handbook of LHC Higgs Cross Sections: 3. Higgs Properties
This Report summarizes the results of the activities in 2012 and the first half of 2013 of the LHC Higgs Cross Section Working Group. The main goal of the working group was to present the state of the art of Higgs Physics at the LHC, integrating all new results that have appeared in the last few years. This report follows the first working group report Handbook of LHC Higgs Cross Sections: 1. Inclusive Observables (CERN-2011-002) and the second working group report Handbook of LHC Higgs Cross Sections: 2. Differential Distributions (CERN-2012-002). After the discovery of a Higgs boson at the LHC in mid-2012 this report focuses on refined predictions of Standard Model (SM) Higgs phenomenology around the experimentally observed value of 125–126 GeV, refined predictions for heavy SM-like Higgs bosons as well as predictions in the Minimal Supersymmetric Standard Model, and first steps to go beyond these models. The other main focus is on the extraction of the characteristics and properties of the newly discovered particle, such as its couplings to SM particles, spin, and CP quantum numbers.
DOI: 10.1007/s41781-019-0027-2
2019
Cited 43 times
FPGA-Accelerated Machine Learning Inference as a Service for Particle Physics Computing
Large-scale particle physics experiments face challenging demands for high-throughput computing resources both now and in the future. New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning algorithms in particle physics for simulation, reconstruction, and analysis are naturally deployed on such platforms. We demonstrate that the acceleration of machine learning inference as a web service represents a heterogeneous computing solution for particle physics experiments that potentially requires minimal modification to the current computing model. As examples, we retrain the ResNet-50 convolutional neural network to demonstrate state-of-the-art performance for top quark jet tagging at the LHC and apply a ResNet-50 model with transfer learning for neutrino event classification. Using Project Brainwave by Microsoft to accelerate the ResNet-50 image classification model, we achieve average inference times of 60 (10) ms with our experimental physics software framework using Brainwave as a cloud (edge or on-premises) service, representing an improvement by a factor of approximately 30 (175) in model inference latency over traditional CPU inference in current experimental hardware. A single FPGA service accessed by many CPUs achieves a throughput of 600–700 inferences per second using an image batch of one, comparable to large batch-size GPU throughput and significantly better than small batch-size GPU throughput. Deployed as an edge or cloud service for the particle physics computing model, coprocessor accelerators can have a higher duty cycle and are potentially much more cost-effective.
DOI: 10.1088/1748-0221/15/05/p05026
2020
Cited 41 times
Fast inference of Boosted Decision Trees in FPGAs for particle physics
We describe the implementation of Boosted Decision Trees in the hls4ml library, which allows the translation of a trained model into FPGA firmware through an automated conversion process. Thanks to its fully on-chip implementation, hls4ml performs inference of Boosted Decision Tree models with extremely low latency. With a typical latency of less than 100 ns, this solution is suitable for FPGA-based real-time processing, such as in the Level-1 Trigger system of a collider experiment. These developments open up prospects for physicists to deploy BDTs in FPGAs for identifying the origin of jets, better reconstructing the energies of muons, and enabling better selection of rare signal processes.
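To illustrate why BDT inference maps so well onto FPGAs, here is a toy Python sketch of one tree flattened into arrays, the kind of representation that unrolls into parallel comparator logic in firmware; the tree values are made up.

```python
import numpy as np

# One decision tree flattened into arrays. Values are illustrative.
feature   = np.array([0,   1,   -2,  -2,  -2])   # -2 marks a leaf node
threshold = np.array([0.5, 1.2, 0.0, 0.0, 0.0])
left      = np.array([1,   3,   -1,  -1,  -1])
right     = np.array([2,   4,   -1,  -1,  -1])
value     = np.array([0.0, 0.0, -0.7, 0.4, 1.1])  # leaf scores

def tree_score(x):
    node = 0
    while feature[node] != -2:  # in firmware every comparison runs in parallel
        node = left[node] if x[feature[node]] <= threshold[node] else right[node]
    return value[node]

# A BDT score is the sum over many such trees; on-chip, all trees and all
# node comparisons evaluate concurrently, giving latency set by tree depth.
print(tree_score(np.array([0.3, 2.0])))  # follows root -> node 1 -> leaf 4
```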
DOI: 10.48550/arxiv.2003.11603
2020
Cited 39 times
Graph Neural Networks for Particle Reconstruction in High Energy Physics detectors
Pattern recognition problems in high energy physics are notably different from traditional machine learning applications in computer vision. Reconstruction algorithms identify and measure the kinematic properties of particles produced in high energy collisions and recorded with complex detector systems. Two critical applications are the reconstruction of charged particle trajectories in tracking detectors and the reconstruction of particle showers in calorimeters. These two problems have unique challenges and characteristics, but both have high dimensionality, high degree of sparsity, and complex geometric layouts. Graph Neural Networks (GNNs) are a relatively new class of deep learning architectures which can deal with such data effectively, allowing scientists to incorporate domain knowledge in a graph structure and learn powerful representations leveraging that structure to identify patterns of interest. In this work we demonstrate the applicability of GNNs to these two diverse particle reconstruction problems.
DOI: 10.1109/tns.2021.3087100
2021
Cited 29 times
A Reconfigurable Neural Network ASIC for Detector Front-End Data Compression at the HL-LHC
Despite advances in the programmable logic capabilities of modern trigger systems, a significant bottleneck remains in the amount of data to be transported from the detector to off-detector logic where trigger decisions are made. We demonstrate that a neural network autoencoder model can be implemented in a radiation tolerant ASIC to perform lossy data compression alleviating the data transmission problem while preserving critical information of the detector energy profile. For our application, we consider the high-granularity calorimeter from the CMS experiment at the CERN Large Hadron Collider. The advantage of the machine learning approach is in the flexibility and configurability of the algorithm. By changing the neural network weights, a unique data compression algorithm can be deployed for each sensor in different detector regions, and changing detector or collider conditions. To meet area, performance, and power constraints, we perform a quantization-aware training to create an optimized neural network hardware implementation. The design is achieved through the use of high-level synthesis tools and the hls4ml framework, and was processed through synthesis and physical layout flows based on a LP CMOS 65 nm technology node. The flow anticipates 200 Mrad of ionizing radiation to select gates, and reports a total area of 3.6 mm^2 and consumes 95 mW of power. The simulated energy consumption per inference is 2.4 nJ. This is the first radiation tolerant on-detector ASIC implementation of a neural network that has been designed for particle physics applications.
DOI: 10.1007/jhep01(2013)182
2013
Cited 38 times
Scrutinizing the Higgs signal and background in the 2e2μ golden channel
Kinematic distributions in the decays of the newly discovered resonance to four leptons are a powerful probe of the tensor structure of its couplings to electroweak gauge bosons. We present analytic calculations for both signal and background of the fully differential cross section for the ‘Golden Channel’ e+e−μ+μ− final state. We include all interference effects between intermediate gauge bosons and allow them to be on- or off-shell. For the signal we compute the fully differential decay width for general scalar couplings to ZZ, γγ, and Zγ. For the background we compute the leading order fully differential cross section for qq̄ annihilation into Z and γ gauge bosons, including the contribution from the resonant Z → 2e2μ process. We also present singly and doubly differential projections and study the interference effects on the differential spectra. These expressions can be used in a variety of ways to uncover the nature of the newly discovered resonance or any new scalars decaying to neutral gauge bosons which might be discovered in the future.
DOI: 10.1103/physrevaccelbeams.24.104601
2021
Cited 19 times
Real-time artificial intelligence for accelerator control: A study at the Fermilab Booster
We describe a method for precisely regulating the gradient magnet power supply (GMPS) at the Fermilab Booster accelerator complex using a neural network trained via reinforcement learning. We demonstrate preliminary results by training a surrogate machine-learning model on real accelerator data to emulate the GMPS, and using this surrogate model in turn to train the neural network for its regulation task. We additionally show how the neural networks to be deployed for control purposes may be compiled to execute on field-programmable gate arrays (FPGAs), and show the first machine-learning based control algorithm implemented on an FPGA for controls at the Fermilab accelerator complex. As there are no surprise latencies on an FPGA, this capability is important for operational stability in complicated environments such as an accelerator facility.
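The two-stage recipe, surrogate first and controller second, can be sketched in miniature. Note that this toy uses a greedy one-step controller in place of the paper's reinforcement-learning agent, and the dynamics, logged data, and action grid are all invented for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = rng = np.random.default_rng(0)
# Invented "logged" data: regulation error, applied correction, next error.
state = rng.normal(size=5000)
action = rng.uniform(-1, 1, size=5000)
next_state = 0.9 * state + 0.5 * action + 0.05 * rng.normal(size=5000)

# Stage 1: fit a surrogate that emulates the regulated system from data.
surrogate = MLPRegressor((16, 16), max_iter=800)
surrogate.fit(np.c_[state, action], next_state)

# Stage 2: tune a controller against the surrogate, not the live machine.
actions = np.linspace(-1, 1, 21)
def policy(s):
    # Pick the action the surrogate predicts will minimize |next error|.
    preds = surrogate.predict(np.c_[np.full_like(actions, s), actions])
    return actions[np.argmin(np.abs(preds))]

print(policy(0.8))  # suggests a negative correction for a positive error
```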
DOI: 10.1016/s2214-109x(22)00202-9
2022
Cited 11 times
Urban design is key to healthy environments for all
Rapidly increasing urbanisation along with ageing populations, climate change, environmental degradation, COVID-19, and other pandemics present substantial challenges for people living in cities and other communities. The capacity to identify and respond to urban challenges related to health, equity, and sustainability varies greatly across national and subnational governments around the globe, because of the available human and financial resources, structures of governance and participation, and existing policy frameworks, which are all important determinants of healthy and sustainable urban environments.
DOI: 10.1088/2632-2153/abec21
2021
Cited 16 times
GPU coprocessors as a service for deep learning inference in high energy physics
In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two issues will confront one another as the collider is upgraded for high luminosity running. Alternative processors such as graphics processing units (GPUs) can resolve this confrontation provided that algorithms can be sufficiently accelerated. In many cases, algorithmic speedups are found to be largest through the adoption of deep learning algorithms. We present a comprehensive exploration of the use of GPU-based hardware acceleration for deep learning inference within the data reconstruction workflow of high energy physics. We present several realistic examples and discuss a strategy for the seamless integration of coprocessors so that the LHC can maintain, if not exceed, its current performance throughout its running.
DOI: 10.3389/frai.2021.676564
2021
Cited 16 times
Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference
Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations. In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra low latency applications targeting high energy physics use cases. Techniques developed for this study have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics. We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task. Further, quantization-aware pruning typically performs as well as or better than other neural architecture search techniques like Bayesian optimization in terms of computational efficiency. Surprisingly, while networks with different training configurations can have similar performance for the benchmark application, the information content in the network can vary significantly, affecting its generalizability.
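A numpy sketch of one quantization-aware-pruning forward pass, assuming magnitude pruning and a straight-through-style fake quantizer; the bit width and sparsity target are illustrative, and the actual study runs full training loops.

```python
import numpy as np

def fake_quant(w, bits=6):
    """Fake quantization to a signed fixed-point grid; gradients would pass
    straight through to the float master weights during training."""
    scale = 2 ** (bits - 1) - 1
    return np.round(np.clip(w, -1, 1) * scale) / scale

def prune_mask(w, sparsity=0.5):
    """Magnitude pruning: zero the smallest |w| up to the target sparsity."""
    thr = np.quantile(np.abs(w), sparsity)
    return (np.abs(w) > thr).astype(w.dtype)

# One quantization-aware-pruning step in miniature: the forward pass sees
# pruned, quantized weights; the optimizer updates the float master copy.
w = np.random.randn(64, 32) * 0.1
w_eff = fake_quant(w * prune_mask(w))
# ... compute loss with w_eff, backprop onto w (straight-through estimator) ...
```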
DOI: 10.3389/fdata.2020.604083
2021
Cited 15 times
GPU-Accelerated Machine Learning Inference as a Service for Computing in Neutrino Experiments
Machine learning algorithms are becoming increasingly prevalent and performant in the reconstruction of events in accelerator-based neutrino experiments. These sophisticated algorithms can be computationally expensive. At the same time, the data volumes of such experiments are rapidly increasing. The demand to process billions of neutrino events with many machine learning algorithm inferences creates a computing challenge. We explore a computing model in which heterogeneous computing with GPU coprocessors is made available as a web service. The coprocessors can be efficiently and elastically deployed to provide the right amount of computing for a given processing task. With our approach, Services for Optimized Network Inference on Coprocessors (SONIC), we integrate GPU acceleration specifically for the ProtoDUNE-SP reconstruction chain without disrupting the native computing workflow. With our integrated framework, we accelerate the most time-consuming task, track and particle shower hit identification, by a factor of 17. This results in a factor of 2.7 reduction in the total processing time when compared with CPU-only production. For this particular task, only 1 GPU is required for every 68 CPU threads, providing a cost-effective solution.
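As-a-service inference of this kind is driven by a thin client inside the experiment's workflow; here is a sketch of the pattern using NVIDIA Triton's Python gRPC client as an example. The server URL, model name, tensor names, and shapes are all illustrative assumptions, not details from the paper.

```python
import numpy as np
import tritonclient.grpc as grpcclient

# SONIC-style client call: the CPU workflow ships a batch to a remote GPU
# inference server and blocks only for the network round trip.
client = grpcclient.InferenceServerClient(url="localhost:8001")

batch = np.random.rand(1, 480, 600).astype(np.float32)  # e.g. a wire-plane image
inp = grpcclient.InferInput("input", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)
out = grpcclient.InferRequestedOutput("softmax")

result = client.infer(model_name="hit_identifier",  # hypothetical model name
                      inputs=[inp], outputs=[out])
scores = result.as_numpy("softmax")
```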
DOI: 10.1103/physrevd.101.053004
2020
Cited 16 times
Lepton-nucleus cross section measurements for DUNE with the LDMX detector
We point out that the LDMX (Light Dark Matter eXperiment) detector design, conceived to search for sub-GeV dark matter, will also have very advantageous characteristics to pursue electron-nucleus scattering measurements of direct relevance to the neutrino program at DUNE and elsewhere. These characteristics include a 4-GeV electron beam, a precision tracker, electromagnetic and hadronic calorimeters with near 2π azimuthal acceptance from the forward beam axis out to a ~40° angle, and a low reconstruction energy threshold. LDMX thus could provide (semi)exclusive cross section measurements, with detailed information about final-state electrons, pions, protons, and neutrons. We compare the predictions of two widely used neutrino generators (genie, gibuu) in the LDMX region of acceptance to illustrate the large modeling discrepancies in electron-nucleus interactions at DUNE-like kinematics. We argue that discriminating between these predictions is well within the capabilities of the LDMX detector.
DOI: 10.48550/arxiv.2012.01563
2020
Cited 15 times
Accelerated Charged Particle Tracking with Graph Neural Networks on FPGAs
We develop and study FPGA implementations of algorithms for charged particle tracking based on graph neural networks. The two complementary FPGA designs are based on OpenCL, a framework for writing programs that execute across heterogeneous platforms, and hls4ml, a high-level-synthesis-based compiler for neural network to firmware conversion. We evaluate and compare the resource usage, latency, and tracking performance of our implementations based on a benchmark dataset. We find a considerable speedup over CPU-based execution is possible, potentially enabling such algorithms to be used effectively in future computing workflows and the FPGA-based Level-1 trigger at the CERN Large Hadron Collider.
DOI: 10.1103/physrevd.107.116026
2023
New searches for muonphilic particles at proton beam dump spectrometers
We introduce a new search strategy for visibly decaying muonphilic particles using a proton beam spectrometer modeled after the SpinQuest experiment at Fermilab. In this setup, a ~100 GeV primary proton beam impinges on a thick fixed target and yields a secondary muon beam. As these muons traverse the target material, they scatter off nuclei and can radiatively produce hypothetical muonphilic particles as initial- and final-state radiation. If such new states decay to dimuons, their combined invariant mass can be measured with a downstream spectrometer immersed in a Tesla-scale magnetic field. For a representative setup with 3×10^14 muons on target with typical energies of ~20 GeV, a 15% invariant mass resolution, and an effective 100 cm target length, this strategy can probe the entire parameter space for which ~200 MeV–GeV scalar particles resolve the muon g−2 anomaly. We present sensitivity to these scalar particles at the SpinQuest experiment, where no additional hardware is needed and the search could be parasitically executed within the primary nuclear physics program. Future proton beam dump experiments with optimized beam and detector configurations could have even greater sensitivity.
DOI: 10.1007/s41781-024-00113-4
2024
Artificial Intelligence for the Electron Ion Collider (AI4EIC)
The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took place, centered on exploring all current and prospective application areas of AI for the EIC. This workshop is not only beneficial for the EIC, but also provides valuable insights for the newly established ePIC collaboration at EIC. This paper summarizes the different activities and R&D projects covered across the sessions of the workshop and provides an overview of the goals, approaches and strategies regarding AI/ML in the EIC community, as well as cutting-edge techniques currently studied in other experiments.
DOI: 10.1016/j.ceramint.2023.11.099
2024
Oxygen vacancy-activated thermoelectric properties of ZnO ceramics
Understanding the structural and thermoelectric (TE) properties of a pure material is essential for developing a practical approach to improve its TE performance. This study focuses on the high-temperature TE properties of ZnO ceramics synthesized by the spark plasma sintering (SPS) technique. The analyses of crystallinity and microstructure along with photoluminescence and Raman spectroscopy indicated an increase in oxygen vacancy in the ZnO ceramics with SPS temperature, which resulted in unit-cell shrinkage, lattice modification, grain densification, and TE modification. Specifically, oxygen vacancies were found to be a crucial factor affecting the TE performance of the spark-plasma-sintered ZnO ceramics at temperatures exceeding 773 K. Oxygen vacancies can act as carrier donors when they are ionized, contributing to an increase in electrical conductivity and a decrease in thermal conductivity. Additionally, this study proposes a potential way to engineer native defects in ZnO ceramics by controlling the SPS process.
DOI: 10.48550/arxiv.2401.08777
2024
Robust Anomaly Detection for Particle Physics Using Multi-Background Representation Learning
Anomaly, or out-of-distribution, detection is a promising tool for aiding discoveries of new particles or processes in particle physics. In this work, we identify and address two overlooked opportunities to improve anomaly detection for high-energy physics. First, rather than train a generative model on the single most dominant background process, we build detection algorithms using representation learning from multiple background types, thus taking advantage of more information to improve estimation of what is relevant for detection. Second, we generalize decorrelation to the multi-background setting, thus directly enforcing a more complete definition of robustness for anomaly detection. We demonstrate the benefit of the proposed robust multi-background anomaly detection algorithms on a high-dimensional dataset of particle decays at the Large Hadron Collider.
DOI: 10.2172/2282589
2024
Smart pixel sensors: towards on-sensor filtering of pixel clusters with deep learning
High granularity silicon pixel sensors are at the heart of energy frontier particle physics collider experiments. At a collision rate of 40 MHz, these detectors create massive amounts of data. Signal processing that handles data incoming at that rate and intelligently reduces it within the pixelated region of the detector at rate will enhance physics performance and enable physics analyses that are not currently possible. Using the shape of charge clusters deposited in an array of small pixels, the physical properties of the traversing particle can be extracted with locally customized neural networks. In this first work, we present a neural network that can be embedded into the on-sensor readout and filter out hits from low momentum tracks, reducing the detector's data volume by 54.4–75.4%. The network is designed and simulated as a custom readout integrated circuit with 28 nm CMOS technology and is expected to operate at less than 300 μW with an area of less than 0.2 mm².
DOI: 10.1007/s10934-024-01559-y
2024
Post-synthesis of curcumin-embedded zeolitic imidazole framework for copper ions detection
DOI: 10.1088/1748-0221/19/02/c02066
2024
A demonstrator for a real-time AI-FPGA-based triggering system for sPHENIX at RHIC
The RHIC interaction rate at sPHENIX will reach around 3 MHz in pp collisions, requiring the detector readout to reject events by a factor of over 200 to fit the DAQ bandwidth of 15 kHz. Some critical measurements, such as heavy flavor production in pp collisions, often require the analysis of particles produced at low momentum. This prohibits adopting the traditional approach, where data rates are reduced through triggering on rare high momentum probes. We explore a new approach based on real-time AI technology, adopt an FPGA-based implementation using a custom-designed FELIX-712 board with the Xilinx Kintex UltraScale FPGA, and deploy the system in the detector readout electronics loop for real-time trigger decisions.
2024
ACE Science Workshop Report
We summarize the Fermilab Accelerator Complex Evolution (ACE) Science Workshop, held on June 14-15, 2023. The workshop presented the strategy for the ACE program in two phases: ACE Main Injector Ramp and Target (MIRT) upgrade and ACE Booster Replacement (BR) upgrade. Four plenary sessions covered the primary experimental physics thrusts: Muon Collider, Neutrinos, Charged Lepton Flavor Violation, and Dark Sectors. Additional physics and technology ideas were presented from the community that could expand or augment the ACE science program. Given the physics framing, a parallel session at the workshop was dedicated to discussing priorities for accelerator R&D. Finally, physics discussion sessions concluded the workshop where experts from the different experimental physics thrusts were brought together to begin understanding the synergies between the different physics drivers and technologies. In December of 2023, the P5 report was released setting the physics priorities for the field in the next decade and beyond, and identified ACE as an important component of the future US accelerator-based program. Given the presentations and discussions at the ACE Science Workshop and the findings of the P5 report, we lay out the topics for study to determine the physics priorities and design goals of the Fermilab ACE project in the near-term.
DOI: 10.48550/arxiv.2403.08980
2024
Architectural Implications of Neural Network Inference for High Data-Rate, Low-Latency Scientific Applications
With more scientific fields relying on neural networks (NNs) to process data incoming at extreme throughputs and latencies, it is crucial to develop NNs with all their parameters stored on-chip. In many of these applications, there is not enough time to go off-chip and retrieve weights. Even more so, off-chip memory such as DRAM does not have the bandwidth required to process these NNs as fast as the data is being produced (e.g., every 25 ns). As such, these extreme latency and bandwidth requirements have architectural implications for the hardware intended to run these NNs: 1) all NN parameters must fit on-chip, and 2) codesigning custom/reconfigurable logic is often required to meet these latency and bandwidth constraints. In our work, we show that many scientific NN applications must run fully on chip, in the extreme case requiring a custom chip to meet such stringent constraints.
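A back-of-envelope calculation makes the on-chip argument concrete; the parameter count and DRAM bandwidth figure below are illustrative assumptions.

```python
# If weights had to stream from off-chip memory for every event arriving
# every 25 ns, the required bandwidth dwarfs what DRAM can deliver.
n_params = 500_000          # a modest on-chip NN (illustrative)
bytes_per_param = 2         # 16-bit weights
event_period_s = 25e-9      # 40 MHz collision rate

required_bw = n_params * bytes_per_param / event_period_s    # bytes/s
print(f"required weight bandwidth: {required_bw / 1e12:.0f} TB/s")  # ~40 TB/s

dram_bw = 100e9             # ~100 GB/s, a generous DRAM figure
print(f"shortfall vs DRAM: {required_bw / dram_bw:.0f}x")            # ~400x
```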
DOI: 10.1088/1748-0221/13/01/t01003
2018
Cited 14 times
The importance of calorimetry for highly-boosted jet substructure
Jet substructure techniques are playing an essential role in exploring the TeV scale at the Large Hadron Collider (LHC), since they facilitate the efficient reconstruction and identification of highly-boosted objects. Both for the LHC and for future colliders, there is a growing interest in using jet substructure methods based only on charged-particle information. The reason is that silicon-based tracking detectors offer excellent granularity and precise vertexing, which can improve the angular resolution on highly-collimated jets and mitigate the impact of pileup. In this paper, we assess how much jet substructure performance degrades by using track-only information, and we demonstrate physics contexts in which calorimetry is most beneficial. Specifically, we consider five different hadronic final states—W bosons, Z bosons, top quarks, light quarks, gluons—and test the pairwise discrimination power with a multi-variate combination of substructure observables. In the idealized case of perfect reconstruction, we quantify the loss in discrimination performance when using just charged particles compared to using all detected particles. We also consider the intermediate case of using charged particles plus photons, which provides valuable information about neutral pions. In the more realistic case of a segmented calorimeter, we assess the potential performance gains from improving calorimeter granularity and resolution, comparing a CMS-like detector to more ambitious future detector concepts. Broadly speaking, we find large performance gains from neutral-particle information and from improved calorimetry in cases where jet mass resolution drives the discrimination power, whereas the gains are more modest if an absolute mass scale calibration is not required.
DOI: 10.3390/jlpea8030025
2018
Cited 12 times
Multi-Vdd Design for Content Addressable Memories (CAM): A Power-Delay Optimization Analysis
In this paper, we characterize the interplay between power consumption and performance of a matchline-based Content Addressable Memory and then propose the use of a multi-Vdd design to save power and increase post-fabrication tunability. Exploration of the power consumption behavior of a CAM chip shows the drastically different behavior among the components and suggests the use of different and independent power supplies. The complete design, simulation and testing of a multi-Vdd CAM chip along with an exploration of the multi-Vdd design space are presented. Our analysis has been applied to simulated models on two different technology nodes (130 nm and 45 nm), followed by experiments on a 246-kb test chip fabricated in 130 nm Global Foundries Low Power CMOS technology. The proposed design, operating at an optimal operating point in a triple-Vdd configuration, increases the power-delay operation range by 2.4 times and consumes 25.3% less dynamic power when compared to a conventional single-Vdd design operating over the same voltage range with equivalent noise margin. Our multi-Vdd design also helps save 51.3% standby power. Measurement results from the test chip combined with the simulation analysis at the two nodes validate our thesis.
DOI: 10.1007/jhep04(2020)003
2020
Cited 10 times
A high efficiency photon veto for the Light Dark Matter eXperiment
Fixed-target experiments using primary electron beams can be powerful discovery tools for light dark matter in the sub-GeV mass range. The Light Dark Matter eXperiment (LDMX) is designed to measure missing momentum in high-rate electron fixed-target reactions with beam energies of 4 GeV to 16 GeV. A prerequisite for achieving several important sensitivity milestones is the capability to efficiently reject backgrounds associated with few-GeV bremsstrahlung, by twelve orders of magnitude, while maintaining high efficiency for signal. The primary challenge arises from events with photo-nuclear reactions faking the missing-momentum property of a dark matter signal. We present a methodology developed for the LDMX detector concept that is capable of the required rejection. By employing a detailed Geant4-based model of the detector response, we demonstrate that the sampling calorimetry proposed for LDMX can achieve better than 10^−13 rejection of few-GeV photons. This suggests that the luminosity-limited sensitivity of LDMX can be realized at 4 GeV and higher beam energies.
DOI: 10.1007/jhep08(2016)038
2016
Cited 10 times
Dissecting jets and missing energy searches using n-body extended simplified models
Simplified Models are a useful way to characterize new physics scenarios for the LHC. Particle decays are often represented using non-renormalizable operators that involve the minimal number of fields required by symmetries. Generalizing to a wider class of decay operators allows one to model a variety of final states. This approach, which we dub the n-body extension of Simplified Models, provides a unifying treatment of the signal phase space resulting from a variety of signals. In this paper, we present the first application of this framework in the context of multijet plus missing energy searches. The main result of this work is a global performance study with the goal of identifying which set of observables yields the best discriminating power against the largest Standard Model backgrounds for a wide range of signal jet multiplicities. Our analysis compares combinations of one, two and three variables, placing emphasis on the enhanced sensitivity gain resulting from non-trivial correlations. Utilizing boosted decision trees, we compare and classify the performance of missing energy, energy scale and energy structure observables. We demonstrate that including an observable from each of these three classes is required to achieve optimal performance. This work additionally serves to establish the utility of n-body extended Simplified Models as a diagnostic for unpacking the relative merits of different search strategies, thereby motivating their application to new physics signatures beyond jets and missing energy.
DOI: 10.1109/h2rc51942.2020.00010
2020
Cited 10 times
FPGAs-as-a-Service Toolkit (FaaST)
Computing needs for high energy physics are already intensive and are expected to increase drastically in the coming years. In this context, heterogeneous computing, specifically as-a-service computing, has the potential for significant gains over traditional computing models. Although previous studies and packages in the field of heterogeneous computing have focused on GPUs as accelerators, FPGAs are an extremely promising option as well. A series of workflows are developed to establish the performance capabilities of FPGAs as a service. Multiple different devices and a range of algorithms for use in high energy physics are studied. For a small, dense network, the throughput can be improved by an order of magnitude with respect to GPUs as a service. For large convolutional networks, the throughput is found to be comparable to GPUs as a service. This work represents the first open-source FPGAs-as-a-service toolkit.
DOI: 10.1088/1748-0221/12/06/p06009
2017
Cited 10 times
Initial performance studies of a general-purpose detector for multi-TeV physics at a 100 TeV pp collider
This paper describes simulations of detector response to multi-TeV particles and jets at the Future Circular Collider (FCC-hh) or Super proton-proton Collider (SppC) which aim to collide proton beams with a centre-of-mass energy of 100 TeV. The unprecedented energy regime of these future experiments imposes new requirements on detector technologies which can be studied using the detailed GEANT4 simulations presented in this paper. The initial performance of a detector designed for physics studies at the FCC-hh or SppC experiments is described with an emphasis on measurements of single particles up to 33 TeV in transverse momentum. The reconstruction of hadronic jets has also been studied in the transverse momentum range from 50 GeV to 26 TeV. The granularity requirements for calorimetry are investigated using the two-particle spatial resolution achieved for hadron showers.
2018
Cited 10 times
Light Dark Matter eXperiment (LDMX)
We present an initial design study for LDMX, the Light Dark Matter Experiment, a small-scale accelerator experiment having broad sensitivity to both direct dark matter and mediator particle production in the sub-GeV mass region. LDMX employs missing momentum and energy techniques in multi-GeV electro-nuclear fixed-target collisions to explore couplings to electrons in uncharted regions that extend down to and below levels that are motivated by direct thermal freeze-out mechanisms. LDMX would also be sensitive to a wide range of visibly and invisibly decaying dark sector particles, thereby addressing many of the science drivers highlighted in the 2017 US Cosmic Visions New Ideas in Dark Matter Community Report. LDMX would achieve the required sensitivity by leveraging existing and developing detector technologies from the CMS, HPS and Mu2e experiments. In this paper, we present our initial design concept, detailed GEANT-based studies of detector performance, signal and background processes, and a preliminary analysis approach. We demonstrate how a first phase of LDMX could expand sensitivity to a variety of light dark matter, mediator, and millicharge particles by several orders of magnitude in coupling over the broad sub-GeV mass range.
DOI: 10.1088/1748-0221/10/02/c02029
2015
Cited 9 times
Design and testing of the first 2D Prototype Vertically Integrated Pattern Recognition Associative Memory
An associative memory-based track finding approach has been proposed for a Level 1 tracking trigger to cope with increasing luminosities at the LHC. The associative memory uses a massively parallel architecture to tackle the intrinsically complex combinatorics of track finding algorithms, thus avoiding the typical power law dependence of execution time on occupancy and solving the pattern recognition in times roughly proportional to the number of hits. This is of crucial importance given the large occupancies typical of hadronic collisions. The design of an associative memory system capable of dealing with the complexity of HL-LHC collisions and with the short latency required by Level 1 triggering poses significant, as yet unsolved, technical challenges. For this reason, an aggressive R&D program has been launched at Fermilab to advance state-of-the-art associative memory technology, the so-called VIPRAM (Vertically Integrated Pattern Recognition Associative Memory) project. The VIPRAM leverages emerging 3D vertical integration technology to build faster and denser Associative Memory devices. The first step is to implement in conventional VLSI the associative memory building blocks that can be used in 3D stacking; in other words, the building blocks are laid out as if they were part of a 3D design. In this paper, we report on the first successful implementation of a 2D VIPRAM demonstrator chip (protoVIPRAM00). The results show that these building blocks are ready for 3D stacking.
DOI: 10.1140/epjc/s10052-022-11083-5
2023
Semi-supervised graph neural networks for pileup noise removal
Abstract The high instantaneous luminosity of the CERN Large Hadron Collider leads to multiple proton–proton interactions in the same or nearby bunch crossings (pileup). Advanced pileup mitigation algorithms are designed to remove this noise from pileup particles and improve the performance of crucial physics observables. This study implements a semi-supervised graph neural network for particle-level pileup noise removal, by identifying individual particles produced from pileup. The graph neural network is firstly trained on charged particles with known labels, which can be obtained from detector measurements on data or simulation, and then inferred on neutral particles for which such labels are missing. This semi-supervised approach does not depend on the neutral particle pileup label information from simulation, and thus allows us to perform training directly on experimental data. The performance of this approach is found to be consistently better than widely-used domain algorithms and comparable to the fully-supervised training using simulation truth information. The study serves as the first attempt at applying semi-supervised learning techniques to pileup mitigation, and opens up a new direction of fully data-driven machine learning pileup mitigation studies.
DOI: 10.1016/j.nima.2023.168665
2023
In-pixel AI for lossy data compression at source for X-ray detectors
Integrating neural networks for data compression directly in the Read-Out Integrated Circuits (ROICs), i.e. the pixelated front-end, would result in a significant reduction in off-chip data transfer, overcoming the I/O bottleneck. Our ROIC test chip (AI-In-Pixel-65) is designed in a 65 nm Low Power CMOS process for the readout of pixelated X-ray detectors. Each pixel consists of an analog front-end for signal processing and a 10-bit analog-to-digital converter operating at 100 kSPS. We compare two non-reconfigurable techniques, Principal Component Analysis (PCA) and an AutoEncoder (AE), as lossy data compression engines implemented within the pixelated area. The PCA algorithm achieves 50× compression, adds one clock cycle latency, and results in a 21% increase in the pixel area. The AE achieves 70× compression, adds 30 clock cycle latency, and results in a similar area increase.
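A numpy sketch of the PCA engine's principle: fit a basis offline on calibration data, then keep only a handful of coefficients per frame on-chip. Frame size, the number of kept components, and the fake data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.poisson(5.0, size=(1000, 256)).astype(float)  # fake 16x16 frames

# Offline: fit the principal directions on calibration frames.
mean = frames.mean(axis=0)
_, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
k = 5                                 # keep 5 of 256 coefficients (~50x)
basis = vt[:k]                        # (k, 256) principal directions

# On-chip: compress each frame to k numbers; decompress off-detector.
code = (frames[0] - mean) @ basis.T   # k coefficients leave the chip
recon = code @ basis + mean           # lossy reconstruction downstream
```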
DOI: 10.1088/1742-6596/898/7/072012
2017
Cited 8 times
Big Data in HEP: A comprehensive use case study
Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems collectively called Big Data technologies have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and promise a fresh look at analysis of very large datasets and could potentially reduce the time-to-physics with increased interactivity.
DOI: 10.3390/pr7110789
2019
Cited 8 times
The Synthesis of N-(Pyridin-2-yl)-Benzamides from Aminopyridine and Trans-Beta-Nitrostyrene by Fe2Ni-BDC Bimetallic Metal–Organic Frameworks
A bimetallic metal–organic framework material, generated by bridging iron (III) cations and nickel (II) cations with 1,4-benzenedicarboxylic anions (Fe2Ni-BDC), was synthesized by a solvothermal approach using nickel (II) nitrate hexahydrate and iron (III) chloride hexahydrate as the mixed metal source and 1,4-benzenedicarboxylic acid (H2BDC) as the organic ligand source. The structure of the samples was determined by X-ray powder diffraction (XRD), Fourier transform infrared spectroscopy (FT-IR), Raman spectroscopy, and nitrogen physisorption measurements. The catalytic activity and recyclability of the Fe2Ni-BDC catalyst for the Michael addition amidation reaction of 2-aminopyridine and nitroolefins were evaluated. The results illustrate that the Fe2Ni-BDC catalyst demonstrates good efficiency in the reaction under optimal conditions, and a reaction mechanism was proposed on this basis. When the molar ratio of 2-aminopyridine to trans-β-nitrostyrene was 1:1 and the solvent was dichloromethane, the isolated yield of pyridyl benzamide reached 82% at 80 °C over 24 h. The catalyst can be reused without a substantial reduction in catalytic activity, retaining a 77% yield after six reuse cycles.
DOI: 10.1016/j.nima.2012.10.011
2013
Cited 8 times
Test-beam studies of diamond sensors for SLHC
Abstract Diamond sensors are studied as an alternative to silicon sensors to withstand the high radiation doses that are expected in future upgrades of the pixel detectors for the SLHC. Diamond pixel sensors are intrinsically radiation hard and are considered a possible solution for the innermost tracker layers close to the interaction point, where current silicon sensors cannot cope with the harsh radiation environment. An effort to study possible candidates for the upgrades is ongoing at the Fermilab test-beam facility (FTBF), where diamond and 3D silicon sensors have been studied. Using a CMS pixel-based telescope built and installed at the FTBF, we are studying charge collection efficiencies for un-irradiated and irradiated devices bump-bonded to the CMS PSI46 pixel readout chip. A description of the test-beam effort and preliminary results on diamond sensors are presented.
DOI: 10.1109/tasc.2021.3058229
2021
Cited 6 times
Intelliquench: An Adaptive Machine Learning System for Detection of Superconducting Magnet Quenches
In superconducting magnets, the irreversible transition of a portion of the conductor to the resistive state is called a “quench.” Having large stored energy, magnets can be damaged by quenches due to localized heating, high voltage, or large force transients. Unfortunately, current quench protection systems can only detect a quench after it happens, and mitigating risks in Low Temperature Superconducting (LTS) accelerator magnets often requires fast response (down to milliseconds). Additionally, protection of High Temperature Superconducting (HTS) magnets still suffers from prohibitively slow quench detection. In this study, we lay the groundwork for a quench prediction system using an auto-encoder fully-connected deep neural network. After being dynamically trained on data features extracted from acoustic sensors around the magnet, the system detects anomalous events seconds before the quench in most of our data. While the exact nature of the events is under investigation, we show through a randomized experiment that the system can “forecast” a quench before it happens under magnet training conditions. This opens the way to integrated data processing, potentially leading to faster and better diagnostics and detection of magnet quenches.
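The reconstruction-error recipe behind such an autoencoder-based detector can be sketched in a few lines of Keras; the feature dimensionality, stand-in training data, and threshold choice below are illustrative assumptions, not the Intelliquench configuration.

```python
# Sketch of autoencoder-based anomaly detection on acoustic features:
# train on "normal" windows, flag windows with high reconstruction error.
# Feature extraction and the threshold choice here are illustrative.
import numpy as np
from tensorflow import keras

n_features = 16
normal = np.random.rand(5000, n_features).astype("float32")  # stand-in features

autoencoder = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(n_features,)),
    keras.layers.Dense(4, activation="relu"),   # bottleneck
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(n_features),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(normal, normal, epochs=5, batch_size=256, verbose=0)

# Threshold from the tail of the training-error distribution.
errors = np.mean((autoencoder.predict(normal, verbose=0) - normal) ** 2, axis=1)
threshold = np.quantile(errors, 0.999)

def is_anomalous(window):
    err = np.mean((autoencoder.predict(window[None], verbose=0) - window) ** 2)
    return err > threshold
```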
DOI: 10.5281/zenodo.3602260
2020
Cited 6 times
HLS4ML LHC Jet dataset (150 particles)
Dataset of high-pT jets from simulations of LHC proton-proton collisions, prepared for FastML/HLS4ML studies: https://fastmachinelearning.org
Includes:
High-level features (see https://arxiv.org/abs/1804.06913)
Images: jet images with up to 150 particles/jet (see https://arxiv.org/abs/1908.05318)
List: jet features with up to 150 particles/jet (see https://arxiv.org/abs/1908.05318)
DOI: 10.48550/arxiv.2204.13223
2022
Cited 3 times
Smart sensors using artificial intelligence for on-detector electronics and ASICs
Cutting-edge detectors push sensing technology by further improving spatial and temporal resolution, increasing detector area and volume, and generally reducing backgrounds and noise. This has led to an explosion of data generated in next-generation experiments. Therefore, near-sensor processing at the data source, with more powerful algorithms, is becoming increasingly important to capture the right experimental data more efficiently, reduce downstream system complexity, and enable faster and lower-power feedback loops. In this paper, we discuss the motivations and potential applications for on-detector AI. Furthermore, the unique requirements of particle physics can drive the development of novel AI hardware and design tools. We describe existing modern work for particle physics in this area. Finally, we outline a number of areas of opportunity where we can advance machine learning techniques, codesign workflows, and future microelectronics technologies, which will accelerate design, performance, and implementations for next-generation experiments.
DOI: 10.48550/arxiv.2206.07527
2022
Cited 3 times
QONNX: Representing Arbitrary-Precision Quantized Neural Networks
We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantize-clip-dequantize (QCDQ) format. We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc -- in order to represent uniform quantization. By keeping the QONNX IR high-level and flexible, we enable targeting a wider variety of platforms. We also present utilities for working with QONNX, as well as examples of its usage in the FINN and hls4ml toolchains. Finally, we introduce the QONNX model zoo to share low-precision quantized neural networks.
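As a rough illustration of the uniform quantization such a Quant-style node expresses, here is a numpy sketch under my own naming; scale, zero_point, and bit_width are shorthand attributes and not necessarily the exact QONNX operator specification.

```python
# Sketch of the uniform quantization a Quant-style node expresses:
# clip to the representable integer range for the given bit width,
# round, and map back through scale and zero point. Attribute names
# are shorthand, not necessarily the exact QONNX operator spec.
import numpy as np

def quant(x, scale, zero_point, bit_width, signed=True):
    if signed:
        qmin, qmax = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bit_width - 1
    q = np.round(x / scale + zero_point)       # quantize
    q = np.clip(q, qmin, qmax)                 # integer clipping
    return (q - zero_point) * scale            # dequantized representation

x = np.array([-1.3, -0.2, 0.0, 0.7, 2.5])
print(quant(x, scale=0.25, zero_point=0.0, bit_width=4))
# 4-bit signed integers span [-8, 7], i.e. [-2.0, 1.75] at scale 0.25,
# so 2.5 saturates to 1.75.
```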
DOI: 10.3389/fphy.2022.897719
2022
Cited 3 times
Jets and Jet Substructure at Future Colliders
Even though jet substructure was not an original design consideration for the Large Hadron Collider (LHC) experiments, it has emerged as an essential tool for the current physics program. We examine the role of jet substructure in the motivation for and design of future energy frontier colliders. In particular, we discuss the need for a vibrant theoretical and experimental research and development program to extend jet substructure physics into the new regimes probed by future colliders. Jet substructure has organically evolved with a close connection between theorists and experimentalists and has catalyzed exciting innovations in both communities. We expect such developments will play an important role in the future energy frontier physics program.
DOI: 10.48550/arxiv.2206.11791
2022
Cited 3 times
Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark
We present our development experience and recent results for the MLPerf Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms. We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs. We present the design and implementation process for the keyword spotting, anomaly detection, and image classification benchmark tasks. The resulting hardware implementations are quantized, configurable, spatial dataflow architectures tailored for speed and efficiency and introduce new generic optimizations and common workflows developed as a part of this work. The full workflow is presented from quantization-aware training to FPGA implementation. The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms. The resulting submissions achieve latencies as low as 20 μs and energy consumption as low as 30 μJ per inference. We demonstrate how emerging ML benchmarks on heterogeneous hardware platforms can catalyze collaboration and the development of new techniques and more accessible tools.
DOI: 10.1109/qcs56647.2022.00010
2022
Cited 3 times
Neural network accelerator for quantum control
Efficient quantum control is necessary for practical quantum computing implementations with current technologies. Conventional algorithms for determining optimal control parameters are computationally expensive, largely excluding them from use outside of simulation. Existing hardware solutions structured as lookup tables are imprecise and costly. By designing a machine learning model to approximate the results of traditional tools, a more efficient method can be produced. Such a model can then be synthesized into a hardware accelerator for use in quantum systems. In this study, we demonstrate a machine learning algorithm for predicting optimal pulse parameters. This algorithm is lightweight enough to fit on a low-resource FPGA and perform inference with a latency of 175 ns and a pipeline interval of 5 ns, with > 0.99 gate fidelity. In the long term, such an accelerator could be used near quantum computing hardware where traditional computers cannot operate, enabling quantum control at a reasonable cost at low latencies without incurring large data bandwidths outside of the cryogenic environment.
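The lookup-table-replacement idea can be sketched as a small regression MLP trained on the outputs of a slow conventional optimizer; the input and output dimensions and the random stand-in dataset below are invented for illustration, not the paper's model.

```python
# Sketch of a lookup-table replacement: a small MLP that maps a target
# gate specification to control pulse parameters, trained on outputs of
# a conventional (slow) optimizer. Dimensions and data are illustrative.
import numpy as np
from tensorflow import keras

# Stand-in training set: (gate spec) -> (pulse amplitude, duration, phase)
specs = np.random.rand(10000, 3).astype("float32")
pulses = np.random.rand(10000, 3).astype("float32")  # from a slow optimizer

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(3,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3),  # predicted pulse parameters
])
model.compile(optimizer="adam", loss="mse")
model.fit(specs, pulses, epochs=5, batch_size=256, verbose=0)

# A model this small (a few hundred multiplies) is the kind of network
# that plausibly fits on a low-resource FPGA at ~100 ns-scale latency.
pred = model.predict(specs[:1], verbose=0)
```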
DOI: 10.1109/iccd.2015.7357156
2015
Cited 5 times
A methodology for power characterization of associative memories
Content Addressable Memories (CAMs) have become increasingly important in applications requiring high-speed memory search due to their inherently massively parallel processing architecture. We present a complete power analysis methodology for CAM systems to aid the exploration of their power-performance trade-offs in future systems. Our proposed methodology uses detailed transistor-level circuit simulation of power behavior and a handful of input data types to simulate full-chip power consumption. Furthermore, we applied our power analysis methodology to a custom-designed associative memory test chip. This chip was developed by Fermilab for the purpose of developing high-performance real-time pattern recognition on the high-volume data produced by a future large-scale scientific experiment. We applied our methodology to configure a power model for this test chip. Our model is capable of predicting the total average power within 4% of actual power measurements. Our power analysis methodology can be generalized to other CAM-like memory systems to accurately characterize their power behavior.
DOI: 10.1109/fccm48280.2020.00072
2020
Cited 5 times
AIgean: An Open Framework for Machine Learning on Heterogeneous Clusters
Machine learning (ML) in the past decade has been one of the most popular topics of research within the computing community. Interest within the computing field ranges across all levels of the computation stack (shown in Figure 1 of the paper). This work introduces an open framework, called AIgean, to build and deploy machine learning (ML) algorithms on a heterogeneous cluster of devices (CPUs and FPGAs). Users can flexibly modify any layer of the machine learning stack to suit their needs. This allows machine learning domain experts to focus on the higher algorithmic layers, and distributed systems experts to create the communication layers below.
DOI: 10.1016/j.nima.2014.06.029
2014
Cited 4 times
Pre- and post-irradiation performance of FBK 3D silicon pixel detectors for CMS
In preparation for the tenfold luminosity upgrade of the Large Hadron Collider (the HL-LHC) around 2020, three-dimensional (3D) silicon pixel sensors are being developed as a radiation-hard candidate to replace the planar ones currently being used in the CMS pixel detector. This study examines an early batch of FBK sensors (named ATLAS08) of three 3D pixel geometries: 1E, 2E, and 4E, which respectively contain one, two, and four readout electrodes for each pixel, passing completely through the bulk. We present electrical characteristics and beam test performance results for each detector before and after irradiation. The maximum fluence applied is 3.5×10^15 n_eq/cm^2.
DOI: 10.1145/3289602.3293986
2019
Cited 4 times
Fast Inference of Deep Neural Networks for Real-time Particle Physics Applications
Machine learning methods are ubiquitous and have proven to be very powerful in LHC physics, and particle physics as a whole. However, exploration of such techniques in low-latency, low-power FPGA (Field Programmable Gate Array) hardware has only just begun. FPGA-based trigger and data acquisition systems have extremely low, sub-microsecond latency requirements that are unique to particle physics. We present a case study for neural network inference in FPGAs focusing on a classifier for jet substructure which would enable many new physics measurements. While we focus on a specific example, the lessons are far-reaching. A compiler package based on High-Level Synthesis (HLS), called HLS4ML, is developed to build machine learning models in FPGAs. The use of HLS increases accessibility across a broad user community and allows for a drastic decrease in firmware development time. We map out FPGA resource usage and latency versus neural network hyperparameters to allow for directed resource tuning in the low-latency environment and assess the impact on our benchmark physics performance scenario. For our example jet substructure model, we fit well within the available resources of modern FPGAs with latency on the scale of 100 ns.
DOI: 10.5281/zenodo.3601436
2020
Cited 4 times
HLS4ML LHC Jet dataset (30 particles)
Dataset of high-pT jets from simulations of LHC proton-proton collisions, prepared for FastML/HLS4ML studies: https://fastmachinelearning.org
Includes:
High-level features (see https://arxiv.org/abs/1804.06913)
Images: jet images with up to 30 particles/jet (see https://arxiv.org/abs/1908.05318)
List: jet features with up to 30 particles/jet (see https://arxiv.org/abs/1908.05318)
DOI: 10.48550/arxiv.2209.04671
2022
Dark Sector Physics at High-Intensity Experiments
Is Dark Matter part of a Dark Sector? The possibility of a dark sector neutral under Standard Model (SM) forces furnishes an attractive explanation for the existence of Dark Matter (DM), and is a compelling new-physics direction to explore in its own right, with potential relevance to fundamental questions as varied as neutrino masses, the hierarchy problem, and the Universe's matter-antimatter asymmetry. Because dark sectors are generically weakly coupled to ordinary matter, and because they can naturally have MeV-to-GeV masses and respect the symmetries of the SM, they are only mildly constrained by high-energy collider data and precision atomic measurements. Yet upcoming and proposed intensity-frontier experiments will offer an unprecedented window into the physics of dark sectors, highlighted as a Priority Research Direction in the 2018 Dark Matter New Initiatives (DMNI) BRN report. Support for this program -- in the form of dark-sector analyses at multi-purpose experiments, realization of the intensity-frontier experiments receiving DMNI funds, an expansion of DMNI support to explore the full breadth of DM and visible final-state signatures (especially long-lived particles) called for in the BRN report, and support for a robust dark-sector theory effort -- will enable comprehensive exploration of low-mass thermal DM milestones, and greatly enhance the potential of intensity-frontier experiments to discover dark-sector particles decaying back to SM particles.
DOI: 10.1103/physrevd.81.079905
2010
Cited 3 times
Publisher’s Note: Spin determination of single-produced resonances at hadron colliders [Phys. Rev. D 81, 075022 (2010)]
DOI: 10.1109/mwscas.2017.8052945
2017
Cited 3 times
A content addressable memory with multi-Vdd scheme for low power tunable operation
This paper reports on a content addressable memory (CAM) employing a multi-Vdd scheme for low-power pattern recognition applications. The complete design, simulation, and testing of the chip are presented along with an exploration of the multi-Vdd design space. The proposed design, operating at an optimal operating point in a triple-Vdd configuration, increases the delay range by 2.4 times and consumes 25.3% less power when compared to a conventional single-Vdd design operating over the same voltage range. Measurement results from a 246 kb test chip fabricated in 130 nm GlobalFoundries Low Power CMOS technology are presented to validate the model and analysis.
DOI: 10.2172/1606538
2020
Cited 3 times
5G Enabled Energy Innovation: Advanced Wireless Networks for Science (Workshop Report)
Rapidly expanding, new telecommunications infrastructure based on 5G technologies will disrupt and transform how we design, build, operate, and optimize scientific infrastructure and the experiments and services enabled by that infrastructure, from continental-scale sensor networks to centralized scientific user facilities, from intelligent Internet of Things devices to supercomputers. Concurrently, 5G will introduce, or exacerbate, challenges related to protecting infrastructure and associated scientific data as well as to fully leveraging opportunities related to expanded infrastructure scale and complexity. The U.S. Department of Energy (DOE) Office of Science operates scientific infrastructure, supporting some of the nation’s most advanced intellectual discoveries, spanning the country and including 30 world-class user facilities from supercomputers to accelerators. Along with field experiments and remote observatories, every aspect of DOE’s scientific enterprise will be affected by 5G, which amounts to a complete renovation of the underpinnings of the nation’s information infrastructure. In this report we explore the scientific opportunities and new research challenges associated with 5G, ranging from scalability to heterogeneity to cybersecurity. The rapid commercial deployment of 5G opens the opportunity to rethink and reinvent DOE’s scientific infrastructure and experimentation, from intelligent sensor networks at unprecedented scales to a digital continuum of cyberinfrastructure spanning low-power sensors, high-performance computing embedded within and at the edge of the network, and DOE’s large-scale user instrument and computing facilities. New programming paradigms, workflow and data frameworks, and AI-based system design, operation, and autonomous adaptation and optimization will be necessary in order to exploit these new opportunities. Field deployments and centralized scientific instruments can also be revolutionized, moving (without traditional performance penalties) from wired to wireless connectivity for data and control systems, improving flexibility, and opening new sensing modalities, including the use of the 5G electromagnetic spectrum itself as an environmental probe. For DOE science, in contrast to commercial 5G applications and settings, devices will be deployed in extreme environments such as cryogenically cooled instrument control systems and in remote settings with harsh conditions, requiring the design of new materials for RF communication and edge processing to operate in these regimes. Concurrently, 5G infrastructure comprises both hardware and sophisticated software systems - currently closed and proprietary. The cybersecurity challenges to 5G-empowered reinvention mirror the complexity and variety of new 5G features, from virtualization to private network slices to ubiquitous access. Research is also needed in order to accelerate the development of secure and open 5G software infrastructure, reducing reliance on hardware and software produced outside the United States and providing the transparency and rigorous evaluation and testing afforded through open software. Twelve broad research thrusts are laid out in four chapters, with a companion fifth chapter (and three additional research thrusts) underscoring the needs and opportunities for an aggressive testbed program co-designed by networking experts and scientists involved in the 15 research thrusts. 
The urgency of undertaking this research is fueled by a global, accelerating deployment of new telecommunications infrastructure that is designed for entertainment and commercial applications - barely scratching the surface of what 5G can do to extend U.S. leadership in scientific discovery.
2021
Cited 3 times
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends, including an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.
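For orientation, a minimal hls4ml conversion flow looks roughly like the sketch below; the Keras model, layer name, FPGA part number, and output directory are placeholders, and exact configuration keys can vary between hls4ml versions.

```python
# Minimal hls4ml flow: Keras model -> HLS project -> bit-accurate emulation.
# The model, part number, and output directory are placeholders; exact
# configuration keys can vary between hls4ml versions.
import hls4ml
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    keras.layers.Dense(5, activation="softmax"),
])

config = hls4ml.utils.config_from_keras_model(model, granularity="name")
# Per-layer precision tuning is where most of the resource savings live.
config["LayerName"]["dense"]["Precision"]["weight"] = "ap_fixed<8,2>"

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="my_hls_project",
    part="xcu250-figd2104-2L-e",   # placeholder FPGA part
)
hls_model.compile()                # C simulation for bit-accurate checks
# hls_model.build()                # runs HLS synthesis (slow)
```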
DOI: 10.1145/3482854
2021
Cited 3 times
<i>AIgean</i> : An Open Framework for Deploying Machine Learning on Heterogeneous Clusters
AIgean, pronounced like the sea, is an open framework to build and deploy machine learning (ML) algorithms on a heterogeneous cluster of devices (CPUs and FPGAs). We leverage two open source projects: Galapagos, for multi-FPGA deployment, and hls4ml, for generating ML kernels synthesizable using Vivado HLS. AIgean provides a full end-to-end multi-FPGA/CPU implementation of a neural network. The user supplies a high-level neural network description, and our tool flow is responsible for synthesizing the individual layers, partitioning layers across different nodes, and the bridging and routing required for these layers to communicate. If the user is an expert in a particular domain and would like to tinker with the implementation details of the neural network, we define a flexible implementation stack for ML that includes the layers of Algorithms, Cluster Deployment & Communication, and Hardware. This allows the user to modify specific layers of abstraction without having to worry about components outside of their area of expertise, highlighting the modularity of AIgean. We demonstrate the effectiveness of AIgean with two use cases: an autoencoder, and ResNet-50 running across 10 and 12 FPGAs. AIgean leverages the FPGA's strength in low-latency computing, as our implementations target batch-1 inference.
DOI: 10.1016/j.nima.2013.07.042
2013
Testbeam and laboratory test results of irradiated 3D CMS pixel detectors
The CMS silicon pixel detector is the tracking device closest to the LHC p–p collisions, precisely reconstructing the charged particle trajectories. The planar technology used in the current innermost layer of the pixel detector will reach the design limit for radiation hardness at the end of the Phase I upgrade and will need to be replaced before the Phase II upgrade in 2020. Due to its unprecedented performance in harsh radiation environments, 3D silicon technology is under consideration as a possible replacement for planar technology for the High Luminosity LHC (HL-LHC). 3D silicon detectors are fabricated by the Deep Reactive-Ion-Etching (DRIE) technique, which allows p- and n-type electrodes to be processed through the silicon substrate as opposed to being implanted through the silicon surface. The 3D CMS pixel devices presented in this paper were processed at FBK. They were bump-bonded to the current CMS pixel readout chip, tested in the laboratory, and tested in beams at FNAL with a 120 GeV/c proton beam. In this paper we present the laboratory and beam test results for the irradiated 3D CMS pixel devices.
DOI: 10.1016/b978-1-4377-0914-8.00102-8
2012
Chronic Ankle Instability
DOI: 10.1007/s41781-023-00097-7
2023
Snowmass 2021 Computational Frontier CompF4 Topical Group Report Storage and Processing Resource Access
Computing plays a significant role in all areas of high energy physics. The Snowmass 2021 CompF4 topical group's scope is facilities R&D, where we consider "facilities" as the computing hardware and software infrastructure inside the data centers plus the networking between data centers, irrespective of who owns them and what policies are applied for using them. In other words, it includes commercial clouds, federally funded High Performance Computing (HPC) systems for all of science, and systems funded explicitly for a given experimental or theoretical program. This topical group report summarizes the findings and recommendations for the storage, processing, networking, and associated software service infrastructures for future high energy physics research, based on the discussions organized through the Snowmass 2021 community study.
DOI: 10.1007/s41781-023-00101-0
2023
Accelerating Machine Learning Inference with GPUs in ProtoDUNE Data Processing
We study the performance of a cloud-based GPU-accelerated inference server to speed up event reconstruction in neutrino data batch jobs. Using detector data from the ProtoDUNE experiment and employing the standard DUNE grid job submission tools, we attempt to reprocess the data by running several thousand concurrent grid jobs, a rate we expect to be typical of current and future neutrino physics experiments. We process most of the dataset with the GPU version of our processing algorithm and the remainder with the CPU version for timing comparisons. We find that a 100-GPU cloud-based server is able to easily meet the processing demand, and that using the GPU version of the event processing algorithm is two times faster than processing these data with the CPU version when comparing to the newest CPUs in our sample. The amount of data transferred to the inference server during the GPU runs can overwhelm even the highest-bandwidth network switches, however, unless care is taken to observe network facility limits or otherwise distribute the jobs to multiple sites. We discuss the lessons learned from this processing campaign and several avenues for future improvements.
DOI: 10.2172/1959815
2023
Neural network accelerator for quantum control
Technology (FAST) facility's low energy beamline using simulated virtual cathode laser images, gun phases, and solenoid strengths.
DOI: 10.48550/arxiv.2304.06745
2023
End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs) for efficient field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) hardware. Our approach leverages Hessian-aware quantization (HAWQ) of NNs, the Quantized Open Neural Network Exchange (QONNX) intermediate representation, and the hls4ml tool flow for transpiling NNs into FPGA and ASIC firmware. This makes efficient NN implementations in hardware accessible to nonexperts, in a single open-sourced workflow that can be deployed for real-time machine learning applications in a wide range of scientific and industrial settings. We demonstrate the workflow in a particle physics application involving trigger decisions that must operate at the 40 MHz collision rate of the CERN Large Hadron Collider (LHC). Given the high collision rate, all data processing must be implemented on custom ASIC and FPGA hardware within strict area and latency constraints. Based on these constraints, we implement an optimized mixed-precision NN classifier for high-momentum particle jets in simulated LHC proton-proton collisions.
DOI: 10.2172/1972476
2023
Feebly-Interacting Particles: FIPs 2022 Workshop Report
Particle physics today faces the challenge of explaining the mystery of dark matter, the origin of matter over anti-matter in the Universe, the origin of the neutrino masses, the apparent fine-tuning of the electroweak scale, and many other aspects of fundamental physics. Perhaps the most striking frontier to emerge in the search for answers involves new physics at mass scales comparable to familiar matter, below the GeV scale, or even radically below, down to sub-eV scales, and with very feeble interaction strength. New theoretical ideas to address dark matter and other fundamental questions predict such feebly interacting particles (FIPs) at these scales, and indeed, existing data provide numerous hints of such a possibility. A vibrant experimental program to discover such physics is under way, guided by a systematic theoretical approach firmly grounded in the underlying principles of the Standard Model. This document represents the report of the FIPs 2022 workshop, held at CERN between 17 and 21 October 2022, and aims to give an overview of these efforts, their motivations, and the decadal goals that animate the community involved in the search for FIPs.
DOI: 10.2172/1974720
2023
Implementing machine learning methods on QICK hardware for qubit readout &amp; control
Quantum readout and control is a fundamental aspect of quantum computing that requires accurate measurement of qubit states. Errors emerge in all stages, from initialization to readout, and identifying errors in post-processing necessitates resource-intensive statistical analysis. In our work, we use a lightweight fully-connected neural network (NN) to classify states of a transmon system with no prior processing. Our NN accelerator yields higher fidelities (92%) than the classical matched filter method (84%). By exploiting the natural parallelism of NNs and their placement near the source of data on field-programmable gate arrays (FPGAs), we can achieve ultra-low latency on the Quantum Instrumentation Control Kit (QICK). Integrating machine learning methods on QICK opens several pathways for efficient real-time processing of quantum circuits.
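For context, the classical matched-filter baseline mentioned above can be sketched in a few lines of numpy; the readout templates, noise model, and threshold below are stand-ins for illustration, not QICK data.

```python
# Sketch of the classical matched-filter baseline for qubit state
# readout: project the measured IQ trace onto the difference of the
# mean ground/excited templates and threshold. Traces are stand-ins.
import numpy as np

rng = np.random.default_rng(1)
T = 200  # samples per readout trace

# Calibration: mean complex traces for |0> and |1> (stand-in shapes).
template0 = np.exp(1j * 0.1 * np.arange(T))
template1 = 1.2 * np.exp(1j * (0.1 * np.arange(T) + 0.4))

kernel = np.conj(template1 - template0)  # matched-filter kernel

def classify(trace):
    # One inner product per shot; an NN replaces this linear score with
    # a learned nonlinear one, at comparable on-FPGA cost.
    score = np.real(np.sum(kernel * trace))
    midpoint = 0.5 * np.real(np.sum(kernel * (template0 + template1)))
    return int(score > midpoint)

shot = template1 + 0.5 * (rng.normal(size=T) + 1j * rng.normal(size=T))
print(classify(shot))  # -> 1 for most noise realizations
```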
DOI: 10.48550/arxiv.2306.03221
2023
Structural Re-weighting Improves Graph Domain Adaptation
In many real-world applications, graph-structured data used for training and testing have differences in distribution, such as in high energy physics (HEP) where simulation data used for training may not match real experiments. Graph domain adaptation (GDA) is a method used to address these differences. However, current GDA primarily works by aligning the distributions of node representations output by a single graph neural network encoder shared across the training and testing domains, which may often yield sub-optimal solutions. This work examines different impacts of distribution shifts caused by either graph structure or node attributes and identifies a new type of shift, named conditional structure shift (CSS), which current GDA approaches are provably sub-optimal to deal with. A novel approach, called structural reweighting (StruRW), is proposed to address this issue and is tested on synthetic graphs, four benchmark datasets, and a new application in HEP. StruRW has shown significant performance improvement over the baselines in the settings with large graph structure shifts, and reasonable performance improvement when node attribute shift dominates.
DOI: 10.48550/arxiv.2306.04712
2023
Differentiable Earth Mover's Distance for Data Compression at the High-Luminosity LHC
The Earth mover's distance (EMD) is a useful metric for image recognition and classification, but its usual implementations are either not differentiable or too slow to be used as a loss function for training other algorithms via gradient descent. In this paper, we train a convolutional neural network (CNN) to learn a differentiable, fast approximation of the EMD and demonstrate that it can be used as a substitute for computationally intensive EMD implementations. We apply this differentiable approximation in the training of an autoencoder-inspired neural network (encoder NN) for data compression at the high-luminosity LHC at CERN. The goal of this encoder NN is to compress the data while preserving the information related to the distribution of energy deposits in particle detectors. We demonstrate that the performance of our encoder NN trained using the differentiable EMD CNN surpasses that of training with loss functions based on mean squared error.
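The training loop behind this idea can be sketched in PyTorch: freeze a pretrained EMD-approximating CNN and backpropagate through it to train the compressor. Both networks, the input shapes, and the stand-in data below are illustrative assumptions, not the paper's architectures.

```python
# Sketch of using a frozen, pretrained EMD-approximating CNN as a
# differentiable loss for an autoencoder. Networks and shapes are
# illustrative stand-ins.
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(64, 8))
        self.dec = nn.Sequential(nn.Linear(8, 64), nn.Unflatten(1, (1, 8, 8)))

    def forward(self, x):
        return self.dec(self.enc(x))

# Pretend this CNN was pretrained to regress EMD(a, b); it takes the
# two images stacked as channels.
emd_net = nn.Sequential(
    nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1),
)
for p in emd_net.parameters():
    p.requires_grad_(False)  # frozen: gradients flow *through* it only

ae = TinyAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
x = torch.rand(32, 1, 8, 8)  # stand-in energy-deposit maps

for _ in range(10):
    opt.zero_grad()
    recon = ae(x)
    loss = emd_net(torch.cat([x, recon], dim=1)).mean()  # learned EMD proxy
    loss.backward()   # differentiable, unlike exact EMD solvers
    opt.step()
```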
DOI: 10.1109/trs.2023.3290039
2023
Digital Pre-Distortion to Reduce Ringing in Ultrawideband Radar Systems
In this paper, we present the development of a digital pre-distortion (DPD) algorithm to reduce ringing and improve the received system fidelity factor in ultrawideband (UWB) radar systems. The DPD algorithm uses Wiener filter compensation followed by an iterative feedback loop to decrease measured ringing. To test the effectiveness of this algorithm at reducing distortion, we arranged antennas in a boresight configuration along with amplifiers at the transmitting and receiving nodes. We applied DPD to pulses with 20-dB bandwidths of 4.3 GHz and 8.46 GHz, respectively. Among the tested pulses, the DPD method reduced root mean squared (RMS) ringing by up to 11.27 dB, lowered peak ringing below -20 dB, and improved the system fidelity factor above 90%.
DOI: 10.32628/ijsrset23103140
2023
Combined prefabrication vertical drain (PVD) with variable preloading and vacuuming method to improve soft ground in the Mekong Delta
The treatment of the foundations of construction works on weak soil often raises issues that need to be resolved, such as the low load-bearing capacity of the ground, large settlements, and the stability of large areas. As a result, it is important to develop appropriate methods to treat the foundations of construction works on weak soil. Dealing with weak soil is a pressing issue in the construction industry. Currently, various measures can be adopted to increase soil durability and reduce the settlement of construction works, such as reinforced concrete piles, sheet piles, bored piles, cushions of loose materials, soil mixing with cement or lime, and preloading with vertical drainage. The method that uses a prefabricated vertical drain (PVD) combined with variable preloading and vacuuming has not been widely studied and implemented. This paper proposes a solution, which incorporates preloading and vacuuming in the PVD, to treat weak soil in the residential areas of the Mekong Delta region. The results show that the settlement of weak soil at the core of the embankment is 2.36 m after 135 days using the PVD with variable preloading and vacuuming. The safety factor is 1.295.
DOI: 10.48550/arxiv.2307.11242
2023
On-Sensor Data Filtering using Neuromorphic Computing for High Energy Physics Experiments
This work describes the investigation of neuromorphic computing-based spiking neural network (SNN) models used to filter data from sensor electronics in high energy physics experiments conducted at the High Luminosity Large Hadron Collider. We present our approach for developing a compact neuromorphic model that filters out the sensor data based on the particle's transverse momentum with the goal of reducing the amount of data being sent to the downstream electronics. The incoming charge waveforms are converted to streams of binary-valued events, which are then processed by the SNN. We present our insights on the various system design choices - from data encoding to optimal hyperparameters of the training algorithm - for an accurate and compact SNN optimized for hardware deployment. Our results show that an SNN trained with an evolutionary algorithm and an optimized set of hyperparameters obtains a signal efficiency of about 91% with nearly half as many parameters as a deep neural network.
DOI: 10.1109/iscas46773.2023.10181622
2023
A 3D Implementation of Convolutional Neural Network for Fast Inference
Low latency inference has many applications in edge machine learning. In this paper, we present a run-time configurable convolutional neural network (CNN) inference ASIC design for low-latency edge machine learning. By implementing a 5-stage pipelined CNN inference model in a 3D ASIC technology, we demonstrate that the model distributed on two dies utilizing face-to-face (F2F) 3D integration achieves superior performance. Our experimental results show that the design based on 3D integration achieves 43% better energy-delay product when compared to the traditional 2D technology.
DOI: 10.1007/jhep12(2023)092
2023
Photon-rejection power of the Light Dark Matter eXperiment in an 8 GeV beam
Abstract The Light Dark Matter eXperiment (LDMX) is an electron-beam fixed-target experiment designed to achieve comprehensive model-independent sensitivity to dark matter particles in the sub-GeV mass region. An upgrade to the LCLS-II accelerator will increase the beam energy available to LDMX from 4 to 8 GeV. Using detailed GEANT4-based simulations, we investigate the effect of the increased beam energy on the capabilities to separate signal and background, and demonstrate that the veto methodology developed for 4 GeV successfully rejects photon-induced backgrounds for at least 2×10^14 electrons on target at 8 GeV.
DOI: 10.1109/usnc-ursi52151.2023.10237998
2023
The Impact of Vibration on TNR for a GPR System
In this paper, we analyze the effect of vibration on B-scan image quality for ultra-wideband (UWB) ground penetrating radar (GPR) systems. Specifically, we consider the target to noise ratio (TNR) specification for each B-scan taken, comparing it with a control setting with no vibration. The vibration frequencies are 10 Hz, 27 Hz, and 37 Hz. Compared to the non-vibrating condition, TNR is reduced by at least 3.75 dB in each vibrating scenario. Among all different vibration settings, the TNR values stay within 0.71 dB of each other.
DOI: 10.48550/arxiv.2310.02474
2023
Smart pixel sensors: towards on-sensor filtering of pixel clusters with deep learning
Highly granular pixel detectors allow for increasingly precise measurements of charged particle tracks. Next-generation detectors require that pixel sizes will be further reduced, leading to unprecedented data rates exceeding those foreseen at the High Luminosity Large Hadron Collider. Signal processing that handles data incoming at a rate of O(40 MHz) and intelligently reduces the data within the pixelated region of the detector at rate will enhance physics performance at high luminosity and enable physics analyses that are not currently possible. Using the shape of charge clusters deposited in an array of small pixels, the physical properties of the traversing particle can be extracted with locally customized neural networks. In this first demonstration, we present a neural network that can be embedded into the on-sensor readout and filter out hits from low-momentum tracks, reducing the detector's data volume by 54.4-75.4%. The network is designed and simulated as a custom readout integrated circuit with 28 nm CMOS technology and is expected to operate at less than 300 μW with an area of less than 0.2 mm². The temporal development of charge clusters is investigated to demonstrate possible future performance gains, and there is also a discussion of future algorithmic and technological improvements that could enhance efficiency, data reduction, and power per area.
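The filtering step can be pictured as a tiny classifier over the cluster charge profile; the cluster window, network width, and random stand-in labels below are illustrative, and an on-chip version would be heavily quantized rather than this float Keras model.

```python
# Sketch of an on-sensor cluster filter: a tiny dense network scores a
# pixel cluster's charge profile as high- or low-pT and drops the low-pT
# hits. Cluster size, labels, and network width are illustrative.
import numpy as np
from tensorflow import keras

cluster_shape = (13, 21)  # stand-in pixel window
x = np.random.rand(10000, *cluster_shape).astype("float32")
y = np.random.randint(0, 2, size=10000)  # 1 = high-pT (keep), stand-in labels

model = keras.Sequential([
    keras.layers.Flatten(input_shape=cluster_shape),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=2, batch_size=512, verbose=0)

keep = model.predict(x, verbose=0)[:, 0] > 0.5
print(f"data volume reduced by {100 * (1 - keep.mean()):.1f}%")
```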
DOI: 10.3389/fdata.2023.1301942
2023
Corrigendum: Applications and techniques for fast machine learning in science
[This corrects the article DOI: 10.3389/fdata.2022.787421.].
DOI: 10.23919/eurad58043.2023.10289551
2023
A Non-Linear Transmission Line with Secondary Soliton Decimation
In this paper, we present the design of a nonlinear transmission line (NLTL) that decimates the secondary soliton using a geometric tapering technique while maintaining the Gaussian pulse shape of the input. The experimental results demonstrate that this NLTL sharpens an input pulse with a 1.8 GHz 20-dB bandwidth to a 4.40 GHz bandwidth with a secondary soliton that is only 10% of the peak pulse amplitude.
DOI: 10.1109/isee59483.2023.10299882
2023
A 6-18-GHz GaN High Power Amplifier with Excellent Gain Flatness
This paper presents a monolithic microwave integrated circuit (MMIC) reactive matching power amplifier (RMPA) designed using a 150 nm AlGaN/GaN-on-SiC technology, showcasing exceptional gain flatness of only 3 dB across the 6-18 GHz band. The three-stage RMPA operates in pulsed mode with a pulse width of 100 µs and a duty cycle of 10%, delivering impressive performance metrics. It achieves a maximum output power of 43.5 dBm (equivalent to 22.4 W) and a maximum efficiency of 25%, while maintaining a linear gain of 22 dB under a bias condition of 28 V drain voltage and 620 mA quiescent drain current. The MMIC PA has a compact footprint of 4.5 mm × 4.05 mm.
DOI: 10.59382/pro.intl.con-ibst.2023.ses3-10
2023
Research on improving software for processing structural settlement monitoring data in Vietnam
Currently, there are many software packages for processing settlement monitoring data in Vietnam. However, these packages still lack modules to calculate the relative deflection settlement (S/L) as well as modules for modeling settlement over time. This article presents the development of a program to calculate the relative deflection settlement and to model settlement over time. The results aim to improve current settlement monitoring software in Vietnam in order to automate the analysis and evaluation of settlement monitoring results.
DOI: 10.2172/2204990
2023
Edge AI for accelerator controls (READS): beam loss deblending
model may be producible to help de-blend losses between machines. Work is underway as part of the Fermilab Real-time Edge AI for Distributed Systems Project (READS) to develop an ML-empowered system that collects streamed BLM data and additional machine readings to infer in real time which machine generated the beam loss.
DOI: 10.48550/arxiv.2311.05716
2023
ML-based Real-Time Control at the Edge: An Approach Using hls4ml
This study focuses on implementing a real-time control system for a particle accelerator facility that performs high energy physics experiments. A critical operating parameter in this facility is beam loss, which is the fraction of particles deviating from the accelerated proton beam into a cascade of secondary particles. Accelerators employ a large number of sensors to monitor beam loss. The data from these sensors is monitored by human operators who predict the relative contribution of different sub-systems to the beam loss. Using this information, they engage control interventions. In this paper, we present a controller to track this phenomenon in real time using edge Machine Learning (ML) and support control with low latency and high accuracy. We implemented this system on an Intel Arria 10 SoC. Optimizations at the algorithm, high-level synthesis, and interface levels to improve latency and resource usage are presented. Our design implements a neural network which can predict the main source of beam loss (between two possible causes) at speeds up to 575 frames per second (fps) (average latency of 1.74 ms). The deployed system is required to operate at 320 fps with a 3 ms latency requirement, which our design successfully meets.
DOI: 10.48550/arxiv.2311.09915
2023
Physics Opportunities at a Beam Dump Facility at PIP-II at Fermilab and Beyond
The Fermilab Proton-Improvement-Plan-II (PIP-II) is being implemented in order to support the precision neutrino oscillation measurements at the Deep Underground Neutrino Experiment, the U.S. flagship neutrino experiment. The PIP-II LINAC is presently under construction and is expected to provide 800 MeV protons with 2 mA current. This white paper summarizes the outcome of the first workshop, held May 10 through 13, 2023, to exploit this capability for new physics opportunities in kinematic regimes that are unavailable to other facilities, in particular at a potential beam dump facility implemented at the end of the LINAC. Various new physics opportunities have been discussed in a wide range of kinematic regimes, from the eV scale to keV and MeV. We also emphasize that the timely establishment of the beam dump facility at Fermilab is essential to exploit these new physics opportunities.
DOI: 10.2172/2217167
2023
Physics Opportunities at a Beam Dump Facility at PIP-II at Fermilab and Beyond
The Fermilab Proton-Improvement-Plan-II (PIP-II) is being implemented in order to support the precision neutrino oscillation measurements at the Deep Underground Neutrino Experiment, the U.S. flagship neutrino experiment. The PIP-II LINAC is presently under construction and is expected to provide 800 MeV protons with 2 mA current. This white paper summarizes the outcome of the first workshop, held May 10 through 13, 2023, to exploit this capability for new physics opportunities in kinematic regimes that are unavailable to other facilities, in particular at a potential beam dump facility implemented at the end of the LINAC. Various new physics opportunities have been discussed in a wide range of kinematic regimes, from the eV scale to keV and MeV. We also emphasize that the timely establishment of the beam dump facility at Fermilab is essential to exploit these new physics opportunities.
DOI: 10.1364/dh.2023.hw4c.2
2023
Real-Time Instability Tracking with Deep Learning on FPGAs in Magnetic Confinement Fusion Devices
This work enables active control and suppression of MHD instabilities in magnetic confinement fusion devices such as the Tokamak with a feedback control system using high speed cameras and deep learning on frame grabber FPGAs.
DOI: 10.1364/3d.2023.jtu4a.40
2023
FKeras: A Fault Tolerance Library for Keras
We present FKeras, an open-source tool that uses Hessian information to quickly find which parameters in a neural network are sensitive to radiation faults, reducing the usual 200% resource overhead needed to protect them.
DOI: 10.1088/2632-2153/ad1139
2023
Differentiable Earth Mover's Distance for Data Compression at the High-Luminosity LHC
Abstract The Earth mover’s distance (EMD) is a useful metric for image recognition and classification, but its usual implementations are either not differentiable or too slow to be used as a loss function for training other algorithms via gradient descent. In this paper, we train a convolutional neural network (CNN) to learn a differentiable, fast approximation of the EMD and demonstrate that it can be used as a substitute for computationally intensive EMD implementations. We apply this differentiable approximation in the training of an autoencoder-inspired neural network (encoder NN) for data compression at the high-luminosity LHC at CERN. The goal of this encoder NN is to compress the data while preserving the information related to the distribution of energy deposits in particle detectors. We demonstrate that the performance of our encoder NN trained using the differentiable EMD CNN surpasses that of training with loss functions based on mean squared error.
DOI: 10.48550/arxiv.2312.00128
2023
Low latency optical-based mode tracking with machine learning deployed on FPGAs on a tokamak
Active feedback control in magnetic confinement fusion devices is desirable to mitigate plasma instabilities and enable robust operation. Optical high-speed cameras provide a powerful, non-invasive diagnostic and can be suitable for these applications. In this study, we process fast camera data, at rates exceeding 100 kfps, on in situ Field Programmable Gate Array (FPGA) hardware to track magnetohydrodynamic (MHD) mode evolution and generate control signals in real time. Our system utilizes a convolutional neural network (CNN) model which predicts the n = 1 MHD mode amplitude and phase using camera images with better accuracy than other tested non-deep-learning-based methods. By implementing this model directly within the standard FPGA readout hardware of the high-speed camera diagnostic, our mode tracking system achieves a total trigger-to-output latency of 17.6 μs and a throughput of up to 120 kfps. This study at the High Beta Tokamak-Extended Pulse (HBT-EP) experiment demonstrates an FPGA-based high-speed camera data acquisition and processing system, enabling application in real-time machine-learning-based tokamak diagnostics and control as well as potential applications in other scientific domains.
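The camera-to-mode regression can be pictured as a small CNN; the frame shape, network layers, and random stand-in targets below are illustrative, and predicting (cos, sin) of the phase is one common way to avoid the 2π wrap in the loss, not necessarily the paper's choice.

```python
# Sketch of the camera-to-control idea: a small CNN maps a fast-camera
# frame to the n=1 mode amplitude and phase. Shapes and data are
# illustrative stand-ins.
import numpy as np
from tensorflow import keras

frames = np.random.rand(2000, 32, 64, 1).astype("float32")  # stand-in frames
amp = np.random.rand(2000, 1).astype("float32")
phase = np.random.uniform(-np.pi, np.pi, size=(2000, 1)).astype("float32")
# Regress (cos, sin) of the phase to sidestep the 2*pi discontinuity.
targets = np.concatenate([amp, np.cos(phase), np.sin(phase)], axis=1)

model = keras.Sequential([
    keras.layers.Conv2D(4, 3, activation="relu", input_shape=(32, 64, 1)),
    keras.layers.MaxPooling2D(4),
    keras.layers.Flatten(),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3),  # [amplitude, cos(phase), sin(phase)]
])
model.compile(optimizer="adam", loss="mse")
model.fit(frames, targets, epochs=2, batch_size=128, verbose=0)

a, c, s = model.predict(frames[:1], verbose=0)[0]
print(f"amplitude={a:.3f}, phase={np.arctan2(s, c):.3f} rad")
```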
DOI: 10.48550/arxiv.2312.05978
2023
Neural Architecture Codesign for Fast Bragg Peak Analysis
We develop an automated pipeline to streamline neural architecture codesign for fast, real-time Bragg peak analysis in high-energy diffraction microscopy. Traditional approaches, notably pseudo-Voigt fitting, demand significant computational resources, prompting interest in deep learning models for more efficient solutions. Our method employs neural architecture search and AutoML to enhance these models while accounting for hardware costs, leading to the discovery of more hardware-efficient neural architectures. Our results match the performance of the previous state of the art while achieving a 13× reduction in bit operations. We show further speedup through model compression techniques such as quantization-aware training and neural network pruning. Additionally, our hierarchical search space provides greater flexibility in optimization, which can easily extend to other tasks and domains.