
M. J. Kortelainen

DOI: 10.1103/physrevc.91.044307
2015
Cited 23 times
Nuclear moments and charge radii of neutron-deficient francium isotopes and isomers
Collinear laser fluorescence spectroscopy has been performed on the ground and isomeric states of $^{204,206}\mathrm{Fr}$ in order to determine their spins, nuclear moments, and changes in mean-squared charge radii. A new experimental technique has been developed as part of this work which greatly enhances the data collection rate while maintaining high resolution. This has permitted the extension of this study to the two isomeric states in each nucleus. The investigation of nuclear $g$ factors and mean-squared charge radii indicates that the neutron-deficient Fr isotopes lie in a transitional region from spherical towards more collective structures.
DOI: 10.3389/fdata.2020.601728
2020
Cited 17 times
Heterogeneous Reconstruction of Tracks and Primary Vertices With the CMS Pixel Tracker
The High-Luminosity upgrade of the Large Hadron Collider (LHC) will see the accelerator reach an instantaneous luminosity of $7\times 10^{34} cm^{-2}s^{-1}$ with an average pileup of 200 proton-proton collisions. These conditions will pose an unprecedented challenge to the online and offline reconstruction software developed by the experiments. The computational complexity will exceed by far the expected increase in processing power for conventional CPUs, demanding an alternative approach. Industry and High-Performance Computing (HPC) centers are successfully using heterogeneous computing platforms to achieve higher throughput and better energy efficiency by matching each job to the most appropriate architecture. In this paper we will describe the results of a heterogeneous implementation of pixel tracks and vertices reconstruction chain on Graphics Processing Units (GPUs). The framework has been designed and developed to be integrated in the CMS reconstruction software, CMSSW. The speed up achieved by leveraging GPUs allows for more complex algorithms to be executed, obtaining better physics output and a higher throughput.
DOI: 10.1016/j.nima.2008.05.012
2008
Cited 29 times
Silicon beam telescope for LHC upgrade tests
A beam telescope based on the CMS Tracker data acquisition prototype cards has been developed in order to test sensor candidates for S-LHC tracking systems. The telescope consists of up to eight reference silicon microstrip modules and slots for a couple of test modules. Beam tracks, as measured by the reference modules, provide a means of determining the position resolution and efficiency of the test modules. The impact point precision of reference tracks at the location of the test modules is about 4μm. This note presents a detailed description of the silicon beam telescope (SiBT) along with some results from its initial operation in summer 2007 in the CERN H2 beamline.
DOI: 10.1007/s12350-015-0178-4
2015
Cited 18 times
Dependence of left ventricular functional parameters on image acquisition time in cardiac-gated myocardial perfusion SPECT
DOI: 10.1051/epjconf/202024505009
2020
Cited 14 times
Bringing heterogeneity to the CMS software framework
The advent of computing resources with co-processors, for example Graphics Processing Units (GPU) or Field-Programmable Gate Arrays (FPGA), for use cases like the CMS High-Level Trigger (HLT) or data processing at leadership-class supercomputers imposes challenges for the current data processing frameworks. These challenges include developing a model for algorithms to offload their computations onto the co-processors as well as keeping the traditional CPU busy doing other work. The CMS data processing framework, CMSSW, implements multithreading using the Intel Threading Building Blocks (TBB) library, which utilizes tasks as concurrent units of work. In this paper we will discuss a generic mechanism to interact effectively with non-CPU resources that has been implemented in CMSSW. In addition, configuring such a heterogeneous system is challenging. In CMSSW an application is configured with a configuration file written in the Python language. The algorithm types are part of the configuration. The challenge therefore is to unify the CPU and co-processor settings while allowing their implementations to be separate. We will explain how we solved these challenges while minimizing the necessary changes to the CMSSW framework. We will also discuss, using a concrete example, how algorithms can offload work to NVIDIA GPUs directly through the CUDA API.
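As a minimal, framework-agnostic sketch of this offload pattern (not the CMSSW API; names, sizes, and the busy-wait are illustrative; only the CUDA runtime host API is used, with error checking omitted):

```cpp
#include <cuda_runtime.h>
#include <atomic>

std::atomic<bool> gpuWorkDone{false};

// Runs on a CUDA-internal thread once all preceding work on the stream finishes.
void onGpuDone(void* /*userData*/) { gpuWorkDone.store(true); }

int main() {
  const size_t n = 1 << 20;
  float *host = nullptr, *dev = nullptr;
  cudaMallocHost(&host, n * sizeof(float));  // pinned memory, needed for async copies
  cudaMalloc(&dev, n * sizeof(float));

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Enqueue the work without blocking: copy in, (kernels would go here), copy out.
  cudaMemcpyAsync(dev, host, n * sizeof(float), cudaMemcpyHostToDevice, stream);
  cudaMemcpyAsync(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost, stream);
  cudaLaunchHostFunc(stream, onGpuDone, nullptr);

  // The CPU thread is now free; a task-based framework would run other
  // algorithms here instead of spinning.
  while (!gpuWorkDone.load()) { /* do other queued CPU work */ }

  cudaStreamDestroy(stream);
  cudaFree(dev);
  cudaFreeHost(host);
  return 0;
}
```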
DOI: 10.48550/arxiv.2401.14221
2024
Application of performance portability solutions for GPUs and many-core CPUs to track reconstruction kernels
Next generation High-Energy Physics (HEP) experiments are presented with significant computational challenges, both in terms of data volume and processing power. Using compute accelerators, such as GPUs, is one of the promising ways to provide the necessary computational power to meet the challenge. The current programming models for compute accelerators often involve using architecture-specific programming languages promoted by the hardware vendors and hence limit the set of platforms that the code can run on. Developing software with platform restrictions is especially infeasible for HEP communities as it takes significant effort to convert typical HEP algorithms into ones that are efficient for compute accelerators. Multiple performance portability solutions, which allow the code to be executed on hardware from different vendors, have recently emerged and provide an alternative path for using compute accelerators. We apply several portability solutions, such as Kokkos, SYCL, C++17 std::execution::par and Alpaka, on two mini-apps extracted from the mkFit project: p2z and p2r. These apps include basic kernels for a Kalman filter track fit, such as propagation and update of track parameters, for detectors at a fixed z or fixed r position, respectively. The two mini-apps explore different memory layout formats. We report on the development experience with different portability solutions, as well as their performance on GPUs and many-core CPUs, measured as the throughput of the kernels from different GPU and CPU vendors such as NVIDIA, AMD and Intel.
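The "memory layout formats" referred to above typically contrast array-of-structs (AoS) with struct-of-arrays (SoA) storage; the sketch below illustrates the difference (simplified, assumed track parameters, not the actual p2z/p2r code):

```cpp
#include <cstddef>
#include <vector>

struct TrackAoS {  // AoS: one struct per track; parameters interleaved in memory
  float x, y, z, pt, phi, theta;
};

struct TracksSoA {  // SoA: one contiguous array per parameter
  std::vector<float> x, y, z, pt, phi, theta;
  explicit TracksSoA(std::size_t n) : x(n), y(n), z(n), pt(n), phi(n), theta(n) {}
};

// The same toy propagation step in both layouts. The SoA loop reads and
// writes unit-stride memory, which compilers can auto-vectorize.
void propagateAoS(std::vector<TrackAoS>& ts, float dz) {
  for (auto& t : ts) t.z += dz;
}
void propagateSoA(TracksSoA& ts, float dz) {
  for (std::size_t i = 0; i < ts.z.size(); ++i) ts.z[i] += dz;
}
```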
DOI: 10.1051/epjconf/202429511017
2024
Performance of Heterogeneous Algorithm Scheduling in CMSSW
The CMS experiment started to utilize Graphics Processing Units (GPU) to accelerate the online reconstruction and event selection running on its High Level Trigger (HLT) farm in the 2022 data taking period. The projections of the HLT farm to the High-Luminosity LHC foresee a significant use of compute accelerators in the LHC Run 4 and onwards in order to keep the cost, size, and power budget of the farm under control. This direction of leveraging compute accelerators has synergies with the increasing use of HPC resources in HEP computing, as HPC machines are employing more and more compute accelerators that are predominantly GPUs today. In this work we review the features developed for the CMS data processing framework, CMSSW, to support the effective utilization of both compute accelerators and many-core CPUs within a highly concurrent task-based framework. We measure the impact of various design choices for the scheduling of heterogeneous algorithms on the event processing throughput, using the Run-3 HLT application as a realistic use case.
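The basic building block of such a task-based framework can be sketched with oneTBB directly; CMSSW's real scheduling machinery is far more elaborate, so the snippet below only illustrates the concurrency primitive:

```cpp
#include <tbb/task_group.h>
#include <cstdio>

int main() {
  tbb::task_group tg;
  // Two independent "algorithms" run as concurrent tasks. In a heterogeneous
  // framework, a GPU-offloading algorithm is just another task that frees its
  // worker thread while the device is busy.
  tg.run([] { std::printf("algorithm A\n"); });
  tg.run([] { std::printf("algorithm B\n"); });
  tg.wait();  // join before the event moves on
  return 0;
}
```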
DOI: 10.1051/epjconf/202429511003
2024
Application of performance portability solutions for GPUs and many-core CPUs to track reconstruction kernels
Next generation High-Energy Physics (HEP) experiments are presented with significant computational challenges, both in terms of data volume and processing power. Using compute accelerators, such as GPUs, is one of the promising ways to provide the necessary computational power to meet the challenge. The current programming models for compute accelerators often involve using architecture-specific programming languages promoted by the hardware vendors and hence limit the set of platforms that the code can run on. Developing software with platform restrictions is especially infeasible for HEP communities as it takes significant effort to convert typical HEP algorithms into ones that are efficient for compute accelerators. Multiple performance portability solutions, which allow the code to be executed on hardware from different vendors, have recently emerged and provide an alternative path for using compute accelerators. We apply several portability solutions, such as Kokkos, SYCL, C++17 std::execution::par, Alpaka, and OpenMP/OpenACC, on two mini-apps extracted from the mkFit project: p2z and p2r. These apps include basic kernels for a Kalman filter track fit, such as propagation and update of track parameters, for detectors at a fixed z or fixed r position, respectively. The two mini-apps explore different memory layout formats. We report on the development experience with different portability solutions, as well as their performance on GPUs and many-core CPUs, measured as the throughput of the kernels from different GPU and CPU vendors such as NVIDIA, AMD and Intel.
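Of the options listed above, the C++17 standard parallel algorithms are the most compact to illustrate; a hedged sketch (a toy loop, not an excerpt from the mini-apps):

```cpp
#include <algorithm>
#include <execution>
#include <vector>

int main() {
  std::vector<float> z(1'000'000, 0.0f);
  const float dz = 0.1f;
  // A single source expresses the parallel loop; the toolchain decides where
  // it runs (CPU threads with GCC+TBB, or a GPU with nvc++ -stdpar).
  std::for_each(std::execution::par, z.begin(), z.end(),
                [dz](float& zi) { zi += dz; });
  return 0;
}
```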
DOI: 10.1051/epjconf/202429503019
2024
Generalizing mkFit and its Application to HL-LHC
mkFit is an implementation of the Kalman filter-based track reconstruction algorithm that exploits both thread- and data-level parallelism. In the past few years the project transitioned from the R&D phase to deployment in the Run-3 offline workflow of the CMS experiment. The CMS tracking performs a series of iterations, targeting reconstruction of tracks of increasing difficulty after removing hits associated to tracks found in previous iterations. mkFit has been adopted for several of the tracking iterations, which contribute to the majority of reconstructed tracks. When tested in the standard conditions for production jobs, speedups in track pattern recognition are on average of the order of 3.5x for the iterations where it is used (3-7x depending on the iteration). Multiple factors contribute to the observed speedups, including vectorization and a lightweight geometry description, as well as improved memory management and single precision. Efficient vectorization is achieved with both the icc and the gcc (default in CMSSW) compilers and relies on a dedicated library for small matrix operations, Matriplex, which has recently been released in a public repository. While the mkFit geometry description already featured levels of abstraction from the actual Phase-1 CMS tracker, several components of the implementations were still tied to that specific geometry. We have further generalized the geometry description and the configuration of the run-time parameters, in order to enable support for the Phase-2 upgraded tracker geometry for the HL-LHC and potentially other detector configurations. The implementation strategy and high-level code changes required for the HL-LHC geometry are presented. Speedups in track building from mkFit imply that track fitting becomes a comparably time-consuming step of the tracking chain. Prospects for an mkFit implementation of the track fit are also discussed.
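The idea behind Matriplex can be conveyed in a few lines: the elements of N small matrices are interleaved so that element (i,j) of all N matrices sits contiguously, turning small-matrix arithmetic into long, unit-stride loops. The sketch below shows the storage scheme only, not the library's actual interface:

```cpp
#include <array>
#include <cstddef>

template <typename T, std::size_t D1, std::size_t D2, std::size_t N>
struct Plex {
  // data[(i*D2 + j)*N + n] holds element (i,j) of matrix n.
  std::array<T, D1 * D2 * N> data{};  // value-initialized to zero
  T& at(std::size_t i, std::size_t j, std::size_t n) {
    return data[(i * D2 + j) * N + n];
  }
};

// C += A * B for all N matrices at once; the innermost loop over n is
// unit-stride and vectorizes across matrices (assumes C starts zeroed,
// as a freshly constructed Plex is).
template <typename T, std::size_t D, std::size_t N>
void multiply(Plex<T, D, D, N>& A, Plex<T, D, D, N>& B, Plex<T, D, D, N>& C) {
  for (std::size_t i = 0; i < D; ++i)
    for (std::size_t j = 0; j < D; ++j)
      for (std::size_t k = 0; k < D; ++k)
        for (std::size_t n = 0; n < N; ++n)
          C.at(i, j, n) += A.at(i, k, n) * B.at(k, j, n);
}
```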
DOI: 10.1051/epjconf/202429511008
2024
Evaluating Performance Portability with the CMS Heterogeneous Pixel Reconstruction code
In the past years the landscape of tools for expressing parallel algorithms in a portable way across various compute accelerators has continued to evolve significantly. There are many technologies on the market that provide portability between CPU, GPUs from several vendors, and in some cases even FPGAs. These technologies include C++ libraries such as Alpaka and Kokkos, compiler directives such as OpenMP, the SYCL open specification that can be implemented as a library or in a compiler, and standard C++ where the compiler is solely responsible for the offloading. Given this developing landscape, users have to choose the technology that best fits their applications and constraints. For example, in the CMS experiment the experience so far in heterogeneous reconstruction algorithms suggests that the full application contains a large number of relatively short computational kernels and memory transfer operations. In this work we use a stand-alone version of the CMS heterogeneous pixel reconstruction code as a realistic use case of HEP reconstruction software that is capable of leveraging GPUs effectively. We summarize the experience of porting this code base from CUDA to Alpaka, Kokkos, SYCL, std::par, and OpenMP offloading. We compare the event processing throughput achieved by each version on NVIDIA and AMD GPUs as well as on a CPU, and compare those to what a native version of the code achieves on each platform.
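As a flavor of the portability technologies compared above, a minimal Kokkos kernel; the same source compiles for CUDA, HIP, SYCL, or OpenMP backends depending on how Kokkos is configured (a toy loop, far simpler than the pixel reconstruction kernels):

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1 << 20;
    Kokkos::View<float*> z("z", n);  // allocated in the backend's memory space
    Kokkos::parallel_for("propagate", n, KOKKOS_LAMBDA(const int i) {
      z(i) += 0.1f;
    });
    Kokkos::fence();  // wait for the asynchronous kernel
  }
  Kokkos::finalize();
  return 0;
}
```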
DOI: 10.1016/j.nima.2009.01.189
2009
Cited 15 times
Off-line calibration and data analysis for the silicon beam telescope on the CERN H2 beam
The Silicon Beam Telescope (SiBT07) at the CERN H2 beam is a position-sensitive beam telescope targeted for LHC upgrade tests. The telescope consists of eight consecutive silicon microstrip detectors and slots for two test detectors. This article describes the reconstruction of reference tracks with the CMS data analysis software CMSSW. The related data analysis and calibration procedures, including pedestal corrections, common-mode corrections, and track-based alignment, are also described.
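A simplified per-event sketch of the pedestal and common-mode corrections mentioned above (the median-based common-mode estimate is one common choice, assumed here for illustration; the actual CMSSW calibration modules are more sophisticated):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// adc: raw strip values for one event; pedestal: per-strip means from
// dedicated calibration runs.
std::vector<float> correctStrips(const std::vector<float>& adc,
                                 const std::vector<float>& pedestal) {
  std::vector<float> s(adc.size());
  for (std::size_t i = 0; i < adc.size(); ++i) s[i] = adc[i] - pedestal[i];

  // Common mode: a robust per-event baseline shift, estimated here as the
  // median of the pedestal-subtracted strips (valid if few strips carry signal).
  std::vector<float> tmp = s;
  std::nth_element(tmp.begin(), tmp.begin() + tmp.size() / 2, tmp.end());
  const float commonMode = tmp[tmp.size() / 2];

  for (auto& v : s) v -= commonMode;
  return s;
}
```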
DOI: 10.1007/s12350-017-0844-9
2018
Cited 12 times
Respiratory motion reduction with a dual gating approach in myocardial perfusion SPECT: Effect on left ventricular functional parameters
DOI: 10.1088/1742-6596/2438/1/012058
2023
Performance portability for the CMS Reconstruction with Alpaka
For CMS, heterogeneous computing is a powerful tool to face the computational challenges posed by the upgrades of the LHC, and will be used in production at the High Level Trigger during Run 3. In principle, offloading the computational work onto non-CPU resources while retaining their performance requires different implementations of the same code. This would introduce code duplication, which is not sustainable in terms of maintainability and testability of the software. Performance portability libraries make it possible to write code once and run it on different architectures with close-to-native performance. The CMS experiment is evaluating performance portability libraries for the near-term future.
DOI: 10.48550/arxiv.2306.15869
2023
Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics
High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code, for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple, architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues. The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using representative use cases from major HEP experiments, including the DUNE experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS experiments of the Large Hadron Collider. This cross-cutting evaluation of portability solutions using real applications will help inform and guide the HEP community when choosing their software and hardware suites for the next generation of experimental frameworks. We present the outcomes of our studies, including performance metrics, porting challenges, API evaluations, and build system integration.
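Among the options named above, OpenMP offloading is the tersest to sketch; with target-offload support the loop below runs on a GPU, and otherwise falls back to the host (a toy loop, not taken from the evaluated use cases):

```cpp
// Compile with an OpenMP compiler that supports target offload,
// e.g. clang++ -fopenmp with an appropriate offload target.
void propagate(float* z, long long n, float dz) {
  #pragma omp target teams distribute parallel for map(tofrom: z[0:n])
  for (long long i = 0; i < n; ++i) z[i] += dz;
}
```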
DOI: 10.1016/j.nima.2009.08.017
2010
Cited 9 times
Test beam results of heavily irradiated magnetic Czochralski silicon (MCz-Si) strip detectors
Strip detectors with an area of 16 cm^2 were processed on high resistivity n-type magnetic Czochralski silicon. In addition, detectors were processed on high resistivity Float Zone wafers with the same mask set for comparison. The detectors were irradiated to several different fluences up to 3×10^15 1 MeV n_eq/cm^2 with protons or with mixed protons and neutrons. The detectors were fully characterized with CV and IV measurements prior to and after the irradiation. The beam test was carried out at the CERN H2 beam line using a silicon beam telescope that determines the tracks of the incoming particles and hence provides a reference measurement for the detector characterization. The n-type MCz-Si strip detectors have an acceptable S/N at least up to the fluence of 1×10^15 n_eq/cm^2 and thus they are a feasible option for the strip detector layers in the SLHC tracking systems.
DOI: 10.1016/j.nima.2009.01.071
2009
Cited 8 times
TCT and test beam results of irradiated magnetic Czochralski silicon (MCz-Si) detectors
Pad and strip detectors processed on high resistivity n-type magnetic Czochralski silicon (MCz-Si) were irradiated to several different fluences with protons. The pad detectors were characterized with the Transient Current Technique (TCT) and the full-size strip detectors with a reference beam telescope and a 225 GeV muon beam. The TCT measurements indicate a double junction structure and space charge sign inversion in MCz-Si detectors after a 6×10^14 1 MeV n_eq/cm^2 fluence. In the beam test a S/N of 50 was measured for a non-irradiated MCz-Si sensor, and a S/N of 20 for the sensors irradiated to the fluences of 1×10^14 1 MeV n_eq/cm^2 and 5×10^14 1 MeV n_eq/cm^2.
DOI: 10.1016/j.nima.2010.06.327
2011
Cited 7 times
Czochralski silicon as a detector material for S-LHC tracker volumes
With an expected 10-fold increase in luminosity in S-LHC, the radiation environment in the tracker volumes will be considerably harsher for silicon-based detectors than the already harsh LHC environment. Since 2006, a group of CMS institutes, using a modified CMS DAQ system, has been exploring the use of Magnetic Czochralski silicon as a detector element for the strip tracker layers in S-LHC experiments. Both p+/n-/n+ and n+/p-/p+ sensors have been characterized, irradiated with proton and neutron sources, assembled into modules, and tested in a CERN beamline. There have been three beam studies to date and results from these suggest that both p+/n-/n+ and n+/p-/p+ Magnetic Czochralski silicon are sufficiently radiation hard for the R>25cm regions of S-LHC tracker volumes. The group has also explored the use of forward biasing for heavily irradiated detectors, and although this mode requires sensor temperatures less than −50 °C, the charge collection efficiency appears to be promising.
DOI: 10.1109/tsc.2015.2469292
2018
Cited 7 times
Secure Cloud Connectivity for Scientific Applications
Cloud computing improves utilization and flexibility in allocating computing resources while reducing the infrastructural costs. However, in many cases cloud technology is still proprietary and tainted by security issues rooted in the multi-user and hybrid cloud environment. A lack of secure connectivity in a hybrid cloud environment hinders the adaptation of clouds by scientific communities that require scaling-out of the local infrastructure using publicly available resources for large-scale experiments. In this article, we present a case study of the DII-HEP secure cloud infrastructure and propose an approach to securely scale-out a private cloud deployment to public clouds in order to support hybrid cloud scenarios. A challenge in such scenarios is that cloud vendors may offer varying and possibly incompatible ways to isolate and interconnect virtual machines located in different cloud networks. Our approach is tenant driven in the sense that the tenant provides its connectivity mechanism. We provide a qualitative and quantitative analysis of a number of alternatives to solve this problem. We have chosen one of the standardized alternatives, Host Identity Protocol, for further experimentation in a production system because it supports legacy applications in a topologically-independent and secure way.
DOI: 10.1007/s12149-019-01335-y
2019
Cited 6 times
Effect of respiratory motion on cardiac defect contrast in myocardial perfusion SPECT: a physical phantom study
Correction for respiratory motion in myocardial perfusion imaging requires sorting of emission data into respiratory windows where the intra-window motion is assumed to be negligible. However, it is unclear how much intra-window motion is acceptable. The aim of this study was to determine an optimal value of intra-window residual motion. A custom-designed cardiac phantom was created and imaged with a standard dual-detector SPECT/CT system using Tc-99m as the radionuclide. Projection images were generated from the list-mode data simulating respiratory motion blur of several magnitudes from 0 (stationary phantom) to 20 mm. Cardiac defect contrasts in six anatomically different locations, as well as myocardial perfusion of the apex, anterior, inferior, septal and lateral walls, were measured at each motion magnitude. Stationary phantom data were compared to motion-blurred data. Two physicians viewed the images and evaluated differences in cardiac defect visibility and myocardial perfusion. Significant associations were observed between myocardial perfusion in the anterior and inferior walls and respiratory motion. Defect contrasts were found to decline as a function of motion, but the magnitude of the decline depended on the location and shape of the defect. Defects located near the cardiac apex lost contrast more rapidly than those located on the anterior, inferior, septal and lateral wall. The contrast decreased by less than 5% at every location when the motion magnitude was 2 mm or less. According to a visual evaluation, there were differences in myocardial perfusion if the magnitude of the motion was greater than 1 mm, and there were differences in the visibility of the cardiac defect if the magnitude of the motion was greater than 9 mm. Intra-window respiratory motion should be limited to 2 mm to effectively correct for respiratory motion blur in myocardial perfusion SPECT.
DOI: 10.48550/arxiv.2203.09945
2022
Cited 3 times
Portability: A Necessary Approach for Future Scientific Software
Today's world of scientific software for High Energy Physics (HEP) is powered by x86 code, while the future will be much more reliant on accelerators like GPUs and FPGAs. The portable parallelization strategies (PPS) project of the High Energy Physics Center for Computational Excellence (HEP/CCE) is investigating solutions for portability techniques that will allow the coding of an algorithm once, and the ability to execute it on a variety of hardware products from many vendors, especially including accelerators. We think that without these solutions the scientific success of our experiments and endeavors is in danger, as software development would become expert-driven and costly merely to keep running on the available hardware infrastructure. We think the best solution for the community would be an extension to the C++ standard with a very low entry bar for users, supporting all hardware forms and vendors. We are very far from that ideal though. We argue that in the future, as a community, we need to request and work on portability solutions and strive to reach this ideal.
DOI: 10.1016/j.nima.2009.08.006
2010
Cited 6 times
Test beam results of a heavily irradiated Current Injected Detector (CID)
A heavily irradiated (3×10^15 1 MeV n_eq/cm^2) Current Injected Detector (CID) was tested with a 225 GeV muon beam at the CERN H2 beam line. In the CID concept the current is limited by the space charge. The injected carriers will be trapped by the deep levels and this induces a stable electric field through the entire bulk regardless of the irradiation fluence the detector has been exposed to. The steady-state density of the trapped charge is defined by the balance between the trapping and the emission rates of charge carriers (detrapping). Thus, the amount of charge injection needed for the electric field stabilization depends on the temperature. An AC-coupled 16 cm^2 detector was processed on high resistivity n-type magnetic Czochralski silicon; it had 768 strips, 50 μm pitch, 10 μm strip width and 3.9 cm strip length. The beam test was carried out using a silicon beam telescope that is based on the CMS detector readout prototype components, APV25 readout chips, and eight strip sensors made by Hamamatsu having 60 μm pitch and intermediate strips. The tested CID detector was bonded to the APV25 readout, and it was operated at temperatures ranging from −40 to −53 °C. The CID detector irradiated to a 3×10^15 1 MeV n_eq/cm^2 fluence shows about 40% relative Charge Collection Efficiency with respect to the non-irradiated reference plane sensors.
DOI: 10.1088/1748-0221/15/09/p09030
2020
Cited 5 times
Speeding up particle track reconstruction using a parallel Kalman filter algorithm
One of the most computationally challenging problems expected for the High-Luminosity Large Hadron Collider (HL-LHC) is determining the trajectory of charged particles during event reconstruction. Algorithms used at the LHC today rely on Kalman filtering, which builds physical trajectories incrementally while incorporating material effects and error estimation. Recognizing the need for faster computational throughput, we have adapted Kalman-filter-based methods for highly parallel, many-core SIMD architectures that are now prevalent in high-performance hardware. In this paper, we discuss the design and performance of the improved tracking algorithm, referred to as MKFIT. A key piece of the algorithm is the MATRIPLEX library, containing dedicated code to optimally vectorize operations on small matrices. The physics performance of the MKFIT algorithm is comparable to the nominal CMS tracking algorithm when reconstructing tracks from simulated proton-proton collisions within the CMS detector. We study the scaling of the algorithm as a function of the parallel resources utilized and find large speedups both from vectorization and multi-threading. MKFIT achieves a speedup of a factor of 6 compared to the nominal algorithm when run in a single-threaded application within the CMS software framework.
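For reference, the Kalman filter step that such track fitters apply at each detector layer, in standard textbook notation (MKFIT's concrete parametrization and material treatment differ in detail):

```latex
\begin{aligned}
\text{predict:}\quad & x_{k|k-1} = F_k\, x_{k-1|k-1}, \qquad
                       P_{k|k-1} = F_k P_{k-1|k-1} F_k^{\mathsf T} + Q_k \\
\text{gain:}\quad    & K_k = P_{k|k-1} H_k^{\mathsf T}
                       \bigl(H_k P_{k|k-1} H_k^{\mathsf T} + R_k\bigr)^{-1} \\
\text{update:}\quad  & x_{k|k} = x_{k|k-1} + K_k\,(m_k - H_k\, x_{k|k-1}), \qquad
                       P_{k|k} = (I - K_k H_k)\, P_{k|k-1}
\end{aligned}
```

Here $x$ is the track-state vector, $P$ its covariance, $F_k$ the propagation matrix, $Q_k$ the material (process) noise, and $m_k$ the hit measurement with model $H_k$ and covariance $R_k$.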
DOI: 10.1109/tns.2010.2050905
2010
Cited 5 times
Track-Induced Clustering in Position Sensitive Detector Characterization
The formation of clusters in the data analysis of position-sensitive detectors is traditionally based on signal-to-noise ratio thresholds. For detectors with a very low signal-to-noise ratio, e.g., as a result of radiation damage, the total collected charge obtained from the clusters is biased to the greater signal values resulting from the thresholds. In this paper an unbiased method to measure the charge collection of a silicon strip detector in a test beam environment is presented. The method is based on constructing the clusters on test detectors around the impact point of the reference track.
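A hedged sketch of the unbiased clustering idea (illustrative names and a fixed window; the paper's exact windowing may differ): instead of seeding clusters with a signal-to-noise threshold, the charge is summed in a fixed strip window centered on the reference track's predicted impact strip.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

float trackInducedClusterCharge(const std::vector<float>& stripCharge,
                                std::size_t impactStrip,
                                std::size_t halfWidth) {
  const std::size_t lo = impactStrip > halfWidth ? impactStrip - halfWidth : 0;
  const std::size_t hi = std::min(stripCharge.size() - 1, impactStrip + halfWidth);
  float sum = 0.0f;
  for (std::size_t i = lo; i <= hi; ++i) sum += stripCharge[i];  // no S/N cut
  return sum;
}
```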
DOI: 10.1186/s40658-019-0261-z
2019
Cited 4 times
Time-modified OSEM algorithm for more robust assessment of left ventricular dyssynchrony with phase analysis in ECG-gated myocardial perfusion SPECT
Background: In ordered subsets expectation maximization (OSEM) reconstruction of electrocardiography (ECG)-gated myocardial perfusion single-photon emission computed tomography (SPECT), it is often assumed that the image acquisition time is constant for each projection angle and ECG bin. Due to heart rate variability (HRV), this assumption may lead to errors in quantification of left ventricular mechanical dyssynchrony with phase analysis. We hypothesize that a time-modified OSEM (TOSEM) algorithm provides more robust results. Methods: List-mode data of 44 patients were acquired with a dual-detector SPECT/CT system and binned to eight ECG bins. First, the activity ratio (AR), the ratio of total activity in the last OSEM-reconstructed ECG bin to that in the first five ECG bins, was computed, as well as the standard deviation SD_R-R of the accepted R–R intervals; their association was evaluated with Pearson correlation analysis. Subsequently, patients whose AR was higher than 90% were selected, and their list-mode data were rebinned by omitting a part of the acquired counts to yield AR values of 90%, 80%, 70%, 60% and 50%. These data sets were reconstructed with OSEM and TOSEM algorithms, and phase analysis was performed. Reliability of both algorithms was assessed by computing concordance correlation coefficients (CCCs) between the 90% data and data corresponding to lower AR values. Finally, phase analysis results assessed from OSEM- and TOSEM-reconstructed images were compared. Results: A strong negative correlation (r = −0.749) was found between SD_R-R and AR. As AR decreased, phase analysis parameters obtained from OSEM images decreased significantly. On the contrary, reduction of AR had no significant effect on phase analysis parameters obtained from TOSEM images (CCC > 0.88). The magnitude of difference between OSEM and TOSEM results increased as AR decreased. Conclusions: The TOSEM algorithm minimizes the HRV-related error and can be used to provide more robust phase analysis results.
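For context, the standard OSEM update for subset $S_b$ in textbook form; the time-modified variant reweights the forward model by the measured per-bin acquisition time instead of assuming it constant (that one-line description is a paraphrase of the idea, not the paper's exact formulation):

```latex
x_j^{(k+1)} \;=\; \frac{x_j^{(k)}}{\sum_{i \in S_b} a_{ij}}
\sum_{i \in S_b} a_{ij}\, \frac{y_i}{\sum_{l} a_{il}\, x_l^{(k)}}
```

Here $y_i$ are the measured projection counts, $a_{ij}$ the system matrix, and $x_j$ the image estimate.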
DOI: 10.1109/tns.2009.2013950
2009
Cited 3 times
Recent Progress in CERN RD39: Radiation Hard Cryogenic Silicon Detectors for Applications in LHC Experiments and Their Future Upgrades
CERN RD39 Collaboration develops radiation-hard cryogenic silicon detectors. Recently, we have demonstrated improved radiation hardness in novel Current Injected Detectors (CID). For detector characterization, we have applied the cryogenic Transient Current Technique (C-TCT). In beam tests, a heavily irradiated CID detector showed capability for particle detection. Our results show that the CID detectors are operational at a temperature of −50 °C after a fluence of 1×10^16 1 MeV neutron equivalent/cm^2.
DOI: 10.48550/arxiv.2304.05853
2023
Speeding up the CMS track reconstruction with a parallelized and vectorized Kalman-filter-based algorithm during the LHC Run 3
One of the most challenging computational problems in Run 3 of the Large Hadron Collider (LHC), and more so in the High-Luminosity LHC (HL-LHC), is expected to be finding and fitting charged-particle tracks during event reconstruction. The methods used so far at the LHC and in particular at the CMS experiment are based on the Kalman filter technique. Such methods have been shown to be robust and to provide good physics performance, both in the trigger and offline. In order to improve computational performance, we explored Kalman-filter-based methods for track finding and fitting, adapted for many-core SIMD architectures. This adapted Kalman-filter-based software, called "mkFit", was shown to provide a significant speedup compared to the traditional algorithm, thanks to its parallelized and vectorized implementation. The mkFit software was recently integrated into the offline CMS software framework, in view of its exploitation during Run 3 of the LHC. At the start of the LHC Run 3, mkFit will be used for track finding in a subset of the CMS offline track reconstruction iterations, allowing for significant improvements over the existing framework in terms of computational performance, while retaining comparable physics performance. The performance of the CMS track reconstruction using mkFit at the start of the LHC Run 3 is presented, together with prospects of further improvement in the upcoming years of data taking.
DOI: 10.2172/1973419
2023
Evaluating Performance Portability with the CMS Heterogeneous Pixel Reconstruction code
DOI: 10.2172/1973609
2023
Performance of Heterogeneous Algorithm Scheduling in CMSSW
during processing. This was accomplished by amortizing the memory cost of EventSetup data products across multiple concurrent events. To initially accomplish that goal required synchronizing event processing across IOV boundaries, thereby decreasing the scalability of the system. In this presentation we will explain how we used 'limited concurrent task queues' to allow concurrent IOVs while still being able to limit the memory utilized.
DOI: 10.48550/arxiv.2312.11728
2023
Generalizing mkFit and its Application to HL-LHC
mkFit is an implementation of the Kalman filter-based track reconstruction algorithm that exploits both thread- and data-level parallelism. In the past few years the project transitioned from the R&D phase to deployment in the Run-3 offline workflow of the CMS experiment. The CMS tracking performs a series of iterations, targeting reconstruction of tracks of increasing difficulty after removing hits associated to tracks found in previous iterations. mkFit has been adopted for several of the tracking iterations, which contribute to the majority of reconstructed tracks. When tested in the standard conditions for production jobs, speedups in track pattern recognition are on average of the order of 3.5x for the iterations where it is used (3-7x depending on the iteration). Multiple factors contribute to the observed speedups, including vectorization and a lightweight geometry description, as well as improved memory management and single precision. Efficient vectorization is achieved with both the icc and the gcc (default in CMSSW) compilers and relies on a dedicated library for small matrix operations, Matriplex, which has recently been released in a public repository. While the mkFit geometry description already featured levels of abstraction from the actual Phase-1 CMS tracker, several components of the implementations were still tied to that specific geometry. We have further generalized the geometry description and the configuration of the run-time parameters, in order to enable support for the Phase-2 upgraded tracker geometry for the HL-LHC and potentially other detector configurations. The implementation strategy and high-level code changes required for the HL-LHC geometry are presented. Speedups in track building from mkFit imply that track fitting becomes a comparably time-consuming step of the tracking chain.
2019
Speeding up Particle Track Reconstruction in the CMS Detector using a Vectorized and Parallelized Kalman Filter Algorithm
Building particle tracks is the most computationally intense step of event reconstruction at the LHC. With the increased instantaneous luminosity and associated increase in pileup expected from the High-Luminosity LHC, the computational challenge of track finding and fitting requires novel solutions. The current track reconstruction algorithms used at the LHC are based on Kalman filter methods that achieve good physics performance. By adapting the Kalman filter techniques for use on many-core SIMD architectures such as the Intel Xeon and Intel Xeon Phi and (to a limited degree) NVIDIA GPUs, we are able to obtain significant speedups and comparable physics performance. New optimizations, including a dedicated post-processing step to remove duplicate tracks, have improved the algorithm's performance even further. Here we report on the current structure and performance of the code and future plans for the algorithm.
DOI: 10.1051/epjconf/202024502013
2020
Reconstruction of Charged Particle Tracks in Realistic Detector Geometry Using a Vectorized and Parallelized Kalman Filter Algorithm
One of the most computationally challenging problems expected for the High-Luminosity Large Hadron Collider (HL-LHC) is finding and fitting particle tracks during event reconstruction. Algorithms used at the LHC today rely on Kalman filtering, which builds physical trajectories incrementally while incorporating material effects and error estimation. Recognizing the need for faster computational throughput, we have adapted Kalman-filter-based methods for highly parallel, many-core SIMD and SIMT architectures that are now prevalent in high-performance hardware. Previously we observed significant parallel speedups, with physics performance comparable to CMS standard tracking, on Intel Xeon, Intel Xeon Phi, and (to a limited extent) NVIDIA GPUs. While early tests were based on artificial events occurring inside an idealized barrel detector, we showed subsequently that our mkFit software builds tracks successfully from complex simulated events (including detector pileup) occurring inside a geometrically accurate representation of the CMS-2017 tracker. Here, we report on advances in both the computational and physics performance of mkFit, as well as progress toward integration with CMS production software. Recently we have improved the overall efficiency of the algorithm by preserving short track candidates at a relatively early stage rather than attempting to extend them over many layers. Moreover, mkFit formerly produced an excess of duplicate tracks; these are now explicitly removed in an additional processing step. We demonstrate that with these enhancements, mkFit becomes a suitable choice for the first iteration of CMS tracking, and eventually for later iterations as well. We plan to test this capability in the CMS High Level Trigger during Run 3 of the LHC, with an ultimate goal of using it in both the CMS HLT and offline reconstruction for the HL-LHC CMS tracker.
DOI: 10.1051/epjconf/201921402002
2019
Parallelized and Vectorized Tracking Using Kalman Filters with CMS Detector Geometry and Events
The High-Luminosity Large Hadron Collider at CERN will be characterized by greater pileup of events and higher occupancy, making the track reconstruction even more computationally demanding. Existing algorithms at the LHC are based on Kalman filter techniques with proven excellent physics performance under a variety of conditions. Starting in 2014, we have been developing Kalman-filter-based methods for track finding and fitting adapted for many-core SIMD processors that are becoming dominant in high-performance systems. This paper summarizes the latest extensions to our software that allow it to run on the realistic CMS-2017 tracker geometry using CMSSW-generated events, including pileup. The reconstructed tracks can be validated against either the CMSSW simulation that generated the detector hits, or the CMSSW reconstruction of the tracks. In general, the code’s computational performance has continued to improve while the above capabilities were being added. We demonstrate that the present Kalman filter implementation is able to reconstruct events with comparable physics performance to CMSSW, while providing generally better computational performance. Further plans for advancing the software are discussed.
DOI: 10.1109/cybersa.2016.7503294
2016
Instant message classification in Finnish cyber security themed free-form discussion
Instant messaging enables rapid collaboration between professionals during cyber security incidents. However, monitoring discussion manually becomes challenging as the number of communication channels increases. Failure to identify relevant information from the free-form instant messages may lead to reduced situational awareness. In this paper, the problem was approached by developing a framework for classification of instant message topics of cyber security-themed discussion in Finnish. The program utilizes open source software components in morphological analysis, and subsequently converts the messages into Bag-of-Words representations before classifying them into predetermined incident categories. We compared support vector machines (SVM), multinomial naïve Bayes, and complement naïve Bayes (CNB) classification methods with five-fold cross-validation. A combination of SVM and CNB achieved classification accuracy of over 85 %, while multiclass SVM achieved 87 % accuracy. The implemented program recognizes cyber security-related messages in IRC chat rooms and categorizes them accordingly.
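A minimal sketch of the Bag-of-Words step described above (whitespace tokenization only; the actual pipeline first applies Finnish morphological analysis, which is omitted here):

```cpp
#include <map>
#include <sstream>
#include <string>

// Feature vector for one message: token -> frequency.
std::map<std::string, int> bagOfWords(const std::string& message) {
  std::map<std::string, int> counts;
  std::istringstream in(message);
  for (std::string token; in >> token;) ++counts[token];
  return counts;
}
```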
DOI: 10.22619/ijcsa.2016.100105
2016
Instant Message Classification in Finnish Cyber Security Themed Free-Form Discussion
Instant messaging enables rapid collaboration between professionals during cyber security incidents. However, monitoring discussion manually becomes challenging as the number of communication channels increases. Failure to identify relevant information from the free-form instant messages may lead to reduced situational awareness. In this paper, the problem was approached by developing a framework for classification of instant message topics of cyber security-themed discussion in Finnish. The program utilizes open source software components in morphological analysis, and subsequently converts the messages into Bag-of-Words representations before classifying them into predetermined incident categories. We compared support vector machine (SVM), multinomial naïve Bayes (MNB), and complement naïve Bayes (CNB) classification methods with five-fold cross-validation. A combination of SVM and CNB achieved classification accuracy of over 85%, while multiclass SVM achieved 87% accuracy. The implemented program recognizes cyber security-related messages in IRC chat rooms and categorizes them accordingly.
DOI: 10.1088/1742-6596/608/1/012010
2015
An overview of the DII-HEP OpenStack based CMS data analysis
An OpenStack based private cloud with the Cluster File System has been built and used with both CMS analysis and Monte Carlo simulation jobs in the Datacenter Indirection Infrastructure for Secure High Energy Physics (DII-HEP) project. On the cloud we run the ARC middleware that allows running CMS applications without changes on the job submission side. Our test results indicate that the adopted approach provides a scalable and resilient solution for managing resources without compromising on performance and high availability.
DOI: 10.2172/1570206
2019
CMS Patatrack Project [PowerPoint]
This talk presents the technical performance and lessons learned of the Patatrack demonstrator, where the CMS pixel local reconstruction and pixel-only track reconstruction have been ported to NVIDIA GPUs. The demonstrator is run within the CMS software framework (CMSSW), and the model of integrating CUDA algorithms into CMSSW is discussed as well.
2020
Heterogeneous reconstruction of tracks and primary vertices with the CMS pixel tracker
The High-Luminosity upgrade of the LHC will see the accelerator reach an instantaneous luminosity of $7\times 10^{34} cm^{-2}s^{-1}$ with an average pileup of $200$ proton-proton collisions. These conditions will pose an unprecedented challenge to the online and offline reconstruction software developed by the experiments. The computational complexity will exceed by far the expected increase in processing power for conventional CPUs, demanding an alternative approach. Industry and High-Performance Computing (HPC) centres are successfully using heterogeneous computing platforms to achieve higher throughput and better energy efficiency by matching each job to the most appropriate architecture. In this paper we will describe the results of a heterogeneous implementation of pixel tracks and vertices reconstruction chain on Graphics Processing Units (GPUs). The framework has been designed and developed to be integrated in the CMS reconstruction software, CMSSW. The speed up achieved by leveraging GPUs allows for more complex algorithms to be executed, obtaining better physics output and a higher throughput.
DOI: 10.2172/1630717
2019
Bringing heterogeneity to the CMS software framework [Slides]
Co-processors or accelerators like GPUs and FPGAs are becoming more and more popular. CMS’ data processing framework (CMSSW) implements multi-threading using Intel TBB utilizing tasks as concurrent units of work. We have developed generic mechanisms within the CMSSW framework to interact effectively with non-CPU resources and configure CPU and non-CPU algorithms in a unified way. As a first step to gain experience, we have explored mechanisms for how algorithms could offload work to NVIDIA GPUs with CUDA.
DOI: 10.48550/arxiv.2008.13461
2020
Heterogeneous reconstruction of tracks and primary vertices with the CMS pixel tracker
The High-Luminosity upgrade of the LHC will see the accelerator reach an instantaneous luminosity of $7\times 10^{34} cm^{-2}s^{-1}$ with an average pileup of $200$ proton-proton collisions. These conditions will pose an unprecedented challenge to the online and offline reconstruction software developed by the experiments. The computational complexity will exceed by far the expected increase in processing power for conventional CPUs, demanding an alternative approach. Industry and High-Performance Computing (HPC) centres are successfully using heterogeneous computing platforms to achieve higher throughput and better energy efficiency by matching each job to the most appropriate architecture. In this paper we will describe the results of a heterogeneous implementation of pixel tracks and vertices reconstruction chain on Graphics Processing Units (GPUs). The framework has been designed and developed to be integrated in the CMS reconstruction software, CMSSW. The speed up achieved by leveraging GPUs allows for more complex algorithms to be executed, obtaining better physics output and a higher throughput.
DOI: 10.1186/s40658-021-00355-w
2021
Effect of data conserving respiratory motion compensation on left ventricular functional parameters assessed in gated myocardial perfusion SPECT
Background: Respiratory motion compromises image quality in myocardial perfusion (MP) single-photon emission computed tomography (SPECT) imaging and may affect analysis of left ventricular (LV) functional parameters, including phase analysis-quantified mechanical dyssynchrony parameters. In this paper, we investigate the performance of two algorithms, respiratory blur modeling (RBM) and joint motion-compensated (JMC) ordered-subsets expectation maximization (OSEM), and the effects of motion compensation on cardiac-gated MP-SPECT studies. Methods: Image acquisitions were carried out with a dual-detector SPECT/CT system in list-mode format. A cardiac phantom was imaged as stationary and under respiratory motion. The images were reconstructed with OSEM, RBM-OSEM, and JMC-OSEM algorithms, and compared in terms of mean squared error (MSE). Subsequently, MP-SPECT data of 19 patients were binned into dual-gated (respiratory and cardiac gating) projection images. The images of the patients were analyzed with the Quantitative Gated SPECT (QGS) 2012 program (Cedars-Sinai Medical Center, USA). The parameters of interest were LV volumes, ejection fraction, wall motion, wall thickening, phase analysis, and perfusion parameters. Results: In the phantom experiment, compared to the stationary OSEM reconstruction, the MSE values for OSEM, RBM-OSEM, and JMC-OSEM were 8.5406·10^−5, 2.7190·10^−5, and 2.0795·10^−5, respectively. In the analysis of LV function, use of JMC had a small but statistically significant (p < 0.05) effect on several parameters: it increased LV volumes and the standard deviation of the phase angle histogram, and it decreased ejection fraction, global wall motion, and lateral, septal, and apical perfusion. Conclusions: Compared to the standard OSEM algorithm, RBM-OSEM and JMC-OSEM both improve image quality under motion. Motion compensation has a minor effect on LV functional parameters.
DOI: 10.1051/epjconf/202125103034
2021
Porting CMS Heterogeneous Pixel Reconstruction to Kokkos
Programming for a diverse set of compute accelerators in addition to the CPU is a challenge. Maintaining separate source code for each architecture would require lots of effort, and development of new algorithms would be daunting if it had to be repeated many times. Fortunately there are several portability technologies on the market such as Alpaka, Kokkos, and SYCL. These technologies aim to improve the developer's productivity by making it possible to use the same source code for many different architectures. In this paper we use heterogeneous pixel reconstruction code from the CMS experiment at the CERN LHC as a realistic use case of a GPU-targeting HEP reconstruction software, and report experience from prototyping a portable version of it using Kokkos. The development was done in a standalone program that attempts to model many of the complexities of a HEP data processing framework such as CMSSW. We also compare the achieved event processing throughput to the original CUDA code and a CPU version of it.
DOI: 10.1051/epjconf/202125104017
2021
Heterogeneous techniques for rescaling energy deposits in the CMS Phase-2 endcap calorimeter
We present the porting to heterogeneous architectures of the algorithm used for applying linear transformations of raw energy deposits in the CMS High Granularity Calorimeter (HGCAL). This is the first heterogeneous algorithm to be fully integrated with HGCAL’s reconstruction chain. After introducing the latter and giving a brief description of the structural components of HGCAL relevant for this work, the role of the linear transformations in the calibration is reviewed. The many ways in which parallelization is achieved are described, and the successful validation of the heterogeneous algorithm is covered. Detailed performance measurements are presented, including throughput and execution time for both CPU and GPU algorithms, therefore establishing the corresponding speedup. We finally discuss the interplay between this work and the porting of other algorithms in the existing reconstruction chain, as well as integrating algorithms previously ported but not yet integrated.
DOI: 10.22323/1.209.0010
2015
Data-driven background measurements in CMS
2014
Development of respiratory motion compensation for gated myocardial perfusion single-photon emission computed tomography
2012
Search for a Light Charged Higgs Boson in the CMS Experiment in pp Collisions at $\sqrt{s}$= 7 TeV
DOI: 10.22323/1.114.0009
2011
Missing ET and jets, trigger and reconstruction efficiency
DOI: 10.22323/1.156.0012
2013
Data-driven background estimation in CMS
DOI: 10.1088/1742-6596/219/3/032010
2010
Ideal τ tagging with the multivariate data-analysis toolkit TMVA
The experience on using ROOT package TMVA for multivariate data analysis is reported for a problem of τ tagging in the framework of heavy charged MSSM Higgs boson searches at the LHC. We investigate with a generator level analysis how the τ tagging could be performed in an ideal case, and hadronic τ decays separated from the hadronic jets of QCD multi-jet background present in LHC experiments. A successful separation of the Higgs signal from the background requires a rejection factor of 10^5 or better against the QCD background. The τ tagging efficiency and background rejection are studied with various MVA classifiers.
2022
Portability: A Necessary Approach for Future Scientific Software
DOI: 10.5281/zenodo.6323536
2022
DUNE Software Framework Requirements - HSF Review
DOI: 10.2172/1881235
2022
Heterogeneous hardware adoption and portability [Slides]
affecting execution speed, and the length of the OS scheduler time slot. We show how these features of modern multicores can be discovered programmatically. We also show how the features could interfere with each other resulting in incorrect interpretation of the results, and how established classification and statistical analysis techniques reduce experimental noise and aid automatic interpretation of results.
DOI: 10.1109/escience.2018.00090
2018
Strategies for Modeling Extreme Luminosities in the CMS Simulation
The LHC simulation frameworks are already confronting the High Luminosity LHC (HL-LHC) era. In order to design and evaluate the performance of the HL-LHC detector upgrades, the future detectors and the extreme luminosity conditions they may encounter have to be simulated realistically now. The use of many individual minimum-bias interactions to model the pileup poses several challenges to the CMS Simulation framework, including huge memory consumption, increased computation time, and the necessary handling of large numbers of event files during Monte Carlo production. Simulating a single hard scatter at an instantaneous luminosity corresponding to 200 pileup interactions per crossing can involve the input of thousands of individual minimum-bias events. Brute-force Monte Carlo production requires the overlay of these events for each hard-scatter event simulated.
DOI: 10.2172/1668396
2020
Parallelization for HEP Reconstruction
in porting existing serial algorithms to many-core devices. Measurements of both data processing and data transfer latency are shown, considering different I/O strategies to/from the parallel devices.
DOI: 10.1088/1742-6596/1525/1/012078
2020
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures with the CMS Detector
In the High-Luminosity Large Hadron Collider (HL-LHC), one of the most challenging computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction. The methods currently in use at the LHC are based on the Kalman filter. Such methods have been shown to be robust and to provide good physics performance, both in the trigger and offline. In order to improve computational performance, we explored Kalman-filter-based methods for track finding and fitting, adapted for many-core SIMD (single instruction, multiple data) and SIMT (single instruction, multiple thread) architectures. Our adapted Kalman-filter-based software has obtained significant parallel speedups using such processors, e.g., Intel Xeon Phi, Intel Xeon SP (Scalable Processors) and (to a limited degree) NVIDIA GPUs. Recently, an effort has started towards the integration of our software into the CMS software framework, in view of its exploitation for the Run III of the LHC. Prior reports have shown that our software allows in fact for some significant improvements over the existing framework in terms of computational performance with comparable physics performance, even when applied to realistic detector configurations and event complexity. Here, we demonstrate that in such conditions physics performance can be further improved with respect to our prior reports, while retaining the improvements in computational performance, by making use of the knowledge of the detector and its geometry.
DOI: 10.2172/1623357
2019
Bringing heterogeneity to the CMS software framework [Slides]
Co-processors or accelerators like GPUs and FPGAs are becoming more and more popular. CMS' data processing framework (CMSSW) implements multi-threading using Intel TBB utilizing tasks as concurrent units of work. We have developed generic mechanisms within the CMSSW framework to interact effectively with non-CPU resources and configure CPU and non-CPU algorithms in a unified way. As a first step to gain experience, we have explored mechanisms for how algorithms could offload work to NVIDIA GPUs with CUDA.
2020
Reconstruction of Charged Particle Tracks in Realistic Detector Geometry Using a Vectorized and Parallelized Kalman Filter Algorithm
DOI: 10.48550/arxiv.1906.11744
2019
Speeding up Particle Track Reconstruction in the CMS Detector using a Vectorized and Parallelized Kalman Filter Algorithm
Building particle tracks is the most computationally intense step of event reconstruction at the LHC. With the increased instantaneous luminosity and associated increase in pileup expected from the High-Luminosity LHC, the computational challenge of track finding and fitting requires novel solutions. The current track reconstruction algorithms used at the LHC are based on Kalman filter methods that achieve good physics performance. By adapting the Kalman filter techniques for use on many-core SIMD architectures such as the Intel Xeon and Intel Xeon Phi and (to a limited degree) NVIDIA GPUs, we are able to obtain significant speedups and comparable physics performance. New optimizations, including a dedicated post-processing step to remove duplicate tracks, have improved the algorithm's performance even further. Here we report on the current structure and performance of the code and future plans for the algorithm.
2018
Parallelized and Vectorized Tracking Using Kalman Filters with CMS Detector Geometry and Events
2005
Fourth-generation fission reactors and transmutation (Neljännen sukupolven fissioreaktorit ja transmutaatio)
DOI: 10.48550/arxiv.2101.11489
2021
Parallelizing the Unpacking and Clustering of Detector Data for Reconstruction of Charged Particle Tracks on Multi-core CPUs and Many-core GPUs
We present results from parallelizing the unpacking and clustering steps of the raw data from the silicon strip modules for reconstruction of charged particle tracks. Throughput is further improved by concurrently processing multiple events using nested OpenMP parallelism on CPU or CUDA streams on GPU. The new implementation along with earlier work in developing a parallelized and vectorized implementation of the combinatoric Kalman filter algorithm has enabled efficient global reconstruction of the entire event on modern computer architectures. We demonstrate the performance of the new implementation on Intel Xeon and NVIDIA GPU architectures.
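A host-side sketch of the concurrent-event idea described above using CUDA streams: each in-flight event gets its own stream, so transfers and kernels from different events can overlap (buffer sizes and the event count are illustrative; the unpacking and clustering kernels themselves are omitted):

```cpp
#include <cuda_runtime.h>
#include <vector>

int main() {
  constexpr int kEventsInFlight = 4;
  const size_t bytes = 1 << 20;
  std::vector<cudaStream_t> streams(kEventsInFlight);
  std::vector<char*> dev(kEventsInFlight);
  std::vector<char*> host(kEventsInFlight);

  for (int e = 0; e < kEventsInFlight; ++e) {
    cudaStreamCreate(&streams[e]);
    cudaMalloc(&dev[e], bytes);
    cudaMallocHost(&host[e], bytes);  // pinned, required for true async overlap
  }
  for (int e = 0; e < kEventsInFlight; ++e) {
    // Each event's chain is enqueued on its own stream and proceeds
    // independently of the others.
    cudaMemcpyAsync(dev[e], host[e], bytes, cudaMemcpyHostToDevice, streams[e]);
    // ... unpacking and clustering kernels for event e would launch here ...
    cudaMemcpyAsync(host[e], dev[e], bytes, cudaMemcpyDeviceToHost, streams[e]);
  }
  cudaDeviceSynchronize();
  for (int e = 0; e < kEventsInFlight; ++e) {
    cudaStreamDestroy(streams[e]);
    cudaFree(dev[e]);
    cudaFreeHost(host[e]);
  }
  return 0;
}
```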
DOI: 10.48550/arxiv.2104.06573
2021
Porting CMS Heterogeneous Pixel Reconstruction to Kokkos
Programming for a diverse set of compute accelerators in addition to the CPU is a challenge. Maintaining separate source code for each architecture would require lots of effort, and development of new algorithms would be daunting if it had to be repeated many times. Fortunately there are several portability technologies on the market such as Alpaka, Kokkos, and SYCL. These technologies aim to improve the developer's productivity by making it possible to use the same source code for many different architectures. In this paper we use heterogeneous pixel reconstruction code from the CMS experiment at the CERN LHC as a realistic use case of a GPU-targeting HEP reconstruction software, and report experience from prototyping a portable version of it using Kokkos. The development was done in a standalone program that attempts to model many of the complexities of a HEP data processing framework such as CMSSW. We also compare the achieved event processing throughput to the original CUDA code and a CPU version of it.
DOI: 10.1051/epjconf/202125103035
2021
Performance of CUDA Unified Memory in CMS Heterogeneous Pixel Reconstruction
The management of separate memory spaces of CPUs and GPUs brings an additional burden to the development of software for GPUs. To help with this, CUDA unified memory provides a single address space that can be accessed from both CPU and GPU. The automatic data transfer mechanism is based on page faults generated by the memory accesses. This mechanism has a performance cost that can be reduced with explicit memory prefetch requests. Various hints on the intended usage of the memory regions can also be given to further improve the performance. The overall effect of unified memory compared to an explicit memory management can depend heavily on the application. In this paper we evaluate the performance impact of CUDA unified memory using the heterogeneous pixel reconstruction code from the CMS experiment as a realistic use case of a GPU-targeting HEP reconstruction software. We also compare the programming model using CUDA unified memory to the explicit management of separate CPU and GPU memory spaces.
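A host-side sketch of the two models compared above: unified memory with a prefetch hint versus explicit host/device buffers with explicit copies (kernels and error checking omitted; sizes illustrative):

```cpp
#include <cuda_runtime.h>

int main() {
  const size_t n = 1 << 20;
  const int device = 0;

  // Unified memory: one pointer valid on both CPU and GPU. Pages migrate on
  // demand via page faults; cudaMemPrefetchAsync moves them ahead of time.
  float* u = nullptr;
  cudaMallocManaged(&u, n * sizeof(float));
  for (size_t i = 0; i < n; ++i) u[i] = 1.0f;          // touched on the CPU
  cudaMemPrefetchAsync(u, n * sizeof(float), device);  // hint: migrate to GPU
  // ... a kernel reading u would launch here ...
  cudaDeviceSynchronize();
  cudaFree(u);

  // Explicit management: separate buffers plus explicit copies.
  float *h = nullptr, *d = nullptr;
  cudaMallocHost(&h, n * sizeof(float));  // pinned host buffer
  cudaMalloc(&d, n * sizeof(float));
  cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
  // ... a kernel reading d would launch here ...
  cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d);
  cudaFreeHost(h);
  return 0;
}
```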
DOI: 10.2172/1827369
2021
Performance of CUDA Unified Memory in CMS Heterogeneous Pixel Reconstruction
memory management can depend heavily on the application. In this paper we evaluate the performance impact of CUDA unified memory using the heterogeneous pixel reconstruction code from the CMS experiment as a realistic use case of a GPU-targeting HEP reconstruction software. We also compare the programming model using CUDA unified memory to the explicit management of separate CPU and GPU memory spaces.
DOI: 10.2172/1827400
2021
Porting CMS Heterogeneous Pixel Reconstruction to Kokkos
realistic use case of a GPU-targeting HEP reconstruction software, and report experience from prototyping a portable version of it using Kokkos. The development was done in a standalone program that attempts to model many of the complexities of a HEP data processing framework such as CMSSW. We also compare the achieved event processing throughput to the original CUDA code and a CPU version of it.
2021
Performance of CUDA Unified Memory in CMS Heterogeneous Pixel Reconstruction