
Jeroen Hegeman

DOI: 10.1051/epjconf/202429502031
2024
Towards a container-based architecture for CMS data acquisition
The CMS data acquisition (DAQ) is implemented as a service-oriented architecture where DAQ applications, as well as general applications such as monitoring and error reporting, are run as self-contained services. The task of deployment and operation of services is achieved by using several heterogeneous facilities, custom configuration data and scripts in several languages. In this work, we restructure the existing system into a homogeneous, scalable cloud architecture adopting a uniform paradigm, where all applications are orchestrated in a uniform environment with standardized facilities. In this new paradigm DAQ applications are organized as groups of containers and the required software is packaged into container images. Automation of all aspects of coordinating and managing containers is provided by the Kubernetes environment, where a set of physical and virtual machines is unified in a single pool of compute resources. We demonstrate that a container-based cloud architecture provides an across-the-board solution that can be applied for DAQ in CMS. We show strengths and advantages of running DAQ applications in a container infrastructure as compared to a traditional application model.
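The abstract above describes packaging DAQ applications as groups of containers orchestrated by Kubernetes. As a purely illustrative sketch (not the CMS configuration), the snippet below shows how a hypothetical DAQ application could be declared as a Kubernetes Deployment using the official Python client; the image name, namespace, labels and resource limits are invented for this example.

```python
# Illustrative sketch only: declaring a hypothetical DAQ application as a
# Kubernetes Deployment with the official Python client. The image name,
# namespace, labels and resource limits are invented for this example.
from kubernetes import client, config

def make_daq_deployment(name="bu-emulator",
                        image="registry.example/cms-daq/bu:latest",
                        replicas=4):
    container = client.V1Container(
        name=name,
        image=image,
        resources=client.V1ResourceRequirements(limits={"cpu": "8", "memory": "16Gi"}),
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": name}),
        spec=client.V1PodSpec(containers=[container]),
    )
    spec = client.V1DeploymentSpec(
        replicas=replicas,
        selector=client.V1LabelSelector(match_labels={"app": name}),
        template=template,
    )
    return client.V1Deployment(api_version="apps/v1", kind="Deployment",
                               metadata=client.V1ObjectMeta(name=name), spec=spec)

if __name__ == "__main__":
    config.load_kube_config()                 # cluster credentials from ~/.kube/config
    apps = client.AppsV1Api()
    apps.create_namespaced_deployment(namespace="daq", body=make_daq_deployment())
```

Kubernetes then keeps the requested number of replicas running on whichever machines of the pool have free resources, which is the orchestration role the abstract assigns to it.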
DOI: 10.1051/epjconf/202429502013
2024
First year of experience with the new operational monitoring tool for data taking in CMS during Run 3
The Online Monitoring System (OMS) at the Compact Muon Solenoid experiment (CMS) at CERN aggregates and integrates different sources of information into a central place and allows users to view, compare and correlate information. It displays real-time and historical information. The tool is heavily used by run coordinators, trigger experts and shift crews, to ensure the quality and efficiency of data taking. It provides aggregated information for many use cases including data certification. OMS is the successor of Web Based Monitoring (WBM), which was in use during Run 1 and Run 2 of the LHC. WBM started as a small tool and grew substantially over the years so that maintenance became challenging. OMS was developed from scratch following several design ideas: to strictly separate the presentation layer from the data aggregation layer, to use a well-defined standard for the communication between presentation layer and aggregation layer, and to employ widely used frameworks from outside the HEP community. A report on the experience from the operation of OMS for the first year of data taking of Run 3 in 2022 is presented.
DOI: 10.1051/epjconf/202429502020
2024
MiniDAQ-3: Providing concurrent independent subdetector data-taking on CMS production DAQ resources
The data acquisition (DAQ) of the Compact Muon Solenoid (CMS) experiment at CERN, collects data for events accepted by the Level-1 Trigger from the different detector systems and assembles them in an event builder prior to making them available for further selection in the High Level Trigger, and finally storing the selected events for offline analysis. In addition to the central DAQ providing global acquisition functionality, several separate, so-called “MiniDAQ” setups allow operating independent data acquisition runs using an arbitrary subset of the CMS subdetectors. During Run 2 of the LHC, MiniDAQ setups were running their event builder and High Level Trigger applications on dedicated resources, separate from those used for the central DAQ. This cleanly separated MiniDAQ setups from the central DAQ system, but also meant limited throughput and a fixed number of possible MiniDAQ setups. In Run 3, MiniDAQ-3 setups share production resources with the new central DAQ system, allowing each setup to operate at the maximum Level-1 rate thanks to the reuse of the resources and network bandwidth. Configuration management tools had to be significantly extended to support the synchronization of the DAQ configurations needed for the various setups. We report on the new configuration management features and on the first year of operational experience with the new MiniDAQ-3 system.
DOI: 10.1051/epjconf/202429502011
2024
The CMS Orbit Builder for the HL-LHC at CERN
The Compact Muon Solenoid (CMS) experiment at CERN incorporates one of the highest throughput data acquisition systems in the world and is expected to increase its throughput by more than a factor of ten for the High-Luminosity phase of the Large Hadron Collider (HL-LHC). To achieve this goal, the system will be upgraded in most of its components. Among them, the event builder software, in charge of assembling all the data read out from the different sub-detectors, is planned to be modified from a single event builder to an orbit builder that assembles multiple events at the same time. The throughput of the event builder will be increased from the current 1.6 Tb/s to 51 Tb/s for the HL-LHC orbit builder. This paper presents preliminary network transfer studies in preparation for the upgrade. The key conceptual characteristics are discussed, concerning differences between the CMS event builder in Run 3 and the CMS Orbit Builder for the HL-LHC. For the feasibility studies, a pipestream benchmark mimicking event-builder-like traffic has been developed. Preliminary performance tests and results are discussed.
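A quick back-of-the-envelope check, using figures quoted elsewhere in this list (750 kHz Level-1 rate, roughly 8.5 MB events) and the ~11.245 kHz LHC revolution frequency, shows where the quoted 51 Tb/s comes from and roughly how many accepted events an orbit builder would group per orbit; this is simple arithmetic, not a statement of the actual design.

```python
# Back-of-the-envelope check of the quoted 51 Tb/s, using the Phase-2 figures
# quoted elsewhere in this list (750 kHz Level-1 rate, ~8.5 MB events) and the
# LHC revolution frequency of ~11.245 kHz.
l1_rate_hz = 750e3
event_size_bytes = 8.5e6
throughput_tbps = l1_rate_hz * event_size_bytes * 8 / 1e12
print(f"event-builder throughput ~ {throughput_tbps:.0f} Tb/s")        # ~51 Tb/s

# An "orbit" is one LHC revolution, so an orbit builder would aggregate roughly
# this many Level-1 accepted events into one unit of work:
events_per_orbit = l1_rate_hz / 11.245e3
print(f"accepted events per orbit ~ {events_per_orbit:.0f}")           # ~67
```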
DOI: 10.1103/physrevd.102.092013
2020
Cited 13 times
Measurement of the top quark Yukawa coupling from tt̄ kinematic distributions in the dilepton final state in proton-proton collisions at √s = 13 TeV
A measurement of the Higgs boson Yukawa coupling to the top quark is presented using proton-proton collision data at $\sqrt{s} = 13$ TeV, corresponding to an integrated luminosity of 137 fb$^{-1}$, recorded with the CMS detector. The coupling strength with respect to the standard model value, $Y_\mathrm{t}$, is determined from kinematic distributions in $\mathrm{t\bar{t}}$ final states containing ee, $\mu\mu$, or e$\mu$ pairs. Variations of the Yukawa coupling strength lead to modified distributions for $\mathrm{t\bar{t}}$ production. In particular, the distributions of the mass of the $\mathrm{t\bar{t}}$ system and the rapidity difference of the top quark and antiquark are sensitive to the value of $Y_\mathrm{t}$. The measurement yields a best fit value of $Y_\mathrm{t} = 1.16^{+0.24}_{-0.35}$, bounding $Y_\mathrm{t} < 1.54$ at the 95% confidence level.
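The measurement extracts Y_t from the shapes of kinematic distributions. The toy below sketches the general idea of a binned template scan in Y_t with synthetic, invented distributions; it is not the CMS analysis, its likelihood model, or its systematic treatment.

```python
# Toy sketch of a binned template scan in Y_t; the spectrum, its Y_t dependence
# and the event count are synthetic stand-ins, not CMS distributions.
import numpy as np

def template(yt, nbins=20):
    bins = np.linspace(300.0, 1300.0, nbins)                  # toy m(ttbar) axis in GeV
    # Invented dependence: the correction mostly affects the region near threshold.
    shape = np.exp(-bins / 400.0) * (1.0 + 0.05 * yt * np.exp(-(bins - 400.0) ** 2 / 2e4))
    return 1e6 * shape / shape.sum()                          # normalised to 1e6 events

rng = np.random.default_rng(1)
data = rng.poisson(template(1.16))                            # pseudo-data at Y_t = 1.16

yt_values = np.linspace(0.5, 2.0, 31)
chi2 = [np.sum((data - template(yt)) ** 2 / np.maximum(template(yt), 1.0)) for yt in yt_values]
print(f"best-fit Y_t (toy) = {yt_values[int(np.argmin(chi2))]:.2f}")
```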
DOI: 10.1088/1748-0221/17/05/c05003
2022
Cited 6 times
CMS phase-2 DAQ and timing hub prototyping results and perspectives
Abstract This paper describes recent progress on the design of the DAQ and Timing Hub, or DTH, an ATCA (Advanced Telecommunications Computing Architecture) hub board intended for the phase-2 upgrade of the CMS experiment. Prototyping was originally divided into multiple feature lines, spanning all different aspects of the DTH functionality. The second DTH prototype merges all R&D and prototyping lines into a single board, which is intended to be the production candidate. Emphasis is on the process and experience in going from the first to the second DTH prototype, which included a change of the chosen FPGA as well as the integration of a commercial networking solution.
DOI: 10.1109/tns.2015.2426216
2015
Cited 11 times
The New CMS DAQ System for Run-2 of the LHC
The data acquisition (DAQ) system of the CMS experiment at the CERN Large Hadron Collider assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of 100 GB/s to the high level trigger (HLT) farm. The HLT farm selects interesting events for storage and offline analysis at a rate of around 1 kHz. The DAQ system has been redesigned during the accelerator shutdown in 2013/14. The motivation is twofold: Firstly, the current compute nodes, networking, and storage infrastructure will have reached the end of their lifetime by the time the LHC restarts. Secondly, in order to handle higher LHC luminosities and event pileup, a number of sub-detectors will be upgraded, increasing the number of readout channels and replacing the off-detector readout electronics with a μTCA implementation. The new DAQ architecture will take advantage of the latest developments in the computing industry. For data concentration, 10/40 Gb/s Ethernet technologies will be used, as well as an implementation of a reduced TCP/IP in FPGA for a reliable transport between custom electronics and commercial computing hardware. A Clos network based on 56 Gb/s FDR Infiniband has been chosen for the event builder with a throughput of ~4 Tb/s. The HLT processing is entirely file based. This allows the DAQ and HLT systems to be independent, and to use the HLT software in the same way as for the offline processing. The fully built events are sent to the HLT with 1/10/40 Gb/s Ethernet via network file systems. Hierarchical collection of HLT accepted events and monitoring meta-data are stored into a global file system. This paper presents the requirements, technical choices, and performance of the new system.
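For orientation, the headline numbers above are mutually consistent under the common assumption of an average event size of about 1 MB; the short arithmetic below also shows the headroom of the ~4 Tb/s event-builder network.

```python
# Consistency check of the quoted Run-2 figures, assuming an average event size
# of ~1 MB (which is what makes 100 kHz correspond to 100 GB/s).
l1_rate_hz = 100e3
event_size_bytes = 1.0e6
gbytes_per_s = l1_rate_hz * event_size_bytes / 1e9
tbits_per_s = gbytes_per_s * 8 / 1e3
print(gbytes_per_s, "GB/s =", tbits_per_s, "Tb/s")   # 100 GB/s = 0.8 Tb/s
# The ~4 Tb/s Infiniband FDR Clos network therefore carries the nominal load
# with roughly a factor of five of headroom.
```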
DOI: 10.1088/1742-6596/513/1/012042
2014
Cited 10 times
10 Gbps TCP/IP streams from the FPGA for High Energy Physics
The DAQ system of the CMS experiment at CERN collects data from more than 600 custom detector Front-End Drivers (FEDs). During 2013 and 2014 the CMS DAQ system will undergo a major upgrade to address the obsolescence of current hardware and the requirements posed by the upgrade of the LHC accelerator and various detector components. For a loss-less data collection from the FEDs a new FPGA based card implementing the TCP/IP protocol suite over 10 Gbps Ethernet has been developed. To limit the complexity of the TCP hardware implementation, the DAQ group developed a simplified and unidirectional, but RFC 793 compliant, version of the TCP protocol. This allows a PC with the standard Linux TCP/IP stack to be used as a receiver. We present the challenges and protocol modifications made to TCP in order to simplify its FPGA implementation. We also describe the interaction between the simplified TCP and the Linux TCP/IP stack, including performance measurements.
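Because the FPGA-side sender is kept RFC 793 compliant, the receiving end can be an ordinary Linux TCP socket. The sketch below shows such a minimal standard-library receiver; the port, buffer size and lack of any fragment parsing are simplifications for illustration.

```python
# Minimal sketch of the receiving side: because the FPGA sender speaks
# (simplified but RFC 793 compliant) TCP, a plain Linux socket can sink the
# stream. Port and buffer size are arbitrary; no FED fragment parsing is done.
import socket

def receive_stream(port=10000, bufsize=1 << 20):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("0.0.0.0", port))
        srv.listen(1)
        conn, _peer = srv.accept()
        total = 0
        with conn:
            while True:
                chunk = conn.recv(bufsize)
                if not chunk:              # sender closed the unidirectional stream
                    break
                total += len(chunk)        # a real receiver would unpack fragments here
        return total

if __name__ == "__main__":
    print("received", receive_stream(), "bytes")
```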
DOI: 10.1088/1748-0221/8/12/c12039
2013
Cited 10 times
10 Gbps TCP/IP streams from the FPGA for the CMS DAQ eventbuilder network
For the upgrade of the DAQ of the CMS experiment in 2013/2014, an interface between the custom detector Front End Drivers (FEDs) and the new DAQ eventbuilder network has to be designed. For loss-less data collection from more than 600 FEDs, a new FPGA based card implementing the TCP/IP protocol suite over 10 Gbps Ethernet has been developed. We present the hardware challenges and protocol modifications made to TCP in order to simplify its FPGA implementation, together with a set of performance measurements carried out with the current prototype.
DOI: 10.1109/tns.2023.3240539
2023
TCLink: A Fully Integrated Open Core for Timing Compensation in FPGA-Based High-Speed Links
The high luminosity expected in the second phase of the upgrades of the Large Hadron Collider (LHC phase-2 upgrades) will pose unprecedented challenges to its four experiments in terms of collision density—also known as pile-up—per beam crossing. Disentangling the vertices of 200 simultaneous collisions every 25 ns requires high granularity in the detectors, as well as extremely precise and stable timing. While short-term timing stability is usually a concern addressed in timing distribution systems, long-term variations due to changing environmental conditions can accumulate through distribution chains and can dominate the overall timing stability of the systems they serve. Timing distribution systems in LHC experiments typically use high-speed links and clock recovery. This article presents a logic core that can be used to mitigate long-term temperature variations in high-speed links. The timing compensated link (TCLink) is an open-source firmware core fully integrated in Xilinx Ultrascale Field Programmable Gate Arrays (FPGAs). It demonstrates picosecond-level phase precision over timing distribution systems, improving the overall timing stability in physics experiments.
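At the level of the basic idea (and not of the TCLink firmware itself), round-trip-based compensation can be summarised as: measure the change in the link's round-trip phase, assume the drift splits evenly between the two directions, and pull the transmitted clock phase back by half of it. A tiny numerical sketch:

```python
# Conceptual sketch, not the TCLink firmware: measure the change in the link's
# round-trip phase and, assuming the drift is shared equally by the two
# directions, correct the transmitted clock phase by half of it.
def updated_correction_ps(initial_roundtrip_ps, current_roundtrip_ps, correction_ps=0.0):
    drift_ps = current_roundtrip_ps - initial_roundtrip_ps
    return correction_ps - drift_ps / 2.0

# Example: the round trip lengthened by 20 ps (e.g. thermal expansion of the
# fibre), so the downstream clock is pulled back by 10 ps.
print(updated_correction_ps(150_000.0, 150_020.0))   # -> -10.0
```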
DOI: 10.1140/epjc/s10052-023-11713-6
2023
The Pixel Luminosity Telescope: a detector for luminosity measurement at CMS using silicon pixel sensors
The Pixel Luminosity Telescope is a silicon pixel detector dedicated to luminosity measurement at the CMS experiment at the LHC. It is located approximately 1.75 m from the interaction point and arranged into 16 "telescopes", with eight telescopes installed around the beam pipe at either end of the detector and each telescope composed of three individual silicon sensor planes. The per-bunch instantaneous luminosity is measured by counting events where all three planes in the telescope register a hit, using a special readout at the full LHC bunch-crossing rate of 40 MHz. The full pixel information is read out at a lower rate and can be used to determine calibrations, corrections, and systematic uncertainties for the online and offline measurements. This paper details the commissioning, operational history, and performance of the detector during Run 2 (2015-18) of the LHC, as well as preparations for Run 3, which will begin in 2022.
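A toy illustration of the counting principle described above (three-fold coincidences per bunch crossing converted to luminosity via a visible cross section) is sketched below; the σ_vis value and the data layout are invented, not the PLT calibration.

```python
# Toy illustration of the counting principle: a telescope counts a bunch
# crossing only if all three planes have a hit, and the per-bunch rate is
# converted to luminosity with a visible cross section. SIGMA_VIS_UB is a
# placeholder value, not the PLT calibration constant.
from collections import defaultdict

ORBIT_RATE_HZ = 11_245.0      # LHC revolution frequency
SIGMA_VIS_UB = 300.0          # hypothetical visible cross section [microbarn]

def per_bunch_luminosity(hits, n_orbits):
    """hits: iterable of (bunch_id, (plane1_hit, plane2_hit, plane3_hit))."""
    triples = defaultdict(int)
    for bunch_id, planes in hits:
        if all(planes):
            triples[bunch_id] += 1
    # counts -> per-bunch rate [Hz] -> luminosity in units of 10^30 cm^-2 s^-1 (1/(microbarn*s))
    return {b: n / n_orbits * ORBIT_RATE_HZ / SIGMA_VIS_UB for b, n in triples.items()}

sample = [(1, (True, True, True)), (1, (True, False, True)), (2, (True, True, True))]
print(per_bunch_luminosity(sample, n_orbits=1000))
```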
DOI: 10.1088/1748-0221/6/08/p08005
2011
Cited 9 times
Design, implementation and first measurements with the Medipix2-MXR detector at the Compact Muon Solenoid experiment
The Medipix detector is the first device dedicated to measuring mixed-field radiation in the CMS cavern and able to distinguish between different particle types. Medipix2-MXR chips bump bonded to silicon sensors with various neutron conversion layers developed by the IEAP CTU in Prague were successfully installed for the 2008 LHC start-up in the CMS experimental and services caverns to measure the flux of various particle types, in particular neutrons. They have operated almost continuously during the 2010 run period, and the results shown here are from the proton run between the beginning of July and the end of October 2010. Clear signals are seen and different particle types have been observed during regular LHC luminosity running, and an agreement in the measured flux rate is found with the simulations. These initial results are promising, and indicate that these devices have the potential for further and future LHC and high energy physics applications as radiation monitoring devices for mixed field environments, including neutron flux monitoring. Further extensions are foreseen in the near future to increase the performance of the detector and its coverage for monitoring in CMS.
DOI: 10.1109/nssmic.2015.7581984
2015
Cited 8 times
The CMS Timing and Control Distribution System
The Compact Muon Solenoid (CMS) experiment operating at the CERN (European Laboratory for Nuclear Physics) Large Hadron Collider (LHC) is in the process of upgrading several of its detector systems. Adding more individual detector components brings the need to test and commission those components separately from existing ones so as not to compromise physics data-taking. The CMS Trigger, Timing and Control (TTC) system had reached its limits in terms of the number of separate elements (partitions) that could be supported. A new Timing and Control Distribution System (TCDS) has been designed, built and commissioned in order to overcome this limit. It also brings additional functionality to facilitate parallel commissioning of new detector elements. The new TCDS system and its components will be described and results from the first operational experience with the TCDS in CMS will be shown.
DOI: 10.1088/1742-6596/396/1/012008
2012
Cited 7 times
The CMS High Level Trigger System: Experience and Future Development
The CMS experiment at the LHC features a two-level trigger system. Events accepted by the first level trigger, at a maximum rate of 100 kHz, are read out by the Data Acquisition system (DAQ), and subsequently assembled in memory in a farm of computers running a software high-level trigger (HLT), which selects interesting events for offline storage and analysis at a rate of the order of a few hundred Hz. The HLT algorithms consist of sequences of offline-style reconstruction and filtering modules, executed on a farm of O(10000) CPU cores built from commodity hardware. Experience from the operation of the HLT system in the collider run 2010/2011 is reported. The current architecture of the CMS HLT, its integration with the CMS reconstruction framework and the CMS DAQ, are discussed in the light of future development. The possible short- and medium-term evolution of the HLT software infrastructure to support extensions of the HLT computing power, and to address remaining performance and maintenance issues, are discussed.
DOI: 10.22323/1.213.0190
2015
Cited 6 times
Boosting Event Building Performance using Infiniband FDR for the CMS Upgrade
As part of the CMS upgrade during CERN's shutdown period (LS1), the CMS data acquisition system is incorporating Infiniband FDR technology to boost event-building performance for operation from 2015 onwards. Infiniband promises to provide a substantial increase in data transmission speeds compared to the older 1 GE network used during the 2009-2013 LHC run. Several options exist to end user developers when choosing a foundation for software upgrades, including the uDAPL (DAT Collaborative) and Infiniband verbs libraries (OFED). Due to advances in technology, the CMS data acquisition system will be able to achieve the required throughput of 100 kHz with increased event sizes while downsizing the number of nodes by using a combination of 10 GE, 40 GE and 56 Gb Infiniband FDR. This paper presents the analysis and results of a comparison between GE and Infiniband solutions as well as a look at how they integrate into an event building architecture, while preserving the scalability, efficiency and deterministic latency expected in a high-end data acquisition network.
DOI: 10.22323/1.370.0111
2020
Cited 6 times
First measurements with the CMS DAQ and Timing Hub prototype-1
The DAQ and Timing Hub is an ATCA hub board designed for the Phase-2 upgrade of the CMS experiment. In addition to providing high-speed Ethernet connectivity to all back-end boards, it forms the bridge between the sub-detector electronics and the central DAQ, timing, and trigger control systems. One important requirement is the distribution of several high-precision, phase-stable, and LHC-synchronous clock signals for use by the timing detectors. The current paper presents first measurements performed on the initial prototype, with a focus on clock quality. It is demonstrated that the current design provides adequate clock quality to satisfy the requirements of the Phase-2 CMS timing detectors.
DOI: 10.1051/epjconf/202125104023
2021
Cited 5 times
The Phase-2 Upgrade of the CMS Data Acquisition
The High Luminosity LHC (HL-LHC) will start operating in 2027 after the third Long Shutdown (LS3), and is designed to provide an ultimate instantaneous luminosity of 7.5 × 10³⁴ cm⁻² s⁻¹, at the price of extreme pileup of up to 200 interactions per crossing. The number of overlapping interactions in HL-LHC collisions, their density, and the resulting intense radiation environment, warrant an almost complete upgrade of the CMS detector. The upgraded CMS detector will be read out by approximately fifty thousand high-speed front-end optical links at an unprecedented data rate of up to 80 Tb/s, for an average expected total event size of approximately 8-10 MB. Following the present established design, the CMS trigger and data acquisition system will continue to feature two trigger levels, with only one synchronous hardware-based Level-1 Trigger (L1), consisting of custom electronic boards and operating on dedicated data streams, and a second level, the High Level Trigger (HLT), using software algorithms running asynchronously on standard processors and making use of the full detector data to select events for offline storage and analysis. The upgraded CMS data acquisition system will collect data fragments for Level-1 accepted events from the detector back-end modules at a rate up to 750 kHz, aggregate fragments corresponding to individual Level-1 accepts into events, and distribute them to the HLT processors where they will be filtered further. Events accepted by the HLT will be stored permanently at a rate of up to 7.5 kHz. This paper describes the baseline design of the DAQ and HLT systems for the Phase-2 of CMS.
DOI: 10.1088/1742-6596/160/1/012024
2009
Cited 6 times
Jet energy scale calibration in the D0 experiment
Using the Run IIa data set the D0 experiment has reached a jet energy calibration precision on the level of 1-2% over a wide kinematic range. This paper presents the methods used and the results obtained. Special attention is paid to the remaining systematic uncertainties.
DOI: 10.1088/1742-6596/513/1/012025
2014
Cited 4 times
Prototype of a File-Based High-Level Trigger in CMS
The DAQ system of the CMS experiment at the LHC is upgraded during the accelerator shutdown in 2013/14. To reduce the interdependency of the DAQ system and the high-level trigger (HLT), we investigate the feasibility of using a file-system-based HLT. Events of ~1 MB size are built at the level-1 trigger rate of 100 kHz. The events are assembled by ~50 builder units (BUs). Each BU writes the raw events at ~2 GB/s to a local file system shared with O(10) filter-unit machines (FUs) running the HLT code. The FUs read the raw data from the file system, select O(1%) of the events, and write the selected events together with monitoring meta-data back to disk. This data is then aggregated over several steps and made available for offline reconstruction and online monitoring. We present the challenges, technical choices, and performance figures from the prototyping phase. In addition, the steps to the final system implementation will be discussed.
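The following sketch mimics the filter-unit side of such a file-based design: poll a shared directory for raw files written by a builder unit, apply a placeholder 1% selection, and write the accepted data plus a small JSON bookkeeping document. Paths, file naming and the selection are invented; the real system runs the CMS HLT software.

```python
# Sketch of the filter-unit side of a file-based HLT. Paths, file naming and the
# 1% "selection" are placeholders; the real system runs the CMS HLT software.
import json, os, random, time
from pathlib import Path

RAW_DIR = Path("/fff/ramdisk/run000001")    # hypothetical BU output directory
OUT_DIR = Path("/fff/output/run000001")

def process_once(event_size=1_000_000):
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for raw in sorted(RAW_DIR.glob("*.raw")):
        total = accepted = 0
        with raw.open("rb") as f, (OUT_DIR / (raw.stem + ".dat")).open("wb") as out:
            for event in iter(lambda: f.read(event_size), b""):   # fixed-size toy "events"
                total += 1
                if random.random() < 0.01:                        # placeholder HLT decision
                    accepted += 1
                    out.write(event)
        meta = {"input": raw.name, "events": total, "accepted": accepted, "ts": time.time()}
        (OUT_DIR / (raw.stem + ".jsn")).write_text(json.dumps(meta))
        os.remove(raw)                                            # handshake: file consumed

if __name__ == "__main__":
    process_once()
```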
DOI: 10.1088/1742-6596/664/8/082036
2015
Cited 4 times
A scalable monitoring for the CMS Filter Farm based on elasticsearch
A flexible monitoring system has been designed for the CMS File-based Filter Farm making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small documents using the JSON encoding. These documents are indexed into a hierarchy of elasticsearch (es) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since es is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf es clusters consist of the very same nodes that form the Filter Farm thus providing natural horizontal scaling. A separate "central" es cluster is used to collect and index aggregated information. The fine-grained information, all the way to individual processes, remains available in the leaf clusters. The central es cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2.
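A minimal sketch of the indexing pattern described above, using the official elasticsearch-py client (8.x style); the index name and document fields are invented for the example.

```python
# Sketch of the indexing pattern with the official elasticsearch-py client
# (8.x style). The index name and document fields are invented.
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def report(host, rate_hz, backlog_files):
    doc = {"timestamp": time.time(), "host": host,
           "hlt_rate_hz": rate_hz, "backlog_files": backlog_files}
    es.index(index="fff-monitoring", document=doc)   # schema-free: new fields can appear later

report("fu-c2e34-12-01", rate_hz=987.5, backlog_files=3)

# Aggregated view: average HLT rate over the last five minutes.
resp = es.search(index="fff-monitoring", size=0,
                 query={"range": {"timestamp": {"gte": time.time() - 300}}},
                 aggs={"avg_rate": {"avg": {"field": "hlt_rate_hz"}}})
print(resp["aggregations"]["avg_rate"]["value"])
```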
DOI: 10.1088/1742-6596/396/1/012023
2012
Cited 4 times
Status of the CMS Detector Control System
The Compact Muon Solenoid (CMS) is a CERN multi-purpose experiment that exploits the physics of the Large Hadron Collider (LHC). The Detector Control System (DCS) is responsible for ensuring the safe, correct and efficient operation of the experiment, and has contributed to the recording of high quality physics data. The DCS is programmed to automatically react to the LHC operational mode. CMS sub-detectors' bias voltages are set depending on the machine mode and particle beam conditions. An operator provided with a small set of screens supervises the system status summarized from the approximately 6M monitored parameters. Using the experience of nearly two years of operation with beam the DCS automation software has been enhanced to increase the system efficiency by minimizing the time required by sub-detectors to prepare for physics data taking. From the infrastructure point of view the DCS will be subject to extensive modifications in 2012. The current rack mounted control PCs will be replaced by a redundant pair of DELL Blade systems. These blade servers are a high-density modular solution that incorporates servers and networking into a single chassis that provides shared power, cooling and management. This infrastructure modification associated with the migration to blade servers will challenge the DCS software and hardware factorization capabilities. The on-going studies for this migration together with the latest modifications are discussed in the paper.
DOI: 10.1051/epjconf/202024501032
2020
Cited 4 times
40 MHz Level-1 Trigger Scouting for CMS
The CMS experiment will be upgraded for operation at the High-Luminosity LHC to maintain and extend its physics performance under extreme pileup conditions. Upgrades will include an entirely new tracking system, supplemented by a track finder processor providing tracks at Level-1, as well as a high-granularity calorimeter in the endcap region. New front-end and back-end electronics will also provide the Level-1 trigger with high-resolution information from the barrel calorimeter and the muon systems. The upgraded Level-1 processors, based on powerful FPGAs, will be able to carry out sophisticated feature searches with resolutions often similar to the offline ones, while keeping pileup effects under control. In this paper, we discuss the feasibility of a system capturing Level-1 intermediate data at the beam-crossing rate of 40 MHz and carrying out online analyses based on these limited-resolution data. This 40 MHz scouting system would provide fast and virtually unlimited statistics for detector diagnostics, alternative luminosity measurements and, in some cases, calibrations. It has the potential to enable the study of otherwise inaccessible signatures, either too common to fit in the Level-1 accept budget, or with requirements which are orthogonal to “mainstream” physics, such as long-lived particles. We discuss the requirements and possible architecture of a 40 MHz scouting system, as well as some of the physics potential, and results from a demonstrator operated at the end of Run-2 using the Global Muon Trigger data from CMS. Plans for further demonstrators envisaged for Run-3 are also discussed.
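As a toy illustration of what "scouting" processing could look like downstream of the capture system, the snippet below unpacks a stream of fixed-size Level-1 muon records and fills a simple histogram. The 8-byte record layout is invented; the real Global Muon Trigger format differs.

```python
# Toy sketch of downstream "scouting" processing: unpack a stream of fixed-size
# Level-1 muon records and histogram their pT. The 8-byte record layout is
# invented for illustration; the real Global Muon Trigger format differs.
import struct
from collections import Counter

RECORD = struct.Struct("<Ihh")     # uint32 bx, int16 pt (0.5 GeV units), int16 eta (0.01 units)

pt_hist = Counter()
with open("scouting_dump.bin", "rb") as f:            # hypothetical capture file
    while chunk := f.read(RECORD.size * 4096):
        chunk = chunk[: len(chunk) - len(chunk) % RECORD.size]   # keep whole records only
        for _bx, pt_raw, _eta_raw in RECORD.iter_unpack(chunk):
            pt_hist[int(0.5 * pt_raw)] += 1           # 1 GeV bins

print(pt_hist.most_common(5))
```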
DOI: 10.1109/rtc.2016.7543164
2016
Cited 3 times
Performance of the new DAQ system of the CMS experiment for run-2
The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider (LHC) assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of more than 100 GB/s to the High-Level Trigger (HLT) farm. The HLT farm selects and classifies interesting events for storage and offline analysis at an output rate of around 1 kHz. The DAQ system has been redesigned during the accelerator shutdown in 2013-2014. The motivation for this upgrade was twofold. Firstly, the compute nodes, networking and storage infrastructure were reaching the end of their lifetimes. Secondly, in order to maintain physics performance with higher LHC luminosities and increasing event pileup, a number of sub-detectors are being upgraded, increasing the number of readout channels as well as the required throughput, and replacing the off-detector readout electronics with a MicroTCA-based DAQ interface. The new DAQ architecture takes advantage of the latest developments in the computing industry. For data concentration 10/40 Gbit/s Ethernet technologies are used, and a 56 Gbit/s Infiniband FDR CLOS network (total throughput ≈ 4 Tbit/s) has been chosen for the event builder. The upgraded DAQ - HLT interface is entirely file-based, essentially decoupling the DAQ and HLT systems. The fully-built events are transported to the HLT over 10/40 Gbit/s Ethernet via a network file system. The collection of events accepted by the HLT and the corresponding metadata are buffered on a global file system before being transferred off-site. The monitoring of the HLT farm and the data-taking performance is based on the Elasticsearch analytics tool. This paper presents the requirements, implementation, and performance of the system. Experience is reported on the first year of operation with LHC proton-proton runs as well as with the heavy ion lead-lead runs in 2015.
DOI: 10.1088/1742-6596/513/1/012014
2014
Cited 3 times
The new CMS DAQ system for LHC operation after 2014 (DAQ2)
The Data Acquisition system of the Compact Muon Solenoid experiment at CERN assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of 100 GByte/s. We present the design of the 2nd generation DAQ system, including studies of the event builder based on advanced networking technologies such as 10 and 40 Gbit/s Ethernet and 56 Gbit/s FDR Infiniband and exploitation of multicore CPU architectures. By the time the LHC restarts after the 2013/14 shutdown, the current compute nodes, networking, and storage infrastructure will have reached the end of their lifetime. In order to handle higher LHC luminosities and event pileup, a number of sub-detectors will be upgraded, increasing the number of readout channels and replacing the off-detector readout electronics with a μTCA implementation. The second generation DAQ system, foreseen for 2014, will need to accommodate the readout of both existing and new off-detector electronics and provide an increased throughput capacity. Advances in storage technology could make it feasible to write the output of the event builder to (RAM or SSD) disks and implement the HLT processing entirely file based.
DOI: 10.1088/1742-6596/396/1/012007
2012
Cited 3 times
Operational experience with the CMS Data Acquisition System
The data-acquisition (DAQ) system of the CMS experiment at the LHC performs the read-out and assembly of events accepted by the first level hardware trigger. Assembled events are made available to the high-level trigger (HLT), which selects interesting events for offline storage and analysis. The system is designed to handle a maximum input rate of 100 kHz and an aggregated throughput of 100 GB/s originating from approximately 500 sources and 10^8 electronic channels. An overview of the architecture and design of the hardware and software of the DAQ system is given. We report on the performance and operational experience of the DAQ and its Run Control System in the first two years of collider runs of the LHC, both in proton-proton and Pb-Pb collisions. We present an analysis of the current performance, its limitations, and the most common failure modes and discuss the ongoing evolution of the HLT capability needed to match the luminosity ramp-up of the LHC.
DOI: 10.1088/1742-6596/513/1/012031
2014
Cited 3 times
Automating the CMS DAQ
We present the automation mechanisms that have been added to the Data Acquisition and Run Control systems of the Compact Muon Solenoid (CMS) experiment during Run 1 of the LHC, ranging from the automation of routine tasks to automatic error recovery and context-sensitive guidance to the operator. These mechanisms helped CMS to maintain a data taking efficiency above 90% and to even improve it to 95% towards the end of Run 1, despite an increase in the occurrence of single-event upsets in sub-detector electronics at high LHC luminosity.
DOI: 10.1088/1742-6596/898/3/032019
2017
Cited 3 times
The CMS Data Acquisition - Architectures for the Phase-2 Upgrade
The upgraded High Luminosity LHC, after the third Long Shutdown (LS3), will provide an instantaneous luminosity of 7.5 × 10³⁴ cm⁻² s⁻¹ (levelled), at the price of extreme pileup of up to 200 interactions per crossing. In LS3, the CMS Detector will also undergo a major upgrade to prepare for the phase-2 of the LHC physics program, starting around 2025. The upgraded detector will be read out at an unprecedented data rate of up to 50 Tb/s and an event rate of 750 kHz. Complete events will be analysed by software algorithms running on standard processing nodes, and selected events will be stored permanently at a rate of up to 10 kHz for offline processing and analysis.
DOI: 10.1109/tns.2015.2409898
2015
Achieving High Performance With TCP Over 40 GbE on NUMA Architectures for CMS Data Acquisition
TCP and the socket abstraction have barely changed over the last two decades, but at the network layer there has been a giant leap from a few megabits to 100 gigabits in bandwidth. At the same time, CPU architectures have evolved into the multi-core era and applications are expected to make full use of all available resources. Applications in the data acquisition domain based on the standard socket library running in a Non-Uniform Memory Access (NUMA) architecture are unable to reach full efficiency and scalability without the software being adequately aware of the IRQ (Interrupt Request), CPU and memory affinities. During the first long shutdown of LHC, the CMS DAQ system is going to be upgraded for operation from 2015 onwards and a new software component has been designed and developed in the CMS online framework for transferring data with sockets. This software attempts to wrap the low-level socket library to ease higher-level programming with an API based on an asynchronous event driven model similar to the DAT uDAPL API. It is an event-based application with NUMA optimizations that allows for a high throughput of data across a large distributed system. This paper describes the architecture, the technologies involved and the performance measurements of the software in the context of the CMS distributed event building.
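Two of the ingredients mentioned above can be sketched with the standard library alone: pinning the receiving process to the cores of the NUMA node that owns the NIC, and an event-driven, non-blocking receive loop. The core list and port are placeholders, and this is far simpler than the CMS online framework component described in the paper.

```python
# Standard-library sketch of two ingredients mentioned above: pinning the
# receiving process to the cores of the NUMA node that owns the NIC, and an
# event-driven, non-blocking receive loop. The core list and port are placeholders.
import os, selectors, socket

NIC_NUMA_CPUS = {0, 2, 4, 6}                 # hypothetical cores local to the NIC
os.sched_setaffinity(0, NIC_NUMA_CPUS)       # Linux only: pin this process

sel = selectors.DefaultSelector()
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 10000))
srv.listen()
srv.setblocking(False)
sel.register(srv, selectors.EVENT_READ)

received = 0
while True:                                   # event loop instead of blocking reads
    for key, _events in sel.select(timeout=1.0):
        sock = key.fileobj
        if sock is srv:
            conn, _ = srv.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(1 << 20)
            if data:
                received += len(data)
            else:
                sel.unregister(sock)
                sock.close()
```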
DOI: 10.1088/1742-6596/664/8/082009
2015
Online data handling and storage at the CMS experiment
During the LHC Long Shutdown 1, the CMS Data Acquisition (DAQ) system underwent a partial redesign to replace obsolete network equipment, use more homogeneous switching technologies, and support new detector back-end electronics. The software and hardware infrastructure to provide input, execute the High Level Trigger (HLT) algorithms and deal with output data transport and storage has also been redesigned to be completely file-based. All the metadata needed for bookkeeping are stored in files as well, in the form of small documents using the JSON encoding. The Storage and Transfer System (STS) is responsible for aggregating these files produced by the HLT, storing them temporarily and transferring them to the T0 facility at CERN for subsequent offline processing. The STS merger service aggregates the output files from the HLT from ∼62 sources produced with an aggregate rate of ∼2 GB/s. An estimated bandwidth of 7 GB/s in concurrent read/write mode is needed. Furthermore, the STS has to be able to store several days of continuous running, so an estimated 250 TB of total usable disk space is required. In this article we present the various technological and implementation choices of the three components of the STS: the distributed file system, the merger service and the transfer system.
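A schematic of the merging step described above: concatenate the per-source data files belonging to one luminosity section and sum their JSON bookkeeping documents. The directory layout and field names are invented for the example.

```python
# Schematic merging step: concatenate the per-source data files of one
# luminosity section and sum their JSON bookkeeping documents. The directory
# layout and field names are invented for the example.
import json, shutil
from pathlib import Path

def merge_lumisection(in_dir: Path, out_dir: Path, ls: int):
    out_dir.mkdir(parents=True, exist_ok=True)
    totals = {"events": 0, "accepted": 0}
    with (out_dir / f"ls{ls:04d}.dat").open("wb") as merged:
        for jsn in sorted(in_dir.glob(f"*_ls{ls:04d}*.jsn")):
            meta = json.loads(jsn.read_text())
            totals["events"] += meta["events"]
            totals["accepted"] += meta["accepted"]
            with jsn.with_suffix(".dat").open("rb") as part:
                shutil.copyfileobj(part, merged)          # append this source's payload
    (out_dir / f"ls{ls:04d}.jsn").write_text(json.dumps(totals))
    return totals
```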
DOI: 10.1088/1742-6596/396/1/012041
2012
High availability through full redundancy of the CMS detector controls system
The CMS detector control system (DCS) is responsible for controlling and monitoring the detector status and for the operation of all CMS sub detectors and infrastructure. This is required to ensure safe and efficient data taking so that high quality physics data can be recorded. The current system architecture is composed of more than 100 servers in order to provide the required processing resources. An optimization of the system software and hardware architecture is under development to ensure redundancy of all the controlled subsystems and to reduce any downtime due to hardware or software failures. The new optimized structure is based mainly on powerful and highly reliable blade servers and makes use of a fully redundant approach, guaranteeing high availability and reliability. The analysis of the requirements, the challenges, the improvements and the optimized system architecture as well as its specific hardware and software solutions are presented.
DOI: 10.1109/rtc.2014.7097437
2014
The new CMS DAQ system for run-2 of the LHC
Summary form only given. The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of 100 GB/s to the high level trigger (HLT) farm. The HLT farm selects interesting events for storage and offline analysis at a rate of around 1 kHz. The DAQ system has been redesigned during the accelerator shutdown in 2013/14. The motivation is twofold: Firstly, the current compute nodes, networking, and storage infrastructure will have reached the end of their lifetime by the time the LHC restarts. Secondly, in order to handle higher LHC luminosities and event pileup, a number of sub-detectors will be upgraded, increasing the number of readout channels and replacing the off-detector readout electronics with a μTCA implementation. The new DAQ architecture will take advantage of the latest developments in the computing industry. For data concentration, 10/40 Gb/s Ethernet technologies will be used, as well as an implementation of a reduced TCP/IP in FPGA for a reliable transport between custom electronics and commercial computing hardware. A 56 Gb/s Infiniband FDR Clos network has been chosen for the event builder with a throughput of ~4 Tb/s. The HLT processing is entirely file based. This allows the DAQ and HLT systems to be independent, and to use the HLT software in the same way as for the offline processing. The fully built events are sent to the HLT with 1/10/40 Gb/s Ethernet via network file systems. Hierarchical collection of HLT accepted events and monitoring meta-data are stored into a global file system. This paper presents the requirements, technical choices, and performance of the new system.
DOI: 10.1109/tns.2023.3244696
2023
Progress in Design and Testing of the DAQ and Data-Flow Control for the Phase-2 Upgrade of the CMS Experiment
The CMS detector will undergo a major upgrade for the Phase-2 of the LHC program, the High-Luminosity LHC. The upgraded CMS detector will be read out at an unprecedented data rate exceeding 50 Tb/s, with a Level-1 trigger selecting events at a rate of 750 kHz, and an average event size reaching 8.5 MB. The Phase-2 CMS back-end electronics will be based on the ATCA standard, with node boards receiving the detector data from the front-ends via custom, radiation-tolerant, optical links. The CMS Phase-2 data acquisition (DAQ) design tightens the integration between trigger control and data flow, extending the synchronous regime of the DAQ system. At the core of the design is the DAQ and Timing Hub, a custom ATCA hub card forming the bridge between the different, detector-specific, control and readout electronics and the common timing, trigger, and control systems. The overall synchronisation and data flow of the experiment is handled by the Trigger and Timing Control and Distribution System (TCDS). For increased flexibility during commissioning and calibration runs, the Phase-2 architecture breaks with the traditional distribution tree, in favour of a configurable network connecting multiple independent control units to all off-detector endpoints. This paper describes the overall Phase-2 TCDS architecture, and briefly compares it to previous CMS implementations. It then discusses the design and prototyping experience of the DTH, and concludes with the convergence of this prototyping process into the (pre)production phase, starting in early 2023.
DOI: 10.1088/1742-6596/898/3/032020
2017
Performance of the CMS Event Builder
DOI: 10.1051/epjconf/201921407017
2019
Experience with dynamic resource provisioning of the CMS online cluster using a cloud overlay
The primary goal of the online cluster of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is to build event data from the detector and to select interesting collisions in the High Level Trigger (HLT) farm for offline storage. With more than 1500 nodes and a capacity of about 850 kHEPSpecInt06, the HLT machines represent computing capacity similar to that of all the CMS Tier1 Grid sites together. Moreover, the cluster is currently connected to the CERN IT datacenter via a dedicated 160 Gbps network connection and hence can access the remote EOS based storage with a high bandwidth. In the last few years, a cloud overlay based on OpenStack has been commissioned to use these resources for the WLCG when they are not needed for data taking. This online cloud facility was designed for parasitic use of the HLT, which must never interfere with its primary function as part of the DAQ system. It also abstracts away the different types of machines and their underlying segmented networks. During the LHC technical stop periods, the HLT cloud is set to its static mode of operation where it acts like other grid facilities. The online cloud was also extended to make dynamic use of resources during periods between LHC fills. These periods are a priori unscheduled and of undetermined length, typically of several hours, once or more per day. For that, it dynamically follows LHC beam states and hibernates Virtual Machines (VM) accordingly. Finally, this work presents the design and implementation of a mechanism to dynamically ramp up VMs when the DAQ load on the HLT reduces towards the end of the fill.
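The dynamic mode described above amounts to a control loop that follows the LHC beam state. The sketch below shows only that decision logic; lhc_beam_mode(), resume_vms() and hibernate_vms() are placeholders for the experiment-specific interfaces (LHC status feed, OpenStack APIs), and the mode names are an illustrative subset.

```python
# Schematic control loop for the dynamic mode described above. lhc_beam_mode(),
# resume_vms() and hibernate_vms() are placeholders for the experiment-specific
# interfaces (LHC status feed, OpenStack APIs); the mode names are an
# illustrative subset only.
import time

INTERFILL_MODES = {"RAMP DOWN", "SETUP", "INJECTION PROBE BEAM"}
DATA_TAKING_MODES = {"STABLE BEAMS", "ADJUST"}

def lhc_beam_mode() -> str: ...      # placeholder: query the LHC status feed
def resume_vms(): ...                # placeholder: wake hibernated VMs via the cloud API
def hibernate_vms(): ...             # placeholder: suspend VMs, freeing resources for DAQ/HLT

def follow_lhc(poll_s=60):
    cloud_active = False
    while True:
        mode = lhc_beam_mode()
        if mode in INTERFILL_MODES and not cloud_active:
            resume_vms()             # opportunistic grid processing between fills
            cloud_active = True
        elif mode in DATA_TAKING_MODES and cloud_active:
            hibernate_vms()          # give the resources back to data taking
            cloud_active = False
        time.sleep(poll_s)
```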
DOI: 10.1016/j.nuclphysbps.2007.08.074
2007
Jet production in the D0 experiment: measurements and data-to-Monte Carlo comparisons
A preliminary measurement is presented of the inclusive jet production cross section in pp̄ collisions with the D0 detector using an integrated luminosity of ∼800 pb−1 of Tevatron Run II data. The cross section is studied as a function of jet pT and rapidity and compared to perturbative QCD predictions in next-to-leading order including two-loop threshold corrections. Also presented is a preliminary measurement of Z/γ*+jet production based on ∼950 pb−1. A comparison to the Sherpa event generator shows excellent agreement for jet multiplicities and good agreement for the pT spectra of the jets and the Z boson and for the inter-jet angular correlations.
DOI: 10.1088/1742-6596/664/8/082035
2015
A New Event Builder for CMS Run II
The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider (LHC) assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of 100GB/s to the high-level trigger (HLT) farm. The DAQ system has been redesigned during the LHC shutdown in 2013/14. The new DAQ architecture is based on state-of-the-art network technologies for the event building. For the data concentration, 10/40 Gbps Ethernet technologies are used together with a reduced TCP/IP protocol implemented in FPGA for a reliable transport between custom electronics and commercial computing hardware. A 56 Gbps Infiniband FDR CLOS network has been chosen for the event builder. This paper discusses the software design, protocols, and optimizations for exploiting the hardware capabilities. We present performance measurements from small-scale prototypes and from the full-scale production system.
DOI: 10.1088/1742-6596/664/8/082033
2015
File-based data flow in the CMS Filter Farm
During the LHC Long Shutdown 1, the CMS Data Acquisition system underwent a partial redesign to replace obsolete network equipment, use more homogeneous switching technologies, and prepare the ground for future upgrades of the detector front-ends. The software and hardware infrastructure to provide input, execute the High Level Trigger (HLT) algorithms and deal with output data transport and storage has also been redesigned to be completely file- based. This approach provides additional decoupling between the HLT algorithms and the input and output data flow. All the metadata needed for bookkeeping of the data flow and the HLT process lifetimes are also generated in the form of small "documents" using the JSON encoding, by either services in the flow of the HLT execution (for rates etc.) or watchdog processes. These "files" can remain memory-resident or be written to disk if they are to be used in another part of the system (e.g. for aggregation of output data). We discuss how this redesign improves the robustness and flexibility of the CMS DAQ and the performance of the system currently being commissioned for the LHC Run 2.
DOI: 10.18429/jacow-icalepcs2015-wepgf013
2015
Increasing Availability by Implementing Software Redundancy in the CMS Detector Control System
DOI: 10.5281/zenodo.18897
2015
rootpy: 0.8.0
DOI: 10.1088/1742-6596/396/1/012038
2012
Distributed error and alarm processing in the CMS data acquisition system
The error and alarm system for the data acquisition of the Compact Muon Solenoid (CMS) at CERN was successfully used for the physics runs at the Large Hadron Collider (LHC) during the first three years of activities. Error and alarm processing entails the notification, collection, storing and visualization of all exceptional conditions occurring in the highly distributed CMS online system using a uniform scheme. Alerts and reports are shown on-line by web application facilities that map them to graphical models of the system as defined by the user. A persistency service keeps a history of all exceptions that occurred, allowing subsequent retrieval of user-defined time windows of events for later playback or analysis. This paper describes the architecture and the technologies used and deals with operational aspects during the first years of LHC operation. In particular we focus on performance, stability, and integration with the CMS sub-detectors.
DOI: 10.1088/1742-6596/396/1/012039
2012
Upgrade of the CMS Event Builder
The Data Acquisition system of the Compact Muon Solenoid experiment at CERN assembles events at a rate of 100 kHz, transporting event data at an aggregate throughput of 100 GB/s. By the time the LHC restarts after the 2013/14 shut-down, the current computing and networking infrastructure will have reached the end of their lifetime. This paper presents design studies for an upgrade of the CMS event builder based on advanced networking technologies such as 10/40 Gb/s Ethernet and Infiniband. The results of performance measurements with small-scale test setups are shown.
DOI: 10.22323/1.270.0022
2017
Opportunistic usage of the CMS online cluster using a cloud overlay
After two years of maintenance and upgrade, the Large Hadron Collider (LHC), the largest and most powerful particle accelerator in the world, has started its second three year run. Around 1500 computers make up the CMS (Compact Muon Solenoid) Online cluster. This cluster is used for Data Acquisition of the CMS experiment at CERN, selecting and sending to storage around 20 TBytes of data per day that are then analysed by the Worldwide LHC Computing Grid (WLCG) infrastructure that links hundreds of data centres worldwide. 3000 CMS physicists can access and process data, and are always seeking more computing power and data. The backbone of the CMS Online cluster is composed of 16000 cores which provide as much computing power as all CMS WLCG Tier1 sites (352K HEP-SPEC-06 score in the CMS cluster versus 300K across CMS Tier1 sites). The computing power available in the CMS cluster can significantly speed up the processing of data, so an effort has been made to allocate the resources of the CMS Online cluster to the grid when it isn’t used to its full capacity for data acquisition. This occurs during the maintenance periods when the LHC is non-operational, which corresponded to 117 days in 2015. During 2016, the aim is to increase the availability of the CMS Online cluster for data processing by making the cluster accessible during the time between two physics collisions while the LHC and beams are being prepared. This is usually the case for a few hours every day, which would vastly increase the computing power available for data processing. Work has already been undertaken to provide this functionality, as an OpenStack cloud layer has been deployed as a minimal overlay that leaves the primary role of the cluster untouched. This overlay also abstracts the different hardware and networks that the cluster is composed of. The operation of the cloud (starting and stopping the virtual machines) is another challenge that has been overcome as the cluster has only a few hours spare during the aforementioned beam preparation. By improving the virtual image deployment and integrating the OpenStack services with the core services of the Data Acquisition on the CMS Online cluster it is now possible to start a thousand virtual machines within 10 minutes and to turn them off within seconds. This document will explain the architectural choices that were made to reach a fully redundant and scalable cloud, with a minimal impact on the running cluster configuration while giving a maximal segregation between the services. It will also present how to cold start 1000 virtual machines 25 times faster, using tools commonly utilised in all data centres.
DOI: 10.22323/1.313.0075
2018
The FEROL40, a microTCA card interfacing custom point-to-point links and standard TCP/IP
In order to accommodate new back-end electronics of upgraded CMS sub-detectors, a new FEROL40 card in the microTCA standard has been developed. The main function of the FEROL40 is to acquire event data over multiple point-to-point serial optical links, provide buffering, perform protocol conversion, and transmit multiple TCP/IP streams (4 × 10 Gbps) to the Ethernet network of the aggregation layer of the CMS DAQ (data acquisition) event builder. This contribution discusses the design of the FEROL40 and experience from its operation.
DOI: 10.22323/1.343.0129
2019
Design and development of the DAQ and Timing Hub for CMS Phase-2
The CMS detector will undergo a major upgrade for Phase-2 of the LHC program, starting around 2026. The upgraded Level-1 hardware trigger will select events at a rate of 750 kHz. At an expected event size of 7.4 MB this corresponds to a data rate of up to 50 Tbit/s. Optical links will carry the signals from on-detector front-end electronics to back-end electronics in ATCA crates in the service cavern. A DAQ and Timing Hub board aggregates data streams from back-end boards over point-to-point links, provides buffering and transmits the data to the commercial data-to-surface network for processing and storage. This hub board is also responsible for the distribution of timing, control and trigger signals to the back-ends. This paper presents the current development towards the DAQ and Timing Hub and the design of the first prototype, to be used for validation and integration with the first back-end prototypes in 2019-2020.
DOI: 10.1051/epjconf/201921401015
2019
Operational experience with the new CMS DAQ-Expert
The data acquisition (DAQ) system of the Compact Muon Solenoid (CMS) at CERN reads out the detector at the level-1 trigger accept rate of 100 kHz, assembles events with a bandwidth of 200 GB/s, provides these events to the high-level trigger running on a farm of about 30k cores and records the accepted events. Comprising custom-built and cutting-edge commercial hardware and several thousand instances of software applications, the DAQ system is complex in itself and failures cannot be completely excluded. Moreover, problems in the readout of the detectors, in the first-level trigger system or in the high-level trigger may provoke anomalous behaviour of the DAQ system which sometimes cannot easily be differentiated from a problem in the DAQ system itself. In order to achieve high data taking efficiency with operators from the entire collaboration and without relying too heavily on the on-call experts, an expert system, the DAQ-Expert, has been developed that can pinpoint the source of most failures and give advice to the shift crew on how to recover in the quickest way. The DAQ-Expert constantly analyzes monitoring data from the DAQ system and the high-level trigger by making use of logic modules written in Java that encapsulate the expert knowledge about potential operational problems. The results of the reasoning are presented to the operator in a web-based dashboard, may trigger sound alerts in the control room and are archived for post-mortem analysis - presented in a web-based timeline browser. We present the design of the DAQ-Expert and report on the operational experience since 2017, when it was first put into production.
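The real DAQ-Expert logic modules are written in Java, as noted above; the sketch below transposes the same pattern into Python: each module checks a monitoring snapshot for its failure signature and, if it matches, returns advice for the shift crew. The snapshot fields and the example condition are invented.

```python
# The real DAQ-Expert logic modules are written in Java, as noted above; this is
# a Python transposition of the same pattern. The snapshot fields and the
# example condition are invented.
from dataclasses import dataclass

@dataclass
class Advice:
    problem: str
    action: str

class BackpressureWithIdleHLT:
    def applies(self, snapshot: dict) -> bool:
        return (snapshot.get("bu_ramdisk_occupancy", 0.0) > 0.9
                and snapshot.get("hlt_cpu_usage", 1.0) < 0.5)

    def advise(self) -> Advice:
        return Advice(
            problem="Event building is backpressured while the HLT farm is mostly idle",
            action="Check for stuck filter-unit processes; consider a stop/start of the HLT",
        )

MODULES = [BackpressureWithIdleHLT()]

def analyse(snapshot: dict) -> list[Advice]:
    return [m.advise() for m in MODULES if m.applies(snapshot)]

print(analyse({"bu_ramdisk_occupancy": 0.95, "hlt_cpu_usage": 0.2}))
```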
DOI: 10.1051/epjconf/201921401048
2019
A Scalable Online Monitoring System Based on Elasticsearch for Distributed Data Acquisition in Cms
The part of the CMS Data Acquisition (DAQ) system responsible for data readout and event building is a complex network of interdependent distributed applications. To ensure successful data taking, these programs have to be constantly monitored in order to facilitate the timeliness of necessary corrections in case of any deviation from specified behaviour. A large number of diverse monitoring data samples are periodically collected from multiple sources across the network. Monitoring data are kept in memory for online operations and optionally stored on disk for post-mortem analysis. We present a generic, reusable solution based on an open source NoSQL database, Elasticsearch, which is fully compatible and non-intrusive with respect to the existing system. The motivation is to benefit from off-the-shelf software to facilitate the development, maintenance and support efforts. Elasticsearch provides failover and data redundancy capabilities as well as a programming language independent JSON-over-HTTP interface. The possibility of horizontal scaling matches the requirements of a DAQ monitoring system. The data load from all sources is balanced by redistribution over an Elasticsearch cluster that can be hosted on a computer cloud. In order to achieve the necessary robustness and to validate the scalability of the approach the above monitoring solution currently runs in parallel with an existing in-house developed DAQ monitoring system.
DOI: 10.1051/epjconf/202024501028
2020
DAQExpert the service to increase CMS data-taking efficiency
The Data Acquisition (DAQ) system of the Compact Muon Solenoid (CMS) experiment at the LHC is a complex system responsible for the data readout, event building and recording of accepted events. Its proper functioning plays a critical role in the data-taking efficiency of the CMS experiment. In order to ensure high availability and recover promptly in the event of hardware or software failure of the subsystems, an expert system, the DAQ Expert, has been developed. It aims at improving the data taking efficiency, reducing the human error in the operations and minimising the on-call expert demand. Introduced at the beginning of 2017, it assists the shift crew and the system experts in recovering from operational faults, streamlining the post mortem analysis and, at the end of Run 2, triggering fully automatic recovery without human intervention. DAQ Expert analyses the real-time monitoring data originating from the DAQ components and the high-level trigger updated every few seconds. It pinpoints data flow problems, and recovers them automatically or after operator approval is given. We analyse the CMS downtime in the 2018 run, focusing on what was improved with the introduction of automated recovery, and present the challenges and the design of encoding the expert knowledge into automated recovery jobs. Furthermore, we demonstrate the web-based, ReactJS interfaces that ensure an effective cooperation between the human operators in the control room and the automated recovery system. We report on the operational experience with automated recovery.
2014
10 Gbps TCP/IP streams from the FPGA for High Energy Physics
The DAQ system of the CMS experiment at CERN collects data from more than 600 custom detector Front-End Drivers (FEDs). During 2013 and 2014 the CMS DAQ system will undergo a major upgrade to address the obsolescence of current hardware and the requirements posed by the upgrade of the LHC accelerator and various detector components. For loss-less data collection from the FEDs, a new FPGA-based card implementing the TCP/IP protocol suite over 10 Gbps Ethernet has been developed. To limit the complexity of the TCP hardware implementation, the DAQ group developed a simplified and unidirectional, but RFC 793 compliant, version of the TCP protocol. This allows a PC with the standard Linux TCP/IP stack to be used as a receiver. We present the challenges and protocol modifications made to TCP in order to simplify its FPGA implementation. We also describe the interaction between the simplified TCP and the Linux TCP/IP stack, including the performance measurements.
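Because the stream produced by the FPGA is standard TCP, the receiving side needs nothing beyond an ordinary listening socket. The minimal sketch below, assuming an arbitrary port and no particular event-fragment framing, shows the receiver role played by a PC with the standard Linux TCP/IP stack.

```python
# Sketch of the receiver side: a plain TCP socket reading the unidirectional
# stream sent by the FPGA. Port and buffer size are arbitrary illustrative
# choices; real code would parse event fragments instead of just counting bytes.

import socket

LISTEN_PORT = 10000        # assumption: port targeted by the FED stream
RECV_BUFFER = 1 << 20      # 1 MiB reads to keep syscall overhead low

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", LISTEN_PORT))
    srv.listen(1)
    conn, peer = srv.accept()
    print("stream opened by", peer)
    total = 0
    with conn:
        while True:
            chunk = conn.recv(RECV_BUFFER)
            if not chunk:          # sender closed the unidirectional stream
                break
            total += len(chunk)
    print(f"received {total} bytes")
```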
DOI: 10.5281/zenodo.18815
2015
rootpy: 0.7.0
DOI: 10.18429/jacow-icalepcs2015-tua3o01
2015
Detector Controls Meets JEE on the Web
2015
File-based data flow in the CMS Filter Farm
2015
A scalable monitoring for the CMS Filter Farm based on elasticsearch
A flexible monitoring system has been designed for the CMS File-based Filter Farm making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small documents using the JSON encoding. These documents are indexed into a hierarchy of Elasticsearch (ES) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since ES is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf ES clusters consist of the very same nodes that form the Filter Farm, thus providing natural horizontal scaling. A separate central ES cluster is used to collect and index aggregated information. The fine-grained information, all the way down to individual processes, remains available in the leaf clusters. The central ES cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2.
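To give an idea of the kind of high-level view the central cluster can serve, the sketch below runs a terms/avg aggregation over a hypothetical index of per-stream rate documents, again through Elasticsearch's JSON-over-HTTP interface. The cluster URL, index and field names are assumptions for illustration only.

```python
# Sketch: query the (hypothetical) central Elasticsearch cluster for the
# average rate per HLT output stream using a terms + avg aggregation.

import json
import urllib.request

ES_URL = "http://es-central.example:9200"   # assumption: central cluster
INDEX = "hlt-stream-rates"                  # assumption: aggregated index

query = {
    "size": 0,
    "aggs": {
        "rate_per_stream": {
            "terms": {"field": "stream.keyword"},
            "aggs": {"avg_rate": {"avg": {"field": "rate_hz"}}},
        }
    },
}

req = urllib.request.Request(
    url=f"{ES_URL}/{INDEX}/_search",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    buckets = json.loads(resp.read())["aggregations"]["rate_per_stream"]["buckets"]
    for b in buckets:
        print(b["key"], round(b["avg_rate"]["value"], 1), "Hz")
```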
2015
Online data handling and storage at the CMS experiment
DOI: 10.5281/zenodo.18816
2015
rootpy: 0.6.0
2014
Automating the CMS DAQ
2014
Boosting Event Building Performance using Infiniband FDR for the CMS Upgrade
DOI: 10.1109/rtc.2014.7097439
2014
Achieving high performance with TCP over 40GbE on NUMA architectures for CMS data acquisition
TCP and the socket abstraction have barely changed over the last two decades, but at the network layer there has been a giant leap from a few megabits to 100 gigabits in bandwidth. At the same time, CPU architectures have evolved into the multicore era and applications are expected to make full use of all available resources. Applications in the data acquisition domain based on the standard socket library running on a Non-Uniform Memory Access (NUMA) architecture are unable to reach full efficiency and scalability without the software being adequately aware of the IRQ (Interrupt Request), CPU and memory affinities. During the first long shutdown of the LHC, the CMS DAQ system is going to be upgraded for operation from 2015 onwards, and a new software component has been designed and developed in the CMS online framework for transferring data with sockets. This software wraps the low-level socket library to ease higher-level programming with an API based on an asynchronous event-driven model similar to the DAT uDAPL API. It is an event-based application with NUMA optimizations that allows for a high throughput of data across a large distributed system. This paper describes the architecture, the technologies involved and the performance measurements of the software in the context of the CMS distributed event building.
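A minimal sketch of the affinity awareness discussed above: pin the receiving process to the cores of the NUMA node that hosts the network interface, so that socket buffers are touched from local memory. The core list is a made-up example; a real deployment would derive it from the hardware topology (e.g. with hwloc) and also steer the NIC interrupt affinities. Linux-only.

```python
# Sketch of NUMA/CPU affinity for a socket-based receiver (Linux only).
# The core set is a hypothetical example of "cores local to the NIC".

import os
import socket

NIC_LOCAL_CORES = {0, 2, 4, 6}   # assumption: cores on the NIC's NUMA node

# Pin this process to those cores; with a first-touch memory policy the
# buffers allocated afterwards come from the local memory node.
os.sched_setaffinity(0, NIC_LOCAL_CORES)
print("running on cores:", sorted(os.sched_getaffinity(0)))

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 16 << 20)  # large receive buffer
sock.close()
```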
DOI: 10.1088/1742-6596/396/4/042049
2012
Health and performance monitoring of the online computer cluster of CMS
The CMS experiment at the LHC features over 2500 devices that need constant monitoring in order to ensure proper data taking. The monitoring solution has been migrated from Nagios to Icinga, together with a selection of useful plugins. The motivations behind the migration and the selection of the plugins are discussed.
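Both Nagios and Icinga consume checks through the same simple plugin contract (one line of status output plus an exit code of 0, 1, 2 or 3), which is what makes such a migration feasible without rewriting every check. The sketch below is a hypothetical plugin; the metric and thresholds are invented.

```python
# Minimal Nagios/Icinga-style check plugin: print one status line and return
# the conventional exit code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).

import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk_usage(used_fraction: float) -> int:
    """Emit a status line and return the corresponding exit code."""
    if used_fraction >= 0.95:
        print(f"CRITICAL - disk {used_fraction:.0%} full")
        return CRITICAL
    if used_fraction >= 0.85:
        print(f"WARNING - disk {used_fraction:.0%} full")
        return WARNING
    print(f"OK - disk {used_fraction:.0%} full")
    return OK


if __name__ == "__main__":
    sys.exit(check_disk_usage(0.72))   # hypothetical measured value
```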
2017
New operator assistance features in the CMS Run Control System
DOI: 10.1088/1742-6596/898/3/032028
2017
New operator assistance features in the CMS Run Control System
During Run 1 of the LHC, many operational procedures were automated in the run control system of the Compact Muon Solenoid (CMS) experiment. When detector high voltages are ramped up or down, or upon certain beam-mode changes of the LHC, the DAQ system is automatically partially reconfigured with new parameters. Certain types of errors, such as those caused by single-event upsets, may trigger an automatic recovery procedure. Furthermore, the top-level control node continuously performs cross-checks to detect sub-system actions becoming necessary because of changes in configuration keys, changes in the set of included front-end drivers or potential clock instabilities. The operator is guided to perform the necessary actions through graphical indicators displayed next to the relevant command buttons in the user interface. Through these indicators, consistent configuration of CMS is ensured. However, manually following the indicators can still be inefficient at times. A new assistant to the operator has therefore been developed that can automatically perform all the necessary actions in a streamlined order. If additional problems arise, the new assistant tries to recover from these automatically. With the new assistant, a run can be started from any state of the sub-systems with a single click. An ongoing run may be recovered with a single click, once the appropriate recovery action has been selected. We review the automation features of CMS Run Control and discuss the new assistant in detail, including first operational experience.
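The single-click behaviour can be pictured as walking a small state machine: from whatever state a sub-system reports, the assistant derives the ordered list of actions that leads to running. The sketch below is purely illustrative and not the actual Run Control implementation; states, actions and transitions are hypothetical.

```python
# Illustrative sketch of "start from any state with one click": a toy state
# machine from which the streamlined action sequence is derived. All states
# and actions are hypothetical.

NEXT_ACTION = {
    # current state -> (action to issue, resulting state)
    "Halted":     ("configure", "Configured"),
    "Configured": ("start", "Running"),
    "Error":      ("recover", "Halted"),
    "Paused":     ("resume", "Running"),
}


def actions_to_running(state: str) -> list[str]:
    """Return the ordered action sequence that brings `state` to Running."""
    plan = []
    while state != "Running":
        action, state = NEXT_ACTION[state]
        plan.append(action)
    return plan


if __name__ == "__main__":
    for s in ("Error", "Paused", "Halted"):
        print(f"{s:>10} -> {' -> '.join(actions_to_running(s))}")
```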
DOI: 10.3990/1.9789064880322
2009
Measurement of the top quark pair production cross section in proton-antiproton collisions at $\sqrt{s}$ = 1.96 TeV: hadronic top decays with the D0 detector
Of the six quarks in the standard model the top quark is by far the heaviest: 35 times more massive than its partner the bottom quark and more than 130 times heavier than the average of the other five quarks. Its correspondingly small decay width means it tends to decay before forming a bound state. Of all quarks, therefore, the top is the least affected by quark confinement, behaving almost as a free quark. Since in the standard model top quarks couple almost exclusively to bottom quarks (t → Wb), top quark decays provide a window on the standard model through the direct measurement of the Cabibbo-Kobayashi-Maskawa quark mixing matrix element Vtb. In the same way, any lack of top quark decays into W bosons could imply the existence of decay channels beyond the standard model, for example charged Higgs bosons as expected in two-doublet Higgs models: t → H+b. This thesis sets out to measure the top-antitop quark pair production cross section at a center-of-mass energy of $\sqrt{s}$ = 1.96 TeV in the fully hadronic decay channel. The analysis is performed on 1 fb$^{-1}$ of Tevatron Run IIa data taken with the D0 detector between July 2002 and February 2006. A neural network is used to identify jets from b quarks and a likelihood ratio method is used to separate signal from background. To avoid reliance on possibly imperfect Monte Carlo models for the modelling of the QCD background, the background was modelled using a dedicated data sample. The t$\bar{t}$ signal was modelled using the ALPGEN and PYTHIA Monte Carlo event generators. The generated signal sample was passed through the full, GEANT-based D0 detector simulation and reconstructed using the default D0 reconstruction software.
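For context, a likelihood-ratio discriminant of the kind referred to above is commonly built from per-variable signal and background probability densities; one standard form is sketched below. This is a generic construction, not necessarily the exact one used in the thesis, and the variable set is left unspecified.

```latex
% Generic likelihood-ratio discriminant built from signal (s_i) and
% background (b_i) probability densities of a set of kinematic variables x_i.
\[
  \mathcal{D}(\vec{x}) =
  \frac{\prod_i s_i(x_i)}{\prod_i s_i(x_i) + \prod_i b_i(x_i)},
  \qquad 0 \le \mathcal{D} \le 1 .
\]
% Events with D close to 1 are signal-like, events with D close to 0 are
% background-like, so a cut on D selects a signal-enriched sample.
```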
DOI: 10.1088/1742-6596/110/2/022018
2008
Inclusive jet production at the Tevatron using the D0 experiment
A preliminary measurement is presented of the inclusive jet production cross section in p$\bar{p}$ collisions at a center-of-mass energy of $\sqrt{s}$ = 1960 GeV. The data were taken with the D0 detector and represent an integrated luminosity of ~900 pb$^{-1}$ of Tevatron Run II data. The cross section is studied as a function of jet transverse momentum ($p_T$) and rapidity (y) and compared to perturbative QCD predictions at next-to-leading order including two-loop threshold corrections.
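Schematically, and with symbols chosen here for illustration rather than taken from the paper, the measured double-differential cross section per ($p_T$, y) bin has the familiar form:

```latex
% Per-bin form of the double-differential inclusive jet cross section.
\[
  \frac{d^{2}\sigma}{dp_{T}\,dy} =
  \frac{N_{\mathrm{jet}}}{\mathcal{L}_{\mathrm{int}}\,\epsilon\,
        \Delta p_{T}\,\Delta y},
\]
% with N_jet the (unfolded) jet yield in the bin, L_int ~ 900 pb^{-1} the
% integrated luminosity, epsilon the selection efficiency, and
% Delta p_T, Delta y the bin widths.
```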
DOI: 10.2172/951334
2009
Measurement of the top quark pair production cross section in proton-antiproton collisions at a center of mass energy of 1.96 TeV, hadronic top decays with the D0 detector
Of the six quarks in the standard model the top quark is by far the heaviest: 35 times more massive than its partner the bottom quark and more than 130 times heavier than the average of the other five quarks. Its correspondingly small decay width means it tends to decay before forming a bound state. Of all quarks, therefore, the top is the least affected by quark confinement, behaving almost as a free quark. Its large mass also makes the top quark a key player in the realm of the postulated Higgs boson, whose coupling strengths to particles are proportional to their masses. Precision measurements of particle masses, e.g. of the top quark and the W boson, can thereby provide indirect constraints on the Higgs boson mass. Since in the standard model top quarks couple almost exclusively to bottom quarks (t → Wb), top quark decays provide a window on the standard model through the direct measurement of the Cabibbo-Kobayashi-Maskawa quark mixing matrix element Vtb. In the same way, any lack of top quark decays into W bosons could imply the existence of decay channels beyond the standard model, for example charged Higgs bosons as expected in two-doublet Higgs models: t → H+b. Within the standard model, top quark decays can be classified by the (lepton or quark) W boson decay products. Depending on the decay of each of the W bosons, t$\bar{t}$ pair decays can involve either no leptons at all, or one or two isolated leptons from direct W → e$\bar{\nu}_e$ and W → $\mu\bar{\nu}_\mu$ decays. Cascade decays like b → Wc → e$\bar{\nu}_e$c can lead to additional non-isolated leptons. The fully hadronic decay channel, in which both W bosons decay into a quark-antiquark pair, has the largest branching fraction of all t$\bar{t}$ decay channels and is the only kinematically complete (i.e. neutrino-less) channel. It lacks, however, the clear isolated-lepton signature and is therefore hard to distinguish from the multi-jet QCD background. It is important to measure the cross section (or branching fraction) in each channel independently to fully verify the standard model. Top quark pair production proceeds through the strong interaction, setting the scene for top quark physics at hadron colliders. This adds an additional challenge: the huge background from multi-jet QCD processes. At the Tevatron, for example, t$\bar{t}$ production is completely hidden in light q$\bar{q}$ pair production. The light (i.e. not bottom or top) quark pair production cross section is six orders of magnitude larger than that for t$\bar{t}$ production. Even including the full signature of hadronic t$\bar{t}$ decays, two b-jets and four additional jets, the QCD cross section for processes with a similar signature is more than five times larger than for t$\bar{t}$ production. The presence of isolated leptons in the (semi)leptonic t$\bar{t}$ decay channels provides a clear characteristic to distinguish the t$\bar{t}$ signal from the QCD background, but introduces a multitude of W- and Z-related backgrounds.
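The statement that the fully hadronic channel has the largest branching fraction follows from a simple counting of W decay modes, ignoring phase-space and QCD corrections:

```latex
% Naive counting: each W decays into one of nine roughly equally likely
% fermion doublets, six of which are quark pairs (two doublets times three
% colours).
\[
  \mathcal{B}(W \to q\bar{q}') \approx \tfrac{6}{9} = \tfrac{2}{3},
  \qquad
  \mathcal{B}(t\bar{t} \to \text{fully hadronic})
  \approx \left(\tfrac{2}{3}\right)^{2} \approx 44\%,
\]
\[
  \mathcal{B}(t\bar{t} \to \ell + \text{jets},\ \ell = e,\mu)
  \approx 2 \cdot \tfrac{2}{9} \cdot \tfrac{2}{3} \approx 30\%,
  \qquad
  \mathcal{B}(t\bar{t} \to \ell\ell,\ \ell = e,\mu)
  \approx \left(\tfrac{2}{9}\right)^{2} \approx 5\% .
\]
```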
DOI: 10.1088/1742-6596/1085/3/032021
2018
DAQExpert - An expert system to increase CMS data-taking efficiency
The efficiency of the Data Acquisition (DAQ) system of the Compact Muon Solenoid (CMS) experiment for LHC Run 2 is constantly being improved. A significant factor affecting the data-taking efficiency is the experience of the DAQ operator. One of the main responsibilities of the DAQ operator is to carry out the proper recovery procedure in case of a failure of data-taking. At the start of Run 2, understanding the problem and finding the right remedy could take a considerable amount of time (up to many minutes). Operators relied heavily on the support of on-call experts, also outside working hours. Wrong decisions due to time pressure sometimes led to additional overhead in the recovery time. To increase the efficiency of CMS data-taking we developed a new expert system, the DAQExpert, which provides shifters with optimal recovery suggestions instantly when a failure occurs. DAQExpert is a web application analyzing frequently updated monitoring data from all DAQ components and identifying problems based on expert knowledge expressed in small, independent logic modules written in Java. Its results are presented in real time in the control room via a web-based GUI and a sound system, in the form of a short description of the current failure and the steps to recover.
DOI: 10.22323/1.313.0123
2018
CMS DAQ Current and Future Hardware Upgrades up to Post Long Shutdown 3 (LS3) Times
Following the first LHC collisions seen and recorded by CMS in 2009, the DAQ hardware went through a major upgrade during LS1 (2013-2014) and new detectors have been connected during the 2015-2016 and 2016-2017 winter shutdowns. Now, LS2 (2019-2020) and LS3 (2024-mid 2026) are actively being prepared. This paper shows how CMS DAQ hardware has evolved from the beginning and will continue to evolve in order to meet the future challenges posed by the High Luminosity LHC (HL-LHC) and the CMS detector evolution. In particular, post-LS3 DAQ architectures are focused upon.
DOI: 10.48550/arxiv.1806.08975
2018
The CMS Data Acquisition System for the Phase-2 Upgrade
During the third long shutdown of the CERN Large Hadron Collider, the CMS detector will undergo a major upgrade to prepare for Phase-2 of the CMS physics program, starting around 2026. The upgraded CMS detector will be read out at an unprecedented data rate of up to 50 Tb/s, with an event rate of 750 kHz selected by the level-1 hardware trigger and an average event size of 7.4 MB. Complete events will be analyzed by the High-Level Trigger (HLT) using software algorithms running on standard processing nodes, potentially augmented with hardware accelerators. Selected events will be stored permanently at a rate of up to 7.5 kHz for offline processing and analysis. This paper presents the baseline design of the DAQ and HLT systems for Phase-2, taking into account the projected evolution of high-speed network fabrics for event building and distribution, and the anticipated performance of general-purpose CPUs. In addition, some opportunities offered by reading out and processing parts of the detector data at the full LHC bunch-crossing rate (40 MHz) are discussed.
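A quick consistency check of the figures quoted above, using only the numbers in the abstract:

```latex
\[
  750~\mathrm{kHz} \times 7.4~\mathrm{MB}
  \approx 5.6~\mathrm{TB/s} \approx 44~\mathrm{Tb/s},
  \qquad
  7.5~\mathrm{kHz} \times 7.4~\mathrm{MB} \approx 56~\mathrm{GB/s}.
\]
```

The event-building input therefore sits just below the quoted 50 Tb/s readout ceiling, while permanent storage has to absorb several tens of GB/s.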
DOI: 10.1051/epjconf/201921401006
2019
The CMS Event-Builder System for LHC Run 3 (2021-23)
The data acquisition system (DAQ) of the CMS experiment at the CERN Large Hadron Collider (LHC) assembles events of 2 MB at a rate of 100 kHz. The event builder collects event fragments from about 750 sources and assembles them into complete events, which are then handed to the High-Level Trigger (HLT) processes running on O(1000) computers. The aging event-building hardware will be replaced during the long shutdown 2 of the LHC taking place in 2019/20. The future data networks will be based on 100 Gb/s interconnects using Ethernet and InfiniBand technologies. More powerful computers may allow the currently separate functionality of the readout and builder units to be combined into a single I/O processor handling 100 Gb/s of input and output traffic simultaneously. It might be beneficial to preprocess data originating from specific detector parts or regions before handing it to generic HLT processors. Therefore, we will investigate how specialized coprocessors, e.g. GPUs, could be integrated into the event builder. We will present the envisioned changes to the event builder compared to today's system. Initial measurements of the performance of the data networks under the event-building traffic pattern will be shown. Implications of a folded network architecture for the event building and corresponding changes to the software implementation will be discussed.
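The headline figures translate directly into the load on the event-building fabric; the arithmetic below uses only the numbers quoted in the abstract:

```latex
\[
  2~\mathrm{MB} \times 100~\mathrm{kHz} = 200~\mathrm{GB/s} \approx 1.6~\mathrm{Tb/s},
  \qquad
  \frac{2~\mathrm{MB}}{\approx 750~\text{sources}} \approx 2.7~\mathrm{kB}
  \ \text{average fragment size}.
\]
```

In aggregate, the assembled event stream alone thus occupies the equivalent of about sixteen fully loaded 100 Gb/s links.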
DOI: 10.1051/epjconf/201921401044
2019
Presentation layer of CMS Online Monitoring System
The Compact Muon Solenoid (CMS) is one of the experiments at the CERN Large Hadron Collider (LHC). The CMS Online Monitoring System (OMS) is an upgrade and successor to the CMS Web-Based Monitoring (WBM) system, which is an essential tool for shift crew members, detector subsystem experts, operations coordinators, and those performing physics analyses. The CMS OMS is divided into aggregation and presentation layers. Communication between layers uses RESTful JSON:API compliant requests. The aggregation layer is responsible for collecting data from heterogeneous sources, storing transformed and pre-calculated (aggregated) values and exposing data via the RESTful API. The presentation layer displays detector information via a modern, user-friendly and customizable web interface. The CMS OMS user interface is composed of a set of cutting-edge software frameworks and tools to display non-event data to any authenticated CMS user worldwide. The web interface tree-like component structure comprises (top-down): workspaces, folders, pages, controllers and portlets. A clear hierarchy gives the required flexibility and control for content organization. Each bottom element instantiates a portlet and is a reusable component that displays a single aspect of data, like a table, a plot or an article. Pages consist of multiple different portlets and can be customized at runtime using a drag-and-drop technique. This is how a single page can easily include information from multiple online sources. Different pages give access to a summary of the current status of the experiment, as well as convenient access to historical data. This paper describes the CMS OMS architecture, core concepts and technologies of the presentation layer.
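A minimal sketch of how a presentation-layer component could consume the aggregation layer through a RESTful JSON:API interface. The base URL, resource name and filter parameter are hypothetical placeholders; only the JSON:API media type and the top-level "data" member are taken from the standard.

```python
# Sketch: fetch a collection of resources from a JSON:API endpoint.
# Endpoint, resource and filter are assumptions for illustration.

import json
import urllib.request

BASE = "https://oms.example.cern.ch/api/v1"        # assumption: API base URL
url = f"{BASE}/runs?filter[run_number]=305112"     # hypothetical resource/filter

req = urllib.request.Request(
    url,
    headers={"Accept": "application/vnd.api+json"},  # JSON:API media type
)

with urllib.request.urlopen(req) as resp:
    payload = json.loads(resp.read())

# JSON:API responses carry resources under the top-level "data" member.
for resource in payload.get("data", []):
    print(resource["type"], resource["id"], resource.get("attributes", {}))
```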
DOI: 10.18429/jacow-pcapac2018-wep17
2019
Extending the Remote Control Capabilities in the CMS Detector Control System with Remote Procedure Call Services
The CMS Detector Control System (DCS) is implemented as a large distributed and redundant system, with applications interacting and sharing data in multiple ways. The CMS XML-RPC is a software toolkit implementing the standard Remote Procedure Call (RPC) protocol, using the Extensible Mark-up Language (XML) and a custom lightweight variant using the JavaScript Object Notation (JSON) to model, encode and expose resources through the Hypertext Transfer Protocol (HTTP). The CMS XML-RPC toolkit complies with the standard specification of the XML-RPC protocol that allows system developers to build collaborative software architectures with self-contained and reusable logic, and with encapsulation of well-defined processes. The implementation of this protocol introduces not only a powerful communication method to operate and exchange data with web-based applications, but also a new programming paradigm to design service-oriented software architectures within the CMS DCS domain. This paper presents details of the CMS XML-RPC implementation in WinCC Open Architecture (OA) Control Language using an object-oriented approach.
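Because the toolkit speaks the standard XML-RPC protocol, any stock client library can talk to such a service. The sketch below uses Python's standard xmlrpc.client; the endpoint URL and remote procedure names are hypothetical and do not describe the actual CMS DCS interface.

```python
# Sketch: call a (hypothetical) XML-RPC detector-control service with the
# standard-library client. Endpoint and method names are assumptions.

import xmlrpc.client

ENDPOINT = "http://dcs-gateway.example:8080/RPC2"   # assumption: service URL

with xmlrpc.client.ServerProxy(ENDPOINT) as proxy:
    # Hypothetical remote procedures exposed by a detector-control service:
    channels = proxy.getChannelNames("tracker/hv")   # e.g. returns a list
    status = proxy.getChannelStatus(channels[0])     # e.g. returns a struct
    print(channels[0], "->", status)
```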
DOI: 10.1109/nssmic.2006.354161
2006
The New D0 Level-1 Calorimeter Trigger
With increasing Tevatron luminosity, efficient triggers that meet the bandwidth limitations of the experiment's data acquisition system become more and more difficult to construct. To meet these challenges, the D0 experiment has significantly enhanced its triggering capabilities. A major component of this upgrade is a completely re-designed Level-1 calorimeter trigger (L1Cal). This new system uses a novel architecture and algorithms to maintain acceptable background rejection while preserving or even improving signal efficiency at the highest luminosities foreseen. We describe interesting features of the L1Cal and give highlights from its first few months of operation.
2021
Measurement of the top quark mass using events with a single reconstructed top quark in pp collisions at $\sqrt{s}$ = 13 TeV
A measurement of the top quark mass is performed using a data sample enriched with single top quark events produced in the $t$ channel. The study is based on proton-proton collision data, corresponding to an integrated luminosity of 35.9 fb$^{-1}$, recorded at $\sqrt{s}$ = 13 TeV by the CMS experiment at the LHC in 2016. Candidate events are selected by requiring an isolated high-momentum lepton (muon or electron) and exactly two jets, of which one is identified as originating from a bottom quark. Multivariate discriminants are designed to separate the signal from the background. Optimized thresholds are placed on the discriminant outputs to obtain an event sample with high signal purity. The top quark mass is found to be 172.13$^{+0.76}_{-0.77}$ GeV, where the uncertainty includes both the statistical and systematic components, reaching sub-GeV precision for the first time in this event topology. The masses of the top quark and antiquark are also determined separately using the lepton charge in the final state, from which the mass ratio and difference are determined to be 0.9952$^{+0.0079}_{-0.0104}$ and 0.83$^{+1.79}_{-1.35}$ GeV, respectively. The results are consistent with $CPT$ invariance.
2021
Observation of $\mathrm{B^{0}_{s}}$ mesons and measurement of the $\mathrm{B^{0}_{s}}/\mathrm{B^{+}}$ yield ratio in PbPb collisions at ${\sqrt {\smash [b]{s_{_{\mathrm {NN}}}}}} = $ 5.02 TeV
2021
High precision measurements of Z boson production in PbPb collisions at ${\sqrt {\smash [b]{s_{_{\mathrm {NN}}}}}} = $ 5.02 TeV
The CMS experiment at the LHC has measured the differential cross sections of Z bosons decaying to pairs of leptons, as functions of transverse momentum and rapidity, in lead-lead collisions at a nucleon-nucleon center-of-mass energy of 5.02 TeV. The measured Z boson elliptic azimuthal anisotropy coefficient is compatible with zero, showing that Z bosons do not experience significant final-state interactions in the medium produced in the collision. Yields of Z bosons are compared to Glauber model predictions and are found to deviate from these expectations in peripheral collisions, indicating the presence of initial collision geometry and centrality selection effects. The precision of the measurement allows, for the first time, for a data-driven determination of the nucleon-nucleon integrated luminosity as a function of lead-lead centrality, thereby eliminating the need for its estimation based on a Glauber model.
DOI: 10.1002/tox.2530070101
1992
Masthead
Environmental Toxicology and Water Quality, Volume 7, Issue 1, p. fmi (February 1992). No abstract is available for this article.
DOI: 10.1002/tox.2530080101
1993
Masthead
Environmental Toxicology and Water Quality, Volume 8, Issue 1, p. fmi (February 1993). No abstract is available for this article.
DOI: 10.1002/tox.v7:2
1992
DOI: 10.1002/tox.2530070301
1992
Masthead
Environmental Toxicology and Water Quality, Volume 7, Issue 3, p. fmi (August 1992). No abstract is available for this article.
DOI: 10.1002/tox.v8:4
1993
DOI: 10.1002/tox.2530070401
1992
Masthead
Environmental Toxicology and Water Quality, Volume 7, Issue 4, p. fmi (November 1992). No abstract is available for this article.