
Manuel Giffels

DOI: 10.1140/epjc/s10052-008-0715-2
2008
Cited 315 times
Flavor physics of leptons and dipole moments
This chapter of the report of the “Flavor in the era of the LHC” Workshop discusses the theoretical, phenomenological and experimental issues related to flavor phenomena in the charged lepton sector and in flavor conserving CP-violating processes. We review the current experimental limits and the main theoretical models for the flavor structure of fundamental particles. We analyze the phenomenological consequences of the available data, setting constraints on explicit models beyond the standard model, presenting benchmarks for the discovery potential of forthcoming measurements both at the LHC and at low energy, and exploring options for possible future experiments.
DOI: 10.1088/1742-6596/2438/1/012039
2023
Extending the distributed computing infrastructure of the CMS experiment with HPC resources
Particle accelerators are an important tool to study the fundamental properties of elementary particles. Currently the highest energy accelerator is the LHC at CERN, in Geneva, Switzerland. Each of its four major detectors, such as the CMS detector, produces dozens of Petabytes of data per year to be analyzed by a large international collaboration. The processing is carried out on the Worldwide LHC Computing Grid, which spans more than 170 compute centers around the world and is used by a number of particle physics experiments. Recently the LHC experiments were encouraged to make increasing use of HPC resources. While Grid resources are homogeneous with respect to the Grid middleware used, HPC installations can be very different in their setup. In order to integrate HPC resources into the highly automated processing setups of the CMS experiment, a number of challenges need to be addressed. For processing, access to primary data and metadata as well as access to the software is required. At Grid sites all this is achieved via a number of services that are provided by each center. However, at HPC sites many of these capabilities cannot be easily provided and have to be enabled in user space or by other means. At HPC centers there are often restrictions regarding network access to remote services, which is a further severe limitation. The paper discusses a number of solutions and recent experiences by the CMS experiment to include HPC resources in processing campaigns.
DOI: 10.48550/arxiv.2403.14903
2024
Modeling Distributed Computing Infrastructures for HEP Applications
Predicting the performance of various infrastructure design options in complex federated infrastructures with computing sites distributed over a wide area network that support a plethora of users and workflows, such as the Worldwide LHC Computing Grid (WLCG), is not trivial. Due to the complexity and size of these infrastructures, it is not feasible to deploy experimental test-beds at large scales merely for the purpose of comparing and evaluating alternate designs. An alternative is to study the behaviours of these systems using simulation. This approach has been used successfully in the past to identify efficient and practical infrastructure designs for High Energy Physics (HEP). A prominent example is the Monarc simulation framework, which was used to study the initial structure of the WLCG. New simulation capabilities are needed to simulate large-scale heterogeneous computing systems with complex networks, data access and caching patterns. A modern tool to simulate HEP workloads that execute on distributed computing infrastructures based on the SimGrid and WRENCH simulation frameworks is outlined. Studies of its accuracy and scalability are presented using HEP as a case-study. Hypothetical adjustments to prevailing computing architectures in HEP are studied providing insights into the dynamics of a part of the WLCG and candidates for improvements.
DOI: 10.48550/arxiv.2404.02100
2024
Analysis Facilities White Paper
This white paper presents the current status of the R&D for Analysis Facilities (AFs) and attempts to summarize the views on the future direction of these facilities. These views have been collected through the High Energy Physics (HEP) Software Foundation's (HSF) Analysis Facilities forum, established in March 2022; the Analysis Ecosystems II workshop, which took place in May 2022; and the WLCG/HSF pre-CHEP workshop, which took place in May 2023. The paper attempts to cover all the aspects of an analysis facility.
DOI: 10.1051/epjconf/202429507020
2024
Federated Heterogeneous Compute and Storage Infrastructure for the PUNCH4NFDI Consortium
PUNCH4NFDI, funded by the German Research Foundation initially for five years, is a diverse consortium of particle, astro-, astroparticle, hadron and nuclear physics embedded in the National Research Data Infrastructure initiative. In order to provide seamless and federated access to the huge variety of compute and storage systems provided by the participating communities, covering their very diverse needs, the Compute4PUNCH and Storage4PUNCH concepts have been developed. Both concepts comprise state-of-the-art technologies such as a token-based AAI for standardized access to compute and storage resources. The community-supplied heterogeneous HPC, HTC and Cloud compute resources are dynamically and transparently integrated into one federated HTCondor-based overlay batch system using the COBalD/TARDIS resource meta-scheduler. Traditional login nodes and a JupyterHub provide entry points into the entire landscape of available compute resources, while container technologies and the CERN Virtual Machine File System (CVMFS) ensure a scalable provisioning of community-specific software environments. In Storage4PUNCH, community-supplied storage systems, mainly based on dCache or XRootD technology, are being federated in a common infrastructure employing methods that are well established in the wider HEP community. Furthermore, existing technologies for caching as well as metadata handling are being evaluated with the aim of deeper integration. The combined Compute4PUNCH and Storage4PUNCH environment will allow a large variety of researchers to carry out resource-demanding analysis tasks. In this contribution we present the Compute4PUNCH and Storage4PUNCH concepts, the current status of the developments, as well as first experiences with scientific applications being executed on the available prototypes.
DOI: 10.1051/epjconf/202429504032
2024
Modeling Distributed Computing Infrastructures for HEP Applications
Predicting the performance of various infrastructure design options in complex federated infrastructures with computing sites distributed over a wide area network that support a plethora of users and workflows, such as the Worldwide LHC Computing Grid (WLCG), is not trivial. Due to the complexity and size of these infrastructures, it is not feasible to deploy experimental test-beds at large scales merely for the purpose of comparing and evaluating alternate designs. An alternative is to study the behaviours of these systems using simulation. This approach has been used successfully in the past to identify efficient and practical infrastructure designs for High Energy Physics (HEP). A prominent example is the Monarc simulation framework, which was used to study the initial structure of the WLCG. New simulation capabilities are needed to simulate large-scale heterogeneous computing systems with complex networks, data access and caching patterns. A modern tool to simulate HEP workloads that execute on distributed computing infrastructures based on the SimGrid and WRENCH simulation frameworks is outlined. Studies of its accuracy and scalability are presented using HEP as a case-study. Hypothetical adjustments to prevailing computing architectures in HEP are studied providing insights into the dynamics of a part of the WLCG and candidates for improvements.
DOI: 10.1088/1742-6596/513/4/042052
2014
Cited 17 times
The CMS Data Management System
The data management elements in CMS are scalable, modular, and designed to work together. The main components are PhEDEx, the data transfer and location system; the Data Bookkeeping Service (DBS), a metadata catalog; and the Data Aggregation Service (DAS), designed to aggregate views and provide them to users and services. Tens of thousands of samples have been cataloged and petabytes of data have been moved since the run began. The modular system has allowed the optimal use of appropriate underlying technologies. In this contribution we will discuss the use of both Oracle and NoSQL databases to implement the data management elements as well as the individual architectures chosen. We will discuss how the data management system functioned during the first run, and what improvements are planned in preparation for 2015.
DOI: 10.1103/physrevd.77.073010
2008
Cited 15 times
Lepton-flavor-violating decay τ → μμμ̄ at the CERN LHC
Lepton-flavor-violating τ decays are predicted in many extensions of the standard model at a rate observable at future collider experiments. In this article we focus on the decay τ → μμμ̄, which is a promising channel to observe lepton-flavor violation at the CERN Large Hadron Collider (LHC). We present analytic expressions for the differential decay width derived from a model-independent effective Lagrangian with general four-fermion operators, and estimate the experimental acceptance for detecting the decay τ → μμμ̄ at the LHC. Specific emphasis is given to decay angular distributions and how they can be used to discriminate new physics models. We provide specific predictions for various extensions of the standard model, including supersymmetric, little Higgs, and technicolor models.
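The model-independent analysis mentioned in the abstract starts from a general four-fermion effective Lagrangian. A schematic form of such a Lagrangian is sketched below; the operator basis and coupling names are illustrative assumptions, not necessarily the parametrization used in the paper:

```latex
\mathcal{L}_{\mathrm{eff}} \supset \frac{4 G_F}{\sqrt{2}} \Big[
    g_1\,(\bar{\mu}_R \tau_L)(\bar{\mu}_R \mu_L)
  + g_2\,(\bar{\mu}_L \tau_R)(\bar{\mu}_L \mu_R)
  + g_3\,(\bar{\mu}_R \gamma^{\alpha} \tau_R)(\bar{\mu}_R \gamma_{\alpha} \mu_R)
  + g_4\,(\bar{\mu}_L \gamma^{\alpha} \tau_L)(\bar{\mu}_L \gamma_{\alpha} \mu_L)
  + \ldots \Big] + \mathrm{h.c.}
```

The dimensionless couplings g_i encode the new-physics contributions (a radiative, dipole-type operator can be added analogously); decay angular distributions are sensitive to which of these operators dominates, which is how they can help discriminate between models.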
DOI: 10.1088/1742-6596/513/3/032040
2014
Cited 7 times
CMS computing operations during run 1
During the first run, CMS collected and processed more than 10B data events and simulated more than 15B events. Up to 100k processor cores were used simultaneously and 100PB of storage was managed. Each month petabytes of data were moved and hundreds of users accessed data samples. In this document we discuss the operational experience from this first run. We present the workflows and data flows that were executed, and we discuss the tools and services developed, and the operations and shift models used to sustain the system. Many techniques were followed from the original computing planning, but some were reactions to difficulties and opportunities. We also address the lessons learned from an operational perspective, and how this is shaping our thoughts for 2015.
DOI: 10.1016/j.nuclphysbps.2004.07.005
2006
Cited 9 times
Charge Transfer of GEM Structures in High Magnetic Fields
We report on measurements of a triple GEM (Gas Electron Multiplier) structure in high magnetic fields up to 5 T which were performed in the framework of the R&D work for a Time Projection Chamber at a future Linear Collider. The determination of charge transfer is performed using a triple GEM structure installed in a small test chamber which is irradiated with a 55Fe source. The measurements are parametrised using a functional dependence on the electric setup which was motivated by detailed numerical simulations of a GEM using the programs MAXWELL and GARFIELD. This parametrisation of a single GEM foil is extended to a model which describes the performance of the triple GEM structure and allows prediction of the parameter setup leading to minimum ion backdrift. Applying this setup, an ion backdrift of only 2.5 permille is achieved. The use of MHSPs (Micro Hole Strip Plates) for ion backdrift reduction is also investigated. Setting an optimised negative strip voltage, a suppression factor of approximately 4 is reached at 4 T magnetic field. Additionally, the width of the charge cloud of individual 55Fe photons is measured using a finely segmented strip readout after the triple GEM structure. Charge widths between 0.2 and 0.3 mm RMS are observed, which appear to be dominated by diffusion in the space between the individual GEMs. This charge broadening is partly suppressed at high magnetic fields.
DOI: 10.1051/epjconf/202024507040
2020
Cited 5 times
Lightweight dynamic integration of opportunistic resources
To satisfy future computing demands of the Worldwide LHC Computing Grid (WLCG), opportunistic usage of third-party resources is a promising approach. While the means to make such resources compatible with WLCG requirements are largely satisfied by virtual machines and containers technologies, strategies to acquire and disband many resources from many providers are still a focus of current research. Existing meta-schedulers that manage resources in the WLCG are hitting the limits of their design when tasked to manage heterogeneous resources from many diverse resource providers. To provide opportunistic resources to the WLCG as part of a regular WLCG site, we propose a new meta-scheduling approach suitable for opportunistic, heterogeneous resource provisioning. Instead of anticipating future resource requirements, our approach observes resource usage and promotes well-used resources. Following this approach, we have developed an inherently robust meta-scheduler, COBalD, for managing diverse, heterogeneous resources given unpredictable resource requirements. This paper explains the key concepts of our approach, and discusses the benefits and limitations of our new approach to dynamic resource provisioning compared to previous approaches.
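The key idea, observing utilization and promoting well-used resources instead of predicting demand, can be illustrated with a minimal feedback loop. The sketch below is a conceptual illustration only, not the COBalD API; pool fields, thresholds, and names are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    """A group of identical resources obtained from one provider (illustrative model)."""
    name: str
    demand: float       # resources currently requested from the provider
    utilisation: float  # fraction of provided resources doing useful work (0..1)
    allocation: float   # fraction of provided resources claimed by the batch system (0..1)

def adjust_demand(pool: Pool, low: float = 0.5, high: float = 0.9, step: float = 1.0) -> float:
    """Raise demand for well-used pools, lower it for poorly used ones.

    This mirrors the feedback idea from the abstract: no prediction of future
    requirements, only observation of current usage.
    """
    if pool.allocation > high and pool.utilisation > high:
        return pool.demand + step            # resources are busy: ask for more
    if pool.utilisation < low:
        return max(0.0, pool.demand - step)  # resources idle: release some
    return pool.demand                       # otherwise keep the current level

if __name__ == "__main__":
    # Hypothetical pools for illustration.
    cloud = Pool("example-cloud", demand=10, utilisation=0.95, allocation=0.97)
    hpc = Pool("example-hpc", demand=20, utilisation=0.30, allocation=0.40)
    for pool in (cloud, hpc):
        pool.demand = adjust_demand(pool)
        print(pool.name, "->", pool.demand)
```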
DOI: 10.1088/1742-6596/513/6/062051
2014
Cited 4 times
Integration and validation testing for PhEDEx, DBS and DAS with the PhEDEx LifeCycle agent
The ever-increasing amount of data handled by the CMS dataflow and workflow management tools poses new challenges for cross-validation among different systems within CMS experiment at LHC. To approach this problem we developed an integration test suite based on the LifeCycle agent, a tool originally conceived for stress-testing new releases of PhEDEx, the CMS data-placement tool. The LifeCycle agent provides a framework for customising the test workflow in arbitrary ways, and can scale to levels of activity well beyond those seen in normal running. This means we can run realistic performance tests at scales not likely to be seen by the experiment for some years, or with custom topologies to examine particular situations that may cause concern some time in the future.
DOI: 10.1051/epjconf/201921408009
2019
Cited 4 times
Dynamic Integration and Management of Opportunistic Resources for HEP
Demand for computing resources in high energy physics (HEP) shows a highly dynamic behavior, while the resources provided by the Worldwide LHC Computing Grid (WLCG) remain static. It has become evident that opportunistic resources such as High Performance Computing (HPC) centers and commercial clouds are well suited to cover peak loads. However, the utilization of these resources gives rise to new levels of complexity: resources need to be managed highly dynamically, and HEP applications require a very specific software environment usually not provided at opportunistic resources. A further aspect to consider is limited network bandwidth, which causes I/O-intensive workflows to run inefficiently. The key component to dynamically run HEP applications on opportunistic resources is the utilization of modern container and virtualization technologies. Based on these technologies, the Karlsruhe Institute of Technology (KIT) has developed ROCED, a resource manager to dynamically integrate and manage a variety of opportunistic resources. In combination with ROCED, the HTCondor batch system acts as a powerful single entry point to all available computing resources, leading to a seamless and transparent integration of opportunistic resources into HEP computing. KIT is currently improving the resource management and job scheduling by focusing on the I/O requirements of individual workflows, the available network bandwidth, as well as scalability. For these reasons, we are currently developing a new resource manager, called TARDIS. In this paper, we give an overview of the utilized technologies, the dynamic management and integration of resources, as well as the status of the I/O-based resource and job scheduling.
DOI: 10.1088/1742-6596/608/1/012018
2015
Cited 3 times
Tier 3 batch system data locality via managed caches
Modern data processing increasingly relies on data locality for performance and scalability, whereas the common HEP approaches aim for uniform resource pools with minimal locality, recently even across site boundaries. To combine the advantages of both, the High-Performance Data Analysis (HPDA) Tier 3 concept opportunistically establishes data locality via coordinated caches.
DOI: 10.1088/1742-6596/513/4/042022
2014
Cited 3 times
Data Bookkeeping Service 3 – Providing event metadata in CMS
The Data Bookkeeping Service 3 provides a catalog of event metadata for Monte Carlo and recorded data of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN, Geneva. It comprises all necessary information for tracking datasets, their processing history and associations between runs, files and datasets, on a large scale of about 200,000 datasets and more than 40 million files, which adds up to around 700 GB of metadata. The DBS is an essential part of the CMS Data Management and Workload Management (DMWM) systems [1]; all kinds of data processing, such as Monte Carlo production, processing of recorded event data, and physics analysis done by the users, rely heavily on the information stored in DBS.
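A minimal sketch of the kind of associations DBS keeps track of (datasets, their files, the runs they cover, and processing history) is shown below. The class and field names are illustrative assumptions for the example, not the actual DBS schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class File:
    """One file belonging to a dataset (illustrative model)."""
    logical_file_name: str
    size_bytes: int
    run_numbers: list[int]            # runs whose events are contained in this file

@dataclass
class Dataset:
    """A dataset with its files and processing history (illustrative model)."""
    name: str                         # e.g. a hypothetical "/Primary/Processed/TIER" path
    parent: str | None = None         # dataset this one was derived from, if any
    files: list[File] = field(default_factory=list)

    def runs(self) -> set[int]:
        """All runs associated with the dataset, derived from its files."""
        return {run for f in self.files for run in f.run_numbers}
```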
DOI: 10.1088/1742-6596/331/7/072049
2011
Cited 3 times
Design and early experience with promoting user-created data in CMS
The Computing Model of the CMS experiment [1] does not address transferring user-created data between different Grid sites. Due to the limited resources of a single site, distribution of individual user-created datasets between sites is crucial to ensure accessibility. In contrast to official datasets, there are no special requirements for user datasets (e.g. concerning data quality). The StoreResults service provides a mechanism to elevate user-created datasets to central bookkeeping, ensuring the data quality is the same as for an official dataset. This is a prerequisite for further distribution within the CMS dataset infrastructure.
DOI: 10.1088/1742-6596/898/5/052021
2017
Cited 3 times
On-demand provisioning of HEP compute resources on cloud sites and shared HPC centers
This contribution reports on solutions, experiences and recent developments with the dynamic, on-demand provisioning of remote computing resources for analysis and simulation workflows. Local resources of a physics institute are extended by private and commercial cloud sites, ranging from the inclusion of desktop clusters over institute clusters to HPC centers.
DOI: 10.1051/epjconf/201921404007
2019
Cited 3 times
Advancing throughput of HEP analysis work-flows using caching concepts
High throughput and short turnaround cycles are core requirements for efficient processing of data-intense end-user analyses in High Energy Physics (HEP). Together with the tremendously increasing amount of data to be processed, this leads to enormous challenges for HEP storage systems, networks and the data distribution to computing resources for end-user analyses. Bringing data close to the computing resource is a very promising approach to solve throughput limitations and improve the overall performance. However, achieving data locality by placing multiple conventional caches inside a distributed computing infrastructure leads to redundant data placement and inefficient usage of the limited cache volume. The solution is a coordinated placement of critical data on computing resources, which enables matching each process of an analysis workflow to its most suitable worker node in terms of data locality and thus reduces the overall processing time. This coordinated distributed caching concept was realized at KIT by developing the coordination service NaviX, which connects an XRootD cache proxy infrastructure with an HTCondor batch system. We give an overview of the coordinated distributed caching concept and the experiences collected on a prototype system based on NaviX.
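The coordination idea, matching each workflow step to the worker node that already caches most of its input, can be pictured as a simple scoring function. This is only a conceptual sketch; NaviX's actual matchmaking with HTCondor and XRootD is more involved, and the function and file names below are assumptions.

```python
def cached_fraction(input_files: set[str], cache_content: set[str]) -> float:
    """Fraction of a job's input files already present in a node's cache."""
    if not input_files:
        return 0.0
    return len(input_files & cache_content) / len(input_files)

def best_node(input_files: set[str], caches: dict[str, set[str]]) -> str:
    """Pick the worker node with the highest cache overlap for this job."""
    return max(caches, key=lambda node: cached_fraction(input_files, caches[node]))

# Illustrative example: the job needs three files, node "wn2" already caches two of them.
caches = {
    "wn1": {"/store/a.root"},
    "wn2": {"/store/a.root", "/store/b.root"},
}
job_inputs = {"/store/a.root", "/store/b.root", "/store/c.root"}
print(best_node(job_inputs, caches))   # -> wn2
```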
DOI: 10.1051/epjconf/202024507038
2020
Cited 3 times
Effective Dynamic Integration and Utilization of Heterogenous Compute Resources
Increased operational effectiveness and the dynamic integration of only temporarily available compute resources (opportunistic resources) become more and more important in the next decade, due to the scarcity of resources for future high energy physics experiments as well as the desired integration of cloud and high performance computing resources. This results in a more heterogeneous compute environment, which gives rise to huge challenges for the computing operation teams of the experiments. At the Karlsruhe Institute of Technology (KIT) we design solutions to tackle these challenges. In order to ensure an efficient utilization of opportunistic resources and unified access to the entire infrastructure, we developed the Transparent Adaptive Resource Dynamic Integration System (TARDIS), a scalable multi-agent resource manager providing interfaces to provision as well as dynamically and transparently integrate resources of various providers into one common overlay batch system. Operational effectiveness is guaranteed by relying on COBalD, the Opportunistic Balancing Daemon, and its simple approach of taking into account the utilization and allocation of the different resource types in order to run the individual workflows on the best-suited resources. In this contribution we present the current status of integrating various HPC centers and cloud providers into the compute infrastructure at the Karlsruhe Institute of Technology as well as our experiences gained in a production environment.
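One way to picture the "dynamic and transparent integration" of a provisioned resource into an overlay batch system is as a small lifecycle state machine per resource (a "drone" in COBalD/TARDIS terminology). The states and transitions below are a simplified sketch under that assumption, not the actual TARDIS implementation.

```python
from enum import Enum, auto

class DroneState(Enum):
    REQUESTED = auto()     # resource requested from the provider (cloud, HPC, ...)
    BOOTING = auto()       # provider is starting the resource
    INTEGRATING = auto()   # joining the overlay batch system (e.g. HTCondor)
    AVAILABLE = auto()     # accepting payload jobs
    DRAINING = auto()      # finishing running jobs, accepting no new ones
    DOWN = auto()          # released back to the provider

# Allowed transitions in this simplified model.
TRANSITIONS = {
    DroneState.REQUESTED: {DroneState.BOOTING, DroneState.DOWN},
    DroneState.BOOTING: {DroneState.INTEGRATING, DroneState.DOWN},
    DroneState.INTEGRATING: {DroneState.AVAILABLE, DroneState.DOWN},
    DroneState.AVAILABLE: {DroneState.DRAINING},
    DroneState.DRAINING: {DroneState.DOWN},
    DroneState.DOWN: set(),
}

def advance(state: DroneState, target: DroneState) -> DroneState:
    """Move a drone to the next state, rejecting transitions the model forbids."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state.name} -> {target.name}")
    return target
```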
DOI: 10.1088/1742-6596/664/9/092008
2015
High Performance Data Analysis via Coordinated Caches
With the second run period of the LHC, high energy physics collaborations will have to face increasing computing infrastructural needs. Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation. This leaves dedicated HEP infrastructure with an increased load of analysis tasks that in turn will need to process an increased volume of data. In addition to storage capacities, a key factor for future computing infrastructure is therefore input bandwidth available per core. Modern data analysis infrastructure relies on one of two paradigms: data is kept on dedicated storage and accessed via network or distributed over all compute nodes and accessed locally. Dedicated storage allows data volume to grow independently of processing capacities, whereas local access allows processing capacities to scale linearly. However, with the growing data volume and processing requirements, HEP will require both of these features. For enabling adequate user analyses in the future, the KIT CMS group is merging both paradigms: popular data is spread over a local disk layer on compute nodes, while any data is available from an arbitrarily sized background storage. This concept is implemented as a pool of distributed caches, which are loosely coordinated by a central service. A Tier 3 prototype cluster is currently being set up for performant user analyses of both local and remote data.
DOI: 10.1088/1742-6596/396/5/052036
2012
Data Bookkeeping Service 3 - A new event data catalog for CMS
The Data Bookkeeping Service (DBS) provides an event data catalog for Monte Carlo and recorded data of the Compact Muon Solenoid (CMS) Experiment at the Large Hadron Collider (LHC) at CERN, Geneva. It contains all the necessary information used for tracking datasets, like their processing history and associations between runs, files and datasets, on a large scale of about 10^5 datasets and more than 10^7 files. The DBS is widely used within CMS, since all kinds of data processing, such as Monte Carlo production, processing of recorded event data, and physics analysis done by the users, rely on the information stored in DBS.
DOI: 10.52825/cordi.v1i.261
2023
Distributed Computing and Storage Infrastructure for PUNCH4NFDI
The PUNCH4NFDI consortium brings together scientists from the German particle physics, hadron and nuclear physics, astronomy, and astro-particle physics communities to improve the management and (re-)use of scientific data from these interrelated communities. The PUNCH sciences have a long tradition of building large instruments that are planned, constructed and operated by international collaborations. While the large collaborations typically employ advanced tools for data management and distribution, smaller-scale experiments often suffer from very limited resources to address these aspects. One of the aims of the consortium is to evaluate and enable or adopt existing solutions. Instances of a prototype federated and distributed computing and storage infrastructure have been set up at a handful of sites in Germany. This prototype is used to gain experience in running scientific workflows to further guide the development of the Science Data Platform, which is an overarching goal of the consortium.
DOI: 10.1088/1742-6596/1085/3/032056
2018
Mastering Opportunistic Computing Resources for HEP
As a result of the excellent LHC performance in 2016, more data than expected has been recorded, leading to a higher demand for computing resources. It is already foreseeable that for the current and upcoming run periods a flat computing budget and the expected technology advances will not be sufficient to meet the future requirements. This results in a growing gap between supplied and demanded resources.
DOI: 10.1088/1742-6596/664/5/052019
2015
Active Job Monitoring in Pilots
Recent developments in high energy physics (HEP), including multi-core jobs and multi-core pilots, require data centres to gain a deep understanding of the system in order to monitor, design, and upgrade computing clusters. Networking is a critical component. Especially the increased usage of data federations, for example in diskless computing centres or as a fallback solution, relies on WAN connectivity and availability. The specific demands of different experiments and communities, but also the need for identification of misbehaving batch jobs, require active monitoring. Existing monitoring tools are not capable of measuring fine-grained information at batch job level. This complicates network-aware scheduling and optimisations. In addition, pilots add another layer of abstraction. They behave like batch systems themselves by managing and executing payloads of jobs internally. The number of real jobs being executed is unknown, as the original batch system has no access to internal information about the scheduling process inside the pilots. Therefore, the comparability of jobs and pilots for predicting run-time behaviour or network performance cannot be ensured. Hence, identifying the actual payload is important. At the GridKa Tier 1 centre a specific tool is in use that allows the monitoring of network traffic information at batch job level. This contribution presents the current monitoring approach and discusses recent efforts to identify pilots and their substructures inside the batch system, and why this is important. It also shows how to determine monitoring data of specific jobs from identified pilots. Finally, the approach is evaluated.
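Identifying the payloads running inside a pilot essentially means inspecting the pilot's process tree on the worker node. A minimal sketch using the psutil library is shown below; treating every descendant of the pilot process as a payload, and using I/O counters as the activity measure, are simplifying assumptions made for illustration.

```python
import psutil

def payload_processes(pilot_pid: int) -> list[psutil.Process]:
    """Return all processes spawned inside a pilot (its whole process subtree)."""
    pilot = psutil.Process(pilot_pid)
    return pilot.children(recursive=True)

def payload_io(pilot_pid: int) -> dict[int, tuple[int, int]]:
    """Aggregate read/write bytes per payload process as a rough activity measure."""
    usage: dict[int, tuple[int, int]] = {}
    for proc in payload_processes(pilot_pid):
        try:
            io = proc.io_counters()   # per-process I/O counters; not available on every platform
            usage[proc.pid] = (io.read_bytes, io.write_bytes)
        except (psutil.NoSuchProcess, psutil.AccessDenied, AttributeError):
            continue                  # process ended or counters not accessible
    return usage
```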
DOI: 10.1088/1742-6596/664/2/022022
2015
Dynamic provisioning of local and remote compute resources with OpenStack
Modern high-energy physics experiments rely on the extensive usage of computing resources, both for the reconstruction of measured events as well as for Monte-Carlo simulation. The Institut für Experimentelle Kernphysik (EKP) at KIT is participating in both the CMS and Belle experiments with computing and storage resources. In the upcoming years, these requirements are expected to increase due to the growing amount of recorded data and the rise in complexity of the simulated events. It is therefore essential to increase the available computing capabilities by tapping into all resource pools.
DOI: 10.1088/1742-6596/608/1/012017
2015
Analyzing data flows of WLCG jobs at batch job level
With the introduction of federated data access to the workflows of WLCG, it is becoming increasingly important for data centers to understand specific data flows regarding storage element accesses, firewall configurations, as well as the scheduling of batch jobs themselves. As existing batch system monitoring and related system monitoring tools do not support measurements at batch job level, a new tool has been developed and put into operation at the GridKa Tier 1 center for monitoring continuous data streams and characteristics of WLCG jobs and pilots. Long term measurements and data collection are in progress. These measurements have already proven useful for analyzing misbehaviors and various issues. Therefore we aim for an automated, real-time approach for anomaly detection. As a requirement, prototypes for standard workflows have to be examined. Based on measurements covering several months, different features of HEP jobs are evaluated regarding their effectiveness for data mining approaches to identify these common workflows. The paper introduces the actual measurement approach and statistics as well as the general concept and first results classifying different HEP job workflows derived from the measurements at GridKa.
DOI: 10.1088/1742-6596/898/5/052034
2017
Opportunistic data locality for end user data analysis
With the increasing data volume of LHC Run 2, user analyses are evolving towards increasing data throughput. This evolution translates to higher requirements for efficiency and scalability of the underlying analysis infrastructure. We approach this issue with a new middleware to optimise data access: a layer of coordinated caches transparently provides data locality for high-throughput analyses. We demonstrated the feasibility of this approach with a prototype used for analyses of the CMS working groups at KIT. In this paper, we present our experience both with the approach in general and with our prototype in particular.
DOI: 10.1088/1742-6596/1085/3/032005
2018
Provisioning of data locality for HEP analysis workflows
The heavily increasing amount of data produced by current experiments in high energy particle physics challenges both end users and providers of computing resources. The boosted data rates and the complexity of analyses require huge datasets to be processed in short turnaround cycles. Usually, data storage and computing farms are deployed by different providers, which leads to data delocalization and a strong influence of the interconnection transfer rates. The CMS collaboration at KIT has developed a prototype enabling data locality for HEP analysis processing via two concepts. A coordinated and distributed caching approach, which reduces the limiting factor of data transfers by joining local high-performance devices with large background storage, was tested. Thereby, a throughput optimization was reached by selecting and allocating critical data within user workflows. A highly performant setup using these caching solutions enables fast processing of throughput-dependent analysis workflows.
DOI: 10.15496/publikation-29051
2019
Dynamic Resource Extension for Data Intensive Computing with Specialized Software Environments on HPC Systems
2019
Effective Dynamic Integration and Utilization of Heterogenous Compute Resources
Increased operational effectiveness and the dynamic integration of only temporarily available compute resources (opportunistic resources) become more and more important in the next decade, due to the scarcity of resources for future high energy physics experiments as well as the desired integration of cloud and high performance computing resources. This results in a more heterogeneous compute environment, which gives rise to huge challenges for the computing operation teams of the experiments. At the Karlsruhe Institute of Technology we design solutions to tackle these challenges. In order to ensure an efficient utilization of opportunistic resources and unified access to the entire infrastructure, we developed the Transparent Adaptive Resource Dynamic Integration System (TARDIS), a scalable multi-agent resource manager providing interfaces to provision as well as dynamically and transparently integrate resources of various providers into one common overlay batch system. Operational effectiveness is guaranteed by relying on COBalD, the Opportunistic Balancing Daemon, and its simple approach of taking into account the utilization and allocation of the different resource types in order to run the individual workflows on the best-suited resources. In this contribution we present the current status of integrating various HPC centers and cloud providers into the compute infrastructure at the Karlsruhe Institute of Technology as well as our experiences gained in a production environment.
DOI: 10.1088/1742-6596/1525/1/012055
2020
Federation of compute resources available to the German CMS community
The German CMS community (DCMS) as a whole can benefit from the various compute resources available to its different institutes. While Grid-enabled and National Analysis Facility resources are usually shared within the community, local and recently enabled opportunistic resources like HPC centers and cloud resources are not. Furthermore, there is no shared submission infrastructure available. Via HTCondor's [1] mechanisms to connect resource pools, several remote pools can be connected transparently to the users and therefore used more efficiently by a multitude of user groups. In addition to the statically provisioned resources, dynamically allocated resources from external cloud providers as well as HPC centers can also be integrated. However, the usage of such dynamically allocated resources gives rise to additional complexity. Constraints on the access policies of the resources, as well as workflow necessities, have to be taken care of. To maintain a well-defined and reliable runtime environment on each resource, virtualization and containerization technologies such as virtual machines, Docker, and Singularity are used.
DOI: 10.1088/1742-6596/1525/1/012065
2020
Boosting Performance of Data-intensive Analysis Workflows with Distributed Coordinated Caching
Data-intensive end-user analyses in high energy physics require high data throughput to reach short turnaround cycles. This leads to enormous challenges for storage and network infrastructure, especially when facing the tremendously increasing amount of data to be processed during High-Luminosity LHC runs. Including opportunistic resources with volatile storage systems into the traditional HEP computing facilities makes this situation more complex. Bringing data close to the computing units is a promising approach to solve throughput limitations and improve the overall performance. We focus on coordinated distributed caching by coordinating workflows to the most suitable hosts in terms of cached files. This allows optimizing the overall processing efficiency of data-intensive workflows and using the limited cache volume efficiently by reducing replication of data on distributed caches. We developed the NaviX coordination service at KIT, which realizes coordinated distributed caching using an XRootD cache proxy server infrastructure and an HTCondor batch system. In this paper, we present the experience gained in operating coordinated distributed caches on cloud and HPC resources. Furthermore, we show benchmarks of a dedicated high throughput cluster, the Throughput-Optimized Analysis-System (TOpAS), which is based on the above-mentioned concept.
DOI: 10.1088/1742-6596/1525/1/012067
2020
HEP Analyses on Dynamically Allocated Opportunistic Computing Resources
The current experiments in high energy physics (HEP) have a huge data rate. To process the measured data, an enormous number of computing resources is needed, and the demand will further increase with upgraded and newer experiments. To fulfill the ever-growing demand, the allocation of additional, potentially only temporarily available, non-HEP dedicated resources is important. These so-called opportunistic resources can not only be used for analyses in general but are also well suited to cover the typical unpredictable peak demands for computing resources. For both use cases, the temporary availability of the opportunistic resources requires a dynamic allocation, integration, and management, while their heterogeneity requires optimization to maintain high resource utilization by allocating the best matching resources. Finding the best matching resources to allocate is challenging due to the unpredictable submission behavior as well as an ever-changing mixture of workflows with different requirements. Instead of predicting the best matching resource, we base our decisions on the utilization of resources. For this reason, we are developing the resource manager TARDIS (Transparent Adaptive Resource Dynamic Integration System), which manages and dynamically requests or releases resources. The decision of how many resources TARDIS has to request is implemented in COBalD (the Opportunistic Balancing Daemon) to ensure further allocation of well-used resources while reducing the amount of insufficiently used ones. TARDIS allocates and manages resources from various resource providers such as HPC centers or commercial and public clouds, while ensuring a dynamic allocation and efficient utilization of these heterogeneous opportunistic resources. Furthermore, TARDIS integrates the allocated opportunistic resources into one overlay batch system which provides a single point of entry for all users. In order to provide the dedicated HEP software environment, we use virtualization and container technologies. In this contribution, we give an overview of the dynamic integration of opportunistic resources via TARDIS/COBalD at our HEP institute as well as how user analyses benefit from these additional resources.
DOI: 10.1051/epjconf/202024507007
2020
Setup and commissioning of a high-throughput analysis cluster
Current and future end-user analyses and workflows in High Energy Physics demand the processing of growing amounts of data. This plays a major role when looking at the demands in the context of the High-Luminosity LHC. In order to keep the processing time and turnaround cycles as low as possible, analysis clusters optimized with respect to these demands can be used. Since hyper-converged servers offer a good combination of compute power and local storage, they form the ideal basis for these clusters. In this contribution we report on the setup and commissioning of a dedicated analysis cluster at the Karlsruhe Institute of Technology. This cluster was designed for use cases demanding high data throughput. Based on hyper-converged servers, the cluster offers 500 job slots and 1 PB of local storage. Combined with the 100 Gb network connection between the servers and a 200 Gb uplink to the Tier-1 storage, the cluster can sustain a data throughput of 1 PB per day. In addition, the local storage provided by the hyper-converged worker nodes can be used as cache space. This allows caching approaches to be employed on the cluster, thereby enabling a more efficient usage of the disk space. In previous contributions this concept has been shown to lead to an expected speedup of 2 to 4 compared to conventional setups.
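The quoted figure of roughly 1 PB per day is consistent with the stated 100 Gb/s interconnect; a back-of-the-envelope check:

```latex
100\ \mathrm{Gbit/s} = 12.5\ \mathrm{GB/s}, \qquad
12.5\ \mathrm{GB/s} \times 86\,400\ \mathrm{s/day} \approx 1.08\ \mathrm{PB/day}.
```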
DOI: 10.1051/epjconf/202125102059
2021
Opportunistic transparent extension of a WLCG Tier 2 center using HPC resources
Computing resource needs are expected to increase drastically in the future. The HEP experiments ATLAS and CMS foresee an increase of a factor of 5-10 in the volume of recorded data in the upcoming years. The current infrastructure, namely the WLCG, is not sufficient to meet the demands in terms of computing and storage resources. The usage of non-HEP-specific resources is one way to reduce this shortage. However, using them comes at a cost: First, with multiple such resources at hand, it gets more and more difficult for the single user, as each resource normally requires its own authentication and has its own way of being accessed. Second, as they are not specifically designed for HEP workflows, they might lack dedicated software or other necessary services. Allocating the resources at the different providers can be done by COBalD/TARDIS, developed at KIT. The resource manager integrates resources on demand into one overlay batch system, providing the user with a single point of entry. The software and services needed for the communities' workflows are transparently served through containers. With this, an HPC cluster at RWTH Aachen University is dynamically and transparently integrated into a Tier 2 WLCG resource, virtually doubling its computing capacities.
DOI: 10.1088/1742-6596/762/1/012011
2016
Data Locality via Coordinated Caching for Distributed Processing
To enable data locality, we have developed an approach of adding coordinated caches to existing compute clusters. Since the data stored locally is volatile and selected dynamically, only a fraction of local storage space is required. Our approach allows the degree to which data locality is provided to be freely selected. It may be used in conjunction with large network bandwidths, providing only highly used data to reduce peak loads. Alternatively, local storage may be scaled up to perform data analysis even with low network bandwidth.
DOI: 10.1088/1742-6596/762/1/012002
2016
A scalable architecture for online anomaly detection of WLCG batch jobs
For data centres it is increasingly important to monitor network usage and learn from network usage patterns. Especially configuration issues or misbehaving batch jobs preventing a smooth operation need to be detected as early as possible. At the GridKa data and computing centre we therefore operate the tool BPNetMon for monitoring traffic data and characteristics of WLCG batch jobs and pilots locally on different worker nodes. On the one hand, local information by itself is not sufficient to detect anomalies, for several reasons: e.g. the underlying job distribution on a single worker node might change, or there might be a local misconfiguration. On the other hand, a centralised anomaly detection approach does not scale with regard to network communication as well as computational costs. We therefore propose a scalable architecture based on concepts of a super-peer network.
2009
Study of the sensitivity of CMS to the lepton flavour violating neutrinoless τ decay τ → μμμ
DOI: 10.1016/j.nuclphysbps.2009.03.052
2009
Lepton Flavour Violation in the Neutrinoless τ Decay with the CMS Experiment
The lepton flavour violating decay τ → μμμ is predicted in many extensions of the Standard Model with branching ratios partly close to the current experimental upper limit of 3.2 × 10⁻⁸. Therefore, lepton flavour violating τ decays are an interesting option in the search for new physics.
2008
The lepton-flavour-violating decay τ → μμμ̄ at the LHC
DOI: 10.1007/978-3-540-95942-7_1
2009
Flavor physics of leptons and dipole moments
DOI: 10.15496/publikation-25203
2018
High precision calculations of particle physics at the NEMO cluster in Freiburg
DOI: 10.15496/publikation-25195
2018
Proceedings of the 4th bwHPC Symposium
DOI: 10.1007/s41781-019-0024-5
2019
Dynamic Virtualized Deployment of Particle Physics Environments on a High Performance Computing Cluster
A setup for dynamically providing resources of an external, non-dedicated cluster to researchers of the ATLAS and CMS experiments in the WLCG environment is described as it has been realized at the NEMO High Performance Computing cluster at the University of Freiburg. Techniques to provide the full WLCG software environment in a virtual machine image are described. The interplay between the schedulers for NEMO and for the external clusters is coordinated through the ROCED service. A cloud computing infrastructure is deployed at NEMO to orchestrate the simultaneous usage by bare metal and virtualized jobs. Through the setup, resources are provided to users in a transparent, automatized, and on-demand way. The performance of the virtualized environment has been evaluated for particle physics applications.
2019
Concept of federating German CMS Tier 3 resources
2018
Dynamic Virtualized Deployment of Particle Physics Environments on a High Performance Computing Cluster
The NEMO High Performance Computing Cluster at the University of Freiburg has been made available to researchers of the ATLAS and CMS experiments. Users access the cluster from external machines connected to the World-wide LHC Computing Grid (WLCG). This paper describes how the full software environment of the WLCG is provided in a virtual machine image. The interplay between the schedulers for NEMO and for the external clusters is coordinated through the ROCED service. A cloud computing infrastructure is deployed at NEMO to orchestrate the simultaneous usage by bare metal and virtualized jobs. Through the setup, resources are provided to users in a transparent, automatized, and on-demand way. The performance of the virtualized environment has been evaluated for particle physics applications.
DOI: 10.1051/epjconf/201921403027
2019
Modeling and Simulation of Load Balancing Strategies for Computing in High Energy Physics
The amount of data to be processed by experiments in high energy physics (HEP) will increase tremendously in the coming years. To cope with this increasing load, the most efficient usage of the resources is mandatory. Furthermore, the computing resources for user jobs in HEP will be increasingly distributed and heterogeneous, resulting in more difficult scheduling due to the increasing complexity of the system. We aim to create a simulation for the WLCG helping the HEP community to solve both challenges: a more efficient utilization of the grid and coping with the rising complexity of the system. There is currently no simulation in existence which helps the operators of the grid to make the correct decisions while optimizing the load balancing strategy. This paper presents a proof of concept in which the computing jobs at the Tier 1 center GridKa are modeled and simulated. To model the computing jobs, we extended the Palladio simulator with a mechanism to simulate load balancing strategies. Furthermore, we implemented an automated model parameter analysis and model creation. Finally, the simulation results are validated using real-world performance data. Our results suggest that simulating larger parts of the grid is feasible and can help to optimize the utilization of the grid.
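A toy discrete-event simulation already conveys what such a model has to capture: jobs with different runtimes arriving at a pool of nodes, and a pluggable load-balancing strategy deciding the placement. The sketch below is a deliberately minimal stand-in for the Palladio-based setup described above; the strategies, job counts, and runtimes are assumptions made purely for illustration.

```python
import heapq
import random

def simulate(num_nodes: int, jobs: list[float], strategy: str = "least-loaded") -> float:
    """Return the makespan (time until all jobs finish) for a given placement strategy.

    Each node processes one job at a time; 'jobs' are runtimes in arbitrary units.
    """
    if strategy == "least-loaded":
        # Greedy list scheduling: place the next job on the node that frees up earliest.
        nodes = [0.0] * num_nodes
        heapq.heapify(nodes)
        for runtime in jobs:
            free_at = heapq.heappop(nodes)
            heapq.heappush(nodes, free_at + runtime)
        return max(nodes)
    if strategy == "round-robin":
        # Static assignment in submission order, ignoring node load.
        nodes = [0.0] * num_nodes
        for i, runtime in enumerate(jobs):
            nodes[i % num_nodes] += runtime
        return max(nodes)
    raise ValueError(f"unknown strategy: {strategy}")

if __name__ == "__main__":
    random.seed(1)
    workload = [random.expovariate(1 / 30) for _ in range(1000)]  # synthetic runtimes, mean 30
    for strategy in ("round-robin", "least-loaded"):
        print(strategy, round(simulate(100, workload, strategy), 1))
```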
2020
Dynamic Computing Resource Extension Using COBalD/TARDIS
DOI: 10.48550/arxiv.1812.11044
2018
Dynamic Virtualized Deployment of Particle Physics Environments on a High Performance Computing Cluster
The NEMO High Performance Computing Cluster at the University of Freiburg has been made available to researchers of the ATLAS and CMS experiments. Users access the cluster from external machines connected to the World-wide LHC Computing Grid (WLCG). This paper describes how the full software environment of the WLCG is provided in a virtual machine image. The interplay between the schedulers for NEMO and for the external clusters is coordinated through the ROCED service. A cloud computing infrastructure is deployed at NEMO to orchestrate the simultaneous usage by bare metal and virtualized jobs. Through the setup, resources are provided to users in a transparent, automatized, and on-demand way. The performance of the virtualized environment has been evaluated for particle physics applications.
DOI: 10.1109/nssmic.2005.1596406
2006
R & D Work for GEM-Based High Resolution TPC at the ILC
We report on R&D work for a time projection chamber (TPC) at the International Linear Collider (ILC). A high resolution TPC with gas amplification based on micropattern gas detectors is one of the options for the main tracking system at the ILC detector. The physics to be studied and the environment at the ILC pose new challenges to the performance of all detector components. For instance, the momentum resolution of the tracker should be improved by an order of magnitude with respect to LEP detectors. Significant progress towards this goal has been achieved recently and we report here on results of our groups which are studying a TPC concept with triple gas electron multiplier (GEM) structures for gas amplification. The required spatial resolution of 100 μm has been achieved in measurements with a prototype TPC in high magnetic fields. The problem of ion backdrift has been studied and found to be significantly reduced in such GEM structures. In addition, a low mass TPC field cage has been designed and constructed demonstrating a way to minimise the total radiation length of the ILC tracker. A detailed simulation for a TPC has been developed. Simulations concerning the background at the ILC are ongoing.
2005
Simulation for a TPC at the ILC
2005
Development of readout electronics for the operation of a TPC with GEMs
2005
Development of the readout software for the calibration test facility of a TPC
2005
Construction of a hodoscope for the study of a prototype TPC
2005
Construction of a TPC prototype with GEM readout
DOI: 10.1051/epjconf/202125102039
2021
Transparent Integration of Opportunistic Resources into the WLCG Compute Infrastructure
The inclusion of opportunistic resources, for example from High Performance Computing (HPC) centers or cloud providers, is an important contribution to bridging the gap between existing resources and the future needs of the LHC collaborations, especially for the HL-LHC era. However, the integration of these resources poses new challenges and often needs to happen in a highly dynamic manner. To enable an effective and lightweight integration of these resources, the tools COBalD and TARDIS are developed at KIT. In this contribution we report on the infrastructure we use to dynamically offer opportunistic resources to collaborations in the Worldwide LHC Computing Grid (WLCG). The core components are COBalD/TARDIS, HTCondor, CVMFS and modern virtualization technology. The challenging task of managing the opportunistic resources is performed by COBalD/TARDIS. We showcase the challenges, employed solutions and experiences gained with the provisioning of opportunistic resources from several resource providers like university clusters, HPC centers and cloud setups in a multi-VO environment. This work can serve as a blueprint for approaching the provisioning of resources from other resource providers.