
Christoph Heidecker

Here are all the papers by Christoph Heidecker that you can download and read on OA.mg.

DOI: 10.1051/epjconf/201921408009
2019
Cited 4 times
Dynamic Integration and Management of Opportunistic Resources for HEP
Demand for computing resources in high energy physics (HEP) is highly dynamic, while the resources provided by the Worldwide LHC Computing Grid (WLCG) remain static. It has become evident that opportunistic resources such as High Performance Computing (HPC) centers and commercial clouds are well suited to cover peak loads. However, using these resources introduces new levels of complexity: e.g. resources need to be managed highly dynamically, and HEP applications require a very specific software environment that is usually not provided at opportunistic resources. Furthermore, limited network bandwidth can cause I/O-intensive workflows to run inefficiently. The key to dynamically running HEP applications on opportunistic resources is the use of modern container and virtualization technologies. Based on these technologies, the Karlsruhe Institute of Technology (KIT) has developed ROCED, a resource manager that dynamically integrates and manages a variety of opportunistic resources. In combination with ROCED, the HTCondor batch system acts as a powerful single entry point to all available computing resources, leading to a seamless and transparent integration of opportunistic resources into HEP computing. KIT is currently improving resource management and job scheduling by focusing on the I/O requirements of individual workflows, the available network bandwidth, and scalability. For these reasons, we are currently developing a new resource manager, called TARDIS. In this paper, we give an overview of the technologies used, the dynamic management and integration of resources, and the status of I/O-based resource and job scheduling.
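The I/O-focused scheduling mentioned above can be illustrated with a small sketch. It is not ROCED or TARDIS code: the Site and Job classes, their fields, and the rule "send a job only where the per-slot bandwidth covers the read rate the job needs to stay CPU-bound" are hypothetical assumptions chosen for illustration, and site bandwidth is assumed to be shared evenly across slots.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical opportunistic resource, e.g. an HPC center or a cloud."""
    name: str
    bandwidth_gbit_s: float  # network bandwidth available to the whole site
    slots: int               # job slots the site currently provides

    def per_slot_mbyte_s(self) -> float:
        # Assumption: bandwidth shared evenly; 1 Gbit/s = 125 MB/s (decimal units)
        return self.bandwidth_gbit_s * 125.0 / self.slots


@dataclass
class Job:
    """Hypothetical description of a job's I/O profile."""
    name: str
    input_gbyte: float  # data the job has to read
    cpu_seconds: float  # expected run time if I/O were free

    def required_mbyte_s(self) -> float:
        # Read rate needed so that the job stays CPU-bound
        return self.input_gbyte * 1000.0 / self.cpu_seconds


def suitable_sites(job: Job, sites: list[Site]) -> list[str]:
    """Sites whose per-slot bandwidth covers the job's data rate, best first."""
    ok = [s for s in sites if s.per_slot_mbyte_s() >= job.required_mbyte_s()]
    ok.sort(key=lambda s: s.per_slot_mbyte_s(), reverse=True)
    return [s.name for s in ok]


hpc = Site("hpc", bandwidth_gbit_s=10, slots=1000)    # 1.25 MB/s per slot
cloud = Site("cloud", bandwidth_gbit_s=40, slots=400)  # 12.5 MB/s per slot
skim = Job("skim", input_gbyte=36, cpu_seconds=3600)                # needs 10 MB/s
generation = Job("generation", input_gbyte=0.36, cpu_seconds=3600)  # about 0.1 MB/s
print(suitable_sites(skim, [hpc, cloud]))        # ['cloud']
print(suitable_sites(generation, [hpc, cloud]))  # ['cloud', 'hpc']
```

The I/O-heavy skim is kept away from the bandwidth-starved HPC site, while the CPU-heavy generation job can run anywhere; a real scheduler would additionally have to account for fluctuating bandwidth and competing jobs.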
DOI: 10.1051/epjconf/201921404007
2019
Cited 3 times
Advancing throughput of HEP analysis work-flows using caching concepts
High throughput and short turnaround cycles are core requirements for efficient processing of data-intensive end-user analyses in High Energy Physics (HEP). Together with the tremendously increasing amount of data to be processed, this poses enormous challenges for HEP storage systems, networks and the distribution of data to computing resources for end-user analyses. Bringing data close to the computing resources is a very promising approach to overcome throughput limitations and improve overall performance. However, achieving data locality by placing multiple conventional caches inside a distributed computing infrastructure leads to redundant data placement and inefficient usage of the limited cache volume. Our solution is a coordinated placement of critical data on computing resources, which enables matching each process of an analysis workflow to its most suitable worker node in terms of data locality and thus reduces the overall processing time. This coordinated distributed caching concept was realized at KIT by developing the coordination service NaviX, which connects an XRootD cache proxy infrastructure with an HTCondor batch system. We give an overview of the coordinated distributed caching concept and of the experience collected on a prototype system based on NaviX.
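The matching step at the heart of coordinated distributed caching can be sketched as choosing, for each job, the worker node whose local cache already holds the largest share of the job's input. This is a hypothetical illustration of the criterion only; NaviX's actual interface to XRootD and HTCondor is not shown, and the function names, file sizes, and node names are assumptions.

```python
def cached_bytes(job_files: dict[str, int], cache: set[str]) -> int:
    """Bytes of the job's input that are already present in one node's cache."""
    return sum(size for name, size in job_files.items() if name in cache)


def best_worker(job_files: dict[str, int], caches: dict[str, set[str]]) -> str:
    """Worker node whose cache holds the most bytes of the job's input.

    job_files: input file name -> size in bytes
    caches:    worker node name -> names of files in its local cache
    Ties are broken alphabetically so that the choice is deterministic.
    """
    return max(sorted(caches), key=lambda node: cached_bytes(job_files, caches[node]))


caches = {
    "wn1": {"b.root", "c.root"},
    "wn2": {"a.root"},
    "wn3": set(),
}
job = {"a.root": 4_000, "b.root": 2_000, "c.root": 1_000}
print(best_worker(job, caches))  # wn2: 4000 cached bytes beat 3000 on wn1
```

Counting bytes rather than files favours the node that saves the largest transfer; a production coordinator would also have to weigh free job slots and the load on each cache.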
DOI: 10.1088/1742-6596/1085/3/032056
2018
Mastering Opportunistic Computing Resources for HEP
As a result of the excellent LHC performance in 2016, more data than expected has been recorded, leading to a higher demand for computing resources. It is already foreseeable that, for the current and upcoming run periods, a flat computing budget and the expected technological advances will not be sufficient to meet the future requirements. This results in a growing gap between supplied and demanded resources.
DOI: 10.1088/1742-6596/898/5/052034
2017
Opportunistic data locality for end user data analysis
With the increasing data volume of LHC Run 2, user analyses are evolving towards increasing data throughput. This evolution translates to higher requirements for efficiency and scalability of the underlying analysis infrastructure. We approach this issue with a new middleware to optimise data access: a layer of coordinated caches transparently provides data locality for high-throughput analyses. We demonstrated the feasibility of this approach with a prototype used for analyses of the CMS working groups at KIT. In this paper, we present our experience both with the approach in general and with our prototype in particular.
DOI: 10.1088/1742-6596/1085/3/032005
2018
Provisioning of data locality for HEP analysis workflows
The heavily increasing amount of data produced by current experiments in high energy particle physics challenges both end users and providers of computing resources. The boosted data rates and the complexity of analyses require huge datasets to be processed in short turnaround cycles. Usually, data storage and computing farms are deployed by different providers, which leads to data delocalization and a strong dependence on the interconnection transfer rates. The CMS collaboration at KIT has developed a prototype enabling data locality for HEP analysis processing via two concepts. Coordinated and distributed caching approaches were tested, which reduce the limiting factor of data transfers by joining local high-performance devices with large background storage. A throughput optimization was thereby reached by selecting and allocating critical data within user workflows. A high-performance setup using these caching solutions enables fast processing of throughput-dependent analysis workflows.
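Joining a fast local device with a large background storage amounts to a read-through cache: reads are served locally when possible and fetched from the background storage otherwise. The sketch below assumes least-recently-used (LRU) eviction and a byte-based capacity; the eviction policy, the class name, and the fetch callback are illustrative assumptions, not the prototype's implementation.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Fast local cache in front of a large, slower background storage.

    Files are kept up to a byte capacity; the least recently used file is
    evicted first when the capacity would be exceeded.
    """

    def __init__(self, capacity_bytes: int, fetch: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.fetch = fetch  # reads a file from the background storage
        self.files: "OrderedDict[str, bytes]" = OrderedDict()
        self.used = 0
        self.hits = 0
        self.misses = 0

    def read(self, name: str) -> bytes:
        if name in self.files:
            self.hits += 1
            self.files.move_to_end(name)  # mark as most recently used
            return self.files[name]
        self.misses += 1
        data = self.fetch(name)  # slow transfer from background storage
        if len(data) <= self.capacity:  # files larger than the cache bypass it
            self.files[name] = data
            self.used += len(data)
            while self.used > self.capacity:
                _, evicted = self.files.popitem(last=False)  # drop LRU file
                self.used -= len(evicted)
        return data


storage = {"a": b"xxxx", "b": b"yyyy", "c": b"zz"}
cache = ReadThroughCache(capacity_bytes=8, fetch=storage.__getitem__)
for name in ["a", "b", "a", "c", "b"]:
    cache.read(name)
print(cache.hits, cache.misses, list(cache.files))  # 1 4 ['c', 'b']
```

Re-reading "a" is served locally, while "b" is evicted to make room for "c" and has to be fetched again; selecting which data is worth caching, as described above, is what keeps such misses rare.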
DOI: 10.15496/publikation-29051
2019
Dynamic Resource Extension for Data Intensive Computing with Specialized Software Environments on HPC Systems
DOI: 10.1088/1742-6596/1525/1/012055
2020
Federation of compute resources available to the German CMS community
The German CMS community (DCMS) as a whole can benefit from the various compute resources available to its different institutes. While Grid-enabled and National Analysis Facility resources are usually shared within the community, local and recently enabled opportunistic resources like HPC centers and cloud resources are not. Furthermore, there is no shared submission infrastructure available. Via HTCondor's [1] mechanisms to connect resource pools, several remote pools can be connected transparently to the users and therefore used more efficiently by a multitude of user groups. In addition to the statically provisioned resources, dynamically allocated resources from external cloud providers and HPC centers can also be integrated. However, the usage of such dynamically allocated resources gives rise to additional complexity. Constraints imposed by the access policies of the resources, as well as workflow requirements, have to be taken into account. To maintain a well-defined and reliable runtime environment on each resource, virtualization and containerization technologies such as virtual machines, Docker, and Singularity are used.
DOI: 10.1088/1742-6596/1525/1/012065
2020
Boosting Performance of Data-intensive Analysis Workflows with Distributed Coordinated Caching
Data-intensive end-user analyses in high energy physics require high data throughput to reach short turnaround cycles. This poses enormous challenges for storage and network infrastructure, especially in view of the tremendously increasing amount of data to be processed during High-Luminosity LHC runs. Including opportunistic resources with volatile storage systems into the traditional HEP computing facilities makes this situation more complex. Bringing data close to the computing units is a promising approach to overcome throughput limitations and improve overall performance. We focus on coordinated distributed caching, which steers workflows to the most suitable hosts in terms of cached files. This allows optimizing the overall processing efficiency of data-intensive workflows and using the limited cache volume efficiently by reducing the replication of data across distributed caches. We developed the NaviX coordination service at KIT, which realizes coordinated distributed caching using an XRootD cache proxy server infrastructure and the HTCondor batch system. In this paper, we present the experience gained in operating coordinated distributed caches on cloud and HPC resources. Furthermore, we show benchmarks of a dedicated high-throughput cluster, the Throughput-Optimized Analysis-System (TOpAS), which is based on the above-mentioned concept.
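One way to avoid redundant copies across distributed caches is to give every file exactly one owning cache host. The sketch below uses rendezvous (highest-random-weight) hashing for this purpose; it is a swapped-in, well-known technique chosen for illustration, not the placement strategy of NaviX, and the host and file names are hypothetical.

```python
import hashlib


def weight(host: str, file_name: str) -> int:
    """Pseudo-random but deterministic score for a (host, file) pair."""
    digest = hashlib.sha256(f"{host}/{file_name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(file_name: str, hosts: list[str]) -> str:
    """Rendezvous hashing: the host with the highest score caches the file.

    Each file maps to exactly one host, so no replicas are created, and
    removing a host only remaps the files that host owned.
    """
    return max(hosts, key=lambda host: weight(host, file_name))


files = [f"file{i:03d}.root" for i in range(300)]
before = {f: owner(f, ["wn1", "wn2", "wn3"]) for f in files}
after = {f: owner(f, ["wn1", "wn3"]) for f in files}  # wn2 leaves the pool
moved = [f for f in files if before[f] != after[f]]
# Only files previously owned by wn2 have to move
print(all(before[f] == "wn2" for f in moved))  # True
```

The stability under host removal matters for volatile opportunistic resources, where caches appear and disappear; the trade-off is that popular files are never replicated, so a real coordinator would also need to balance load.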
DOI: 10.1088/1742-6596/1525/1/012067
2020
HEP Analyses on Dynamically Allocated Opportunistic Computing Resources
The current experiments in high energy physics (HEP) produce data at huge rates. Processing the measured data requires an enormous amount of computing resources, and this demand will further increase with upgraded and new experiments. To meet the ever-growing demand, the allocation of additional, potentially only temporarily available, non-HEP-dedicated resources is important. These so-called opportunistic resources can not only be used for analyses in general but are also well suited to cover the typical unpredictable peak demands for computing resources. For both use cases, the temporary availability of opportunistic resources requires dynamic allocation, integration, and management, while their heterogeneity requires optimization to maintain high resource utilization by allocating the best-matching resources. Finding the best-matching resources to allocate is challenging due to the unpredictable submission behavior as well as an ever-changing mixture of workflows with different requirements. Instead of predicting the best-matching resource, we base our decisions on the utilization of resources. For this reason, we are developing the resource manager TARDIS (Transparent Adaptive Resource Dynamic Integration System), which manages resources and dynamically requests or releases them. The decision of how many resources TARDIS has to request is implemented in COBalD (COBalD - the Opportunistic Balancing Daemon) to ensure further allocation of well-used resources while reducing the amount of insufficiently used ones. TARDIS allocates and manages resources from various resource providers such as HPC centers or commercial and public clouds while ensuring a dynamic allocation and efficient utilization of these heterogeneous opportunistic resources. Furthermore, TARDIS integrates the allocated opportunistic resources into one overlay batch system, which provides a single point of entry for all users.
In order to provide the dedicated HEP software environment, we use virtualization and container technologies. In this contribution, we give an overview of the dynamic integration of opportunistic resources via TARDIS/COBalD in our HEP institute as well as how user analyses benefit from these additional resources.
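The utilization-based decision described above can be sketched as a simple feedback step: if the resources already provided are poorly used, demand is lowered; if they are largely allocated, demand is raised. The Pool fields, the threshold values, and the step size below are hypothetical assumptions; this illustrates the idea rather than reproducing COBalD's actual controllers.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical view of one resource provider as seen by the controller."""
    demand: float       # resources currently requested from the provider
    utilisation: float  # fraction of provided resources actively used (0..1)
    allocation: float   # fraction of provided resources assigned to jobs (0..1)


def regulate(pool: Pool, low_utilisation: float = 0.5,
             high_allocation: float = 0.8, rate: float = 1.0) -> float:
    """One control step based on observed usage rather than predicted demand."""
    if pool.utilisation < low_utilisation:
        # Poorly used resources: release some of them
        pool.demand = max(0.0, pool.demand - rate)
    elif pool.allocation > high_allocation:
        # Resources are well used and mostly allocated: request more
        pool.demand += rate
    # Otherwise keep the current demand
    return pool.demand


print(regulate(Pool(demand=10.0, utilisation=0.3, allocation=0.9)))   # 9.0
print(regulate(Pool(demand=10.0, utilisation=0.9, allocation=0.95)))  # 11.0
print(regulate(Pool(demand=10.0, utilisation=0.9, allocation=0.6)))   # 10.0
```

Because utilisation is checked first, a pool whose resources are allocated but poorly used shrinks rather than grows, which matches the stated goal of reducing insufficiently used resources.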
DOI: 10.1051/epjconf/202024507007
2020
Setup and commissioning of a high-throughput analysis cluster
Current and future end-user analyses and workflows in High Energy Physics demand the processing of growing amounts of data. This plays a major role when looking at the demands in the context of the High-Luminosity LHC. In order to keep processing times and turnaround cycles as low as possible, analysis clusters optimized with respect to these demands can be used. Since hyperconverged servers offer a good combination of compute power and local storage, they form the ideal basis for these clusters. In this contribution, we report on the setup and commissioning of a dedicated analysis cluster at the Karlsruhe Institute of Technology. This cluster was designed for use cases demanding high data throughput. Based on hyperconverged servers, the cluster offers 500 job slots and 1 PB of local storage. Combined with the 100 Gbit/s network connection between the servers and a 200 Gbit/s uplink to the Tier-1 storage, the cluster can sustain a data throughput of 1 PB per day. In addition, the local storage provided by the hyperconverged worker nodes can be used as cache space. This allows caching approaches to be employed on the cluster, thereby enabling more efficient usage of the disk space. In previous contributions, this concept has been shown to lead to an expected speedup by a factor of 2 to 4 compared to conventional setups.
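As a quick consistency check of the quoted figures, the sketch below converts 1 PB per day into an average network rate and a per-slot read rate. It assumes decimal units (1 PB = 10^15 bytes), a perfectly uniform load over 24 hours, and an even split across the 500 job slots; real access patterns are burstier.

```python
# Assumptions: decimal units (1 PB = 1e15 bytes, 1 Gbit/s = 1e9 bit/s),
# uniform load over the day, load shared evenly by all job slots.
PB = 1e15
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_gbit_s(bytes_per_day: float) -> float:
    """Average network rate needed to move bytes_per_day within one day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9


def per_slot_mbyte_s(bytes_per_day: float, slots: int) -> float:
    """Average read rate per job slot if all slots share the load evenly."""
    return bytes_per_day / SECONDS_PER_DAY / slots / 1e6


rate = sustained_gbit_s(PB)           # about 92.6 Gbit/s
per_slot = per_slot_mbyte_s(PB, 500)  # about 23.1 MB/s per job slot
print(f"{rate:.1f} Gbit/s total, {per_slot:.1f} MB/s per slot")
# 92.6 Gbit/s total, 23.1 MB/s per slot
```

An average of roughly 93 Gbit/s stays well below the 200 Gbit/s uplink to the Tier-1 storage, leaving headroom for peaks above the daily average.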
DOI: 10.15496/publikation-25203
2018
High precision calculations of particle physics at the NEMO cluster in Freiburg
DOI: 10.15496/publikation-25195
2018
Proceedings of the 4th bwHPC Symposium
2019
Concept of federating German CMS Tier 3 resources
2020
Jet Momentum Resolution for the CMS Experiment and Distributed Data Caching Strategies
Accurately measured jets are mandatory for precision measurements of the Standard Model of particle physics as well as for searches for new physics. The increased instantaneous luminosity and center-of-mass energy at LHC Run 2 pose challenges for pileup mitigation and the measurement of jet characteristics. This thesis concentrates on using Z + jets events to calibrate the energy scale of jets recorded by the CMS detector in 2018. Furthermore, it proposes a new procedure for determining the jet momentum resolution using Z + jets events. This procedure is expected to allow cross-checking complementary measurement approaches and to increase the accuracy of the jet momentum resolution at the CMS experiment. Data-intensive end-user analyses in High Energy Physics, such as the presented jet calibration, pose enormous challenges to the computing infrastructure since they require high data throughput. Besides the particle physics analysis, this thesis also focuses on accelerating data processing within a distributed computing infrastructure via a coordinated distributed caching approach. Coordinated placement of critical data within distributed caches and matching workflows to the most suitable host in terms of cached data allow the processing efficiency to be optimized. Improving the processing of data-intensive workflows aims at shortening turnaround cycles and thus deriving physics results, e.g. the jet calibration results, faster.