Diego Ciangottini

DOI: 10.1007/s41781-019-0026-3
2019
Cited 99 times
Rucio: Scientific Data Management
Rucio is an open-source software framework that provides scientific collaborations with the functionality to organize, manage, and access their data at scale. The data can be distributed across heterogeneous data centers at widely distributed locations. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and now is continuously extended to support the LHC experiments and other diverse scientific communities. In this article, we detail the fundamental concepts of Rucio, describe the architecture along with implementation details, and give operational experience from production usage.
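As an illustration of the concepts above (DIDs, replication rules, RSEs), here is a minimal sketch using the Rucio Python client. It assumes a configured Rucio environment (rucio.cfg and valid credentials); the scope, dataset name and RSE expression are placeholders, not values from any real deployment.
```python
# Minimal sketch: inspect replicas of a dataset and ask Rucio to maintain a
# copy of it on a class of storage endpoints. Scope, dataset and RSE
# expression below are placeholders.
from rucio.client import Client

client = Client()

# A DID is a scope:name pair; list where its replicas currently live.
for rep in client.list_replicas([{"scope": "user.jdoe", "name": "example.dataset"}]):
    print(rep["name"], list(rep["rses"].keys()))

# Declare a replication rule: one copy on any Tier-2 disk endpoint
# (illustrative RSE expression). Rucio daemons then schedule the transfers
# and enforce the rule over time.
client.add_replication_rule(
    dids=[{"scope": "user.jdoe", "name": "example.dataset"}],
    copies=1,
    rse_expression="tier=2&type=DISK",
)
```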
DOI: 10.1007/s11869-023-01495-x
2024
Air quality changes during the COVID-19 pandemic guided by robust virus-spreading data in Italy
DOI: 10.1016/j.cpc.2023.108965
2024
Prototyping a ROOT-based distributed analysis workflow for HL-LHC: The CMS use case
The challenges expected for the next era of the Large Hadron Collider (LHC), both in terms of storage and computing resources, provide LHC experiments with a strong motivation for evaluating ways of rethinking their computing models at many levels. Great efforts have been put into optimizing the computing resource utilization for the data analysis, which leads both to lower hardware requirements and faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is evolving software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering on the one side modern interfaces, and on the other side hiding the complexity of the underlying environment, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources like public cloud or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility is created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources on different geographical sites. The physics analysis is converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order of magnitude speedup with respect to the previous approach.
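A hedged sketch of the declarative, distributed approach described above, using the Dask backend of ROOT's experimental distributed RDataFrame. A local Dask cluster stands in for the HTCondor-backed facility, and the file names are placeholders; depending on the ROOT release the backend keyword may differ (e.g. an executor argument in newer versions).
```python
# The analysis is written once as an RDataFrame computation graph and then
# distributed over a Dask cluster (which could itself be backed by HTCondor).
import ROOT
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=4, threads_per_worker=1)  # stand-in for the facility
client = Client(cluster)

DistRDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame
df = DistRDataFrame("Events",
                    ["nanoaod_file1.root", "nanoaod_file2.root"],
                    daskclient=client)

# Declarative chain: cuts and derived columns are declared, not looped over.
h = (df.Filter("nMuon >= 2", "at least two muons")
       .Define("leading_mu_pt", "Muon_pt[0]")
       .Histo1D(("lead_mu_pt", "Leading muon p_{T}", 50, 0.0, 200.0),
                "leading_mu_pt"))

h.GetValue()  # triggers the distributed event loop
```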
DOI: 10.48550/arxiv.2404.02100
2024
Analysis Facilities White Paper
This white paper presents the current status of the R&D for Analysis Facilities (AFs) and attempts to summarize the views on the future direction of these facilities. These views have been collected through the High Energy Physics (HEP) Software Foundation's (HSF) Analysis Facilities forum, established in March 2022, the Analysis Ecosystems II workshop, that took place in May 2022, and the WLCG/HSF pre-CHEP workshop, that took place in May 2023. The paper attempts to cover all the aspects of an analysis facility.
DOI: 10.1051/epjconf/202429511012
2024
KServe inference extension for an FPGA vendor-free ecosystem
Field Programmable Gate Arrays (FPGAs) are playing an increasingly important role in the sampling and data processing industry due to their intrinsically highly parallel architecture, low power consumption, and flexibility to execute custom algorithms. In particular, the use of FPGAs to perform Machine Learning (ML) inference is growing rapidly thanks to the development of High-Level Synthesis (HLS) projects that abstract the complexity of Hardware Description Language (HDL) programming. In this work we will describe our experience extending KServe predictors, an emerging standard for ML model inference as a service on Kubernetes. This project will support a custom workflow capable of loading and serving models on demand on top of FPGAs. A key aspect of the proposed approach is to make firmware generation, often an obstacle to widespread FPGA adoption, transparent. We will detail how the proposed system automates both the synthesis of the HDL code and the generation of the firmware, starting from a high-level language and user-friendly machine learning libraries. The ecosystem is then completed with the adoption of a common language for sharing user models and firmware, based on a dedicated Open Container Initiative artifact definition, thus leveraging the well-established practices for managing resources in a container registry.
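A hedged sketch of a custom KServe predictor in the spirit of the extension described above: the model is loaded on demand and predict() would dispatch inference to an FPGA-accelerated backend. The FPGA and OCI-artifact steps are mocked; class and model names are illustrative, not the project's actual code.
```python
from kserve import Model, ModelServer

class FPGAModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.firmware = None
        self.load()

    def load(self):
        # In the described system this would fetch the model/firmware bundle
        # (an OCI artifact) and program the FPGA; mocked here.
        self.firmware = "loaded"
        self.ready = True

    def predict(self, payload, headers=None):
        instances = payload.get("instances", [])
        # Stand-in for the FPGA inference call.
        return {"predictions": [sum(x) for x in instances]}

if __name__ == "__main__":
    ModelServer().start([FPGAModel("fpga-demo-model")])
```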
DOI: 10.1051/epjconf/202429510004
2024
INFN and the evolution of distributed scientific computing in Italy
INFN has been running a distributed infrastructure (the Tier-1 at Bologna-CNAF and 9 Tier-2 centres) for more than 20 years, which currently offers about 150,000 CPU cores and 120 PB of tape and disk storage, serving more than 40 international scientific collaborations. This Grid-based infrastructure was augmented in 2019 with the INFN Cloud: a production-quality multi-site federated Cloud infrastructure, composed of a core backbone and able to integrate other INFN sites and public or private Clouds as well. The INFN Cloud provides a customizable and extensible portfolio offering computing and storage services spanning the IaaS, PaaS and SaaS layers, with dedicated solutions to serve special purposes, such as ISO-certified regions for the handling of sensitive data. INFN is now revising and expanding its infrastructure to tackle the challenges expected in the next 10 years of scientific computing, adopting a “cloud-first” approach, through which all the INFN data centres will be federated via the INFN Cloud middleware and integrated with key HPC centres, such as the pre-exascale Leonardo machine at CINECA. In such a process, which involves both the infrastructures and the higher-level services, initiatives and projects such as the "Italian National Centre on HPC, Big Data and Quantum Computing" (funded in the context of the Italian "National Recovery and Resilience Plan") and the Bologna Technopole are precious opportunities that will be exploited to offer advanced resources and services to universities, research institutions and industry. In this paper we describe how INFN is evolving its computing infrastructure, with the ambition to create and operate a national vendor-neutral, open, scalable, and flexible "datalake" able to serve much more than just INFN users and experiments.
DOI: 10.3847/1538-4365/ac072a
2021
Cited 8 times
Catalog of Long-term Transient Sources in the First 10 yr of Fermi-LAT Data
We present the first Fermi Large Area Telescope (LAT) catalog of long-term γ-ray transient sources (1FLT). This comprises sources that were detected on monthly time intervals during the first decade of Fermi-LAT operations. The monthly timescale allows us to identify transient and variable sources that were not yet reported in other Fermi-LAT catalogs. The monthly data sets were analyzed using a wavelet-based source detection algorithm that provided the candidate new transient sources. The search was limited to the extragalactic regions of the sky to avoid the dominance of the Galactic diffuse emission at low Galactic latitudes. The transient candidates were then analyzed using the standard Fermi-LAT maximum likelihood analysis method. All sources detected with a statistical significance above 4σ in at least one monthly bin were listed in the final catalog. The 1FLT catalog contains 142 transient γ-ray sources that are not included in the 4FGL-DR2 catalog. Many of these sources (102) have been confidently associated with active galactic nuclei (AGNs): 24 are associated with flat-spectrum radio quasars, 1 with a BL Lac object, 70 with blazars of uncertain type, 3 with radio galaxies, 1 with a compact steep-spectrum radio source, 1 with a steep-spectrum radio quasar, and 2 with AGNs of other types. The remaining 40 sources have no candidate counterparts at other wavelengths. The median γ-ray spectral index of the 1FLT-AGN sources is softer than that reported in the latest Fermi-LAT AGN general catalog. This result is consistent with the hypothesis that detection of the softest γ-ray emitters is less efficient when the data are integrated over year-long intervals.
DOI: 10.1051/epjconf/201921407027
2019
Cited 8 times
Exploiting private and commercial clouds to generate on-demand CMS computing facilities with DODAS
Minimising time and cost is key to exploiting private or commercial clouds. This can be achieved by increasing setup and operational efficiencies. Success and sustainability are thus obtained by reducing the learning curve, as well as the operational cost of managing community-specific services running on distributed environments. The greatest beneficiaries of this approach are communities willing to exploit opportunistic cloud resources. DODAS builds on several EOSC-hub services developed by the INDIGO-DataCloud project and allows instantiating on-demand container-based clusters. These execute software applications that can benefit from potentially “any cloud provider”, generating sites on demand with almost zero effort. DODAS provides ready-to-use solutions to implement a “Batch System as a Service” as well as a Big Data platform for “Machine Learning as a Service”, offering a high level of customization to integrate specific scenarios. A description of the DODAS architecture will be given, including the CMS integration strategy adopted to connect it with the experiment’s HTCondor Global Pool. Performance and scalability results of DODAS-generated tiers processing real CMS analysis jobs will be presented. The Instituto de Física de Cantabria and Imperial College London use cases will be sketched. Finally, a high-level strategy overview for optimizing data ingestion in DODAS will be described.
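As a hedged illustration of the "Batch System as a Service" idea, the sketch below submits work to an HTCondor pool such as one that DODAS could stand up, using the standard HTCondor Python bindings. The executable and resource requests are placeholders, not DODAS-specific values.
```python
import htcondor

schedd = htcondor.Schedd()  # schedd of the provisioned pool

job = htcondor.Submit({
    "executable": "run_analysis.sh",
    "arguments": "$(ProcId)",
    "output": "job.$(ProcId).out",
    "error": "job.$(ProcId).err",
    "log": "cluster.log",
    "request_cpus": "1",
    "request_memory": "2GB",
})

# htcondor >= 9 API; older bindings use an explicit transaction instead.
result = schedd.submit(job, count=10)
print("submitted cluster", result.cluster())
```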
DOI: 10.48550/arxiv.1506.05829
2015
Cited 8 times
Proceedings of the Sixth International Workshop on Multiple Partonic Interactions at the Large Hadron Collider
Multiple Partonic Interactions are often crucial for interpreting results obtained at the Large Hadron Collider (LHC). The quest for a sound understanding of the dynamics behind MPI - particularly at this time when the LHC is due to start its "Run II" operations - has focused the aim of this workshop. MPI@LHC2014 concentrated mainly on the phenomenology of LHC measurements whilst keeping in perspective those results obtained at previous hadron colliders. The workshop has also debated some of the state-of-the-art theoretical considerations and the modeling of MPI in Monte Carlo event generators. The topics debated in the workshop included: Phenomenology of MPI processes and multiparton distributions; Considerations for the description of MPI in Quantum Chromodynamics (QCD); Measuring multiple partonic interactions; Experimental results on inelastic hadronic collisions: underlying event, minimum bias, forward energy flow; Monte Carlo generator development and tuning; Connections with low-x phenomena, diffraction, heavy-ion physics and cosmic rays. In a total of 57 plenary talks the workshop covered a wide range of experimental results, Monte Carlo development and tuning, phenomenology and dedicated measurements of MPI which were produced with data from the LHC's Run I. Recent progress of theoretical understanding of MPI in pp, pA and AA collisions as well as the role of MPI in diffraction and small-x physics were also covered. The workshop fostered close contact between the experimental and theoretical communities. It provided a setting to discuss many of the different aspects of MPI, eventually identifying them as a unifying concept between apparently different lines of research and evaluating their impact on the LHC physics programme.
DOI: 10.1088/1742-6596/664/6/062038
2015
Cited 6 times
CMS distributed data analysis with CRAB3
The CMS Remote Analysis Builder (CRAB) is a distributed workflow management tool which facilitates analysis tasks by isolating users from the technical details of the Grid infrastructure. Throughout LHC Run 1, CRAB has been successfully employed by an average of 350 distinct users each week executing about 200,000 jobs per day.
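For context, a CRAB3 task is described by a Python configuration file consumed by the crab client; the sketch below shows its typical structure, assuming the standard CRABClient utilities. Dataset, request and site names are placeholders.
```python
# Hedged sketch of a CRAB3 task configuration (crab submit reads this file).
from CRABClient.UserUtilities import config

config = config()
config.General.requestName = "example_analysis_v1"
config.General.workArea = "crab_projects"

config.JobType.pluginName = "Analysis"
config.JobType.psetName = "pset_analysis_cfg.py"   # CMSSW configuration

config.Data.inputDataset = "/ExampleDataset/Run2012-Example/AOD"
config.Data.splitting = "FileBased"
config.Data.unitsPerJob = 10
config.Data.publication = False

config.Site.storageSite = "T2_IT_Example"
```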
DOI: 10.1016/j.future.2017.04.021
2018
Cited 4 times
Integration of end-user Cloud storage for CMS analysis
End-user Cloud storage is increasing rapidly in popularity in research communities thanks to the collaboration capabilities it offers, namely synchronisation and sharing. CERN IT has implemented such a storage model, named CERNBox, integrated with the CERN AuthN and AuthZ services. To exploit the use of the end-user Cloud storage for the distributed data analysis activity, the CMS experiment has started the integration of CERNBox as a Grid resource. This will allow CMS users to make use of their own storage in the Cloud for their analysis activities as well as to benefit from synchronisation and sharing capabilities to achieve results faster and more effectively. It will provide an integration model of Cloud storage in the Grid, which is implemented and commissioned over the world’s largest computing Grid infrastructure, the Worldwide LHC Computing Grid (WLCG). In this paper, we present the integration strategy and infrastructure changes needed in order to transparently integrate end-user Cloud storage with the CMS distributed computing model. We describe the new challenges faced in data management between Grid and Cloud and how they were addressed, along with details of the support for Cloud storage recently introduced into the WLCG data movement middleware, FTS3. The commissioning experience of CERNBox for the distributed data analysis activity is also presented.
DOI: 10.1051/epjconf/202024504024
2020
Cited 4 times
Smart Caching at CMS: applying AI to XCache edge services
The projected storage and compute needs for the HL-LHC will be a factor of up to 10 above what can be achieved by the evolution of current technology within a flat budget. The WLCG community is studying possible technical solutions to evolve the current computing in order to cope with the requirements; one of the main focuses is resource optimization, with the ultimate aim of improving performance and efficiency, as well as simplifying and reducing operation costs. As of today, storage consolidation based on a Data Lake model is considered a good candidate for addressing HL-LHC data access challenges. The Data Lake model under evaluation can be seen as a logical system that hosts a distributed working set of analysis data. Compute power can be “close” to the lake, but also remote and thus completely external. In this context we expect data caching to play a central role as a technical solution to reduce the impact of latency and reduce network load. A geographically distributed caching layer will be functional to many satellite computing centers that might appear and disappear dynamically. In this talk we propose a system of caches, distributed at the national level, describing both the deployment and the results of the studies made to measure the impact on the CPU efficiency. In this contribution, we also present early results on a novel caching strategy beyond the standard XRootD approach, whose results will be a baseline for an AI-based smart caching system.
DOI: 10.1051/epjconf/202024509009
2020
Cited 4 times
Extension of the INFN Tier-1 on a HPC system
The INFN Tier-1 located at CNAF in Bologna (Italy) is a center of the WLCG e-Infrastructure, supporting the 4 major LHC collaborations and more than 30 other INFN-related experiments. After multiple tests towards elastic expansion of CNAF compute power via Cloud resources (provided by Azure, Aruba and in the framework of the HNSciCloud project), and building on the experience gained with the production-quality extension of the Tier-1 farm on remote owned sites, the CNAF team, in collaboration with experts from the ALICE, ATLAS, CMS, and LHCb experiments, has been working to put into production an integrated HTC+HPC system with the PRACE CINECA center, located near Bologna. This extension will be implemented on the Marconi A2 partition, equipped with Intel Knights Landing (KNL) processors. A number of technical challenges were faced and solved in order to successfully run on low-RAM nodes, as well as to overcome the closed environment (network, access, software distribution, ...) that HPC systems impose with respect to standard Grid sites. We show preliminary results from a large-scale integration effort, using resources secured via the successful PRACE grant N. 2018194658, for 30 million KNL core hours.
DOI: 10.1051/epjconf/201921403006
2019
Cited 3 times
Improving efficiency of analysis jobs in CMS
Hundreds of physicists analyze data collected by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider using the CMS Remote Analysis Builder and the CMS global pool to exploit the resources of the Worldwide LHC Computing Grid. Efficient use of such an extensive and expensive resource is crucial. At the same time, the CMS collaboration is committed to minimizing time to insight for every scientist, by pushing for the fewest possible access restrictions to the full data sample and by supporting the free choice of applications to run on the computing resources. Supporting such a variety of workflows while preserving efficient resource usage poses special challenges. In this paper we report on three complementary approaches adopted in CMS to improve the scheduling efficiency of user analysis jobs: automatic job splitting, automated runtime estimates and automated site selection for jobs.
DOI: 10.22323/1.327.0024
2018
Cited 3 times
DODAS: How to effectively exploit heterogeneous clouds for scientific computations
Dynamic On Demand Analysis Service (DODAS) is a Platform as a Service tool built by combining several solutions and products developed by the INDIGO-DataCloud H2020 project. DODAS allows instantiating on-demand container-based clusters. Both an HTCondor batch system and a platform for Big Data analysis based on Spark, Hadoop, etc. can be deployed on any cloud-based infrastructure with almost zero effort. DODAS acts as a cloud enabler designed for scientists seeking to easily exploit distributed and heterogeneous clouds to process data. Aiming to reduce the learning curve as well as the operational cost of managing community-specific services running on distributed clouds, DODAS completely automates the process of provisioning, creating, managing and accessing a pool of heterogeneous computing and storage resources. DODAS was selected as one of the Thematic Services that will provide multidisciplinary solutions in the EOSC-hub project, an integration and management system of the European Open Science Cloud starting in January 2018. The main goals of this contribution are to provide a comprehensive overview of the overall technical implementation of DODAS, as well as to illustrate two distinct real examples of usage: the integration within the CMS Workload Management System and the extension of the AMS computing model.
DOI: 10.1088/1742-6596/1525/1/012057
2020
Cited 3 times
Using DODAS as deployment manager for smart caching of CMS data management system
DODAS stands for Dynamic On Demand Analysis Service and is a Platform as a Service toolkit built around several EOSC-hub services designed to instantiate and configure on-demand container-based clusters over public or private Cloud resources. It automates the whole workflow from service provisioning to the configuration and setup of software applications. Therefore, such a solution allows using “any cloud provider”, with almost zero effort. In this paper, we demonstrate how DODAS can be adopted as a deployment manager to set up and manage the compute resources and services required to develop an AI solution for smart data caching. The smart caching layer may reduce the operational cost and increase flexibility with respect to the regular centrally managed storage of the current CMS computing model. The cache space should be dynamically populated with the most requested data. In addition, clustering such caching systems will make it possible to operate them as a Content Delivery System between data providers and end-users. Moreover, a geographically distributed caching layer will also be functional to a data lake-based model, where many satellite computing centers might appear and disappear dynamically. In this context, our strategy is to develop a flexible and automated AI environment for smart management of the content of such a clustered cache system. In this contribution, we describe the identified computational phases required for the AI environment implementation, as well as the related DODAS integration. We start with an overview of the architecture of the pre-processing step, based on Spark, which has the role of preparing data for a Machine Learning technique. A focus is given to the automation implemented through DODAS. Then, we show how to train an AI-based smart cache and how we implemented a training facility managed through DODAS. Finally, we provide an overview of the inference system, based on CMS TensorFlow as a Service and also deployed as a DODAS service.
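A hedged sketch of the Spark-based pre-processing step described above: raw access logs are aggregated into per-file features for the subsequent ML stage. The input path and log schema (filename, size, timestamp) are assumptions for illustration only.
```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cache-preprocessing").getOrCreate()

# Hypothetical access-log dump; one JSON record per file request.
logs = spark.read.json("access_logs/*.json")

# Per-file statistics usable as features for an admission/eviction model.
features = (logs.groupBy("filename")
                .agg(F.count("*").alias("n_requests"),
                     F.avg("size").alias("avg_size_bytes"),
                     F.max("timestamp").alias("last_access")))

features.write.mode("overwrite").parquet("cache_features.parquet")
```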
DOI: 10.5281/zenodo.7883082
2023
A dynamic and extensible web portal enabling the deployment of scientific virtual computational environments on hybrid e-infrastructures
DOI: 10.5281/zenodo.8036983
2023
interTwin D5.1 First Architecture design and Implementation Plan
DOI: 10.1007/s10723-023-09664-z
2023
Smart Caching in a Data Lake for High Energy Physics Analysis
The continuous growth of data production in almost all scientific areas raises new problems in data access and management, especially in a scenario where the end-users, as well as the resources that they can access, are worldwide distributed. This work is focused on the data caching management in a Data Lake infrastructure in the context of the High Energy Physics field. We are proposing an autonomous method, based on Reinforcement Learning techniques, to improve the user experience and to contain the maintenance costs of the infrastructure.
DOI: 10.48550/arxiv.2307.12579
2023
Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case
The challenges expected for the next era of the Large Hadron Collider (LHC), both in terms of storage and computing resources, provide LHC experiments with a strong motivation for evaluating ways of rethinking their computing models at many levels. Great efforts have been put into optimizing the computing resource utilization for the data analysis, which leads both to lower hardware requirements and faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is evolving software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering on the one side modern interfaces, and on the other side hiding the complexity of the underlying environment, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources like public cloud or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility is created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources on different geographical sites. The physics analysis is converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order of magnitude speedup with respect to the previous approach.
DOI: 10.2139/ssrn.4529970
2023
Prototyping a ROOT-based Distributed Analysis Workflow for HL-LHC: The CMS Use Case
The challenges expected for the next era of the Large Hadron Collider (LHC), both in terms of storage and computing resources, provide LHC experiments with a strong motivation for evaluating ways of rethinking their computing models at many levels. Great efforts have been put into optimizing the computing resource utilization for the data analysis, which leads both to lower hardware requirements and faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is evolving software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering on the one side modern interfaces, and on the other side hiding the complexity of the underlying environment, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources like public cloud or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility is created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources on different geographical sites. The physics analysis is converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order of magnitude speedup with respect to the previous approach.
DOI: 10.22323/1.351.0014
2019
Integration of the Italian cache federation within the CMS computing model
The next decades at the HL-LHC will be characterized by a huge increase of both storage and computing requirements (between one and two orders of magnitude). Moreover, we foresee a shift in resource provisioning towards the exploitation of dynamic solutions (on private or public clouds and HPC facilities). In this scenario the computing model of the CMS experiment is pushed towards an evolution for the optimization of the amount of space that is managed centrally and of the CPU efficiency of the jobs that run on "storage-less" resources. In particular, the computing resources of the "Tier-2" site layer can, for the most part, be instrumented to read data from a geographically distributed cache storage based on unmanaged resources, reducing in this way the operational effort by a large fraction and generating additional flexibility. The objective of this contribution is to present the first implementation of an INFN federation of cache servers, developed also in collaboration with the eXtreme Data Cloud EU project. The CNAF Tier-1 plus the Bari and Legnaro Tier-2s provide unmanaged storage which has been organized under a common namespace. This distributed cache federation has been seamlessly integrated in the CMS computing infrastructure, while the technical implementation of this solution is based on XRootD, largely adopted in the CMS computing model under the "Any Data, Anytime, Anywhere" (AAA) project. The results in terms of CMS workflow performance will be shown. In addition, a complete simulation of the effects of the described model under several scenarios, including dynamic hybrid cloud resource provisioning, will be discussed. Finally, a plan for the upgrade of such a prototype towards a stable INFN setup seamlessly integrated with the production CMS computing infrastructure will be discussed.
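For illustration, reading a file through such a cache federation boils down to opening it via the federation's XRootD redirector; the hedged sketch below uses the XRootD Python bindings, with a placeholder redirector hostname and file path.
```python
from XRootD import client
from XRootD.client.flags import OpenFlags

with client.File() as f:
    status, _ = f.open(
        "root://xcache-redirector.example.infn.it//store/data/example.root",
        OpenFlags.READ)
    if status.ok:
        # Read the first megabyte; on a miss the cache fetches and stores it,
        # on a hit the read is served locally.
        status, data = f.read(offset=0, size=1024 * 1024)
        print("read", len(data), "bytes through the cache layer")
```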
DOI: 10.1051/epjconf/202024507033
2020
The DODAS Experience on the EGI Federated Cloud
The EGI Cloud Compute service offers a multi-cloud IaaS federation that brings together research clouds as a scalable computing platform for research accessible with OpenID Connect Federated Identity. The federation is not limited to single sign-on: it also introduces features to facilitate the portability of applications across providers: i) a common VM image catalogue, with VM image replication to ensure these images will be available at providers whenever needed; ii) a GraphQL information discovery API to understand the capacities and capabilities available at each provider; and iii) integration with orchestration tools (such as Infrastructure Manager) to abstract the federation and facilitate using heterogeneous providers. EGI also monitors the correct function of every provider and collects usage information across all the infrastructure. DODAS (Dynamic On Demand Analysis Service) is an open-source Platform-as-a-Service tool, which allows deploying software applications over heterogeneous and hybrid clouds. DODAS is one of the so-called Thematic Services of the EOSC-hub project and it instantiates on-demand container-based clusters offering a high level of abstraction to users, allowing them to exploit distributed cloud infrastructures with very limited knowledge of the underlying technologies. This work presents a comprehensive overview of the DODAS integration with the EGI Cloud Federation, reporting the experience of the integration with the CMS experiment submission infrastructure.
DOI: 10.1088/1742-6596/664/6/062052
2015
AsyncStageOut: Distributed user data management for CMS Analysis
AsyncStageOut (ASO) is a new component of the distributed data analysis system of CMS, CRAB, designed for managing users' data. It addresses a major weakness of the previous model, namely that mass storage of output data was part of the job execution resulting in inefficient use of job slots and an unacceptable failure rate at the end of the jobs. ASO foresees the management of up to 400k files per day of various sizes, spread worldwide across more than 60 sites. It must handle up to 1000 individual users per month, and work with minimal delay. This creates challenging requirements for system scalability, performance and monitoring. ASO uses FTS to schedule and execute the transfers between the storage elements of the source and destination sites. It has evolved from a limited prototype to a highly adaptable service, which manages and monitors the user file placement and bookkeeping. To ensure system scalability and data monitoring, it employs new technologies such as a NoSQL database and re-uses existing components of PhEDEx and the FTS Dashboard. We present the asynchronous stage-out strategy and the architecture of the solution we implemented to deal with those issues and challenges. The deployment model for the high availability and scalability of the service is discussed. The performance of the system during the commissioning and the first phase of production are also shown, along with results from simulations designed to explore the limits of scalability.
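As a hedged illustration of how such transfers can be scheduled, the sketch below uses the FTS3 REST "easy" Python bindings in the spirit of the ASO workflow described above. The FTS endpoint and the source/destination SURLs are placeholders, and credentials are assumed to come from the user's proxy.
```python
import fts3.rest.client.easy as fts3

context = fts3.Context("https://fts3-example.cern.ch:8446")

# One user output file moved from the execution site to its destination site.
transfer = fts3.new_transfer(
    "srm://source-site.example.org/path/user_output_1.root",
    "srm://dest-site.example.org/store/user/jdoe/user_output_1.root")

job = fts3.new_job([transfer], verify_checksum=True, retry=3)
job_id = fts3.submit(context, job)
print("submitted FTS3 job", job_id)
```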
DOI: 10.48550/arxiv.1410.6664
2014
Progress in Double Parton Scattering Studies
An overview of theoretical and experimental progress in double parton scattering (DPS) is presented. The theoretical topics cover factorization in DPS, models for double parton distributions and DPS in charm production and nuclear collisions. On the experimental side, CMS results for dijet and double J/ψ production, in light of DPS, as well as first results for the 4-jet channel are presented. ALICE reports on a study of open charm and J/ψ multiplicity dependence.
DOI: 10.1109/escience.2018.00082
2018
Distributed and On-demand Cache for CMS Experiment at LHC
In the CMS [1] computing model the experiment owns dedicated resources around the world that, for the most part, are located in computing centers with a well-defined Tier hierarchy. The geo-distributed storage is then controlled centrally by CMS Computing Operations. In this architecture data are distributed and replicated across the centers following a pre-placement model, mostly human controlled. Analysis jobs are then mostly executed on computing resources close to the data location. This, of course, avoids wasting CPU due to I/O latency, although it does not allow the available job slots to be used optimally.
DOI: 10.1007/978-3-030-58802-1_24
2020
An Intelligent Cache Management for Data Analysis at CMS
In this work, we explore a score-based approach to manage a cache system. With the proposed method, the cache can better discriminate the input requests and improve the overall performance. We created a score-based discriminator using the file statistics. The score represents the weight of a file. We tested several functions to compute the file weight used to determine whether a file has to be stored in the cache or not. We developed a solution experimenting on a real cache manager named XCache, which is used within the Compact Muon Solenoid (CMS) data analysis workflow. The aim of this work is to reduce the maintenance costs of the cache system without compromising the user experience.
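A toy sketch of the score-based admission idea described above: each file gets a weight from its request statistics, and only files whose weight exceeds a threshold are admitted. The weight function and threshold are illustrative, not the ones studied in the paper.
```python
from collections import defaultdict

class ScoreCache:
    def __init__(self, capacity_gb, threshold):
        self.capacity = capacity_gb
        self.threshold = threshold
        self.used = 0.0
        self.files = {}                                   # filename -> size
        self.stats = defaultdict(lambda: {"hits": 0, "size": 0.0})

    def weight(self, filename):
        s = self.stats[filename]
        # More requests and smaller size give a higher weight (one possible choice).
        return s["hits"] / max(s["size"], 1e-3)

    def request(self, filename, size_gb):
        st = self.stats[filename]
        st["hits"] += 1
        st["size"] = size_gb
        if filename in self.files:
            return "hit"
        if self.weight(filename) >= self.threshold:
            self._evict_until(size_gb)
            self.files[filename] = size_gb
            self.used += size_gb
            return "miss+stored"
        return "miss+skipped"

    def _evict_until(self, needed):
        # Evict lowest-weight files first until the new file fits.
        for name in sorted(self.files, key=self.weight):
            if self.used + needed <= self.capacity:
                break
            self.used -= self.files.pop(name)
```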
DOI: 10.1007/s10586-021-03325-0
2021
Migration of CMSWEB cluster at CERN to Kubernetes: a comprehensive study
The Compact Muon Solenoid (CMS) experiment heavily relies on the CMSWEB cluster to host critical services for its operational needs. The cluster is deployed on virtual machines (VMs) from the CERN OpenStack cloud and is manually maintained by operators and developers. The release cycle is composed of several steps, from building RPMs to their deployment, validation, and integration tests. To enhance the sustainability of the CMSWEB cluster, CMS decided to migrate its cluster to a containerized solution based on Docker and orchestrated with Kubernetes (K8s). This allows us to significantly speed up the release upgrade cycle, follow the end-to-end deployment procedure, and reduce operational cost. In this paper, we give an overview of the CMSWEB VM cluster and the issues we discovered during this migration. We discuss the architecture and the implementation strategy in the CMSWEB Kubernetes cluster. Even though Kubernetes provides horizontal pod autoscaling based on CPU and memory, in this paper, we provide details of horizontal pod autoscaling based on the custom metrics of CMSWEB services. We also discuss an automated deployment procedure based on the best practices of continuous integration/continuous deployment (CI/CD) workflows. We present a performance analysis of the Kubernetes- and VM-based CMSWEB deployments. Finally, we describe various issues found during the implementation in Kubernetes and report on lessons learned during the migration process.
DOI: 10.22323/1.378.0009
2021
Reinforcement Learning for Smart Caching at the CMS experiment
In the near future, High Energy Physics experiments’ storage and computing needs will go far above what can be achieved by only scaling current computing models or current infrastructures. Considering the LHC case, for 10 years a federated infrastructure (Worldwide LHC Computing Grid, WLCG) has been successfully developed. Nevertheless, the High Luminosity LHC (HL-LHC) scenario is forcing the WLCG community to dig for innovative solutions. In this landscape, one of the initiatives is the exploitation of Data Lakes as a solution to improve data and storage management. The current Data Lake model foresees data caching to play a central role as a technical solution to reduce the impact of latency and network load. Moreover, even higher efficiency can be achieved through a smart caching algorithm: this motivates the development of an AI-based approach to the caching problem. In this work, a Reinforcement Learning-based cache model (named QCACHE) is applied in the CMS experiment context. More specifically, we focused our attention on the optimization of both cache performance and cache management costs. The QCACHE system is based on two distinct Q-Learning (or Deep Q-Learning) agents seeking to find the best action to take given the current state. More explicitly, they try to learn a policy that maximizes the total reward (i.e. hits or misses occurring in a given time span). While the addition agent takes care of all the cache writing requests, the eviction agent deals with the decision to keep or to delete files in the cache. We will present an overview of the QCACHE framework, and the results in terms of cache performance, obtained using "real-world" data (i.e. historical aggregated data requests used to predict dataset popularity, filtered for the Italian region), will be compared with standard replacement policies. Moreover, we will show the planned subsequent evolution of the framework.
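A toy sketch of the Q-learning idea behind such an eviction agent: it observes a discretized file state and learns whether keeping or evicting the file maximizes future reward. The state encoding, rewards and hyper-parameters are illustrative, not those used in QCACHE.
```python
import random
from collections import defaultdict

ACTIONS = ("keep", "evict")

class EvictionAgent:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)            # (state, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:     # explore occasionally
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

# Usage sketch: reward +1 when a kept file is requested again (hit),
# -1 when a kept file just occupies space with no further requests.
agent = EvictionAgent()
state = ("recent", "popular")
action = agent.act(state)
agent.learn(state, action, reward=+1, next_state=("recent", "popular"))
```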
2014
DPS measurements with CMS
We present recent results on Double Parton Scattering (DPS) studies using data collected during Run 1 of the LHC with the CMS experiment. Double parton scattering is investigated in several final states including vector bosons and multi-jets. Measurements of observables designed to highlight the DPS contribution are shown and compared to MC predictions from models based on multiple partonic interactions (MPI) phenomenology.
2014
Progress in Double Parton Scattering Studies
An overview of theoretical and experimental progress in double parton scattering (DPS) is presented. The theoretical topics cover factorization in DPS, models for double parton distributions and DPS in charm production and nuclear collisions. On the experimental side, CMS results for dijet and double J/ψ production, in light of DPS, as well as first results for the 4-jet channel are presented. ALICE reports on a study of open charm and J/ψ multiplicity dependence.
DOI: 10.1088/1742-6596/513/3/032079
2014
CMS users data management service integration and first experiences with its NoSQL data storage
The distributed data analysis workflow in CMS assumes that jobs run in a different location to where their results are finally stored. Typically the user outputs must be transferred from one site to another by a dedicated CMS service, AsyncStageOut. This new service was originally developed to address the inefficient use of CMS computing resources caused by transferring analysis job outputs synchronously from the job execution node to the remote site as soon as they are produced.
DOI: 10.1393/ncc/i2015-15044-y
2015
Double Parton Scattering with CMS detector at LHC
Multi-parton interactions (MPI) are experiencing growing popularity and are widely invoked to account for observations that cannot be explained otherwise. With the large integrated luminosity available, Double Parton Scattering (DPS) measurements (two hard scatterings in the same proton-proton collision) can be performed in different final states and at different energy scales. This contribution presents the CMS results of the DPS analysis of W+2-jet events, which is the first direct measurement of DPS with the CMS detector. The analyzed data correspond to an integrated luminosity of 5 fb−1 collected in pp collisions. It is shown how the simulations of W+2-jet events with MADGRAPH5+PYTHIA8 (or PYTHIA6) and NLO predictions of POWHEG2+PYTHIA6 (or HERWIG6) provide a good description of the observables but fail to describe the data if multiple parton interactions are not included. The fraction of DPS in W+2-jet events is then extracted from a DPS+SPS sample, and the measured value is fDPS = 0.055 ± 0.002 (stat) ± 0.014 (syst), with a corresponding effective cross section of σeff = 20.7 ± 0.8 (stat) ± 6.5 (syst) mb.
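For context, the quoted σeff is tied to the DPS rate through the usual "pocket formula"; the relation below is the standard textbook expression (with the W and the dijet pair as the two hard scatters), not an equation quoted from the paper itself.
```latex
% Pocket formula: DPS cross section for producing final states A and B in the
% same pp collision; m = 1 for identical and m = 2 for distinguishable
% processes (here A = W, B = dijet, so m/2 = 1).
\sigma^{\mathrm{DPS}}_{AB} \;=\; \frac{m}{2}\,
\frac{\sigma_{A}\,\sigma_{B}}{\sigma_{\mathrm{eff}}}
```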
DOI: 10.1088/1742-6596/664/3/032006
2015
Improvements of LHC data analysis techniques at Italian WLCG sites. Case-study of the transfer of this technology to other research areas
In 2012, 14 Italian institutions participating in LHC Experiments won a grant from the Italian Ministry of Research (MIUR), with the aim of optimising analysis activities, and in general the Tier2/Tier3 infrastructure. We report on the activities being researched upon, on the considerable improvement in the ease of access to resources by physicists, also those with no specific computing interests. We focused on items like distributed storage federations, access to batch-like facilities, provisioning of user interfaces on demand and cloud systems. R&D on next-generation databases, distributed analysis interfaces, and new computing architectures was also carried on. The project, ending in the first months of 2016, will produce a white paper with recommendations on best practices for data-analysis support by computing centers.
2016
Measurements with QCD and Jets at the LHC
DOI: 10.22323/1.278.0017
2016
QCD and Jets
In the light of the successful restart of data-taking by the LHC experiments at the unprecedented centre-of-mass energy √s = 13 TeV, we review the prospects for the second run of the LHC for measurements related to Quantum Chromodynamics (QCD) and jets in pp collisions. Recent results from the ATLAS, CMS, and LHCb collaborations lead the discussion on the open questions on soft production that the LHC experiments are called to address during the next few years of activities. The discussion is mainly focused on measurements related to the underlying event, to the production mechanism of jets, and to the associated production of jets and heavy flavours.
2014
Multi parton interactions with CMS detector at LHC
DOI: 10.1088/1742-6596/898/4/042048
2017
A comparison of different database technologies for the CMS AsyncStageOut transfer database
AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages users transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user's output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate.
DOI: 10.1088/1742-6596/898/9/092036
2017
Efficient monitoring of CRAB jobs at CMS
CRAB is a tool used for distributed analysis of CMS data. Users can submit sets of jobs with similar requirements (tasks) with a single request. CRAB uses a client-server architecture, where a lightweight client, a server, and ancillary services work together and are maintained by CMS operators at CERN.
DOI: 10.1109/nssmic.2017.8533143
2017
A container-based solution to generate HTCondor Batch Systems on demand exploiting heterogeneous Clouds for data analysis
This paper describes the Dynamic On Demand Analysis Service (DODAS), an automated system that simplifies the process of provisioning, creating, managing and accessing a pool of heterogeneous computing and storage resources, by generating clusters to run batch systems, thereby implementing the "Batch System as a Service" paradigm. DODAS is built on several INDIGO-DataCloud services, among which the PaaS Orchestrator, the Infrastructure Manager, and the Identity and Access Manager are the most important. The paper also describes a successful integration of DODAS with the computing infrastructure of the Compact Muon Solenoid (CMS) experiment installed at the LHC.
DOI: 10.48550/arxiv.2208.06437
2022
Smart caching in a Data Lake for High Energy Physics analysis
The continuous growth of data production in almost all scientific areas raises new problems in data access and management, especially in a scenario where the end-users, as well as the resources that they can access, are worldwide distributed. This work is focused on the data caching management in a Data Lake infrastructure in the context of the High Energy Physics field. We are proposing an autonomous method, based on Reinforcement Learning techniques, to improve the user experience and to contain the maintenance costs of the infrastructure.
DOI: 10.22323/1.415.0023
2022
Running Fermi-LAT analysis on Cloud: the experience with DODAS with EGI-ACE Project
The aim of the Fermi-LAT long-term Transient (FLT) monitoring is the routine search for γ-ray sources on monthly time intervals of Fermi-LAT data. The FLT analysis consists of two steps: first the monthly data sets were analyzed using a wavelet-based source detection algorithm that provided the candidate new transient sources; then these transient candidates were analyzed using the standard Fermi-LAT maximum likelihood analysis method. Only sources with a statistical significance above 4σ in at least one monthly bin were listed in a catalog. The strategy adopted to implement the maximum likelihood analysis pipeline is based on cloud solutions, adopting the Dynamic On Demand Analysis Service (DODAS) [1] as the technology enabler. DODAS represents a solution to transparently exploit cloud computing with almost zero effort for a user community. This contribution will detail the technical implementation, providing the point of view of the user community.
DOI: 10.22323/1.415.0022
2022
Open-source and cloud-native solutions for managing and analyzing heterogeneous and sensitive clinical Data
The requirement for effective handling and management of heterogeneous and possibly confidential data continuously increases within multiple scientific domains. PLANET (Pollution Lake ANalysis for Effective Therapy) is an INFN-funded research initiative aiming to implement an observational study to assess a possible statistical association between environmental pollution and COVID-19 infection, symptoms and course. PLANET is built on a "data-centric" approach that takes into account clinical components as well as environmental and pollution conditions, complementing primary data with many possible confounding factors such as population density, commuter density, socio-economic metrics and more. Besides the scientific one, the main technical challenge of the project is collecting, indexing, storing and managing many types of datasets while guaranteeing FAIRness as well as adherence to the prescribed regulatory frameworks, such as those granted by the General Data Protection Regulation (GDPR). In this contribution we describe the developed open-source DataLake platform, detailing its key features: the event-based storage system provided by MinIO, which allows automatic metadata processing; the data-ingestion pipeline implemented via Argo Workflows; the GraphQL interface to query object metadata; and finally, the seamless integration of the platform within a multi-user compute environment, showing how all these frameworks are integrated in the Enhanced PrIvacy and Compliance (EPIC) Cloud partition of the INFN Cloud federation.
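A hedged sketch of event-driven object ingestion with the MinIO Python SDK, along the lines of the platform described above. The endpoint, credentials, bucket and object names are placeholders; a real deployment would use proper secrets and bucket notifications wired to the workflow engine.
```python
from minio import Minio

client = Minio("minio.example.infn.it:9000",
               access_key="PLACEHOLDER",
               secret_key="PLACEHOLDER",
               secure=True)

bucket = "planet-clinical-data"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Upload a (pseudonymized) dataset; a bucket notification can then trigger the
# Argo Workflows metadata-processing pipeline on this put event.
client.fput_object(bucket, "cohort-2021/records.parquet",
                   "local/records.parquet",
                   metadata={"x-amz-meta-source": "registry-A"})
```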
DOI: 10.22323/1.327.0009
2018
Harvesting dispersed computational resources with Openstack: a Cloud infrastructure for the Computational Science community
Harvesting dispersed computational resources is nowadays an important and strategic topic, especially in an environment like computational science, where computing needs constantly increase. On the other hand, managing dispersed resources may be neither an easy task nor cost-effective. We successfully explored the use of OpenStack middleware to achieve this objective; our main goal is not only resource harvesting but also to provide a modern paradigm of computing and data-usage access. In the present work we illustrate a real example of how to build a geographically distributed cloud to share and manage computing and storage resources owned by heterogeneous cooperating entities.
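As a hedged illustration, the sketch below enumerates and provisions compute resources across federated OpenStack sites with the openstacksdk; the cloud names, image, flavor and network identifiers are placeholders that would come from clouds.yaml and the site configuration.
```python
import openstack

for cloud_name in ("site-a", "site-b"):
    conn = openstack.connect(cloud=cloud_name)

    # List what is already running on this cooperating site.
    for server in conn.compute.servers():
        print(cloud_name, server.name, server.status)

    # Boot a worker node on this site from a shared image.
    image = conn.compute.find_image("centos-7-worker")
    flavor = conn.compute.find_flavor("m1.large")
    conn.compute.create_server(name="harvest-worker",
                               image_id=image.id, flavor_id=flavor.id,
                               networks=[{"uuid": "PLACEHOLDER-NET-ID"}])
```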
DOI: 10.1051/epjconf/202125102045
2021
First experiences with a portable analysis infrastructure for LHC at INFN
The challenges posed by the HL-LHC era are not limited to the sheer amount of data to be processed: the capability of optimizing the analyser's experience will also bring important benefits for the LHC communities, in terms of total resource needs, user satisfaction and reduction of time to publication. At the Italian National Institute for Nuclear Physics (INFN) a portable software stack for analysis has been proposed, based on cloud-native tools and capable of providing users with a fully integrated analysis environment for the CMS experiment. The main characterizing traits of the solution are the user-driven design and the portability to any cloud resource provider. All this is made possible via an evolution towards a "python-based" framework that enables the usage of a set of open-source technologies largely adopted in both cloud-native and data-science environments. In addition, a "single sign-on"-like experience is available thanks to the standards-based integration of INDIGO-IAM with all the tools. The integration of compute resources is done through the customization of a JupyterHub solution, able to spawn identity-aware user instances ready to access data with no further setup actions. The integration with GPU resources is also available, designed to sustain the increasingly widespread ML-based workflows. Seamless connections between the user UI and batch/big data processing frameworks (Spark, HTCondor) are possible. Eventually, the experiment data access latency is reduced thanks to the integrated deployment of a scalable set of caches, as developed in the context of the ESCAPE project, and as such compatible with future scenarios where a data lake will be available for the research community. The outcome of the evaluation of such a solution in action is presented, showing how a real CMS analysis workflow can make use of the infrastructure to achieve its results.
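A hedged sketch of the "python-based" interactive pattern described above: from a notebook session, an HTCondor-backed Dask cluster is requested and used to scale out a computation. Resource requests are placeholders and assume dask-jobqueue is available with a reachable HTCondor pool.
```python
from dask_jobqueue import HTCondorCluster
from dask.distributed import Client

cluster = HTCondorCluster(cores=1, memory="2GB", disk="1GB")
cluster.scale(jobs=20)          # ask HTCondor for 20 worker jobs
client = Client(cluster)

# Trivial scale-out test; a real analysis would dispatch RDataFrame or
# array-based tasks instead of this toy map-reduce.
futures = client.map(lambda x: x * x, range(1000))
print(sum(client.gather(futures)))
```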
DOI: 10.22323/1.378.0003
2021
Enabling HPC systems for HEP: the INFN-CINECA Experience
In this report we want to describe a successful integration exercise between CINECA (PRACE Tier-0) Marconi KNL system and LHC processing. A production-level system has been deployed using a 30 Mhours grant from the 18th Call for PRACE Project Access; thanks to CINECA, more than 3x the granted hours were eventually made available. Modifications at multiple levels were needed: on experiments' WMS layers, on site level access policies and routing, on virtualization. The success of the integration process paves the way to integration with additional local systems, and in general shows how the requirements of a HPC center can coexist with the needs from data intensive, complex distributed workflows.