
J. Letts

Here are all the papers by J. Letts that you can download and read on OA.mg.

DOI: 10.1109/hpdc.2004.36
2004
Cited 49 times
The Grid2003 production grid: principles and practice
The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory (Grid3) that has sustained for several months the production-level services required by physics experiments of the Large Hadron Collider at CERN (ATLAS and CMS), the Sloan Digital Sky Survey project, the gravitational wave search experiment LIGO, the BTeV experiment at Fermilab, as well as applications in molecular structure analysis and genome analysis, and computer science research projects in such areas as job and data scheduling. The deployed infrastructure has been operating since November 2003 with 27 sites, a peak of 2800 processors, workloads from 10 different applications exceeding 1300 simultaneous jobs, and data transfers among sites of greater than 2 TB/day. We describe the principles that have guided the development of this unique infrastructure and the practical experiences that have resulted from its creation and use. We discuss application requirements for grid services deployment and configuration, monitoring infrastructure, application performance, metrics, and operational experiences. We also summarize lessons learned.
DOI: 10.1088/1742-6596/664/6/062014
2015
Cited 17 times
How much higher can HTCondor fly?
The HTCondor high throughput computing system is heavily used in the high energy physics (HEP) community as the batch system for several Worldwide LHC Computing Grid (WLCG) resources. Moreover, it is the backbone of glideinWMS, the pilot system used by the computing organization of the Compact Muon Solenoid (CMS) experiment. To prepare for LHC Run 2, we probed the scalability limits of new versions and configurations of HTCondor with a goal of reaching 200,000 simultaneous running jobs in a single internationally distributed dynamic pool.
DOI: 10.1051/epjconf/202429504050
2024
The U.S. CMS HL-LHC R&D Strategic Plan
The HL-LHC run is anticipated to start at the end of this decade and will pose a significant challenge for the scale of the HEP software and computing infrastructure. The mission of the U.S. CMS Software & Computing Operations Program is to develop and operate the software and computing resources necessary to process CMS data expeditiously and to enable U.S. physicists to fully participate in the physics of CMS. We have developed a strategic plan to prioritize R&D efforts to reach this goal for the HL-LHC. This plan includes four grand challenges: modernizing physics software and improving algorithms, building infrastructure for exabyte-scale datasets, transforming the scientific data analysis process and transitioning from R&D to operations. We are involved in a variety of R&D projects that fall within these grand challenges. In this talk, we will introduce our four grand challenges and outline the R&D program of the U.S. CMS Software & Computing Operations Program.
DOI: 10.1051/epjconf/202429501036
2024
Identifying and Understanding Scientific Network Flows
The High-Energy Physics (HEP) and Worldwide LHC Computing Grid (WLCG) communities have faced significant challenges in understanding their global network flows across the world’s research and education (R&E) networks. This article describes the status of the work carried out to tackle this challenge by the Research Networking Technical Working Group (RNTWG) and the Scientific Network Tags (Scitags) initiative, including the evolving framework and tools, as well as our plans to improve network visibility before the next WLCG Network Data Challenge in early 2024. The Scitags initiative is a long-term effort to improve the visibility and management of network traffic for data-intensive sciences. The efforts of the RNTWG and Scitags initiatives have created a set of tools, standards, and proof-of-concept demonstrators that show the feasibility of identifying the owner (community) and purpose (activity) of network traffic anywhere in the network.
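A minimal sketch of the packet-marking idea behind Scitags: encode which community owns a flow and what activity it belongs to into the 20-bit IPv6 flow label. The bit split used here (9 bits owner, 6 bits activity, 5 bits entropy) and the numeric IDs are illustrative assumptions, not the published Scitags specification.

```python
# Illustrative sketch only: pack an (owner, activity) pair into a 20-bit
# IPv6 flow-label value, in the spirit of the Scitags packet-marking idea.
# The bit layout below is an assumption for demonstration purposes.
import random

EXP_BITS, ACT_BITS, ENT_BITS = 9, 6, 5  # assumed split of the 20-bit label

def encode_flow_label(owner_id: int, activity_id: int) -> int:
    """Combine owner and activity IDs with a few per-flow entropy bits."""
    assert 0 <= owner_id < 2**EXP_BITS and 0 <= activity_id < 2**ACT_BITS
    entropy = random.getrandbits(ENT_BITS)
    return (owner_id << (ACT_BITS + ENT_BITS)) | (activity_id << ENT_BITS) | entropy

def decode_flow_label(label: int) -> tuple[int, int]:
    """Recover (owner, activity) from a marked flow label."""
    return label >> (ACT_BITS + ENT_BITS), (label >> ENT_BITS) & (2**ACT_BITS - 1)

label = encode_flow_label(owner_id=5, activity_id=3)  # e.g. "CMS", "analysis download"
print(hex(label), decode_flow_label(label))
```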
DOI: 10.1109/nssmic.2009.5402426
2009
Cited 20 times
Hadoop distributed file system for the Grid
Data distribution, storage and access are essential to CPU-intensive and data-intensive high performance Grid computing. A newly emerged file system, the Hadoop distributed file system (HDFS), is deployed and tested within the Open Science Grid (OSG) middleware stack. Efforts have been taken to integrate HDFS with other Grid tools to build a complete service framework for the Storage Element (SE). Scalability tests show that sustained high inter-DataNode data transfer can be achieved for a cluster fully loaded with data-processing jobs. WAN transfer to HDFS, supported by BeStMan and tuned GridFTP servers, shows the large scalability and robustness of the system. The Hadoop client can be deployed on interactive machines to support remote data access. The ability to automatically replicate precious data is especially important for computing sites, as demonstrated at the Large Hadron Collider (LHC) computing centers. The simplicity of operating an HDFS-based SE significantly reduces the cost of ownership of petabyte-scale data storage over alternative solutions.
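As a rough illustration of how an interactive node might use such an HDFS-backed Storage Element, the sketch below drives the standard hadoop command-line client from Python; the file path and replication factor are hypothetical and not taken from the paper.

```python
# Hedged sketch: copy a file into an HDFS-based SE and raise the replication
# factor for "precious" data, using the standard hadoop CLI. Paths and the
# replication factor are illustrative assumptions.
import subprocess

def hdfs(*args: str) -> str:
    """Run a 'hadoop fs' subcommand and return its stdout."""
    result = subprocess.run(["hadoop", "fs", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout

hdfs("-put", "local_events.root", "/store/user/example/local_events.root")
hdfs("-setrep", "-w", "2", "/store/user/example/local_events.root")  # extra replica
print(hdfs("-ls", "/store/user/example/"))
```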
DOI: 10.1007/s10723-010-9152-1
2010
Cited 12 times
Distributed Analysis in CMS
The CMS experiment expects to manage several Pbytes of data each year during the LHC programme, distributing them over many computing sites around the world and enabling data access at those centers for analysis. CMS has identified the distributed sites as the primary location for physics analysis to support a wide community with thousands of potential users. This represents an unprecedented experimental challenge in terms of the scale of distributed computing resources and the number of users. An overview of the computing architecture, the software tools and the distributed infrastructure is reported. Summaries of the experience in establishing efficient and scalable operations to get prepared for CMS distributed analysis are presented, followed by the user experience in their current analysis activities.
DOI: 10.1088/1742-6596/513/3/032040
2014
Cited 7 times
CMS computing operations during run 1
During the first run, CMS collected and processed more than 10B data events and simulated more than 15B events. Up to 100k processor cores were used simultaneously and 100PB of storage was managed. Each month petabytes of data were moved and hundreds of users accessed data samples. In this document we discuss the operational experience from this first run. We present the workflows and data flows that were executed, and we discuss the tools and services developed, and the operations and shift models used to sustain the system. Many techniques were followed from the original computing planning, but some were reactions to difficulties and opportunities. We also address the lessons learned from an operational perspective, and how this is shaping our thoughts for 2015.
DOI: 10.1088/1742-6596/396/3/032040
2012
Cited 6 times
Performance studies and improvements of CMS distributed data transfers
CMS computing needs reliable, stable and fast connections among its multi-tiered distributed infrastructure. The CMS experiment relies on File Transfer Services (FTS) for data distribution, a low-level data movement service responsible for moving sets of files from one site to another, while allowing participating sites to control the network resource usage. FTS servers are provided by Tier-0 and Tier-1 centers and used by all the computing sites in CMS, subject to established CMS and site setup policies, including all the virtual organizations making use of the Grid resources at the site, and properly dimensioned to satisfy all their requirements. Managing the service efficiently needs good knowledge of the CMS needs for all kinds of transfer routes, and of the sharing and interference with other VOs using the same FTS transfer managers. This contribution deals with a complete revision of all FTS servers used by CMS, customizing the topologies and improving their setup in order to keep CMS transferring data at the desired levels, as well as performance studies for all kinds of transfer routes, including overhead measurements introduced by SRM servers and storage systems, FTS server misconfigurations and identification of congested channels, historical transfer throughputs per stream, file-latency studies,... This information is retrieved directly from the FTS servers through the FTS Monitor webpages and conveniently archived for further analysis. The project provides an interface to all these values, to ease the analysis of the data.
DOI: 10.1088/1742-6596/396/4/042033
2012
Cited 6 times
CMS Data Transfer operations after the first years of LHC collisions
The CMS experiment utilizes a distributed computing infrastructure, and its performance heavily depends on the fast and smooth distribution of data between different CMS sites. Data must be transferred from the Tier-0 (CERN) to the Tier-1s for processing, storing and archiving, and timeliness and good quality are vital to avoid overflowing CERN storage buffers. At the same time, processed data have to be distributed from Tier-1 sites to all Tier-2 sites for physics analysis, while Monte Carlo simulations are sent back to Tier-1 sites for further archival. At the core of the transfer machinery is the PhEDEx (Physics Experiment Data Export) data transfer system. It is very important to ensure reliable operation of the system, and the operational tasks comprise monitoring and debugging all transfer issues. Based on transfer quality information, the Site Readiness tool is used to create plans for resource utilization in the future. We review the operational procedures created to enforce reliable data delivery to CMS distributed sites all over the world. Additionally, we need to keep data and meta-data consistent at all sites, both on disk and on tape. In this presentation, we describe the principles and actions taken to keep data consistent on site storage systems and in the central CMS data replication database (TMDB/DBS) while ensuring fast and reliable delivery of data samples of hundreds of terabytes to the entire CMS physics community.
DOI: 10.1088/1742-6596/664/6/062031
2015
Cited 6 times
Using the glideinWMS System as a Common Resource Provisioning Layer in CMS
CMS will require access to more than 125k processor cores for the beginning of Run 2 in 2015 to carry out its ambitious physics program with more events of higher complexity. During Run 1 these resources were predominantly provided by a mix of grid sites and local batch resources. During the long shutdown, cloud infrastructures, diverse opportunistic resources and HPC supercomputing centers were made available to CMS, which further complicated the operations of the submission infrastructure. In this presentation we will discuss the CMS effort to adopt and deploy the glideinWMS system as a common resource provisioning layer to grid, cloud, local batch, and opportunistic resources and sites. We will address the challenges associated with integrating the various types of resources, the efficiency gains and simplifications associated with using a common resource provisioning layer, and discuss the solutions found. We will finish with an outlook of future plans for how CMS is moving forward on resource provisioning for more heterogeneous architectures and services.
DOI: 10.1088/1742-6596/219/6/062055
2010
Cited 6 times
Debugging data transfers in CMS
The CMS experiment at CERN is preparing for LHC data taking in several computing preparation activities. In early 2007 a traffic load generator infrastructure for distributed data transfer tests was designed and deployed to equip the WLCG tiers which support the CMS virtual organization with a means for debugging, load-testing and commissioning data transfer routes among CMS computing centres. The LoadTest is based upon PhEDEx as a reliable, scalable data set replication system. The Debugging Data Transfers (DDT) task force was created to coordinate the debugging of the data transfer links. The task force aimed to commission most crucial transfer routes among CMS tiers by designing and enforcing a clear procedure to debug problematic links. Such procedure aimed to move a link from a debugging phase in a separate and independent environment to a production environment when a set of agreed conditions are achieved for that link. The goal was to deliver one by one working transfer routes to the CMS data operations team. The preparation, activities and experience of the DDT task force within the CMS experiment are discussed. Common technical problems and challenges encountered during the lifetime of the taskforce in debugging data transfer links in CMS are explained and summarized.
DOI: 10.1088/1742-6596/331/5/052016
2011
Cited 5 times
High Throughput WAN Data Transfer with Hadoop-based Storage
The Hadoop distributed file system (HDFS) has become more popular in recent years as a key building block of integrated grid storage solutions in the field of scientific computing. Wide Area Network (WAN) data transfer is one of the important data operations for large high energy physics experiments to manage, share and process datasets at the petabyte scale in a highly distributed grid computing environment. In this paper, we present the experience of high throughput WAN data transfer with an HDFS-based Storage Element. Two protocols, GridFTP and fast data transfer (FDT), are used to characterize the network performance of WAN data transfer.
DOI: 10.1088/1742-6596/513/6/062028
2014
Cited 5 times
Opportunistic Resource Usage in CMS
CMS is using a tiered setup of dedicated computing resources provided by sites distributed over the world and organized in the WLCG. These sites pledge resources to CMS and prepare them especially for CMS to run the experiment's applications. But there are more resources available opportunistically, both on the Grid and in local university and research clusters, which can be used for CMS applications. We will present CMS' strategy to use opportunistic resources and prepare them dynamically to run CMS applications. CMS is able to run its applications on resources that can be reached through the Grid or through EC2-compliant cloud interfaces. Even resources that can be used through ssh login nodes can be harnessed. All of these usage modes are integrated transparently into the glideinWMS submission infrastructure, which is the basis of CMS' opportunistic resource usage strategy. Technologies like Parrot to mount the software distribution via CVMFS and xrootd for access to data and simulation samples via the WAN are used and will be described. We will summarize the experience with opportunistic resource usage and give an outlook for the restart of LHC data taking in 2015.
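The WAN data access mentioned above can be illustrated with a short PyROOT sketch that opens a file through an xrootd URL; the redirector hostname, file path, and tree name are hypothetical, and this is only a generic illustration of remote reads, not the CMS workflow code.

```python
# Hedged sketch: read a (hypothetical) remote sample over the WAN via the
# xrootd protocol using PyROOT. Requires a ROOT installation with xrootd support.
import ROOT

url = "root://xrootd-redirector.example.org//store/mc/example/sample.root"
f = ROOT.TFile.Open(url)              # network read through xrootd
if f and not f.IsZombie():
    tree = f.Get("Events")            # assumes the file contains an 'Events' tree
    print("entries:", tree.GetEntries() if tree else "no Events tree found")
    f.Close()
```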
2016
Cited 5 times
EUROPEAN ORGANISATION FOR NUCLEAR RESEARCH
We have searched for excited states of charged and neutral leptons, e*, μ*, τ* and ν*, in e+e− collisions at √s = 161 GeV using the OPAL detector at LEP. No evidence for their existence was found. With the most common coupling assumptions, the topologies from excited lepton pair production include ℓℓγγ and ννWW, with the subsequent decay of the virtual W bosons. From the analysis of these topologies, 95% confidence level lower mass limits of 79.9 GeV for e*, 80.0 GeV for μ*, 79.1 GeV for τ*, 78.3 GeV for νe*, 78.9 GeV for νμ* and 76.2 GeV for ντ* are inferred. From the analysis of νWνW and ννγγ topologies with missing energy, and using alternative coupling assignments which favour charged ℓ* and photonic ν* decays, 95% confidence level lower mass limits of 77.1 GeV for each ℓ* flavour and 77.8 GeV for each ν* flavour are inferred. From the analysis of the ℓℓγ, ℓνW and single-γ final states expected from excited lepton single production, upper limits on the ratio of the coupling to the compositeness scale, f/Λ, are determined for excited lepton masses up to the kinematic limit. Submitted to Physics Letters B. The OPAL Collaboration, K. Ackerstaff et al. (full author list, including J. Letts, omitted).
DOI: 10.1088/1742-6596/898/5/052031
2017
Cited 5 times
Stability and scalability of the CMS Global Pool: Pushing HTCondor and glideinWMS to new limits
The CMS Global Pool, based on HTCondor and glideinWMS, is the main computing resource provisioning system for all CMS workflows, including analysis, Monte Carlo production, and detector data reprocessing activities. The total resources at Tier-1 and Tier-2 grid sites pledged to CMS exceed 100,000 CPU cores, while another 50,000 to 100,000 CPU cores are available opportunistically, pushing the needs of the Global Pool to higher scales each year. These resources are becoming more diverse in their accessibility and configuration over time. Furthermore, the challenge of stably running at higher and higher scales while introducing new modes of operation such as multi-core pilots, as well as the chaotic nature of physics analysis workflows, places huge strains on the submission infrastructure. This paper details some of the most important challenges to scalability and stability that the CMS Global Pool has faced since the beginning of the LHC Run II and how they were overcome.
DOI: 10.1016/j.procs.2015.05.190
2015
Cited 5 times
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoid (CMS) Experiment at LHC
High throughput computing (HTC) has aided the scientific community in the analysis of vast amounts of data and computational jobs in distributed environments. To manage these large workloads, several systems have been developed to efficiently allocate and provide access to distributed resources. Many of these systems rely on job characteristic estimates (e.g., job runtime) to characterize the workload behavior, which in practice are hard to obtain. In this work, we perform an exploratory analysis of the CMS experiment workload using the statistical recursive partitioning method and conditional inference trees to identify patterns that characterize particular behaviors of the workload. We then propose an estimation process to predict job characteristics based on the collected data. Experimental results show that our process estimates job runtime with 75% accuracy on average, and produces nearly optimal predictions for disk and memory consumption.
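As a rough sketch of the estimation idea on toy data: the paper uses recursive partitioning with conditional inference trees, whereas the example below substitutes scikit-learn's CART regression tree as a stand-in; the features, synthetic data, and thresholds are invented for illustration only.

```python
# Hedged sketch: predict job runtime from submission-time features with a
# regression tree, on a synthetic workload. Not the paper's actual model or data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 2000
events = rng.integers(1_000, 200_000, n)        # events per job (toy)
threads = rng.choice([1, 4, 8], n)               # requested cores (toy)
time_per_event = rng.normal(2.0, 0.3, n)         # seconds per event (toy)
runtime = events * time_per_event / threads      # "true" wallclock (toy)

X = np.column_stack([events, threads])
X_tr, X_te, y_tr, y_te = train_test_split(X, runtime, random_state=0)

model = DecisionTreeRegressor(max_depth=6, min_samples_leaf=20).fit(X_tr, y_tr)
rel_err = np.abs(model.predict(X_te) - y_te) / y_te
print(f"within 25% of true runtime: {(rel_err < 0.25).mean():.0%}")
```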
DOI: 10.1088/1742-6596/219/7/072007
2010
Cited 5 times
CMS analysis operations
During normal data taking CMS expects to support potentially as many as 2000 analysis users. Since the beginning of 2008 there have been more than 800 individuals who submitted a remote analysis job to the CMS computing infrastructure. The bulk of these users will be supported at the over 40 CMS Tier-2 centres. Supporting a globally distributed community of users on a globally distributed set of computing clusters is a task that requires reconsidering the normal methods of user support for Analysis Operations. In 2008 CMS formed an Analysis Support Task Force in preparation for large-scale physics analysis activities. The charge of the task force was to evaluate the available support tools, the user support techniques, and the direct feedback of users with the goal of improving the success rate and user experience when utilizing the distributed computing environment. The task force determined the tools needed to assess and reduce the number of non-zero exit code applications submitted through the grid interfaces and worked with the CMS experiment dashboard developers to obtain the necessary information to quickly and proactively identify issues with user jobs and data sets hosted at various sites. Results of the analysis group surveys were compiled. Reference platforms for testing and debugging problems were established in various geographic regions. The task force also assessed the resources needed to make the transition to a permanent Analysis Operations task. In this presentation the results of the task force will be discussed as well as the CMS Analysis Operations plans for the start of data taking.
DOI: 10.1088/1742-6596/219/6/062047
2010
Cited 4 times
The commissioning of CMS sites: Improving the site reliability
The computing system of the CMS experiment works using distributed resources from more than 60 computing centres worldwide. These centres, located in Europe, America and Asia are interconnected by the Worldwide LHC Computing Grid. The operation of the system requires a stable and reliable behaviour of the underlying infrastructure. CMS has established a procedure to extensively test all relevant aspects of a Grid site, such as the ability to efficiently use their network to transfer data, the functionality of all the site services relevant for CMS and the capability to sustain the various CMS computing workflows at the required scale. This contribution describes in detail the procedure to rate CMS sites depending on their performance, including the complete automation of the program, the description of monitoring tools, and its impact in improving the overall reliability of the Grid from the point of view of the CMS computing system.
DOI: 10.1088/1742-6596/664/6/062030
2015
Cited 3 times
Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before Run 2
The CMS experiment at the LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning system. So far we have been running several independent resource pools, but we are working on unifying them all to reduce the operational load and more effectively share resources between various activities in CMS. The major challenge of this unification activity is scale. The combined pool size is expected to reach 200K job slots, which is significantly bigger than any other multi-user HTCondor based system currently in production. To get there we have studied scaling limitations in our existing pools, the biggest of which tops out at about 70K slots, providing valuable feedback to the development communities, who have responded by delivering improvements which have helped us reach higher and higher scales with more stability. We have also worked on improving the organization and support model for this critical service during Run 2 of the LHC. This contribution will present the results of the scale testing and experiences from the first months of running the Global Pool.
2005
Cited 6 times
Distributed computing grid experiences in CMS DC04
DOI: 10.1109/nssmic.2008.4774771
2008
Cited 4 times
The commissioning of CMS computing centres in the worldwide LHC computing Grid
The computing system of the CMS experiment uses distributed resources from more than 60 computing centres worldwide. Located in Europe, America and Asia, these centres are interconnected by the Worldwide LHC Computing Grid. The operation of the system requires a stable and reliable behavior of the underlying infrastructure. CMS has established a procedure to extensively test all relevant aspects of a Grid site, such as the ability to efficiently use their network to transfer data, the functionality of the site services relevant for CMS and the capability to sustain the various CMS computing workflows (Monte Carlo simulation, event reprocessing and skimming, data analysis) at the required scale. This contribution describes in detail the procedure to rate CMS sites depending on their performance, including the complete automation of the program, the description of monitoring tools, and its impact in improving the overall reliability of the Grid from the point of view of the CMS computing system.
DOI: 10.1109/nssmic.2008.4775085
2008
Cited 4 times
The CMS data transfer test environment in preparation for LHC data taking
The CMS experiment is preparing for LHC data taking in several computing preparation activities. In distributed data transfer tests, in early 2007 a traffic load generator infrastructure was designed and deployed, to equip the WLCG Tiers which support the CMS Virtual Organization with a means for debugging, load-testing and commissioning data transfer routes among CMS Computing Centres. The LoadTest is based upon PhEDEx as a reliable, scalable dataset replication system. In addition, a Debugging Data Transfers (DDT) Task Force was created to coordinate the debugging of data transfer links in the preparation period and during the Computing Software and Analysis challenge in 2007 (CSA07). The task force aimed to commission most crucial transfer routes among CMS tiers by designing and enforcing a clear procedure to debug problematic links. Such procedure aimed to move a link from a debugging phase in a separate and independent environment to a production environment when a set of agreed conditions are achieved for that link. The goal was to deliver one by one working transfer routes to Data Operations. The experiences with the overall test transfers infrastructure within computing challenges - as in the WLCG Common-VO Computing Readiness Challenge (CCRC’08) - as well as in daily testing and debugging activities are reviewed and discussed, and plans for the future are presented.
DOI: 10.1088/1742-6596/513/3/032041
2014
Cited 3 times
Evolution of the pilot infrastructure of CMS: towards a single glideinWMS pool
CMS production and analysis job submission is based largely on glideinWMS and pilot submissions. The transition from multiple different submission solutions, like the gLite WMS and HTCondor-based implementations, was carried out over years and is now coming to a conclusion. The historically separate glideinWMS pools for different types of production jobs and analysis jobs are being unified into a single global pool. This enables CMS to benefit from global prioritization and scheduling possibilities. It also presents the sites with only one kind of pilot and eliminates the need to make scheduling decisions at the CE level. This paper provides an analysis of the benefits of a unified resource pool, as well as a description of the resulting global policy. It will explain the technical challenges moving forward and present solutions to some of them.
DOI: 10.1145/2484762.2484834
2013
Cited 3 times
Using Gordon to accelerate LHC science
The discovery of the Higgs boson by the Large Hadron Collider (LHC) has garnered international attention. In addition to this singular result, the LHC may also uncover other fundamental particles, including dark matter. Much of this research is being done on data from one of the LHC experiments, the Compact Muon Solenoid (CMS). The CMS experiment was able to capture data at higher sampling frequencies than planned during the 2012 LHC operational period. The resulting data had been parked, waiting to be processed on CMS computers. While CMS has significant compute resources, by partnering with SDSC to incorporate Gordon into the CMS workflow, analysis of the parked data was completed months ahead of schedule. This allows scientists to review the results more quickly, and could guide future plans for the LHC.
DOI: 10.1051/epjconf/201921403006
2019
Cited 3 times
Improving efficiency of analysis jobs in CMS
Hundreds of physicists analyze data collected by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider using the CMS Remote Analysis Builder and the CMS Global Pool to exploit the resources of the Worldwide LHC Computing Grid. Efficient use of such an extensive and expensive resource is crucial. At the same time, the CMS collaboration is committed to minimizing time to insight for every scientist, by pushing for as few access restrictions to the full data sample as possible and by supporting the free choice of applications to run on the computing resources. Supporting such a variety of workflows while preserving efficient resource usage poses special challenges. In this paper we report on three complementary approaches adopted in CMS to improve the scheduling efficiency of user analysis jobs: automatic job splitting, automated run-time estimates and automated site selection for jobs.
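A minimal sketch of the automatic-splitting idea: given a measured time per event, choose how many events each job should process to hit a target wall time. The target, overhead, and numbers below are illustrative assumptions, not the CRAB implementation.

```python
# Hedged sketch: split a task into jobs sized by an estimated time per event.
import math

def events_per_job(time_per_event_s: float, target_walltime_s: float = 8 * 3600,
                   overhead_s: float = 600) -> int:
    """Events per job so that overhead + events * t/event stays near the target."""
    usable = max(target_walltime_s - overhead_s, time_per_event_s)
    return max(1, math.floor(usable / time_per_event_s))

def split(total_events: int, time_per_event_s: float) -> list[int]:
    """Divide total_events into near-equal chunks of the chosen size."""
    chunk = events_per_job(time_per_event_s)
    n_jobs = math.ceil(total_events / chunk)
    base, extra = divmod(total_events, n_jobs)
    return [base + (1 if i < extra else 0) for i in range(n_jobs)]

print(split(total_events=1_000_000, time_per_event_s=1.3))  # ~47 jobs of ~21k events
```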
DOI: 10.1007/s100520100722
2001
Cited 6 times
Direct determination of the CKM matrix from decays of W Bosons and top quarks at high energy $\mathbf{e}^+\mathbf{e}^-$ colliders
At proposed high energy linear $\mathrm{e}^+\mathrm{e}^-$ colliders a large number of W bosons and top quarks will be produced. We evaluate the potential precision to which the decay branching ratios into the various quark species can be measured, implying also the determination of the respective CKM matrix elements. Crucial is the identification of the individual quark flavours, which can be achieved independent of QCD models. For transitions involving up quarks the accuracy is of the same order of magnitude as has been reached in hadron decays. We estimate that for charm transitions a precision can be reached that is superior to current and projected traditional kinds of measurements. The $\mathrm{t}\to \mathrm{b}$ determination will be significantly improved, and for the first time a direct measurement of the $\mathrm{t}\to \mathrm{s}$ transition can be made. In all cases such a determination is complementary to the traditional way of extracting the CKM matrix elements.
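The reasoning rests on the tree-level relation between flavour-tagged hadronic W branching fractions and CKM elements; a schematic sketch, neglecting QCD and quark-mass corrections:

\[
\Gamma(W^{+} \to u_i \bar{d}_j) \;\simeq\; N_c\,\frac{G_F M_W^{3}}{6\sqrt{2}\,\pi}\,\lvert V_{ij}\rvert^{2}
\quad\Longrightarrow\quad
\frac{\mathrm{BR}(W \to c\bar{s})}{\mathrm{BR}(W \to c\bar{d})} \;=\; \frac{\lvert V_{cs}\rvert^{2}}{\lvert V_{cd}\rvert^{2}},
\]

so a measurement of flavour-tagged branching fractions translates directly into the corresponding $\lvert V_{ij}\rvert^{2}$.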
DOI: 10.1088/1742-6596/331/7/072023
2011
Large scale commissioning and operational experience with tier-2 to tier-2 data transfer links in CMS
Tier-2 to Tier-2 data transfers have been identified as a necessary extension of the CMS computing model. The Debugging Data Transfers (DDT) Task Force in CMS was charged with commissioning Tier-2 to Tier-2 PhEDEx transfer links beginning in late 2009, originally to serve the needs of physics analysis groups for the transfer of their results between the storage elements of the Tier-2 sites associated with the groups. PhEDEx is the data transfer middleware of the CMS experiment. For analysis jobs using CRAB, the CMS Remote Analysis Builder, the challenges of remote stage out of job output at the end of the analysis jobs led to the introduction of a local fallback stage out, and will eventually require the asynchronous transfer of user data over essentially all of the Tier-2 to Tier-2 network using the same PhEDEx infrastructure. In addition, direct file sharing of physics and Monte Carlo simulated data between Tier-2 sites can relieve the operational load of the Tier-1 sites in the original CMS Computing Model, and already represents an important component of CMS PhEDEx data transfer volume. The experience, challenges and methods used to debug and commission the thousands of data transfers links between CMS Tier-2 sites world-wide are explained and summarized. The resulting operational experience with Tier-2 to Tier-2 transfers is also presented.
DOI: 10.1088/1742-6596/513/3/032006
2014
Using ssh as portal – The CMS CRAB over glideinWMS experience
User analysis in the CMS experiment is performed in a distributed way using both Grid and dedicated resources. In order to insulate the users from the details of the computing fabric, CMS relies on the CRAB (CMS Remote Analysis Builder) package as an abstraction layer. CMS has recently switched from a client-server version of CRAB to a purely client-based solution, with ssh used to interface with the HTCondor-based glideinWMS batch system. This switch has resulted in a significant improvement in user satisfaction, as well as a significant simplification of the CRAB code base and of the operational support. This paper presents the architecture of the ssh-based CRAB package and the rationale behind it, as well as the operational experience of running both the client-server and the ssh-based versions in parallel for several months.
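To make the "ssh as portal" pattern concrete, here is a minimal, generic sketch using paramiko to run a command on a remote HTCondor submit node; the hostname, username, and the condor_q invocation are illustrative, and this is not the CRAB client code.

```python
# Hedged sketch: execute a command on a remote submit node over ssh,
# the general pattern behind using ssh as the portal to a batch system.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("submit.example.org", username="cmsuser")  # assumes key-based auth

stdin, stdout, stderr = client.exec_command("condor_q -totals")
print(stdout.read().decode())
client.close()
```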
DOI: 10.48550/arxiv.2312.00772
2023
The U.S. CMS HL-LHC R&D Strategic Plan
The HL-LHC run is anticipated to start at the end of this decade and will pose a significant challenge for the scale of the HEP software and computing infrastructure. The mission of the U.S. CMS Software & Computing Operations Program is to develop and operate the software and computing resources necessary to process CMS data expeditiously and to enable U.S. physicists to fully participate in the physics of CMS. We have developed a strategic plan to prioritize R&D efforts to reach this goal for the HL-LHC. This plan includes four grand challenges: modernizing physics software and improving algorithms, building infrastructure for exabyte-scale datasets, transforming the scientific data analysis process and transitioning from R&D to operations. We are involved in a variety of R&D projects that fall within these grand challenges. In this talk, we will introduce our four grand challenges and outline the R&D program of the U.S. CMS Software & Computing Operations Program.
DOI: 10.1088/1742-6596/898/5/052030
2017
CMS readiness for multi-core workload scheduling
In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of Run 2 events requires parallelization of the code to reduce the memory-per-core footprint that constrains serially executing programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single-core and multi-core jobs simultaneously. This provides a solution to the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools with which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016, is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.
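A toy model of the multi-slot partitionable pilot idea, assuming a pilot that owns a fixed number of cores and carves out slots first-fit for a mixed stream of single-core and multi-core payloads; this illustrates the concept only and is not GlideinWMS or HTCondor code.

```python
# Hedged toy model: a pilot with N cores dynamically partitions itself
# among payloads of various core counts, first-fit.
from dataclasses import dataclass, field

@dataclass
class PartitionablePilot:
    total_cores: int
    running: list = field(default_factory=list)  # core counts of running payloads

    @property
    def free_cores(self) -> int:
        return self.total_cores - sum(self.running)

    def try_start(self, requested_cores: int) -> bool:
        """Start a payload if enough unclaimed cores remain in this pilot."""
        if requested_cores <= self.free_cores:
            self.running.append(requested_cores)
            return True
        return False

pilot = PartitionablePilot(total_cores=8)
queue = [4, 1, 1, 4, 1, 1, 1]            # mixed multi-core and single-core jobs
started = [req for req in queue if pilot.try_start(req)]
print("started:", started, "idle cores:", pilot.free_cores)
```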
DOI: 10.2172/1436702
2018
HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation
At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.
DOI: 10.1051/epjconf/201921403056
2019
Improving the Scheduling Efficiency of a Global Multi-Core HTCondor Pool in CMS
Scheduling multi-core workflows in a global HTCondor pool is a multi-dimensional problem whose solution depends on the requirements of the job payloads, the characteristics of available resources, and the boundary conditions such as fair share and prioritization imposed on the job matching to resources. Within the context of a dedicated task force, CMS has increased significantly the scheduling efficiency of workflows in reusable multi-core pilots by various improvements to the limitations of the GlideinWMS pilots, accuracy of resource requests, efficiency and speed of the HTCondor infrastructure, and job matching algorithms.
DOI: 10.1002/ajum.12319
2022
The correlation between different ultrasound planes and computed tomography measures of abdominal aortic aneurysms
Ultrasound measurements of the aorta are typically taken in the axial plane, with the transducer perpendicular to the aorta, and diameter measurements are obtained by placing the callipers from the anterior to the posterior wall and transversely from the right to the left side of the aorta. While the 'conventional' anteroposterior walls in both sagittal and transverse planes may be suitable for aneurysms with less complicated geometry, there is controversy regarding the suitability of this approach for complicated, particularly tortuous aneurysms, as they may offer a more challenging situation. Previous work undertaken within our research group found that when training inexperienced users of ultrasound, they demonstrated more optimal calliper placement on the abdominal aorta when approached from a decubitus window to obtain a coronal image compared to the traditional ultrasound approach. The aim was to observe the level of agreement in real-world reporting between computed tomography (CT) and ultrasound measurements in three standard planes: transverse AP, sagittal AP and coronal (left to right) infra-renal abdominal aortic aneurysm (AAA) diameter. This is a retrospective review of the Otago Vascular Diagnostics database for AAA, where ultrasound and CT diameter data, available within 90 days of each other, were compared. In addition to patient demographics, the infrarenal aorta ultrasound diameter measurements in transverse AP and sagittal AP, along with a coronal decubitus image of the aorta, were collected. No transverse measurement was performed from the left to the right of the aorta. Three hundred twenty-five participants (238 males, mean age 76.4 ± 7.5) were included. Mean ultrasound outer-to-outer wall transverse AP and sagittal AP diameters were 48.7 ± 10.5 mm and 48.9 ± 9.9 mm, respectively. The coronal diameter measurement of the aorta from left to right was 53.9 ± 12.8 mm in the left decubitus window. The mean ultrasound maximum was 54.3 ± 12.6 mm. The mean CT diameter measurement was 55.6 ± 12.7 mm. The correlation between the CT maximum and ultrasound maximum was r2 = 0.90, CT with the coronal measurement r2 = 0.90, CT with AP transverse r2 = 0.80, and CT with the AP sagittal measurement r2 = 0.77. The decubitus ultrasound window of the abdominal aorta, with measurement in the coronal plane, is highly correlated and in agreement with CT scanning. This window may offer an alternative approach to measuring the infrarenal abdominal aortic aneurysm and should be considered when performing surveillance of all infra-renal AAA.
DOI: 10.1088/1742-6596/664/6/062046
2015
Evolution of CMS workload management towards multicore job support
The successful exploitation of multicore processor architectures is a key element of the LHC distributed computing system in the coming era of LHC Run 2. High-pileup, complex-collision events represent a challenge for traditional sequential programming in terms of memory and processing time budget. The CMS data production and processing framework is introducing the parallel execution of the reconstruction and simulation algorithms to overcome these limitations. CMS plans to execute multicore jobs while still supporting single-core processing for other tasks difficult to parallelize, such as user analysis. The CMS strategy for job management thus aims at integrating single-core and multicore job scheduling across the Grid. This is accomplished by employing multicore pilots with internal dynamic partitioning of the allocated resources, capable of running payloads of various core counts simultaneously. An extensive test programme has been conducted to enable multicore scheduling with the various local batch systems available at CMS sites, with the focus on the Tier-0 and Tier-1s, responsible during 2015 for the prompt data reconstruction. Scale tests have been run to analyse the performance of this scheduling strategy and ensure an efficient use of the distributed resources. This paper presents the evolution of the CMS job management and resource provisioning systems in order to support this hybrid scheduling model, as well as its deployment and performance tests, which will enable CMS to transition to a multicore production model for the second LHC run.
DOI: 10.1088/1742-6596/219/6/062015
2010
Bringing the CMS distributed computing system into scalable operations
Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Program with the mandate of validating the infrastructure for organized processing and user analysis including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.
DOI: 10.1088/1742-6596/898/5/052037
2017
Connecting Restricted, High-Availability, or Low-Latency Resources to a Seamless Global Pool for CMS
The connection of diverse and sometimes non-Grid enabled resource types to the CMS Global Pool, which is based on HTCondor and glideinWMS, has been a major goal of CMS. These resources range in type from a high-availability, low latency facility at CERN for urgent calibration studies, called the CAF, to a local user facility at the Fermilab LPC, allocation-based computing resources at NERSC and SDSC, opportunistic resources provided through the Open Science Grid, commercial clouds, and others, as well as access to opportunistic cycles on the CMS High Level Trigger farm. In addition, we have provided the capability to give priority to local users of beyond WLCG pledged resources at CMS sites. Many of the solutions employed to bring these diverse resource types into the Global Pool have common elements, while some are very specific to a particular project. This paper details some of the strategies and solutions used to access these resources through the Global Pool in a seamless manner.
DOI: 10.1088/1742-6596/898/8/082032
2017
CMS Connect
The CMS experiment collects and analyzes large amounts of data coming from high energy particle collisions produced by the Large Hadron Collider (LHC) at CERN. This involves a huge amount of real and simulated data processing that needs to be handled in batch-oriented platforms. The CMS Global Pool of computing resources provides more than 100K dedicated CPU cores and another 50K to 100K CPU cores from opportunistic resources for these kinds of tasks, and even though production and event processing analysis workflows are already managed by existing tools, there is still a lack of support for submitting final-stage condor-like analysis jobs, familiar to Tier-3 or local computing facility users, into these distributed resources in an integrated (with other CMS services) and friendly way. CMS Connect is a set of computing tools and services designed to augment existing services in the CMS physics community, focusing on this kind of condor analysis job. It is based on the CI-Connect platform developed by the Open Science Grid and uses the CMS glideinWMS infrastructure to transparently plug CMS global grid resources into a virtual pool accessed via a single submission machine. This paper describes the specific developments and deployment of CMS Connect beyond the CI-Connect platform in order to integrate the service with CMS-specific needs, including site-specific submission, accounting of jobs and automated reporting to standard CMS monitoring resources in a way that is effortless for its users.
DOI: 10.22323/1.070.0043
2009
The commissioning of CMS computing centres in the WLCG Grid
DOI: 10.1051/epjconf/201921403002
2019
Exploring GlideinWMS and HTCondor scalability frontiers for an expanding CMS Global Pool
The CMS Submission Infrastructure Global Pool, built on GlideinWMS and HTCondor, is a worldwide distributed dynamic pool responsible for the allocation of resources for all CMS computing workloads. Matching the continuously increasing demand for computing resources by CMS requires the anticipated assessment of its scalability limitations. In addition, the Global Pool must be able to expand in a more heterogeneous environment, in terms of resource provisioning (combining Grid, HPC and Cloud) and workload submission. A dedicated testbed has been set up to simulate such conditions with the purpose of finding potential bottlenecks in the software or its configuration. This report provides a thorough description of the various scalability dimensions in size and complexity that are being explored for the future Global Pool, along with the analysis and solutions to the limitations proposed with the support of the GlideinWMS and HTCondor developer teams.
DOI: 10.1051/epjconf/202024503016
2020
Evolution of the CMS Global Submission Infrastructure for the HL-LHC Era
Efforts in distributed computing of the CMS experiment at the LHC at CERN are now focusing on the functionality required to fulfill the projected needs for the HL-LHC era. Cloud and HPC resources are expected to be dominant relative to resources provided by traditional Grid sites, being also much more diverse and heterogeneous. Handling their special capabilities or limitations and maintaining global flexibility and efficiency, while also operating at scales much higher than the current capacity, are the major challenges being addressed by the CMS Submission Infrastructure team. These proceedings discuss the risks to the stability and scalability of the CMS HTCondor infrastructure extrapolated to such a scenario, thought to be derived mostly from its growing complexity, with multiple Negotiators and schedulers flocking work to multiple federated pools. New mechanisms for enhanced customization and control over resource allocation and usage, mandatory in this future scenario, are also described.
DOI: 10.1007/s002880050311
1997
Cited 3 times
A new method to determine the electroweak couplings of individual light flavours at LEP
A method is presented for determining the yields and properties of individual light quark flavours in $Z^0$ decays that is essentially free of detailed assumptions about hadronisation. The method uses an equation system with the numbers of events which are single and double tagged by high energy hadrons as inputs. In addition, SU(2) isospin symmetry and the flavour independence of QCD are used to derive general relations between hadron production from the various primary light quarks. Assuming the branching fractions $R_q$ of the $Z^0$ into down and strange quarks to be the same, five million hadronic $Z^0$ decays may allow precisions of $\delta(R_{d=s})/R_{d=s} \sim 0.05$ and $\delta A_{FB}(d=s) \sim \delta A_{FB}(u) \sim 0.015$ for the corresponding asymmetries. The method can be extended to include somewhat more model-dependent symmetries of hadron production, which then allows the electroweak observables for each of the individual light quarks to be determined.
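Schematically, and neglecting the hemisphere correlations and backgrounds that the paper treats in detail, the single- and double-tag rates for a high-energy hadron tag $h$ can be written in terms of the flavour fractions $R_q$ and per-hemisphere tag efficiencies $\eta_q^h$ (a simplified sketch of the standard double-tag formalism):

\[
\frac{N^{h}_{\mathrm{single}}}{2N_{\mathrm{had}}} \;\simeq\; \sum_{q} R_q\,\eta^{h}_{q},
\qquad
\frac{N^{h}_{\mathrm{double}}}{N_{\mathrm{had}}} \;\simeq\; \sum_{q} R_q\,\bigl(\eta^{h}_{q}\bigr)^{2},
\]

with SU(2) isospin symmetry and the flavour independence of QCD supplying the additional relations among the $\eta_q^h$ needed to close the equation system.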
2003
The CMS Integration Grid Testbed
The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2 hardware at the following sites: the California Institute of Technology, Fermi National Accelerator Laboratory, the University of California at San Diego, and the University of Florida at Gainesville. The IGT runs jobs using the Globus Toolkit with a DAGMan and Condor-G front end. The virtual organization (VO) is managed using VO management scripts from the European Data Grid (EDG). Grid-wide monitoring is accomplished using local tools such as Ganglia interfaced into the Globus Metadata Directory Service (MDS) and the agent-based MonaLisa. Domain-specific software is packaged and installed using the Distribution After Release (DAR) tool of CMS, while middleware under the auspices of the Virtual Data Toolkit (VDT) is distributed using Pacman. During a continuous two-month span in the fall of 2002, over 1 million official CMS GEANT-based Monte Carlo events were generated and returned to CERN for analysis while being demonstrated at SC2002. In this paper, we describe the process that led to one of the world's first continuously available, functioning grids.
DOI: 10.1088/1742-6596/513/3/032086
2014
CMS experience of running glideinWMS in High Availability mode
The CMS experiment at the Large Hadron Collider is relying on the HTCondor-based glideinWMS batch system to handle most of its distributed computing needs. In order to minimize the risk of disruptions due to software and hardware problems, and also to simplify the maintenance procedures, CMS has set up its glideinWMS instance to use most of the attainable High Availability (HA) features. The setup involves running services distributed over multiple nodes, which in turn are located in several physical locations, including Geneva (Switzerland), Chicago (Illinois, USA) and San Diego (California, USA). This paper describes the setup used by CMS, the HA limits of this setup, as well as a description of the actual operational experience spanning many months.
DOI: 10.1088/1742-6596/396/3/032102
2012
Controlled overflowing of data-intensive jobs from oversubscribed sites
The CMS analysis computing model has always relied on jobs running near the data, with data allocation between CMS compute centers organized at the management level, based on expected needs of the CMS community. While this model provided high CPU utilization during job run times, there were times when a large fraction of CPUs at certain sites were sitting idle due to lack of demand, all while terabytes of data were never accessed. To improve the utilization of both CPU and disk, CMS is moving toward controlled overflowing of jobs from sites that have the data but are oversubscribed to others with spare CPU and network capacity, with those jobs accessing the data through real-time xrootd streaming over the WAN. The major limiting factor for remote data access is the ability of the source storage system to serve such data, so the number of jobs accessing it must be carefully controlled. The CMS approach to this is to implement the overflowing by means of glideinWMS, a Condor-based pilot system, providing the WMS with the known storage limits and letting it schedule jobs within those limits. This paper presents the detailed architecture of the overflow-enabled glideinWMS system, together with operational experience from the past 6 months.
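A highly simplified sketch of the overflow decision described above: allow a job to be matched away from the site hosting its data only when that site is oversubscribed and the source storage is below its remote-read limit. The inputs and thresholds are illustrative assumptions, not the actual glideinWMS policy expressions.

```python
# Hedged sketch of an overflow policy: oversubscribed host site plus
# headroom on the source storage enables WAN (xrootd) overflow.
def allow_overflow(idle_at_host: int, running_at_host: int,
                   active_remote_reads: int, max_remote_reads: int,
                   oversubscription: float = 2.0) -> bool:
    """True if the job may run away from the site hosting its data."""
    oversubscribed = idle_at_host > oversubscription * max(running_at_host, 1)
    storage_has_headroom = active_remote_reads < max_remote_reads
    return oversubscribed and storage_has_headroom

print(allow_overflow(idle_at_host=5000, running_at_host=2000,
                     active_remote_reads=120, max_remote_reads=400))  # True
```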
DOI: 10.1088/1742-6596/331/7/072051
2011
Measuring and understanding computer resource utilization in CMS
Significant funds are expended in order to make CMS data analysis possible across Tier-2 and Tier-3 resources worldwide. Here we review how CMS monitors operational success in using those resources, identifies and understands problems, monitors trends, provides feedback to site operators and software developers, and generally accumulates quantitative data on the operational aspects of CMS data analysis. This includes data transfers, data distribution, use of data and software releases for analysis, failure analysis and more.
DOI: 10.1088/1742-6596/898/9/092039
2017
Effective HTCondor-based monitoring system for CMS
The CMS experiment at the LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning systems, respectively. Given the scale of the CMS global queue, operators found it increasingly difficult to monitor the pool, find problems, and fix them: they had to rely on several different web pages, each with a different level of information, and sift through log files in order to monitor the pool completely. A suitable monitoring system was therefore one of the crucial items to have in place before the beginning of LHC Run 2, in order to ensure early detection of issues and to give a good overview of the whole pool. Our new monitoring page uses HTCondor ClassAd information to provide a complete picture of the whole submission infrastructure in CMS. The monitoring page includes useful information from HTCondor schedulers, central managers, the glideinWMS frontend, and factories. It also incorporates information about users and tasks, making it easy for operators to provide support and debug issues.
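As a hedged sketch of the ClassAd-driven approach (not the actual CMS monitoring code), the snippet below polls a pool collector for scheduler ads and summarizes running, idle, and held job counts per schedd. The pool address is a placeholder, not the real CMS Global Pool endpoint.

```python
# Sketch of ClassAd-based pool monitoring: query the collector for schedd
# ads and print per-schedd job counts. The pool address is a placeholder.
import htcondor

coll = htcondor.Collector("pool.example.org")  # placeholder central manager

# Positional args (ad type, constraint, projection) for compatibility
# across binding versions.
schedd_ads = coll.query(
    htcondor.AdTypes.Schedd,
    "true",
    ["Name", "TotalRunningJobs", "TotalIdleJobs", "TotalHeldJobs"],
)

for ad in sorted(schedd_ads, key=lambda a: a.get("TotalRunningJobs", 0), reverse=True):
    print(
        f"{ad.get('Name', '?'):40s} "
        f"running={ad.get('TotalRunningJobs', 0):6d} "
        f"idle={ad.get('TotalIdleJobs', 0):6d} "
        f"held={ad.get('TotalHeldJobs', 0):6d}"
    )
```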
DOI: 10.1142/9789812819093_0076
2008
CMS Data and Workflow Management System
DOI: 10.1111/1754-9485.13496
2022
Catheter‐directed thrombolysis for lower limb ischaemia: A retrospective study of treatment outcomes
Lower limb ischaemia secondary to occlusion of a lower limb artery is a limb-threatening condition that can be effectively treated by catheter-directed thrombolysis (CDT). The purpose of this study was to examine treatment outcomes of CDT both at the time of treatment and ongoing patency up to 12 months following treatment. The secondary aim of the study was to investigate the influence of age of occlusion and treatment duration on success and complication rates. A retrospective observational study was performed at a single institution over a 10-year period from 2010 to 2019. Data for patient demographics, vessel occlusion factors and treatment information were obtained and analysed. Patency data were investigated using Kaplan-Meier analyses. A total of 218 limbs in 159 patients were treated during the study period. The aetiology of vessel occlusion was in situ thrombosis or occluded bypass graft in 74.5%. Technical success was achieved in 55.5% with CDT alone and 84.4% by using CDT in combination with adjunctive endovascular procedures (angioplasty or stenting). The overall probability of patency was 0.65 at 3 months and 0.44 at 12 months. The overall rate of major amputation within 30 days of thrombolysis was 8.2%. Thirty-day mortality was 6.3% and was secondary to intracranial haemorrhage in three patients. Technical success of CDT was found to be significantly higher when combined with adjunctive endovascular procedures at the time of CDT. Despite an initial moderate technical success, the probability of patency at 12 months was only 44%. The likelihood of bleeding complications and technical and long-term success remain key considerations when selecting patients for CDT.
DOI: 10.1051/epjconf/201921403004
2019
Producing Madgraph5_aMC@NLO gridpacks and using TensorFlow GPU resources in the CMS HTCondor Global Pool
The CMS experiment has an HTCondor Global Pool, composed of more than 200K CPU cores available for Monte Carlo production and the analysis of data. The submission of user jobs to this pool is handled either by CRAB, the standard workflow management tool used by CMS users to submit analysis jobs requiring event processing of large amounts of data, or by CMS Connect, a service focused on final-stage Condor-like analysis jobs and applications that already have a workflow job manager in place. The latter scenario can bring cases in which workflows need further adjustments in order to work efficiently in a globally distributed pool of resources. For instance, the generation of matrix elements for high energy physics processes via Madgraph5_aMC@NLO and the usage of tools not (yet) fully supported by the CMS software, such as TensorFlow with GPU support, are tasks with particular requirements. A special adaptation, either at the pool factory level (advertising GPU resources) or at the execute level (e.g., handling special parameters that describe certain needs of the remote execute nodes during submission), is needed in order to work adequately in the CMS Global Pool. This contribution describes the challenges and the efforts made to adapt such workflows so they can properly profit from the Global Pool via CMS Connect.
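As an illustration of the kind of submission-side adjustment mentioned above, the hedged sketch below requests a GPU slot and a minimum CUDA capability through standard HTCondor submit attributes. The executable, arguments, and the capability attribute name are assumptions for illustration, not the actual CMS Connect configuration.

```python
# Sketch: requesting GPU resources for a TensorFlow-style job through
# standard HTCondor submit attributes. Executable and arguments are
# placeholders; the CUDACapability attribute name may vary with the
# HTCondor version and is an assumption here.
import htcondor

submit = htcondor.Submit({
    "executable": "train_model.sh",      # placeholder wrapper around TensorFlow
    "arguments": "--epochs 10",
    "request_cpus": "1",
    "request_memory": "4GB",
    "request_gpus": "1",                 # ask the pool for a GPU slot
    "requirements": "CUDACapability >= 6.0",  # assumed machine-ad attribute
    "output": "train.out",
    "error": "train.err",
    "log": "train.log",
})

schedd = htcondor.Schedd()
print("Submitted cluster", schedd.submit(submit).cluster())
```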
DOI: 10.1051/epjconf/202024507005
2020
A Lightweight Door into Non-Grid Sites
The Open Science Grid (OSG) provides a common service for resource providers and scientific institutions, and supports sciences such as High Energy Physics, Structural Biology, and other community sciences. As scientific frontiers expand, so does the need for resources to analyze new data. For example, High Energy Physics experiments such as the LHC experiments foresee an exponential growth in the amount of data collected, which comes with a corresponding growth in the need for computing resources. Giving resource providers an easy way to share their resources is paramount to ensuring the growth of the resources available to scientists. In this context, the OSG Hosted CE initiative provides site administrators with a way to reduce the effort needed to install and maintain a Compute Element (CE), and represents a solution for sites that do not have the effort and expertise to run their own Grid middleware. An HTCondor Compute Element is installed on a remote VM at UChicago for each site that joins the Hosted CE initiative. The hardware/software stack is maintained by OSG Operations staff in a homogeneous and automated way, providing a reduction in the overall operational effort needed to maintain the CEs: a single organization does it in a uniform way, instead of each resource provider doing it in its own way. Currently, more than 20 institutions have joined the Hosted CE initiative. This contribution discusses the technical details behind a Hosted CE installation, highlighting key strengths and common pitfalls, and outlining future plans to further reduce the operational effort.
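For flavor, a minimal liveness probe against an HTCondor-CE endpoint of the kind a Hosted CE exposes is sketched below. The hostname is a placeholder, and the conventional HTCondor-CE port 9619 is an assumption about the deployment; real operational monitoring is of course more thorough.

```python
# Sketch: check that an HTCondor-CE answers a collector query and list the
# schedd ad(s) it advertises. Hostname is a placeholder.
import htcondor

ce = htcondor.Collector("hosted-ce.example.org:9619")  # placeholder Hosted CE

try:
    ads = ce.query(htcondor.AdTypes.Schedd, "true", ["Name"])
    for ad in ads:
        print("CE schedd reachable:", ad.get("Name"))
except Exception as err:  # binding versions raise different exception types
    print("CE not reachable:", err)
```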
DOI: 10.1051/epjconf/202024503023
2020
Exploiting CRIC to streamline the configuration management of GlideinWMS factories for CMS support
GlideinWMS is a workload management and provisioning system that allows sharing computing resources distributed over independent sites. Based on the requests made by GlideinWMS frontends, a dynamically sized pool of resources is created by GlideinWMS pilot factories via pilot job submission to resource sites’ CEs. More than 400 CEs are currently serving more than ten virtual organizations through GlideinWMS, with CMS being the biggest user with 230 CEs. The complex configurations of the parameters defining resource requests, as submitted to those CEs, have been historically managed by manually editing a set of different XML files. New possibilities arise with CMS adopting the CRIC, an information system that collects, aggregates, stores, and exposes, among other things, computing resource data coming from various data providers. The paper will describe the challenges faced when CMS started to use CRIC to automatically generate the GlideinWMS factory configurations. The architecture of the prototype, and the ancillary tools developed to ease this transition, will be discussed. Finally, future plans and milestones will be outlined.
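As a hedged sketch of the general idea (not the actual prototype described in the paper), the snippet below pulls site records from a hypothetical CRIC JSON endpoint and emits minimal factory-style XML entry elements. The URL, JSON field names, and XML attribute names are all assumptions; the real GlideinWMS factory schema is considerably richer.

```python
# Sketch: generate minimal factory-style XML entries from CRIC-like JSON.
# Endpoint URL, field names, and XML attributes are invented for illustration.
import json
import urllib.request
import xml.etree.ElementTree as ET

CRIC_URL = "https://cric.example.org/api/cms/computeunits/query/?json"  # hypothetical

with urllib.request.urlopen(CRIC_URL) as resp:
    sites = json.load(resp)  # assumed: list of {"name": ..., "ce_endpoint": ...}

entries = ET.Element("entries")
for site in sites:
    ET.SubElement(
        entries,
        "entry",
        name=f"CMS_{site['name']}",        # hypothetical entry naming scheme
        gatekeeper=site["ce_endpoint"],
        enabled="True",
    )

print(ET.tostring(entries, encoding="unicode"))
```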
DOI: 10.48550/arxiv.1804.03983
2018
HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation
At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.
DOI: 10.5170/cern-2005-002.1074
2004
Distributed Grid Experiences in CMS DC04
1993
A Measurement of strange baryon production and correlations in hadronic Z0 decays
2003
The cms integration grid testbed
The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2 hardware at the following sites: the California Institute of Technology, Fermi National Accelerator Laboratory, the University of California at San Diego, and the University of Florida at Gainesville. The IGT runs jobs using the Globus Toolkit with a DAGMan and Condor-G front end. The virtual organization (VO) is managed using VO management scripts from the European Data Grid (EDG). Grid-wide monitoring is accomplished using local tools such as Ganglia interfaced into the Globus Metadata Directory Service (MDS) and the agent-based MonALISA. Domain-specific software is packaged and installed using the Distribution After Release (DAR) tool of CMS, while middleware under the auspices of the Virtual Data Toolkit (VDT) is distributed using Pacman. During a continuous two-month span in Fall of 2002, over 1 million official CMS GEANT-based Monte Carlo events were generated and returned to CERN for analysis while being demonstrated at SC2002. In this paper, we describe the process that led to one of the world's first continuously available, functioning grids.
DOI: 10.1063/1.1394384
2001
Direct Measurement of the CKM Matrix at High Energy e+e− Colliders
We evaluate the potential precision to which the hadronic decay branching fractions of W bosons and top quarks (and hence the CKM matrix elements) can be measured at proposed high energy linear e+e− colliders. Identification of the individual light quark flavors is achieved in a model-independent way by calibration in Z0 decays. The method is competitive with and complementary to the traditional way of extracting the CKM matrix elements from hadron decay measurements.
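For context, the relation below is the standard leading-order expression linking the W leptonic branching fraction to the CKM elements accessible in hadronic W decays; it is written here from general knowledge rather than taken from the paper, whose analysis may differ in detail.

```latex
% Leading-order relation between the W leptonic branching fraction and the
% CKM elements entering hadronic W decays (standard expression; hedged).
\[
  \frac{1}{\mathcal{B}(W \to \ell\nu)}
  \;=\; 3 \left[\, 1 + \Bigl(1 + \frac{\alpha_s(M_W)}{\pi}\Bigr)
  \sum_{\substack{i = u,c \\ j = d,s,b}} \lvert V_{ij} \rvert^{2} \right]
\]
```

A measurement of the leptonic (or hadronic) branching fraction of the W, combined with flavor tagging of the individual quark jets, thus constrains the |Vij| entering the sum.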
1998
The Chicago Public Schools (CPS)/University of Chicago (UC) Internet Project (CUIP)
1996
A Measurement of Strange Baryon Production in Hadronic Z^0 Decays with the OPAL Detector at LEP
1993
A measurement of Γ(Z → bb̄)/Γ(Z → hadrons) using an impact parameter technique
DOI: 10.5170/cern-1994-004.461
1994
A measurement of strange baryon production with the OPAL detector
1993
A measurement of the forward-backward asymmetry of e+e− → cc¯ and e+e− → bb¯ at centre of mass energies on and near the Z0 peak using D*± mesons
1993
A Measurement of Strange Baryon Production and Correlations in Hadronic Neutral Z Boson Decays