
Maciej Malawski

DOI: 10.1016/j.future.2015.01.004
2015
Cited 198 times
Algorithms for cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds
Large-scale applications expressed as scientific workflows are often grouped into ensembles of inter-related workflows. In this paper, we address a new and important problem concerning the efficient management of such ensembles under budget and deadline constraints on Infrastructure as a Service (IaaS) clouds. IaaS clouds are characterized by on-demand resource provisioning capabilities and a pay-per-use model. We discuss, develop, and assess novel algorithms based on static and dynamic strategies for both task scheduling and resource provisioning. We perform the evaluation via simulation using a set of scientific workflow ensembles with a broad range of budget and deadline parameters, taking into account task granularity, uncertainties in task runtime estimations, provisioning delays, and failures. We find that the key factor determining the performance of an algorithm is its ability to decide which workflows in an ensemble to admit or reject for execution. Our results show that an admission procedure based on workflow structure and estimates of task runtimes can significantly improve the quality of solutions.
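To make the admission idea concrete, here is a minimal, hypothetical sketch (not the paper's actual algorithms, which combine static and dynamic provisioning and scheduling strategies): a workflow is admitted only if its estimated cost and critical-path runtime still fit within the remaining budget and deadline.

```python
# Minimal sketch (not the paper's actual algorithm): admit a workflow into the
# ensemble only if its estimated cost and critical-path runtime still fit within
# the remaining budget and deadline. All names and numbers are illustrative.

def estimated_cost(task_runtimes, price_per_hour):
    # Sum of task runtimes (hours), billed at an hourly VM price.
    return sum(task_runtimes) * price_per_hour

def admit_workflows(workflows, budget, deadline, price_per_hour):
    """workflows: list of dicts with 'tasks' (runtimes in hours) and
    'critical_path' (hours). Returns the admitted subset."""
    admitted, spent = [], 0.0
    # Consider workflows in priority order (here: smallest estimated cost first).
    for wf in sorted(workflows, key=lambda w: estimated_cost(w["tasks"], price_per_hour)):
        cost = estimated_cost(wf["tasks"], price_per_hour)
        if spent + cost <= budget and wf["critical_path"] <= deadline:
            admitted.append(wf)
            spent += cost
        # Otherwise reject: partially executed workflows would waste budget.
    return admitted
```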
DOI: 10.1016/j.future.2017.10.029
2020
Cited 117 times
Serverless execution of scientific workflows: Experiments with HyperFlow, AWS Lambda and Google Cloud Functions
Scientific workflows consisting of a high number of interdependent tasks represent an important class of complex scientific applications. Recently, a new type of serverless infrastructures has emerged, represented by such services as Google Cloud Functions and AWS Lambda, also referred to as the Function-as-a-Service model. In this paper we take a look at such serverless infrastructures, which are designed mainly for processing background tasks of Web and Internet of Things applications, or event-driven stream processing. We evaluate their applicability to more compute- and data-intensive scientific workflows and discuss possible ways to repurpose serverless architectures for execution of scientific workflows. We have developed prototype workflow executor functions using AWS Lambda and Google Cloud Functions, coupled with the HyperFlow workflow engine. These functions can run workflow tasks in AWS and Google infrastructures, and feature such capabilities as data staging to/from S3 or Google Cloud Storage and execution of custom application binaries. We have successfully deployed and executed the Montage astronomy workflow, often used as a benchmark, and we report on initial results of its performance evaluation. Our findings indicate that the simple mode of operation makes this approach easy to use, although there are costs involved in preparing portable application binaries for execution in a remote environment. While our solution is an early prototype, we find the presented approach highly promising. We also discuss possible future steps related to execution of scientific workflows in serverless infrastructures. Finally, we perform a cost analysis and discuss implications with regard to resource management for scientific applications in general.
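A workflow executor function of the kind described here might look roughly like the sketch below; this is a simplified illustration rather than the actual HyperFlow executor, and the bucket, object keys and binary path in the event are placeholders.

```python
# Simplified sketch of a serverless workflow-task executor (not the actual
# HyperFlow executor): stage inputs from S3, run an application binary, stage
# outputs back. Bucket names, keys and the binary path are placeholders.
import os
import subprocess
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    bucket = event["bucket"]                 # workflow data bucket (placeholder)
    workdir = "/tmp"                         # only writable path in AWS Lambda
    # Stage input files from S3 to local storage.
    for key in event["inputs"]:
        s3.download_file(bucket, key, os.path.join(workdir, os.path.basename(key)))
    # Execute the task's application binary shipped with the deployment package.
    subprocess.check_call([event["executable"]] + event.get("args", []), cwd=workdir)
    # Stage output files back to S3.
    for key in event["outputs"]:
        s3.upload_file(os.path.join(workdir, os.path.basename(key)), bucket, key)
    return {"status": "ok", "task": event.get("name")}
```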
DOI: 10.1109/sc.2012.38
2012
Cited 172 times
Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds
Large-scale applications expressed as scientific workflows are often grouped into ensembles of inter-related workflows. In this paper, we address a new and important problem concerning the efficient management of such ensembles under budget and deadline constraints on Infrastructure-as-a-Service (IaaS) clouds. We discuss, develop, and assess algorithms based on static and dynamic strategies for both task scheduling and resource provisioning. We perform the evaluation via simulation using a set of scientific workflow ensembles with a broad range of budget and deadline parameters, taking into account uncertainties in task runtime estimations, provisioning delays, and failures. We find that the key factor determining the performance of an algorithm is its ability to decide which workflows in an ensemble to admit or reject for execution. Our results show that an admission procedure based on workflow structure and estimates of task runtimes can significantly improve the quality of solutions.
DOI: 10.1140/epjc/s10052-019-6567-0
2019
Cited 88 times
First measurement of elastic, inelastic and total cross-section at $$\sqrt{s}=13$$ TeV by TOTEM and overview of cross-section data at LHC energies
The TOTEM collaboration has measured the proton–proton total cross section at $$\sqrt{s}=13~\hbox {TeV}$$ with a luminosity-independent method. Using dedicated $$\beta ^{*}=90~\hbox {m}$$ beam optics, the Roman Pots were inserted very close to the beam. The inelastic scattering rate has been measured by the T1 and T2 telescopes during the same LHC fill. After applying the optical theorem the total proton–proton cross section is $$\sigma _\mathrm{tot}=(110.6~\pm ~3.4)~\hbox {mb}$$, well in agreement with the extrapolation from lower energies. This method also allows one to derive the luminosity-independent elastic and inelastic cross sections: $$\sigma _\mathrm{el}=(31.0~\pm ~1.7)~\hbox {mb}$$ and $$\sigma _\mathrm{inel}=(79.5~\pm ~1.8)~\hbox {mb}$$.
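For orientation, the luminosity-independent method mentioned above combines the optical theorem with the measured elastic and inelastic rates; in natural units and standard notation (shown here for reference, not quoted from the paper) it reads:

$$\sigma_{\mathrm{tot}} = \frac{16\pi}{1+\rho^{2}} \cdot \frac{\left.\mathrm{d}N_{\mathrm{el}}/\mathrm{d}t\right|_{t=0}}{N_{\mathrm{el}}+N_{\mathrm{inel}}}, \qquad \sigma_{\mathrm{el}} = \sigma_{\mathrm{tot}}\,\frac{N_{\mathrm{el}}}{N_{\mathrm{el}}+N_{\mathrm{inel}}}, \qquad \sigma_{\mathrm{inel}} = \sigma_{\mathrm{tot}}\,\frac{N_{\mathrm{inel}}}{N_{\mathrm{el}}+N_{\mathrm{inel}}}.$$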
DOI: 10.1140/epjc/s10052-019-7223-4
2019
Cited 82 times
First determination of the $${\rho }$$ parameter at $${\sqrt{s} = 13}$$ TeV: probing the existence of a colourless C-odd three-gluon compound state
Abstract The TOTEM experiment at the LHC has performed the first measurement at $$\sqrt{s} = 13\,\mathrm{TeV}$$ of the $$\rho$$ parameter, the real to imaginary ratio of the nuclear elastic scattering amplitude at $$t=0$$, obtaining the following results: $$\rho = 0.09 \pm 0.01$$ and $$\rho = 0.10 \pm 0.01$$, depending on different physics assumptions and mathematical modelling. The unprecedented precision of the $$\rho$$ measurement, combined with the TOTEM total cross-section measurements in an energy range larger than $$10\,\mathrm{TeV}$$ (from 2.76 to $$13\,\mathrm{TeV}$$), has implied the exclusion of all the models classified and published by COMPETE. The $$\rho$$ results obtained by TOTEM are compatible with the predictions, from other theoretical models both in the Regge-like framework and in the QCD framework, of a crossing-odd colourless 3-gluon compound state exchange in the t-channel of the proton–proton elastic scattering. On the contrary, if shown that the crossing-odd 3-gluon compound state t-channel exchange is not of importance for the description of elastic scattering, the $$\rho$$ value determined by TOTEM would represent a first evidence of a slowing down of the total cross-section growth at higher energies. The very low-|t| reach allowed also to determine the absolute normalisation using the Coulomb amplitude for the first time at the LHC and obtain a new total proton–proton cross-section measurement $$\sigma _{\mathrm{tot}} = (110.3 \pm 3.5)\,\mathrm{mb}$$, completely independent from the previous TOTEM determination. Combining the two TOTEM results yields $$\sigma _{\mathrm{tot}} = (110.5 \pm 2.4)\,\mathrm{mb}$$.
DOI: 10.1016/j.future.2013.01.004
2013
Cited 79 times
Cost minimization for computational applications on hybrid cloud infrastructures
We address the problem of task planning on multiple clouds formulated as a mixed integer nonlinear programming problem (MINLP). Its specification with the AMPL modeling language allows us to apply solvers such as Bonmin and Cbc. Our model assumes multiple heterogeneous compute and storage cloud providers, such as Amazon, Rackspace, GoGrid, ElasticHosts and a private cloud, parameterized by costs and performance, including constraints on the maximum number of resources at each cloud. The optimization objective is the total cost, under a deadline constraint. We compute the relation between deadline and cost for a sample set of data- and compute-intensive tasks, representing bioinformatics experiments. Our results illustrate typical problems when making decisions on deployment planning on clouds and how they can be addressed using optimization techniques.
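The flavour of such a cost model can be conveyed by a much-simplified linear sketch in Python using PuLP, rather than the AMPL/MINLP formulation used in the paper; the providers, prices and throughput figures below are invented for illustration.

```python
# Much-simplified, linear sketch of deadline-constrained cost minimisation
# (illustrative only; the paper formulates a richer MINLP in AMPL and solves it
# with Bonmin/Cbc). Providers, prices and speeds below are invented numbers.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpStatus

tasks_total = 1000          # number of identical tasks to run
deadline_h = 24             # deadline in hours
providers = {               # price [$/instance-hour], speed [tasks/hour], max instances
    "private": {"price": 0.0, "speed": 4, "limit": 8},
    "publicA": {"price": 0.5, "speed": 10, "limit": 20},
    "publicB": {"price": 0.8, "speed": 16, "limit": 10},
}

model = LpProblem("deployment_cost", LpMinimize)
# Decision variables: number of instances rented at each provider for the whole run.
n = {p: LpVariable(f"n_{p}", lowBound=0, upBound=c["limit"], cat="Integer")
     for p, c in providers.items()}

# Objective: total rental cost over the deadline window.
model += lpSum(n[p] * providers[p]["price"] * deadline_h for p in providers)
# Constraint: enough aggregate throughput to finish all tasks before the deadline.
model += lpSum(n[p] * providers[p]["speed"] * deadline_h for p in providers) >= tasks_total

model.solve()
print(LpStatus[model.status], {p: int(n[p].value()) for p in providers})
```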
DOI: 10.1002/cpe.4792
2018
Cited 62 times
Performance evaluation of heterogeneous cloud functions
Summary Cloud Functions, often called Function-as-a-Service (FaaS), pioneered by AWS Lambda, are an increasingly popular method of running distributed applications. As in other cloud offerings, cloud functions are heterogeneous due to variations in underlying hardware, runtime systems, as well as resource management and billing models. In this paper, we focus on performance evaluation of cloud functions, taking into account heterogeneity aspects. We developed a cloud function benchmarking framework, consisting of one suite based on Serverless Framework and one based on HyperFlow. We deployed the CPU-intensive benchmarks: Mersenne Twister and Linpack. We measured the data transfer times between cloud functions and storage, and we measured the lifetime of the runtime environment. We evaluated all the major cloud function providers: AWS Lambda, Azure Functions, Google Cloud Functions, and IBM Cloud Functions. We made our results available online and keep them continuously updated. We report on the results of the performance evaluation, and we discuss the discovered insights into resource allocation policies.
DOI: 10.1140/epjc/s10052-019-7346-7
2019
Cited 51 times
Elastic differential cross-section measurement at $$\sqrt{s}=13$$ TeV by TOTEM
Abstract The TOTEM collaboration has measured the elastic proton-proton differential cross section $$\mathrm{d}\sigma /\mathrm{d}t$$ at $$\sqrt{s}=13$$ TeV LHC energy using dedicated $$\beta ^{*}=90$$ m beam optics. The Roman Pot detectors were inserted to 10 $$\sigma$$ distance from the LHC beam, which allowed the measurement of the range [0.04 GeV$$^{2}$$; 4 GeV$$^{2}$$] in four-momentum transfer squared |t|. The efficient data acquisition allowed to collect about 10$$^{9}$$ elastic events to precisely measure the differential cross-section including the diffractive minimum (dip), the subsequent maximum (bump) and the large-|t| tail. The average nuclear slope has been found to be $$B=(20.40 \pm 0.002^{\mathrm{stat}} \pm 0.01^{\mathrm{syst}})$$ GeV$$^{-2}$$ in the |t|-range 0.04–0.2 GeV$$^{2}$$. The dip position is $$|t_{\mathrm{dip}}|=(0.47 \pm 0.004^{\mathrm{stat}} \pm 0.01^{\mathrm{syst}})$$ GeV$$^{2}$$. The differential cross-section ratio at the bump vs. at the dip $$R=1.77\pm 0.01^{\mathrm{stat}}$$ has been measured with high precision. The series of TOTEM elastic pp measurements show that the dip is a permanent feature of the pp differential cross-section at the TeV scale.
DOI: 10.1016/j.procs.2011.04.064
2011
Cited 58 times
The Collage Authoring Environment
The Collage Authoring Environment is a software infrastructure which enables domain scientists to collaboratively develop and publish their work in the form of executable papers. It corresponds to the recent developments in both e-Science and computational technologies which call for a novel publishing paradigm. As part of this paradigm, static content (such as traditional scientific publications) should be supplemented with elements of interactivity, enabling reviewers and readers to reexamine the reported results by executing parts of the software on which such results are based as well as access primary scientific data. Taking into account the presented rationale we propose an environment which enables authors to seamlessly embed chunks of executable code (called assets) into scientific publications and allow repeated execution of such assets on underlying computing and data storage resources, as required by scientists who wish to build upon the presented results. The Collage Authoring Environment can be deployed on arbitrary resources, including those belonging to high performance computing centers, scientific e-Infrastructures and resources contributed by the scientists themselves. The environment provides access to static content, primary datasets (where exposed by authors) and executable assets. Execution features are provided by a dedicated engine (called the Collage Server) and embedded into an interactive view delivered to readers, resembling a traditional research publication but interactive and collaborative in its scope. Along with a textual description of the Collage environment the authors also present a prototype implementation, which supports the features described in this paper. The functionality of this prototype is discussed along with theoretical assumptions underpinning the proposed system.
DOI: 10.1007/978-3-319-75178-8_34
2018
Cited 38 times
Benchmarking Heterogeneous Cloud Functions
Cloud Functions, often called Function-as-a-Service (FaaS), pioneered by AWS Lambda, are an increasingly popular method of running distributed applications. As in other cloud offerings, cloud functions are heterogeneous, due to different underlying hardware, runtime systems, as well as resource management and billing models. In this paper, we focus on performance evaluation of cloud functions, taking into account heterogeneity aspects. We developed a cloud function benchmarking framework, consisting of one suite based on Serverless Framework, and one based on HyperFlow. We deployed the CPU-intensive benchmarks: Mersenne Twister and Linpack, and evaluated all the major cloud function providers: AWS Lambda, Azure Functions, Google Cloud Functions and IBM OpenWhisk. We make our results available online and keep them continuously updated. We report on the initial results of the performance evaluation and we discuss the discovered insights on the resource allocation policies.
DOI: 10.1140/epjc/s10052-020-7654-y
2020
Cited 37 times
Elastic differential cross-section $${\mathrm{d}}\sigma /{\mathrm{d}}t$$ at $$\sqrt{s}=2.76\hbox { TeV}$$ and implications on the existence of a colourless C-odd three-gluon compound state
Abstract The proton–proton elastic differential cross section $${\mathrm{d}}\sigma /{\mathrm{d}}t$$ has been measured by the TOTEM experiment at $$\sqrt{s}=2.76\hbox { TeV}$$ energy with $$\beta ^{*}=11\hbox { m}$$ beam optics. The Roman Pots were inserted to 13 times the transverse beam size from the beam, which allowed to measure the differential cross-section of elastic scattering in a range of the squared four-momentum transfer (|t|) from 0.36 to $$0.74\hbox { GeV}^{2}$$. The differential cross-section can be described with an exponential in the |t|-range between 0.36 and $$0.54\hbox { GeV}^{2}$$, followed by a diffractive minimum (dip) at $$|t_{\mathrm{dip}}|=(0.61\pm 0.03)\hbox { GeV}^{2}$$ and a subsequent maximum (bump). The ratio of the $${\mathrm{d}}\sigma /{\mathrm{d}}t$$ at the bump and at the dip is $$1.7\pm 0.2$$. When compared to the proton–antiproton measurement of the D0 experiment at $$\sqrt{s} = 1.96\hbox { TeV}$$, a significant difference can be observed. Under the condition that the effects due to the energy difference between TOTEM and D0 can be neglected, the result provides evidence for the exchange of a colourless C-odd three-gluon compound state in the t-channel of the proton–proton and proton–antiproton elastic scattering.
DOI: 10.1155/2015/680271
2015
Cited 41 times
Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization
This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows and from general purpose cloud benchmarks, as well as from the data measured in our own experiments with Montage, an astronomical application, executed on Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.
DOI: 10.1007/s10723-015-9355-6
2015
Cited 37 times
Storage-aware Algorithms for Scheduling of Workflow Ensembles in Clouds
This paper focuses on data-intensive workflows and addresses the problem of scheduling workflow ensembles under cost and deadline constraints in Infrastructure as a Service (IaaS) clouds. Previous research in this area ignores file transfers between workflow tasks, which, as we show, often have a large impact on workflow ensemble execution. In this paper we propose and implement a simulation model for handling file transfers between tasks, featuring the ability to dynamically calculate bandwidth and supporting a configurable number of replicas, thus allowing us to simulate various levels of congestion. The resulting model is capable of representing a wide range of storage systems available on clouds: from in-memory caches (such as memcached), to distributed file systems (such as NFS servers) and cloud storage (such as Amazon S3 or Google Cloud Storage). We observe that file transfers may have a significant impact on ensemble execution; for some applications up to 90 % of the execution time is spent on file transfers. Next, we propose and evaluate a novel scheduling algorithm that minimizes the number of transfers by taking advantage of data caching and file locality. We find that for data-intensive applications it performs better than other scheduling algorithms. Additionally, we modify the original scheduling algorithms to effectively operate in environments where file transfers take non-zero time.
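A toy version of the shared-bandwidth transfer model described above could look as follows; this is purely illustrative, since the actual simulator handles replicas, congestion and different storage types in far more detail.

```python
# Toy illustration of a shared-bandwidth file-transfer model (not the paper's
# actual simulator): the effective bandwidth of a storage service is divided
# among concurrent transfers, and replicas multiply the available bandwidth.

def transfer_time(file_size_mb, storage_bandwidth_mbps, replicas, concurrent_transfers):
    """Estimate transfer time (seconds) for one file when `concurrent_transfers`
    transfers share a storage system with `replicas` replicas."""
    total_bandwidth = storage_bandwidth_mbps * replicas
    # Each concurrent transfer gets an equal share of the aggregate bandwidth.
    per_transfer = total_bandwidth / max(concurrent_transfers, 1)
    return (file_size_mb * 8) / per_transfer   # MB -> Mb, then divide by Mb/s

# Example: 100 MB file, 1000 Mb/s storage, 2 replicas, 50 concurrent transfers.
print(round(transfer_time(100, 1000, 2, 50), 1), "s")
```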
DOI: 10.1109/mic.2011.143
2013
Cited 32 times
How to Use Google App Engine for Free Computing
Can the Google App Engine cloud service be used, free of charge, to execute parameter study problems? That question drove this research, which is founded on the App Engine's newly developed Task Queue API. The authors created a simple and extensible framework implementing the master-worker model to enable usage of the App Engine application servers as computational nodes. This article presents and discusses the results of the feasibility study, as well as compares the solution with EC2, Amazon's free cloud offering.
DOI: 10.1109/cgc.2013.14
2013
Cited 31 times
Energy-Constrained Provisioning for Scientific Workflow Ensembles
Large computational problems may often be modelled using multiple scientific workflows with similar structure. These workflows can be grouped into ensembles, which may be executed on distributed platforms such as the Cloud. In this paper, we focus on the provisioning of resources for scientific workflow ensembles and address the problem of meeting energy constraints along with either budget or deadline constraints. We propose and evaluate two energy-aware algorithms that can be used for resource provisioning and task scheduling. Experimental evaluation is based on simulations using synthetic data based on parameters of real scientific workflow applications. The results show that our proposed algorithms can meet constraints and minimize energy consumption without compromising the number of completed workflows in an ensemble.
DOI: 10.1109/works54523.2021.00016
2021
Cited 18 times
A Community Roadmap for Scientific Workflows Research and Development
The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows community together. This paper reports on discussions and findings from two virtual “Workflows Community Summits” (January and April, 2021). The overarching goals of these workshops were to develop a view of the state of the art, identify crucial research challenges in the workflows community, articulate a vision for potential community efforts, and discuss technical approaches for realizing this vision. To this end, participants identified six broad themes: FAIR computational workflows; AI workflows; exascale challenges; APIs, interoperability, reuse, and standards; training and education; and building a workflows community. We summarize discussions and recommendations for each of these themes.
DOI: 10.1109/disrta.2003.1242991
2004
Cited 49 times
Towards a grid management system for HLA-based interactive simulations
This paper presents the design of a system that supports execution of HLA (high level architecture) distributed interactive simulations in an unreliable grid environment. The design of the architecture is based on the OGSA (Open Grid Services Architecture) concept that allows for modularity and compatibility with grid services already being developed. First of all, we focus on the part of the system which is responsible for migration of a HLA-connected component or components of the distributed application in the grid environment. We present a runtime support library for easily plugging HLA simulations into the grid services framework. We also present the impact of execution management (namely migration) on overall system performance.
DOI: 10.1016/j.future.2004.09.021
2005
Cited 45 times
Workflow composer and service registry for grid applications
Automatic composition of workflows from Web and Grid services is an important challenge in today’s distributed applications. The system presented in this paper supports the user in composing an application workflow from existing Grid services. The flow composition system builds workflows on an abstract level with semantic and syntactic descriptions of services available on the Grid. Two main modules of the system are the flow composer and the distributed Grid service registry. We present motivation, the concept of the overall system architecture and the results of a feasibility study.
DOI: 10.5555/2388996.2389026
2012
Cited 31 times
Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds
Large-scale applications expressed as scientific workflows are often grouped into ensembles of inter-related workflows. In this paper, we address a new and important problem concerning the efficient management of such ensembles under budget and deadline constraints on Infrastructure-as-a-Service (IaaS) clouds. We discuss, develop, and assess algorithms based on static and dynamic strategies for both task scheduling and resource provisioning. We perform the evaluation via simulation using a set of scientific workflow ensembles with a broad range of budget and deadline parameters, taking into account uncertainties in task runtime estimations, provisioning delays, and failures. We find that the key factor determining the performance of an algorithm is its ability to decide which workflows in an ensemble to admit or reject for execution. Our results show that an admission procedure based on workflow structure and estimates of task runtimes can significantly improve the quality of solutions.
DOI: 10.1016/j.procs.2011.04.045
2011
Cited 30 times
Component Approach to Computational Applications on Clouds
Running computational science applications on the emerging cloud infrastructures requires appropriate programming models and tools. In this paper we investigate the applicability of the component model to developing such applications. The component model we propose takes advantage of the features of the IaaS infrastructure and offers a high-level application composition API. We describe experiments on a scientific application from the bioinformatics domain, using a hybrid cloud infrastructure which consists of a private cloud running Eucalyptus and the Amazon EC2 public cloud. The measured performance of virtual machine startup time and virtualization overhead indicates promising prospects for exploiting such infrastructures along with the proposed component-based approach.
DOI: 10.1109/cloud.2018.00065
2018
Cited 25 times
Challenges for Scheduling Scientific Workflows on Cloud Functions
Serverless computing, also known as Function-as-a-Service (FaaS) or Cloud Functions, is a new method of running distributed applications by executing functions on the infrastructure of cloud providers. Although it frees the developers from managing servers, there are still decisions to be made regarding selection of function configurations based on the desired performance and cost. The billing model of this approach considers time of execution, measured in 100ms units, as well as the size of the memory allocated per function. In this paper, we look into the problem of scheduling scientific workflows, which are applications consisting of multiple tasks connected into a dependency graph. We discuss challenges related to workflow scheduling and propose the Serverless Deadline-Budget Workflow Scheduling (SDBWS) algorithm adapted to serverless platforms. We present preliminary experiments with a small-scale Montage workflow run on the AWS Lambda infrastructure.
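The per-task configuration choice discussed here can be sketched as below; this is a hypothetical simplification rather than the SDBWS algorithm itself, and both the price constant and the assumption that runtime scales inversely with allocated memory are illustrative.

```python
# Hypothetical simplification of choosing a cloud-function memory size per task
# (not the SDBWS algorithm itself). Billing follows the model described above:
# cost grows with allocated memory and with execution time rounded up to 100 ms.
import math

PRICE_PER_GB_100MS = 0.00000208          # illustrative price per GB per 100 ms unit

def cost(runtime_ms, memory_mb):
    units = math.ceil(runtime_ms / 100.0)          # billed in 100 ms units
    return units * (memory_mb / 1024.0) * PRICE_PER_GB_100MS

def choose_memory(base_runtime_ms, task_deadline_ms, task_budget,
                  memory_options=(128, 256, 512, 1024, 2048)):
    """Pick the cheapest memory size that meets the task's deadline sub-constraint,
    assuming runtime scales inversely with allocated memory (a common FaaS model)."""
    feasible = []
    for mem in memory_options:
        runtime = base_runtime_ms * (128.0 / mem)   # assumed perfect scaling from 128 MB
        c = cost(runtime, mem)
        if runtime <= task_deadline_ms and c <= task_budget:
            feasible.append((c, mem))
    return min(feasible)[1] if feasible else None   # cheapest feasible size, or None

print(choose_memory(base_runtime_ms=4000, task_deadline_ms=1500, task_budget=0.0001))
```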
DOI: 10.1109/escience51609.2021.00014
2021
Cited 14 times
Serverless Containers – Rising Viable Approach to Scientific Workflows
The increasing popularity of the serverless computing approach has led to the emergence of new cloud infrastructures working in the Container-as-a-Service (CaaS) model, such as AWS Fargate, Google Cloud Run, or Azure Container Instances. These new infrastructures facilitate an innovative approach to running cloud containers where developers are freed from managing underlying resources. In this paper, we focus on evaluating the capabilities of elastic containers and their usefulness for scientific computing in the scientific workflow paradigm using the AWS Fargate and Google Cloud Run infrastructures. For the experimental evaluation of our approach, we extended the HyperFlow engine to support these CaaS platforms, together with adapting four scientific workflows composed of several dozen to hundreds of tasks organized into a dependency graph. The studied applications are used to create cost-performance benchmarks and flow execution plots, as well as delay, elasticity, and scalability measurements. Results show that serverless containers can be successfully utilized for running scientific workflows. Moreover, the results allow for gaining insight into the specific advantages and limits of the studied platforms.
DOI: 10.1109/imcsit.2010.5679740
2010
Cited 28 times
Exploratory programming in the virtual laboratory
GridSpace 2 is a novel virtual laboratory framework enabling researchers to conduct virtual experiments on Grid-based resources and other HPC infrastructures. GridSpace 2 facilitates exploratory development of experiments by means of scripts which can be written in a number of popular languages, including Ruby, Python and Perl. The framework supplies a repository of gems enabling scripts to interface low-level resources such as PBS queues, EGEE computing elements, scientific applications and other types of Grid resources. Moreover, GridSpace 2 provides a Web 2.0-based Experiment Workbench supporting development and execution of virtual experiments by groups of collaborating scientists. We present an overview of the most important features of the Experiment Workbench, which is the main user interface of the Virtual laboratory, and discuss a sample experiment from the computational chemistry domain.
DOI: 10.1109/ms.2017.265095722
2018
Cited 21 times
A Scalable, Reactive Architecture for Cloud Applications
As cloud infrastructures gain popularity, new concepts and design patterns such as Command Query Responsibility Segregation (CQRS) and Event Sourcing (ES) promise to facilitate the development of scalable applications. Despite recent research and the availability of many blogs and tutorials devoted to these topics, few reports on real-world implementations exist that provide experimental insight into their scalability. To bridge this gap, researchers developed an architecture that exploits both CQRS and ES in accordance with Reactive Manifesto guidelines. Using that architecture, they implemented a prototype interactive flight-scheduling application to investigate this approach’s scalability. A performance evaluation in a cloud environment of 15 virtual machines demonstrated the CQRS and ES patterns’ horizontal scalability, observed independently for the application’s read and write models. This article explains how to assemble this type of architecture, first on a conceptual level and then with specific technologies including Akka, Cassandra, Kafka, and Neo4J. A reference implementation is available as an open-source project. This approach provides many interesting advantages without compromising performance, so its rapid adoption by the industry seems likely.
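As a rough, technology-agnostic illustration of the CQRS and Event Sourcing split described above (not the article's Akka/Cassandra/Kafka implementation), the write model validates commands and appends events, while the read model projects those events into a query-optimized view.

```python
# Rough, technology-agnostic illustration of CQRS with Event Sourcing (not the
# article's Akka/Cassandra/Kafka implementation). The write model validates
# commands and appends events; the read model is a projection built from events.
from dataclasses import dataclass, field

@dataclass
class FlightScheduled:            # event: something that has happened
    flight_id: str
    departure: str

@dataclass
class WriteModel:                 # command side: append-only event log
    event_log: list = field(default_factory=list)

    def handle_schedule_flight(self, flight_id, departure):
        # Validation would go here (e.g. slot conflicts); then append the event.
        event = FlightScheduled(flight_id, departure)
        self.event_log.append(event)
        return event

@dataclass
class ReadModel:                  # query side: projection optimised for reads
    flights_by_departure: dict = field(default_factory=dict)

    def apply(self, event):
        if isinstance(event, FlightScheduled):
            self.flights_by_departure.setdefault(event.departure, []).append(event.flight_id)

# Usage: commands go to the write model; events are (asynchronously) fed to the read model.
write, read = WriteModel(), ReadModel()
read.apply(write.handle_schedule_flight("LO123", "2024-05-01T10:00"))
print(read.flights_by_departure)
```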
DOI: 10.3390/fluids8050159
2023
Modelling the Hemodynamics of Coronary Ischemia
Acting upon clinical patient data, acquired in the pathway of percutaneous intervention, we deploy hierarchical, multi-stage, data-handling protocols and interacting low- and high-order mathematical models (chamber elastance, state-space system and CFD models), to establish and then validate a framework to quantify the burden of ischaemia. Our core tool is a compartmental, zero-dimensional model of the coupled circulation with four heart chambers, systemic and pulmonary circulations and an optimally adapted windkessel model of the coronary arteries that reflects the diastolic dominance of coronary flow. We guide the parallel development of protocols and models by appealing to foundational physiological principles of cardiac energetics and a parameterisation (stenotic Bernoulli resistance and micro-vascular resistance) of patients’ coronary flow. We validate our process first with results which substantiate our protocols and, second, we demonstrate good correspondence between model operation and patient data. We conclude that our core model is capable of representing (patho)physiological states and discuss how it can potentially be deployed, on clinical data, to provide a quantitative assessment of the impact, on the individual, of coronary artery disease.
DOI: 10.1186/s12859-023-05564-x
2023
GraphTar: applying word2vec and graph neural networks to miRNA target prediction
Abstract Background MicroRNAs (miRNAs) are short, non-coding RNA molecules that regulate gene expression by binding to specific mRNAs, inhibiting their translation. They play a critical role in regulating various biological processes and are implicated in many diseases, including cardiovascular, oncological, gastrointestinal diseases, and viral infections. Computational methods that can identify potential miRNA–mRNA interactions from raw data use one-dimensional miRNA–mRNA duplex representations and simple sequence encoding techniques, which may limit their performance. Results We have developed GraphTar, a new target prediction method that uses a novel graph-based representation to reflect the spatial structure of the miRNA–mRNA duplex. Unlike existing approaches, we use the word2vec method to accurately encode RNA sequence information. In conjunction with the novel encoding method, we use a graph neural network classifier that can accurately predict miRNA–mRNA interactions based on graph representation learning. As part of a comparative study, we evaluate three different node embedding approaches within the GraphTar framework and compare them with other state-of-the-art target prediction methods. The results show that the proposed method achieves similar performance to the best methods in the field and outperforms them on one of the datasets. Conclusions In this study, a novel miRNA target prediction approach called GraphTar is introduced. Results show that GraphTar is as effective as existing methods and even outperforms them in some cases, opening new avenues for further research. However, the expansion of available datasets is critical for advancing the field towards real-world applications.
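To give a concrete sense of the word2vec-based encoding step, the sketch below tokenises RNA sequences into overlapping k-mers and embeds them with gensim; GraphTar's actual preprocessing, k-mer length and hyperparameters may differ.

```python
# Hedged sketch of word2vec-style encoding of RNA sequences (GraphTar's actual
# preprocessing, k-mer size and hyperparameters may differ). Sequences are split
# into overlapping k-mers, which are treated as "words" for gensim's Word2Vec.
from gensim.models import Word2Vec

def kmers(sequence, k=3):
    # Overlapping k-mers, e.g. "AUGGC" -> ["AUG", "UGG", "GGC"] for k=3.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

sequences = ["AUGGCUACGUUAGC", "GCUAGGCAUUCGAU"]       # toy miRNA/mRNA fragments
corpus = [kmers(s) for s in sequences]

# Train a small embedding model; each k-mer gets a dense vector.
model = Word2Vec(sentences=corpus, vector_size=16, window=5, min_count=1, epochs=50)

# A sequence can then be represented by the list (or average) of its k-mer vectors,
# which a downstream model (e.g. a graph neural network) consumes as node features.
print(model.wv["AUG"].shape)
```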
DOI: 10.1177/0037549705051970
2005
Cited 28 times
A Framework for HLA-Based Interactive Simulations on the Grid
This article presents the design and feasibility of a system that supports execution of High Level Architecture (HLA)-distributed interactive simulations in an unreliable Grid environment. The article presents an overall architecture of a system based on experience gained from previous designs. The most important operational components of the system are presented, and actual performance issues are discussed. The design of the architecture is based on the Open Grid Services Architecture (OGSA) concept that allows for modularity and compatibility with Grid services already being developed. The issue of migration to the recently proposed Web Services Resource Framework (WSRF) is discussed as well.
DOI: 10.1016/j.future.2009.05.012
2010
Cited 19 times
Invocation of operations from script-based Grid applications
In this paper we address the complexity of building and running modern scientific applications on various Grid systems with heterogeneous middleware. As a solution we have proposed the Grid Operation Invoker (GOI) which offers an object-oriented method invocation semantics for interacting with diverse computational services. GOI forms the core of the ViroLab virtual laboratory and it is used to invoke operations from within in-silico experiments described using a scripting notation. We describe the details of GOI (including architecture, technology adapters and asynchronous invocations) focusing on a mechanism which allows adding high-level support for batch job processing middleware, e.g. EGEE LCG/gLite. As an example, we present the NAMD molecular dynamics program, deployed on EGEE infrastructure. The main achievement is the creation of the Grid Object abstraction, which can be used to represent and access such diverse technologies as Web Services, distributed components and job processing systems. Such an application model, based on high-level scripting, is an interesting alternative to graphical workflow-based tools.
DOI: 10.1109/cloudcom.2013.98
2013
Cited 18 times
Introducing PRECIP: An API for Managing Repeatable Experiments in the Cloud
Cloud computing with its on-demand access to resources has emerged as a tool used by researchers from a wide range of domains to run computer-based experiments. In this paper we introduce a flexible experiment management API, written in Python, that simplifies and formalizes the execution of scientific experiments on cloud infrastructures. We describe the features and functionality of PRECIP (Pegasus Repeatable Experiments for the Cloud in Python), and how PRECIP can be used to set up experiments on academic clouds such as OpenStack, Eucalyptus, and Nimbus, and on commercial clouds such as Amazon EC2.
DOI: 10.1016/j.jocs.2016.09.006
2017
Cited 15 times
Porting HPC applications to the cloud: A multi-frontal solver case study
In this paper we argue that scientific applications traditionally considered as representing typical HPC workloads can be successfully and efficiently ported to a cloud infrastructure. We propose a porting methodology that enables parallelization of communication- and memory-intensive applications while achieving a good communication to computation ratio and a satisfactory performance in a cloud infrastructure. This methodology comprises several aspects: (1) task agglomeration heuristic enabling increasing granularity of tasks while ensuring they will fit in memory; (2) task scheduling heuristic increasing data locality; and (3) two-level storage architecture enabling in-memory storage of intermediate data. We implement this methodology in a scientific workflow system and use it to parallelize a multi-frontal solver for finite-element meshes, deploy it in a cloud, and execute it as a workflow. The results obtained from the experiments confirm that the proposed porting methodology leads to a significant reduction of communication costs and achievement of a satisfactory performance. We believe that these results constitute a valuable step toward a wider adoption of cloud infrastructures for computational science applications.
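A stripped-down version of the first ingredient, the task agglomeration heuristic, might look like the following; it is illustrative only, since the heuristic in the paper also respects the structure of the task graph.

```python
# Stripped-down illustration of a task-agglomeration heuristic (not the paper's
# exact algorithm): merge fine-grained tasks into larger ones to improve the
# communication-to-computation ratio, as long as the merged task fits in memory.

def agglomerate(tasks, memory_limit_mb):
    """tasks: list of (task_id, memory_footprint_mb), assumed already ordered so
    that neighbours share data (improves locality). Returns groups of task ids."""
    groups, current, current_mem = [], [], 0
    for task_id, mem in tasks:
        if current and current_mem + mem > memory_limit_mb:
            groups.append(current)               # close the group before it overflows
            current, current_mem = [], 0
        current.append(task_id)
        current_mem += mem
    if current:
        groups.append(current)
    return groups

# Example: merge 6 small tasks under a 1000 MB per-worker memory limit.
print(agglomerate([("t1", 300), ("t2", 300), ("t3", 300),
                   ("t4", 500), ("t5", 400), ("t6", 200)], 1000))
```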
DOI: 10.1109/ucc-companion.2018.00020
2018
Cited 15 times
Transparent Deployment of Scientific Workflows across Clouds - Kubernetes Approach
We present an end-to-end solution for automation of scientific workflow deployment and execution on distributed computing infrastructures. The solution integrates de-facto standard and widely adopted tools, including Terraform and Kubernetes, with our HyperFlow workflow management system. In this solution, infrastructure providers are abstracted away thanks to a generic Kubernetes layer. However, we also support other computing infrastructures, both containerized, such as Kubernetes or Amazon ECS, and non-containerized, e.g. AWS Lambda, in a single unified approach. The resulting solution enables execution of hybrid workflows that utilize multiple computing infrastructures and significantly lowers the complexity related to the management of repeatable infrastructures for the execution of scientific workflows and conducting scientific workflow research.
DOI: 10.1140/epjc/s10052-022-10065-x
2022
Cited 6 times
Characterisation of the dip-bump structure observed in proton–proton elastic scattering at $$\sqrt{s}$$ = 8 TeV
Abstract The TOTEM collaboration at the CERN LHC has measured the differential cross-section of elastic proton–proton scattering at $$\sqrt{s} = 8\,\mathrm{TeV}$$ in the squared four-momentum transfer range $$0.2\,\mathrm{GeV^{2}} < |t| < 1.9\,\mathrm{GeV^{2}}$$. This interval includes the structure with a diffractive minimum (“dip”) and a secondary maximum (“bump”) that has also been observed at all other LHC energies, where measurements were made. A detailed characterisation of this structure for $$\sqrt{s} = 8\,\mathrm{TeV}$$ yields the positions, $$|t|_{\mathrm{dip}} = (0.521 \pm 0.007)\,\mathrm{GeV^2}$$ and $$|t|_{\mathrm{bump}} = (0.695 \pm 0.026)\,\mathrm{GeV^2}$$, as well as the cross-section values, $$\left. {\mathrm{d}\sigma /\mathrm{d}t}\right| _{\mathrm{dip}} = (15.1 \pm 2.5)\,\mathrm{\mu b/GeV^2}$$ and $$\left. {\mathrm{d}\sigma /\mathrm{d}t}\right| _{\mathrm{bump}} = (29.7 \pm 1.8)\,\mathrm{\mu b/GeV^2}$$, for the dip and the bump, respectively.
DOI: 10.1109/cbms.2008.47
2008
Cited 18 times
Virtual Laboratory for Development and Execution of Biomedical Collaborative Applications
The ViroLab Virtual Laboratory is a collaborative platform for scientists representing multiple fields of expertise while working together on common scientific goals. This environment makes it possible to combine efforts of computer scientists, virology and epidemiology experts and experienced physicians to support future advances in HIV-related research and treatment. The paper explains the challenges involved in building a modern, inter-organizational platform to support science and gives an overview of solutions to these challenges. Examples of real-world problems applied in the presented environment are also described to prove the feasibility of the solution.
DOI: 10.1007/978-3-642-28267-6_18
2012
Cited 14 times
Managing Entire Lifecycles of e-Science Applications in the GridSpace2 Virtual Laboratory – From Motivation through Idea to Operable Web-Accessible Environment Built on Top of PL-Grid e-Infrastructure
The GridSpace2 environment, developed in the scope of the PL-Grid Polish National Grid Initiative, constitutes a comprehensive platform which supports e-science applications throughout their entire lifecycle. Application development may involve multiple phases, including writing, prototyping, testing and composing the application. Once the application attains maturity it becomes operable and capable of being executed, although it may still be subject to further development – including actions such as sharing with collaborating researchers or making results publicly available with the use of dedicated publishing interfaces. This paper describes each of these phases in detail, showing how the GridSpace2 platform can assist the developers and publishers of computational experiments.
DOI: 10.2196/mededu.4394
2015
Cited 13 times
Virtual Patients in a Behavioral Medicine Massive Open Online Course (MOOC): A Case-Based Analysis of Technical Capacity and User Navigation Pathways
Massive open online courses (MOOCs) have been criticized for focusing on presentation of short video clip lectures and asking theoretical multiple-choice questions. A potential way of vitalizing these educational activities in the health sciences is to introduce virtual patients. Experiences from such extensions in MOOCs have not previously been reported in the literature. This study analyzes technical challenges and solutions for offering virtual patients in health-related MOOCs and describes patterns of virtual patient use in one such course. Our aims are to reduce the technical uncertainty related to these extensions, point to aspects that could be optimized for a better learner experience, and raise prospective research questions by describing indicators of virtual patient use on a massive scale. The Behavioral Medicine MOOC was offered by Karolinska Institutet, a medical university, on the EdX platform in the autumn of 2014. Course content was enhanced by two virtual patient scenarios presented in the OpenLabyrinth system and hosted on the VPH-Share cloud infrastructure. We analyzed web server and session logs and a participant satisfaction survey. Navigation pathways were summarized using a visual analytics tool developed for the purpose of this study. The number of course enrollments reached 19,236. At the official closing date, 2317 participants (12.1% of total enrollment) had declared completing the first virtual patient assignment and 1640 (8.5%) participants confirmed completion of the second virtual patient assignment. Peak activity involved 359 user sessions per day. The OpenLabyrinth system, deployed on four virtual servers, coped well with the workload. Participant survey respondents (n=479) regarded the activity as a helpful exercise in the course (83.1%). Technical challenges reported involved poor or restricted access to videos in certain areas of the world and occasional problems with lost sessions. The visual analyses of user pathways display the parts of virtual patient scenarios that elicited less interest and may have been perceived as nonchallenging options. Analyzing the user navigation pathways allowed us to detect indications of both surface and deep approaches to the content material among the MOOC participants. This study reported on first inclusion of virtual patients in a MOOC. It adds to the body of knowledge by demonstrating how a biomedical cloud provider service can ensure technical capacity and flexible design of a virtual patient platform on a massive scale. The study also presents a new way of analyzing the use of branched virtual patients by visualization of user navigation pathways. Suggestions are offered on improvements to the design of virtual patients in MOOCs.
DOI: 10.1016/j.jocs.2017.06.012
2018
Cited 13 times
Cloud computing infrastructure for the VPH community
As virtualization technologies mature and become ever more widespread, cloud computing has emerged as a promising paradigm for e-science. In order to facilitate successful application of cloud computing in scientific research – particularly in a domain as security-minded as medical research – several technical challenges need to be addressed. This paper reports on the successful deployment and utilization of a cloud computing platform for the Virtual Physiological Human (VPH) research community, originating in the VPH-Share project and continuing beyond the end of this project. The platform tackles technical issues involved in porting existing desktop applications to the cloud environment and constitutes a uniform research space where application services can be developed, stored, accessed and shared using a variety of computational infrastructures. The paper also presents examples of application workflows which make use of the presented infrastructure – both internal and external to the VPH community.
DOI: 10.1098/rsfs.2019.0128
2020
Cited 11 times
Hit-to-lead and lead optimization binding free energy calculations for G protein-coupled receptors
We apply the hit-to-lead ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent) and lead-optimization TIES (thermodynamic integration with enhanced sampling) methods to compute the binding free energies of a series of ligands at the A1 and A2A adenosine receptors, members of a subclass of the GPCR (G protein-coupled receptor) superfamily. Our predicted binding free energies, calculated using ESMACS, show a good correlation with previously reported experimental values of the ligands studied. Relative binding free energies, calculated using TIES, accurately predict experimentally determined values within a mean absolute error of approximately 1 kcal mol$$^{-1}$$. Our methodology may be applied widely within the GPCR superfamily and to other small molecule-receptor protein systems.
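For orientation, thermodynamic integration (the "TI" in TIES) obtains a free energy difference by integrating the λ-derivative of the alchemical Hamiltonian over the coupling parameter, and a relative binding free energy as the difference between the complex and solvent legs (standard textbook form, not quoted from the paper):

$$\Delta G = \int_{0}^{1} \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_{\lambda} \mathrm{d}\lambda, \qquad \Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{complex}} - \Delta G_{\mathrm{solvent}}.$$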
DOI: 10.1109/ccgrid54584.2022.00067
2022
Cited 5 times
A Serverless Engine for High Energy Physics Distributed Analysis
The Large Hadron Collider (LHC) at CERN has generated in the last decade an unprecedented volume of data for the High-Energy Physics (HEP) field. Scientific collaborations interested in analysing such data very often require computing power beyond a single machine. This issue has been tackled traditionally by running analyses in distributed environments using stateful, managed batch computing systems. While this approach has been effective so far, current estimates for future computing needs of the field present large scaling challenges. Such a managed approach may not be the only viable way to tackle them and an interesting alternative could be provided by serverless architectures, to enable an even larger scaling potential. This work describes a novel approach to running real HEP scientific applications through a distributed serverless computing engine. The engine is built upon ROOT, a well-established HEP data analysis software, and distributes its computations to a large pool of concurrent executions on the Amazon Web Services Lambda serverless platform. Thanks to the developed tool, physicists are able to access datasets stored at CERN (including those under restricted access policies) and process them on remote infrastructures outside of their typical environment. The analysis of the serverless functions is monitored at runtime to gather performance metrics, both for data- and computation-intensive workloads.
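The overall fan-out pattern, mapping independent data partitions onto concurrent function invocations and merging the partial results, can be sketched generically with boto3 as below; this is not the engine's actual code, and the function name, dataset URL and payload fields are placeholders.

```python
# Generic illustration of the map-style fan-out to AWS Lambda used by serverless
# analysis engines (not the actual engine built on ROOT). The function name and
# payload fields are placeholders; partial results are merged on the client side.
import json
from concurrent.futures import ThreadPoolExecutor
import boto3

lam = boto3.client("lambda")

def run_partition(partition):
    # Synchronous invocation: each call processes one data partition (e.g. an event range).
    resp = lam.invoke(
        FunctionName="analysis-worker",                  # placeholder function name
        Payload=json.dumps({"dataset": "root://eos.example/dataset", "range": partition}),
    )
    return json.loads(resp["Payload"].read())            # partial result (e.g. histogram counts)

partitions = [[i, i + 100_000] for i in range(0, 1_000_000, 100_000)]
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(run_partition, partitions))

# Reduce step: merge the partial results returned by the workers.
merged = sum(p.get("count", 0) for p in partials)
print(merged)
```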
DOI: 10.1177/1094342011422924
2011
Cited 13 times
Component-based approach for programming and running scientific applications on grids and clouds
This paper presents an approach to programming and running scientific applications on grid and cloud infrastructures based on two principles: the first one is to follow a component-based programming model, the second is to apply a flexible technology which allows for virtualization of the underlying infrastructure. The solutions described in this paper include high-level composition and deployment consisting of a scripting-based environment and a manager system based on an architecture description language (ADL), a dynamically managed pool of component containers, and interoperability with other component models such as Grid Component Model (GCM). We demonstrate how the proposed methodology can be implemented by combining the unique features of the Common Component Architecture (CCA) model together with the H2O resource sharing platform, resulting in the MOCCA component framework. Applications and tests include data mining using the Weka library, Monte Carlo simulation of the formation of clusters of gold atoms, as well as a set of synthetic benchmarks. The conclusion is that the component approach to scientific applications can be successfully applied to both grid and cloud infrastructures.
DOI: 10.1007/978-3-642-55224-3_24
2014
Cited 12 times
Cost Optimization of Execution of Multi-level Deadline-Constrained Scientific Workflows on Clouds
This paper introduces a cost optimization model for scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous VM instances, a limited number of instances per cloud, and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs, as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified in the AMPL modeling language and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and benchmark workflows representing real scientific applications such as Montage, Epigenomics, and LIGO. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.
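The level-based structure of such a cost model can be illustrated with a small mixed-integer program. The sketch below uses the PuLP library instead of AMPL, and its instance data (levels, runtimes, prices, deadline, and the equal deadline share per level) are invented for illustration; the model in the paper is considerably richer, covering multiple clouds, instance limits and data transfers.

```python
# Toy deadline-constrained cost minimization over workflow levels, loosely
# following the level-based structure described in the paper. All data and the
# "equal deadline share per level" simplification are illustrative assumptions.
import math
import pulp

levels = {"mProject": {"tasks": 20, "runtime_h": 0.5},
          "mDiffFit": {"tasks": 40, "runtime_h": 0.25}}
vm_price = {"small": 0.05, "medium": 0.10, "large": 0.20}   # $ per VM-hour (made up)
speedup  = {"small": 1.0, "medium": 2.0, "large": 4.0}
deadline_h = 6.0
share_h = deadline_h / len(levels)        # each level gets an equal share of the deadline

prob = pulp.LpProblem("workflow_cost", pulp.LpMinimize)
n = {(l, v): pulp.LpVariable(f"n_{l}_{v}", lowBound=0, cat="Integer")
     for l in levels for v in vm_price}

# Cost: each VM is billed for the (rounded-up) hours of its level's deadline share.
billed_h = math.ceil(share_h)
prob += pulp.lpSum(n[l, v] * vm_price[v] * billed_h for l in levels for v in vm_price)

# Capacity: provisioned throughput during the share must cover the level's work.
for l, d in levels.items():
    work_vm_hours = d["tasks"] * d["runtime_h"]          # work expressed at speedup 1.0
    prob += pulp.lpSum(n[l, v] * speedup[v] * share_h for v in vm_price) >= work_vm_hours

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for (l, v), var in n.items():
    if var.value() and var.value() > 0:
        print(l, v, int(var.value()))
print("total cost ($):", pulp.value(prob.objective))
```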
DOI: 10.1140/epjc/s2003-01251-0
2003
Cited 20 times
Prospects for observing an invisibly decaying Higgs boson in the ${t \bar t H}$ production at the LHC
The prospects for observing an invisibly decaying Higgs boson in the ${t \bar t H}$ production at the LHC are discussed. An isolated lepton, a reconstructed hadronic top-quark decay, two identified b-jets and large missing transverse energy are proposed as the final state signature for event selection. Only the standard model backgrounds are taken into account. It is shown that the $t\bar t Z$, $t \bar t W$, $b \bar b Z$ and $b \bar b W$ backgrounds can individually be suppressed below the signal expectation. The dominant source of background remains $t \bar t$ production. The key to observability will be an experimental selection which allows further suppression of the contributions from $t \bar t$ events in which one of the top quarks decays into a tau lepton. Depending on the details of the final analysis, an excess of signal events above the standard model background of about 10% to 100% can be achieved in the mass range $m_H$ = 100-200 GeV.
DOI: 10.1109/e-science.2006.261096
2006
Cited 18 times
Semantic Composition of Scientific Workflows Based on the Petri Nets Formalism
The idea of an application described through its workflow is becoming popular in the Grid community as a natural method of functional decomposition of an application. It shows all the important dependencies as a set of connections of data flow and/or control flow. As scientific workflows grow in size and complexity, a tool to assist end users is becoming necessary. In this paper we describe the formal basis, design and implementation of such a tool -- an assistant which analyzes user requirements regarding application results and works with information registries that provide information on resources available in the Grid. The Workflow Composition Tool (WCT) provides the functionality of automatic workflow construction based on the process of semantic service discovery and matchmaking. It uses a well-designed construction algorithm together with specific heuristics in order to provide useful solutions for application users.
DOI: 10.1142/s0129626413400045
2013
Cited 11 times
HOSTED SCIENCE: MANAGING COMPUTATIONAL WORKFLOWS IN THE CLOUD
Scientists today are exploring the use of new tools and computing platforms to do their science. They are using workflow management tools to describe and manage complex applications and are evaluating the features and performance of clouds to see if they meet their computational needs. Although today hosting is limited to providing virtual resources and simple services, one can imagine that in the future entire scientific analyses will be hosted for the user. The latter would specify the desired analysis, the timeframe of the computation, and the available budget. Hosted services would then deliver the desired results within the provided constraints. This paper describes current work on managing scientific applications on the cloud, focusing on workflow management and related data management issues. Frequently, applications are not represented by single workflows but rather as sets of related workflows – workflow ensembles. Thus, hosted services need to be able to manage entire workflow ensembles, evaluating tradeoffs between completing as many high-value ensemble members as possible and delivering results within a certain time and budget. This paper gives an overview of existing hosted science issues, presents the current state of the art in resource provisioning that can support it, and outlines future research directions in this field.
DOI: 10.1016/j.procs.2015.05.412
2015
Cited 10 times
Execution Management and Efficient Resource Provisioning for Flood Decision Support
We present a resource provisioning and execution management solution for a flood decision support system. The system, developed within the ISMOP project, features an urgent computing scenario in which flood threat assessment for large sections of levees is requested within a specified deadline. Unlike typical decision support systems which utilize heavyweight simulations in order to predict the possible course of an emergency, in ISMOP we employ an alternative approach based on the ‘scenario identification’ method. We show that this approach is a particularly good fit for the resource provisioning model of IaaS Clouds. We describe the architecture of the ISMOP decision support system, focusing on the urgent computing scenario and its formal resource provisioning model. Preliminary results of experiments performed in order to calibrate and validate the model indicate that the model fits experimental data.
DOI: 10.1155/2012/683634
2012
Cited 10 times
Constructing Workflows from Script Applications
For programming and executing complex applications on grid infrastructures, scientific workflows have been proposed as a convenient high-level alternative to solutions based on general-purpose programming languages, APIs and scripts. GridSpace is a collaborative programming and execution environment which is based on a scripting approach and extends the Ruby language with a high-level API for invoking operations on remote resources. In this paper we describe a tool which converts GridSpace application source code into a workflow representation which, in turn, may be used for scheduling, provenance, or visualization. We describe how we addressed the issues of analyzing Ruby source code, resolving variable and method dependencies, and building the workflow representation. The solutions to these problems were developed and evaluated by testing them on complex grid application workflows such as CyberShake, Epigenomics and Montage. The evaluation is enriched by representing typical workflow control flow patterns.
DOI: 10.1016/j.jocs.2019.06.002
2019
Cited 9 times
Large-scale urban traffic simulation with Scala and high-performance computing system
High-performance computing systems make it possible to implement large-scale simulations of natural phenomena. However, in order to develop efficacious and efficient solutions, easy-to-use software platforms and applications are required. Until now, the most popular solutions in this area were based on the Message Passing Interface. Subsequently, the authors successfully tackled this problem using Erlang; now, we focus on Scala/Akka, a popular and widely used standard in parallel and distributed programming. The paper focuses on the scalable implementation of a traffic simulation system in an asynchronous and notably desynchronized way. In addition to describing the concept of the system, a series of experiments on a cluster of up to 1000 nodes (24,000 cores) is presented and discussed, demonstrating its efficiency and scalability. The main aim of the paper is to show that the implementation of high-performance computing-grade software solutions is within the reach of any programmer proficient in Java Virtual Machine-related technologies.
DOI: 10.1007/978-3-030-48340-1_27
2020
Cited 8 times
Adaptation of Workflow Application Scheduling Algorithm to Serverless Infrastructure
Function-as-a-Service is a novel type of cloud service used for creating distributed applications and utilizing computing resources. The application developer supplies the source code of cloud functions, which are small applications or application components, while the service provider is responsible for provisioning the infrastructure, scaling, and exposing a REST-style API. This environment seems to be adequate for running scientific workflows, which in recent years have become an established paradigm for implementing and preserving complex scientific processes. In this paper, we present work on the adaptation of a scheduling algorithm to FaaS infrastructure. The result of this work is a static heuristic capable of planning workflow execution based on defined function pricing, a deadline and a budget. The SDBCS algorithm is designed to determine the quality of assignment of a particular task to a specific function configuration. Each task is analyzed for execution time and cost characteristics, while keeping track of the parameters of the complete workflow execution. The algorithm is validated through experiments with a set of synthetic workflows and a real-life infrastructure case study performed on AWS Lambda. The results confirm the utility of the algorithm and lead us to propose areas of further study, which include more detailed analysis of infrastructure features affecting scheduling.
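The kind of per-task quality measure described above can be sketched as follows. This is not the published SDBCS algorithm, only an illustration of scoring Lambda-style memory configurations against a task's deadline and budget; the pricing constant, the runtime-scaling assumption and the scoring formula are all assumptions.

```python
# Sketch of scoring function-memory configurations for workflow tasks, in the
# spirit of deadline/budget-aware FaaS scheduling (not the exact SDBCS algorithm).
# Pricing follows the general AWS Lambda model (price proportional to GB-seconds),
# but the constants and the scoring formula are illustrative assumptions.

MEMORY_CONFIGS_MB = [256, 512, 1024, 2048]
PRICE_PER_GB_S = 0.0000166667   # approximate Lambda price; treat as an assumption

def estimate_runtime(task_base_runtime_s: float, memory_mb: int) -> float:
    """Assume runtime scales inversely with allocated memory/CPU (simplification)."""
    return task_base_runtime_s * (1024 / memory_mb)

def cost(runtime_s: float, memory_mb: int) -> float:
    return runtime_s * (memory_mb / 1024) * PRICE_PER_GB_S

def score(runtime_s, cost_usd, task_deadline_s, task_budget_usd):
    """Combine normalized time and cost slack; higher is better (illustrative)."""
    time_q = max(0.0, 1 - runtime_s / task_deadline_s)
    cost_q = max(0.0, 1 - cost_usd / task_budget_usd)
    return time_q + cost_q

def choose_config(task_base_runtime_s, task_deadline_s, task_budget_usd):
    best = None
    for mem in MEMORY_CONFIGS_MB:
        rt = estimate_runtime(task_base_runtime_s, mem)
        c = cost(rt, mem)
        if rt > task_deadline_s or c > task_budget_usd:
            continue                       # infeasible configuration, skip it
        s = score(rt, c, task_deadline_s, task_budget_usd)
        if best is None or s > best[0]:
            best = (s, mem, rt, c)
    return best

if __name__ == "__main__":
    print(choose_config(task_base_runtime_s=30.0,
                        task_deadline_s=60.0,
                        task_budget_usd=0.001))
```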
DOI: 10.48550/arxiv.2106.05177
2021
Cited 7 times
Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development
Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role in the data-oriented and post-Moore's computing landscape as they democratize the application of cutting-edge research techniques, computationally intensive methods, and use of new computing platforms. As workflows continue to be adopted by scientific projects and user communities, they are becoming more complex. Workflows are increasingly composed of tasks that perform computations such as short machine learning inference, multi-node simulations, and long-running machine learning model training, among others, and thus increasingly rely on heterogeneous architectures that include CPUs but also GPUs and accelerators. The workflow management system (WMS) technology landscape is currently segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Another fundamental problem is that there are conflicting theoretical bases and abstractions for a WMS. Systems that use the same underlying abstractions can likely be translated between one another, which is not the case for systems that use different abstractions. More information: https://workflowsri.org/summits/technical
DOI: 10.5220/0011603800003414
2023
Designing Personalised Gamification of mHealth Survey Applications
DOI: 10.1109/access.2023.3281860
2023
Toward the Observability of Cloud-Native Applications: The Overview of the State-of-the-Art
The Cloud-native model, established to enhance the Twelve-Factor patterns, is an approach to developing and deploying applications according to DevOps concepts, Continuous Integration/Continuous Delivery, containers, and microservices. The notion of observability can help us cope with the complexity of such applications. We present a Systematic Mapping Study (SMS) in the observability of Cloud-native applications. We have chosen 56 studies published between 2018 and 2022. The selected studies were thoroughly analyzed, compared, and classified according to the chosen comparative criteria. The presented SMS assesses engineering approaches, maturity, and efficiency of observability by deliberating around four research questions: (1) What provides the motivations for equipping Cloud-native applications with observability capabilities? (2) Which research areas are addressed in the related literature? (3) How are observability approaches implemented? (4) What are the future trends in the Cloud-native applications observability research?
DOI: 10.1007/3-540-45825-5_13
2002
Cited 17 times
Towards the CrossGrid Architecture
DOI: 10.1007/3-540-44860-8_21
2003
Cited 14 times
Architecture of the Grid for Interactive Applications
In this paper we present the current status of the CrossGrid architecture. The architecture definition follows from the specification of requirements and design documents. It consists of descriptions of functionality of new tools and Grid services and indicates where interfaces should be defined. The components of the CrossGrid architecture are modular and they are organized in the following layers: applications, supporting tools, application development support, application-specific Grid services, generic Grid services, and fabric. We also present an analysis of the possible evolution of the CrossGrid services towards the OGSA service model.
DOI: 10.1007/11752578_78
2006
Cited 13 times
Semantic-Based Grid Workflow Composition
The work presents a solution to abstract workflow composition in a semantic Grid environment. Along with an analysis of the problem of workflow composition and a description of related research in that matter, we present the Workflow Composition Tool. The tool is designed to provide descriptions of abstract (i.e. not executable) workflows of service-based Grid applications. The tool applies novel semantic techniques to deliver meaningful discovery and matching of ontologically described resources. WCT is a part of a larger workflow composition and execution system being developed in the K-WfGrid project – a short description of the entire system is also included.
DOI: 10.1002/cbdv.200800338
2009
Cited 9 times
In silico Structural Study of Random Amino Acid Sequence Proteins Not Present in Nature
Abstract The three-dimensional structures of a set of ‘never born proteins’ (NBP, random amino acid sequence proteins with no significant homology with known proteins) were predicted using two methods: Rosetta and one based on the ‘fuzzy-oil-drop’ (FOD) model. More than 3000 different random amino acid sequences were generated, filtered against the non-redundant protein sequence database to remove sequences with significant homology with known proteins, and subjected to three-dimensional structure prediction. Comparison between Rosetta and FOD predictions allowed us to select the ten top (highest structural similarity) and the ten bottom (lowest structural similarity) structures from the ranking list organized according to the RMS-D value. The selected structures were taken for detailed analysis to define the scale of structural accordance and discrepancy between the two methods. The structural similarity measurements revealed discrepancies between structures generated on the basis of the two methods. Their potential biological function appeared to be quite different as well. The ten bottom structures appeared to be ‘unfoldable’ for the FOD model. Some aspects of the general characteristics of the NBPs are also discussed. The calculations were performed on the EUChinaGRID grid platform to test the performance of this infrastructure for massive protein structure predictions.
DOI: 10.1109/ccgrid.2013.54
2013
Cited 7 times
Evaluation of Cloud Providers for VPH Applications
Infrastructure as a Service (IaaS) clouds are considered interesting sources of computing and storage resources for scientific applications. However, given the large number of cloud vendors and their diverse offerings, it is not trivial for research projects to select an appropriate service provider. In this paper, we present the results of an evaluation of public cloud providers, taking into account the requirements of the biomedical applications within the VPH-Share project. We performed a broad analysis of nearly 50 cloud providers and analyzed the performance and cost of 26 virtual machine instance types offered by the top three providers who meet our criteria: Amazon EC2, RackSpace and SoftLayer. We hope our results will be helpful for other research projects that are considering clouds as a potential source of computing and storage resources.
2016
Cited 6 times
Towards Serverless Execution of Scientific Workflows - HyperFlow Case Study.
DOI: 10.1109/ipdps.2005.290
2005
Cited 12 times
MOCCA - Towards a Distributed CCA Framework for Metacomputing
We describe the design and implementation of MOCCA, a distributed CCA framework implemented using the H2O metacomputing system. Motivated by the quest for appropriate metasystem programming models for large scale scientific applications, MOCCA combines the advantages of component orientation with the flexible and reconfigurable H2O middleware. By exploiting unique capabilities in H2O, including client-provider separation, security, and negotiable transport protocols, enhancements to both functionality and performance could be attained. The design and implementation of MOCCA highlights the natural match between CCA components and H2O pluglets, both in structure and invocation methodology. An outline of how native CCA modules can be supported in the MOCCA framework describes the potential for future deployment of legacy codes on metacomputing systems. We also report on preliminary experiences with test applications and sample performance measurements that favorably compare MOCCA to alternative component frameworks for tightly- and loosely-coupled metacomputing systems.
DOI: 10.4018/978-1-60566-374-6.ch027
2009
Cited 8 times
Virtual Laboratory for Collaborative Applications
Advanced research in life sciences calls for new information technology solutions to support complex, collaborative computer simulations and result analysis. This chapter presents the ViroLab virtual laboratory, which is an integrated system of dedicated tools and services, providing a common space for planning, building, improving and performing in-silico experiments by different groups of users. Within the virtual laboratory collaborative applications are built as experiment plans, using a notation based on the Ruby scripting language. During experiment execution, provenance data is created and stored. The virtual laboratory enables access to distributed, heterogeneous data resources, computational resources in Grid systems, clusters and standalone computers. The process of application development as well as the architecture and functionality of the virtual laboratory are demonstrated using a real-life example from the HIV treatment domain.
DOI: 10.1016/j.cmpb.2017.05.006
2017
Cited 6 times
Support for Taverna workflows in the VPH-Share cloud platform
Background and objective: To address the increasing need for collaborative endeavours within the Virtual Physiological Human (VPH) community, the VPH-Share collaborative cloud platform allows researchers to expose and share sequences of complex biomedical processing tasks in the form of computational workflows. The Taverna Workflow System is a very popular tool for orchestrating complex biomedical & bioinformatics processing tasks in the VPH community. This paper describes the VPH-Share components that support the building and execution of Taverna workflows, and explains how they interact with other VPH-Share components to improve the capabilities of the VPH-Share platform. Methods: Taverna workflow support is delivered by the Atmosphere cloud management platform and the VPH-Share Taverna plugin. These components are explained in detail, along with the two main procedures that were developed to enable this seamless integration: workflow composition and execution. Results: 1) Seamless integration of VPH-Share with other components and systems. 2) Extended range of different tools for workflows. 3) Successful integration of scientific workflows from other VPH projects. 4) Execution speed improvement for medical applications. Conclusion: The presented workflow integration provides VPH-Share users with a wide range of different possibilities to compose and execute workflows, such as desktop or online composition, online batch execution, multithreading, remote execution, etc. The specific advantages of each supported tool are presented, as are the roles of Atmosphere and the VPH-Share plugin within the VPH-Share project. The combination of the VPH-Share plugin and Atmosphere endows the VPH-Share infrastructure with far more flexible, powerful and usable capabilities for the VPH-Share community. As both components can continue to evolve and improve independently, we acknowledge that further improvements are still to be developed and will be described.
DOI: 10.1016/j.procs.2017.05.192
2017
Cited 6 times
Smart levee monitoring and flood decision support system: reference architecture and urgent computing management
Real-time disaster management and decision support systems rely on complex deadline-driven simulations and require advanced middleware services to ensure that the requested deadlines are met. In this paper we propose a reference architecture of an integrated smart levee monitoring and flood decision support system, focusing on the decision support workflow and urgent computing management. The architecture is implemented in the ISMOP project where controlled flooding experiments are conducted using a full-scale experimental smart levee. While the system operating in the ISMOP project monitors a test levee, it is designed to be scalable to large-scale flood scenarios.
DOI: 10.48550/arxiv.1712.06153
2017
Cited 6 times
First measurement of elastic, inelastic and total cross-section at $\sqrt{s}=13$ TeV by TOTEM and overview of cross-section data at LHC energies
The TOTEM collaboration has measured the proton-proton total cross section at $\sqrt{s}=13$ TeV with a luminosity-independent method. Using dedicated $\beta^{*}=90$ m beam optics, the Roman Pots were inserted very close to the beam. The inelastic scattering rate has been measured by the T1 and T2 telescopes during the same LHC fill. After applying the optical theorem the total proton-proton cross section is $\sigma_{\rm tot}=(110.6 \pm 3.4)$ mb, well in agreement with the extrapolation from lower energies. This method also allows one to derive the luminosity-independent elastic and inelastic cross sections: $\sigma_{\rm el} = (31.0 \pm 1.7)$ mb and $\sigma_{\rm inel} = (79.5 \pm 1.8)$ mb.
DOI: 10.1007/978-3-030-50433-5_40
2020
Cited 6 times
Foundations for Workflow Application Scheduling on D-Wave System
Many scientific processes and applications can be represented in the standardized form of workflows. One of the key challenges related to managing and executing workflows is scheduling. As an NP-hard problem with exponential complexity it imposes limitations on the size of practically solvable problems. In this paper, we present a solution to the challenge of scheduling workflow applications with the help of the D-Wave quantum annealer. To the best of our knowledge, there is no other work directly addressing workflow scheduling using quantum computing. Our solution includes transformation into a Quadratic Unconstrained Binary Optimization (QUBO) problem and discussion of experimental results, as well as possible applications of the solution. For our experiments we choose four problem instances small enough to fit into the annealer’s architecture. For two of our instances the quantum annealer finds the global optimum for scheduling. We thus show that it is possible to solve such problems with the help of the D-Wave machine and discuss the limitations of this approach.
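A tiny instance of the QUBO formulation idea can be sketched with plain Python dictionaries, where a binary variable x_{t,m} = 1 means task t runs on machine m. The penalty weight and the toy costs are assumptions, and the paper's full formulation (deadlines, workflow dependencies, machine capacity) is more involved; the resulting dictionary has the standard QUBO form accepted by annealing samplers.

```python
# Toy QUBO for assigning 3 independent tasks to 2 machines so that total cost is
# minimized and each task is placed exactly once. Costs and the penalty weight are
# illustrative; the paper's formulation additionally encodes deadlines and dependencies.
from itertools import product

tasks = ["t1", "t2", "t3"]
machines = ["fast", "slow"]
cost = {("t1", "fast"): 4, ("t1", "slow"): 1,
        ("t2", "fast"): 6, ("t2", "slow"): 2,
        ("t3", "fast"): 2, ("t3", "slow"): 5}
PENALTY = 20   # must dominate the cost terms so the one-hot constraints are respected

def var(t, m):
    return f"x_{t}_{m}"

Q = {}   # {(var_i, var_j): weight} -- QUBO dictionary

# Objective terms (assignment costs) go on the diagonal.
for t, m in product(tasks, machines):
    key = (var(t, m), var(t, m))
    Q[key] = Q.get(key, 0) + cost[(t, m)]

# One-hot penalty PENALTY * (sum_m x_{t,m} - 1)^2 for each task t,
# expanded into linear (diagonal) and quadratic (off-diagonal) QUBO terms.
for t in tasks:
    for m in machines:
        key = (var(t, m), var(t, m))
        Q[key] = Q.get(key, 0) - PENALTY
    for i in range(len(machines)):
        for j in range(i + 1, len(machines)):
            key = (var(t, machines[i]), var(t, machines[j]))
            Q[key] = Q.get(key, 0) + 2 * PENALTY

# Brute-force the minimum-energy assignment (feasible for this toy size);
# the same dictionary could instead be handed to a D-Wave/Ocean sampler.
def energy(assignment):
    return sum(w * assignment[i] * assignment[j] for (i, j), w in Q.items())

variables = sorted({v for pair in Q for v in pair})
best = min((dict(zip(variables, bits)) for bits in product([0, 1], repeat=len(variables))),
           key=energy)
print({v: b for v, b in best.items() if b})
```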
DOI: 10.1109/ccgrid51090.2021.00095
2021
Cited 5 times
Algorithms for scheduling scientific workflows on serverless architecture
Serverless computing is a novel cloud computing paradigm where the cloud provider manages the underlying infrastructure, while users are only required to upload the code of the application. Function as a Service (FaaS) is a serverless computing model where short-lived methods are executed in the cloud. One of the promising use cases for FaaS is running scientific workflow applications, which represent a scientific process composed of related tasks. Due to the distinctive features of FaaS, which include rapid resource provisioning, indirect infrastructure management, and a fine-grained billing model, a need arises to create dedicated scheduling methods to effectively use these novel infrastructures as an environment for workflow applications. In this paper we propose two novel scheduling algorithms, SMOHEFT and SML, which are designed to create a schedule for executing scientific workflows on serverless infrastructures under time and cost constraints. We evaluated the proposed algorithms through experiments in which we planned the execution of three applications: Ellipsoids, Vina and Montage. The SDBWS and SDBCS algorithms were used as baselines. SML achieved the best results when executing the Ellipsoids workflow, with a success rate above 80%, while the other algorithms were below 60%. In the case of Vina, all the algorithms except SDBWS had a success rate above 87.5%, and in the case of Montage, the success rate of all algorithms was similar, over 87.5%. The success rate of the proposed algorithms is comparable to or better than that offered by other studied solutions.
DOI: 10.1007/978-3-319-32149-3_27
2016
Cited 5 times
A Lightweight Approach for Deployment of Scientific Workflows in Cloud Infrastructures
We propose a lightweight solution for deployment of scientific workflows in diverse cloud platforms. In the proposed deployment model, an instance of a workflow runtime environment is created on demand in the cloud as part of the workflow application. Such an approach improves isolation and helps overcome major issues of alternative solutions, leading to an easier integration. The concept has been implemented in the HyperFlow workflow environment. We describe the approach in general and illustrate it with two case studies showing the integration of HyperFlow with the PLGrid infrastructure, and the PaaSage cloud platform. Lessons learned from these two experiences lead to the conclusion that the proposed solution minimizes the development effort required to implement the integration, accelerates the deployment process in a production system, and reduces maintenance issues. Performance evaluation proves that, for certain workflows, the proposed approach can lead to significant improvement of the workflow execution time.
DOI: 10.1007/978-3-540-24689-3_38
2004
Cited 9 times
The CrossGrid Architecture: Applications, Tools, and Grid Services
This paper describes the current status of the CrossGrid Project architecture. The architecture is divided into six layers. The relations between main components are presented in UML notation. A flexible concept of plugins that enable creation of uniform user-friendly interface is shown. A brief discussion of OGSA technology and its possible application to CrossGrid services is given as an interesting area of future work.
DOI: 10.1007/978-0-387-72498-0_9
2007
Cited 7 times
Interoperability of Grid component models: GCM and CCA case study
This paper presents a case study in the generic design of Grid component models. It defines a framework allowing two component systems, one running in a CCA environment and another running in a Fractal environment, to interact as if they were elements of the same system. This work demonstrates the openness of both the Fractal and CCA component models. It also gives a very generic and exhaustive overview of the interaction strategies that can be adopted to allow full integration of these two models, such as strategies for reusing single components from the CCA world in Fractal and connecting a Fractal system to an already running CCA assembly. Finally, it presents the implementation and results of an investigation of interoperability between two given component frameworks: MOCCA and ProActive. In general, this paper presents the key concepts useful for making any two component models interoperate.
DOI: 10.6026/97320630003177
2008
Cited 6 times
Never born proteins as a test case for ab initio protein structures prediction
The number of natural proteins, although large, is significantly smaller than the theoretical number of proteins that can be obtained by combining the 20 natural amino acids, the so-called "never born proteins" (NBPs). The study of the structure and properties of these proteins allows one to investigate the sources of the natural proteins being of unique characteristics or special properties. However, the structural study of NBPs can also be regarded as an ideal test for evaluating the efficiency of software packages for ab initio protein structure prediction. In this research, 10,000 three-dimensional structures of proteins with completely random sequences, generated according to ROSETTA and the FOD model, were compared. The results show the limits of these software packages, but at the same time indicate that in many cases there is a significant agreement between the predictions obtained.
2018
Cited 5 times
First determination of the $\rho $ parameter at $\sqrt{s} = 13$ TeV -- probing the existence of a colourless three-gluon bound state
The TOTEM experiment at the LHC has performed the first measurement at $\sqrt{s} = 13$ TeV of the $\rho$ parameter, the real to imaginary ratio of the nuclear elastic scattering amplitude at $t=0$, obtaining the following results: $\rho = 0.09 \pm 0.01$ and $\rho = 0.10 \pm 0.01$, depending on different physics assumptions and mathematical modelling. The unprecedented precision of the $\rho$ measurement, combined with the TOTEM total cross-section measurements in an energy range larger than 10 TeV (from 2.76 to 13 TeV), has implied the exclusion of all the models classified and published by COMPETE. The $\rho$ results obtained by TOTEM are compatible with the predictions, from alternative theoretical models both in the Regge-like framework and in the QCD framework, of a colourless 3-gluon bound state exchange in the $t$-channel of proton-proton elastic scattering. On the contrary, if it is shown that the 3-gluon bound state $t$-channel exchange is not of importance for the description of elastic scattering, the $\rho$ value determined by TOTEM would represent a first evidence of a slowing down of the total cross-section growth at higher energies. The very low-$|t|$ reach also allowed the determination of the absolute normalisation using the Coulomb amplitude for the first time at the LHC and a new total proton-proton cross-section measurement $\sigma_{tot} = 110.3 \pm 3.5$ mb, completely independent from the previous TOTEM determination. Combining the two TOTEM results yields $\sigma_{tot} = 110.5 \pm 2.4$ mb.
DOI: 10.1007/978-3-319-32152-3_9
2016
Cited 4 times
Adaptive Multi-level Workflow Scheduling with Uncertain Task Estimates
Scheduling of scientific workflows in IaaS clouds with pay-per-use pricing model and multiple types of virtual machines is an important challenge. Most static scheduling algorithms assume that the estimates of task runtimes are known in advance, while in reality the actual runtime may vary. To address this problem, we propose an adaptive scheduling algorithm for deadline constrained workflows consisting of multiple levels. The algorithm produces a global approximate plan for the whole workflow in a first phase, and a local detailed schedule for the current level of the workflow. By applying this procedure iteratively after each level completes, the algorithm is able to adjust to the runtime variation. For each phase we propose optimization models that are solved using Mixed Integer Programming (MIP) method. The preliminary simulation results using data from Amazon infrastructure, and both synthetic and Montage workflows, show that the adaptive approach has advantages over a static one.
DOI: 10.1016/j.procs.2015.05.230
2015
Cited 4 times
Leveraging Workflows and Clouds for a Multi-frontal Solver for Finite Element Meshes
Scientific workflows in clouds have been successfully used for automation of large-scale computations, but so far they were applied to loosely coupled problems, where most workflow tasks can be processed independently in parallel and do not require a high volume of communication. The multi-frontal solver algorithm for finite element meshes can be represented as a workflow, but the fine granularity of the resulting tasks and the large communication-to-computation ratio make it hard to execute efficiently in loosely coupled environments such as Infrastructure-as-a-Service clouds. In this paper, we hypothesize that there exists a class of meshes that can be effectively decomposed into a workflow and mapped onto a cloud infrastructure. To show that, we have developed a workflow-based multi-frontal solver using the HyperFlow workflow engine, which comprises workflow generation from the elimination tree, analysis of the workflow structure, task aggregation based on estimated computation costs, and distributed execution using a dedicated worker service that can be deployed in clouds or clusters. The results of our experiments using workflows of over 10,000 tasks indicate that after task aggregation the resulting workflows of over 100 tasks can be efficiently executed and the overheads are not prohibitive. These results lead us to conclude that our approach is feasible and gives prospects for providing a generic workflow-based solution using clouds for problems typically considered as requiring HPC infrastructure.
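The task-aggregation step mentioned above can be illustrated with a short sketch that greedily packs fine-grained sibling tasks into coarser workflow nodes until a target estimated cost is reached. The grouping rule and the cost threshold are illustrative assumptions, not the exact procedure used in the HyperFlow-based solver.

```python
# Sketch of aggregating fine-grained sibling tasks into coarser tasks until each
# group reaches a target estimated cost, so that per-task overheads stop dominating.
# The greedy rule and the threshold are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    est_cost: float          # estimated computation cost (e.g. seconds)

@dataclass
class AggregatedTask:
    members: list = field(default_factory=list)

    @property
    def est_cost(self):
        return sum(t.est_cost for t in self.members)

def aggregate(tasks, target_cost):
    """Greedily pack sibling tasks into groups of roughly target_cost each."""
    groups, current = [], AggregatedTask()
    for task in sorted(tasks, key=lambda t: t.est_cost, reverse=True):
        current.members.append(task)
        if current.est_cost >= target_cost:
            groups.append(current)
            current = AggregatedTask()
    if current.members:
        groups.append(current)
    return groups

if __name__ == "__main__":
    leaves = [Task(f"eliminate_{i}", est_cost=0.1 + (i % 7) * 0.05) for i in range(100)]
    groups = aggregate(leaves, target_cost=3.0)
    print(len(leaves), "tasks aggregated into", len(groups), "workflow nodes")
```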
DOI: 10.1007/11752578_84
2006
Cited 7 times
A Grid Service for Management of Multiple HLA Federate Processes
The subject of this paper is a Grid management service called the HLA-Speaking Service, which interfaces an actual High Level Architecture (HLA) application with the Grid HLA Management System (GHMS). The HLA-Speaking Service is responsible for execution of application code on the site on which it resides and manages multiple federate processes. The design of the architecture is based on the OGSA concept, which allows for modularity and compatibility with Grid Services already being developed. We present the functionality of the service with an example of an N-body simulation of a dense stellar system.
DOI: 10.1007/978-3-540-68111-3_113
2008
Cited 5 times
Universal Grid Client: Grid Operation Invoker
DOI: 10.1016/j.procs.2017.05.016
2017
Cited 4 times
Topology-aware Job Allocation in 3D Torus-based HPC Systems with Hard Job Priority Constraints
In this paper, we address the topology-aware job allocation problem on 3D torus-based high performance computing systems, with the objective of reducing system fragmentation. Firstly, we propose a group-based job allocation strategy, which leads to a more global optimization of resource allocation. Secondly, we propose two shape allocation methods to determine the topological shape for each input job, including a zigzag allocation method for communication non-sensitive jobs, and a convex allocation method for communication sensitive jobs. Thirdly, we propose a topology-aware job mapping algorithm to reduce the system fragmentation brought in by the job mapping process, including a target bin selection method and a bi-directional job mapping method. The evaluation results validate the efficiency of our approach in reducing system fragmentation and improving system utilization.
DOI: 10.1007/978-3-030-29400-7_18
2019
Cited 4 times
Declarative Big Data Analysis for High-Energy Physics: TOTEM Use Case
The High-Energy Physics community faces new data processing challenges caused by the expected growth of data resulting from the upgrade of the LHC accelerator. These challenges drive the demand for exploring new approaches to data analysis. In this paper, we present a new declarative programming model extending the popular ROOT data analysis framework, and its distributed processing capability based on Apache Spark. The developed framework enables high-level operations on the data, known from other big data toolkits, while preserving compatibility with existing HEP data files and software. In our experiments with a real analysis of TOTEM experiment data, we evaluate the scalability of this approach and its prospects for interactive processing of such large data sets. Moreover, we show that the analysis code developed with the new model is portable between a production cluster at CERN and an external cluster hosted in the Helix Nebula Science Cloud, thanks to the Science Box bundle of services.
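The declarative style referred to above can be illustrated with a short PyROOT RDataFrame chain. The file, tree and column names below are placeholders, and running the same chain distributed over Spark relies on ROOT's experimental distributed RDataFrame backends, whose exact API may differ between ROOT versions.

```python
# Illustrative PyROOT RDataFrame analysis expressed declaratively; file, tree and
# column names are placeholders. The paper's contribution is running this style of
# chain distributed over Spark, exposed in recent ROOT releases through the
# experimental distributed RDataFrame backends (API may vary across versions).
import ROOT

df = ROOT.RDataFrame("tree", "data.root")          # local execution for illustration

h = (df.Filter("track_pt > 1.0", "pt cut")         # declarative selection
       .Define("track_pt_sq", "track_pt * track_pt")
       .Histo1D(("h_pt_sq", "pt^2;pt^2;events", 100, 0.0, 100.0), "track_pt_sq"))

print("entries after selection:", h.GetEntries())  # triggers the lazy event loop
```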
DOI: 10.1098/rsfs.2020.0006
2020
Cited 4 times
The EurValve model execution environment
The goal of this paper is to present a dedicated high-performance computing (HPC) infrastructure which is used in the development of a so-called reduced-order model (ROM) for simulating the outcomes of interventional procedures which are contemplated in the treatment of valvular heart conditions. Following a brief introduction to the problem, the paper presents the design of a model execution environment, in which representative cases can be simulated and the parameters of the ROM fine-tuned to enable subsequent deployment of a decision support system without further need for HPC. The presentation of the system is followed by information concerning its use in processing specific patient cases in the context of the EurValve international collaboration.
DOI: 10.1007/978-3-031-08754-7_50
2022
CXR-FL: Deep Learning-Based Chest X-ray Image Analysis Using Federated Learning
Federated learning enables building a shared model from multicentre data while storing the training data locally for privacy. In this paper, we present an evaluation (called CXR-FL) of deep learning-based models for chest X-ray image analysis using the federated learning method. We examine the impact of federated learning parameters on the performance of central models. Additionally, we show that classification models perform worse if trained on a region of interest reduced to segmentation of the lung compared to the full image. However, focusing training of the classification model on the lung area may result in improved pathology interpretability during inference. We also find that federated learning helps maintain model generalizability. The pre-trained weights and code are publicly available at ( https://github.com/SanoScience/CXR-FL ).
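As a generic illustration of the federated setting (not the CXR-FL code, which is available at the linked repository), the sketch below shows a federated-averaging step in PyTorch in which locally trained weights from several clients are combined into the central model; the toy network and the local dataset sizes are assumptions.

```python
# Generic federated averaging (FedAvg) of locally trained model weights, shown with
# a small CNN stand-in; an illustrative sketch, not the CXR-FL code referenced above.
import copy
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(4))
        self.classifier = nn.Linear(8 * 4 * 4, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def federated_average(client_states, client_sizes):
    """Weight each client's parameters by its local dataset size and average them."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(state[key].float() * (size / total)
                       for state, size in zip(client_states, client_sizes))
    return avg

if __name__ == "__main__":
    global_model = TinyCNN()
    # In a real round, each client loads the global weights, trains locally on its
    # private chest X-ray data, and returns only the updated state_dict.
    clients = [copy.deepcopy(global_model) for _ in range(3)]
    states = [c.state_dict() for c in clients]
    sizes = [120, 340, 90]                       # illustrative local dataset sizes
    global_model.load_state_dict(federated_average(states, sizes))
```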
DOI: 10.1109/mic.2022.3168810
2022
Serverless Computing for Scientific Applications
Serverless computing has become an important model in cloud computing and has influenced the design of many applications. Here, we provide our perspective on what the recent landscape of serverless computing for scientific applications looks like. We discuss the advantages and problems of serverless computing for scientific applications, and based on the analysis of existing solutions and approaches, we propose a science-oriented architecture for a serverless computing framework that builds on the existing designs. Finally, we provide an outlook on current trends and future directions.
2007
Cited 5 times
Collaborative Virtual Laboratory for e-Health
This paper describes the Virtual Laboratory for e-Health system which is currently being developed in the EU IST ViroLab project. The Virtual Laboratory is an environment that enables clinical researchers to prepare and execute computing experiments using a distributed Grid infrastructure, without requiring in-depth knowledge of Grid computing technologies. By virtualizing the hardware, computing infrastructure and databases, the Virtual Laboratory is a user-friendly environment, with tailored workflow templates to harness and automate such diverse tasks as data archiving, data integration, data mining and analysis, modeling and simulation.
2008
Cited 4 times
Invocation of Grid operations in the ViroLab Virtual Laboratory
This paper presents the invocation of grid operations within the ViroLab Virtual Laboratory. The Virtual Laboratory enables users to develop and execute experiments that access computational resources on the Grid exposed via various middleware technologies. An abstraction over the Grid environment is introduced which is based on the concept of grid objects accessible from the Ruby-based experiment script. We describe the Grid Operation Invoker library, which is the core of the virtual laboratory engine and provides access to heterogeneous computational resources in a uniform manner using pluggable adapters. Sample applications include a script which implements a data mining scenario using the Weka library and combines Web services with MOCCA technologies.
DOI: 10.1007/978-3-642-03644-6_18
2009
Cited 4 times
ViroLab Security and Virtual Organization Infrastructure
This paper introduces security requirements and solutions present in the ViroLab Virtual Laboratory. Our approach is to use a federated Single Sign-On mechanism based on the Shibboleth framework that enables multiple partners to authenticate against their local identity systems and use resources provided by all other partners. Since the basic Shibboleth capabilities do not meet our specific requirements related to supporting non-web-based services, we created a set of custom tools that allow us to develop a homogeneous, Shibboleth-based security solution for both Web and non-web-based software components. This paper describes these tools in detail, together with other services of the virtual laboratory which have been integrated with the security infrastructure. A decentralized, attribute-based approach facilitating the creation and management of virtual organizations is the key achievement of our work.
DOI: 10.1007/978-0-387-78448-9_25
2008
Cited 4 times
High-Level Scripting Approach for Building Component-Based Applications on the Grid
DOI: 10.1007/978-3-540-39924-7_82
2003
Cited 6 times
Component-Based System for Grid Application Workflow Composition
An application working within a Grid environment can be very complex, with distributed modules and decentralized computation. It is not a simple task to dispatch that kind of application, especially when the environment is changing. This paper presents the design and the implementation of the Application Flow Composer system which supports building the description of the Grid application flow by combining its elements from a loose set of components distributed in the Grid. The system is based on the Common Component Architecture (CCA) and uses the CCA distributed component description model. OGSA Registry Grid Service is applied for storing component description documents. The performance tests confirm the feasibility of this approach.
DOI: 10.1007/978-3-540-24669-5_113
2004
Cited 5 times
Execution and Migration Management of HLA-Based Interactive Simulations on the Grid
This paper presents the design of a system that supports execution of HLA distributed interactive simulations in an unreliable Grid environment. The design of the architecture is based on the OGSA concept, which allows for modularity and compatibility with Grid Services already being developed. First of all, we focus on the part of the system that is responsible for migration of an HLA-connected component or components of the distributed application in the Grid environment. We present a runtime support library for easily plugging HLA simulations into the Grid Services Framework. We also present the impact of execution management (namely migration) on overall system performance.
DOI: 10.1007/978-3-319-61756-5_12
2017
Cited 3 times
Topology-Aware Scheduling on Blue Waters with Proactive Queue Scanning and Migration-Based Job Placement
Modern HPC systems, such as Blue Waters, have multidimensional torus topologies, which make it hard to achieve high system utilization and high scheduling efficiency. The low system utilization is mainly caused by system fragmentation, which includes both internal fragmentation due to the convex prism shape requirement and external fragmentation resulting from the contiguous allocation strategy. The low scheduling efficiency comes from using a brute-force search to find the free block with a matching shape for each job, which is highly time-consuming. In this paper, we address the topology-aware scheduling problem on Blue Waters, with the objective of improving system utilization and scheduling efficiency. To improve scheduling efficiency, we propose an efficient free partition detection method. To improve system utilization, we propose a job scheduling strategy with proactive queue scanning and a migration-based job placement algorithm. Through extensive simulations of modeled trace data, we demonstrate that our approach improves system utilization.
DOI: 10.1007/978-3-030-43229-4_25
2020
Cited 3 times
Cloud Infrastructure Automation for Scientific Workflows
We present a solution for cloud infrastructure automation for scientific workflows. Unlike existing approaches, our solution is based on widely adopted tools, such as Terraform, and achieves a strict separation of two concerns: infrastructure description and provisioning vs. workflow description. At the same time it enables a comprehensive integration with a given cloud infrastructure, i.e. one wherein workflow execution can be managed by the cloud. The solution is integrated with our HyperFlow workflow management system and evaluated by demonstrating its use in experiments related to auto-scaling of scientific workflows in two types of cloud infrastructures: containerized Infrastructure-as-a-Service (IaaS) and Function-as-a-Service (FaaS). The experimental evaluation involves deployment and execution of a test workflow in an Amazon ECS/Docker cluster and on a hybrid of Amazon ECS and AWS Lambda. The results show that our solution not only helps in the creation of repeatable infrastructures for scientific computing but also greatly facilitates automation of research experiments related to the execution of scientific workflows on advanced computing infrastructures.
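A minimal sketch of driving Terraform from a workflow-management layer is shown below: apply an infrastructure description with experiment-specific variables, then read the provisioned endpoints back from Terraform outputs. The directory layout and the variable and output names are placeholders, not HyperFlow's actual integration.

```python
# Sketch of driving Terraform from a workflow-management layer: apply an
# infrastructure description with experiment-specific variables, then read the
# provisioned endpoints back from Terraform outputs. Paths, variable and output
# names are placeholders.
import json
import subprocess

def terraform(args, workdir):
    return subprocess.run(["terraform", *args], cwd=workdir,
                          check=True, capture_output=True, text=True)

def provision(workdir, worker_count):
    terraform(["init", "-input=false"], workdir)
    terraform(["apply", "-auto-approve", "-input=false",
               f"-var=worker_count={worker_count}"], workdir)
    outputs = json.loads(terraform(["output", "-json"], workdir).stdout)
    return {name: value["value"] for name, value in outputs.items()}

def destroy(workdir):
    terraform(["destroy", "-auto-approve", "-input=false"], workdir)

if __name__ == "__main__":
    endpoints = provision("./infra/ecs-cluster", worker_count=8)
    print("provisioned:", endpoints)   # e.g. {"cluster_arn": "...", "queue_url": "..."}
    # ... execute the workflow against the provisioned infrastructure ...
    destroy("./infra/ecs-cluster")
```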
DOI: 10.1145/3452413.3464788
2020
Cited 3 times
Distributed Parallel Analysis Engine for High Energy Physics Using AWS Lambda
The High-Energy Physics experiments at CERN produce a high volume of data; it is not possible to analyze large chunks of it within a reasonable time on any single machine. The ROOT framework was recently extended with distributed computing capabilities for massively parallelized RDataFrame applications. This approach, using the MapReduce pattern underneath, makes heavy computations much more approachable, even for newcomers.
DOI: 10.1109/e-science.2006.127
2006
Cited 4 times
Semantic Composition of Scientific Workflows Based on the Petri Nets Formalism
The idea of an application described through its workflow is becoming popular in the Grid community as a natural method of functional decomposition of an application. It shows all the important dependencies as a set of connections of data flow and/or control flow. As scientific workflows grow in size and complexity, a tool to assist end users is becoming necessary. In this paper we describe the formal basis, design and implementation of such a tool -- an assistant which analyzes user requirements regarding application results and works with information registries that provide information on resources available in the Grid. The Workflow Composition Tool (WCT) provides the functionality of automatic workflow construction based on the process of semantic service discovery and matchmaking. It uses a well-designed construction algorithm together with specific heuristics in order to provide useful solutions for application users.
2008
Cited 3 times
GridSpace Engine of the ViroLab Virtual Laboratory
GridSpace Engine is the central operational unit of the ViroLab Virtual Laboratory. This specific runtime environment enables access to computational and data resources by coordinating execution of experiments written in the Ruby programming language extended with virtual laboratory capabilities. Experiments harness published and semantically described services which constitute a GridSpace. The GridSpace Engine is a reliable service acting as an entry point to the Virtual Laboratory, its execution capabilities and a facade for specialized services such as Data Access Service. Moreover, owing to the provided dedicated libraries, the GridSpace Engine supports interactive execution and run-time monitoring of experiments. Furthermore, the GridSpace Engine is capable of retrieving experiment source not only from file systems but also from multiple Application Repositories accessed by dedicated adapters. Currently, our repository is based on the Subversion source code management and version control system. The GridSpace Engine is also responsible for storing obtained experimental results in the Laboratory Data Base.
2008
Cited 3 times
ViroLab Virtual Laboratory
DOI: 10.1007/978-3-642-28267-6_20
2012
Examining Protein Folding Process Simulation and Searching for Common Structure Motifs in a Protein Family as Experiments in the GridSpace2 Virtual Laboratory
This paper presents two in-silico experiments from the field of bioinformatics. The first experiment covers the popular problem of protein folding process simulation and investigates the correctness of the “Fuzzy Oil Drop” (FOD) model [3] on over 60 thousand proteins deposited in the Protein Data Bank [18]. The FOD model assumes the hydrophobicity distribution in proteins to be accordant with a 3D Gauss function, with the hydrophobicity density decreasing from the highest in the center of the molecule to zero at the surface. The second experiment focuses on performing comparisons of proteins that belong to the same family. Examination of protein alignment at three different levels of protein description may lead to identifying a conserved area in a protein family which is responsible for the protein function. It also creates a possibility of determining a ligand binding site for a protein, which is a key issue in drug design. Both experiments were realized as virtual experiments in the GridSpace2 Virtual Laboratory [13] Experiment Workbench [16] and were executed on the Zeus cluster provided by PL-Grid.
DOI: 10.48550/arxiv.2302.03616
2023
Can gamification reduce the burden of self-reporting in mHealth applications? A feasibility study using machine learning from smartwatch data to estimate cognitive load
The effectiveness of digital treatments can be measured by requiring patients to self-report their state through applications; however, this can be overwhelming and cause disengagement. We conduct a study to explore the impact of gamification on self-reporting. Our approach involves the creation of a system to assess cognitive load (CL) through the analysis of photoplethysmography (PPG) signals. The data from 11 participants is utilized to train a machine learning model to detect CL. Subsequently, we create two versions of surveys: a gamified and a traditional one. We estimate the CL experienced by the other participants (13) while completing the surveys. We find that CL detector performance can be enhanced via pre-training on stress detection tasks. For 10 out of 13 participants, a personalized CL detector can achieve an F1 score above 0.7. We find no difference between the gamified and non-gamified surveys in terms of CL, but participants prefer the gamified version.
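The general pipeline can be sketched as follows: extract simple heart-rate-variability features from PPG-derived inter-beat intervals and train a classifier evaluated with the F1 score. The feature set, window construction, synthetic data and model choice are illustrative assumptions, and the pre-training on stress-detection data described in the paper is not shown.

```python
# Sketch of a cognitive-load classifier over features derived from PPG inter-beat
# intervals; feature set, window length and model choice are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def hrv_features(ibi_window_ms):
    """Simple heart-rate-variability features from one window of inter-beat intervals."""
    ibi = np.asarray(ibi_window_ms, dtype=float)
    diffs = np.diff(ibi)
    return [ibi.mean(),                      # mean IBI
            ibi.std(),                       # SDNN
            np.sqrt(np.mean(diffs ** 2)),    # RMSSD
            np.mean(np.abs(diffs) > 50)]     # pNN50

def make_dataset(windows, labels):
    return np.array([hrv_features(w) for w in windows]), np.array(labels)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: "high load" windows get shorter, less variable IBIs.
    low  = [rng.normal(850, 60, 60) for _ in range(200)]
    high = [rng.normal(780, 35, 60) for _ in range(200)]
    X, y = make_dataset(low + high, [0] * 200 + [1] * 200)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("F1:", round(f1_score(y_te, clf.predict(X_te)), 3))
```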
DOI: 10.48550/arxiv.2304.08190
2023
Serverless Approach to Sensitivity Analysis of Computational Models
Digital twins are virtual representations of physical objects or systems used for the purpose of analysis, most often via computer simulations, in many engineering and scientific disciplines. Recently, this approach has been introduced to computational medicine, within the concept of Digital Twin in Healthcare (DTH). Such research requires verification and validation of its models, as well as the corresponding sensitivity analysis and uncertainty quantification (VVUQ). From the computing perspective, VVUQ is a computationally intensive process, as it requires numerous runs with variations of input parameters. Researchers often use high-performance computing (HPC) solutions to run VVUQ studies where the number of parameter combinations can easily reach tens of thousands. However, there is a viable alternative to HPC for a substantial subset of computational models - serverless computing. In this paper we hypothesize that using the serverless computing model can be a practical and efficient approach to selected cases of running VVUQ calculations. We show this on the example of the EasyVVUQ library, which we extend by providing support for many serverless services. The resulting library - CloudVVUQ - is evaluated using two real-world applications from the computational medicine domain adapted for serverless execution. Our experiments demonstrate the scalability of the proposed approach.
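The overall idea can be sketched as a serverless parameter sweep: sample the input space and fan each model evaluation out to a cloud function. The Lambda function name and payload schema below are placeholders, and the sketch does not reproduce the EasyVVUQ/CloudVVUQ API described in the paper.

```python
# Sketch of a serverless parameter sweep for sensitivity analysis: sample the input
# space, fan each sample out to a cloud function, and collect the quantity of
# interest. The function name and payload schema are hypothetical.
import json
from concurrent.futures import ThreadPoolExecutor

import boto3
from scipy.stats import qmc

lambda_client = boto3.client("lambda")

def run_simulation(params):
    """Invoke one model evaluation as a cloud function (hypothetical function name)."""
    response = lambda_client.invoke(
        FunctionName="vvuq-model-runner",
        Payload=json.dumps({"stiffness": float(params[0]),
                            "damping": float(params[1])}).encode(),
    )
    return json.loads(response["Payload"].read())["quantity_of_interest"]

if __name__ == "__main__":
    sampler = qmc.LatinHypercube(d=2, seed=42)
    unit_samples = sampler.random(n=256)
    samples = qmc.scale(unit_samples, l_bounds=[0.5, 0.01], u_bounds=[2.0, 0.20])
    with ThreadPoolExecutor(max_workers=64) as pool:
        results = list(pool.map(run_simulation, samples))
    print("collected", len(results), "model evaluations for sensitivity analysis")
```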
DOI: 10.1007/978-3-031-36021-3_2
2023
Digital Twin Simulation Development and Execution on HPC Infrastructures
The Digital Twin paradigm in medical care has recently gained popularity among proponents of translational medicine, to enable clinicians to make informed choices regarding treatment on the basis of digital simulations. In this paper we present an overview of functional and non-functional requirements related to specific IT solutions which enable such simulations - including the need to ensure repeatability and traceability of results - and propose an architecture that satisfies these requirements. We then describe a computational platform that facilitates digital twin simulations, and validate our approach in the context of a real-life medical use case: the BoneStrength application.
DOI: 10.1109/ccgrid57682.2023.00064
2023
Serverless Approach to Sensitivity Analysis of Computational Models
Digital twins are virtual representations of physical objects or systems used for the purpose of analysis, most often via computer simulations, in many engineering and scientific disciplines. Recently, this approach has been introduced to computational medicine, within the concept of Digital Twin in Healthcare (DTH). Such research requires verification and validation of its models, as well as the corresponding sensitivity analysis and uncertainty quantification (VVUQ). From the computing perspective, VVUQ is a computationally intensive process, as it requires numerous runs with variations of input parameters. Researchers often use high-performance computing (HPC) solutions to run VVUQ studies where the number of parameter combinations can easily reach tens of thousands. However, there is a viable alternative to HPC for a substantial subset of computational models - serverless computing. In this paper we hypothesize that using the serverless computing model can be a practical and efficient approach to selected cases of running VVUQ calculations. We show this on the example of the EasyVVUQ library, which we extend by providing support for many serverless services. The resulting library - CloudVVUQ - is evaluated using two real-world applications from the computational medicine domain adapted for serverless execution. Our experiments demonstrate the scalability of the proposed approach.
DOI: 10.21203/rs.3.rs-3173492/v1
2023
GraphTar: applying word2vec and graph neural networks to miRNA target prediction
Background: MicroRNAs (miRNAs) are short, non-coding RNA molecules that regulate gene expression by binding to specific mRNAs and inhibiting their translation. They play a critical role in regulating various biological processes and are implicated in many diseases, including cardiovascular, oncological, and gastrointestinal diseases, as well as viral infections. Computational methods that identify potential miRNA-mRNA interactions from raw data use one-dimensional miRNA-mRNA duplex representations and simple sequence encoding techniques, which may limit their performance. Results: We have developed GraphTar, a new target prediction method that uses a novel graph-based representation to reflect the spatial structure of the miRNA-mRNA duplex. Unlike existing approaches, we use the word2vec method to encode RNA sequence information. In conjunction with this encoding, we use a graph neural network classifier that predicts miRNA-mRNA interactions through graph representation learning. As part of a comparative study, we evaluate three different node embedding approaches within the GraphTar framework and compare them with other state-of-the-art target prediction methods. The results show that the proposed method achieves performance on par with the best methods in the field and outperforms them on one of the datasets. Conclusions: In this study, a novel miRNA target prediction approach called GraphTar is introduced. The results show that GraphTar is as effective as existing methods and even outperforms them in some cases, opening new avenues for further research. However, the expansion of available datasets is critical for advancing the field towards real-world applications.
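As a minimal illustration of the word2vec-based sequence encoding mentioned in this abstract, the sketch below tokenizes RNA sequences into overlapping k-mers and trains embeddings with gensim; the k-mer length and training parameters are assumptions, not GraphTar's published configuration.

```python
# Minimal sketch of word2vec-style encoding of RNA sequences via k-mer tokens.
# The k-mer length and word2vec parameters are illustrative assumptions only.
from gensim.models import Word2Vec

def kmers(seq: str, k: int = 3) -> list[str]:
    """Split an RNA sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Toy corpus: each sequence becomes a sentence of k-mers.
sequences = ["UGAGGUAGUAGGUUGUAUAGUU", "ACUAUACAACCUACUACCUCA"]
corpus = [kmers(s) for s in sequences]

model = Word2Vec(corpus, vector_size=16, window=5, min_count=1, epochs=50)

# Each k-mer now has a dense embedding that could serve as a node feature
# in a downstream graph neural network classifier.
print(model.wv["UGA"].shape)  # (16,)
```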
DOI: 10.1145/3624062.3626283
2023
Novel Approaches Toward Scalable Composable Workflows in Hyper-Heterogeneous Computing Environments
The annual Workshop on Workflows in Support of Large-Scale Science (WORKS) is a premier venue for the scientific workflow community to present the latest advances in research and development on the many facets of scientific workflows throughout their life cycle. The Lightning Talks at WORKS describe novel tools, scientific workflows, or concepts that are works in progress and address emerging technologies and frameworks, in order to foster discussion in the community. This paper summarizes the lightning talks at the 2023 edition of WORKS, covering five topics: leveraging large language models to build and execute workflows; developing a common workflow scheduler interface; scaling uncertainty workflow applications on exascale computing systems; evaluating a transcriptomics workflow for cloud vs. HPC systems; and best practices in migrating legacy workflows to workflow management systems.
DOI: 10.1007/978-3-540-24688-6_6
2004
Cited 4 times
Grid Service Registry for Workflow Composition Framework
The system presented in this paper supports the user in composing the flow of a distributed application from existing Grid services. The flow composition system builds workflows at an abstract level, using semantic and syntactic descriptions of services available in a Grid service registry. This paper presents the concept of the overall system architecture and focuses on one of the system's two main modules: the distributed Grid service registry.
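As a toy illustration of the kind of lookup such a registry could support, the sketch below stores syntactic (operation signature) and semantic (ontology tag) descriptions of services and queries them; all class and field names are hypothetical and not the registry interface from the paper.

```python
# Toy sketch of a service registry holding syntactic and semantic descriptions,
# illustrating the kind of lookup a workflow composer could perform.
from dataclasses import dataclass, field

@dataclass
class ServiceEntry:
    name: str
    endpoint: str
    operation: str                    # syntactic description: operation name
    inputs: list[str]                 # syntactic description: input types
    outputs: list[str]                # syntactic description: output types
    semantic_tags: set[str] = field(default_factory=set)  # ontology terms

class ServiceRegistry:
    def __init__(self) -> None:
        self._entries: list[ServiceEntry] = []

    def register(self, entry: ServiceEntry) -> None:
        self._entries.append(entry)

    def find(self, semantic_tag: str, output_type: str) -> list[ServiceEntry]:
        """Find services that match a semantic concept and produce a given type."""
        return [e for e in self._entries
                if semantic_tag in e.semantic_tags and output_type in e.outputs]

registry = ServiceRegistry()
registry.register(ServiceEntry(
    name="BlastService", endpoint="http://grid.example.org/blast",
    operation="align", inputs=["Sequence"], outputs=["Alignment"],
    semantic_tags={"sequence-alignment"},
))
print([e.name for e in registry.find("sequence-alignment", "Alignment")])
```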
DOI: 10.1007/978-3-540-24688-6_109
2004
Cited 4 times
Support for Effective and Fault Tolerant Execution of HLA-Based Applications in the OGSA Framework
The execution of High Level Architecture (HLA) distributed interactive simulations in an unreliable Grid environment requires efficient resource brokering, an important part of the Grid Services framework that supports the execution of such simulations. This paper presents the overall architecture of the framework, with emphasis on services supplying the Broker Service with information about application performance on a Wide Area Network (WAN). For that purpose, a benchmark interaction-simulation-visualization schema is designed, based on the CrossGrid medical application architecture [1,10].
2006
Cited 3 times
Security models for lightweight grid architectures
Security management is important for the effective functioning of a grid system. Here we present a security management model developed for a lightweight multilevel grid system. We decided to implement a role-based security model, in which a number of roles are defined for the grid. Roles are assigned to users, and this is how users receive rights in the grid; different roles carry different access restrictions. Every cluster in the grid has a security service, which handles user identification and access control. Role-based access control allows us to incorporate simple access-decision logic in the information service, which makes controlling user rights easier. Further in this paper we compare our security architecture with those of the lightweight H2O and MOCCA middleware in order to identify important requirements, common concepts, and technical solutions that may be reused.
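As a minimal sketch of the role-based access control scheme described above, the example below maps roles to permissions and users to roles, and implements a simple access-decision check; role and permission names are illustrative only, not taken from the described system.

```python
# Minimal sketch of role-based access control: roles carry permissions, users
# receive roles, and an access check consults both mappings. Names are illustrative.
ROLE_PERMISSIONS = {
    "grid-admin": {"submit-job", "cancel-job", "manage-users", "read-results"},
    "researcher": {"submit-job", "read-results"},
    "guest":      {"read-results"},
}

USER_ROLES = {
    "alice": {"grid-admin"},
    "bob":   {"researcher"},
}

def is_allowed(user: str, permission: str) -> bool:
    """Simple access-decision logic: allow if any of the user's roles grants it."""
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(is_allowed("bob", "submit-job"))    # True
print(is_allowed("bob", "manage-users"))  # False
```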
DOI: 10.1007/978-3-540-69389-5_28
2008
A Tool for Building Collaborative Applications by Invocation of Grid Operations
The motivation for this work is the need for providing tools which facilitate building scientific applications that are developed and executed on various Grid systems, implemented with different technologies. As a solution to this problem, we have developed the Grid Operation Invoker (GOI) which offers object-oriented method invocation semantics for interacting with computational services accessible with diverse middleware frameworks. GOI forms the core of the ViroLab virtual laboratory engine and it is used to invoke operations from within experiments described using a scripting notation. In this paper, after outlining the features of GOI, we describe how it is enhanced with a mechanism of so-called local gems which allows adding high-level support for middleware technologies based on the batch job-processing model, e.g. EGEE LCG/gLite. As a result, we demonstrate how a molecular dynamics program called NAMD, deployed on EGEE, was integrated with the ViroLab virtual laboratory.