DOI: 10.12688/f1000research.10137.1
Open Access: Gold
This work has “Gold” OA status. This means it is published in an Open Access journal that is indexed by the DOAJ.

The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows

Brian D. O’Connor, Denis Yuen, Vincent Chung, A. Duncan, Xiang Kun Liu, Janice Patricia, Benedict Paten, Lincoln Stein, Vincent Ferretti

Workflow
Cloud computing
Upload
2017
<ns4:p>As genomic datasets continue to grow, downloading data to a local organization and running analyses in a traditional compute environment is becoming increasingly impractical. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure both to host and to analyze large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant-called across 14 cloud and HPC environments; the processed data were then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced highly portable Docker images to encapsulate and share complex alignment and variant-calling workflows across heterogeneous environments. While successful, this endeavor revealed a limitation of Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside them. As a result, we created the Dockstore (<ns4:ext-link xmlns:ns3="http://www.w3.org/1999/xlink" ext-link-type="uri" ns3:href="https://dockstore.org">https://dockstore.org</ns4:ext-link>), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools they contain. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).</ns4:p>
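To make the abstract's central idea concrete (a Docker image paired with a machine-readable description of how to run the tool inside it), here is a minimal sketch under stated assumptions. `ToolDescriptor`, `build_docker_command`, the field names and the image tag `example/aligner:1.0` are all hypothetical; the format is far simpler than the workflow descriptor languages Dockstore actually registers and is not its API. It only shows how a declared command template and declared inputs let a generic runner build a valid invocation without reading the image's documentation.

```python
"""Sketch: pairing a Docker image with a machine-readable tool description.

All names here (ToolDescriptor, build_docker_command, the fields and the
example image tag) are hypothetical illustrations, not Dockstore's format.
"""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolDescriptor:
    # Docker image that encapsulates the tool (hypothetical tag).
    image: str
    # Command template; {name} placeholders are filled from the inputs.
    command: List[str]
    # Declared input names, so a runner knows what the tool expects.
    inputs: List[str] = field(default_factory=list)


def build_docker_command(tool: ToolDescriptor, values: Dict[str, str]) -> List[str]:
    """Return a `docker run` argument list for the described tool.

    Raises ValueError if a declared input is missing or an undeclared one is given.
    """
    missing = [name for name in tool.inputs if name not in values]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    unknown = [name for name in values if name not in tool.inputs]
    if unknown:
        raise ValueError(f"undeclared inputs: {unknown}")
    # Substitute input values into each token of the command template.
    args = [token.format(**values) for token in tool.command]
    # Note: mounting input files into the container (-v ...) is omitted here.
    return ["docker", "run", "--rm", tool.image, *args]


if __name__ == "__main__":
    aligner = ToolDescriptor(
        image="example/aligner:1.0",
        command=["align", "--ref", "{reference}", "--reads", "{reads}"],
        inputs=["reference", "reads"],
    )
    print(build_docker_command(aligner, {"reference": "ref.fa", "reads": "sample.fq"}))
```

Because the inputs are declared up front, a runner can reject a malformed invocation before any container starts. A real runner would also have to stage input files into the container, which this sketch deliberately leaves out.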