Container-based bioinformatics with Pachyderm.
Novella, Jon Ander; Emami Khoonsari, Payam; Herman, Stephanie; Whitenack, Daniel; Capuccini, Marco; Burman, Joachim; Kultima, Kim; Spjuth, Ola.
Affiliations
  • Novella JA; Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
  • Emami Khoonsari P; Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala, Sweden.
  • Herman S; Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
  • Whitenack D; Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala, Sweden.
  • Capuccini M; Pachyderm, Inc., San Francisco, CA, USA.
  • Burman J; Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
  • Kultima K; Department of Information Technology, Uppsala University, Uppsala, Sweden.
  • Spjuth O; Department of Neuroscience, Uppsala University, Uppsala, Sweden.
Bioinformatics; 35(5): 839-846, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30101309
MOTIVATION: Computational biologists face many challenges related to data size, and they need to manage complicated analyses often including multiple stages and multiple tools, all of which must be deployed to modern infrastructures. To address these challenges and maintain reproducibility of results, researchers need (i) a reliable way to run processing stages in any computational environment, (ii) a well-defined way to orchestrate those processing stages and (iii) a data management layer that tracks data as it moves through the processing pipeline.

RESULTS: Pachyderm is an open-source workflow system and data management framework that fulfils these needs by creating a data pipelining and data versioning layer on top of projects from the container ecosystem, having Kubernetes as the backbone for container orchestration. We adapted Pachyderm and demonstrated its attractive properties in bioinformatics. A Helm Chart was created so that researchers can use Pachyderm in multiple scenarios. The Pachyderm File System was extended to support block storage. A wrapper for initiating Pachyderm on cloud-agnostic virtual infrastructures was created. The benefits of Pachyderm are illustrated via a large metabolomics workflow, demonstrating that Pachyderm enables efficient and sustainable data science workflows while maintaining reproducibility and scalability.

AVAILABILITY AND IMPLEMENTATION: Pachyderm is available from https://github.com/pachyderm/pachyderm. The Pachyderm Helm Chart is available from https://github.com/kubernetes/charts/tree/master/stable/pachyderm. Pachyderm is available out-of-the-box from the PhenoMeNal VRE (https://github.com/phnmnl/KubeNow-plugin) and general Kubernetes environments instantiated via KubeNow. The code of the workflow used for the analysis is available on GitHub (https://github.com/pharmbio/LC-MS-Pachyderm).

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
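ILLUSTRATION: Requirement (iii), a data layer that tracks data as it moves through the pipeline, can be pictured with a small toy in plain Python. This is only a sketch of the general idea of versioned data with provenance; it does not use or imitate Pachyderm's actual API, and the names `Commit`, `commit_input`, `run_stage` and `lineage`, as well as the "sort" and "count" stages, are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot of files plus a record of how it was produced."""
    files: tuple        # sorted (filename, content) pairs
    provenance: tuple   # ids of the commits this snapshot was derived from
    stage: str          # producing stage ("input" for raw data)

    @property
    def id(self) -> str:
        # Content-addressed id: identical data and lineage give an identical id.
        payload = json.dumps([self.files, self.provenance, self.stage])
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


def commit_input(files: dict) -> Commit:
    """Snapshot raw input data (hypothetical helper)."""
    return Commit(tuple(sorted(files.items())), (), "input")


def run_stage(name: str, func, parent: Commit) -> Commit:
    """Apply func to every file of the parent commit and record the lineage."""
    out = {fname: func(content) for fname, content in parent.files}
    return Commit(tuple(sorted(out.items())), (parent.id,), name)


def lineage(commit: Commit, store: dict) -> list:
    """Walk provenance links back to the raw input, oldest stage first."""
    chain = [commit.stage]
    while commit.provenance:
        commit = store[commit.provenance[0]]
        chain.append(commit.stage)
    return chain[::-1]


# Two toy stages: sort the tokens of each file, then count them.
raw = commit_input({"sample1.txt": "3 1 2"})
sorted_commit = run_stage("sort", lambda s: " ".join(sorted(s.split())), raw)
counted = run_stage("count", lambda s: str(len(s.split())), sorted_commit)

store = {c.id: c for c in (raw, sorted_commit, counted)}
print(lineage(counted, store))
```

Because each output snapshot stores the id of the snapshot it was derived from, any result can be traced back to its exact input, and re-running the same stage on the same data yields the same id. That is the property the abstract refers to as maintaining reproducibility, shown here in miniature.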
Subject(s)

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Software / Computational Biology | Study type: Prognostic_studies | Language: En | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2019 | Document type: Article | Country of affiliation: Sweden
