A workflow reproducibility scale for automatic validation of biological interpretation results.

Suetake, Hirotaka; Fukusato, Tsukasa; Igarashi, Takeo; Ohta, Tazro

Affiliation

Suetake H; Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, 113-0033, Japan.
Fukusato T; Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, 113-0033, Japan.
Igarashi T; Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, 113-0033, Japan.
Ohta T; Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Shizuoka, 411-8540, Japan.

Gigascience; 12; 2022 12 28.

Article in En | MEDLINE | ID: mdl-37150537

ABSTRACT

BACKGROUND:

Reproducibility of data analysis workflows is a key issue in the field of bioinformatics. Recent computing technologies, such as virtualization, have made it possible to reproduce workflow execution with ease. However, the reproducibility of results is not well discussed; that is, there is no standard way to verify whether the biological interpretation of reproduced results is the same. Therefore, automatically evaluating the reproducibility of results remains a challenge.

RESULTS:

We propose a new metric, a reproducibility scale of workflow execution results, to evaluate the reproducibility of results. This metric is based on the idea of evaluating the reproducibility of results using biological feature values (e.g., number of reads, mapping rate, and variant frequency) representing their biological interpretation. We also implemented a prototype system that automatically evaluates the reproducibility of results using the proposed metric. To demonstrate our approach, we conducted an experiment using workflows used by researchers in real research projects and the use cases that are frequently encountered in the field of bioinformatics.
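The idea of comparing biological feature values rather than raw output files can be illustrated with a minimal Python sketch. It is not the authors' prototype system: the function names, the three level labels, and the 5% tolerance are hypothetical choices made only for illustration, and the feature values are invented example numbers.

```python
from dataclasses import dataclass


@dataclass
class FeatureComparison:
    """One biological feature measured in the original and the reproduced run."""
    name: str
    original: float
    reproduced: float

    @property
    def relative_difference(self) -> float:
        # Relative difference against the larger magnitude; 0.0 when equal.
        if self.original == self.reproduced:
            return 0.0
        denominator = max(abs(self.original), abs(self.reproduced))
        return abs(self.original - self.reproduced) / denominator


def compare_features(original: dict, reproduced: dict) -> list:
    """Pair up the features that both runs report."""
    shared = sorted(original.keys() & reproduced.keys())
    return [FeatureComparison(k, original[k], reproduced[k]) for k in shared]


def reproducibility_level(comparisons: list, tolerance: float = 0.05) -> str:
    """Map the worst feature difference to a graded label.

    The labels and the default tolerance are assumptions for this sketch,
    not the scale defined in the paper.
    """
    if not comparisons:
        return "not comparable"
    worst = max(c.relative_difference for c in comparisons)
    if worst == 0.0:
        return "identical"
    if worst <= tolerance:
        return "within tolerance"
    return "different"


# Invented example values for two executions of the same workflow.
original_run = {"total_reads": 1_000_000, "mapping_rate": 0.950, "variant_frequency": 0.12}
reproduced_run = {"total_reads": 1_000_000, "mapping_rate": 0.948, "variant_frequency": 0.12}

level = reproducibility_level(compare_features(original_run, reproduced_run))
print(level)
```

A byte-level comparison of the two outputs would report a mismatch here, whereas a feature-based comparison can express that the runs differ only slightly in mapping rate.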

CONCLUSIONS:

Our approach enables automatic evaluation of the reproducibility of results using a fine-grained scale. By introducing our approach, it is possible to evolve from a binary view of whether the results are superficially identical or not to a more graduated view. We believe that our approach will contribute to more informed discussion on reproducibility in bioinformatics.

Subjects

Computational Biology; Research Personnel; Humans; Workflow; Reproducibility of Results; Computational Biology/methods; Software

Keywords

provenance; reproducibility; workflow

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Research Personnel / Computational Biology Type of study: Prognostic_studies Limit: Humans Language: En Publication year: 2022 Document type: Article