Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects.
Sheffield, Nathan C; Stolarczyk, Michal; Reuter, Vincent P; Rendeiro, André F.
Affiliation
  • Sheffield NC; Center for Public Health Genomics, University of Virginia, VA 22908, USA.
  • Stolarczyk M; Department of Public Health Sciences, University of Virginia, VA 22908, USA.
  • Reuter VP; Department of Biomedical Engineering, University of Virginia, VA 22908, USA.
  • Rendeiro AF; Department of Biochemistry and Molecular Genetics, University of Virginia, VA 22908, USA.
Gigascience; 10(12), 2021-12-06.
Article in English | MEDLINE | ID: mdl-34890448
ABSTRACT

BACKGROUND:

Organizing and annotating biological sample data is critical in data-intensive bioinformatics. Unfortunately, metadata formats from a data provider are often incompatible with requirements of a processing tool. There is no broadly accepted standard to organize metadata across biological projects and bioinformatics tools, restricting the portability and reusability of both annotated datasets and analysis software.

RESULTS:

To address this, we present the Portable Encapsulated Project (PEP) specification, a formal specification for biological sample metadata structure. The PEP specification accommodates typical features of data-intensive bioinformatics projects with many biological samples. In addition to standardization, the PEP specification provides descriptors and modifiers for project-level and sample-level metadata, which improve portability across both computing environments and data processing tools. PEPs include a schema validator framework, allowing formal definition of required metadata attributes for data analysis broadly. We have implemented packages for reading PEPs in both Python and R to provide a language-agnostic interface for organizing project metadata.
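As a rough illustration of the ideas above, the sketch below pairs a sample table with project-level settings, derives a sample attribute from a project-level template, and checks each sample for the attributes a tool requires. Everything in it is hypothetical: the column names, the `PROJECT` keys, the `REQUIRED` list, and the helpers `load_samples` and `validate` are assumptions made for this example. It is not the PEP specification's actual file format, nor the API of the Python or R packages.

```python
import csv
import io

# Hypothetical sample table; the column names are illustrative only.
SAMPLE_TABLE = """sample_name,protocol,source
frog_1,ATAC-seq,local
frog_2,RNA-seq,local
"""

# Hypothetical project-level settings: named path templates used to derive
# a sample attribute, so file paths are not hard-coded in the sample table.
PROJECT = {
    "sources": {"local": "/data/{sample_name}.fastq.gz"},
    "derive": ["source"],
}

# Hypothetical list of required attributes, standing in for a tool's schema.
REQUIRED = ["sample_name", "protocol", "source"]


def load_samples(table_text, project):
    """Read sample rows and expand derived attributes from project templates."""
    samples = list(csv.DictReader(io.StringIO(table_text)))
    for sample in samples:
        for attr in project["derive"]:
            # The cell value names a template; fill it with the sample's own fields.
            template = project["sources"][sample[attr]]
            sample[attr] = template.format(**sample)
    return samples


def validate(samples, required):
    """Return (sample_name, missing_attribute) pairs for empty or absent attributes."""
    problems = []
    for sample in samples:
        for attr in required:
            if not sample.get(attr):
                problems.append((sample.get("sample_name"), attr))
    return problems


samples = load_samples(SAMPLE_TABLE, PROJECT)
errors = validate(samples, REQUIRED)
```

Keeping the path template at the project level means that, in this sketch, moving the data to another computing environment only requires changing one entry in `PROJECT`, while the sample table stays untouched.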

CONCLUSIONS:

The PEP specification is an important step toward unifying data annotation and processing tools in data-intensive biological research projects. Links to tools and documentation are available at http://pep.databio.org/.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Software / Metadata Language: En Journal: Gigascience Year: 2021 Document type: Article Affiliation country: United States