MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive.
Bernstein, Matthew N; Doan, AnHai; Dewey, Colin N.
Affiliation
  • Bernstein MN; Department of Computer Sciences.
  • Doan A; Department of Computer Sciences.
  • Dewey CN; Department of Computer Sciences.
Bioinformatics ; 33(18): 2914-2923, 2017 Sep 15.
Article in En | MEDLINE | ID: mdl-28535296
ABSTRACT
MOTIVATION:

The NCBI's Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in the aggregate; however, the data remain largely underutilized, in part, due to the poor structure of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants and references to outside sources of information. Furthermore, manual annotation of the data remains intractable due to the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across diverse diseases, tissues and cell types present in the SRA.

RESULTS:

We present MetaSRA, a database of normalized SRA human sample-specific metadata following a schema inspired by the metadata organization of the ENCODE project. This schema involves mapping samples to terms in biomedical ontologies, labeling each sample with a sample-type category, and extracting real-valued properties. We automated these tasks via a novel computational pipeline.

AVAILABILITY AND IMPLEMENTATION:

The MetaSRA is available at metasra.biostat.wisc.edu via both a searchable web interface and bulk downloads. Software implementing our computational pipeline is available at http://github.com/deweylab/metasra-pipeline.

CONTACT:

cdewey@biostat.wisc.edu.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
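
As a rough illustration of the three-part schema described under RESULTS (ontology terms, a sample-type category, and real-valued properties), a normalized record might be sketched as follows. This is not the MetaSRA pipeline: the `SYNONYMS` table, the `EX:` identifiers, the `NormalizedSample` class, and the sample-type rule are all hypothetical placeholders chosen only to show the shape of the output.

```python
from dataclasses import dataclass, field

# Hypothetical synonym table: several raw spellings map to one term.
# The "EX:" identifiers are placeholders, not real ontology IDs.
SYNONYMS = {
    "liver": "EX:0001",
    "hepatic tissue": "EX:0001",
    "hela": "EX:0002",
}


@dataclass
class NormalizedSample:
    """Toy record with the three schema components named in the abstract."""
    ontology_terms: set = field(default_factory=set)
    sample_type: str = "unknown"
    real_valued: dict = field(default_factory=dict)


def normalize(raw: dict) -> NormalizedSample:
    """Normalize raw key-value metadata (illustrative rules only)."""
    sample = NormalizedSample()
    for key, value in raw.items():
        text = value.strip().lower()
        # 1. Map known synonyms and spelling variants to a single term.
        if text in SYNONYMS:
            sample.ontology_terms.add(SYNONYMS[text])
            continue
        # 2. Keep values that parse as numbers as real-valued properties.
        try:
            sample.real_valued[key] = float(text)
        except ValueError:
            pass  # free text that this toy example cannot interpret
    # 3. Assign a sample-type category with a placeholder rule.
    if "cell line" in raw:
        sample.sample_type = "cell line"
    elif sample.ontology_terms:
        sample.sample_type = "tissue"
    return sample


example = normalize({"tissue": "Hepatic Tissue", "age": "45"})
```

Here both "liver" and "Hepatic Tissue" would resolve to the same placeholder term, which is the kind of synonym collapsing the abstract identifies as missing from raw SRA metadata.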
Subject(s)

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Software / Databases, Genetic / High-Throughput Nucleotide Sequencing / Biological Ontologies / Metadata | Limits: Humans | Language: En | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2017 | Document type: Article
