Quantifying and Cataloguing Unknown Sequences within Human Microbiomes.

Modha, Sejal; Robertson, David L; Hughes, Joseph; Orton, Richard J

Affiliation

Modha S; MRC University of Glasgow Centre for Virus Research, Glasgow, United Kingdom.
Robertson DL; MRC University of Glasgow Centre for Virus Research, Glasgow, United Kingdom.
Hughes J; MRC University of Glasgow Centre for Virus Research, Glasgow, United Kingdom.
Orton RJ; MRC University of Glasgow Centre for Virus Research, Glasgow, United Kingdom.

mSystems; 7(2): e0146821, 2022 Apr 26.

Article in English | MEDLINE | ID: mdl-35258340

ABSTRACT

Advances in genome sequencing technologies and lower costs have enabled the exploration of a multitude of known and novel environments and microbiomes. This has led to exponential growth in the raw sequence data deposited in online repositories. Metagenomic and metatranscriptomic data sets are typically analysed with regard to a specific biological question. However, it is widely acknowledged that these data sets include a proportion of sequences that bear no similarity to any currently known biological sequence, and this so-called "dark matter" is often excluded from downstream analyses. In this study, a systematic framework was developed to assemble, identify, and measure the proportion of unknown sequences present in distinct human microbiomes. This framework was applied to 40 distinct studies, comprising 963 samples and covering 10 different human microbiomes, including fecal, oral, lung, skin, and circulatory system microbiomes. We found that although the human microbiome is one of the most extensively studied, on average 2% of assembled sequences have not yet been taxonomically defined. However, this proportion varied extensively among microbiomes and was as high as 25% for skin and oral microbiomes, which have more interactions with the environment. From the taxonomically unknown sequences discovered in this study, we calculated a characterization rate of 1.64% of unknown sequences per month. A cross-study comparison identified similar unknown sequences in different samples and/or microbiomes. Both our computational framework and the novel unknown sequences produced are publicly available for future cross-referencing. Our approach led to the discovery of several novel viral genomes that bear no similarity to sequences in the public databases. Some of these are widespread, having been found in different microbiomes and studies. Hence, our study illustrates how the systematic characterization of unknown sequences can help the discovery of novel microbes, and we call on the research community to systematically collate and share the unknown sequences from metagenomic studies to increase the rate at which the unknown sequence space can be classified.

Subject(s)

Microbiota; Ursidae; Animals; Humans; Ursidae/genetics; Microbiota/genetics; Metagenome; Metagenomics; Databases, Factual

Keywords

dark matter; genome assembly; human microbiome; metagenomics; microbial dark matter; novel sequences; unknown sequences; virus

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Ursidae / Microbiota Limits: Animals / Humans Language: En Journal: MSystems Year: 2022 Document type: Article Affiliation country: United Kingdom Country of publication: United States
