Terabase-scale metagenome coassembly with MetaHipMer.

Hofmeyr, Steven; Egan, Rob; Georganas, Evangelos; Copeland, Alex C; Riley, Robert; Clum, Alicia; Eloe-Fadrosh, Emiley; Roux, Simon; Goltsman, Eugene; Buluç, Aydin; Rokhsar, Daniel; Oliker, Leonid; Yelick, Katherine

Affiliation

Hofmeyr S; Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA. shofmeyr@lbl.gov.
Egan R; Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
Georganas E; Parallel Computing Lab, Intel Corp., Santa Clara, CA, 95054, USA.
Copeland AC; Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
Riley R; Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
Clum A; Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
Eloe-Fadrosh E; Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
Roux S; Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
Goltsman E; Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
Buluç A; Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
Rokhsar D; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, 94720, USA.
Oliker L; Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
Yelick K; Department of Molecular and Cellular Biology, University of California, Berkeley, CA, 94720, USA.

Sci Rep; 10(1): 10689, 2020 Jul 01.

Article in English | MEDLINE | ID: mdl-32612216

ABSTRACT

Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly) and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes, with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform coassembly and multiassembly, and assemble both abundant, high strain variation genomes, and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer's scaling power. MetaHipMer is available for public use under an open source license and all datasets used in the paper are available for public download.

Subjects

Computational Biology/methods; Genome, Bacterial/genetics; Metagenome/genetics; Metagenomics/methods; Algorithms; Computers; Microbiota/genetics; Pseudoalteromonas/genetics; Pseudoalteromonas/isolation & purification; Sequence Analysis, DNA/methods

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Genome, Bacterial / Computational Biology / Metagenome / Metagenomics | Language: En | Journal: Sci Rep | Publication year: 2020 | Document type: Article | Country of affiliation: United States | Country of publication: United Kingdom