Pesquisa | BVS Integralidade em Saúde

Benchmarking genome assembly methods on metagenomic sequencing data.

Zhang, Zhenmiao; Yang, Chao; Veldsman, Werner Pieter; Fang, Xiaodong; Zhang, Lu.

Brief Bioinform ; 24(2)2023 03 19.

Artigo em Inglês | MEDLINE | ID: mdl-36917471

RESUMO

Metagenome assembly is an efficient approach to reconstruct microbial genomes from metagenomic sequencing data. Although short-read sequencing has been widely used for metagenome assembly, linked- and long-read sequencing have shown their advancements in assembly by providing long-range DNA connectedness. Many metagenome assembly tools were developed to simplify the assembly graphs and resolve the repeats in microbial genomes. However, there remains no comprehensive evaluation of metagenomic sequencing technologies, and there is a lack of practical guidance on selecting the appropriate metagenome assembly tools. This paper presents a comprehensive benchmark of 19 commonly used assembly tools applied to metagenomic sequencing datasets obtained from simulation, mock communities or human gut microbiomes. These datasets were generated using mainstream sequencing platforms, such as Illumina and BGISEQ short-read sequencing, 10x Genomics linked-read sequencing, and PacBio and Oxford Nanopore long-read sequencing. The assembly tools were extensively evaluated against many criteria, which revealed that long-read assemblers generated high contig contiguity but failed to reveal some medium- and high-quality metagenome-assembled genomes (MAGs). Linked-read assemblers obtained the highest number of overall near-complete MAGs from the human gut microbiomes. Hybrid assemblers using both short- and long-read sequencing were promising methods to improve both total assembly length and the number of near-complete MAGs. This paper also discussed the running time and peak memory consumption of these assembly tools and provided practical guidance on selecting them.

Assuntos

Metagenoma , Microbiota , Humanos , Benchmarking , Microbiota/genética , Metagenômica/métodos , Genômica/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/métodos

METAMVGL: a multi-view graph-based metagenomic contig binning algorithm by integrating assembly and paired-end graphs.

Zhang, Zhenmiao; Zhang, Lu.

BMC Bioinformatics ; 22(Suppl 10): 378, 2021 Jul 22.

Artigo em Inglês | MEDLINE | ID: mdl-34294039

RESUMO

BACKGROUND: Due to the complexity of microbial communities, de novo assembly on next generation sequencing data is commonly unable to produce complete microbial genomes. Metagenome assembly binning becomes an essential step that could group the fragmented contigs into clusters to represent microbial genomes based on contigs' nucleotide compositions and read depths. These features work well on the long contigs, but are not stable for the short ones. Contigs can be linked by sequence overlap (assembly graph) or by the paired-end reads aligned to them (PE graph), where the linked contigs have high chance to be derived from the same clusters. RESULTS: We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm by integrating both assembly and PE graphs. It could strikingly rescue the short contigs and correct the binning errors from dead ends. METAMVGL learns the two graphs' weights automatically and predicts the contig labels in a uniform multi-view label propagation framework. In experiments, we observed METAMVGL made use of significantly more high-confidence edges from the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art contig binning algorithms, including MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin on the metagenomic sequencing data from simulation, two mock communities and Sharon infant fecal samples. CONCLUSIONS: Our findings demonstrate METAMVGL outstandingly improves the short contig binning and outperforms the other existing contig binning tools on the metagenomic sequencing data from simulation, mock communities and infant fecal samples.

Assuntos

Metagenoma , Microbiota , Algoritmos , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Metagenoma/genética , Metagenômica , Microbiota/genética , Análise de Sequência de DNA , Software

Structural and Functional Disparities within the Human Gut Virome in Terms of Genome Topology and Representative Genome Selection.

Veldsman, Werner P; Yang, Chao; Zhang, Zhenmiao; Huang, Yufen; Chowdhury, Debajyoti; Zhang, Lu.

Viruses ; 16(1)2024 01 17.

Artigo em Inglês | MEDLINE | ID: mdl-38257834

RESUMO

Circularity confers protection to viral genomes where linearity falls short, thereby fulfilling the form follows function aphorism. However, a shift away from morphology-based classification toward the molecular and ecological classification of viruses is currently underway within the field of virology. Recent years have seen drastic changes in the International Committee on Taxonomy of Viruses' operational definitions of viruses, particularly for the tailed phages that inhabit the human gut. After the abolition of the order Caudovirales, these tailed phages are best defined as members of the class Caudoviricetes. To determine the epistemological value of genome topology in the context of the human gut virome, we designed a set of seven experiments to assay the impact of genome topology and representative viral selection on biological interpretation. Using Oxford Nanopore long reads for viral genome assembly coupled with Illumina short-read polishing, we showed that circular and linear virus genomes differ remarkably in terms of genome quality, GC skew, transfer RNA gene frequency, structural variant frequency, cross-reference functional annotation (COG, KEGG, Pfam, and TIGRfam), state-of-the-art marker-based classification, and phage-host interaction. Furthermore, the disparity profile changes during dereplication. In particular, our phage-host interaction results demonstrated that proportional abundances cannot be meaningfully compared without due regard for genome topology and dereplication threshold, which necessitates the need for standardized reporting. As a best practice guideline, we recommend that comparative studies of the human gut virome always report the ratio of circular to linear viral genomes along with the dereplication threshold so that structural and functional metrics can be placed into context when assessing biologically relevant metagenomic properties such as proportional abundance.

Assuntos

Bacteriófagos , Viroma , Humanos , Viroma/genética , Genoma Viral , Bacteriófagos/genética , Metagenoma , Bioensaio

LRTK: a platform agnostic toolkit for linked-read analysis of both human genome and metagenome.

Yang, Chao; Zhang, Zhenmiao; Huang, Yufen; Xie, Xuefeng; Liao, Herui; Xiao, Jin; Veldsman, Werner Pieter; Yin, Kejing; Fang, Xiaodong; Zhang, Lu.

Gigascience ; 132024 Jan 02.

Artigo em Inglês | MEDLINE | ID: mdl-38869148

RESUMO

BACKGROUND: Linked-read sequencing technologies generate high-base quality short reads that contain extrapolative information on long-range DNA connectedness. These advantages of linked-read technologies are well known and have been demonstrated in many human genomic and metagenomic studies. However, existing linked-read analysis pipelines (e.g., Long Ranger) were primarily developed to process sequencing data from the human genome and are not suited for analyzing metagenomic sequencing data. Moreover, linked-read analysis pipelines are typically limited to 1 specific sequencing platform. FINDINGS: To address these limitations, we present the Linked-Read ToolKit (LRTK), a unified and versatile toolkit for platform agnostic processing of linked-read sequencing data from both human genome and metagenome. LRTK provides functions to perform linked-read simulation, barcode sequencing error correction, barcode-aware read alignment and metagenome assembly, reconstruction of long DNA fragments, taxonomic classification and quantification, and barcode-assisted genomic variant calling and phasing. LRTK has the ability to process multiple samples automatically and provides users with the option to generate reproducible reports during processing of raw sequencing data and at multiple checkpoints throughout downstream analysis. We applied LRTK on linked reads from simulation, mock community, and real datasets for both human genome and metagenome. We showcased LRTK's ability to generate comparative performance results from preceding benchmark studies and to report these results in publication-ready HTML document plots. CONCLUSIONS: LRTK provides comprehensive and flexible modules along with an easy-to-use Python-based workflow for processing linked-read sequencing datasets, thereby filling the current gap in the field caused by platform-centric genome-specific linked-read data analysis tools.

Assuntos

Genoma Humano , Metagenoma , Metagenômica , Software , Humanos , Metagenômica/métodos , Análise de Sequência de DNA/métodos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Biologia Computacional/métodos

Exploring high-quality microbial genomes by assembling short-reads with long-range connectivity.

Zhang, Zhenmiao; Xiao, Jin; Wang, Hongbo; Yang, Chao; Huang, Yufen; Yue, Zhen; Chen, Yang; Han, Lijuan; Yin, Kejing; Lyu, Aiping; Fang, Xiaodong; Zhang, Lu.

Nat Commun ; 15(1): 4631, 2024 May 31.

Artigo em Inglês | MEDLINE | ID: mdl-38821971

RESUMO

Although long-read sequencing enables the generation of complete genomes for unculturable microbes, its high cost limits the widespread adoption of long-read sequencing in large-scale metagenomic studies. An alternative method is to assemble short-reads with long-range connectivity, which can be a cost-effective way to generate high-quality microbial genomes. Here, we develop Pangaea, a bioinformatic approach designed to enhance metagenome assembly using short-reads with long-range connectivity. Pangaea leverages connectivity derived from physical barcodes of linked-reads or virtual barcodes by aligning short-reads to long-reads. Pangaea utilizes a deep learning-based read binning algorithm to assemble co-barcoded reads exhibiting similar sequence contexts and abundances, thereby improving the assembly of high- and medium-abundance microbial genomes. Pangaea also leverages a multi-thresholding algorithm strategy to refine assembly for low-abundance microbes. We benchmark Pangaea on linked-reads and a combination of short- and long-reads from simulation data, mock communities and human gut metagenomes. Pangaea achieves significantly higher contig continuity as well as more near-complete metagenome-assembled genomes (NCMAGs) than the existing assemblers. Pangaea also generates three complete and circular NCMAGs on the human gut microbiomes.

Assuntos

Algoritmos , Microbioma Gastrointestinal , Genoma Microbiano , Metagenoma , Metagenômica , Humanos , Metagenoma/genética , Metagenômica/métodos , Microbioma Gastrointestinal/genética , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Aprendizado Profundo , Biologia Computacional/métodos , Análise de Sequência de DNA/métodos , Genoma Bacteriano

A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data.

Yang, Chao; Chowdhury, Debajyoti; Zhang, Zhenmiao; Cheung, William K; Lu, Aiping; Bian, Zhaoxiang; Zhang, Lu.

Comput Struct Biotechnol J ; 19: 6301-6314, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-34900140

RESUMO

Metagenomic sequencing provides a culture-independent avenue to investigate the complex microbial communities by constructing metagenome-assembled genomes (MAGs). A MAG represents a microbial genome by a group of sequences from genome assembly with similar characteristics. It enables us to identify novel species and understand their potential functions in a dynamic ecosystem. Many computational tools have been developed to construct and annotate MAGs from metagenomic sequencing, however, there is a prominent gap to comprehensively introduce their background and practical performance. In this paper, we have thoroughly investigated the computational tools designed for both upstream and downstream analyses, including metagenome assembly, metagenome binning, gene prediction, functional annotation, taxonomic classification, and profiling. We have categorized the commonly used tools into unique groups based on their functional background and introduced the underlying core algorithms and associated information to demonstrate a comparative outlook. Furthermore, we have emphasized the computational requisition and offered guidance to the users to select the most efficient tools. Finally, we have indicated current limitations, potential solutions, and future perspectives for further improving the tools of MAG construction and annotation. We believe that our work provides a consolidated resource for the current stage of MAG studies and shed light on the future development of more effective MAG analysis tools on metagenomic sequencing.

A comprehensive investigation of metagenome assembly by linked-read sequencing.

Zhang, Lu; Fang, Xiaodong; Liao, Herui; Zhang, Zhenmiao; Zhou, Xin; Han, Lijuan; Chen, Yang; Qiu, Qinwei; Li, Shuai Cheng.

Microbiome ; 8(1): 156, 2020 11 11.

Artigo em Inglês | MEDLINE | ID: mdl-33176883

RESUMO

BACKGROUND: The human microbiota are complex systems with important roles in our physiological activities and diseases. Sequencing the microbial genomes in the microbiota can help in our interpretation of their activities. The vast majority of the microbes in the microbiota cannot be isolated for individual sequencing. Current metagenomics practices use short-read sequencing to simultaneously sequence a mixture of microbial genomes. However, these results are in ambiguity during genome assembly, leading to unsatisfactory microbial genome completeness and contig continuity. Linked-read sequencing is able to remove some of these ambiguities by attaching the same barcode to the reads from a long DNA fragment (10-100 kb), thus improving metagenome assembly. However, it is not clear how the choices for several parameters in the use of linked-read sequencing affect the assembly quality. RESULTS: We first examined the effects of read depth (C) on metagenome assembly from linked-reads in simulated data and a mock community. The results showed that C positively correlated with the length of assembled sequences but had little effect on their qualities. The latter observation was corroborated by tests using real data from the human gut microbiome, where C demonstrated minor impact on the sequence quality as well as on the proportion of bins annotated as draft genomes. On the other hand, metagenome assembly quality was susceptible to read depth per fragment (CR) and DNA fragment physical depth (CF). For the same C, deeper CR resulted in more draft genomes while deeper CF improved the quality of the draft genomes. We also found that average fragment length (µFL) had marginal effect on assemblies, while fragments per partition (NF/P) impacted the off-target reads involved in local assembly, namely, lower NF/P values would lead to better assemblies by reducing the ambiguities of the off-target reads. In general, the use of linked-reads improved the assembly for contig N50 when compared to Illumina short-reads, but not when compared to PacBio CCS (circular consensus sequencing) long-reads. CONCLUSIONS: We investigated the influence of linked-read sequencing parameters on metagenome assembly comprehensively. While the quality of genome assembly from linked-reads cannot rival that from PacBio CCS long-reads, the case for using linked-read sequencing remains persuasive due to its low cost and high base-quality. Our study revealed that the probable best practice in using linked-reads for metagenome assembly was to merge the linked-reads from multiple libraries, where each had sufficient CR but a smaller amount of input DNA. Video Abstract.

Assuntos

Sequenciamento de Nucleotídeos em Larga Escala/métodos , Metagenoma/genética , Metagenômica/métodos , Microbiota/genética , Análise de Sequência de DNA/métodos , Humanos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa