Results 1 - 13 of 13
1.
PeerJ ; 11: e16129, 2023.
Article in English | MEDLINE | ID: mdl-37753177

ABSTRACT

Metagenome binning is a key step, downstream of metagenome assembly, to group scaffolds by their genome of origin. Although accurate binning has been achieved on datasets containing multiple samples from the same community, the completeness of binning is often low in datasets with a small number of samples due to a lack of robust species co-abundance information. In this study, we exploited the chromatin conformation information obtained from Hi-C sequencing and developed a new reference-independent algorithm, Metagenome Binning with Abundance and Tetra-nucleotide frequencies-Long Range (metaBAT-LR), to improve the binning completeness of these datasets. This self-supervised algorithm builds a model from a set of high-quality genome bins to predict scaffold pairs that are likely to be derived from the same genome. Then, it applies these predictions to merge incomplete genome bins, as well as recruit unbinned scaffolds. We validated metaBAT-LR's ability to bin-merge and recruit scaffolds on both synthetic and real-world metagenome datasets of varying complexity. Benchmarking against similar software tools suggests that metaBAT-LR uncovers unique bins that were missed by all other methods. MetaBAT-LR is open-source and is available at https://bitbucket.org/project-metabat/metabat-lr.
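
One way to picture the self-supervised step described above is to derive training labels directly from existing bins: scaffold pairs from the same high-quality bin become positive examples and pairs from different bins become negative ones. The sketch below is only an illustration under that assumption; the function name `labeled_pairs` and the bin and scaffold identifiers are hypothetical, the Hi-C contact features the abstract relies on are omitted, and nothing here is taken from the metaBAT-LR code.

```python
from itertools import combinations


def labeled_pairs(bins):
    """Build (scaffold_a, scaffold_b, label) tuples from genome bins.

    `bins` maps a bin id to the scaffold ids it contains (hypothetical input).
    label is 1 when both scaffolds share a bin, otherwise 0.
    """
    # Map every scaffold to the bin that owns it.
    owner = {s: bin_id for bin_id, scaffolds in bins.items() for s in scaffolds}
    pairs = []
    # Enumerate each unordered scaffold pair exactly once, in sorted order.
    for a, b in combinations(sorted(owner), 2):
        pairs.append((a, b, int(owner[a] == owner[b])))
    return pairs


example = labeled_pairs({"bin1": ["s1", "s2"], "bin2": ["s3"]})
```

A classifier trained on such pairs could then score pairs that involve unbinned scaffolds or scaffolds from incomplete bins, which is the role the abstract assigns to the model's predictions.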


Subject(s)
Chromatin, Metagenome, Chromatin/genetics, Metagenome/genetics, Algorithms, Benchmarking, Supervised Machine Learning
2.
Microbiol Spectr ; 11(4): e0020023, 2023 08 17.
Article in English | MEDLINE | ID: mdl-37310219

ABSTRACT

Petabases of environmental metagenomic data are publicly available, presenting an opportunity to characterize complex environments and discover novel lineages of life. Metagenome coassembly, in which many metagenomic samples from an environment are simultaneously analyzed to infer the underlying genomes' sequences, is an essential tool for achieving this goal. We applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 terabases (Tbp) of metagenome data from a tropical soil in the Luquillo Experimental Forest (LEF), Puerto Rico. The resulting coassembly yielded 39 high-quality (>90% complete, <5% contaminated, with predicted 23S, 16S, and 5S rRNA genes and ≥18 tRNAs) metagenome-assembled genomes (MAGs), including two from the candidate phylum Eremiobacterota. Another 268 medium-quality (≥50% complete, <10% contaminated) MAGs were extracted, including the candidate phyla Dependentiae, Dormibacterota, and Methylomirabilota. In total, 307 medium- or higher-quality MAGs were assigned to 23 phyla, compared to 294 MAGs assigned to nine phyla in the same samples individually assembled. The low-quality (<50% complete, <10% contaminated) MAGs from the coassembly revealed a 49% complete rare biosphere microbe from the candidate phylum FCPU426 among other low-abundance microbes, an 81% complete fungal genome from the phylum Ascomycota, and 30 partial eukaryotic MAGs with ≥10% completeness, possibly representing protist lineages. A total of 22,254 viruses, many of them low abundance, were identified. Estimation of metagenome coverage and diversity indicates that we may have characterized ≥87.5% of the sequence diversity in this humid tropical soil and indicates the value of future terabase-scale sequencing and coassembly of complex environments. IMPORTANCE Petabases of reads are being produced by environmental metagenome sequencing. An essential step in analyzing these data is metagenome assembly, the computational reconstruction of genome sequences from microbial communities. "Coassembly" of metagenomic sequence data, in which multiple samples are assembled together, enables more complete detection of microbial genomes in an environment than "multiassembly," in which samples are assembled individually. To demonstrate the potential for coassembling terabases of metagenome data to drive biological discovery, we applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 Tbp of reads from a humid tropical soil environment. The resulting coassembly, its functional annotation, and analysis are presented here. The coassembly yielded more, and phylogenetically more diverse, microbial, eukaryotic, and viral genomes than the multiassembly of the same data. Our resource may facilitate the discovery of novel microbial biology in tropical soils and demonstrates the value of terabase-scale metagenome sequencing.
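
The quality tiers quoted in this abstract can be written as a small decision rule. The sketch below encodes only the thresholds stated in the text; the `Mag` record and its field names are hypothetical, and two choices are assumptions rather than statements from the abstract: a genome that meets the high-quality completeness and contamination cutoffs but lacks the rRNA or tRNA evidence is treated as medium quality, and anything at or above 10% contamination is left unclassified.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    completeness: float       # percent complete
    contamination: float      # percent contaminated
    has_23s: bool = False     # predicted 23S rRNA gene present
    has_16s: bool = False     # predicted 16S rRNA gene present
    has_5s: bool = False      # predicted 5S rRNA gene present
    trna_count: int = 0       # number of predicted tRNAs


def quality_tier(mag: Mag) -> str:
    """Assign a MAG to the tiers described in the abstract."""
    has_rrnas = mag.has_23s and mag.has_16s and mag.has_5s
    if (mag.completeness > 90 and mag.contamination < 5
            and has_rrnas and mag.trna_count >= 18):
        return "high"
    if mag.completeness >= 50 and mag.contamination < 10:
        return "medium"      # includes near-complete MAGs lacking rRNA/tRNA evidence (assumption)
    if mag.completeness < 50 and mag.contamination < 10:
        return "low"
    return "unclassified"    # contamination of 10% or more (assumption)
```

Under this rule the 49% complete FCPU426 genome mentioned above falls in the low-quality tier, provided its contamination is below 10%.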


Subject(s)
Microbiota, Soil, Microbiota/genetics, Bacteria/genetics, Metagenome, Viral Genome, Metagenomics/methods
3.
mBio ; 14(2): e0358422, 2023 04 25.
Article in English | MEDLINE | ID: mdl-36877031

ABSTRACT

Bacteria catalyze the formation and destruction of soil organic matter, but the bacterial dynamics in soil that govern carbon (C) cycling are not well understood. Life history strategies explain the complex dynamics of bacterial populations and activities based on trade-offs in energy allocation to growth, resource acquisition, and survival. Such trade-offs influence the fate of soil C, but their genomic basis remains poorly characterized. We used multisubstrate metagenomic DNA stable isotope probing to link genomic features of bacteria to their C acquisition and growth dynamics. We identify several genomic features associated with patterns of bacterial C acquisition and growth, notably genomic investment in resource acquisition and regulatory flexibility. Moreover, we identify genomic trade-offs defined by numbers of transcription factors, membrane transporters, and secreted products, which match predictions from life history theory. We further show that genomic investment in resource acquisition and regulatory flexibility can predict bacterial ecological strategies in soil. IMPORTANCE Soil microbes are major players in the global carbon cycle, yet we still have little understanding of how the carbon cycle operates in soil communities. A major limitation is that carbon metabolism lacks discrete functional genes that define carbon transformations. Instead, carbon transformations are governed by anabolic processes associated with growth, resource acquisition, and survival. We use metagenomic stable isotope probing to link genome information to microbial growth and carbon assimilation dynamics as they occur in soil. From these data, we identify genomic traits that can predict bacterial ecological strategies which define bacterial interactions with soil carbon.


Subject(s)
Life History Traits, Soil/chemistry, Soil Microbiology, Bacteria/genetics, Bacteria/metabolism, Carbon/metabolism, Isotopes/metabolism, Metagenomics
4.
BMC Bioinformatics ; 23(1): 513, 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36451083

ABSTRACT

BACKGROUND: The assembly of metagenomes decomposes members of complex microbe communities and allows the characterization of these genomes without laborious cultivation or single-cell metagenomics. Metagenome assembly is a process that is memory intensive and time consuming. Multi-terabyte sequences can become too large to be assembled on a single computer node, and there is no reliable method to predict the memory requirement because memory consumption patterns are data-specific. Currently, running out of memory (OOM) is one of the most prevalent causes of metagenome assembly failures. RESULTS: In this study, we explored the possibility of using Persistent Memory (PMem) as a less expensive substitute for dynamic random access memory (DRAM) to reduce OOM and increase the scalability of metagenome assemblers. We evaluated the execution time and memory usage of three popular metagenome assemblers (MetaSPAdes, MEGAHIT, and MetaHipMer2) on datasets of up to one terabase. We found that PMem can enable metagenome assemblers on terabyte-sized datasets by partially or fully substituting DRAM. Depending on the configured DRAM/PMem ratio, running metagenome assemblies with PMem can achieve a speed similar to that of DRAM, while in the worst case it showed a roughly two-fold slowdown. In addition, different assemblers displayed distinct memory/speed trade-offs in the same hardware/software environment. CONCLUSIONS: We demonstrated that PMem is capable of expanding the capacity of DRAM to allow larger metagenome assembly with a potential trade-off in speed. Because PMem can be used directly without any application-specific code modification, these findings are likely to be generalized to other memory-intensive bioinformatics applications.


Subject(s)
Metagenome, Microbiota, Metagenomics, Software, Computational Biology
5.
Nat Methods ; 19(4): 429-440, 2022 04.
Article in English | MEDLINE | ID: mdl-35396482

ABSTRACT

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.


Subject(s)
Metagenome, Metagenomics, Archaea/genetics, Metagenomics/methods, Reproducibility of Results, DNA Sequence Analysis, Software
6.
Sci Rep ; 10(1): 10689, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32612216

ABSTRACT

Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly) and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes, with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform coassembly and multiassembly, and assemble both abundant, high strain variation genomes, and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer's scaling power. MetaHipMer is available for public use under an open source license and all datasets used in the paper are available for public download.


Subject(s)
Computational Biology/methods, Bacterial Genome/genetics, Metagenome/genetics, Metagenomics/methods, Algorithms, Computers, Microbiota/genetics, Pseudoalteromonas/genetics, Pseudoalteromonas/isolation & purification, DNA Sequence Analysis/methods
7.
Philos Trans A Math Phys Eng Sci ; 378(2166): 20190394, 2020 Mar 06.
Article in English | MEDLINE | ID: mdl-31955674

ABSTRACT

Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or 'motifs' that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
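
The two patterns this abstract singles out, sorting and hashing, can both be illustrated with k-mer counting, a common building block of assembly and profiling. Choosing k-mer counting as the example is an assumption of this sketch, not something the abstract states, and the function names are hypothetical. The hashing version updates a shared table in input order, which is the kind of irregular access the abstract mentions; the sorting version groups identical k-mers first and then counts each run.

```python
from collections import Counter
from itertools import groupby


def kmers(seq, k):
    """Return every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_by_hashing(seq, k):
    """Count k-mers with a hash table (one update per k-mer)."""
    return dict(Counter(kmers(seq, k)))


def count_by_sorting(seq, k):
    """Count k-mers by sorting them and measuring each run of equal keys."""
    return {kmer: len(list(run)) for kmer, run in groupby(sorted(kmers(seq, k)))}


hashed = count_by_hashing("ACGTACGT", 4)
sorted_counts = count_by_sorting("ACGTACGT", 4)
```

Both functions return the same counts; they differ in memory access pattern, which is what matters when the table is distributed across many nodes.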

8.
BMC Bioinformatics ; 20(1): 552, 2019 Nov 06.
Article in English | MEDLINE | ID: mdl-31694525

ABSTRACT

BACKGROUND: Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area. RESULTS: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent "scrubbing" (removal) of low-quality Nanopore read segments to minimize their interference in the downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real world control datasets under several different parameters, we show that it robustly improves read quality, and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors. CONCLUSIONS: MiniScrub is able to robustly improve read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub.
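
The "scrubbing" step can be pictured as cutting a read wherever a segment is predicted to be low quality and keeping the stretches in between. The sketch below assumes that per-segment quality scores are already available (in MiniScrub these would come from the CNN, which is not reproduced here); the function name `scrub`, the fixed segment length, and the cutoff are hypothetical parameters chosen for illustration, and whether the real tool splits or trims reads in exactly this way is not stated in the abstract.

```python
def scrub(read, scores, segment_len, cutoff):
    """Remove low-scoring segments from a read and return the retained pieces.

    read        -- nucleotide string
    scores      -- one predicted quality score per consecutive segment
    segment_len -- number of bases covered by each score (hypothetical)
    cutoff      -- segments scoring below this value are removed (hypothetical)
    Bases beyond len(scores) * segment_len are not scored and are dropped.
    """
    kept, current = [], []
    for i, score in enumerate(scores):
        segment = read[i * segment_len:(i + 1) * segment_len]
        if score >= cutoff:
            current.append(segment)          # extend the current good stretch
        elif current:
            kept.append("".join(current))    # a bad segment closes the stretch
            current = []
    if current:
        kept.append("".join(current))
    return kept


pieces = scrub("AAAACCCCGGGGTTTT", [0.9, 0.2, 0.8, 0.95], 4, 0.5)
```

Here the second segment is discarded, so the read is returned as two shorter pieces that no longer carry the low-quality stretch into assembly.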


Subject(s)
Deep Learning/standards, Nanopore Sequencing/methods, Genetic Databases, Humans, Metagenome, Quality Improvement, Software
9.
PeerJ ; 7: e7359, 2019.
Article in English | MEDLINE | ID: mdl-31388474

ABSTRACT

We previously reported on MetaBAT, an automated metagenome binning software tool to reconstruct single genomes from microbial communities for subsequent analyses of uncultivated microbial species. MetaBAT has become one of the most popular binning tools largely due to its computational efficiency and ease of use, especially in binning experiments with a large number of samples and a large assembly. MetaBAT requires users to choose parameters to fine-tune its sensitivity and specificity. If those parameters are not chosen properly, binning accuracy can suffer, especially on assemblies of poor quality. Here, we developed MetaBAT 2 to overcome this problem. MetaBAT 2 uses a new adaptive binning algorithm to eliminate manual parameter tuning. We also performed extensive software engineering optimization to increase both computational and memory efficiency. Comparing MetaBAT 2 to alternative software tools on over 100 real world metagenome assemblies shows superior accuracy and computing speed. Binning a typical metagenome assembly takes only a few minutes on a single commodity workstation. We therefore recommend the community adopts MetaBAT 2 for their metagenome binning experiments. MetaBAT 2 is open source software and available at https://bitbucket.org/berkeleylab/metabat.

10.
Nat Commun ; 8(1): 858, 2017 10 11.
Article in English | MEDLINE | ID: mdl-29021524

ABSTRACT

Virophages are small viruses that co-infect eukaryotic cells alongside giant viruses (Mimiviridae) and hijack their machinery to replicate. While two types of virophages have been isolated, their genomic diversity and ecology remain largely unknown. Here we use time series metagenomics to identify and study the dynamics of 25 uncultivated virophage populations, 17 of which are represented by complete or near-complete genomes, in two North American freshwater lakes. Taxonomic analysis suggests that these freshwater virophages represent at least three new candidate genera. Ecologically, virophage populations are repeatedly detected over years and evolutionarily stable, yet their distinct abundance profiles and gene content suggest that virophage genera occupy different ecological niches. Co-occurrence analyses reveal 11 virophages strongly associated with uncultivated Mimiviridae, and three associated with eukaryotes among the Dinophyceae, Rhizaria, Alveolata, and Cryptophyceae groups. Together, these findings significantly augment virophage databases, help refine virophage taxonomy, and establish baseline ecological hypotheses and tools to study virophages in nature. Virophages are recently identified small viruses that infect larger viruses, yet their diversity and ecological roles are poorly understood. Here, Roux and colleagues present time series metagenomics data revealing new virophage genera and their putative ecological interactions in two freshwater lakes.


Subject(s)
Ecosystem, Eukaryota/virology, Lakes/virology, Mimiviridae, Virophages/genetics, Viral Genome, Metagenome, Metagenomics
11.
PeerJ ; 3: e1165, 2015.
Article in English | MEDLINE | ID: mdl-26336640

ABSTRACT

Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
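
Tetranucleotide frequency, one of the two signals named in this abstract, is simply the normalized count of every 4-base word in a contig. The sketch below computes that profile and compares two contigs with a plain Euclidean distance. The Euclidean distance is a stand-in chosen for illustration: the abstract says MetaBAT uses empirical probabilistic distances, which are not reproduced here, and the function names `tnf` and `tnf_distance` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over the DNA alphabet, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tnf(seq):
    """Return the tetranucleotide frequency vector of a sequence.

    Windows containing characters other than A, C, G, T (for example N)
    are ignored. Sequences with no valid window yield an all-zero vector.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def tnf_distance(a, b):
    """Euclidean distance between two TNF profiles (illustrative stand-in)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(tnf(a), tnf(b))))
```

Contigs from the same genome tend to have similar profiles, so a small distance is weak evidence that two contigs belong in the same bin; combining it with abundance, as the abstract describes, is what makes the signal usable on short contigs.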

12.
Bioinformatics ; 29(4): 435-43, 2013 Feb 15.
Article in English | MEDLINE | ID: mdl-23303509

ABSTRACT

MOTIVATION: Researchers need general purpose methods for objectively evaluating the accuracy of single and metagenome assemblies and for automatically detecting any errors they may contain. Current methods do not fully meet this need because they require a reference, only consider one of the many aspects of assembly quality or lack statistical justification, and none are designed to evaluate metagenome assemblies. RESULTS: In this article, we present an Assembly Likelihood Evaluation (ALE) framework that overcomes these limitations, systematically evaluating the accuracy of an assembly in a reference-independent manner using rigorous statistical methods. This framework is comprehensive, and integrates read quality, mate pair orientation and insert length (for paired-end reads), sequencing coverage, read alignment and k-mer frequency. ALE pinpoints synthetic errors in both single and metagenomic assemblies, including single-base errors, insertions/deletions, genome rearrangements and chimeric assemblies presented in metagenomes. At the genome level with real-world data, ALE identifies three large misassemblies from the Spirochaeta smaragdinae finished genome, which were all independently validated by Pacific Biosciences sequencing. At the single-base level with Illumina data, ALE recovers 215 of 222 (97%) single nucleotide variants in a training set from a GC-rich Rhodobacter sphaeroides genome. Using real Pacific Biosciences data, ALE identifies 12 of 12 synthetic errors in a Lambda Phage genome, surpassing even Pacific Biosciences' own variant caller, EviCons. In summary, the ALE framework provides a comprehensive, reference-independent and statistically rigorous measure of single genome and metagenome assembly accuracy, which can be used to identify misassemblies or to optimize the assembly process. AVAILABILITY: ALE is released as open source software under the UoI/NCSA license at http://www.alescore.org. It is implemented in C and Python.


Subject(s)
Genomics/methods, Metagenomics/methods, Software, Bayes Theorem, Escherichia coli/genetics, Genetic Variation, High-Throughput Nucleotide Sequencing, Humans, Statistical Models, Probability
13.
Science ; 331(6016): 463-7, 2011 Jan 28.
Article in English | MEDLINE | ID: mdl-21273488

ABSTRACT

The paucity of enzymes that efficiently deconstruct plant polysaccharides represents a major bottleneck for industrial-scale conversion of cellulosic biomass into biofuels. Cow rumen microbes specialize in degradation of cellulosic plant material, but most members of this complex community resist cultivation. To characterize biomass-degrading genes and genomes, we sequenced and analyzed 268 gigabases of metagenomic DNA from microbes adherent to plant fiber incubated in cow rumen. From these data, we identified 27,755 putative carbohydrate-active genes and expressed 90 candidate proteins, of which 57% were enzymatically active against cellulosic substrates. We also assembled 15 uncultured microbial genomes, which were validated by complementary methods including single-cell genome sequencing. These data sets provide a substantially expanded catalog of genes and genomes participating in the deconstruction of cellulosic biomass.


Subject(s)
Bacteria/genetics, Biomass, Cattle/microbiology, Cellulases/genetics, Cellulose/metabolism, Metagenome, Rumen/microbiology, Amino Acid Sequence, Animals, Bacteria/enzymology, Bacteria/isolation & purification, Bacteria/metabolism, Bacterial Proteins/chemistry, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Carbohydrate Metabolism, Cellulase/genetics, Cellulase/metabolism, Cellulases/chemistry, Cellulases/metabolism, Cellulose 1,4-beta-Cellobiosidase/genetics, Cellulose 1,4-beta-Cellobiosidase/metabolism, Bacterial Genes, Bacterial Genome, Metagenomics/methods, Molecular Sequence Annotation, Molecular Sequence Data, Poaceae/microbiology, Rumen/metabolism, DNA Sequence Analysis