1.
Nucleic Acids Res ; 52(16): e74, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39011878

ABSTRACT

Genome search and/or classification typically involves finding the best-match database (reference) genomes and has become increasingly challenging due to the growing number of available database genomes and the fact that traditional methods do not scale well with large databases. By combining k-mer hashing-based probabilistic data structures (i.e. ProbMinHash, SuperMinHash, Densified MinHash and SetSketch) to estimate genomic distance, with a graph based nearest neighbor search algorithm (Hierarchical Navigable Small World Graphs, or HNSW), we created a new data structure and developed an associated computer program, GSearch, that is orders of magnitude faster than alternative tools while maintaining high accuracy and low memory usage. For example, GSearch can search 8000 query genomes against all available microbial or viral genomes for their best matches (n = ∼318 000 or ∼3 000 000, respectively) within a few minutes on a personal laptop, using ∼6 GB of memory (2.5 GB via SetSketch). Notably, GSearch has an O(log(N)) time complexity and will scale well with billions of genomes based on a database splitting strategy. Further, GSearch implements a three-step search strategy depending on the degree of novelty of the query genomes to maximize specificity and sensitivity. Therefore, GSearch solves a major bottleneck of microbiome studies that require genome search and/or classification.
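The k-mer hashing idea in this abstract can be illustrated with a minimal sketch. This uses classic MinHash, not the ProbMinHash, SuperMinHash, Densified MinHash or SetSketch variants named above, and it omits the HNSW graph search entirely. The function names, the k-mer size and the number of hash functions are illustrative assumptions and are not taken from GSearch.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Seeded 64-bit hash of a k-mer (the seed selects one hash function)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k=4, num_hashes=128):
    """Classic MinHash: keep the minimum hash value under each hash function.

    Assumes len(seq) >= k so that at least one k-mer exists.
    """
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of matching sketch positions, an estimate of Jaccard similarity."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


def exact_jaccard(seq_a, seq_b, k=4):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)
```

Comparing two fixed-size sketches costs the same regardless of genome length, which is what makes sketch-based distances practical for large databases. Real tools use much larger k than this toy default.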


Subject(s)
Algorithms , Genomics , Software , Genomics/methods , Genome, Viral , Databases, Genetic
2.
Phytopathology ; 113(8): 1387-1393, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37081724

ABSTRACT

Strains of Xanthomonas citri pv. malvacearum cause bacterial blight of cotton, a potentially serious threat to cotton production worldwide, including in sub-Saharan countries. Development of disease symptoms, such as water soaking, has been linked to the activity of a class of type 3 effectors, called transcription activator-like (TAL) effectors, which induce susceptibility genes in the host's cells. To gain further insight into the global diversity of the pathogen, to elucidate their repertoires of TAL effector genes, and to better understand the evolution of these genes in the cotton-pathogenic xanthomonads, we sequenced the genomes of three African strains of X. citri pv. malvacearum using nanopore technology. We show that the cotton-pathogenic pathovar of X. citri is a monophyletic lineage containing at least three distinct genetic subclades, which appear to be mirrored by their repertoires of TAL effectors. We observed an atypical level of TAL effector gene pseudogenization, which might be related to resistance genes that are deployed to control the disease. Our work thus contributes to a better understanding of the conservation and importance of TAL effectors in the interaction with the host plant, which can inform strategies for improving resistance against bacterial blight in cotton.

3.
Article in English | MEDLINE | ID: mdl-36125864

ABSTRACT

Thousands of new bacterial and archaeal species and higher-level taxa are discovered each year through the analysis of genomes and metagenomes. The Genome Taxonomy Database (GTDB) provides hierarchical sequence-based descriptions and classifications for new and as-yet-unnamed taxa. However, bacterial nomenclature, as currently configured, cannot keep up with the need for new well-formed names. Instead, microbiologists have been forced to use hard-to-remember alphanumeric placeholder labels. Here, we exploit an approach to the generation of well-formed arbitrary Latinate names at a scale sufficient to name tens of thousands of unnamed taxa within GTDB. These newly created names represent an important resource for the microbiology community, facilitating communication between bioinformaticians, microbiologists and taxonomists, while populating the emerging landscape of microbial taxonomic and functional discovery with accessible and memorable linguistic labels.


Subject(s)
Archaea , Fatty Acids , Archaea/genetics , Bacteria/genetics , Bacterial Typing Techniques , Base Composition , DNA, Bacterial/genetics , Fatty Acids/chemistry , Phylogeny , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA
4.
Nature ; 536(7615): 179-83, 2016 08 11.
Article in English | MEDLINE | ID: mdl-27487207

ABSTRACT

Bacteria of the SAR11 clade constitute up to one half of all microbial cells in the oxygen-rich surface ocean. SAR11 bacteria are also abundant in oxygen minimum zones (OMZs), where oxygen falls below detection and anaerobic microbes have vital roles in converting bioavailable nitrogen to N2 gas. Anaerobic metabolism has not yet been observed in SAR11, and it remains unknown how these bacteria contribute to OMZ biogeochemical cycling. Here, genomic analysis of single cells from the world's largest OMZ revealed previously uncharacterized SAR11 lineages with adaptations for life without oxygen, including genes for respiratory nitrate reductases (Nar). SAR11 nar genes were experimentally verified to encode proteins catalysing the nitrite-producing first step of denitrification and constituted ~40% of OMZ nar transcripts, with transcription peaking in the anoxic zone of maximum nitrate reduction activity. These results link SAR11 to pathways of ocean nitrogen loss, redefining the ecological niche of Earth's most abundant organismal group.


Subject(s)
Alphaproteobacteria/classification , Alphaproteobacteria/metabolism , Aquatic Organisms/metabolism , Nitrogen/analysis , Oceans and Seas , Oxygen/analysis , Seawater/chemistry , Adaptation, Physiological/genetics , Alphaproteobacteria/genetics , Alphaproteobacteria/isolation & purification , Anaerobiosis/genetics , Aquatic Organisms/enzymology , Aquatic Organisms/genetics , Aquatic Organisms/isolation & purification , Denitrification , Gene Expression Profiling , Genes, Bacterial , Genome, Bacterial/genetics , Nitrate Reductases/genetics , Nitrate Reductases/metabolism , Nitrates/metabolism , Nitrites/metabolism , Nitrogen/metabolism , Oxidation-Reduction , Oxygen/metabolism , Phylogeny , Single-Cell Analysis , Transcription, Genetic
5.
Appl Environ Microbiol ; 87(6)2021 02 26.
Article in English | MEDLINE | ID: mdl-33452027

ABSTRACT

The recovery of metagenome-assembled genomes (MAGs) from metagenomic data has recently become a common task for microbial studies. The strengths and limitations of the underlying bioinformatics algorithms are well appreciated by now based on performance tests with mock data sets of known composition. However, these mock data sets do not capture the complexity and diversity often observed within natural populations, since their construction typically relies on only a single genome of a given organism. Further, it remains unclear if MAGs can recover population-variable genes (those shared by >10% but <90% of the members of the population) as efficiently as core genes (those shared by >90% of the members). To address these issues, we compared the gene variabilities of pathogenic Escherichia coli isolates from eight diarrheal samples, for which the isolate was the causative agent, against their corresponding MAGs recovered from the companion metagenomic data set. Our analysis revealed that MAGs with completeness estimates near 95% captured only 77% of the population core genes and 50% of the variable genes, on average. Further, about 5% of the genes of these MAGs were conservatively identified as missing in the isolate and were of different (non-Enterobacteriaceae) taxonomic origin, suggesting errors at the genome-binning step, even though contamination estimates based on commonly used pipelines were only 1.5%. Therefore, the quality of MAGs may often be worse than estimated, and we offer examples of how to recognize and improve such MAGs to sufficient quality by (for instance) employing only contigs longer than 1,000 bp for binning.

IMPORTANCE: Metagenome assembly and the recovery of metagenome-assembled genomes (MAGs) have recently become common tasks for microbiome studies across environmental and clinical settings. However, the extent to which MAGs can capture the genes of the population they represent remains speculative. Current approaches to evaluating MAG quality are limited to the recovery and copy number of universal housekeeping genes, which represent a small fraction of the total genome, leaving the majority of the genome essentially inaccessible. If MAG quality in reality is lower than these approaches would estimate, this could have dramatic consequences for all downstream analyses and interpretations. In this study, we evaluated this issue using an approach that employed comparisons of the gene contents of MAGs to the gene contents of isolate genomes derived from the same sample. Further, our samples originated from a diarrhea case-control study, and thus, our results are relevant for recovering the virulence factors of pathogens from metagenomic data sets.
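The core and variable gene definitions in this abstract can be made concrete with a small sketch. It classifies genes by the fraction of population genomes carrying them and then measures how much of each class a MAG recovers. The function names and input layout are hypothetical, and the handling of boundary values is an assumption: the abstract does not say where exactly 90% or exactly 10% fall, so here exactly 90% counts as core and 10% or less counts as rare.

```python
from collections import Counter


def classify_genes(genomes):
    """Label each gene 'core', 'variable' or 'rare' by population prevalence.

    genomes: dict mapping a genome ID to the set of gene IDs it carries.
    Thresholds follow the abstract (core >90%, variable >10% and <90%);
    boundary handling is an assumption of this sketch.
    """
    n = len(genomes)
    if n == 0:
        raise ValueError("need at least one genome")
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    classes = {}
    for gene, count in counts.items():
        frac = count / n
        if frac >= 0.9:
            classes[gene] = "core"
        elif frac > 0.1:
            classes[gene] = "variable"
        else:
            classes[gene] = "rare"
    return classes


def recovered_fraction(mag_genes, classes, label):
    """Fraction of genes with the given label that are present in the MAG.

    Returns None when no gene carries that label.
    """
    targets = {gene for gene, cls in classes.items() if cls == label}
    if not targets:
        return None
    return len(targets & set(mag_genes)) / len(targets)
```

For example, calling `recovered_fraction(mag, classes, "core")` and `recovered_fraction(mag, classes, "variable")` on the same MAG gives the two kinds of recovery rates the abstract reports as averages.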


Subject(s)
Escherichia coli/genetics , Feces/microbiology , Genome, Bacterial , Escherichia coli/isolation & purification , Humans , Metagenome
6.
Environ Microbiol ; 22(8): 3394-3412, 2020 08.
Article in English | MEDLINE | ID: mdl-32495495

ABSTRACT

Recent advances in sequencing technology and bioinformatic pipelines have allowed unprecedented access to the genomes of yet-uncultivated microorganisms from diverse environments. However, the catalogue of freshwater genomes remains limited, and most genome recovery attempts in freshwater ecosystems have only targeted specific taxa. Here, we present a genome recovery pipeline incorporating iterative subtractive binning, and apply it to a time series of 100 metagenomic datasets from seven connected lakes and estuaries along the Chattahoochee River (Southeastern USA). Our set of metagenome-assembled genomes (MAGs) represents >400 yet-unnamed genomospecies, substantially increasing the number of high-quality MAGs from freshwater lakes. We propose names for two novel species: 'Candidatus Elulimicrobium humile' ('Ca. Elulimicrobiota', 'Patescibacteria') and 'Candidatus Aquidulcis frankliniae' ('Chloroflexi'). Collectively, our MAGs represented about half of the total microbial community at any sampling point. To evaluate the prevalence of these genomospecies in the chronoseries, we introduce methodologies to estimate relative abundance and habitat preference that control for uneven genome quality and sample representation. We demonstrate high degrees of habitat-specialization and endemicity for most genomospecies in the Chattahoochee lakes. Wider ecological ranges characterized smaller genomes with higher coding densities, indicating an overall advantage of smaller, more compact genomes for cosmopolitan distributions.


Subject(s)
Chloroflexi/classification , Chloroflexi/isolation & purification , Genome, Bacterial/genetics , Lakes/microbiology , Chloroflexi/genetics , Databases, Genetic , Metagenome/genetics , Metagenomics , Microbiota/genetics
7.
Environ Microbiol ; 22(6): 2094-2106, 2020 06.
Article in English | MEDLINE | ID: mdl-32114693

ABSTRACT

Microbial communities ultimately control the fate of petroleum hydrocarbons (PHCs) that enter the natural environment, but the interactions of microbes with PHCs and the environment are highly complex and poorly understood. Genome-resolved metagenomics can help unravel these complex interactions. However, the lack of a comprehensive database that integrates existing genomic/metagenomic data from oil environments with physicochemical parameters known to regulate the fate of PHCs currently limits data analysis and interpretations. Here, we curated a comprehensive, searchable database that documents microbial populations in natural oil ecosystems and oil spills, along with available underlying physicochemical data, geocoded via geographic information system to reveal their geographic distribution patterns. Analysis of the ~2000 metagenome-assembled genomes (MAGs) available in the database revealed strong ecological niche specialization within habitats. Over 95% of the recovered MAGs represented novel taxa underscoring the limited representation of cultured organisms from oil-contaminated and oil reservoir ecosystems. The majority of MAGs linked to oil-contaminated ecosystems were detectable in non-oiled samples from the Gulf of Mexico but not in comparable samples from elsewhere, indicating that the Gulf is primed for oil biodegradation. The repository should facilitate future work toward a predictive understanding of the microbial taxa and their activities that control the fate of oil spills.


Subject(s)
Biodegradation, Environmental , Databases, Genetic , Oil and Gas Fields/microbiology , Petroleum Pollution/analysis , Petroleum/microbiology , Gulf of Mexico , Hydrocarbons/metabolism , Metagenome/genetics , Metagenomics , Microbiota/genetics , Petroleum/metabolism
8.
Appl Environ Microbiol ; 86(6)2020 03 02.
Article in English | MEDLINE | ID: mdl-31924621

ABSTRACT

Little is known about the public health risks associated with natural creek sediments that are affected by runoff and fecal pollution from agricultural and livestock practices. For instance, the persistence of foodborne pathogens such as Shiga toxin-producing Escherichia coli (STEC) originating from these practices remains poorly quantified. Towards closing these knowledge gaps, the water-sediment interface of two creeks in the Salinas River Valley of California was sampled over a 9-month period using metagenomics and traditional culture-based tests for STEC. Our results revealed that these sediment communities are extremely diverse and have functional and taxonomic diversity comparable to that observed in soils. With our sequencing effort (∼4 Gbp per library), we were unable to detect any pathogenic E. coli in the metagenomes of 11 samples that had tested positive using culture-based methods, apparently due to relatively low abundance. Furthermore, there were no significant differences in the abundance of human- or cow-specific gut microbiome sequences in the downstream impacted sites compared to that in upstream more pristine (control) sites, indicating natural dilution of anthropogenic inputs. Notably, the high number of metagenomic reads carrying antibiotic resistance genes (ARGs) found in all samples was significantly higher than ARG reads in other available freshwater and soil metagenomes, suggesting that these communities may be natural reservoirs of ARGs. The work presented here should serve as a guide for sampling volumes, amount of sequencing to apply, and what bioinformatics analyses to perform when using metagenomics for public health risk studies of environmental samples such as sediments.

IMPORTANCE: Current agricultural and livestock practices contribute to fecal contamination in the environment and the spread of food- and waterborne disease and antibiotic resistance genes (ARGs). Traditionally, the level of pollution and risk to public health are assessed by culture-based tests for the intestinal bacterium Escherichia coli. However, the accuracy of these traditional methods (e.g., low accuracy in quantification, and false-positive signal when PCR based) and their suitability for sediments remain unclear. We collected sediments for a time series metagenomics study from one of the most highly productive agricultural regions in the United States in order to assess how agricultural runoff affects the native microbial communities and if the presence of Shiga toxin-producing Escherichia coli (STEC) in sediment samples can be detected directly by sequencing. Our study provided important information on the potential for using metagenomics as a tool for assessment of public health risk in natural environments.


Subject(s)
Geologic Sediments/microbiology , Metagenomics , Public Health/methods , Risk Assessment/methods , Shiga-Toxigenic Escherichia coli/isolation & purification , Agriculture , Animal Husbandry , Animals , California , Livestock , Rivers/microbiology , Water Pollution
9.
Nucleic Acids Res ; 46(W1): W282-W288, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29905870

ABSTRACT

The small subunit ribosomal RNA gene (16S rRNA) has been successfully used to catalogue and study the diversity of prokaryotic species and communities but it offers limited resolution at the species and finer levels, and cannot represent the whole-genome diversity and fluidity. To overcome these limitations, we introduced the Microbial Genomes Atlas (MiGA), a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity (ANI/AAI) concepts. MiGA integrates best practices in sequence quality trimming and assembly and allows input to be raw reads or assemblies from isolate genomes, single-cell sequences, and metagenome-assembled genomes (MAGs). Further, MiGA can take as input hundreds of closely related genomes of the same or closely related species (a so-called 'Clade Project') to assess their gene content diversity and evolutionary relationships, and calculate important clade properties such as the pangenome and core gene sets. Therefore, MiGA is expected to facilitate a range of genome-based taxonomic and diversity studies, and quality assessment across environmental and clinical settings. MiGA is available at http://microbial-genomes.org/.


Subject(s)
Genomics , Internet , RNA, Ribosomal, 16S/genetics , Software , Classification , Genetic Variation/genetics , Genome, Archaeal/genetics , Genome, Bacterial/genetics , Phylogeny
10.
Nucleic Acids Res ; 45(3): e14, 2017 02 17.
Article in English | MEDLINE | ID: mdl-28180325

ABSTRACT

Functional annotation of metagenomic and metatranscriptomic data sets relies on similarity searches based on e-value thresholds resulting in an unknown number of false positive and negative matches. To overcome these limitations, we introduce ROCker, aimed at identifying position-specific, most-discriminant thresholds in sliding windows along the sequence of a target protein, accounting for non-discriminative domains shared by unrelated proteins. ROCker employs the receiver operating characteristic (ROC) curve to minimize false discovery rate (FDR) and calculate the best thresholds based on how simulated shotgun metagenomic reads of known composition map onto well-curated reference protein sequences and thus, differs from HMM profiles and related methods. We showcase ROCker using ammonia monooxygenase (amoA) and nitrous oxide reductase (nosZ) genes, mediating oxidation of ammonia and the reduction of the potent greenhouse gas, N2O, to inert N2, respectively. ROCker typically showed 60-fold lower FDR when compared to the common practice of using fixed e-values. Previously uncounted 'atypical' nosZ genes were found to be two times more abundant, on average, than their typical counterparts in most soil metagenomes and the abundance of bacterial amoA was quantified against the highly-related particulate methane monooxygenase (pmoA). Therefore, ROCker can reliably detect and quantify target genes in short-read metagenomes.
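The ROC-based thresholding described in this abstract can be sketched for a single window. The sketch picks the alignment-score cutoff that best separates simulated reads of known origin. It is a simplified illustration, not ROCker's procedure: it assumes one window rather than sliding windows, and it assumes Youden's J statistic (true positive rate minus false positive rate) as the selection criterion, which the abstract does not specify. The function name and inputs are hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    scores: alignment scores (e.g. bit scores) of simulated reads.
    labels: True if the read truly derives from the target gene.
    A read is called positive when its score >= threshold.
    Ties are resolved in favor of the lowest threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sum(1 for flag in labels if flag)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative read")

    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, flag in zip(scores, labels) if s >= t and flag)
        fp = sum(1 for s, flag in zip(scores, labels) if s >= t and not flag)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

Repeating this per window along the protein would yield position-specific cutoffs in the spirit of the abstract, in contrast to a single fixed e-value applied to the whole sequence.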


Subject(s)
Metagenomics/statistics & numerical data , Aquatic Organisms/genetics , Computational Biology/methods , Databases, Genetic/statistics & numerical data , Ecosystem , Microbial Consortia/genetics , Phylogeny , ROC Curve , Soil Microbiology
11.
Appl Environ Microbiol ; 84(6)2018 03 15.
Article in English | MEDLINE | ID: mdl-29305502

ABSTRACT

The most common practice in studying and cataloguing prokaryotic diversity involves the grouping of sequences into operational taxonomic units (OTUs) at the 97% 16S rRNA gene sequence identity level, often using partial gene sequences, such as PCR-generated amplicons. Due to the high sequence conservation of rRNA genes, organisms belonging to closely related yet distinct species may be grouped under the same OTU. However, it remains unclear how much diversity has been underestimated by this practice. To address this question, we compared the OTUs of genomes defined at the 97% or 98.5% 16S rRNA gene identity level against OTUs of the same genomes defined at the 95% whole-genome average nucleotide identity (ANI), which is a much more accurate proxy for species. Our results show that OTUs resulting from a 98.5% 16S rRNA gene identity cutoff are more accurate than 97% compared to 95% ANI (90.5% versus 89.9% accuracy) but indistinguishable from any other threshold in the 98.29 to 98.78% range. Even with the more stringent thresholds, however, the 16S rRNA gene-based approach commonly underestimates the number of OTUs by ∼12%, on average, compared to the ANI-based approach (∼14% underestimation when using the 97% identity threshold). More importantly, the degree of underestimation can become 50% or more for certain taxa, such as the genera Pseudomonas, Burkholderia, Escherichia, Campylobacter, and Citrobacter. These results provide a quantitative view of the degree of underestimation of extant prokaryotic diversity by 16S rRNA gene-defined OTUs and suggest that genomic resolution is often necessary.

IMPORTANCE: Species diversity is one of the most fundamental pieces of information for community ecology and conservation biology. Therefore, employing accurate proxies for what constitutes a species or the unit of diversity is a cornerstone of a large set of microbial ecology and diversity studies. The most common proxies currently used rely on the clustering of 16S rRNA gene sequences at some threshold of nucleotide identity, typically 97% or 98.5%. Here, we explore how well this strategy reflects the more accurate whole-genome-based proxies and determine the frequency with which the high conservation of 16S rRNA sequences masks substantial species-level diversity.


Subject(s)
Bacteria/classification , Genome, Bacterial , Microbiota , Sequence Analysis, RNA/methods , Bacteria/genetics , RNA, Ribosomal, 16S/analysis
13.
Appl Environ Microbiol ; 83(8)2017 04 15.
Article in English | MEDLINE | ID: mdl-28258138

ABSTRACT

A single liter of water contains hundreds, if not thousands, of bacterial and archaeal species, each of which typically makes up a very small fraction of the total microbial community (<0.1%), the so-called "rare biosphere." How often, and via what mechanisms, e.g., clonal amplification versus horizontal gene transfer, the rare taxa and genes contribute to microbial community response to environmental perturbations represent important unanswered questions toward better understanding the value and modeling of microbial diversity. We tested whether rare species frequently responded to changing environmental conditions by establishing 20-liter planktonic mesocosms with water from Lake Lanier (Georgia, USA) and perturbing them with organic compounds that are rarely detected in the lake, including 2,4-dichlorophenoxyacetic acid (2,4-D), 4-nitrophenol (4-NP), and caffeine. The populations of the degraders of these compounds were initially below the detection limit of quantitative PCR (qPCR) or metagenomic sequencing methods, but they increased substantially in abundance after perturbation. Sequencing of several degraders (isolates) and time-series metagenomic data sets revealed distinct cooccurring alleles of degradation genes, frequently carried on transmissible plasmids, especially for the 2,4-D mesocosms, and distinct species dominating the post-enrichment microbial communities from each replicated mesocosm. This diversity of species and genes also underlies distinct degradation profiles among replicated mesocosms. Collectively, these results supported the hypothesis that the rare biosphere can serve as a genetic reservoir, which can be frequently missed by metagenomics but enables community response to changing environmental conditions caused by organic pollutants, and they provided insights into the size of the pool of rare genes and species.

IMPORTANCE: A single liter of water or gram of soil contains hundreds of low-abundance bacterial and archaeal species, the so-called rare biosphere. The value of this astonishing biodiversity for ecosystem functioning remains poorly understood, primarily due to the fact that microbial community analysis frequently focuses on abundant organisms. Using a combination of culture-dependent and culture-independent (metagenomics) techniques, we showed that rare taxa and genes commonly contribute to the microbial community response to organic pollutants. Our findings should have implications for future studies that aim to study the role of rare species in environmental processes, including environmental bioremediation efforts of oil spills or other contaminants.


Subject(s)
Biodiversity , Ecosystem , Fresh Water/microbiology , Microbial Consortia/physiology , Water Pollutants, Chemical/metabolism , Water Pollutants, Chemical/pharmacology , 2,4-Dichlorophenoxyacetic Acid/metabolism , 2,4-Dichlorophenoxyacetic Acid/pharmacology , Archaea/classification , Archaea/genetics , Archaea/metabolism , Bacteria/classification , Bacteria/genetics , Bacteria/metabolism , Biodegradation, Environmental , Caffeine/metabolism , Caffeine/pharmacology , Georgia , Lakes/microbiology , Metagenomics , Microbial Consortia/drug effects , Microbial Consortia/genetics , Nitrophenols/metabolism , Nitrophenols/pharmacology , Phylogeny , RNA, Ribosomal, 16S , Real-Time Polymerase Chain Reaction , Water Pollutants, Chemical/chemistry
14.
Appl Environ Microbiol ; 82(9): 2872-2883, 2016 May.
Article in English | MEDLINE | ID: mdl-26969701

ABSTRACT

Although the source of drinking water (DW) used in hospitals is commonly disinfected, biofilms forming on water pipelines are a refuge for bacteria, including possible pathogens that survive different disinfection strategies. These biofilm communities are only beginning to be explored by culture-independent techniques that circumvent the limitations of conventional monitoring efforts. Hence, theories regarding the frequency of opportunistic pathogens in DW biofilms and how biofilm members withstand high doses of disinfectants and/or chlorine residuals in the water supply remain speculative. The aim of this study was to characterize the composition of microbial communities growing on five hospital shower hoses using both 16S rRNA gene sequencing of bacterial isolates and whole-genome shotgun metagenome sequencing. The resulting data revealed a Mycobacterium-like population, closely related to Mycobacterium rhodesiae and Mycobacterium tusciae, to be the predominant taxon in all five samples, and its nearly complete draft genome sequence was recovered. In contrast, the fraction recovered by culture was mostly affiliated with Proteobacteria, including members of the genera Sphingomonas, Blastomonas, and Porphyrobacter. The biofilm community harbored genes related to disinfectant tolerance (2.34% of the total annotated proteins) and a lower abundance of virulence determinants related to colonization and evasion of the host immune system. Additionally, genes potentially conferring resistance to β-lactam, aminoglycoside, amphenicol, and quinolone antibiotics were detected. Collectively, our results underscore the need to understand the microbiome of DW biofilms using metagenomic approaches. This information might lead to more robust management practices that minimize the risks associated with exposure to opportunistic pathogens in hospitals.


Subject(s)
Bacterial Physiological Phenomena , Biofilms/growth & development , Cross Infection/genetics , Cross Infection/microbiology , Hospitals , Water Microbiology , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Bacteria/pathogenicity , Chlorine , Culture Techniques , DNA, Bacterial/analysis , Disinfectants/pharmacology , Disinfection , Drug Resistance, Bacterial , Genome, Bacterial , Metagenome , Microbiota/genetics , Mycobacterium/physiology , Ohio , Phylogeny , Proteobacteria/physiology , RNA, Ribosomal, 16S/genetics , Sphingomonadaceae/physiology , Water Supply
15.
Nucleic Acids Res ; 42(8): e73, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24589583

ABSTRACT

Determining the taxonomic affiliation of sequences assembled from metagenomes remains a major bottleneck that affects research across the fields of environmental, clinical and evolutionary microbiology. Here, we introduce MyTaxa, a homology-based bioinformatics framework to classify metagenomic and genomic sequences with unprecedented accuracy. The distinguishing aspect of MyTaxa is that it employs all genes present in an unknown sequence as classifiers, weighting each gene based on its (predetermined) classifying power at a given taxonomic level and frequency of horizontal gene transfer. MyTaxa also implements a novel classification scheme based on the genome-aggregate average amino acid identity concept to determine the degree of novelty of sequences representing uncharacterized taxa, i.e. whether they represent novel species, genera or phyla. Application of MyTaxa on in silico generated (mock) and real metagenomes of varied read length (100-2000 bp) revealed that it correctly classified at least 5% more sequences than any other tool. The analysis also showed that ∼10% of the assembled sequences from human gut metagenomes represent novel species with no sequenced representatives, several of which were highly abundant in situ such as members of the Prevotella genus. Thus, MyTaxa can find several important applications in microbial identification and diversity studies.


Subject(s)
Genomics/methods , Metagenomics/methods , Phylogeny , Algorithms , Classification/methods , Genes , Humans , Microbiota , Software
16.
Proc Natl Acad Sci U S A ; 110(7): 2575-80, 2013 Feb 12.
Article in English | MEDLINE | ID: mdl-23359712

ABSTRACT

The composition and prevalence of microorganisms in the middle-to-upper troposphere (8-15 km altitude) and their role in aerosol-cloud-precipitation interactions represent important, unresolved questions for biological and atmospheric science. In particular, airborne microorganisms above the oceans remain essentially uncharacterized, as most work to date is restricted to samples taken near the Earth's surface. Here we report on the microbiome of low- and high-altitude air masses sampled onboard the National Aeronautics and Space Administration DC-8 platform during the 2010 Genesis and Rapid Intensification Processes campaign in the Caribbean Sea. The samples were collected in cloudy and cloud-free air masses before, during, and after two major tropical hurricanes, Earl and Karl. Quantitative PCR and microscopy revealed that viable bacterial cells represented on average around 20% of the total particles in the 0.25- to 1-µm diameter range and were at least an order of magnitude more abundant than fungal cells, suggesting that bacteria represent an important and underestimated fraction of micrometer-sized atmospheric aerosols. The samples from the two hurricanes were characterized by significantly different bacterial communities, revealing that hurricanes aerosolize a large amount of new cells. Nonetheless, 17 bacterial taxa, including taxa that are known to use C1-C4 carbon compounds present in the atmosphere, were found in all samples, indicating that these organisms possess traits that allow survival in the troposphere. The findings presented here suggest that the microbiome is a dynamic and underappreciated aspect of the upper troposphere with potentially important impacts on the hydrological cycle, clouds, and climate.


Subject(s)
Air Microbiology , Atmosphere , Biodiversity , Cyclonic Storms , Metagenome/genetics , Altitude , Analysis of Variance , Caribbean Region , Phylogeography , Sequence Analysis, DNA , Species Specificity
17.
Bioinformatics ; 30(5): 629-35, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24123672

ABSTRACT

MOTIVATION: Determining the fraction of the diversity within a microbial community sampled and the amount of sequencing required to cover the total diversity represent challenging issues for metagenomics studies. Owing to these limitations, central ecological questions with respect to the global distribution of microbes and the functional diversity of their communities cannot be robustly assessed. RESULTS: We introduce Nonpareil, a method to estimate and project coverage in metagenomes. Nonpareil does not rely on high-quality assemblies, operational taxonomic unit calling or comprehensive reference databases; thus, it is broadly applicable to metagenomic studies. Application of Nonpareil on available metagenomic datasets provided estimates on the relative complexity of soil, freshwater and human microbiome communities, and suggested that ∼200 Gb of sequencing data are required for 95% abundance-weighted average coverage of the soil communities analyzed. AVAILABILITY AND IMPLEMENTATION: Nonpareil is available at https://github.com/lmrodriguezr/nonpareil/ under the Artistic License 2.0.


Subject(s)
Metagenomics/methods , Algorithms , Metagenome , Microbiota , Soil Microbiology
18.
Appl Environ Microbiol ; 80(5): 1777-86, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24375144

ABSTRACT

Soil microbial communities are extremely complex, being composed of thousands of low-abundance species (<0.1% of total). How such complex communities respond to natural or human-induced fluctuations, including major perturbations such as global climate change, remains poorly understood, severely limiting our predictive ability for soil ecosystem functioning and resilience. In this study, we compared 12 whole-community shotgun metagenomic data sets from a grassland soil in the Midwestern United States, half representing soil that had undergone infrared warming by 2°C for 10 years, which simulated the effects of climate change, and the other half representing the adjacent soil that received no warming and thus served as controls. Our analyses revealed that the heated communities showed significant shifts in composition and predicted metabolism, and these shifts were community wide as opposed to being attributable to a few taxa. Key metabolic pathways related to carbon turnover, such as cellulose degradation (∼13%) and CO2 production (∼10%), and to nitrogen cycling, including denitrification (∼12%), were enriched under warming, which was consistent with independent physicochemical measurements. These community shifts were interlinked, in part, with higher primary productivity of the aboveground plant communities stimulated by warming, revealing that most of the additional, plant-derived soil carbon was likely respired by microbial activity. Warming also enriched for a higher abundance of sporulation genes and genomes with higher G+C content. Collectively, our results indicate that microbial communities of temperate grassland soils play important roles in mediating feedback responses to climate change and advance the understanding of the molecular mechanisms of community adaptation to environmental perturbations.


Subject(s)
Biota/radiation effects , Global Warming , Metagenomics , Soil Microbiology , Carbon/metabolism , Humans , Metabolic Networks and Pathways , Midwestern United States , Nitrogen/metabolism
19.
Syst Appl Microbiol ; 47(6): 126554, 2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39305564

ABSTRACT

Stable taxon names for Bacteria and Archaea are essential for capturing and documenting prokaryotic diversity. They are also crucial for scientific communication, effective accumulation of biological data related to the taxon names and for developing a comprehensive understanding of prokaryotic evolution. However, after more than a hundred years, taxonomists have succeeded in valid publication of only around 30 000 species names, based mostly on pure cultures under the International Code of Nomenclature of Prokaryotes (ICNP), out of the millions estimated to reside in the biosphere. The vast majority of prokaryotic species have not been cultured and are becoming increasingly known to us via culture-independent sequence-based approaches. Until recently, such taxa could only be addressed nomenclaturally via provisional names such as Candidatus or alphanumeric identifiers. Here, we present options and considerations to facilitate validation of names for these taxa using the recently established Code of Nomenclature of Prokaryotes Described from Sequence Data (SeqCode). Community engagement and participation of relevant taxon specialists are critical and encouraged for the success of endeavours to formally name the uncultured majority.

20.
Syst Appl Microbiol ; 47(2-3): 126498, 2024 May.
Article in English | MEDLINE | ID: mdl-38442686

ABSTRACT

Codes of nomenclature that provide well-regulated and stable frameworks for the naming of taxa are a fundamental underpinning of biological research. These Codes themselves require systems that govern their administration, interpretation and emendment. Here we review the provisions that have been made for the governance of the recently introduced Code of Nomenclature of Prokaryotes Described from Sequence Data (SeqCode), which provides a nomenclatural framework for the valid publication of names of Archaea and Bacteria using isolate genome, metagenome-assembled genome or single-amplified genome sequences as type material. The administrative structures supporting the SeqCode are designed to be open and inclusive. Direction is provided by the SeqCode Community, which we encourage those with an interest in prokaryotic systematics to join.


Subject(s)
Archaea , Bacteria , Community Participation , Terminology as Topic , Archaea/classification , Archaea/genetics , Bacteria/genetics , Bacteria/classification , Classification/methods