ABSTRACT
All species shed DNA during life or in death, providing an opportunity to monitor biodiversity via environmental DNA (eDNA). In recent years, combining eDNA, high-throughput sequencing technologies, bioinformatics, and increasingly complete sequence databases has promised a non-invasive and non-destructive environmental monitoring tool. Modern agricultural systems are often large monocultures and so are highly vulnerable to disease outbreaks. Pest and pathogen monitoring in agricultural ecosystems is key for efficient and early disease prevention, lower pesticide use, and better food security. Although the air is rich in biodiversity, it has the lowest DNA concentration of all environmental media and yet is the route for windborne spread of many damaging crop pathogens. Our work suggests that ecosystems can be monitored efficiently using airborne nucleic acid information. Here, we show that the airborne DNA of microbes can be recovered, shotgun sequenced, and taxonomically classified, including down to the species level. We show that by monitoring a field growing key crops we can identify the presence of agriculturally significant pathogens and quantify their changing abundance over a period of 1.5 months, often correlating with weather variables. We add to the evidence that aerial eDNA can be used as a source for biomonitoring in terrestrial ecosystems, specifically highlighting agriculturally relevant species and how pathogen levels correlate with weather conditions. Our ability to detect dynamically changing levels of species and strains highlights the value of airborne eDNA in agriculture, monitoring biodiversity changes, and tracking taxa of interest.
Subjects
Agriculture, Biodiversity, Metagenomics, Metagenomics/methods, Environmental DNA/analysis, Environmental DNA/genetics, Air Microbiology, Ecosystem, Environmental Monitoring/methods, Metagenome, Agricultural Crops/microbiology, Bacteria/genetics, Bacteria/classification, Bacteria/isolation & purification
ABSTRACT
This article presents metagenome-assembled genomes (MAGs) for both eukaryotic and prokaryotic organisms originating from the Arctic and Atlantic oceans, along with gene prediction and functional annotation for MAGs from both domains. Eleven samples from the chlorophyll-a maximum layer of the surface ocean were collected during two cruises in 2012: six from the Arctic in June-July on ARK-XXVII/1 (PS80), and five from the Atlantic in November on ANT-XXIX/1 (PS81). Sequencing and assembly were carried out by the Joint Genome Institute (JGI), which provides annotation of the assembled sequences and 122 MAGs for prokaryotic organisms. A subsequent binning process identified 21 MAGs for eukaryotic organisms, mostly identified as Mamiellophyceae or Bacillariophyceae. The data for each MAG include sequences in FASTA format and tables of functional annotation of genes. For eukaryotic MAGs, transcript and protein sequences for predicted genes are available. A spreadsheet is provided summarising quality measures and taxonomic classifications for each MAG. These data provide draft genomes for uncultured marine microbes, including some of the first MAGs for polar eukaryotes, and can provide reference genetic data for these environments or be used in genomics-based comparisons between environments.
ABSTRACT
MOTIVATION: The assembly of contiguous sequence from metagenomic samples presents a particular challenge, due to the presence of multiple species, often closely related, at varying levels of abundance. Capturing diversity within species, for example, viral haplotypes or bacterial strain-level diversity, is even more challenging. RESULTS: We present MetaCortex, a metagenome assembler that captures intra-species diversity by searching for signatures of local variation along assembled sequences in the underlying assembly graph and outputting these sequences in sequence graph format. We show that MetaCortex produces accurate assemblies with higher genome coverage and contiguity than other popular metagenomic assemblers on mock viral communities with high levels of strain-level diversity and on simulated communities containing simulated strains. AVAILABILITY AND IMPLEMENTATION: Source code is freely available to download from https://github.com/SR-Martin/metacortex. MetaCortex is implemented in C and supported on MacOS and Linux. The version used for the results presented in this article is available at doi.org/10.5281/zenodo.7273627. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Metagenome, Metagenomics, Haplotypes, Software
ABSTRACT
Anthropogenic activities are triggering global changes in the environment, causing entire communities of plants, pollinators and their interactions to restructure, and ultimately leading to species declines. To understand the mechanisms behind community shifts and declines, and to monitor and manage impacts, a global effort must be made to characterize plant-pollinator communities in detail, across different habitat types, latitudes, elevations, and levels and types of disturbances. Generating data of this scale will only be feasible with rapid, high-throughput methods. Pollen DNA metabarcoding provides advantages in throughput, efficiency and taxonomic resolution over traditional methods, such as microscopic pollen identification and visual observation of plant-pollinator interactions. This makes it ideal for understanding complex ecological networks and their responses to change. Pollen DNA metabarcoding is currently being applied to assess plant-pollinator interactions, survey ecosystem change and model the spatiotemporal distribution of allergenic pollen. Where samples are available from past collections, pollen DNA metabarcoding has been used to compare contemporary and past ecosystems. New avenues of research are possible with the expansion of pollen DNA metabarcoding to intraspecific identification, analysis of DNA in ancient pollen samples, and increased use of museum and herbarium specimens. Ongoing developments in sequencing technologies can accelerate progress towards these goals. Global ecological change is happening rapidly, and we anticipate that high-throughput methods such as pollen DNA metabarcoding will be critical for understanding the evolutionary and ecological processes that support biodiversity, and for predicting and responding to the impacts of change.
Subjects
Taxonomic DNA Barcoding, Ecosystem, Taxonomic DNA Barcoding/methods, Pollen/genetics, Plants/genetics, DNA, Pollination/genetics
ABSTRACT
BACKGROUND: Phytoplankton communities significantly contribute to global biogeochemical cycles of elements and underpin marine food webs. Although their uncultured genomic diversity has been estimated by planetary-scale metagenome sequencing and subsequent reconstruction of metagenome-assembled genomes (MAGs), this approach has yet to be applied to complex phytoplankton microbiomes from polar and non-polar oceans consisting of microbial eukaryotes and their associated prokaryotes. RESULTS: Here, we have assembled MAGs from chlorophyll a maximum layers in the surface of the Arctic and Atlantic Oceans enriched for species associations (microbiomes) with a focus on pico- and nanophytoplankton and their associated heterotrophic prokaryotes. From 679 Gbp and an estimated 50 million genes in total, we recovered 143 MAGs of medium to high quality. Although there was a strict demarcation between Arctic and Atlantic MAGs, adjacent sampling stations in each ocean had 51-88% of MAGs in common, with most species associations between Prasinophytes and Proteobacteria. Phylogenetic placement revealed eukaryotic MAGs to be more diverse in the Arctic whereas prokaryotic MAGs were more diverse in the Atlantic Ocean. Approximately 70% of protein families were shared between Arctic and Atlantic MAGs for both prokaryotes and eukaryotes. However, eukaryotic MAGs had more protein families unique to the Arctic whereas prokaryotic MAGs had more families unique to the Atlantic. CONCLUSION: Our study provides a genomic context to complex phytoplankton microbiomes to reveal that their community structure was likely driven by significant differences in environmental conditions between the polar Arctic and warm surface waters of the tropical and subtropical Atlantic Ocean.
Subjects
Metagenome, Microbiota, Atlantic Ocean, Chlorophyll A, Eukaryota/genetics, Metagenome/genetics, Microbiota/genetics, Phylogeny, Phytoplankton/genetics
ABSTRACT
Adaptive sampling is a method of software-controlled enrichment unique to nanopore sequencing platforms. To test its potential for enrichment of rarer species within metagenomic samples, we create a synthetic mock community and construct sequencing libraries with a range of mean read lengths. Enrichment is up to 13.87-fold for the least abundant species in the longest read length library; after factoring in the reduced yield caused by rejecting molecules, the calculated efficiency gain is 4.93-fold. Finally, we introduce a mathematical model of enrichment based on molecule length and relative abundance, whose predictions correlate strongly with mock and complex real-world microbial communities.
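As a rough illustration of how the two figures quoted in this abstract might relate, the sketch below assumes that enrichment is the ratio of a species' relative abundance with and without adaptive sampling, and that the yield-adjusted efficiency is that enrichment multiplied by the ratio of total sequencing yields. Both definitions, and the function names `enrichment` and `yield_adjusted_efficiency`, are assumptions made for illustration; the abstract does not state the paper's formulas, and this is not its mathematical model.

```python
def enrichment(abundance_adaptive: float, abundance_control: float) -> float:
    """Fold enrichment by composition (assumed definition):
    relative abundance with adaptive sampling / relative abundance without."""
    if abundance_control <= 0:
        raise ValueError("control abundance must be positive")
    return abundance_adaptive / abundance_control


def yield_adjusted_efficiency(enrichment_fold: float,
                              yield_adaptive: float,
                              yield_control: float) -> float:
    """Enrichment scaled by the loss in total yield (assumed definition).

    Rejecting off-target molecules lowers the total number of bases
    sequenced, so the gain in on-target bases is smaller than the gain
    in on-target proportion.
    """
    if yield_control <= 0:
        raise ValueError("control yield must be positive")
    return enrichment_fold * (yield_adaptive / yield_control)


# Under these assumptions, the two reported figures imply a yield ratio:
implied_yield_ratio = 4.93 / 13.87
print(f"implied adaptive/control yield ratio: {implied_yield_ratio:.3f}")
```

Read this way, the adaptive-sampling run would have produced roughly a third of the control yield, which is why the efficiency figure is lower than the raw enrichment figure.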
Subjects
Nanopore Sequencing, Nanopores, High-Throughput Nucleotide Sequencing, Metagenome, Metagenomics, DNA Sequence Analysis
ABSTRACT
Eukaryotic phytoplankton are responsible for at least 20% of annual global carbon fixation. Their diversity and activity are shaped by interactions with prokaryotes as part of complex microbiomes. Although differences in their local species diversity have been estimated, we still have a limited understanding of environmental conditions responsible for compositional differences between local species communities on a large scale from pole to pole. Here, we show, based on pole-to-pole phytoplankton metatranscriptomes and microbial rDNA sequencing, that environmental differences between polar and non-polar upper oceans most strongly impact the large-scale spatial pattern of biodiversity and gene activity in algal microbiomes. The geographic differentiation of co-occurring microbes in algal microbiomes can be well explained by the latitudinal temperature gradient and associated break points in their beta diversity, with an average break point at 14 ± 4.3 °C, separating cold and warm upper oceans. As global warming impacts upper ocean temperatures, we project that break points of beta diversity move markedly pole-wards. Hence, abrupt regime shifts in algal microbiomes could be caused by anthropogenic climate change.
Subjects
Genetic Variation, Microalgae/genetics, Microbiota/genetics, Phytoplankton/genetics, Transcriptome/genetics, Antarctic Regions, Arctic Regions, Biodiversity, Carbon Cycle, Climate Change, Gene Ontology, Geography, Global Warming, Microalgae/classification, Microalgae/growth & development, Oceans and Seas, Phytoplankton/classification, Phytoplankton/growth & development, 16S Ribosomal RNA/genetics, 18S Ribosomal RNA/genetics, DNA Sequence Analysis/methods, Species Specificity, Temperature
ABSTRACT
BACKGROUND: The analysis of long reads or the assessment of assembly or target capture data often necessitates running alignments against reference genomes or gene sets. The aligner outputs are often parsed automatically by scripts, but many kinds of analysis can benefit from the understanding that can follow human inspection of individual alignments. Additionally, diagrams are a useful means of communicating assembly results to others. RESULTS: We developed Alvis, a simple command line tool that can generate visualisations for a number of common alignment analysis tasks. Alvis is a fast and portable tool that accepts input in a variety of alignment formats and will output production-ready vector images. Additionally, Alvis will highlight potentially chimeric reads or contigs, a common source of misassemblies. CONCLUSION: Alvis diagrams facilitate improved understanding of assembly quality, enable read coverage to be visualised and potential errors to be identified. Additionally, we found that splitting chimeric reads using the output provided by Alvis can improve the contiguity of assemblies, while maintaining correctness.
Subjects
High-Throughput Nucleotide Sequencing, DNA Sequence Analysis, Humans, Data Visualization, Genome
ABSTRACT
BACKGROUND: Riverine ecosystems are biogeochemical powerhouses driven largely by microbial communities that inhabit water columns and sediments. Because rivers are used extensively for anthropogenic purposes (drinking water, recreation, agriculture, and industry), it is essential to understand how these activities affect the composition of river microbial consortia. Recent studies have shown that river metagenomes vary considerably, suggesting that microbial community data should be included in broad-scale river ecosystem models. But such ecogenomic studies have not been applied on a broad "aquascape" scale, and few if any have applied the newest nanopore technology. RESULTS: We investigated the metagenomes of 11 rivers across 3 continents using MinION nanopore sequencing, a portable platform that could be useful for future global river monitoring. Up to 10 Gb of data per run were generated with average read lengths of 3.4 kb. Diversity assessment and diagnosis of river function potential were accomplished with 0.5-1.0 × 10^6 long reads. Our observations for 7 of the 11 rivers conformed to other river-omic findings, and we exposed previously unrecognized microbial biodiversity in the other 4 rivers. CONCLUSIONS: The deeper understanding that emerged is that river microbial consortia and the ecological functions they fulfil did not align with geographic location but instead implicated ecological responses of microbes to urban and other anthropogenic effects, and that changes in taxa manifested over a very short geographic space.
Subjects
Metagenome, Metagenomics/methods, Microbial Consortia, Microbiota, Plankton/genetics, Biodiversity, Nanopore Sequencing, Rivers/microbiology, Water Microbiology
ABSTRACT
In recent years, the use of longer range read data combined with advances in assembly algorithms has stimulated big improvements in the contiguity and quality of genome assemblies. However, these advances have not directly transferred to metagenomic data sets, as assumptions made by the single genome assembly algorithms do not apply when assembling multiple genomes at varying levels of abundance. The development of dedicated assemblers for metagenomic data was a relatively late innovation and for many years, researchers had to make do using tools designed for single genomes. This has changed in the last few years and we have seen the emergence of a new type of tool built using different principles. In this review, we describe the challenges inherent in metagenomic assemblies and compare the different approaches taken by these novel assembly tools.
Subjects
High-Throughput Nucleotide Sequencing/methods, Metagenome, DNA Sequence Analysis/methods, Algorithms, Animals, Humans, Microbiota/genetics, Plants/genetics
ABSTRACT
The MinION sequencing platform offers near real-time analysis of DNA sequence; this makes the tool attractive for deployment in fieldwork or clinical settings. We used the MinION platform coupled to the NanoOK RT software package to perform shotgun metagenomic sequencing and profile mock communities and faecal samples from healthy and ill preterm infants. Using Nanopore data, we reliably classified a 20-species mock community and captured the diversity of the immature gut microbiota over time and in response to interventions such as probiotic supplementation, antibiotic treatment or episodes of suspected sepsis. We also performed rapid real-time runs to assess gut-associated microbial communities in critically ill and healthy infants, facilitated by the NanoOK RT software package, which analysed sequences as they were generated. Our pipeline reliably identified pathogenic bacteria (that is, Klebsiella pneumoniae and Enterobacter cloacae) and their corresponding antimicrobial resistance gene profiles within as little as 1 h of sequencing. Results were confirmed using pathogen isolation, whole-genome sequencing and antibiotic susceptibility testing, as well as mock communities and clinical samples with known antimicrobial resistance genes. Our results demonstrate that MinION (including cost-effective Flongle flow cells) with NanoOK RT can process metagenomic samples to a rich dataset in < 5 h, which creates a platform for future studies aimed at developing these tools and approaches in clinical settings with a focus on providing tailored patient antimicrobial treatment options.
Subjects
Bacterial Drug Resistance/drug effects, Bacterial Drug Resistance/genetics, Premature Infant, Microbiota/drug effects, Microbiota/genetics, Anti-Bacterial Agents/pharmacology, Bacteria/drug effects, Bacteria/genetics, Computational Biology, Bacterial DNA/analysis, Bacterial DNA/genetics, Enterobacter cloacae/drug effects, Enterobacter cloacae/genetics, Enterobacter cloacae/isolation & purification, Gastrointestinal Microbiome/drug effects, Gastrointestinal Microbiome/genetics, Klebsiella pneumoniae/drug effects, Klebsiella pneumoniae/genetics, Klebsiella pneumoniae/isolation & purification, Metagenome, Microbial Sensitivity Tests, Nanopores, DNA Sequence Analysis, Software, Whole Genome Sequencing
ABSTRACT
BACKGROUND: Human tissue is increasingly being whole genome sequenced as we transition into an era of genomic medicine. With this arises the potential to detect sequences originating from microorganisms, including pathogens amid the plethora of human sequencing reads. In cancer research, the tumorigenic ability of pathogens is being recognized, for example, Helicobacter pylori and human papillomavirus in the cases of gastric non-cardia and cervical carcinomas, respectively. As of yet, no benchmark has been carried out on the performance of computational approaches for bacterial and viral detection within host-dominated sequence data. RESULTS: We present the results of benchmarking over 70 distinct combinations of tools and parameters on 100 simulated cancer datasets spiked with realistic proportions of bacteria. mOTUs2 and Kraken are the highest performing individual tools achieving median genus-level F1 scores of 0.90 and 0.91, respectively. mOTUs2 demonstrates a high performance in estimating bacterial proportions. Employing Kraken on unassembled sequencing reads produces a good but variable performance depending on post-classification filtering parameters. These approaches are investigated on a selection of cervical and gastric cancer whole genome sequences where Alphapapillomavirus and Helicobacter are detected in addition to a variety of other interesting genera. CONCLUSIONS: We provide the top-performing pipelines from this benchmark in a unifying tool called SEPATH, which is amenable to high throughput sequencing studies across a range of high-performance computing clusters. SEPATH provides a benchmarked and convenient approach to detect pathogens in tissue sequence data helping to determine the relationship between metagenomics and disease.
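The genus-level F1 scores reported in this abstract can be illustrated with a small sketch. It assumes that each dataset is scored by comparing the set of genera a tool reports against the set actually spiked in, and that the per-dataset scores are then summarised by their median. The function name `genus_level_f1` and the example genera are hypothetical and are not taken from the benchmark.

```python
from statistics import median


def genus_level_f1(predicted: set, truth: set) -> float:
    """F1 score for a set of predicted genera against the true genera."""
    true_pos = len(predicted & truth)    # genera correctly reported
    false_pos = len(predicted - truth)   # genera reported but absent
    false_neg = len(truth - predicted)   # genera present but missed
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical truth set and classifier output for one simulated dataset.
truth = {"Helicobacter", "Escherichia", "Klebsiella", "Streptococcus"}
predicted = {"Helicobacter", "Escherichia", "Klebsiella", "Bacillus"}
score = genus_level_f1(predicted, truth)

# A tool's headline figure would then be the median over all datasets.
scores_across_datasets = [score, 1.0, 0.0]
median_f1 = median(scores_across_datasets)
```

Because F1 is the harmonic mean of precision and recall, a tool that reports many spurious genera is penalised as much as one that misses true genera, which is why post-classification filtering parameters can shift the score substantially.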
Subjects
Metagenomics/methods, Neoplasms/microbiology, Whole Genome Sequencing, Alphapapillomavirus/isolation & purification, Benchmarking, Helicobacter/isolation & purification, Humans
ABSTRACT
The gold standard for clinical diagnosis of bacterial lower respiratory infections (LRIs) is culture, which has poor sensitivity and is too slow to guide early, targeted antimicrobial therapy. Metagenomic sequencing could identify LRI pathogens much faster than culture, but methods are needed to remove the large amount of human DNA present in these samples for this approach to be feasible. We developed a metagenomics method for bacterial LRI diagnosis that features efficient saponin-based host DNA depletion and nanopore sequencing. Our pilot method was tested on 40 samples, then optimized and tested on a further 41 samples. Our optimized method (6 h from sample to result) was 96.6% sensitive and 41.7% specific for pathogen detection compared with culture and we could accurately detect antibiotic resistance genes. After confirmatory quantitative PCR and pathobiont-specific gene analyses, specificity and sensitivity increased to 100%. Nanopore metagenomics can rapidly and accurately characterize bacterial LRIs and might contribute to a reduction in broad-spectrum antibiotic use.
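The sensitivity and specificity figures in this abstract follow the usual definitions relative to a reference test, here culture. The sketch below shows the arithmetic. The counts are illustrative values chosen only because they sum to 41 samples and reproduce the reported percentages; they are an assumption, not data taken from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of reference-positive samples that the new test also calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of reference-negative samples that the new test also calls negative."""
    return true_neg / (true_neg + false_pos)


# Illustrative counts only (assumed, not reported in the abstract):
# 29 culture-positive and 12 culture-negative samples, 41 in total.
tp, fn = 28, 1
tn, fp = 5, 7

sens_pct = round(100 * sensitivity(tp, fn), 1)
spec_pct = round(100 * specificity(tn, fp), 1)
print(f"sensitivity {sens_pct}%, specificity {spec_pct}%")
```

A low specificity against culture does not by itself mean the sequencing calls are wrong: if the reference test has poor sensitivity, as the abstract states for culture, true detections missed by culture are counted as false positives until confirmed by an independent assay.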
Subjects
Bacteria/isolation & purification, Bronchitis/diagnosis, Bacterial DNA/genetics, Metagenomics/methods, Nanopores, Bacterial Pneumonia/diagnosis, Anti-Bacterial Agents/pharmacology, Bacteria/drug effects, Bacteria/genetics, Bronchitis/microbiology, Bacterial Drug Resistance/genetics, Bacterial Genome, High-Throughput Nucleotide Sequencing/methods, Humans, Pilot Projects, Bacterial Pneumonia/microbiology, Sensitivity and Specificity
ABSTRACT
Oxford Nanopore Technologies' MinION sequencer was launched in pre-release form in 2014 and represents an exciting new sequencing paradigm. The device offers multi-kilobase reads and a streamed mode of operation that allows processing of reads as they are generated. Crucially, it is an extremely compact device that is powered from the USB port of a laptop computer, enabling it to be taken out of the lab and making previously impossible in-field sequencing experiments feasible. Many of the initial publications concerning the platform focused on provision of tools to access and analyse the new sequence formats and then on demonstrating the assembly of microbial genomes. More recently, as throughput and accuracy have increased, it has been possible to begin work involving more complex genomes and metagenomes. With the release of the high-throughput GridION X5 and PromethION platforms, the sequencing of large genomes will become more cost efficient, and enable the leveraging of extremely long (>100 kb) reads for resolution of complex genomic structures. This review provides a brief overview of nanopore sequencing technology, describes the growing range of nanopore bioinformatics tools, and highlights some of the most influential publications that have emerged over the last 2 years. Finally, we look to the future and the potential the platform has to disrupt work in human, microbiome, and plant genomics.
Subjects
Computational Biology/methods, Plant Genome/genetics, Nanopores, Plants/genetics, DNA Sequence Analysis/methods, Computational Biology/instrumentation, Human Genome/genetics, Humans, Microbiota/genetics, DNA Sequence Analysis/instrumentation
ABSTRACT
BACKGROUND: Long-read sequencing is rapidly evolving and reshaping the suite of opportunities for genomic analysis. For the MinION in particular, as both the platform and chemistry develop, the user community requires reference data to set performance expectations and maximally exploit third-generation sequencing. We performed an analysis of MinION data derived from whole genome sequencing of Escherichia coli K-12 using the R9.0 chemistry, comparing the results with the older R7.3 chemistry. METHODS: We computed the error-rate estimates for insertions, deletions, and mismatches in MinION reads. RESULTS: Run-time characteristics of the flow cell and run scripts for R9.0 were similar to those observed for R7.3 chemistry, but with an 8-fold increase in bases per second (from 30 bps in R7.3 and SQK-MAP005 library preparation, to 250 bps in R9.0) processed by individual nanopores, and less drop-off in yield over time. The 2-dimensional ("2D") N50 read length was unchanged from the prior chemistry. Using the proportion of alignable reads as a measure of base-call accuracy, 99.9% of "pass" template reads from 1-dimensional ("1D") experiments were mappable and ~97% from 2D experiments. The median identity of reads was ~89% for 1D and ~94% for 2D experiments. The total error rate (miscall + insertion + deletion) decreased for 2D "pass" reads from 9.1% in R7.3 to 7.5% in R9.0 and for template "pass" reads from 26.7% in R7.3 to 14.5% in R9.0. CONCLUSIONS: These Phase 2 MinION experiments serve as a baseline by providing estimates for read quality, throughput, and mappability. The datasets further enable the development of bioinformatic tools tailored to the new R9.0 chemistry and the design of novel biological applications for this technology. ABBREVIATIONS: K: thousand, Kb: kilobase (one thousand base pairs), M: million, Mb: megabase (one million base pairs), Gb: gigabase (one billion base pairs).
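The total error rate quoted in this abstract is the sum of mismatch (miscall), insertion, and deletion rates. The sketch below shows one way such rates could be derived from a gapped pairwise alignment of a read against a reference. It assumes the denominator is the number of alignment columns; that convention varies between tools, and the abstract does not say which one the study used. The function name `error_rates` and the toy alignment are hypothetical.

```python
def error_rates(aligned_read: str, aligned_ref: str) -> dict:
    """Per-column error rates from two equal-length gapped alignment strings.

    '-' marks a gap. A gap in the reference is an insertion in the read;
    a gap in the read is a deletion from the read.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned strings must have equal length")

    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    for read_base, ref_base in zip(aligned_read, aligned_ref):
        if read_base == "-" and ref_base == "-":
            continue  # empty column, carries no information
        if ref_base == "-":
            counts["insertion"] += 1
        elif read_base == "-":
            counts["deletion"] += 1
        elif read_base == ref_base:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1

    columns = sum(counts.values())
    if columns == 0:
        raise ValueError("alignment has no informative columns")

    rates = {k: counts[k] / columns for k in ("mismatch", "insertion", "deletion")}
    rates["total"] = rates["mismatch"] + rates["insertion"] + rates["deletion"]
    rates["identity"] = counts["match"] / columns
    return rates


# Toy alignment: 8 matches, 1 mismatch, 1 insertion, 1 deletion.
rates = error_rates("ACGT-ACGTTA",
                    "ACGTAACG-TC")
```

With this convention, identity and total error rate sum to one, so a median identity of about 89% corresponds to a total error rate of about 11% on those reads.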
ABSTRACT
MOTIVATION: The Oxford Nanopore MinION sequencer, currently in pre-release testing through the MinION Access Programme (MAP), promises long reads in real-time from an inexpensive, compact, USB device. Tools have been released to extract FASTA/Q from the MinION base calling output and to provide basic yield statistics. However, no single tool yet exists to provide comprehensive alignment-based quality control and error profile analysis, something that is extremely important given the speed with which the platform is evolving. RESULTS: NanoOK generates detailed tabular and graphical output plus an in-depth multi-page PDF report including error profile, quality and yield data. NanoOK is multi-reference, enabling detailed analysis of metagenomic or multiplexed samples. Four popular Nanopore aligners are supported and it is easily extensible to include others. AVAILABILITY AND IMPLEMENTATION: NanoOK is open-source software, implemented in Java with supporting R scripts. It has been tested on Linux and Mac OS X and can be downloaded from https://github.com/TGAC/NanoOK. A VirtualBox VM containing all dependencies and the DH10B read set used in this article is available from http://opendata.tgac.ac.uk/nanook/. A Docker image is also available from Docker Hub; see the program documentation at https://documentation.tgac.ac.uk/display/NANOOK. CONTACT: richard.leggett@tgac.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Data Accuracy, Nanopores, Sequence Alignment/methods, DNA Sequence Analysis, Software, Base Sequence, Escherichia coli K12/genetics
ABSTRACT
The use of next generation sequencing (NGS) to identify novel viral sequences from eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs (post-assembly), making subsequent viral analysis difficult. Comparison of assembly algorithms with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency based filter and a low complexity filter, has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of these pre-assembly filtering methods. This approach provides a validated method for maximizing viral contig size as well as reducing the total number of assembled contigs that require down-stream analysis as putative viral nucleic acids.
Subjects
Contig Mapping/methods, Viral DNA/chemistry, High-Throughput Nucleotide Sequencing/methods, Algorithms, DNA Contamination, Humans, Liver/virology, DNA Sequence Analysis/methods
ABSTRACT
The advent of a miniaturized DNA sequencing device with a high-throughput contextual sequencing capability embodies the next generation of large scale sequencing tools. The MinION™ Access Programme (MAP) was initiated by Oxford Nanopore Technologies™ in April 2014, giving public access to their USB-attached miniature sequencing device. The MinION Analysis and Reference Consortium (MARC) was formed by a subset of MAP participants, with the aim of evaluating and providing standard protocols and reference data to the community. Envisaged as a multi-phased project, this study provides the global community with the Phase 1 data from MARC, where the reproducibility of the performance of the MinION was evaluated at multiple sites. Five laboratories on two continents generated data using a control strain of Escherichia coli K-12, preparing and sequencing samples according to a revised ONT protocol. Here, we provide the details of the protocol used, along with a preliminary analysis of the characteristics of typical runs including the consistency, rate, volume and quality of data produced. Further analysis of the Phase 1 data presented here, and additional Phase 2 experiments on E. coli from MARC, are already underway to identify ways to improve and enhance MinION performance.