ABSTRACT
Microbes drive most ecosystems and are modulated by viruses that impact their lifespan, gene flow, and metabolic outputs. However, ecosystem-level impacts of viral community diversity remain difficult to assess due to classification issues and few reference genomes. Here, we establish an ~12-fold expanded global ocean DNA virome dataset of 195,728 viral populations, now including the Arctic Ocean, and validate that these populations form discrete genotypic clusters. Meta-community analyses revealed five ecological zones throughout the global ocean, including two distinct Arctic regions. Across the zones, local and global patterns and drivers in viral community diversity were established for both macrodiversity (inter-population diversity) and microdiversity (intra-population genetic variation). These patterns sometimes, but not always, paralleled those from macro-organisms and revealed temperate and tropical surface waters and the Arctic as biodiversity hotspots and mechanistic hypotheses to explain them. Such further understanding of ocean viruses is critical for broader inclusion in ecosystem models.
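To make the two scales of diversity concrete, the sketch below computes a Shannon index over population abundances (one common measure of inter-population diversity) and mean pairwise nucleotide diversity within one population (one common measure of intra-population variation). The abstract does not say which metrics the authors used, so both metric choices, the function names, and the toy inputs are assumptions for illustration only.

```python
import math
from itertools import combinations

def shannon_index(abundances):
    """Shannon's H' over per-population abundances (macrodiversity proxy)."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)

def nucleotide_diversity(aligned_seqs):
    """Mean per-site pairwise difference among aligned sequences
    of a single population (microdiversity proxy)."""
    length = len(aligned_seqs[0])
    pairs = list(combinations(aligned_seqs, 2))
    diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in pairs
    )
    return diffs / (len(pairs) * length)

# Toy inputs (hypothetical, not from the dataset).
community = [10, 10, 10, 10]            # four equally abundant populations
population = ["ACGT", "ACGA", "ACGT"]   # three aligned variants

macro = shannon_index(community)
micro = nucleotide_diversity(population)
```

An even community maximizes the Shannon index for a given number of populations, while the nucleotide diversity depends only on variation among sequences inside one population, which is why the two measures can show different geographic patterns.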
Subjects
Aquatic Organisms/genetics, Biodiversity, DNA Viruses/genetics, Viral DNA/genetics, Metagenome, Water Microbiology
ABSTRACT
Microbial genome annotation is the process of identifying structural and functional elements in DNA sequences and subsequently attaching biological information to those elements. DRAM is a tool developed to annotate bacterial, archaeal, and viral genomes derived from pure cultures or metagenomes. DRAM goes beyond traditional annotation tools by distilling multiple gene annotations to genome level summaries of functional potential. Despite these benefits, a downside of DRAM is the requirement of large computational resources, which limits its accessibility. Further, it did not integrate with downstream metabolic modeling tools that require genome annotation. To alleviate these constraints, DRAM and the viral counterpart, DRAM-v, are now available and integrated with the freely accessible KBase cyberinfrastructure. With kb_DRAM users can generate DRAM annotations and functional summaries from microbial or viral genomes in a point-and-click interface, as well as generate genome-scale metabolic models from DRAM annotations. AVAILABILITY AND IMPLEMENTATION: For kb_DRAM users, the kb_DRAM apps on KBase can be found in the catalog at https://narrative.kbase.us/#catalog/modules/kb_DRAM. For kb_DRAM users, a tutorial workflow with all documentation is available at https://narrative.kbase.us/narrative/129480. For kb_DRAM developers, software is available at https://github.com/shafferm/kb_DRAM.
Subjects
Bacteria, Software, Molecular Sequence Annotation, Bacteria/genetics, Archaea/genetics, Metabolomics
ABSTRACT
MOTIVATION: Viruses infect, reprogram and kill microbes, leading to profound ecosystem consequences, from elemental cycling in oceans and soils to microbiome-modulated diseases in plants and animals. Although metagenomic datasets are increasingly available, identifying viruses in them is challenging due to poor representation and annotation of viral sequences in databases. RESULTS: Here, we establish efam, an expanded collection of Hidden Markov Model (HMM) profiles that represent viral protein families conservatively identified from the Global Ocean Virome 2.0 dataset. This resulted in 240 311 HMM profiles, each with at least 2 protein sequences, making efam >7-fold larger than the next largest, pan-ecosystem viral HMM profile database. Adjusting the criteria for viral contig confidence from 'conservative' to 'eXtremely Conservative' resulted in 37 841 HMM profiles in our efam-XC database. To assess the value of this resource, we integrated efam-XC into VirSorter viral discovery software to discover viruses from less-studied, ecologically distinct oxygen minimum zone (OMZ) marine habitats. This expanded database led to an increase in viruses recovered from every tested OMZ virome by ~24% on average (up to ~42%) and especially improved the recovery of often-missed shorter contigs (<5 kb). Additionally, to help elucidate lesser-known viral protein functions, we annotated the profiles using multiple databases from the DRAM pipeline and virion-associated metaproteomic data, which doubled the number of annotations obtainable by standard, single-database annotation approaches. Together, these marine resources (efam and efam-XC) are provided as searchable, compressed HMM databases that will be updated bi-annually to help maximize viral sequence discovery and study from any ecosystem. AVAILABILITY AND IMPLEMENTATION: The resources are available on the iVirus platform at (doi.org/10.25739/9vze-4143).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Microbiota, Viruses, Animals, Viral Proteins, Software, Metagenomics/methods
ABSTRACT
BACKGROUND: Viruses are a significant player in many biosphere and human ecosystems, but most signals remain "hidden" in metagenomic/metatranscriptomic sequence datasets due to the lack of universal gene markers, database representatives, and insufficiently advanced identification tools. RESULTS: Here, we introduce VirSorter2, a DNA and RNA virus identification tool that leverages genome-informed database advances across a collection of customized automatic classifiers to improve the accuracy and range of virus sequence detection. When benchmarked against genomes from both isolated and uncultivated viruses, VirSorter2 uniquely performed consistently with high accuracy (F1-score > 0.8) across viral diversity, while all other tools under-detected viruses outside of the group most represented in reference databases (i.e., those in the order Caudovirales). Among the tools evaluated, VirSorter2 was also uniquely able to minimize errors associated with atypical cellular sequences including eukaryotic genomes and plasmids. Finally, as the virosphere exploration unravels novel viral sequences, VirSorter2's modular design makes it inherently able to expand to new types of viruses via the design of new classifiers to maintain maximal sensitivity and specificity. CONCLUSION: With multi-classifier and modular design, VirSorter2 demonstrates higher overall accuracy across major viral groups and will advance our knowledge of virus evolution, diversity, and virus-microbe interaction in various ecosystems. Source code of VirSorter2 is freely available (https://bitbucket.org/MAVERICLab/virsorter2), and VirSorter2 is also available both on bioconda and as an iVirus app on CyVerse (https://de.cyverse.org/de).
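For readers unfamiliar with the F1-score threshold quoted above, the sketch below shows how the metric is derived from classification counts. It is a generic illustration of the standard definition (the harmonic mean of precision and recall), not VirSorter2's benchmarking code; the function name and the example counts are hypothetical.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one viral group (not taken from the paper):
# 90 viral sequences correctly called, 10 non-viral sequences called viral,
# 30 viral sequences missed.
example_f1 = f1_score(true_pos=90, false_pos=10, false_neg=30)
passes_threshold = example_f1 > 0.8
```

Because F1 penalizes both false positives and missed viruses, a tool that under-detects viruses outside well-represented groups loses recall and therefore F1 for those groups, even if its precision stays high.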
Subjects
DNA Viruses/classification, Viral Genome/genetics, Metagenomics, RNA Viruses/classification, Software, DNA Viruses/genetics, Ecosystem, Humans, RNA Viruses/genetics
ABSTRACT
Although millions of distinct virus species likely exist, only approximately 9000 are catalogued in GenBank's RefSeq database. We selectively enriched for the genomes of circular DNA viruses in over 70 animal samples, ranging from nematodes to human tissue specimens. A bioinformatics pipeline, Cenote-Taker, was developed to automatically annotate over 2500 complete genomes in a GenBank-compliant format. The new genomes belong to dozens of established and emerging viral families. Some appear to be the result of previously undescribed recombination events between ssDNA and ssRNA viruses. In addition, hundreds of circular DNA elements that do not encode any discernable similarities to previously characterized sequences were identified. To characterize these 'dark matter' sequences, we used an artificial neural network to identify candidate viral capsid proteins, several of which formed virus-like particles when expressed in culture. These data further the understanding of viral sequence diversity and allow for high throughput documentation of the virosphere.
Subjects
DNA Viruses, Circular DNA/genetics, Animals, Capsid Proteins/chemistry, Capsid Proteins/genetics, Capsid Proteins/metabolism, DNA Virus Infections/virology, DNA Viruses/classification, DNA Viruses/genetics, Viral DNA/genetics, Viral Genome/genetics, Humans, Molecular Sequence Annotation, Software
ABSTRACT
Oceanic viruses that infect bacteria, or phages, are known to modulate host diversity, metabolisms, and biogeochemical cycling, while the viruses that infect marine Archaea remain understudied despite the critical ecosystem roles played by their hosts. Here we introduce "MArVD", for Metagenomic Archaeal Virus Detector, an annotation tool designed to identify putative archaeal virus contigs in metagenomic datasets. MArVD is made publicly available through the online iVirus analytical platform. Benchmarking analysis of MArVD showed it to be >99% accurate and 100% sensitive in identifying the 127 known archaeal viruses among the 12,499 viruses in the VirSorter curated dataset. Application of MArVD to 10 viral metagenomes from two depth profiles in the Eastern Tropical North Pacific (ETNP) oxygen minimum zone revealed 43 new putative archaeal virus genomes and large genome fragments ranging in size from 10 to 31 kb. Network-based classifications, which were consistent with marker gene phylogenies where available, suggested that these putative archaeal virus contigs represented six novel candidate genera. Ecological analyses, via fragment recruitment and ordination, revealed that the diversity and relative abundances of these putative archaeal viruses were correlated with oxygen concentration and temperature along two OMZ-spanning depth profiles, presumably due to structuring of the host Archaea community. Peak viral diversity and abundances were found in surface waters, where Thermoplasmata 16S rRNA genes are prevalent, suggesting these archaea as hosts in the surface habitats. Together these findings provide a baseline for identifying archaeal viruses in sequence datasets, and an initial picture of the ecology of such viruses in non-extreme environments.
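The benchmarking figures above (accuracy and sensitivity over 127 known archaeal viruses among 12,499 viruses) can be related to a confusion matrix as sketched below. This assumes accuracy was computed over all 12,499 sequences, which the abstract does not state explicitly; the false-positive count is a hypothetical placeholder, and the function names are illustrative rather than part of MArVD.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real positives that were detected (recall)."""
    return true_pos / (true_pos + false_neg)

def accuracy(true_pos, true_neg, false_pos, false_neg):
    """Fraction of all calls that were correct."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total

TOTAL_VIRUSES = 12_499    # from the abstract
KNOWN_ARCHAEAL = 127      # from the abstract

# 100% sensitivity implies every known archaeal virus was recovered.
true_pos, false_neg = KNOWN_ARCHAEAL, 0

# Hypothetical number of non-archaeal viruses wrongly flagged;
# the abstract does not report this value.
false_pos = 50
true_neg = TOTAL_VIRUSES - KNOWN_ARCHAEAL - false_pos

sens = sensitivity(true_pos, false_neg)
acc = accuracy(true_pos, true_neg, false_pos, false_neg)
```

Under these assumptions, accuracy stays above 99% as long as fewer than 125 of the 12,372 non-archaeal viruses are misclassified, which shows how a rare positive class lets accuracy remain high even with a noticeable number of false positives.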