ABSTRACT
High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.
Subjects
Bacteriophages, RNA Viruses, Bacteriophages/genetics, DNA-Directed RNA Polymerases/genetics, Viral Genome, Phylogeny, RNA, RNA Viruses/genetics, RNA-Dependent RNA Polymerase/genetics, Virome
ABSTRACT
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
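As a rough illustration of the graph-based grouping step described in this abstract, the sketch below treats proteins as nodes and pairwise similarity hits as edges, then keeps connected components above a size threshold. Connected components are a deliberately simple stand-in and are not the clustering method used in the study; the function name, the edge-list input and the `min_size` parameter are assumptions made for this example.

```python
from collections import defaultdict


def cluster_by_similarity(edges, min_size=2):
    """Group protein IDs into connected components of a similarity graph.

    edges: iterable of (protein_a, protein_b) pairs judged similar (hypothetical input).
    Returns the components with at least min_size members, largest first.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    clusters = [members for members in groups.values() if len(members) >= min_size]
    return sorted(clusters, key=len, reverse=True)
```

Under these assumptions, calling the function with `min_size=101` would mirror the "more than 100 members" cutoff quoted in the abstract.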
Subjects
Metagenome, Metagenomics, Microbiology, Proteins, Cluster Analysis, Metagenome/genetics, Metagenomics/methods, Proteins/chemistry, Proteins/classification, Proteins/genetics, Protein Databases, Protein Conformation
ABSTRACT
CRISPR-Cas12c/d proteins share limited homology with Cas12a and Cas9 bacterial CRISPR RNA (crRNA)-guided nucleases used widely for genome editing and DNA detection. However, Cas12c (C2c3)- and Cas12d (CasY)-catalyzed DNA cleavage and genome editing activities have not been directly observed. We show here that a short-complementarity untranslated RNA (scoutRNA), together with crRNA, is required for Cas12d-catalyzed DNA cutting. The scoutRNA differs in secondary structure from previously described tracrRNAs used by CRISPR-Cas9 and some Cas12 enzymes, and in Cas12d-containing systems, scoutRNA includes a conserved five-nucleotide sequence that is essential for activity. In addition to supporting crRNA-directed DNA recognition, biochemical and cell-based experiments establish scoutRNA as an essential cofactor for Cas12c-catalyzed pre-crRNA maturation. These results define scoutRNA as a third type of transcript encoded by a subset of CRISPR-Cas genomic loci and explain how Cas12c/d systems avoid requirements for host factors including ribonuclease III for bacterial RNA-mediated adaptive immunity.
Subjects
Bacteria/genetics, Bacterial Proteins/genetics, CRISPR-Cas Systems, Endodeoxyribonucleases/genetics, Bacterial Genome/immunology, Bacterial RNA/genetics, Small Untranslated RNA/genetics, Bacteria/classification, Bacteria/immunology, Bacteria/metabolism, Bacterial Proteins/metabolism, Base Sequence, Clustered Regularly Interspaced Short Palindromic Repeats, Bacterial DNA/chemistry, Bacterial DNA/genetics, Bacterial DNA/metabolism, Endodeoxyribonucleases/metabolism, Escherichia coli/genetics, Escherichia coli/immunology, Escherichia coli/metabolism, Nucleic Acid Conformation, Phylogeny, Bacterial RNA/chemistry, Bacterial RNA/metabolism, Kinetoplastida Guide RNA/genetics, Kinetoplastida Guide RNA/metabolism, Small Untranslated RNA/chemistry, Small Untranslated RNA/metabolism, Sequence Alignment, Nucleic Acid Sequence Homology
ABSTRACT
CRISPR-Cas immunity requires integration of short, foreign DNA fragments into the host genome at the CRISPR locus, a site consisting of alternating repeat sequences and foreign-derived spacers. In most CRISPR systems, the proteins Cas1 and Cas2 form the integration complex and are both essential for DNA acquisition. Most type V-C and V-D systems lack the cas2 gene and have unusually short CRISPR repeats and spacers. Here, we show that a mini-integrase comprising the type V-C Cas1 protein alone catalyzes DNA integration with a preference for short (17- to 19-base-pair) DNA fragments. The mini-integrase has weak specificity for the CRISPR array. We present evidence that the Cas1 proteins form a tetramer for integration. Our findings support a model of a minimal integrase with an internal ruler mechanism that favors shorter repeats and spacers. This minimal integrase may represent the function of the ancestral Cas1 prior to Cas2 adoption.
Subjects
CRISPR-Associated Proteins/genetics, CRISPR-Cas Systems, Clustered Regularly Interspaced Short Palindromic Repeats, Bacterial DNA/genetics, Endodeoxyribonucleases/genetics, Endonucleases/genetics, Escherichia coli Proteins/genetics, Escherichia coli/genetics, Gene Editing/methods, Integrases/genetics, Base Pairing, CRISPR-Associated Proteins/metabolism, Bacterial DNA/metabolism, Endodeoxyribonucleases/metabolism, Endonucleases/metabolism, Escherichia coli/enzymology, Escherichia coli Proteins/metabolism, Bacterial Gene Expression Regulation, Integrases/metabolism, Nucleotide Motifs, Substrate Specificity
ABSTRACT
Our current knowledge about nucleocytoplasmic large DNA viruses (NCLDVs) is largely derived from viral isolates that are co-cultivated with protists and algae. Here we reconstructed 2,074 NCLDV genomes from sampling sites across the globe by building on the rapidly increasing amount of publicly available metagenome data. This led to an 11-fold increase in phylogenetic diversity and a parallel 10-fold expansion in functional diversity. Analysis of 58,023 major capsid proteins from large and giant viruses using metagenomic data revealed the global distribution patterns and cosmopolitan nature of these viruses. The discovered viral genomes encoded a wide range of proteins with putative roles in photosynthesis and diverse substrate transport processes, indicating that host reprogramming is probably a common strategy in the NCLDVs. Furthermore, inferences of horizontal gene transfer connected viral lineages to diverse eukaryotic hosts. We anticipate that the global diversity of NCLDVs that we describe here will establish giant viruses, which are associated with most major eukaryotic lineages, as important players in ecosystems across Earth's biomes.
Subjects
Biodiversity, DNA Viruses/classification, DNA Viruses/genetics, Eukaryotic Cells/metabolism, Eukaryotic Cells/virology, Host Microbial Interactions/genetics, Metagenomics, Animals, Capsid Proteins/genetics, Horizontal Gene Transfer, Viral Genome/genetics, Giant Viruses/classification, Giant Viruses/genetics, Phylogeny
ABSTRACT
Viruses are integral components of all ecosystems and microbiomes on Earth. Through pervasive infections of their cellular hosts, viruses can reshape microbial community structure and drive global nutrient cycling. Over the past decade, viral sequences identified from genomes and metagenomes have provided an unprecedented view of viral genome diversity in nature. Since 2016, the IMG/VR database has provided access to the largest collection of viral sequences obtained from (meta)genomes. Here, we present the third version of IMG/VR, composed of 18 373 cultivated and 2 314 329 uncultivated viral genomes (UViGs), nearly tripling the total number of sequences compared to the previous version. These clustered into 935 362 viral Operational Taxonomic Units (vOTUs), including 188 930 with two or more members. UViGs in IMG/VR are now reported as single viral contigs, integrated proviruses or genome bins, and are annotated with a new standardized pipeline including genome quality estimation using CheckV, taxonomic classification reflecting the latest ICTV update, and expanded host taxonomy prediction. The new IMG/VR interface enables users to efficiently browse, search, and select UViGs based on genome features and/or sequence similarity. IMG/VR v3 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.
Subjects
Genetic Databases, Ecosystem, Molecular Evolution, Viral Genome, Viruses/genetics, Base Sequence, Cluster Analysis, Geography, Molecular Sequence Annotation, Nucleic Acid Sequence Homology, User-Computer Interface
ABSTRACT
MOTIVATION: Two key steps in the analysis of uncultured viruses recovered from metagenomes are the taxonomic classification of the viral sequences and the identification of putative host(s). Both steps rely mainly on the assignment of viral proteins to orthologs in cultivated viruses. Viral Protein Families (VPFs) can be used for the robust identification of new viral sequences in large metagenomics datasets. Despite the importance of VPF information for viral discovery, VPFs have not yet been explored for determining viral taxonomy and host targets. RESULTS: In this work, we classified the set of VPFs from the IMG/VR database and developed VPF-Class, a tool that automates the taxonomic classification and host prediction of viral contigs based on the assignment of their proteins to a set of classified VPFs. Applying VPF-Class to 731K uncultivated virus contigs from the IMG/VR database, we were able to classify 363K contigs at the genus level and predict the host of over 461K contigs. In the RefSeq database, VPF-Class reported an accuracy of nearly 100% in classifying dsDNA, ssDNA and retroviruses at the genus level, considering a membership ratio and a confidence score of 0.2. The accuracy in host prediction was 86.4%, also at the genus level, considering a membership ratio of 0.3 and a confidence score of 0.5. In the prophages dataset, the accuracy in host prediction was 86%, considering a membership ratio of 0.6 and a confidence score of 0.8. Moreover, from the Global Ocean Virome dataset, over 817K viral contigs out of 1 million were classified. AVAILABILITY AND IMPLEMENTATION: The implementation of VPF-Class can be downloaded from https://github.com/biocom-uib/vpf-tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
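The accuracy figures above are each tied to a pair of cutoffs, a membership ratio and a confidence score. The following minimal sketch shows how such cutoffs might be applied to a list of predictions. It is not the VPF-Class implementation: the `Prediction` record, its field names, the `filter_predictions` helper and the inclusive comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    # Hypothetical record layout; field names are not taken from VPF-Class.
    contig_id: str
    label: str                # predicted genus or host
    membership_ratio: float
    confidence_score: float


def filter_predictions(predictions, min_membership, min_confidence):
    """Keep predictions that meet both cutoffs (inclusive, by assumption)."""
    return [
        p for p in predictions
        if p.membership_ratio >= min_membership
        and p.confidence_score >= min_confidence
    ]
```

For example, `filter_predictions(preds, 0.3, 0.5)` would apply the cutoff pair quoted for genus-level host prediction on RefSeq.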
ABSTRACT
Understanding CRISPR-Cas systems, the adaptive defence mechanism that about half of bacterial species and most archaea use to neutralise viral attacks, is important for explaining the biodiversity observed in the microbial world as well as for editing animal and plant genomes effectively. The CRISPR-Cas system learns from previous viral infections and integrates small pieces from phage genomes called spacers into the microbial genome. The resulting library of spacers collected in CRISPR arrays is then compared with the DNA of potential invaders. One of the most intriguing and least well understood questions about CRISPR-Cas systems is the distribution of spacers across the microbial population. Here, using empirical data, we show that the global distribution of spacer numbers in CRISPR arrays across multiple biomes worldwide typically exhibits scale-invariant power law behaviour, and the standard deviation is greater than the sample mean. We develop a mathematical model of spacer loss and acquisition dynamics which fits observed data from almost four thousand metagenomes well. In analogy to the classical 'rich-get-richer' mechanism of power law emergence, the rate of spacer acquisition is proportional to the CRISPR array size, which allows a small proportion of CRISPRs within the population to possess a significant number of spacers. Our study provides an alternative explanation for the rarity of all-resistant super microbes in nature and why proliferation of phages can be highly successful despite the effectiveness of CRISPR-Cas systems.
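The 'rich-get-richer' idea in this abstract can be illustrated with a toy simulation in which each new spacer goes to an existing array with probability proportional to that array's current size. This is a generic Simon-type growth process, not the authors' model: it ignores spacer loss entirely, and the function name, the `p_new` parameter (chance that a new single-spacer array appears) and the seed are assumptions made for the sketch.

```python
import random


def simulate_arrays(steps, p_new=0.1, seed=0):
    """Toy size-proportional spacer acquisition (no spacer loss).

    Returns a list with the number of spacers held by each array.
    """
    rng = random.Random(seed)
    sizes = [1]    # start with one array holding one spacer
    owners = [0]   # one entry per spacer: index of the array that holds it
    for _ in range(steps):
        if rng.random() < p_new:
            # A new array enters the population with a single spacer.
            sizes.append(1)
            owners.append(len(sizes) - 1)
        else:
            # Picking a spacer uniformly selects its array with
            # probability proportional to the array's size.
            idx = rng.choice(owners)
            sizes[idx] += 1
            owners.append(idx)
    return sizes
```

The resulting sizes can be passed to `statistics.mean` and `statistics.stdev` to compare the two quantities the abstract mentions, or plotted on log-log axes to inspect the tail.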
Subjects
CRISPR-Cas Systems/genetics, Clustered Regularly Interspaced Short Palindromic Repeats, Metagenome/genetics, Genetic Models, Archaea/genetics, Bacteria/genetics, Bacteriophages/genetics, Clustered Regularly Interspaced Short Palindromic Repeats/genetics, Clustered Regularly Interspaced Short Palindromic Repeats/immunology, Intergenic DNA/genetics, Viral DNA/genetics, Metagenomics
ABSTRACT
Viruses are the most abundant biological entities on Earth, but challenges in detecting, isolating, and classifying unknown viruses have prevented exhaustive surveys of the global virome. Here we analysed over 5 Tb of metagenomic sequence data from 3,042 geographically diverse samples to assess the global distribution, phylogenetic diversity, and host specificity of viruses. We discovered over 125,000 partial DNA viral genomes, including the largest phage yet identified, and increased the number of known viral genes by 16-fold. Half of the predicted partial viral genomes were clustered into genetically distinct groups, most of which included genes unrelated to those in known viruses. Using CRISPR spacers and transfer RNA matches to link viral groups to microbial host(s), we doubled the number of microbial phyla known to be infected by viruses, and identified viruses that can infect organisms from different phyla. Analysis of viral distribution across diverse ecosystems revealed strong habitat-type specificity for the vast majority of viruses, but also identified some cosmopolitan groups. Our results highlight an extensive global viral diversity and provide detailed insight into viral habitat distribution and host-virus interactions.
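Linking viruses to hosts through CRISPR spacers rests on finding a host-derived spacer sequence inside a viral contig. The sketch below shows the simplest form of that idea, exact matching on either strand. The matching criteria used in the study are not given in the abstract, so the exact-match rule, the function names and the dictionary inputs are all assumptions for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def link_hosts_by_spacers(host_spacers, viral_contigs):
    """Map each viral contig to the hosts whose spacers match it exactly.

    host_spacers: {host_name: [spacer_sequence, ...]}   (hypothetical input)
    viral_contigs: {contig_id: contig_sequence}          (hypothetical input)
    Returns {contig_id: set of host names}; contigs without matches are omitted.
    """
    links = {}
    for contig_id, contig in viral_contigs.items():
        contig = contig.upper()
        for host, spacers in host_spacers.items():
            for spacer in spacers:
                spacer = spacer.upper()
                # Check both strands of the contig.
                if spacer in contig or reverse_complement(spacer) in contig:
                    links.setdefault(contig_id, set()).add(host)
                    break  # one matching spacer is enough for this host
    return links
```

A contig linked to hosts from more than one phylum would be a candidate for the cross-phylum infections mentioned above, subject to whatever mismatch tolerance and filtering a real analysis would apply.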
Subjects
Earth (Planet), Ecosystem, Viral Genome/genetics, Metagenomics, Viruses/genetics, Animals, Aquatic Organisms/virology, Bacteriophages/genetics, Biodiversity, Clustered Regularly Interspaced Short Palindromic Repeats/genetics, Viral DNA/analysis, Viral DNA/genetics, Datasets as Topic, Viral Genes, Host Specificity, Host-Pathogen Interactions, Humans, Metagenome/genetics, Phylogeny, Phylogeography, Transfer RNA/genetics, Sequence Analysis, Viruses/classification, Viruses/isolation & purification
ABSTRACT
The Integrated Microbial Genome/Virus (IMG/VR) system v.2.0 (https://img.jgi.doe.gov/vr/) is the largest publicly available data management and analysis platform dedicated to viral genomics. Since the last report, published in the 2016 NAR Database Issue, the data have tripled in size and currently contain genomes of 8389 cultivated reference viruses, 12 498 previously published curated prophages derived from cultivated microbial isolates, and 735 112 viral genomic fragments computationally predicted from assembled shotgun metagenomes. Nearly 60% of the viral genomes and genome fragments are clustered into 110 384 viral Operational Taxonomic Units (vOTUs) with two or more members. To improve data quality and predictions of host specificity, IMG/VR v.2.0 now separates prokaryotic and eukaryotic viruses, utilizes known prophage sequences to improve taxonomic assignments, and provides viral genome quality scores based on the estimated genome completeness. New features also include enhanced BLAST search capabilities for external queries. Finally, geographic map visualization to locate user-selected viral genomes or genome fragments has been implemented and download options have been extended. All of these features make IMG/VR v.2.0 a key resource for the study of viruses.
Subjects
Data Management/methods, Viral Genome, Genomics/methods, Software
ABSTRACT
Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community.
Subjects
DNA Viruses/genetics, Genetic Databases, Viral Genome, Genomics/methods, Metagenomics/methods, Retroviridae/genetics, Software, Environmental Microbiology, Host-Pathogen Interactions, Metagenome, DNA Sequence Analysis
ABSTRACT
Historically neglected by microbial ecologists, soil viruses are now thought to be critical to global biogeochemical cycles. However, our understanding of their global distribution, activities and interactions with the soil microbiome remains limited. Here we present the Global Soil Virus Atlas, a comprehensive dataset compiled from 2,953 previously sequenced soil metagenomes and composed of 616,935 uncultivated viral genomes and 38,508 unique viral operational taxonomic units. Rarefaction curves from the Global Soil Virus Atlas indicate that most soil viral diversity remains unexplored, further underscored by high spatial turnover and low rates of shared viral operational taxonomic units across samples. By examining genes associated with biogeochemical functions, we also demonstrate the viral potential to impact soil carbon and nutrient cycling. This study represents an extensive characterization of soil viral diversity and provides a foundation for developing testable hypotheses regarding the role of the virosphere in the soil microbiome and global biogeochemistry.
Subjects
Biodiversity, Viral Genome, Metagenome, Microbiota, Soil Microbiology, Soil, Viruses, Viruses/genetics, Viruses/classification, Viruses/isolation & purification, Soil/chemistry, Viral Genome/genetics, Microbiota/genetics, Carbon/metabolism, Metagenomics, Phylogeny, Virome/genetics, Bacteria/genetics, Bacteria/classification, Bacteria/isolation & purification
ABSTRACT
[This corrects the article DOI: 10.3389/fbioe.2020.00034.]
ABSTRACT
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
ABSTRACT
IMPORTANCE: The findings of this study are significant, as N4-like viruses represent a unique viral lineage with a distinct replication mechanism and a conserved core genome. This work has resulted in a comprehensive global map of the entire N4-like viral lineage, including information on their distribution in different biomes, evolutionary divergence, genomic diversity, and the potential for viral-mediated host metabolic reprogramming. As such, this work significantly contributes to our understanding of the ecological function and viral-host interactions of bacteriophages.
Subjects
Bacteriophages, Viruses, Viral Genome/genetics, Phylogeny, Viruses/genetics, Bacteriophages/genetics, Genomics
ABSTRACT
Recent advances in environmental genomics have provided unprecedented opportunities for the investigation of viruses in natural settings. Yet, our knowledge of viral biogeographic patterns and the corresponding drivers is still limited. Here, we perform metagenomic deep sequencing on 90 acid mine drainage (AMD) sediments sampled across Southern China and examine the biogeography of viruses in this extreme environment. The results demonstrate that prokaryotic communities dictate viral taxonomic and functional diversity, abundance and structure, whereas other factors, especially latitude and mean annual temperature, also impact viral populations and functions. In silico predictions highlight lineage-specific virus-host abundance ratios and richness-dependent virus-host interaction structure. Further functional analyses reveal important roles of environmental conditions and horizontal gene transfers in shaping viral auxiliary metabolic genes potentially involved in phosphorus assimilation. Our findings underscore the importance of both abiotic and biotic factors in predicting the taxonomic and functional biogeographic dynamics of viruses in the AMD sediments.
Subjects
Biodiversity, Viruses, Acids, Metagenome/genetics, Mining, Viruses/genetics
ABSTRACT
Metagenomics is unearthing the previously hidden world of soil viruses. Many soil viral sequences in metagenomes contain putative auxiliary metabolic genes (AMGs) that are not associated with viral replication. Here, we establish that AMGs on soil viruses actually produce functional, active proteins. We focus on AMGs that potentially encode chitosanase enzymes that metabolize chitin, a common carbon polymer. We express and functionally screen several chitosanase genes identified from environmental metagenomes. One expressed protein showing endo-chitosanase activity (V-Csn) is crystallized and structurally characterized at ultra-high resolution, thus representing the structure of a soil viral AMG product. This structure provides details about the active site and, together with structure models determined using AlphaFold, facilitates understanding of substrate specificity and enzyme mechanism. Our findings support the hypothesis that soil viruses contribute auxiliary functions to their hosts.
Subjects
Soil, Viruses, Carbon, Chitin, Glycoside Hydrolases/metabolism, Viral Proteins/genetics, Viruses/genetics
ABSTRACT
The hadal ocean biosphere, that is, the deepest part of the world's oceans, harbors a unique microbial community, suggesting a potentially undiscovered co-occurring virioplankton assemblage. Herein, we reveal the unique virioplankton assemblages of the Challenger Deep, comprising 95,813 non-redundant viral contigs from the surface to the hadal zone. Almost all of the dominant viral contigs in the hadal zone were unclassified, potentially related to Alteromonadales and Oceanospirillales. A total of 2,586 viral auxiliary metabolic genes from 132 different KEGG orthologous groups were mainly related to carbon, nitrogen, sulfur, and arsenic metabolism. Lysogenic viral production and integrase genes were augmented in the hadal zone, suggesting the prevalence of a viral lysogenic life strategy. Abundant rve genes in the hadal zone, which function as transposases in the caudoviruses, further suggest the prevalence of viral-mediated horizontal gene transfer. This study provides fundamental insights into the virioplankton assemblages of the hadal zone, reinforcing the necessity of incorporating virioplankton into the hadal biogeochemical cycles.
ABSTRACT
Prokaryotic tolerance to inorganic arsenic is a widespread trait habitually determined by operons encoding an As (III)-responsive repressor (ArsR), an As (V)-reductase (ArsC), and an As (III)-export pump (ArsB), often accompanied by other complementary genes. Enigmatically, the genomes of many environmental bacteria typically contain two or more copies of this basic genetic device arsRBC. To shed some light on the logic of such apparently unnecessary duplication(s), we have inspected the regulation, both jointly and separately, of the two ars clusters borne by the soil bacterium Pseudomonas putida strain KT2440, in particular the cross talk between the two repressors ArsR1/ArsR2 and the respective promoters. DNase I footprinting and gel retardation analyses of Pars1 and Pars2 with their matching regulators revealed non-identical binding sequences and interaction patterns for each of the systems. However, in vitro transcription experiments revealed that the repressors could downregulate each other's promoters, albeit within a different set of parameters. The regulatory frame that emerges from these data corresponds to a particular type of bifan motif where all key interactions have a negative sign. The distinct regulatory architecture that stems from coexistence of various ArsR variants in the same cells could confer an adaptive advantage that favors the maintenance of the two proteins as separate repressors.
ABSTRACT
Viruses are ubiquitous and abundant in the oceans, and viral metagenomes (viromes) have been investigated extensively via several large-scale ocean sequencing projects. However, there have not been any systematic viromic studies in estuaries. Here, we investigated the viromes of the Delaware Bay and Chesapeake Bay, two Mid-Atlantic estuaries. Deep sequencing generated a total of 48,190 assembled viral sequences (>5 kb) and 26,487 viral populations (9,204 virus clusters and 17,283 singletons), including 319 circular viral contigs between 7.5 kb and 161.8 kb. Unknown viruses represented the vast majority of the dominant populations, while the composition of known viruses, such as pelagiphage and cyanophage, appeared to be relatively consistent across a wide range of salinity gradients and in different seasons. A difference between estuarine and ocean viromes was reflected by the proportions of Myoviridae, Podoviridae, Siphoviridae, Phycodnaviridae, and a few well-studied virus representatives. The difference in viral community between the Delaware Bay and Chesapeake Bay is significantly more pronounced than the difference caused by temperature or salinity, indicating strong local profiles caused by the unique ecology of each estuary. Interestingly, a viral contig similar to phages infecting Acinetobacter baumannii ("Iraqibacter") was found to be highly abundant in the Delaware Bay but not in the Chesapeake Bay, the source of which is yet to be identified. Highly abundant viruses in both estuaries have close hits to viral sequences derived from the marine single-cell genomes or long-read single-molecule sequencing, suggesting that important viruses are still waiting to be discovered in the estuarine environment.
IMPORTANCE: This is the first systematic study about spatial and temporal variation of virioplankton communities in estuaries using deep metagenomics sequencing. It is among the highest-quality viromic data sets to date, showing remarkably consistent sequencing depth and quality across samples. Our results indicate that there exists a large pool of abundant and diverse viruses in estuaries that have not yet been cultivated, their genomes only available thanks to single-cell genomics or single-molecule sequencing, demonstrating the importance of these methods for viral discovery. The spatiotemporal pattern of these abundant uncultivated viruses is more variable than that of cultured viruses. Despite strong environmental gradients, season and location had surprisingly little impact on the viral community within an estuary, but we saw a significant distinction between the two estuaries and also between estuarine and open ocean viromes.