ABSTRACT
BACKGROUND: This study presents an analysis of machine-learning model performance in image analysis, with a specific focus on videolaryngoscopy procedures. The research aimed to explore how dataset diversity and size affect the performance of machine-learning models, an issue vital to the advancement of clinical artificial intelligence tools. METHODS: A total of 377 videolaryngoscopy videos from YouTube were used to create 6 varied datasets, each differing in patient diversity and image count. The study also incorporates data augmentation techniques to enhance these datasets further. Two machine-learning models, YOLOv5-Small and YOLOv8-Small, were trained and evaluated on metrics such as F1 score (a statistical measure that combines the precision and recall of the model into a single metric, reflecting its overall accuracy), precision, recall, mAP@50, and mAP@50-95. RESULTS: The findings indicate a significant impact of dataset configuration on model performance, especially the balance between diversity and quantity. The Multi-25 × 10 dataset, featuring 25 images from 10 different patients, demonstrates superior performance, highlighting the value of a well-balanced dataset. The study also finds that the effects of data augmentation vary across different types of datasets. CONCLUSIONS: Overall, this study emphasizes the critical role of dataset structure in the performance of machine-learning models in medical image analysis. It underscores the necessity of striking an optimal balance between dataset size and diversity, thereby illuminating the complexities inherent in data-driven machine-learning development.
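The F1 score mentioned in the abstract above is the harmonic mean of precision and recall. A minimal sketch of that relationship follows; the function names and the example counts are illustrative assumptions, not values or code from the study.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from detection counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one detector on one test split (not study data).
p, r = precision_recall(tp=80, fp=20, fn=20)
print(f"precision={p:.2f} recall={r:.2f} F1={f1_score(p, r):.2f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high F1 by maximizing precision or recall alone.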
ABSTRACT
Carbon fixation is a key metabolic function shaping marine life, but the underlying taxonomic and functional diversity involved is only partially understood. Using metagenomic resources targeted at marine piconanoplankton, we provide a reproducible machine learning framework to derive the potential biogeography of genomic functions through the multi-output regression of gene read counts on environmental climatologies. Leveraging the Marine Atlas of Tara Oceans Unigenes, we investigate the genomic potential of primary production in the global ocean. The latter is performed by ribulose-1,5-bisphosphate carboxylase/oxygenase (RUBISCO) and is often associated with carbon concentration mechanisms in piconanoplankton, major marine unicellular photosynthetic organisms. We show that the genomic potential supporting C4 enzymes and RUBISCO exhibits strong functional redundancy and important affinity toward tropical oligotrophic waters. This redundancy is taxonomically structured by the dominance of Mamiellophyceae and Prymnesiophyceae in mid and high latitudes. These findings enhance our understanding of the relationship between functional and taxonomic diversity of microorganisms and environmental drivers of key biogeochemical cycles.
Subjects
Photosynthesis , Ribulose-Bisphosphate Carboxylase , Photosynthesis/genetics , Ribulose-Bisphosphate Carboxylase/genetics , Ribulose-Bisphosphate Carboxylase/metabolism , Plankton/genetics , Plankton/metabolism , Genomics/methods , Phylogeny , Carbon Cycle , Metagenomics/methods , Metagenome , Seawater
ABSTRACT
Native amine dehydrogenases offer sustainable access to chiral amines, so the search for scaffolds capable of converting more diverse carbonyl compounds is required to reach the full potential of this alternative to conventional synthetic reductive aminations. Here we report a multidisciplinary strategy combining bioinformatics, chemoinformatics and biocatalysis to extensively screen billions of sequences in silico and to efficiently find native amine dehydrogenase features using computational approaches. In this way, we achieve a comprehensive overview of the initial native amine dehydrogenase family, extending it from 2,011 to 17,959 sequences, and identify native amine dehydrogenases with previously unreported substrate spectra, including hindered carbonyls and ethyl ketones, and accepting methylamine and cyclopropylamine as amine donors. We also present preliminary model-based structural information to inform the design of potential (R)-selective amine dehydrogenases, as native amine dehydrogenases are mostly (S)-selective. This integrated strategy paves the way for expanding the resources of other enzyme families and for highlighting enzymes with original features.
Subjects
Amines , Amines/metabolism , Amines/chemistry , Substrate Specificity , Oxidoreductases Acting on CH-NH Group Donors/metabolism , Oxidoreductases Acting on CH-NH Group Donors/genetics , Oxidoreductases Acting on CH-NH Group Donors/chemistry , Computational Biology/methods , Biocatalysis , Biodiversity , Models, Molecular
ABSTRACT
Suicide is a complex, multidimensional event, and a significant challenge for prevention globally. Artificial intelligence (AI) and machine learning (ML) have emerged to harness large-scale datasets to enhance risk detection. To trust and act upon the predictions made with ML, more intuitive user interfaces must be validated. Thus, interpretable AI is one of the crucial directions that could allow policy and decision makers to make reasonable and data-driven decisions that can ultimately lead to better mental health services planning and suicide prevention. This research aimed to develop sex-specific ML models for predicting the population risk of suicide and to interpret the models. Data were from the Quebec Integrated Chronic Disease Surveillance System (QICDSS), covering up to 98% of the population in the province of Quebec and containing data for over 20,000 suicides between 2002 and 2019. We employed a case-control study design. Individuals were considered cases if they were aged 15+ and had died from suicide between January 1st, 2002, and December 31st, 2019 (n = 18,339). Controls were a random sample of 1% of the Quebec population aged 15+ of each year, who were alive on December 31st of each year, from 2002 to 2019 (n = 1,307,370). We included 103 features, including individual, programmatic, systemic, and community factors, measured up to five years prior to the suicide events. We trained and then validated the sex-specific predictive risk model using supervised ML algorithms, including Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Multilayer Perceptron (MLP). We computed operating characteristics, including sensitivity, specificity, and Positive Predictive Value (PPV). We then generated receiver operating characteristic (ROC) curves and calibration measures for the suicide predictions.
For interpretability, Shapley Additive Explanations (SHAP) were used for global explanation, to determine how much the input features contribute to the models' output and to identify the largest absolute coefficients. The best sensitivity was 0.38 with logistic regression for males and 0.47 with MLP for females; the best precision (PPV) was obtained with the XGBoost classifier, at 0.25 for males and 0.19 for females. This study demonstrated the useful potential of explainable AI models as tools for decision-making and population-level suicide prevention actions. The ML models included individual-, programmatic-, systemic-, and community-level variables routinely available to decision makers and planners in a publicly managed care system. Caution should be exercised in interpreting the variables associated with the outcome in a predictive model, since they are not causal, and other designs are required to establish the value of individual treatments. The next steps are to produce an intuitive user interface for decision makers, planners and other stakeholders such as clinicians or representatives of families and people with lived experience of suicidal behaviors or death by suicide. Such an interface could show, for example, how variations in the quality of local area primary care programs for depression or substance use disorders, or increases in regional mental health and addiction budgets, would lower suicide rates.
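The operating characteristics named in the abstract above (sensitivity, specificity and PPV) all derive from the four cells of a confusion matrix. The sketch below shows those definitions under stated assumptions: the function name is hypothetical and the counts are invented for illustration, not taken from the QICDSS data.

```python
def operating_characteristics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity and PPV from confusion-matrix counts."""
    return {
        # Share of true cases that the model flags.
        "sensitivity": tp / (tp + fn),
        # Share of true non-cases that the model leaves unflagged.
        "specificity": tn / (tn + fp),
        # Share of flagged individuals who are true cases (precision).
        "ppv": tp / (tp + fp),
    }


# Invented counts: 100 cases and 9,900 controls, with 200 individuals flagged.
metrics = operating_characteristics(tp=40, fp=160, tn=9740, fn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare outcome, even a small false-positive rate among the many non-cases yields far more false alarms than true detections, so PPV can stay low while specificity is high.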
Subjects
Artificial Intelligence , Suicide , Female , Male , Humans , Case-Control Studies , Quebec/epidemiology , Routinely Collected Health Data
ABSTRACT
Public sequencing databases contain vast amounts of biological information, yet they are largely underutilized as it is challenging to efficiently search them for any sequence(s) of interest. We present kmindex, an approach that can index thousands of metagenomes and perform sequence searches in a fraction of a second. The index construction is an order of magnitude faster than previous methods, while search times are two orders of magnitude faster. With negligible false positive rates below 0.01%, kmindex outperforms the precision of existing approaches by four orders of magnitude. Here we demonstrate the scalability of kmindex by successfully indexing 1,393 marine seawater metagenome samples from the Tara Oceans project. Additionally, we introduce the publicly accessible web server Ocean Read Atlas, which enables real-time queries on the Tara Oceans dataset.
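The abstract above does not describe kmindex's internal data structures, so the following is only a toy illustration of the general idea behind k-mer-based sequence search: each sample is decomposed into k-mers, an index records which samples contain each k-mer, and a query is scored by the fraction of its k-mers found in each sample. All names here (`kmers`, `build_index`, `query`) are hypothetical, and the exact Python sets used are an assumption made for clarity, not the tool's method.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(samples: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of sample names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, sequences in samples.items():
        for seq in sequences:
            for kmer in kmers(seq, k):
                index[kmer].add(name)
    return index


def query(index: dict[str, set[str]], seq: str, k: int) -> dict[str, float]:
    """Fraction of the query's k-mers present in each matching sample."""
    query_kmers = kmers(seq, k)
    if not query_kmers:
        return {}
    hits: dict[str, int] = defaultdict(int)
    for kmer in query_kmers:
        for name in index.get(kmer, ()):
            hits[name] += 1
    return {name: count / len(query_kmers) for name, count in hits.items()}


# Two made-up "samples" with one short read each.
toy_index = build_index({"A": ["ACGTACGT"], "B": ["TTTTGGGG"]}, k=4)
print(query(toy_index, "ACGTAC", k=4))
```

An exact set index like this one cannot scale to thousands of metagenomes; it is meant only to make the search principle concrete.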
Subjects
Genomics , Seawater , Oceans and Seas , Metagenome/genetics , Databases, Nucleic Acid
ABSTRACT
Despite being perennially frigid, polar oceans form an ecosystem hosting high and unique biodiversity. Various organisms show different adaptive strategies in this habitat, but how viruses adapt to this environment is largely unknown. Viruses of phyla Nucleocytoviricota and Mirusviricota are groups of eukaryote-infecting large and giant DNA viruses with genomes encoding a variety of functions. Here, by leveraging the Global Ocean Eukaryotic Viral database, we investigate the biogeography and functional repertoire of these viruses at a global scale. We first confirm the existence of an ecological barrier that clearly separates polar and nonpolar viral communities, and then demonstrate that temperature drives dramatic changes in the virus-host network at the polar-nonpolar boundary. Ancestral niche reconstruction suggests that adaptation of these viruses to polar conditions has occurred repeatedly over the course of evolution, with polar-adapted viruses in the modern ocean being scattered across their phylogeny. Numerous viral genes are specifically associated with polar adaptation, although most of their homologues are not identified as polar-adaptive genes in eukaryotes. These results suggest that giant viruses adapt to cold environments by changing their functional repertoire, and this viral evolutionary strategy is distinct from the polar adaptation strategy of their hosts.
Subjects
Giant Viruses , Viruses , Giant Viruses/genetics , Genome, Viral/genetics , Ecosystem , Oceans and Seas , Phylogeny , DNA Viruses/genetics , Genomics , Viruses/genetics , Eukaryota/genetics
ABSTRACT
At high latitudes, strong seasonal differences in light availability affect marine organisms and regulate the timing of ecosystem processes. Marine protists are key players in Arctic aquatic ecosystems, yet little is known about their ecological roles over yearly cycles. This is especially true for the dark polar night period, which until recently was assumed to be devoid of biological activity. A catalogue of 12 million transcripts was built from 0.45-10 µm protist assemblages sampled over 13 months at a time series station in an Arctic fjord in Svalbard. Community gene expression was correlated with seasonality, with light as the main driving factor. Transcript diversity and evenness were higher during polar night than during polar day. Light-dependent functions, except phototransduction, had higher relative expression during polar day. Of the most expressed genes, 64% could not be functionally annotated, yet up to 78% were identified in Arctic samples from Tara Oceans, suggesting that Arctic marine assemblages are distinct from those from other oceans. Our study increases understanding of the links between extreme seasonality and biological processes in pico- and nanoplanktonic protists. Our results set the ground for future monitoring studies investigating the seasonal impact of climate change on the communities of microbial eukaryotes in the High Arctic.
Subjects
Climate Change , Ecosystem , Estuaries , Eukaryota , Gene Expression
ABSTRACT
DNA viruses have a major influence on the ecology and evolution of cellular organisms [1-4], but their overall diversity and evolutionary trajectories remain elusive [5]. Here we carried out a phylogeny-guided genome-resolved metagenomic survey of the sunlit oceans and discovered plankton-infecting relatives of herpesviruses that form a putative new phylum dubbed Mirusviricota. The virion morphogenesis module of this large monophyletic clade is typical of viruses from the realm Duplodnaviria [6], with multiple components strongly indicating a common ancestry with animal-infecting Herpesvirales. Yet, a substantial fraction of mirusvirus genes, including hallmark transcription machinery genes missing in herpesviruses, are closely related homologues of giant eukaryotic DNA viruses from another viral realm, Varidnaviria. These remarkable chimaeric attributes connecting Mirusviricota to herpesviruses and giant eukaryotic viruses are supported by more than 100 environmental mirusvirus genomes, including a near-complete contiguous genome of 432 kilobases. Moreover, mirusviruses are among the most abundant and active eukaryotic viruses characterized in the sunlit oceans, encoding a diverse array of functions used during the infection of microbial eukaryotes from pole to pole. The prevalence, functional activity, diversification and atypical chimaeric attributes of mirusviruses point to a lasting role of Mirusviricota in the ecology of marine ecosystems and in the evolution of eukaryotic DNA viruses.
Subjects
Aquatic Organisms , Giant Viruses , Herpesviridae , Oceans and Seas , Phylogeny , Plankton , Animals , Ecosystem , Eukaryota/virology , Genome, Viral/genetics , Giant Viruses/classification , Giant Viruses/genetics , Herpesviridae/classification , Herpesviridae/genetics , Plankton/virology , Metagenomics , Metagenome , Sunlight , Transcription, Genetic/genetics , Aquatic Organisms/virology
ABSTRACT
INTRODUCTION: Suicide has a complex aetiology and results from interactions among risk and protective factors at the individual, healthcare system and population levels. Therefore, policy and decision makers and mental health service planners can play an important role in suicide prevention. Although a number of suicide risk predictive tools have been developed, these tools were designed to be used by clinicians for assessing individual risk of suicide. There have been no risk predictive models to be used by policy and decision makers for predicting population risk of suicide at the national, provincial and regional levels. This paper aimed to describe the rationale and methodology for developing risk predictive models for population risk of suicide. METHODS AND ANALYSIS: A case-control study design will be used to develop sex-specific risk predictive models for population risk of suicide, using statistical regression and machine learning techniques. Routinely collected health administrative data in Quebec, Canada, and community-level social deprivation and marginalisation data will be used. The developed models will be transformed into models that can be readily used by policy and decision makers. Two rounds of qualitative interviews with end-users and other stakeholders were proposed to understand their views about the developed models and potential systematic, social and ethical issues for implementation; the first round of qualitative interviews has been completed. We included 9440 suicide cases (7234 males and 2206 females) and 661 780 controls for model development. Three hundred and forty-seven variables at individual, healthcare system and community levels have been identified and will be included in least absolute shrinkage and selection operator regression for feature selection. ETHICS AND DISSEMINATION: This study is approved by the Health Research Ethics Committee of Dalhousie University, Canada.
This study takes an integrated knowledge translation approach, involving knowledge users from the beginning of the process.
Subjects
Suicide , Female , Male , Humans , Case-Control Studies , Suicide Prevention , Protective Factors , Canada/epidemiology
ABSTRACT
Microbial communities in the world ocean are affected strongly by oceanic circulation, creating characteristic marine biomes. The high connectivity of most of the ocean makes it difficult to disentangle selective retention of colonizing genotypes (with traits suited to biome-specific conditions) from evolutionary selection, which would act on founder genotypes over time. The Arctic Ocean is exceptional in having limited exchange with other oceans and in having been ice covered since the last ice age. To test whether Arctic microalgal lineages evolved apart from algae in the global ocean, we sequenced four lineages of microalgae isolated from Arctic waters and sea ice. Here we show convergent evolution and highlight geographically limited horizontal gene transfer (HGT) as an ecological adaptive force in the form of PFAM complements and horizontal acquisition of key adaptive genes. Notably, ice-binding proteins (IBPs) were acquired and horizontally transferred among Arctic strains. A comparison with Tara Oceans metagenomes and metatranscriptomes confirmed mostly Arctic distributions of these IBPs. The phylogeny of Arctic-specific genes indicated that these events were independent of bacterial-sourced HGTs in Antarctic Southern Ocean microalgae.
Subjects
Gene Transfer, Horizontal , Microalgae , Gene Transfer, Horizontal/genetics , Microalgae/genetics , Arctic Regions , Oceans and Seas , Ice Cover , Bacteria
ABSTRACT
Phytoplankton account for >45% of global primary production, and have an enormous impact on aquatic food webs and on the entire Earth System. Their members are found among prokaryotes (cyanobacteria) and multiple eukaryotic lineages containing chloroplasts. Genetic surveys of phytoplankton communities generally consist of PCR amplification of bacterial (16S), nuclear (18S) and/or chloroplastic (16S) rRNA marker genes from DNA extracted from environmental samples. However, our appreciation of phytoplankton abundance or biomass is limited by PCR-amplification biases, rRNA gene copy number variations across taxa, and the fact that rRNA genes do not provide insights into metabolic traits such as photosynthesis. Here, we targeted the photosynthetic gene psbO from metagenomes to circumvent these limitations: the method is PCR-free, and the gene is universally and exclusively present in photosynthetic prokaryotes and eukaryotes, mainly in one copy per genome. We applied and validated this new strategy with the size-fractionated marine samples collected by Tara Oceans, and obtained better correlations with flow cytometry and microscopy than those based on rRNA genes. Furthermore, we revealed unexpected features of the ecology of these ecosystems, such as the high abundance of picocyanobacterial aggregates and symbionts in the ocean, and the decrease in relative abundance of phototrophs towards the larger size classes of marine dinoflagellates. To facilitate the incorporation of psbO in molecular-based surveys, we compiled a curated database of >18,000 unique sequences. Overall, psbO appears to be a promising new gene marker for molecular-based evaluations of entire phytoplankton communities.
Subjects
Metagenome , Phytoplankton , Phytoplankton/genetics , Ecosystem , DNA Copy Number Variations , Oceans and Seas , RNA, Ribosomal, 16S/genetics , Eukaryota/genetics
ABSTRACT
Diatoms form a diverse and abundant group of photosynthetic protists that are essential players in marine ecosystems. However, the microevolutionary structure of their populations remains poorly understood, particularly in polar regions. Exploring how closely related diatoms adapt to different environments is essential given their short generation times, which may allow rapid adaptations, and their prevalence in marine regions dramatically impacted by climate change, such as the Arctic and Southern Oceans. Here, we address genetic diversity patterns in Chaetoceros, the most abundant diatom genus and one of the most diverse, using 11 metagenome-assembled genomes (MAGs) reconstructed from Tara Oceans metagenomes. Genome-resolved metagenomics on these MAGs confirmed a prevalent distribution of Chaetoceros in the Arctic Ocean with lower dispersal in the Pacific and Southern Oceans as well as in the Mediterranean Sea. Single-nucleotide variants identified within the different MAG populations allowed us to draw a landscape of Chaetoceros genetic diversity and revealed an elevated genetic structure in some Arctic Ocean populations. Gene flow patterns of closely related Chaetoceros populations seemed to correlate with distinct abiotic factors rather than with geographic distance. We found clear positive selection of genes involved in nutrient availability responses, in particular for iron (e.g., ISIP2a, flavodoxin), silicate, and phosphate (e.g., polyamine synthase), that were further supported by analysis of Chaetoceros transcriptomes. Altogether, these results highlight the importance of environmental selection in shaping diatom diversity patterns and provide new insights into their metapopulation genomics through the integration of metagenomic and environmental data.
Subjects
Diatoms , Diatoms/genetics , Ecosystem , Genomics , Metagenomics
ABSTRACT
The smallest phytoplankton species are key actors in ocean biogeochemical cycling, and their abundance and distribution are affected by global environmental changes. Among them, algae of the Pelagophyceae class encompass coastal species that cause harmful algal blooms, while others are cosmopolitan and abundant. The lack of a genomic reference in this lineage is a main limitation to studying its ecological importance. Here, we analysed Pelagomonas calceolata relative abundance, ecological niche and potential for adaptation in all oceans using a complete chromosome-scale assembled genome sequence. Our results show that P. calceolata is one of the most abundant eukaryotic species in the oceans, with a relative abundance favoured by high-temperature, low-light and iron-poor conditions. Climate change projections based on its relative abundance suggest an extension of the P. calceolata habitat toward the poles at the end of this century. Finally, we observed a specific gene repertoire and expression level variations potentially explaining its ecological success in low-iron and low-nitrate environments. Collectively, these findings reveal the ecological importance of P. calceolata and lay the foundation for a global-scale analysis of the adaptation and acclimation strategies of this small phytoplankton in a changing environment.
Subjects
Iron , Stramenopiles , Acclimatization/genetics , Chromosomes , Genomics , Iron/metabolism , Nitrates/metabolism , Oceans and Seas , Phytoplankton/genetics , Phytoplankton/metabolism , Stramenopiles/genetics
ABSTRACT
For more than a decade, high-throughput sequencing has transformed the study of marine planktonic communities and has highlighted the extent of protist diversity in these ecosystems. Nevertheless, little is known about their genomic diversity at the species scale or about their major speciation mechanisms. An increasing amount of data obtained from global-scale sampling campaigns is becoming publicly available, and we postulate that metagenomic data could contribute to deciphering the processes shaping protist genomic differentiation in the marine realm. As a proof of concept, we developed a findable, accessible, interoperable and reusable (FAIR) pipeline and focused on the Mediterranean Sea to study three a priori abundant protist species: Bathycoccus prasinos, Pelagomonas calceolata and Phaeocystis cordata. We compared the genomic differentiation of each species in light of geographic, environmental and oceanographic distances. We highlighted that isolation-by-environment shapes the genomic differentiation of B. prasinos, whereas P. cordata is impacted by geographic distance (i.e. isolation-by-distance). At present, the use of metagenomics to accurately estimate the genomic differentiation of protists remains challenging since coverages are lower than in traditional population surveys. However, our approach sheds light on ecological and evolutionary processes occurring within natural marine populations and paves the way for future protist population metagenomic studies.
Subjects
Phytoplankton , Stramenopiles , Mediterranean Sea , Phytoplankton/genetics , Ecosystem , Genomics
ABSTRACT
Biogeographical studies have traditionally focused on readily visible organisms, but recent technological advances are enabling analyses of the large-scale distribution of microscopic organisms, whose biogeographical patterns have long been debated. Here we assessed the global structure of plankton geography and its relation to the biological, chemical, and physical context of the ocean (the 'seascape') by analyzing metagenomes of plankton communities sampled across oceans during the Tara Oceans expedition, in light of environmental data and ocean current transport. Using a consistent approach across organismal sizes that provides unprecedented resolution to measure changes in genomic composition between communities, we report a pan-ocean, size-dependent plankton biogeography overlying regional heterogeneity. We found robust evidence for a basin-scale impact of transport by ocean currents on plankton biogeography, and on a characteristic timescale of community dynamics going beyond simple seasonality or life history transitions of plankton.
Oceans are brimming with life invisible to our eyes, a myriad of species of bacteria, viruses and other microscopic organisms essential for the health of the planet. These 'marine plankton' are unable to swim against currents and should therefore be constantly on the move, yet previous studies have suggested that distinct species of plankton may in fact inhabit different oceanic regions. However, proving this theory has been challenging; collecting plankton is logistically difficult, and it is often impossible to distinguish between species simply by examining them under a microscope. However, within the last decade, a research schooner called Tara has travelled the globe to gather thousands of plankton samples. At the same time, advances in genomics have made it possible to identify species based only on fragments of their DNA sequence. To understand the hidden geography of plankton communities in Earth's oceans, Richter et al. pored over DNA from the Tara Oceans expedition. This revealed that, despite being unable to resist the flow of water, various planktonic species which live close to the surface manage to occupy distinct, stable provinces shaped by currents. Different sizes of plankton are distributed in different sized provinces, with the smallest organisms tending to inhabit the smallest areas. Comparing DNA similarities and speeds of currents at the ocean surface revealed how these might stretch and mix plankton communities. Plankton play a critical role in the health of the ocean and the chemical cycles of planet Earth. These results could allow deeper investigation by marine modellers, ecologists, and evolutionary biologists. Meanwhile, work is already underway to investigate how climate change might impact this hidden geography.
Subjects
Ecosystem , Plankton , Genomics , Geography , Oceans and Seas , Plankton/genetics
ABSTRACT
DNA viruses are increasingly recognized as influencing marine microbes and microbe-mediated biogeochemical cycling. However, little is known about global marine RNA virus diversity, ecology, and ecosystem roles. In this study, we uncover patterns and predictors of marine RNA virus community- and "species"-level diversity and contextualize their ecological impacts from pole to pole. Our analyses revealed four ecological zones, latitudinal and depth diversity patterns, and environmental correlates for RNA viruses. Our findings only partially parallel those of cosampled plankton and show unexpectedly high polar ecological interactions. The influence of RNA viruses on ecosystems appears to be large, as predicted hosts are ecologically important. Moreover, the occurrence of auxiliary metabolic genes indicates that RNA viruses cause reprogramming of diverse host metabolisms, including photosynthesis and carbon cycling, and that RNA virus abundances predict ocean carbon export.
Subjects
Plankton , RNA Viruses , Seawater , Virome , Carbon Cycle , Ecosystem , Oceans and Seas , Plankton/classification , Plankton/metabolism , Plankton/virology , RNA Viruses/classification , RNA Viruses/genetics , RNA Viruses/isolation & purification , Seawater/virology , Virome/genetics
ABSTRACT
Testing hypotheses about the biogeography of genes using large data resources such as Tara Oceans marine metagenomes and metatranscriptomes requires significant hardware resources and programming skills. The new release of the 'Ocean Gene Atlas' (OGA2) is a freely available, intuitive online service to mine large and complex marine environmental genomic databases. The datasets available in OGA2 have been extended and now include, from the Tara Oceans portfolio: (i) eukaryotic Metagenome-Assembled-Genomes (MAGs) and Single-cell Assembled Genomes (SAGs) (10.2E+6 coding genes), (ii) version 2 of Ocean Microbial Reference Gene Catalogue (46.8E+6 non-redundant genes), (iii) 924 MetaGenomic Transcriptomes (7E+6 unigenes), (iv) 530 MAGs from an Arctic MAG catalogue (1E+6 genes) and (v) 1888 Bacterial and Archaeal Genomes (4.5E+6 genes), and an additional dataset from the Malaspina 2010 global circumnavigation: (vi) 317 Malaspina Deep Metagenome Assembled Genomes (0.9E+6 genes). Novel analyses enabled by OGA2 include phylogenetic tree inference to visualize user queries within their context of sequence homologues from both the marine environmental dataset and the RefSeq database. An Application Programming Interface (API) now allows users to query OGA2 using command-line tools, hence providing local workflow integration. Finally, gene abundance can be interactively filtered directly on map displays using any of the available environmental variables. Ocean Gene Atlas v2.0 is freely available at: https://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/.
Subjects
Bacteria , Eukaryota , Marine Biology , Plankton , Bacteria/genetics , Eukaryota/genetics , Metagenome , Phylogeny , Plankton/genetics
ABSTRACT
Whereas DNA viruses are known to be abundant, diverse, and commonly key ecosystem players, RNA viruses are insufficiently studied outside disease settings. In this study, we analyzed ≈28 terabases of Global Ocean RNA sequences to expand Earth's RNA virus catalogs and their taxonomy, investigate their evolutionary origins, and assess their marine biogeography from pole to pole. Using new approaches to optimize discovery and classification, we identified RNA viruses that necessitate substantive revisions of taxonomy (doubling phyla and adding >50% new classes) and evolutionary understanding. "Species"-rank abundance determination revealed that viruses of the new phyla "Taraviricota," a missing link in early RNA virus evolution, and "Arctiviricota" are widespread and dominant in the oceans. These efforts provide foundational knowledge critical to integrating RNA viruses into ecological and epidemiological models.