ABSTRACT
Profile hidden Markov models (pHMMs) achieve high sensitivity in remote homology search, making them a popular choice for detecting novel or highly diverged viruses in metagenomic data. However, existing pHMM databases differ in their design focus, which makes it difficult for users to decide which one to use. In this review, we provide a thorough evaluation and comparison of multiple commonly used pHMM databases for viral sequence discovery in metagenomic data. We characterized the databases by comparing their sizes, their taxonomic coverage, and the properties of their models using quantitative metrics. We then assessed their performance in virus identification across multiple application scenarios, using both simulated and real metagenomic data. We aim to offer researchers a thorough and critical assessment of the strengths and limitations of different databases. Furthermore, based on the experimental results obtained from the simulated and real metagenomic data, we provide practical suggestions for users to optimize their use of pHMM databases, thus enhancing the quality and reliability of their findings in the field of viral metagenomics.
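Database comparisons of this kind usually start from HMMER search output. As a minimal sketch (not the review's own pipeline), the function below reads `hmmsearch --tblout` results, where each non-comment line begins with the target sequence name, target accession, query profile name, query accession, full-sequence E-value and full-sequence bit score, and keeps each target's best hit under an E-value cutoff. The 1e-5 default and the contig/profile names in the example are hypothetical.

```python
def parse_tblout(text, max_evalue=1e-5):
    """Keep each target's best-scoring hit from hmmsearch --tblout text.

    Returns {target_name: (query_name, full_seq_evalue, full_seq_bitscore)}.
    max_evalue=1e-5 is a hypothetical cutoff, not a recommended value.
    """
    best = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header block
        fields = line.split()
        target, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue > max_evalue:
            continue  # hit is not significant enough under this cutoff
        if target not in best or score > best[target][2]:
            best[target] = (query, evalue, score)
    return best


# Toy --tblout excerpt with hypothetical contig and profile names.
example = """# target name  accession  query name  accession  E-value  score  bias
contig_1_orf1  -  RdRp_A    -  1.2e-30  105.3  0.1
contig_1_orf1  -  Capsid_B  -  3.0e-08   30.0  0.0
contig_2_orf4  -  RdRp_A    -  0.5       12.0  0.0
"""
print(parse_tblout(example))
```

With `hmmscan` instead of `hmmsearch`, the roles of query and target are swapped, so the column indices for "profile" and "sequence" would need to be exchanged.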
Subject(s)
Markov Chains , Metagenomics , Viruses , Metagenomics/methods , Viruses/genetics , Viruses/classification , Databases, Genetic , Humans , Computational Biology/methods , Algorithms
ABSTRACT
Segmented RNA viruses are a complex group of RNA viruses with multisegment genomes. Reconstructing complete segmented viruses is crucial for advancing our understanding of viral diversity, evolution, and public health impact. Using metatranscriptomic data to identify known and novel segmented viruses has accelerated the survey of segmented viruses in various ecosystems. However, high genetic diversity and the difficulty of binning complete segmented genomes pose significant challenges for segmented virus reconstruction. Current virus detection tools are primarily designed to identify nonsegmented viral genomes. This study presents SegVir, a novel tool designed to identify segmented RNA viruses and reconstruct their complete genomes from complex metatranscriptomes. SegVir leverages both close and remote homology searches to accurately detect conserved and divergent viral segments. Additionally, we introduce a new method that evaluates genome completeness and conservation based on gene content. Our evaluations on simulated datasets demonstrate SegVir's superior sensitivity and precision compared to existing tools. Moreover, in experiments using real data, we identified some virus segments missing from the NCBI database, underscoring SegVir's potential to enhance viral metagenome analysis. The source code and supporting data of SegVir are available via https://github.com/HubertTang/SegVir.
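The abstract does not define SegVir's completeness measure. Purely to illustrate the general idea of gene-content-based completeness, the sketch below reports the fraction of a virus's expected genes found across a bin of segments, plus the genes still missing. The function name, gene labels, and the metric itself are hypothetical.

```python
def gene_content_completeness(expected_genes, observed_segments):
    """Hypothetical completeness metric: fraction of expected genes found in a bin.

    expected_genes: set of gene labels expected for the virus group.
    observed_segments: {segment_id: [gene labels annotated on that segment]}.
    Returns (completeness in [0, 1], sorted list of missing genes).
    """
    found = set()
    for genes in observed_segments.values():
        # Genes that are not in the expected set do not affect completeness.
        found.update(g for g in genes if g in expected_genes)
    missing = sorted(set(expected_genes) - found)
    return len(found) / len(expected_genes), missing


# Toy three-segment example with hypothetical gene labels.
expected = {"RdRp", "glycoprotein", "nucleoprotein"}
segments = {"seg_L": ["RdRp"], "seg_M": ["glycoprotein", "hypothetical"]}
print(gene_content_completeness(expected, segments))
```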
Subject(s)
Genome, Viral , RNA Viruses , RNA Viruses/genetics , Transcriptome , RNA, Viral/genetics , Software , Metagenome , Metagenomics/methods
ABSTRACT
RNA viruses exhibit vast phylogenetic diversity and can significantly impact public health and agriculture. However, current bioinformatics tools for viral discovery from metagenomic data frequently generate false-positive virus results, overestimate viral diversity, and misclassify virus sequences. Additionally, current tools often fail to determine virus-host associations, which hampers investigation of the potential threat posed by a newly detected virus. To address these issues, we developed VirID, a software tool specifically designed for the discovery and characterization of RNA viruses from metagenomic data. VirID is built on a comprehensive RNA-dependent RNA polymerase (RdRp) database that supports a workflow comprising RNA virus discovery, phylogenetic analysis, and phylogeny-based virus characterization. Benchmark tests on a simulated data set demonstrated that VirID had high accuracy in profiling viruses and estimating viral richness. In evaluations with real-world samples, VirID not only identified RNA viruses of all types but also provided accurate estimations of viral genetic diversity and virus classification, as well as comprehensive insights into virus associations with humans, animals, and plants. VirID therefore offers a robust tool for virus discovery and serves as a valuable resource in basic virological studies, pathogen surveillance, and early warning systems for infectious disease outbreaks.
Subject(s)
Metagenomics , Phylogeny , RNA Viruses , Software , RNA Viruses/genetics , Metagenomics/methods , Humans , RNA-Dependent RNA Polymerase/genetics , Computational Biology/methods
ABSTRACT
Bacteriophages (or phages), which infect bacteria, have two distinct lifestyles: virulent and temperate. Predicting the lifestyle of phages helps decipher their interactions with their bacterial hosts, aiding phages' applications in fields such as phage therapy. Because experimental methods for annotating the lifestyle of phages cannot keep pace with the fast accumulation of sequenced phages, computational methods for predicting phage lifestyles have become an attractive alternative. Despite some promising results, computational lifestyle prediction remains difficult because of the limited known annotations and the sheer number of phage contigs assembled from metagenomic data. In particular, most existing tools cannot precisely predict the lifestyles of phages represented by short contigs. In this work, we develop PhaTYP (Phage TYPe prediction tool) to improve the accuracy of lifestyle prediction on short contigs. We design two different training tasks, a self-supervised task and a fine-tuning task, to overcome the difficulties of lifestyle prediction. We rigorously tested and compared PhaTYP with four state-of-the-art methods: DeePhage, PHACTS, PhagePred and BACPHLIP. The experimental results show that PhaTYP outperforms all these methods and achieves more stable performance on short contigs. In addition, we demonstrated the utility of PhaTYP for analyzing phage lifestyles in gut data from human neonates. This application shows that PhaTYP is a useful means for studying phages in metagenomic data and helps extend our understanding of microbial communities.
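The abstract names a self-supervised task without detailing it. One common self-supervised objective for token sequences is BERT-style masked-token prediction. The sketch below assumes contigs are represented as sequences of protein-cluster tokens and uses a hypothetical 15% masking rate to prepare masked inputs and their reconstruction targets; it is not PhaTYP's implementation.

```python
import random

MASK = "[MASK]"  # hypothetical mask token


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Prepare inputs for a masked-token pretraining objective.

    Returns (masked_tokens, targets), where targets maps each masked position
    to the original token that the model must reconstruct.
    """
    rng = random.Random(seed)  # seeded for reproducible masking
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)  # copy so the caller's list is not modified
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


# Toy contig as 20 hypothetical protein-cluster tokens.
contig_tokens = [f"PC_{i}" for i in range(20)]
masked, targets = mask_tokens(contig_tokens, seed=1)
print(masked, targets)
```

After pretraining on this reconstruction task, a separate fine-tuning stage would train the same encoder on labeled lifestyle data.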
Subject(s)
Bacteriophages , Microbiota , Infant, Newborn , Humans , Bacteriophages/genetics , Metagenomics/methods , Bacteria , Metagenome
ABSTRACT
Access to accurate viral genomes is important for downstream data analysis. Third-generation sequencing (TGS) has recently become a popular platform for virus sequencing because of its long read length. However, its per-base error rate, which is higher than that of next-generation sequencing, can lead to genomes with errors. Polishing tools are thus needed to correct errors either before or after sequence assembly. Despite the promising results of available polishing tools, there is still room to improve error correction and thereby achieve more accurate genome assembly. Errors, particularly those in coding regions, can hamper analyses such as lineage identification and variant monitoring. In this work, we developed a novel pipeline, HMMPolish, for correcting (polishing) errors in protein-coding regions of known RNA viruses. This tool can be applied to either raw TGS reads or the assembled sequences of the target virus. By utilizing profile hidden Markov models of protein families/domains in known viruses, HMMPolish can correct errors that are ignored by available polishers. We extensively validated HMMPolish on 34 datasets covering four clinically important viruses: HIV-1, influenza A, norovirus, and severe acute respiratory syndrome coronavirus 2. These datasets contain reads with different properties, such as sequencing depth and platform (PacBio or Nanopore). Benchmark results against popular/representative polishers show that HMMPolish competes favorably on error correction in coding regions of known RNA viruses.
Subject(s)
COVID-19 , RNA Viruses , Viruses , Humans , Sequence Analysis, DNA/methods , Genome , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
MOTIVATION: Bacteriophages (phages for short), which prey on and replicate within bacterial cells, play a significant role in modulating microbial communities and hold potential applications in combating antibiotic resistance. Advances in high-throughput sequencing technology have contributed tremendously to the discovery of phages. However, the taxonomic classification of assembled phage contigs still faces several challenges, including high genetic diversity, the lack of a stable taxonomy system and limited knowledge of phage annotations. Despite extensive efforts, existing tools have not yet achieved an optimal balance between prediction rate and accuracy. RESULTS: In this work, we develop a learning-based model named PhaGenus, which conducts genus-level taxonomic classification for phage contigs. PhaGenus utilizes a powerful Transformer model to learn the associations between protein clusters and to support the classification of up to 508 genera. We tested PhaGenus on four datasets in different scenarios. The experimental results show that PhaGenus outperforms state-of-the-art methods in predicting low-similarity datasets, achieving an improvement of at least 13.7%. Additionally, PhaGenus is highly effective at identifying previously uncharacterized genera that are not represented in reference databases, with an improvement of 8.52%. The analysis of the infant gut and GOV2.0 datasets demonstrates that PhaGenus can be used to classify more contigs with higher accuracy.
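How PhaGenus flags genera absent from the reference databases is not described in the abstract. A generic way for a classifier to abstain is to reject predictions whose softmax confidence falls below a threshold. The sketch below illustrates that idea with hypothetical genus labels and a hypothetical threshold of 0.5.

```python
import math


def predict_with_rejection(logits, labels, threshold=0.5):
    """Return (label, confidence), or (None, confidence) if confidence < threshold."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]  # abstain: possibly a genus not in the label set
    return labels[best], probs[best]


genera = ["genus_A", "genus_B", "genus_C"]  # hypothetical labels
print(predict_with_rejection([5.0, 0.0, 0.0], genera))
print(predict_with_rejection([1.0, 1.0, 1.0], genera))
```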
Subject(s)
Bacteriophages , Microbiota , Humans , Bacteriophages/genetics , High-Throughput Nucleotide Sequencing
ABSTRACT
Viruses are the most ubiquitous and diverse entities in the biome. Due to the rapid growth of newly identified viruses, there is an urgent need for accurate and comprehensive virus classification, particularly for novel viruses. Here, we present PhaGCN2, which can rapidly classify the taxonomy of viral sequences at the family level and supports the visualization of the associations of all families. We evaluate the performance of PhaGCN2 and compare it with state-of-the-art virus classification tools, such as vConTACT2, CAT and VPF-Class, using widely accepted metrics. The results show that PhaGCN2 largely improves the precision and recall of virus classification, increases the number of classifiable virus sequences in the Global Ocean Virome dataset (v2.0) fourfold and classifies more than 90% of the Gut Phage Database. PhaGCN2 makes it possible to conduct high-throughput and automatic expansion of the database of the International Committee on Taxonomy of Viruses. The source code is freely available at https://github.com/KennthShang/PhaGCN2.0.
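PhaGCN2's name points to a graph convolutional network (GCN). As a minimal sketch, assuming nodes are viral sequences and edges encode sequence similarity, the function below implements the standard GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) in plain Python. The graph construction, node features, and weights used by PhaGCN2 itself are not shown.

```python
import math


def gcn_layer(adj, features, weights):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n adjacency matrix (lists), features: n x d_in, weights: d_in x d_out.
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    d_in, d_out = len(weights), len(weights[0])
    # Aggregate neighbour features: norm @ features.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(d_in)]
           for i in range(n)]
    # Project with the weight matrix and apply ReLU.
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(d_in)))
             for o in range(d_out)] for i in range(n)]


# Two linked sequences with one scalar feature each (toy values).
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))
```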
Subject(s)
Viruses , Viruses/genetics , Genome, Viral , Databases, Factual , Software , Genomics
ABSTRACT
MOTIVATION: The microbiome of a sampled habitat often consists of microbial communities from various sources, including potential contaminants. Microbial source tracking (MST) can be used to discern the contribution of each source to the observed microbiome data, thus enabling the identification and tracking of microbial communities within a sample. Therefore, MST has various applications, from monitoring microbial contamination in clinical labs to tracing the source of pollution in environmental samples. Despite promising results in MST development, there is still room for improvement, particularly for applications where precise quantification of each source's contribution is critical. RESULTS: In this study, we introduce a novel tool called SourceID-NMF for more precise microbial source tracking. SourceID-NMF utilizes a non-negative matrix factorization (NMF) algorithm to trace the microbial sources contributing to a target sample. By leveraging the taxa abundances in both the available sources and the target sample, SourceID-NMF estimates the proportion of each available source present in the target sample. To evaluate the performance of SourceID-NMF, we conducted a series of benchmarking experiments using simulated and real data. The simulated experiments mimic realistic yet challenging scenarios for identifying highly similar sources, irrelevant sources, unknown sources, low-abundance sources, and noise sources. The results demonstrate the superior accuracy of SourceID-NMF over existing methods. In particular, SourceID-NMF accurately estimated the proportions of irrelevant and unknown sources, while other tools either overestimated or underestimated them. In addition, the noise-source experiment demonstrated the robustness of SourceID-NMF for MST. AVAILABILITY AND IMPLEMENTATION: SourceID-NMF is available online at https://github.com/ZiyiHuang0708/SourceID-NMF.
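When the source profiles are fixed, estimating each source's contribution reduces to finding non-negative weights w such that S w approximates the target abundance vector y, where S has one column per source. The sketch below solves that sub-problem with Lee-Seung multiplicative updates for the weights only, assuming every taxon is shared between the sources and the target. It omits the unknown-source handling and other components of SourceID-NMF, so treat it as an illustration rather than the tool's algorithm.

```python
def estimate_proportions(sources, target, n_iter=1000, eps=1e-12):
    """Non-negative weights w with sum_j w_j * sources[j] ~ target, normalized to sum to 1.

    sources: list of source abundance profiles (each a list over the same taxa).
    target: abundance profile of the sink sample over those taxa.
    """
    n_src, n_taxa = len(sources), len(target)
    w = [1.0 / n_src] * n_src  # uniform, strictly positive start
    # Precompute S^T y and S^T S.
    sty = [sum(sources[j][t] * target[t] for t in range(n_taxa)) for j in range(n_src)]
    sts = [[sum(sources[j][t] * sources[k][t] for t in range(n_taxa)) for k in range(n_src)]
           for j in range(n_src)]
    for _ in range(n_iter):
        # Multiplicative update keeps w non-negative: w <- w * (S^T y) / (S^T S w).
        denom = [sum(sts[j][k] * w[k] for k in range(n_src)) for j in range(n_src)]
        w = [w[j] * sty[j] / (denom[j] + eps) for j in range(n_src)]
    total = sum(w)
    return [x / total for x in w]


# Toy example: the target is 70% source A and 30% source B over four taxa.
source_a = [0.5, 0.3, 0.2, 0.0]
source_b = [0.0, 0.2, 0.3, 0.5]
sink = [0.35, 0.27, 0.23, 0.15]
print(estimate_proportions([source_a, source_b], sink))
```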
Subject(s)
Algorithms , Microbiota , Humans
ABSTRACT
SUMMARY: RNA viruses are ubiquitous across a broad spectrum of ecosystems. Therefore, beyond their significant implications for public health, RNA viruses are also key players in ecological processes. High-throughput sequencing has accelerated the discovery of RNA viruses. Nevertheless, many of these viruses lack taxonomic annotation, posing a challenge to functional inference and evolutionary study. In particular, virus classification at the genus level remains difficult due to limited reference data and ambiguous boundaries between some closely related genera. We introduce VirTAXA, a robust classification tool that combines remote homology search and tree-based validation to enhance the genus-level taxonomic classification of RNA viruses. VirTAXA predicts the genus label of an assembled viral contig and provides the evidence type for each prediction. It achieves accuracy comparable to state-of-the-art methods while assigning genus labels to a greater number of sequences. Specifically, on the Global Ocean RNA metatranscriptomic data, VirTAXA assigns genus labels to 18% more contigs than the second-best classification tool. Furthermore, we demonstrated that VirTAXA can be conveniently extended to other types of viruses. AVAILABILITY AND IMPLEMENTATION: The source code and data of VirTAXA are available via https://github.com/JudithEllyn/VirTAXA.
Subject(s)
RNA Viruses , Software , RNA Viruses/genetics , RNA Viruses/classification , RNA, Viral/genetics , Phylogeny , Sequence Analysis, RNA/methods , Genome, Viral , Algorithms , Computational Biology/methods
ABSTRACT
MOTIVATION: Protein embedding, which represents proteins as numerical vectors, is a crucial step in various learning-based protein annotation/classification problems, including gene ontology prediction, protein-protein interaction prediction, and protein structure prediction. However, existing protein embedding methods are often computationally expensive due to their large number of parameters, which can reach millions or even billions. The growing availability of large-scale protein datasets and the need for efficient analysis tools have created a pressing demand for efficient protein embedding methods. RESULTS: We propose a novel protein embedding approach based on multi-teacher distillation learning, which leverages the knowledge of multiple pre-trained protein embedding models to learn a compact and informative representation of proteins. Our method achieves performance comparable to state-of-the-art methods while significantly reducing computational costs and resource requirements. Specifically, our approach reduces computational time by ~70% while keeping accuracy within ±1.5% of the original large models. This makes our method well suited for large-scale protein analysis and enables the bioinformatics community to perform protein embedding tasks more efficiently. AVAILABILITY AND IMPLEMENTATION: The source code of MTDP is available via https://github.com/KennthShang/MTDP.
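The abstract does not specify MTDP's training loss. A simple multi-teacher distillation objective, sketched below under the assumption that teacher embeddings have already been projected to the student's dimension, is a weighted mean-squared error between the student embedding and each teacher embedding. The equal default weights are hypothetical.

```python
def multi_teacher_distillation_loss(student, teachers, weights=None):
    """Weighted mean-squared error between one student embedding and several teacher embeddings.

    Assumes every teacher vector has the same length as the student vector.
    """
    if weights is None:
        weights = [1.0 / len(teachers)] * len(teachers)  # hypothetical equal weighting
    loss = 0.0
    for w, teacher in zip(weights, teachers):
        mse = sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)
        loss += w * mse
    return loss


# Toy 2-dimensional embeddings.
print(multi_teacher_distillation_loss([0.0, 0.0], [[1.0, 1.0], [3.0, 3.0]]))
```

In training, this loss would be minimized with respect to the student's parameters while the teachers stay frozen.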
Subject(s)
Computational Biology , Proteins , Proteins/chemistry , Computational Biology/methods , Databases, Protein , Machine Learning , Algorithms
ABSTRACT
Plasmids are mobile genetic elements that carry important accessory genes. Cataloging plasmids is a fundamental step to elucidate their roles in promoting horizontal gene transfer between bacteria. Next-generation sequencing (NGS) is the main source for discovering new plasmids today. However, NGS assembly programs tend to return fragmented contigs rather than complete plasmid sequences, making plasmid detection difficult. This problem is particularly grave for metagenomic assemblies, which contain short contigs of heterogeneous origins. Available tools for plasmid contig detection still suffer from some limitations. In particular, alignment-based tools tend to miss diverged plasmids, while learning-based tools often have lower precision. In this work, we develop a plasmid detection tool, PLASMe, that capitalizes on the strengths of both alignment- and learning-based methods. Closely related plasmids can be easily identified using the alignment component in PLASMe, while diverged plasmids can be predicted using order-specific Transformer models. By encoding plasmid sequences as a language defined on a protein cluster-based token set, the Transformer can learn the importance of proteins and their correlations through positional token embedding and the attention mechanism. We compared PLASMe and other tools on detecting complete plasmids, plasmid contigs, and contigs assembled from CAMI2 simulated data. PLASMe achieved the highest F1-score. After validating PLASMe on data with known labels, we also tested it on real metagenomic and plasmidome data. The examination of some commonly used marker genes shows that PLASMe exhibits more reliable performance than other tools.
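Treating a contig as a sentence of protein-cluster tokens requires mapping each predicted protein to a cluster ID and then to a vocabulary index. The sketch below shows one such encoding with hypothetical padding and unknown-token IDs, a hypothetical maximum length, and toy protein/cluster names; PLASMe's actual vocabulary and preprocessing may differ.

```python
PAD, UNK = 0, 1  # hypothetical reserved token ids


def encode_contig(protein_ids, protein_to_cluster, cluster_vocab, max_len=8):
    """Map a contig's ordered proteins to protein-cluster token ids, padded/truncated to max_len."""
    tokens = []
    for pid in protein_ids[:max_len]:  # keep gene order, truncate long contigs
        cluster = protein_to_cluster.get(pid)
        tokens.append(cluster_vocab.get(cluster, UNK))  # unclustered or unseen proteins -> UNK
    tokens += [PAD] * (max_len - len(tokens))  # pad short contigs
    return tokens


# Toy example: p3 maps to a cluster outside the vocabulary, p4 has no cluster at all.
vocab = {"PC_a": 2, "PC_b": 3}
mapping = {"p1": "PC_a", "p2": "PC_b", "p3": "PC_z"}
print(encode_contig(["p1", "p2", "p3", "p4"], mapping, vocab, max_len=6))
```

The token order preserves protein order along the contig, which is what allows a positional embedding to carry gene-arrangement information.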
Subject(s)
Genome, Bacterial , Software , Plasmids/genetics , Metagenome , Metagenomics/methods , Sequence Analysis, DNA/methods
ABSTRACT
Prokaryotic viruses, which infect bacteria and archaea, are key players in microbial communities. Predicting the hosts of prokaryotic viruses helps decipher the dynamic relationship between microbes. Experimental methods for host prediction cannot keep pace with the fast accumulation of sequenced phages. Thus, there is a need for computational host prediction. Despite some promising results, computational host prediction remains a challenge because of the limited number of known interactions and the sheer number of phages sequenced by high-throughput sequencing technologies. State-of-the-art methods achieve only 43% accuracy at the species level. In this work, we formulate host prediction as link prediction in a knowledge graph that integrates multiple protein- and DNA-based sequence features. Our implementation, named CHERRY, can be applied to predict hosts for newly discovered viruses and to identify viruses infecting targeted bacteria. We demonstrated the utility of CHERRY for both applications and compared its performance with 11 popular host prediction methods. To the best of our knowledge, CHERRY has the highest accuracy in identifying virus-prokaryote interactions. It outperforms all the existing methods at the species level, with an accuracy increase of 37%. In addition, CHERRY's performance on short contigs is more stable than that of other tools.
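CHERRY's knowledge-graph model is learned, and this note does not reproduce it. To make the link-prediction framing concrete, the sketch below swaps in the simplest link-prediction heuristic, common-neighbour counting: candidate hosts are ranked by how many feature nodes they share with a virus in an undirected graph. The node names and feature nodes are hypothetical.

```python
def rank_hosts(virus, graph, candidate_hosts):
    """Rank hosts by the number of neighbours shared with the virus (common-neighbour score).

    graph: {node: set of neighbouring nodes} for an undirected graph.
    """
    virus_nbrs = graph.get(virus, set())
    scores = {h: len(virus_nbrs & graph.get(h, set())) for h in candidate_hosts}
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(candidate_hosts, key=lambda h: (-scores[h], h))


# Toy graph: f1..f3 are hypothetical shared-feature nodes (e.g. protein or DNA similarity).
toy_graph = {
    "phage1": {"f1", "f2", "f3"},
    "hostA": {"f1", "f2"},
    "hostB": {"f3"},
    "hostC": set(),
}
print(rank_hosts("phage1", toy_graph, ["hostC", "hostB", "hostA"]))
```

The same scoring, run from a host node over candidate viruses, covers the second use case of finding viruses that infect a targeted bacterium.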
Subject(s)
Bacteriophages , Viruses , Bacteria , Bacteriophages/genetics , DNA , Prokaryotic Cells , Viruses/genetics
ABSTRACT
With advances in library construction protocols and next-generation sequencing technologies, viral metagenomic sequencing has become the major source for novel virus discovery. Conducting taxonomic classification for metagenomic data is an important means to characterize the viral composition in the underlying samples. However, RNA viruses are abundant and highly diverse, which jeopardizes the sensitivity of comparison-based classification methods. To improve the sensitivity of read-level taxonomic classification, we developed RdRpBin, a read classification tool based on the RNA-dependent RNA polymerase (RdRp) gene. It combines an alignment-based strategy with machine learning models to fully exploit the sequence properties of RdRp. We tested our method and compared its performance with state-of-the-art tools on simulated and real sequencing data. RdRpBin competes favorably with all of them. In particular, when the query RNA viruses share low sequence similarity with known viruses (~0.4), our tool still maintains a higher F-score than the state-of-the-art tools. The experimental results on real data also showed that RdRpBin can classify more RNA viral reads with a relatively low false-positive rate. Thus, RdRpBin can be utilized to classify novel and diverged RNA viruses.
Subject(s)
RNA Viruses , Viruses , Metagenome , Metagenomics/methods , RNA Viruses/genetics , RNA-Dependent RNA Polymerase/genetics , Viruses/genetics
ABSTRACT
MOTIVATION: Bacteriophages are viruses infecting bacteria. Being key players in microbial communities, they can regulate the composition/function of microbiomes by infecting their bacterial hosts and mediating gene transfer. Recently, metagenomic sequencing, which can sequence all genetic material from various microbiomes, has become a popular means for new phage discovery. However, accurate and comprehensive detection of phages from metagenomic data remains difficult. High diversity/abundance and limited reference genomes pose major challenges for recruiting phage fragments from metagenomic data. Existing alignment-based or learning-based models have either low recall or low precision on metagenomic data. RESULTS: In this work, we adopt the state-of-the-art language model, Transformer, to conduct contextual embedding for phage contigs. By constructing a protein-cluster vocabulary, we can feed both the protein composition and the proteins' positions from each contig into the Transformer. The Transformer can learn the protein organization and associations using the self-attention mechanism and predict the labels of test contigs. We rigorously tested our tool, named PhaMer, on multiple datasets with increasing difficulty, including high-quality RefSeq genomes, short contigs, simulated metagenomic data, mock metagenomic data and the public IMG/VR dataset. All the experimental results show that PhaMer outperforms the state-of-the-art tools. In the real metagenomic data experiment, PhaMer improves the F1-score of phage detection by 27%.
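The abstract says PhaMer feeds protein positions into the Transformer but not how they are encoded. One standard option, shown below, is the fixed sinusoidal positional encoding from the original Transformer, PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). Learned position embeddings are an equally common alternative; which scheme PhaMer uses is left open here as an assumption.

```python
import math


def sinusoidal_positions(n_positions, d_model):
    """Fixed sinusoidal position encodings: one d_model-long vector per position."""
    table = []
    for pos in range(n_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2i and 2i+1 share the same frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Encodings for the first three protein positions of a contig, 4 dimensions each.
for row in sinusoidal_positions(3, 4):
    print(row)
```

Each position vector is added to the corresponding protein-cluster token embedding before the self-attention layers.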
Subject(s)
Bacteriophages , Microbiota , Bacteria/genetics , Bacteriophages/genetics , Metagenome , Metagenomics/methods
ABSTRACT
MOTIVATION: With advances in metagenomic sequencing technologies, a growing number of studies have revealed associations between the human gut microbiome and some human diseases. These associations shed light on using gut microbiome data to distinguish case and control samples of a specific disease, a task also called host disease status classification. Importantly, using learning-based models to distinguish disease and control samples is expected to identify important biomarkers more accurately than abundance-based statistical analysis. However, available tools have not fully addressed two challenges associated with this task: limited labeled microbiome data and decreased accuracy in cross-study settings. Confounding factors, such as diet and technical biases in sample collection/sequencing across different studies/cohorts, often jeopardize the generalization of the learning model. RESULTS: To address these challenges, we develop a new tool, GDmicro, which combines semi-supervised learning and domain adaptation to achieve a more generalized model using limited labeled samples. We evaluated GDmicro on human gut microbiome data from 11 cohorts covering 5 different diseases. The results show that GDmicro has better performance and robustness than state-of-the-art tools. In particular, it improves the AUC from 0.783 to 0.949 in identifying inflammatory bowel disease. Furthermore, GDmicro can identify potential biomarkers with greater accuracy than abundance-based statistical analysis methods. It also reveals the contribution of these biomarkers to the host's disease status. AVAILABILITY AND IMPLEMENTATION: https://github.com/liaoherui/GDmicro.
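The AUC values quoted above can be read as the probability that a randomly chosen case sample receives a higher score than a randomly chosen control sample. The sketch below computes AUC directly from that pairwise definition (ties count as one half); it is quadratic in the number of samples but easy to check. Labels are 1 for case and 0 for control, and the scores are toy values.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of random case > score of random control), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: one case sample is ranked below one control sample.
print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))
```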
Subject(s)
Gastrointestinal Microbiome , Inflammatory Bowel Diseases , Microbiota , Humans , Metagenome , Biomarkers
ABSTRACT
MOTIVATION: RNA viruses tend to mutate constantly. While many of the variants are neutral, some can lead to higher transmissibility or virulence. Accurate assembly of complete viral genomes enables the identification of underlying variants, which are essential for studying virus evolution and elucidating the relationship between genotypes and virus properties. Recently, third-generation sequencing platforms such as Nanopore sequencers have been used for real-time sequencing of the viruses causing Ebola, Zika, coronavirus disease 2019, etc. However, their high per-base error rate prevents the accurate reconstruction of viral genomes. RESULTS: In this work, we introduce a new tool, AccuVIR, for viral genome assembly and polishing using error-prone long reads. It can better distinguish sequencing errors from true variants based on the key observation that sequencing errors can disrupt the gene structures of viruses, which usually have a high density of coding regions. Our experimental results on both simulated and real third-generation sequencing data demonstrated its superior performance in generating more accurate viral genomes than generic assembly or polishing tools. AVAILABILITY AND IMPLEMENTATION: The source code and the documentation of AccuVIR are available at https://github.com/rainyrubyzhou/AccuVIR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
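The key observation can be illustrated with a crude proxy: an indel error inside a gene shifts the reading frame and typically destroys the open reading frame (ORF). The sketch below scores candidate sequences by their longest forward-strand ORF (ATG to an in-frame stop codon) and keeps the best-scoring one. It is a toy illustration of the idea under that simplifying assumption, not AccuVIR's scoring function.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Length in nt of the longest ATG...stop ORF on the three forward frames."""
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # close it at the next in-frame stop
                start = None
    return best


def pick_candidate(candidates):
    """Prefer the candidate whose gene structure is least disrupted (longest ORF)."""
    return max(candidates, key=longest_orf)


correct = "ATGAAATTTGGGCCCTAA"
with_deletion = "ATGAATTTGGGCCCTAA"  # one A deleted, which shifts the reading frame
print(longest_orf(correct), longest_orf(with_deletion))
print(pick_candidate([with_deletion, correct]))
```

A realistic version would also scan the reverse strand and weigh ORF coverage across the whole genome rather than a single longest ORF.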
Subject(s)
COVID-19 , Zika Virus Infection , Zika Virus , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Software , Genome, Viral
ABSTRACT
SUMMARY: Without relying on cultivation, metagenomic sequencing has greatly accelerated the detection of novel RNA viruses. However, accurately identifying RNA viral contigs in a mixture of species is not trivial. The low content of RNA viruses in metagenomic data requires a highly specific detector, while new RNA viruses can exhibit high genetic diversity, posing a challenge for alignment-based tools. In this work, we developed VirBot, a simple yet effective RNA virus identification tool based on protein families and their corresponding adaptive score cutoffs. We benchmarked it against seven popular virus identification tools on both simulated and real sequencing data. VirBot shows high specificity on metagenomic datasets and superior sensitivity in detecting novel RNA viruses. AVAILABILITY AND IMPLEMENTATION: https://github.com/GreyGuoweiChen/RNA_virus_detector. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
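Family-specific ("adaptive") cutoffs mean that a hit is accepted only if its score passes the threshold assigned to that particular protein family, rather than a single global threshold. The sketch below applies such cutoffs to (contig, family, score) hits and reports each contig's best passing hit. The family names and cutoff values are hypothetical, and how VirBot derives its cutoffs is not shown.

```python
def classify_contigs(hits, family_cutoffs):
    """Report each contig's highest-scoring hit among those passing their family's own cutoff.

    hits: iterable of (contig_id, family, bit_score).
    family_cutoffs: {family: minimum score}; families without a cutoff are ignored.
    """
    calls = {}
    for contig, family, score in hits:
        cutoff = family_cutoffs.get(family)
        if cutoff is None or score < cutoff:
            continue  # unknown family, or score below that family's threshold
        if contig not in calls or score > calls[contig][1]:
            calls[contig] = (family, score)
    return calls


# Hypothetical families with different cutoffs.
cutoffs = {"RdRp_1": 50.0, "RdRp_2": 30.0}
hits = [("c1", "RdRp_1", 80.0), ("c1", "RdRp_2", 40.0),
        ("c2", "RdRp_2", 25.0), ("c3", "Other", 99.0)]
print(classify_contigs(hits, cutoffs))
```

Here c2 is rejected even though its score would pass a lenient global threshold, because it falls below the cutoff of its own family.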
Subject(s)
RNA Viruses , Software , RNA Viruses/genetics , Metagenome , Metagenomics , Sequence Analysis, DNA
ABSTRACT
MOTIVATION: As prevalent extrachromosomal replicons in many bacteria, plasmids play an essential role in their hosts' evolution and adaptation. The host range of a plasmid refers to the taxonomic range of bacteria in which it can replicate and thrive. Understanding the host ranges of plasmids helps elucidate the roles of plasmids in bacterial evolution and adaptation. Metagenomic sequencing has become a major means to obtain new plasmids and derive their hosts. However, host prediction for assembled plasmid contigs still needs to tackle several challenges: different sequence compositions and copy numbers between plasmids and their hosts, high diversity in plasmids, and limited plasmid annotations. Existing tools have not yet achieved an ideal tradeoff between sensitivity and precision on metagenomic assembled contigs. RESULTS: In this work, we construct a hierarchical classification tool named HOTSPOT, whose backbone is a phylogenetic tree of the bacterial hosts from phylum to species. By incorporating the state-of-the-art language model, Transformer, in each node's taxon classifier, the top-down tree search achieves accurate host taxonomy prediction for the input plasmid contigs. We rigorously tested HOTSPOT on multiple datasets, including RefSeq complete plasmids, artificial contigs, simulated metagenomic data, mock metagenomic data, the Hi-C dataset, and the CAMI2 marine dataset. All experiments show that HOTSPOT outperforms other popular methods. AVAILABILITY AND IMPLEMENTATION: The source code of HOTSPOT is available via: https://github.com/Orin-beep/HOTSPOT.
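The top-down search can be sketched as a walk from the root of the host taxonomy, where each internal node's classifier picks one child. In the sketch below, the per-node classifiers are stand-in functions returning (child, confidence) rather than HOTSPOT's Transformer models, and stopping at a low-confidence node (threshold 0.5) is an assumption.

```python
def top_down_classify(features, root, children, classifiers, min_conf=0.5):
    """Walk down a host taxonomy, returning the path of confidently predicted taxa.

    children: {taxon: [child taxa]}; classifiers: {taxon: f(features) -> (child, confidence)}.
    """
    node, path = root, [root]
    while children.get(node):  # stop at leaves (no children listed)
        child, conf = classifiers[node](features)
        if conf < min_conf:
            break  # report only the deepest confidently predicted taxon
        node = child
        path.append(node)
    return path


# Toy taxonomy with stand-in classifiers that ignore their input.
tree = {
    "Bacteria": ["Proteobacteria", "Firmicutes"],
    "Proteobacteria": ["Gammaproteobacteria"],
    "Gammaproteobacteria": [],
}
stand_ins = {
    "Bacteria": lambda f: ("Proteobacteria", 0.9),
    "Proteobacteria": lambda f: ("Gammaproteobacteria", 0.4),
}
print(top_down_classify(None, "Bacteria", tree, stand_ins))
```

Stopping early trades resolution for precision: a contig may receive only a phylum-level label when the lower-level classifiers are uncertain.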
Subject(s)
Metagenome , Software , Phylogeny , Plasmids/genetics , Metagenomics/methods , Bacteria/genetics
ABSTRACT
MOTIVATION: As viruses that mainly infect bacteria, phages are key players across a wide range of ecosystems. Analyzing phage proteins is indispensable for understanding phages' functions and roles in microbiomes. High-throughput sequencing enables us to obtain phages from different microbiomes at low cost. However, compared to the fast accumulation of newly identified phages, phage protein classification remains difficult. In particular, a fundamental need is to annotate virion proteins, the structural proteins such as the major tail and baseplate proteins. Although there are experimental methods for virion protein identification, they are too expensive or time-consuming, leaving a large number of proteins unclassified. Thus, there is a great demand for a computational method for fast and accurate phage virion protein (PVP) classification. RESULTS: In this work, we adapted the state-of-the-art image classification model, Vision Transformer, to conduct virion protein classification. By encoding protein sequences into unique images using chaos game representation, we can leverage Vision Transformer to learn both local and global features from sequence "images". Our method, PhaVIP, has two main functions: classifying PVP and non-PVP sequences and annotating the types of PVP, such as capsid and tail. We tested PhaVIP on several datasets with increasing difficulty and benchmarked it against alternative tools. The experimental results show that PhaVIP has superior performance. After validating the performance of PhaVIP, we investigated two applications that can use the output of PhaVIP: phage taxonomy classification and phage host prediction. The results showed the benefit of using classified proteins over all proteins. AVAILABILITY AND IMPLEMENTATION: The web server of PhaVIP is available via: https://phage.ee.cityu.edu.hk/phavip. The source code of PhaVIP is available via: https://github.com/KennthShang/PhaVIP.
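Chaos game representation (CGR) maps a sequence to points in a unit square by repeatedly moving halfway toward the corner assigned to the current symbol; binning the points gives an image-like frequency matrix. For simplicity, the sketch below uses the classic four-letter nucleotide CGR with one common corner convention (A, C, G, T). The protein CGR variants that PhaVIP applies to amino-acid sequences use a different alphabet layout and are not reproduced here.

```python
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}  # one common convention


def fcgr(seq, k):
    """Frequency chaos game representation: a 2^k x 2^k grid of CGR point counts."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    x = y = 0.5  # start at the centre of the unit square
    for i, base in enumerate(seq):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway towards the base's corner
        if i >= k - 1:  # from here on, each point's cell is fixed by the last k bases
            grid[int(y * size)][int(x * size)] += 1  # row = y bin, column = x bin
    return grid


print(fcgr("ACGT", 1))
print(fcgr("ACGTAC", 2))
```

The resulting matrix is what an image model would consume; larger k gives a finer image that reflects longer k-mers.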
Subject(s)
Bacteriophages , Microbiota , Virion , Amino Acid Sequence , Benchmarking
ABSTRACT
Recent years have witnessed promising artificial intelligence (AI) applications in many disciplines, including optics, engineering, medicine, economics, and education. In particular, the synergy of AI and meta-optics has greatly benefited both fields. Meta-optics are advanced flat optics with novel functions and light-manipulation abilities. Their optical properties can be engineered through tailored designs to meet various optical demands. This review offers comprehensive coverage of meta-optics and artificial intelligence in synergy. After providing an overview of AI and meta-optics, we categorize and discuss recent developments at the intersection of these two topics, namely AI for meta-optics and meta-optics for AI. The former describes how AI is applied to meta-optics research for design, simulation, optical information analysis, and application. The latter reports the development of optical AI systems and computation via meta-optics. This review also provides an in-depth discussion of the challenges of this interdisciplinary field and indicates future directions. We expect that this review will inspire researchers in these fields and benefit the next generation of intelligent optical device design.