ABSTRACT
Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, a consequence of previously decentralized (unfederated) data is a lack of consensus on, or comparison between, feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomic annotations and virus-host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof of concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is implemented in BigQuery for efficient access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide easy ways to access or query information. To address this, a Python API query system was developed to enhance the accessibility of FIVE.
Subjects
Computational Biology, Databases, Genetic, Metagenomics/methods, Viruses/genetics, Computational Biology/methods, Genetic Variation, Genome, Viral, Host-Pathogen Interactions, Humans, User-Computer Interface, Viral Proteins/genetics, Viral Proteins/metabolism, Viruses/metabolism, Web Browser
ABSTRACT
This article is a summary of the activities of the ICTV's Bacterial and Archaeal Viruses Subcommittee for the years 2018 and 2019. Highlights include the creation of a new order, 10 families, 22 subfamilies, 424 genera and 964 species. Some of our concerns about the ICTV's ability to adjust to and incorporate new DNA- and protein-based taxonomic tools are discussed.
Subjects
Archaeal Viruses/classification, Bacteriophages/classification, Classification/methods, Archaea/virology, Bacteria/virology
ABSTRACT
Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world's biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.