Results 1 - 13 of 13
1.
bioRxiv ; 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38712272

ABSTRACT

Tens of thousands of influenza sequences are deposited into the GenBank database each year. The software tool FLAN has been used by GenBank since 2007 to validate and annotate incoming influenza sequence submissions, and has been publicly available as a webserver but not as a standalone tool. VADR is a general sequence validation and annotation software package used by GenBank for Norovirus, Dengue virus and SARS-CoV-2 virus sequence processing that is available as a standalone tool. We have created VADR influenza models based on the FLAN reference sequences and adapted VADR to accurately annotate influenza sequences. VADR and FLAN show consistent results on the vast majority of influenza sequences, and when they disagree VADR is usually correct. VADR can also accurately process influenza D sequences as well as influenza A H17, H18, H19, N10 and N11 subtype sequences, which FLAN cannot. VADR 1.6.3 and the associated influenza models are now freely available for users to download and use.

2.
Nucleic Acids Res ; 52(D1): D33-D43, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37994677

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , National Library of Medicine (U.S.) , Biotechnology/instrumentation , Databases, Nucleic Acid , Internet , United States
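The E-utilities named in the abstract above are called over plain HTTPS. As a minimal sketch, assuming only the ESearch endpoint (esearch.fcgi) and its db, term and retmax parameters, the Python below builds a search URL without sending it. The function name esearch_url and the example query are illustrative and are not part of any NCBI library.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request URL for an Entrez database and query term.

    No request is sent; the caller decides how (and whether) to fetch it.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-escapes the query; spaces become "+".
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # The query term is only an illustration.
    print(esearch_url("pubmed", "VADR AND influenza"))
```

Before actually sending such requests, consult NCBI's E-utilities usage guidelines, which ask clients to identify themselves and to respect rate limits.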
3.
Nucleic Acids Res ; 51(D1): D29-D38, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36370100

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. New resources include the Comparative Genome Resource (CGR) and the BLAST ClusteredNR database. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, IgBLAST, GDV, RefSeq, NCBI Virus, GenBank type assemblies, iCn3D, ClinVar, GTR, dbGaP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , Databases, Nucleic Acid , United States , National Library of Medicine (U.S.) , Sequence Alignment , Biotechnology , Internet
4.
Emerg Infect Dis ; 27(6): 1-9, 2021 06.
Article in English | MEDLINE | ID: mdl-34013862

ABSTRACT

Human respiratory syncytial virus (HRSV) is the leading viral cause of serious pediatric respiratory disease, and lifelong reinfections are common. Its 2 major subgroups, A and B, exhibit some antigenic variability, enabling HRSV to circulate annually. Globally, research has increased the number of HRSV genomic sequences available. To ensure accurate molecular epidemiology analyses, we propose a uniform nomenclature for HRSV-positive samples and isolates, and HRSV sequences, namely: HRSV/subgroup identifier/geographic identifier/unique sequence identifier/year of sampling. We also propose a template for submitting associated metadata. Universal nomenclature would help researchers retrieve and analyze sequence data to better understand the evolution of this virus.


Subject(s)
Respiratory Syncytial Virus Infections , Respiratory Syncytial Virus, Human , Child , Genetic Variation , Genotype , Humans , Molecular Epidemiology , Phylogeny , Respiratory Syncytial Virus, Human/genetics
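The proposed nomenclature is a fixed, slash-delimited sequence of five fields, so names can be generated and checked mechanically. The sketch below assumes exactly that five-field layout; the class and function names, the sample name HRSV/A/USA/ID-0001/2019, and the checks applied (subgroup A or B, integer year) are hypothetical illustrations, not rules taken from the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HrsvName:
    subgroup: str     # "A" or "B", the two major subgroups
    geography: str    # geographic identifier
    sequence_id: str  # unique sequence identifier
    year: int         # year of sampling

    def __str__(self) -> str:
        # Reassemble the fields in the proposed order.
        return f"HRSV/{self.subgroup}/{self.geography}/{self.sequence_id}/{self.year}"


def parse_hrsv_name(name: str) -> HrsvName:
    """Split HRSV/subgroup/geography/sequence-id/year into its fields.

    The checks below (five fields, subgroup A or B, integer year) are
    illustrative assumptions.
    """
    parts = name.split("/")
    if len(parts) != 5 or parts[0] != "HRSV":
        raise ValueError(f"not a five-field HRSV name: {name!r}")
    _, subgroup, geography, sequence_id, year = parts
    if subgroup not in ("A", "B"):
        raise ValueError(f"unknown subgroup: {subgroup!r}")
    return HrsvName(subgroup, geography, sequence_id, int(year))


if __name__ == "__main__":
    print(parse_hrsv_name("HRSV/A/USA/ID-0001/2019"))
```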
5.
BMC Bioinformatics ; 21(1): 211, 2020 May 24.
Article in English | MEDLINE | ID: mdl-32448124

ABSTRACT

BACKGROUND: GenBank contains over 3 million viral sequences. The National Center for Biotechnology Information (NCBI) previously made available a tool for validating and annotating influenza virus sequences that is used to check submissions to GenBank. Before this project, there was no analogous tool in use for non-influenza viral sequence submissions.
RESULTS: We developed a system called VADR (Viral Annotation DefineR) that validates and annotates viral sequences in GenBank submissions. The annotation system is based on the analysis of the input nucleotide sequence using models built from curated RefSeqs. Hidden Markov models are used to classify sequences by determining the RefSeq they are most similar to, and feature annotation from the RefSeq is mapped based on a nucleotide alignment of the full sequence to a covariance model. Predicted proteins encoded by the sequence are validated with nucleotide-to-protein alignments using BLAST. The system identifies 43 types of "alerts" that (unlike the previous BLAST-based system) provide deterministic and rigorous feedback to researchers who submit sequences with unexpected characteristics. VADR has been integrated into GenBank's submission processing pipeline allowing for viral submissions passing all tests to be accepted and annotated automatically, without the need for any human (GenBank indexer) intervention. Unlike the previous submission-checking system, VADR is freely available (https://github.com/nawrockie/vadr) for local installation and use. VADR has been used for Norovirus submissions since May 2018 and for Dengue virus submissions since January 2019. Since March 2020, VADR has also been used to check SARS-CoV-2 sequence submissions. Other viruses with high numbers of submissions will be added incrementally.
CONCLUSION: VADR improves the speed with which non-flu virus submissions to GenBank can be checked and improves the content and quality of the GenBank annotations. The availability and portability of the software allow researchers to run the GenBank checks prior to submitting their viral sequences, and thereby gain confidence that their submissions will be accepted immediately without the need to correspond with GenBank staff. Reciprocally, the adoption of VADR frees GenBank staff to spend more time on services other than checking routine viral sequence submissions.


Subject(s)
Betacoronavirus , Coronavirus Infections , Databases, Nucleic Acid , Molecular Sequence Annotation , Pandemics , Pneumonia, Viral , Software , Betacoronavirus/genetics , COVID-19 , Coronavirus Infections/genetics , DNA Viruses , Genomics , Humans , Molecular Sequence Annotation/standards , Pneumonia, Viral/genetics , SARS-CoV-2 , Viruses
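The VADR abstract says RefSeq feature annotation is transferred to a new sequence through a nucleotide alignment. VADR does this with covariance-model alignments; the sketch below swaps in a much simpler setting, assuming a pairwise alignment is already available as two gapped strings of equal length, and only shows how reference coordinates can be projected onto the query. All names are hypothetical.

```python
from __future__ import annotations


def map_ref_to_query(ref_aln: str, qry_aln: str) -> dict[int, int | None]:
    """Map each 1-based reference position to its aligned 1-based query position.

    Reference positions aligned to a gap ("-") in the query map to None.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have equal length")
    mapping: dict[int, int | None] = {}
    r = q = 0  # residues consumed so far in reference and query
    for rc, qc in zip(ref_aln, qry_aln):
        if rc != "-":
            r += 1
        if qc != "-":
            q += 1
        if rc != "-":
            mapping[r] = q if qc != "-" else None
    return mapping


def map_feature(start: int, end: int, ref_aln: str, qry_aln: str) -> tuple[int | None, int | None]:
    """Project a reference feature [start, end] onto query coordinates."""
    m = map_ref_to_query(ref_aln, qry_aln)
    return m.get(start), m.get(end)


if __name__ == "__main__":
    print(map_feature(4, 6, "ACGT-ACGT", "AC-TTACGT"))
```

A feature endpoint that lands on a gap maps to None, which is the kind of case a validator would need to report rather than annotate silently.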
6.
Nat Commun ; 10(1): 3313, 2019 07 25.
Article in English | MEDLINE | ID: mdl-31346170

ABSTRACT

FDA proactively invests in tools to support innovation of emerging technologies, such as infectious disease next generation sequencing (ID-NGS). Here, we introduce FDA-ARGOS quality-controlled reference genomes as a public database for diagnostic purposes and demonstrate its utility in two use cases. We provide quality control metrics for the FDA-ARGOS genomic database resource and outline the need for genome quality gap filling in the public domain. In the first use case, we show more accurate microbial identification of Enterococcus avium from metagenomic samples with FDA-ARGOS reference genomes compared to non-curated GenBank genomes. In the second use case, we demonstrate the utility of FDA-ARGOS reference genomes for Ebola virus target sequence comparison as part of a composite validation strategy for ID-NGS diagnostic tests. The use of FDA-ARGOS as an in silico target sequence comparator tool combined with representative clinical testing could reduce the burden for completing ID-NGS clinical trials.


Subject(s)
Communicable Diseases/diagnosis , Databases, Nucleic Acid/standards , Genome , Access to Information , Communicable Diseases/microbiology , Databases, Nucleic Acid/organization & administration , High-Throughput Nucleotide Sequencing , Humans , United States , United States Food and Drug Administration
7.
Viruses ; 11(1)2019 01 14.
Article in English | MEDLINE | ID: mdl-30646581

ABSTRACT

RNA viruses that contain single-stranded RNA genomes of positive sense make up the largest group of pathogens infecting honey bees. Sacbrood virus (SBV) is one of the most widely distributed honey bee viruses and infects the larvae of honey bees, resulting in failure to pupate and death. Among all of the viruses infecting honey bees, SBV has the greatest number of complete genomes isolated from both European honey bees Apis mellifera and Asian honey bees A. cerana worldwide. To enhance our understanding of the evolution and pathogenicity of SBV, in this study, we present the first report of whole genome sequences of two U.S. strains of SBV. The complete genome sequences of the two U.S. SBV strains were deposited in GenBank under accession numbers MG545286.1 and MG545287.1. Both SBV strains show the typical genomic features of the Iflaviridae family. The phylogenetic analysis of the single polyprotein coding region of the U.S. strains and other GenBank SBV submissions revealed that SBV strains split into two distinct lineages, possibly reflecting host affiliation. The phylogenetic analysis based on the 5'UTR revealed a monophyletic clade with the deep parts of the tree occupied by SBV strains from both A. cerana and A. mellifera, and the tips of branches of the tree occupied by SBV strains from A. mellifera. The study of the effect of cold stress on the pathogenesis of SBV infection showed that cold stress could have profound effects on sacbrood disease severity, manifested by increased mortality of infected larvae. This result suggests that the high prevalence of sacbrood disease in early spring may be due to the fluctuating temperatures during the season. This study will contribute to a better understanding of the evolution and pathogenesis of SBV infection in honey bees and have important epidemiological relevance.


Subject(s)
Bees/virology , Genome, Viral , Insect Viruses/genetics , Phylogeny , RNA Viruses/pathogenicity , Animals , Cold-Shock Response , Genetic Variation , Insect Viruses/pathogenicity , RNA Virus Infections , RNA Viruses/genetics , United States , Whole Genome Sequencing
8.
Nucleic Acids Res ; 45(D1): D482-D490, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899678

ABSTRACT

The Virus Variation Resource is a value-added viral sequence data resource hosted by the National Center for Biotechnology Information. The resource is located at http://www.ncbi.nlm.nih.gov/genome/viruses/variation/ and includes modules for seven viral groups: influenza virus, Dengue virus, West Nile virus, Ebolavirus, MERS coronavirus, Rotavirus A and Zika virus. Each module is supported by pipelines that scan newly released GenBank records, annotate genes and proteins, and parse sample descriptors and map them to a controlled vocabulary. These processes in turn support a purpose-built search interface where users can select sequences based on standardized gene, protein and metadata terms. Once sequences are selected, a suite of tools for downloading data, multi-sequence alignment and tree building supports a variety of user directed activities. This manuscript describes a series of features and functionalities recently added to the Virus Variation Resource.


Subject(s)
Computational Biology/methods , Disease Outbreaks , Genetic Variation , Software , Virus Diseases/epidemiology , Virus Diseases/virology , Viruses/genetics , Databases, Genetic
9.
Nucleic Acids Res ; 44(D1): D733-45, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26553804

ABSTRACT

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project builds on the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC), applying a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.


Subject(s)
Databases, Genetic , Genomics , Animals , Cattle , Gene Expression Profiling , Genome, Fungal , Genome, Human , Genome, Microbial , Genome, Plant , Genome, Viral , Genomics/standards , Humans , Invertebrates/genetics , Mice , Molecular Sequence Annotation , Nematoda/genetics , Phylogeny , RNA, Long Noncoding/genetics , Rats , Reference Standards , Sequence Analysis, Protein , Sequence Analysis, RNA , Vertebrates/genetics
10.
Viruses ; 7(4): 2126-46, 2015 Apr 22.
Article in English | MEDLINE | ID: mdl-25912716

ABSTRACT

To investigate gene loss in poxviruses belonging to the Chordopoxvirinae subfamily, we assessed the gene content of representative members of the subfamily, and determined whether individual genes present in each genome were intact, truncated, or fragmented. When nonintact genes were identified, the early stop mutations (ESMs) leading to gene truncation or fragmentation were analyzed. Of all the ESMs present in these poxvirus genomes, over 65% co-localized with microsatellites (simple sequence nucleotide repeats). On average, microsatellites comprise 24% of the nucleotide sequence of these poxvirus genomes. These simple repeats have been shown to exhibit high rates of variation, and represent a target for poxvirus protein variation, gene truncation, and reductive evolution.


Subject(s)
Chordopoxvirinae/genetics , Genetic Variation , Genome, Viral , Genomic Instability , Microsatellite Repeats , Codon, Nonsense , Computational Biology , Evolution, Molecular , Gene Deletion
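The abstract above reports how often early stop mutations fall inside microsatellites and what fraction of each genome microsatellites cover. It does not define a microsatellite, so this sketch assumes one (a 1 to 3 nt unit repeated at least 4 times in a row) and shows the corresponding bookkeeping with regular expressions. The thresholds, function names and toy sequence are illustrative only.

```python
from __future__ import annotations

import re


def find_microsatellites(seq: str, max_unit: int = 3, min_copies: int = 4) -> list[tuple[int, int]]:
    """Return merged 0-based, half-open intervals covered by simple tandem repeats.

    A repeat here is a unit of 1..max_unit nucleotides occurring at least
    min_copies times in a row (both thresholds are assumptions).
    """
    seq = seq.upper()
    hits: list[tuple[int, int]] = []
    for k in range(1, max_unit + 1):
        # ([ACGT]{k}) captures one unit; \1{n,} demands n more identical copies.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_copies - 1},}}")
        hits.extend((m.start(), m.end()) for m in pattern.finditer(seq))
    # Merge overlapping or touching intervals found with different unit lengths.
    merged: list[list[int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def fraction_covered(seq_len: int, intervals: list[tuple[int, int]]) -> float:
    """Fraction of the sequence lying inside the (non-overlapping) intervals."""
    return sum(e - s for s, e in intervals) / seq_len


def fraction_in_repeats(positions: list[int], intervals: list[tuple[int, int]]) -> float:
    """Fraction of 0-based positions (e.g. stop-mutation sites) inside any interval."""
    inside = sum(any(s <= p < e for s, e in intervals) for p in positions)
    return inside / len(positions)
```

For the toy sequence GCAAAAAGTCACACACACTG this finds an A homopolymer and a CA dinucleotide repeat; the statistics quoted in the abstract come from the authors' own methods, not from this sketch.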
11.
J Forensic Sci ; 60(2): 315-25, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25677086

ABSTRACT

Ebolaviruses are a diverse group of RNA viruses comprising five different species, four of which cause fatal hemorrhagic fever in humans. Because of their high infectivity and lethality, ebolaviruses are considered major biothreat agents. Although detection assays exist, no forensic assays are currently available. Here, we report the development of forensic assays that differentiate ebolaviruses. We performed phylogenetic analyses and identified canonical SNPs for all species, major clades and isolates. TaqMan-MGB allelic discrimination assays based on these SNPs were designed, screened against synthetic RNA templates, and validated against ebolavirus genomic RNAs. A total of 45 assays were validated to provide 100% coverage of the species and variants with additional resolution at the isolate level. These assays enabled accurate forensic analysis on 4 "unknown" ebolaviruses. Unknowns were correctly classified to species and variant. The goal of providing resolution below the isolate level was not achieved. These high-resolution forensic assays allow rapid and accurate genotyping of ebolaviruses for forensic investigations.


Subject(s)
Ebolavirus/genetics , Polymorphism, Single Nucleotide , Alleles , Forensic Genetics , Genome, Viral , Phylogeny , RNA, Viral/analysis , Real-Time Polymerase Chain Reaction , Sequence Analysis
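The abstract does not spell out how canonical SNPs were chosen. One simple working definition, assumed here, is an alignment column where every member of a clade carries the same base and no sequence outside the clade carries it. The sketch below applies that definition to a toy alignment; the function name, clade labels and sequences are hypothetical.

```python
def canonical_snps(alignment: dict[str, str], clade: dict[str, str], target: str) -> list[tuple[int, str]]:
    """Return (0-based column, base) pairs fixed in `target` and absent elsewhere."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    inside = [seq for name, seq in alignment.items() if clade[name] == target]
    outside = [seq for name, seq in alignment.items() if clade[name] != target]
    if not inside or not outside:
        raise ValueError("need sequences both inside and outside the target clade")
    snps = []
    for col in range(lengths.pop()):
        in_bases = {s[col] for s in inside}
        out_bases = {s[col] for s in outside}
        # Canonical: one shared, non-gap base in the clade that no outsider has.
        if len(in_bases) == 1:
            base = next(iter(in_bases))
            if base != "-" and base not in out_bases:
                snps.append((col, base))
    return snps


if __name__ == "__main__":
    aln = {"s1": "ACGTA", "s2": "ACGTA", "s3": "ATGCA", "s4": "ATGTA"}
    labels = {"s1": "X", "s2": "X", "s3": "Y", "s4": "Y"}
    print(canonical_snps(aln, labels, "X"))
```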
12.
J Virol ; 88(23): 13651-68, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25231308

ABSTRACT

UNLABELLED: Poxviruses are composed of large double-stranded DNA (dsDNA) genomes coding for several hundred genes whose variation has supported virus adaptation to a wide variety of hosts over their long evolutionary history. Comparative genomics has suggested that the Orthopoxvirus genus in particular has undergone reductive evolution, with the most recent common ancestor likely possessing a gene complement consisting of all genes present in any existing modern-day orthopoxvirus species, similar to the current Cowpox virus species. As orthopoxviruses adapt to new environments, the selection pressure on individual genes may be altered, driving sequence divergence and possible loss of function. This is evidenced by accumulation of mutations and loss of protein-coding open reading frames (ORFs) that progress from individual missense mutations to gene truncation through the introduction of early stop mutations (ESMs), gene fragmentation, and in some cases, a total loss of the ORF. In this study, we have constructed a whole-genome alignment for representative isolates from each Orthopoxvirus species and used it to identify the nucleotide-level changes that have led to gene content variation. By identifying the changes that have led to ESMs, we were able to determine that short indels were the major cause of gene truncations and that the genome length is inversely proportional to the number of ESMs present. We also identified the number and types of protein functional motifs still present in truncated genes to assess their functional significance.
IMPORTANCE: This work contributes to our understanding of reductive evolution in poxviruses by identifying genomic remnants such as single nucleotide polymorphisms (SNPs) and indels left behind by evolutionary processes. Our comprehensive analysis of the genomic changes leading to gene truncation and fragmentation was able to detect some of the remnants of these evolutionary processes still present in orthopoxvirus genomes and suggests that these viruses are under continual adaptation due to changes in their environment. These results further our understanding of the evolutionary mechanisms that drive virus variation, allowing orthopoxviruses to adapt to particular environmental niches. Understanding the evolutionary history of these virus pathogens may help predict their future evolutionary potential.


Subject(s)
Evolution, Molecular , Genes, Viral , Genetic Variation , Genome, Viral , Orthopoxvirus/classification , Orthopoxvirus/genetics , Synteny
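Detecting an early stop mutation reduces, in its simplest form, to reading a coding sequence codon by codon and comparing where the first in-frame stop occurs with the length of the reference ORF. The sketch below assumes the query is already trimmed to begin at the aligned start codon and uses the standard genetic code; the 0.9 length tolerance and the function names are arbitrary assumptions, and unlike the study's whole-genome alignment it does not identify the indel or substitution responsible.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_in_frame_stop(cds: str) -> int | None:
    """0-based codon index of the first stop codon read in frame from position 0."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_early_stop(query_cds: str, reference_codons: int, tolerance: float = 0.9) -> bool:
    """True if the query ORF is shorter than `tolerance` times the reference ORF.

    reference_codons is the number of sense codons in the reference ORF,
    i.e. first_in_frame_stop(reference_cds).
    """
    stop = first_in_frame_stop(query_cds)
    # Without any stop codon, treat the whole (possibly partial) sequence as coding.
    orf_codons = stop if stop is not None else len(query_cds) // 3
    return orf_codons < tolerance * reference_codons


if __name__ == "__main__":
    ref = "ATGAAACCCGGGTTTTAA"
    n_ref = first_in_frame_stop(ref)
    print(n_ref, has_early_stop("ATGAAATAAGGGTTTTAA", n_ref))
```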
13.
Viruses ; 2(9): 1933-1967, 2010 Sep.
Article in English | MEDLINE | ID: mdl-21994715

ABSTRACT

Poxviruses are highly successful pathogens, known to infect a variety of hosts. The family Poxviridae includes Variola virus, the causative agent of smallpox, which has been eradicated as a public health threat but could potentially reemerge as a bioterrorist threat. The risk scenario includes other animal poxviruses and genetically engineered manipulations of poxviruses. Studies of orthologous gene sets have established the evolutionary relationships of members within the Poxviridae family. It is not clear, however, how variations between family members arose in the past, an important issue in understanding how these viruses may vary and possibly produce future threats. Using a newly developed poxvirus-specific tool, we predicted accurate gene sets for viruses with completely sequenced genomes in the genus Orthopoxvirus. Employing sensitive sequence comparison techniques together with comparison of syntenic gene maps, we established the relationships between all viral gene sets. These techniques allowed us to unambiguously identify the gene loss/gain events that have occurred over the course of orthopoxvirus evolution. It is clear that for all existing Orthopoxvirus species, no individual species has acquired protein-coding genes unique to that species. Every gene found in the existing species is also present in members of the species Cowpox virus, and cowpox virus strains contain every gene present in any other orthopoxvirus strain. These results support a theory of reductive evolution in which the reduction in size of the core gene set of a putative ancestral virus played a critical role in speciation and in confining any newly emerging virus species to a particular environmental (host or tissue) niche.
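The gene loss/gain conclusions in this abstract reduce to set comparisons once orthologous genes share identifiers: a species is the gene-content superset if the union of all other gene sets is contained in its own, and a species-specific gene is one absent from every other set. The sketch below assumes that ortholog assignment has already been done (the study used gene prediction, sequence comparison and syntenic gene maps for that step); the species labels and gene names are hypothetical.

```python
from __future__ import annotations


def is_gene_superset(gene_sets: dict[str, set[str]], candidate: str) -> bool:
    """True if `candidate` contains every gene found in any other member."""
    others = set().union(*(g for name, g in gene_sets.items() if name != candidate))
    return others <= gene_sets[candidate]


def species_specific_genes(gene_sets: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each member, the genes not present in any other member."""
    result = {}
    for name, genes in gene_sets.items():
        others = set().union(*(g for n, g in gene_sets.items() if n != name))
        result[name] = genes - others
    return result


if __name__ == "__main__":
    # Hypothetical gene content; species_A plays the role of the superset.
    sets = {
        "species_A": {"g1", "g2", "g3", "g4"},
        "species_B": {"g1", "g2", "g3"},
        "species_C": {"g1", "g2", "g4"},
    }
    print(is_gene_superset(sets, "species_A"), species_specific_genes(sets))
```

If one member is a superset of all the others, every other member necessarily has an empty species-specific set, which mirrors the two observations stated together in the abstract.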
