Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Beck, Jeff; Bolton, Evan E; Brister, J Rodney; Chan, Jessica; Comeau, Donald C; Connor, Ryan; DiCuccio, Michael; Farrell, Catherine M; Feldgarden, Michael; Fine, Anna M; Funk, Kathryn; Hatcher, Eneida; Hoeppner, Marilu; Kane, Megan; Kannan, Sivakumar; Katz, Kenneth S; Kelly, Christopher; Klimke, William; Kim, Sunghwan; Kimchi, Avi; Landrum, Melissa; Lathrop, Stacy; Lu, Zhiyong; Malheiro, Adriana; Marchler-Bauer, Aron; Murphy, Terence D; Phan, Lon; Prasad, Arjun B; Pujar, Shashikant; Sawyer, Amanda; Schmieder, Erin; Schneider, Valerie A; Schoch, Conrad L; Sharma, Shobha; Thibaud-Nissen, Françoise; Trawick, Barton W; Venkatapathi, Thilakam; Wang, Jiyao; Pruitt, Kim D; Sherry, Stephen T.

Nucleic Acids Res ; 52(D1): D33-D43, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37994677

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.

Subject(s)

Databases, Genetic; National Library of Medicine (U.S.); Biotechnology/instrumentation; Databases, Nucleic Acid; Internet; United States

Collection and curation of prokaryotic genome assemblies from type strains at NCBI.

Kannan, Sivakumar; Sharma, Shobha; Ciufo, Stacy; Clark, Karen; Turner, Seán; Kitts, Paul A; Schoch, Conrad L; DiCuccio, Michael; Kimchi, Avi.

Int J Syst Evol Microbiol ; 73(1), 2023 Feb.

Article in English | MEDLINE | ID: mdl-36748495

ABSTRACT

The public sequence databases are entrusted with the dual responsibility of providing an accessible archive to all submitters and supporting data reliability and re-use for all users. Genomes from type materials can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and improve related sequence records are detailed here. Using the curated genomes from type strains as reference, the taxonomy of over 1.1 million GenBank genomes was verified; over 7000 new submissions were reclassified before acceptance to GenBank, and over 1800 existing genomes in GenBank were reclassified.

Subject(s)

Databases, Nucleic Acid; Fatty Acids; Sequence Analysis, DNA; Reproducibility of Results; RNA, Ribosomal, 16S/genetics; Phylogeny; Base Composition; DNA, Bacterial/genetics; Bacterial Typing Techniques; Fatty Acids/chemistry

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Beck, Jeff; Brister, J Rodney; Bolton, Evan E; Canese, Kathi; Comeau, Donald C; Funk, Kathryn; Ketter, Anne; Kim, Sunghwan; Kimchi, Avi; Kitts, Paul A; Kuznetsov, Anatoliy; Lathrop, Stacy; Lu, Zhiyong; McGarvey, Kelly; Madden, Thomas L; Murphy, Terence D; O'Leary, Nuala; Phan, Lon; Schneider, Valerie A; Thibaud-Nissen, Françoise; Trawick, Bart W; Pruitt, Kim D; Ostell, James.

Nucleic Acids Res ; 48(D1): D9-D16, 2020 Jan 08.

Article in English | MEDLINE | ID: mdl-31602479

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface, a sequence database search and a gene orthologs page. Additional resources that were updated in the past year include PMC, Bookshelf, My Bibliography, Assembly, RefSeq, viral genomes, the prokaryotic genome annotation pipeline, Genome Workbench, dbSNP, BLAST, Primer-BLAST, IgBLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Subject(s)

Computational Biology/methods; Computational Biology/organization & administration; Databases, Genetic; National Library of Medicine (U.S.); Databases, Nucleic Acid; Genomics/methods; Humans; PubMed; United States; Web Browser

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Agarwala, Richa; Bolton, Evan E; Brister, J Rodney; Canese, Kathi; Clark, Karen; Connor, Ryan; Fiorini, Nicolas; Funk, Kathryn; Hefferon, Timothy; Holmes, J Bradley; Kim, Sunghwan; Kimchi, Avi; Kitts, Paul A; Lathrop, Stacy; Lu, Zhiyong; Madden, Thomas L; Marchler-Bauer, Aron; Phan, Lon; Schneider, Valerie A; Schoch, Conrad L; Pruitt, Kim D; Ostell, James.

Nucleic Acids Res ; 47(D1): D23-D28, 2019 Jan 08.

Article in English | MEDLINE | ID: mdl-30395293

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Labs and a new sequence database search. Resources that were updated in the past year include PubMed, PMC, Bookshelf, genome data viewer, Assembly, prokaryotic genomes, Genome, BioProject, dbSNP, dbVar, BLAST databases, igBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Subject(s)

Biotechnology/organization & administration; Databases, Genetic; Animals; Biotechnology/methods; Databases, Chemical; Humans; Software; United States/epidemiology; Web Browser

Using average nucleotide identity to improve taxonomic assignments in prokaryotic genomes at the NCBI.

Ciufo, Stacy; Kannan, Sivakumar; Sharma, Shobha; Badretdin, Azat; Clark, Karen; Turner, Seán; Brover, Slava; Schoch, Conrad L; Kimchi, Avi; DiCuccio, Michael.

Int J Syst Evol Microbiol ; 68(7): 2386-2392, 2018 Jul.

Article in English | MEDLINE | ID: mdl-29792589

ABSTRACT

Average nucleotide identity analysis is a useful tool to verify taxonomic identities in prokaryotic genomes, for both complete and draft assemblies. Using optimum threshold ranges appropriate for different prokaryotic taxa, we have reviewed all prokaryotic genome assemblies in GenBank with regard to their taxonomic identity. We present the methods used to make such comparisons, the current status of GenBank verifications, and recent developments in confirming species assignments in new genome submissions.

Subject(s)

Databases, Nucleic Acid; Genome, Archaeal; Genome, Bacterial; Nucleotides/genetics; Phylogeny; Base Composition; Prokaryotic Cells; Sequence Analysis, DNA

Assembly: a resource for assembled genomes at NCBI.

Kitts, Paul A; Church, Deanna M; Thibaud-Nissen, Françoise; Choi, Jinna; Hem, Vichet; Sapojnikov, Victor; Smith, Robert G; Tatusova, Tatiana; Xiang, Charlie; Zherikov, Andrey; DiCuccio, Michael; Murphy, Terence D; Pruitt, Kim D; Kimchi, Avi.

Nucleic Acids Res ; 44(D1): D73-80, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26578580

ABSTRACT

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Collaboration (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site.

Subject(s)

Databases, Nucleic Acid; Genomics; Animals; Genome; Humans; Internet; Mice

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi.

Nucleic Acids Res ; 44(D1): D733-45, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26553804

ABSTRACT

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

Subject(s)

Databases, Genetic; Genomics; Animals; Cattle; Gene Expression Profiling; Genome, Fungal; Genome, Human; Genome, Microbial; Genome, Plant; Genome, Viral; Genomics/standards; Humans; Invertebrates/genetics; Mice; Molecular Sequence Annotation; Nematoda/genetics; Phylogeny; RNA, Long Noncoding/genetics; Rats; Reference Standards; Sequence Analysis, Protein; Sequence Analysis, RNA; Vertebrates/genetics
