Results 1 - 20 of 57
1.
Nucleic Acids Res ; 52(D1): D33-D43, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37994677

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
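
As a rough illustration of the programming interface mentioned in this abstract, the sketch below builds an E-utilities ESearch request URL without sending it. The helper name `build_esearch_url`, the default `retmax`, and the example query are illustrative assumptions and do not come from the article.

```python
from urllib.parse import urlencode

# Public E-utilities base URL. The helper below is a hypothetical convenience
# wrapper for illustration, not part of any NCBI client library.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL for the given database name and query term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes characters such as the brackets in field tags.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query (assumed for illustration): nucleotide records for a genus.
url = build_esearch_url("nucleotide", "Trichoderma[Organism]")
print(url)
```

The URL is only constructed here; fetching it, respecting NCBI rate limits, and supplying an API key are left to the caller.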


Subject(s)
Databases, Genetic , National Library of Medicine (U.S.) , Biotechnology/instrumentation , Databases, Nucleic Acid , Internet , United States
2.
Syst Biol ; 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-37956405

ABSTRACT

Scientific names permit humans and search engines to access knowledge about the biodiversity that surrounds us, and names linked to DNA sequences are playing an ever-greater role in search-and-match identification procedures. Here, we analyze how users and curators of the National Center for Biotechnology Information (NCBI) are flagging and curating sequences derived from nomenclatural type material, which is the only way to improve the quality of DNA-based identification in the long run. For prokaryotes, 18,281 genome assemblies from type strains have been curated by NCBI staff and improve the quality of prokaryote naming. For Fungi, type-derived sequences representing over 21,000 species are now essential for fungus naming and identification. For the remaining eukaryotes, however, the numbers of sequences identifiable as type-derived are minuscule, representing only 1,000 species of arthropods, 8,441 vertebrates, and 430 embryophytes. An increase in the production and curation of such sequences will come from (i) sequencing of types or topotypic specimens in museum collections, (ii) the March 2023 rule changes at the International Nucleotide Sequence Database Collaboration requiring more metadata for specimens, and (iii) efforts by data submitters to facilitate curation, including informing NCBI curators about a specimen's type status. We illustrate different type-data submission journeys and provide best-practice examples from a range of organisms. Expanding the number of type-derived sequences in DNA databases, especially of eukaryotes, is crucial for capturing, documenting, and protecting biodiversity.

3.
Article in English | MEDLINE | ID: mdl-36748495

ABSTRACT

The public sequence databases are entrusted with the dual responsibility of providing an accessible archive to all submitters and supporting data reliability and its re-use to all users. Genomes from type materials can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and improve related sequence records are detailed here. Using the curated genomes from type strains as a reference, the taxonomy of over 1.1 million GenBank genomes was verified, and over 7000 new submissions (before acceptance to GenBank) and over 1800 existing genomes in GenBank were reclassified.


Subject(s)
Databases, Nucleic Acid , Fatty Acids , Sequence Analysis, DNA , Reproducibility of Results , RNA, Ribosomal, 16S/genetics , Phylogeny , Base Composition , DNA, Bacterial/genetics , Bacterial Typing Techniques , Fatty Acids/chemistry
4.
Plant Dis ; 106(6): 1573-1596, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35538602

ABSTRACT

Publicly available and validated DNA reference sequences useful for phylogeny estimation and identification of fungal pathogens are an increasingly important resource in the efforts of plant protection organizations to facilitate safe international trade of agricultural commodities. Colletotrichum species are among the most frequently encountered and regulated plant pathogens at U.S. ports-of-entry. The RefSeq Targeted Loci (RTL) project at NCBI (BioProject no. PRJNA177353) contains a database of curated fungal internal transcribed spacer (ITS) sequences that interact extensively with NCBI Taxonomy, resulting in verified name-strain-sequence type associations for >12,000 species. We present a publicly available dataset of verified and curated name-type strain-sequence associations for all available Colletotrichum species. This includes an updated GenBank Taxonomy for 238 species associated with up to 11 protein coding loci and an updated RTL ITS dataset for 226 species. We demonstrate that several marker loci are well suited for phylogenetic inference and identification. We improve understanding of phylogenetic relationships among verified species, verify or improve phylogenetic circumscriptions of 14 species complexes, and reveal that determining relationships among these major clades will require additional data. We present detailed comparisons between phylogenetic and similarity-based approaches to species identification, revealing complex patterns among single marker loci that often lead to misidentification when based on single-locus similarity approaches. We also demonstrate that species-level identification is elusive for a subset of samples regardless of analytical approach, which may be explained by novel species diversity in our dataset and incomplete lineage sorting and lack of accumulated synapomorphies at these loci.


Subject(s)
Colletotrichum , Colletotrichum/genetics , Commerce , DNA , Internationality , Phylogeny
5.
Nucleic Acids Res ; 50(D1): D161-D164, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850943

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 15.3 trillion base pairs from over 2.5 billion nucleotide sequences for 504 000 formally described species. Recent updates include resources for data from the SARS-CoV-2 virus, including a SARS-CoV-2 landing page, NCBI Datasets, NCBI Virus and the Submission Portal. We also discuss upcoming changes to GI identifiers, a new data management interface for BioProject, and advice for providing contextual metadata in submissions.


Subject(s)
Databases, Nucleic Acid , Viruses/genetics , Genome, Viral , National Library of Medicine (U.S.) , SARS-CoV-2/genetics , United States , User-Computer Interface
6.
BMC Bioinformatics ; 22(1): 400, 2021 Aug 12.
Article in English | MEDLINE | ID: mdl-34384346

ABSTRACT

BACKGROUND: The DNA sequences encoding ribosomal RNA genes (rRNAs) are commonly used as markers to identify species, including in metagenomics samples that may combine many organismal communities. The 16S small subunit ribosomal RNA (SSU rRNA) gene is typically used to identify bacterial and archaeal species. The nuclear 18S SSU rRNA gene, and 28S large subunit (LSU) rRNA gene have been used as DNA barcodes and for phylogenetic studies in different eukaryote taxonomic groups. Because of their popularity, the National Center for Biotechnology Information (NCBI) receives a disproportionate number of rRNA sequence submissions and BLAST queries. These sequences vary in quality, length, origin (nuclear, mitochondria, plastid), and organism source and can represent any region of the ribosomal cistron. RESULTS: To improve the timely verification of quality, origin and loci boundaries, we developed Ribovore, a software package for sequence analysis of rRNA sequences. The ribotyper and ribosensor programs are used to validate incoming sequences of bacterial and archaeal SSU rRNA. The ribodbmaker program is used to create high-quality datasets of rRNAs from different taxonomic groups. Key algorithmic steps include comparing candidate sequences against rRNA sequence profile hidden Markov models (HMMs) and covariance models of rRNA sequence and secondary-structure conservation, as well as other tests. Nine freely available blastn rRNA databases created and maintained with Ribovore are used for checking incoming GenBank submissions and used by the blastn browser interface at NCBI. Since 2018, Ribovore has been used to analyze more than 50 million prokaryotic SSU rRNA sequences submitted to GenBank, and to select at least 10,435 fungal rRNA RefSeq records from type material of 8350 taxa. CONCLUSION: Ribovore combines single-sequence and profile-based methods to improve GenBank processing and analysis of rRNA sequences. 
It is a standalone, portable, and extensible software package for the alignment, classification and validation of rRNA sequences. Researchers planning on submitting SSU rRNA sequences to GenBank are encouraged to download and use Ribovore to analyze their sequences prior to submission to determine which sequences are likely to be automatically accepted into GenBank.


Subject(s)
Databases, Nucleic Acid , RNA, Ribosomal , DNA, Ribosomal , Phylogeny , RNA, Ribosomal, 16S/genetics , RNA, Ribosomal, 18S/genetics , Sequence Analysis, RNA
7.
IMA Fungus ; 12(1): 11, 2021 May 03.
Article in English | MEDLINE | ID: mdl-33934723

ABSTRACT

It is now a decade since The International Commission on the Taxonomy of Fungi (ICTF) produced an overview of requirements and best practices for describing a new fungal species. In the meantime the International Code of Nomenclature for algae, fungi, and plants (ICNafp) has changed from its former name (the International Code of Botanical Nomenclature) and introduced new formal requirements for valid publication of species scientific names, including the separation of provisions specific to Fungi and organisms treated as fungi in a new Chapter F. Equally transformative have been changes in the data collection, data dissemination, and analytical tools available to mycologists. This paper provides an updated and expanded discussion of current publication requirements along with best practices for the description of new fungal species and publication of new names and for improving accessibility of their associated metadata that have developed over the last 10 years. Additionally, we provide: (1) model papers for different fungal groups and circumstances; (2) a checklist to simplify meeting (i) the requirements of the ICNafp to ensure the effective, valid and legitimate publication of names of new taxa, and (ii) minimally accepted standards for description; and, (3) templates for preparing standardized species descriptions.

9.
Nat Microbiol ; 6(5): 540-548, 2021 05.
Article in English | MEDLINE | ID: mdl-33903746

ABSTRACT

The identification and proper naming of microfungi, in particular plant, animal and human pathogens, remains challenging. Molecular identification is becoming the default approach for many fungal groups, and environmental metabarcoding is contributing an increasing amount of sequence data documenting fungal diversity on a global scale. This includes lineages represented only by sequence data. At present, these taxa cannot be formally described under the current nomenclature rules. By considering approaches used in bacterial taxonomy, we propose solutions for the nomenclature of taxa known only from sequences to facilitate consistent reporting and communication in the literature and public sequence repositories.


Subject(s)
Fungi/classification , Fungi/isolation & purification , Animals , DNA, Fungal/genetics , Environmental Microbiology , Fungi/genetics , Humans , Mycoses/microbiology , Plant Diseases/microbiology , Sequence Analysis, DNA , Terminology as Topic
10.
Nucleic Acids Res ; 49(D1): D92-D96, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33196830

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 9.9 trillion base pairs from over 2.1 billion nucleotide sequences for 478 000 formally described species. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. Recent updates include new resources for data from the SARS-CoV-2 virus, updates to the NCBI Submission Portal and associated submission wizards for dengue and SARS-CoV-2 viruses, new taxonomy queries for viruses and prokaryotes, and simplified submission processes for EST and GSS sequences.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Nucleic Acid , Genomics/methods , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , Animals , COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Annotation/methods , Pandemics
11.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32761142

ABSTRACT

The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy.


Subject(s)
Classification , Database Management Systems , Databases, Genetic , Animals , Bacteria/genetics , Humans , National Library of Medicine (U.S.) , Plants/genetics , United States , Viruses/genetics
12.
IMA Fungus ; 11: 14, 2020.
Article in English | MEDLINE | ID: mdl-32714773

ABSTRACT

True fungi (Fungi) and fungus-like organisms (e.g. Mycetozoa, Oomycota) constitute the second largest group of organisms based on global richness estimates, with around 3 million predicted species. Compared to plants and animals, fungi have simple body plans with often morphologically and ecologically obscure structures. This poses challenges for accurate and precise identifications. Here we provide a conceptual framework for the identification of fungi, encouraging the approach of integrative (polyphasic) taxonomy for species delimitation, i.e. the combination of genealogy (phylogeny), phenotype (including autecology), and reproductive biology (when feasible). This allows objective evaluation of diagnostic characters, either phenotypic or molecular or both. Verification of identifications is crucial but often neglected. Because of clade-specific evolutionary histories, there is currently no single tool for the identification of fungi, although DNA barcoding using the internal transcribed spacer (ITS) remains a first diagnosis, particularly in metabarcoding studies. Secondary DNA barcodes are increasingly implemented for groups where ITS does not provide sufficient precision. Issues of pairwise sequence similarity-based identifications and OTU clustering are discussed, and multiple sequence alignment-based phylogenetic approaches with subsequent verification are recommended as more accurate alternatives. In metabarcoding approaches, the trade-off between speed and accuracy and precision of molecular identifications must be carefully considered. Intragenomic variation of the ITS and other barcoding markers should be properly documented, as phylotype diversity is not necessarily a proxy of species richness. 
Important strategies to improve molecular identification of fungi are: (1) broadly document intraspecific and intragenomic variation of barcoding markers; (2) substantially expand sequence repositories, focusing on undersampled clades and missing taxa; (3) improve curation of sequence labels in primary repositories and substantially increase the number of sequences based on verified material; (4) link sequence data to digital information of voucher specimens including imagery. In parallel, technological improvements to genome sequencing offer promising alternatives to DNA barcoding in the future. Despite the prevalence of DNA-based fungal taxonomy, phenotype-based approaches remain an important strategy to catalog the global diversity of fungi and establish initial species hypotheses.

13.
Phytobiomes J ; 4(2): 103-114, 2020.
Article in English | MEDLINE | ID: mdl-35265781

ABSTRACT

Species names are fundamental to managing biological information. The surge of interest in microbial diversity has resulted in an increase in the number of microbes that need to be identified and assigned a species name. This article provides an introduction to the principles of DNA-based identification of Archaea and Bacteria, traditionally known as prokaryotes, and of Fungi, the Oomycetes and other protists, collectively referred to as fungi. The prokaryotes and fungi are the most commonly studied microbes from plants, and we introduce the most relevant concepts of prokaryote and fungal taxonomy and nomenclature. We first explain how prokaryote and fungal species are defined, delimited, and named, and then summarize the criteria and methods used to identify prokaryote and fungal organisms to species.

15.
J Eukaryot Microbiol ; 66(1): 4-119, 2019 01.
Article in English | MEDLINE | ID: mdl-30257078

ABSTRACT

This revision of the classification of eukaryotes follows that of Adl et al., 2012 [J. Euk. Microbiol. 59(5)] and retains an emphasis on protists. Changes since have improved the resolution of many nodes in phylogenetic analyses. For some clades even families are being clearly resolved. As we had predicted, environmental sampling in the intervening years has massively increased the genetic information at hand. Consequently, we have discovered novel clades, exciting new genera and uncovered a massive species level diversity beyond the morphological species descriptions. Several clades known from environmental samples only have now found their home. Sampling soils, deeper marine waters and the deep sea will continue to fill us with surprises. The main changes in this revision are the confirmation that eukaryotes form at least two domains, the loss of monophyly in the Excavata, and robust support for the Haptista and Cryptista. We provide suggested primer sets for DNA sequences from environmental samples that are effective for each clade. We have provided a guide to trophic functional guilds in an appendix, to facilitate the interpretation of environmental samples, and a standardized taxonomic guide for East Asian users.


Subject(s)
Biodiversity , Eukaryota/classification , Phylogeny , Terminology as Topic
16.
Nucleic Acids Res ; 47(D1): D23-D28, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30395293

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Labs and a new sequence database search. Resources that were updated in the past year include PubMed, PMC, Bookshelf, genome data viewer, Assembly, prokaryotic genomes, Genome, BioProject, dbSNP, dbVar, BLAST databases, igBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Biotechnology/organization & administration , Databases, Genetic , Animals , Biotechnology/methods , Databases, Chemical , Humans , Software , United States/epidemiology , Web Browser
17.
Int J Syst Evol Microbiol ; 68(7): 2386-2392, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29792589

ABSTRACT

Average nucleotide identity analysis is a useful tool to verify taxonomic identities in prokaryotic genomes, for both complete and draft assemblies. Using optimum threshold ranges appropriate for different prokaryotic taxa, we have reviewed all prokaryotic genome assemblies in GenBank with regard to their taxonomic identity. We present the methods used to make such comparisons, the current status of GenBank verifications, and recent developments in confirming species assignments in new genome submissions.
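
The verification logic this abstract describes can be sketched as a threshold check of a genome's declared species against its best ANI hit among type-strain assemblies. The class, function name, cutoff values and outcome labels below are illustrative assumptions, not NCBI's actual parameters; the abstract notes that threshold ranges differ between prokaryotic taxa.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AniHit:
    type_species: str  # species name attached to the type-strain assembly
    ani: float         # average nucleotide identity, in percent
    coverage: float    # percent of the query genome that aligned


def verify_species(declared: str, hits: List[AniHit],
                   ani_cutoff: float = 96.0,
                   min_coverage: float = 80.0) -> str:
    """Classify a declared species as 'match', 'mismatch' or 'inconclusive'.

    The default cutoffs are placeholders chosen for illustration only.
    """
    confident = [h for h in hits
                 if h.ani >= ani_cutoff and h.coverage >= min_coverage]
    if not confident:
        return "inconclusive"  # no type-strain genome is close enough
    best = max(confident, key=lambda h: h.ani)
    return "match" if best.type_species == declared else "mismatch"


hits = [AniHit("Escherichia coli", 98.5, 90.0),
        AniHit("Escherichia fergusonii", 92.1, 85.0)]
print(verify_species("Escherichia coli", hits))
```

In practice the cutoff would be looked up per taxon rather than passed as a single default, which is why it is exposed as a parameter here.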


Subject(s)
Databases, Nucleic Acid , Genome, Archaeal , Genome, Bacterial , Nucleotides/genetics , Phylogeny , Base Composition , Prokaryotic Cells , Sequence Analysis, DNA
18.
Database (Oxford) ; 2018, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29688360

ABSTRACT

The rapidly growing set of GenBank submissions includes sequences that are derived from vouchered specimens. These are associated with culture collections, museums, herbaria and other natural history collections, both living and preserved. Correct identification of the specimens studied, along with a method to associate the sample with its institution, is critical to the outcome of related studies and analyses. The National Center for Biotechnology Information BioCollections Database was established to allow the association of specimen vouchers and related sequence records to their home institutions. This process also allows cross-linking from the home institution for quick identification of all records originating from each collection. Database URL: https://www.ncbi.nlm.nih.gov/biocollections
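
To illustrate how a voucher string can be tied back to a home institution, the sketch below splits a structured voucher of the colon-delimited form `institution:collection:specimen_id`, where the institution and collection parts are optional. The `Voucher` type, the `parse_voucher` helper and the example codes are hypothetical and not taken from the article.

```python
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution: Optional[str]  # institution code, if present
    collection: Optional[str]   # collection code, if present
    specimen_id: str            # identifier within the collection


def parse_voucher(value: str) -> Voucher:
    """Split a voucher string into its institution, collection and specimen parts."""
    # Split at most twice so any further colons stay inside the specimen ID.
    parts = value.split(":", 2)
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Voucher(parts[0], None, parts[1])
    return Voucher(None, None, parts[0])  # unstructured voucher text


# "XYZ" and "Herps" are made-up codes used only for this example.
print(parse_voucher("XYZ:Herps:12345"))
```

A voucher without an institution code parses with `institution=None`, which is the case where linking a record to its home collection is not possible from the string alone.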


Subject(s)
Data Accuracy , Databases, Factual , National Library of Medicine (U.S.) , United States
19.
Database (Oxford) ; 2017, 2017 01 01.
Article in English | MEDLINE | ID: mdl-29220466

ABSTRACT

The ITS (nuclear ribosomal internal transcribed spacer) RefSeq database at the National Center for Biotechnology Information (NCBI) is dedicated to the clear association between name, specimen and sequence data. This database is focused on sequences obtained from type material stored in public collections. While the initial ITS sequence curation effort together with numerous fungal taxonomy experts attempted to cover as many orders as possible, we extended our latest focus to the family and genus ranks. We focused on Trichoderma for several reasons, mainly because the asexual and sexual synonyms were well documented, and a list of proposed names and type material was recently published. In this case study the recent taxonomic information was applied to do a complete taxonomic audit for the genus Trichoderma in the NCBI Taxonomy database. A name status report is available here: https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. As a result, the ITS RefSeq Targeted Loci database at NCBI has been augmented with more sequences from type and verified material from Trichoderma species. Additionally, to aid in the cross referencing of data from single loci and genomes we have collected a list of quality records of the RPB2 gene obtained from type material in GenBank that could help validate future submissions. During the process of curation misidentified genomes were discovered, and sequence records from type material were found hidden under previous classifications. Source metadata curation, although more cumbersome, proved to be useful as confirmation of the type material designation. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353


Subject(s)
Databases, Nucleic Acid , Fungal Proteins/genetics , Trichoderma/classification , Trichoderma/genetics
20.
Article in English | MEDLINE | ID: mdl-27481788

ABSTRACT

The fungal kingdom is a hyperdiverse group of multicellular eukaryotes with profound impacts on human society and ecosystem function. The challenge of documenting and describing fungal diversity is exacerbated by their typically cryptic nature, their ability to produce seemingly unrelated morphologies from a single individual and their similarity in appearance to distantly related taxa. This multiplicity of hurdles resulted in the early adoption of DNA-based comparisons to study fungal diversity, including linking curated DNA sequence data to expertly identified voucher specimens. DNA-barcoding approaches in fungi were first applied in specimen-based studies for identification and discovery of taxonomic diversity, but are now widely deployed for community characterization based on sequencing of environmental samples. Collectively, fungal barcoding approaches have yielded important advances across biological scales and research applications, from taxonomic, ecological, industrial and health perspectives. A major outstanding issue is the growing problem of 'sequences without names' that are somewhat uncoupled from the traditional framework of fungal classification based on morphology and preserved specimens. This review summarizes some of the most significant impacts of fungal barcoding, its limitations, and progress towards the challenge of effective utilization of the exponentially growing volume of data gathered from high-throughput sequencing technologies. This article is part of the themed issue 'From DNA barcodes to biomes'.


Subject(s)
Biodiversity , DNA Barcoding, Taxonomic , DNA, Fungal/genetics , Fungi/classification , DNA, Ribosomal Spacer/genetics