Results 1 - 14 of 14
1.
Article in English | MEDLINE | ID: mdl-36748495

ABSTRACT

The public sequence databases are entrusted with the dual responsibility of providing an accessible archive to all submitters and supporting data reliability and its re-use to all users. Genomes from type materials can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and improve related sequence records are detailed here. Using the curated genomes from type strains as reference, the taxonomy of over 1.1 million GenBank genomes was verified; over 7000 new submissions were reclassified before acceptance to GenBank, as were over 1800 existing genomes in GenBank.


Subject(s)
Nucleic Acid Databases, Fatty Acids, DNA Sequence Analysis, Reproducibility of Results, 16S Ribosomal RNA/genetics, Phylogeny, Base Composition, Bacterial DNA/genetics, Bacterial Typing Techniques, Fatty Acids/chemistry
2.
Int J Syst Evol Microbiol ; 68(7): 2386-2392, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29792589

ABSTRACT

Average nucleotide identity analysis is a useful tool to verify taxonomic identities in prokaryotic genomes, for both complete and draft assemblies. Using optimum threshold ranges appropriate for different prokaryotic taxa, we have reviewed all prokaryotic genome assemblies in GenBank with regard to their taxonomic identity. We present the methods used to make such comparisons, the current status of GenBank verifications, and recent developments in confirming species assignments in new genome submissions.


Subject(s)
Nucleic Acid Databases, Archaeal Genome, Bacterial Genome, Nucleotides/genetics, Phylogeny, Base Composition, Prokaryotic Cells, DNA Sequence Analysis
3.
Nucleic Acids Res ; 44(D1): D733-45, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26553804

ABSTRACT

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.


Subject(s)
Genetic Databases, Genomics, Animals, Cattle, Gene Expression Profiling, Fungal Genome, Human Genome, Microbial Genome, Plant Genome, Viral Genome, Genomics/standards, Humans, Invertebrates/genetics, Mice, Molecular Sequence Annotation, Nematoda/genetics, Phylogeny, Long Noncoding RNA/genetics, Rats, Reference Standards, Protein Sequence Analysis, RNA Sequence Analysis, Vertebrates/genetics
4.
Nucleic Acids Res ; 43(Database issue): D599-605, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25510495

ABSTRACT

The NCBI RefSeq genome collection (http://www.ncbi.nlm.nih.gov/genome) represents all three major domains of life (Eukarya, Bacteria and Archaea) as well as viruses. Prokaryotic genome sequences are the most rapidly growing part of the collection. During 2014, more than 10,000 microbial genome assemblies were publicly released, bringing the total number of prokaryotic genomes close to 30,000. We continue to improve the quality and usability of the microbial genome resources by providing easy access to the data and the results of the pre-computed analysis, and by improving analysis and visualization tools. A number of improvements have been incorporated into the Prokaryotic Genome Annotation Pipeline. Several new features have been added to the RefSeq prokaryotic genome data processing pipeline, including the calculation of genome groups (clades) and the optimization of protein cluster generation using a pan-genome approach.


Subject(s)
Nucleic Acid Databases, Archaeal Genome, Bacterial Genome, Internet, Molecular Sequence Annotation
5.
BMC Bioinformatics ; 17 Suppl 8: 276, 2016 Aug 31.
Article in English | MEDLINE | ID: mdl-27586436

ABSTRACT

BACKGROUND: Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density, since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these complex data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. RESULTS: Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provide a robust representation and a high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. CONCLUSION: The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data at different levels of detail to be obtained and data redundancy to be eliminated while keeping biologically interesting variations.


Subject(s)
Bacterial Proteins/metabolism, Microbial Genome, Algorithms, Cluster Analysis, Guanosine Triphosphate/metabolism, Humans, Phylogeny, Statistics as Topic
6.
Nucleic Acids Res ; 42(Database issue): D553-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24316578

ABSTRACT

The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.


Subject(s)
Genetic Databases, Microbial Genome, Molecular Sequence Annotation, Bacterial Proteins/genetics, Bacterial Genome, Genomics/standards, Internet, Reference Standards
7.
Nucleic Acids Res ; 37(Database issue): D216-23, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18940865

ABSTRACT

Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which has been a boon to scientists studying micro-organism evolution and to those who wish to understand the biological underpinnings of microbial systems. The NCBI Protein Clusters Database (ProtClustDB) has been created to efficiently maintain and keep the deluge of data up to date. ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total of 285 386 clusters derived from over 1.7 million proteins encoded by 3806 nt sequences from the RefSeq collection of complete chromosomes and plasmids from four major groups: prokaryotes, bacteriophages and the mitochondrial and chloroplast organelles. There are 7180 clusters containing 376 513 proteins with curated gene and protein functional annotation. PubMed identifiers and external cross references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore more detailed information, such as multiple alignments, phylogenetic trees and genomic neighborhoods. ProtClustDB provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.


Subject(s)
Protein Databases, Proteins/classification, Cluster Analysis, Genomics, Proteins/chemistry, Proteins/genetics, Amino Acid Sequence Homology
8.
Database (Oxford) ; 2020, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-32761142

ABSTRACT

The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy.


Subject(s)
Classification, Database Management Systems, Genetic Databases, Animals, Bacteria/genetics, Humans, National Library of Medicine (U.S.), Plants/genetics, United States, Viruses/genetics
9.
Database (Oxford) ; 2018, 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-29688360

ABSTRACT

The rapidly growing set of GenBank submissions includes sequences that are derived from vouchered specimens. These are associated with culture collections, museums, herbaria and other natural history collections, both living and preserved. Correct identification of the specimens studied, along with a method to associate the sample with its institution, is critical to the outcome of related studies and analyses. The National Center for Biotechnology Information BioCollections Database was established to allow the association of specimen vouchers and related sequence records to their home institutions. This process also allows cross-linking from the home institution for quick identification of all records originating from each collection. Database URL: https://www.ncbi.nlm.nih.gov/biocollections


Subject(s)
Data Accuracy, Factual Databases, National Library of Medicine (U.S.), United States
10.
Database (Oxford) ; 2017, 2017 Jan 01.
Article in English | MEDLINE | ID: mdl-29220466

ABSTRACT

The ITS (nuclear ribosomal internal transcribed spacer) RefSeq database at the National Center for Biotechnology Information (NCBI) is dedicated to the clear association between name, specimen and sequence data. This database is focused on sequences obtained from type material stored in public collections. While the initial ITS sequence curation effort, together with numerous fungal taxonomy experts, attempted to cover as many orders as possible, we extended our latest focus to the family and genus ranks. We focused on Trichoderma for several reasons, mainly because the asexual and sexual synonyms were well documented, and a list of proposed names and type material was recently published. In this case study, the recent taxonomic information was applied to conduct a complete taxonomic audit for the genus Trichoderma in the NCBI Taxonomy database. A name status report is available here: https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. As a result, the ITS RefSeq Targeted Loci database at NCBI has been augmented with more sequences from type and verified material from Trichoderma species. Additionally, to aid in the cross-referencing of data from single loci and genomes, we have collected a list of quality records of the RPB2 gene obtained from type material in GenBank that could help validate future submissions. During the process of curation, misidentified genomes were discovered, and sequence records from type material were found hidden under previous classifications. Source metadata curation, although more cumbersome, proved to be useful as confirmation of the type material designation. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353


Subject(s)
Nucleic Acid Databases, Fungal Proteins/genetics, Trichoderma/classification, Trichoderma/genetics
11.
Article in English | MEDLINE | ID: mdl-27114493

ABSTRACT

Domain-specific databases are essential resources for the biomedical community, leveraging expert knowledge to curate published literature and provide access to referenced data and knowledge. The limited scope of these databases, however, poses important challenges for their infrastructure, visibility, funding and usefulness to the broader scientific community. CollecTF is a community-oriented database documenting experimentally validated transcription factor (TF)-binding sites in the Bacteria domain. In its quest to become a community resource for the annotation of transcriptional regulatory elements in bacterial genomes, CollecTF aims to move away from the conventional data-repository paradigm of domain-specific databases. Through the adoption of well-established ontologies, identifiers and collaborations, CollecTF has also progressively become a portal for the annotation and submission of information on transcriptional regulatory elements to major biological sequence resources (RefSeq, UniProtKB and the Gene Ontology Consortium). This fundamental change in database conception capitalizes on the domain-specific knowledge of contributing communities to provide high-quality annotations, while leveraging the availability of stable information hubs to promote long-term access and give the data high visibility. As a submission portal, CollecTF generates TF-binding site information through direct annotation of RefSeq genome records, definition of TF-based regulatory networks in UniProtKB entries and submission of functional annotations to the Gene Ontology. As a database, CollecTF provides enhanced search and browsing, targeted data exports, binding motif analysis tools and integration with motif discovery and search platforms. This innovative approach will allow CollecTF to focus its limited resources on the generation of high-quality information and the provision of specialized access to the data. Database URL: http://www.collectf.org/.


Subject(s)
Database Management Systems, Genetic Databases, Datasets as Topic, User-Computer Interface
13.
Nature ; 416(6882): 767-9, 2002 Apr 18.
Article in English | MEDLINE | ID: mdl-11961561

ABSTRACT

Microorganisms that use insoluble Fe(III) oxide as an electron acceptor can have an important function in the carbon and nutrient cycles of aquatic sediments and in the bioremediation of organic and metal contaminants in groundwater. Although Fe(III) oxides are often abundant, Fe(III)-reducing microbes are faced with the problem of how to effectively access an electron acceptor that cannot diffuse to the cell. Fe(III)-reducing microorganisms in the genus Shewanella have resolved this problem by releasing soluble quinones that can carry electrons from the cell surface to Fe(III) oxide that is at a distance from the cell. Here we report that another Fe(III)-reducer, Geobacter metallireducens, has an alternative strategy for accessing Fe(III) oxides. Geobacter metallireducens specifically expresses flagella and pili only when grown on insoluble Fe(III) or Mn(IV) oxide, and is chemotactic towards Fe(II) and Mn(II) under these conditions. These results suggest that G. metallireducens senses when soluble electron acceptors are depleted and then synthesizes the appropriate appendages to permit it to search for, and establish contact with, insoluble Fe(III) or Mn(IV) oxide. This approach to the use of an insoluble electron acceptor may explain why Geobacter species predominate over other Fe(III) oxide-reducing microorganisms in a wide variety of sedimentary environments.


Subject(s)
Chemotaxis, Deltaproteobacteria/cytology, Deltaproteobacteria/metabolism, Ferric Compounds/chemistry, Ferric Compounds/metabolism, Fimbriae Proteins, Bacterial Proteins/genetics, DNA-Binding Proteins/genetics, Deltaproteobacteria/genetics, Deltaproteobacteria/physiology, Ferrous Compounds/metabolism, Bacterial Fimbriae/genetics, Bacterial Fimbriae/physiology, Flagella/physiology, Manganese Compounds/metabolism, Movement, Oxidation-Reduction, Oxides/metabolism, Solubility
14.
Nature ; 415(6869): 312-5, 2002 Jan 17.
Article in English | MEDLINE | ID: mdl-11797006

ABSTRACT

The search for extraterrestrial life may be facilitated if ecosystems can be found on Earth that exist under conditions analogous to those present on other planets or moons. It has been proposed, on the basis of geochemical and thermodynamic considerations, that geologically derived hydrogen might support subsurface microbial communities on Mars and Europa in which methanogens form the base of the ecosystem. Here we describe a unique subsurface microbial community in which hydrogen-consuming, methane-producing Archaea far outnumber the Bacteria. More than 90% of the 16S ribosomal DNA sequences recovered from hydrothermal waters circulating through deeply buried igneous rocks in Idaho are related to hydrogen-using methanogenic microorganisms. Geochemical characterization indicates that geothermal hydrogen, not organic carbon, is the primary energy source for this methanogen-dominated microbial community. These results demonstrate that hydrogen-based methanogenic communities do occur in Earth's subsurface, providing an analogue for possible subsurface microbial ecosystems on other planets.


Subject(s)
Ecosystem, Euryarchaeota/isolation & purification, Bacteria/classification, Bacteria/isolation & purification, Bacteria/metabolism, Archaeal DNA/isolation & purification, Bacterial DNA/isolation & purification, Euryarchaeota/classification, Euryarchaeota/metabolism, Exobiology, Hydrogen/metabolism, Molecular Sequence Data, Phylogeny, 16S Ribosomal RNA/genetics, Water Microbiology