Results 1 - 11 of 11
1.
Article in English | MEDLINE | ID: mdl-36748495

ABSTRACT

The public sequence databases are entrusted with the dual responsibility of providing an accessible archive to all submitters and supporting data reliability and its re-use to all users. Genomes from type materials can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and improve related sequence records are detailed here. Using the curated genomes from type strains as reference, the taxonomy of over 1.1 million GenBank genomes was verified, and the taxonomy of over 7000 new submissions (before acceptance to GenBank) and over 1800 existing genomes in GenBank was reclassified.


Subject(s)
Databases, Nucleic Acid , Fatty Acids , Sequence Analysis, DNA , Reproducibility of Results , RNA, Ribosomal, 16S/genetics , Phylogeny , Base Composition , DNA, Bacterial/genetics , Bacterial Typing Techniques , Fatty Acids/chemistry
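
The ANI-based taxonomy check described in the abstract above can be illustrated with a minimal sketch. This is not NCBI's actual procedure: the `TypeHit` record, the `check_taxonomy` function, and the 95% cutoff are assumptions made for illustration (roughly 95% ANI is a commonly cited species boundary, but the abstract does not state a threshold). The sketch only shows the general idea of comparing a genome's declared species with its best-matching type-strain genome.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical cutoff: ~95% ANI is a commonly cited species boundary,
# not a value taken from the abstract.
SPECIES_ANI_CUTOFF = 95.0


@dataclass(frozen=True)
class TypeHit:
    """One comparison of the query genome against a type-strain genome."""
    species: str  # species name attached to the type-strain genome
    ani: float    # percent ANI between the query and that genome


def check_taxonomy(declared: str, hits: Iterable[TypeHit],
                   cutoff: float = SPECIES_ANI_CUTOFF) -> str:
    """Return 'verified', 'reclassify:<species>', or 'inconclusive'."""
    # Pick the type-strain genome with the highest ANI, if any.
    best: Optional[TypeHit] = max(hits, key=lambda h: h.ani, default=None)
    if best is None or best.ani < cutoff:
        return "inconclusive"
    if best.species == declared:
        return "verified"
    return f"reclassify:{best.species}"
```

For example, `check_taxonomy("Bacillus subtilis", [TypeHit("Bacillus velezensis", 98.1)])` would flag the genome as a reclassification candidate under these assumed rules.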
2.
Nucleic Acids Res ; 48(D1): D9-D16, 2020 Jan 08.
Article in English | MEDLINE | ID: mdl-31602479

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface, a sequence database search and a gene orthologs page. Additional resources that were updated in the past year include PMC, Bookshelf, My Bibliography, Assembly, RefSeq, viral genomes, the prokaryotic genome annotation pipeline, Genome Workbench, dbSNP, BLAST, Primer-BLAST, IgBLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Computational Biology/methods , Computational Biology/organization & administration , Databases, Genetic , National Library of Medicine (U.S.) , Databases, Nucleic Acid , Genomics/methods , Humans , PubMed , United States , Web Browser
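
The abstract above names the E-utilities as the programming interface to Entrez. As a hedged sketch of what a query looks like, the snippet below only builds an ESearch request URL and does not send it. The base URL and the `db`, `term`, `retmax`, and `retmode` parameters belong to the public E-utilities interface but are not given in the abstract; the helper name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Public E-utilities base URL (not stated in the abstract).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for one Entrez database; no request is made."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-escapes the query and encodes spaces as '+'.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


print(esearch_url("pubmed", "genome assembly", retmax=5))
```

Fetching the URL with any HTTP client would return matching record identifiers; NCBI's usage guidelines (rate limits, API keys) apply to real requests.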
3.
Nucleic Acids Res ; 47(D1): D23-D28, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30395293

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Labs and a new sequence database search. Resources that were updated in the past year include PubMed, PMC, Bookshelf, Genome Data Viewer, Assembly, prokaryotic genomes, Genome, BioProject, dbSNP, dbVar, BLAST databases, IgBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Biotechnology/organization & administration , Databases, Genetic , Animals , Biotechnology/methods , Databases, Chemical , Humans , Software , United States/epidemiology , Web Browser
4.
Genome Res ; 27(5): 849-864, 2017 May.
Article in English | MEDLINE | ID: mdl-28396521

ABSTRACT

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.


Subject(s)
Contig Mapping/methods , Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods , Software , Contig Mapping/standards , Genomics/standards , Haploidy , Haplotypes , Humans , Polymorphism, Genetic , Reference Standards , Sequence Analysis, DNA/standards
5.
Bioinformatics ; 34(5): 755-759, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29069347

ABSTRACT

Motivation: Nucleic acid sequences in public databases should not contain vector contamination, but many sequences in GenBank do (or did) contain vectors. The National Center for Biotechnology Information uses the program VecScreen to screen submitted sequences for contamination. Additional tools are needed to distinguish true-positive (contamination) from false-positive (not contamination) VecScreen matches.

Results: A principal reason for false-positive VecScreen matches is that the sequence and the matching vector subsequence originate from closely related or identical organisms (for example, both originate in Escherichia coli). We collected information on the taxonomy of sources of vector segments in the UniVec database used by VecScreen. We used that information in two overlapping software pipelines for retrospective analysis of contamination in GenBank and for prospective analysis of contamination in new sequence submissions. Using the retrospective pipeline, we identified and corrected over 8000 contaminated sequences in the nonredundant nucleotide database. The prospective analysis pipeline has been in production use since April 2017 to evaluate some new GenBank submissions.

Availability and implementation: Data on the sources of UniVec entries were included in release 10.0 (ftp://ftp.ncbi.nih.gov/pub/UniVec/). The main software is freely available at https://github.com/aaschaffer/vecscreen_plus_taxonomy.

Contact: aschaffe@helix.nih.gov.

Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Databases, Nucleic Acid/standards , Sequence Analysis, DNA/methods , Software , Bacteria , Eukaryota
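
The false-positive reasoning in the abstract above (a vector match is suspect as contamination when the submitted sequence and the vector segment come from the same or closely related organisms) can be sketched as a simple taxonomy comparison. This is an illustrative assumption, not the logic of the published pipelines: the function name, the dictionary layout, and the choice of genus as the "closely related" rank are all hypothetical.

```python
def likely_false_positive(seq_taxon: dict, vector_taxon: dict,
                          rank: str = "genus") -> bool:
    """Guess whether a vector match is a false positive.

    Both arguments map taxonomic rank names (e.g. 'genus', 'species')
    to taxon names. The match is treated as a likely false positive
    when both sources share the same taxon at the chosen rank; using
    genus as the default rank is an assumption for illustration.
    """
    seq_name = seq_taxon.get(rank)
    vec_name = vector_taxon.get(rank)
    # A missing rank on either side gives no evidence of relatedness.
    return seq_name is not None and seq_name == vec_name


# An E. coli genome matching a vector segment derived from E. coli:
query = {"genus": "Escherichia", "species": "Escherichia coli"}
segment = {"genus": "Escherichia", "species": "Escherichia coli"}
print(likely_false_positive(query, segment))
```

A real screen would also weigh match length and position; the sketch isolates only the taxonomy step.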
6.
Nucleic Acids Res ; 44(D1): D73-80, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578580

ABSTRACT

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site.


Subject(s)
Databases, Nucleic Acid , Genomics , Animals , Genome , Humans , Internet , Mice
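
Among the statistics listed in the abstract above is contig N50. As a minimal sketch using the standard definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length), it could be computed as follows. The function name `n50` is chosen for illustration and is not part of any NCBI tool.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length reaches half the total.
        if running >= half_total:
            return length
    return 0


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # total 54; 10 + 9 + 8 = 27 -> 8
```

The same function applies to scaffold lengths to obtain scaffold N50.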
7.
Nucleic Acids Res ; 42(Database issue): D756-63, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24259432

ABSTRACT

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.


Subject(s)
Databases, Genetic , Genomics , Mammals/genetics , Animals , Eukaryota/genetics , Exons , Genome , Genomics/standards , Humans , Internet , Molecular Sequence Annotation , Proteins/chemistry , Proteins/genetics , RNA/chemistry , Reference Standards
8.
Genome Biol ; 16: 13, 2015 Jan 24.
Article in English | MEDLINE | ID: mdl-25651527

ABSTRACT

The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.


Subject(s)
Computational Biology/methods , Genome, Human , Genomics/methods , Databases, Genetic , Humans , Software
9.
Science ; 327(5963): 343-8, 2010 Jan 15.
Article in English | MEDLINE | ID: mdl-20075255

ABSTRACT

We report here genome sequences and comparative analyses of three closely related parasitoid wasps: Nasonia vitripennis, N. giraulti, and N. longicornis. Parasitoids are important regulators of arthropod populations, including major agricultural pests and disease vectors, and Nasonia is an emerging genetic model, particularly for evolutionary and developmental genetics. Key findings include the identification of a functional DNA methylation tool kit; hymenopteran-specific genes including diverse venoms; lateral gene transfers among Pox viruses, Wolbachia, and Nasonia; and the rapid evolution of genes involved in nuclear-mitochondrial interactions that are implicated in speciation. Newly developed genome resources advance Nasonia for genetic research, accelerate mapping and cloning of quantitative trait loci, and will ultimately provide tools and knowledge for further increasing the utility of parasitoids as pest insect-control agents.


Subject(s)
Biological Evolution , Genome, Insect , Wasps/genetics , Animals , Arthropods/parasitology , DNA Methylation , DNA Transposable Elements , Female , Gene Transfer, Horizontal , Genes, Insect , Genetic Speciation , Genetic Variation , Host-Parasite Interactions , Insect Proteins/genetics , Insect Proteins/metabolism , Insect Viruses/genetics , Insecta/genetics , Male , Molecular Sequence Data , Quantitative Trait Loci , Recombination, Genetic , Sequence Analysis, DNA , Wasp Venoms/chemistry , Wasp Venoms/toxicity , Wasps/physiology , Wolbachia/genetics
10.
Science ; 324(5926): 522-8, 2009 Apr 24.
Article in English | MEDLINE | ID: mdl-19390049

ABSTRACT

To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.


Subject(s)
Biological Evolution , Genome , Alternative Splicing , Animals , Animals, Domestic , Cattle , Evolution, Molecular , Female , Genetic Variation , Humans , Male , MicroRNAs/genetics , Molecular Sequence Data , Proteins/genetics , Sequence Analysis, DNA , Species Specificity , Synteny
11.
Science ; 314(5801): 941-52, 2006 Nov 10.
Article in English | MEDLINE | ID: mdl-17095691

ABSTRACT

We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes.


Subject(s)
Genome , Sequence Analysis, DNA , Strongylocentrotus purpuratus/genetics , Animals , Calcification, Physiologic , Cell Adhesion Molecules/genetics , Cell Adhesion Molecules/physiology , Complement Activation/genetics , Computational Biology , Embryonic Development/genetics , Evolution, Molecular , Gene Expression Regulation, Developmental , Genes , Immunity, Innate/genetics , Immunologic Factors/genetics , Immunologic Factors/physiology , Male , Nervous System Physiological Phenomena , Proteins/genetics , Proteins/physiology , Signal Transduction , Strongylocentrotus purpuratus/embryology , Strongylocentrotus purpuratus/immunology , Strongylocentrotus purpuratus/physiology , Transcription Factors/genetics