Results 1 - 11 of 11
1.
Article in English | MEDLINE | ID: mdl-36748495

ABSTRACT

The public sequence databases are entrusted with the dual responsibility of providing an accessible archive to all submitters and supporting data reliability and its re-use to all users. Genomes from type materials can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and improve related sequence records are detailed here. Using the curated genomes from type strains as reference, the taxonomy of over 1.1 million GenBank genomes was verified; over 7000 new submissions were reclassified before acceptance to GenBank, as were over 1800 genomes already in GenBank.
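The abstract does not give the details of the ANI-based check, so the following is only a minimal sketch of how a declared species name might be compared against ANI hits to type-strain genomes. The `AniHit` record, the `verify_species` function, and the 96% ANI and 80% coverage cutoffs are all illustrative assumptions, not NCBI's actual procedure or thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AniHit:
    """One comparison of a query genome against a type-strain genome (hypothetical record)."""
    type_species: str  # species name attached to the type-strain genome
    ani: float         # average nucleotide identity, in percent
    coverage: float    # percent of the query genome aligned


def verify_species(
    declared: str,
    hits: List[AniHit],
    min_ani: float = 96.0,       # assumed species-level cutoff, not from the source
    min_coverage: float = 80.0,  # assumed alignment-coverage cutoff, not from the source
) -> Tuple[str, Optional[str]]:
    """Return (status, best_matching_species) for a declared species name."""
    # Keep only hits that pass both assumed cutoffs.
    qualifying = [h for h in hits if h.ani >= min_ani and h.coverage >= min_coverage]
    if not qualifying:
        return ("inconclusive", None)
    # The highest-ANI qualifying hit decides the outcome.
    best = max(qualifying, key=lambda h: h.ani)
    if best.type_species == declared:
        return ("verified", best.type_species)
    return ("mismatch", best.type_species)


# Placeholder species names, for illustration only.
hits = [AniHit("Genus alpha", 98.7, 92.0), AniHit("Genus beta", 91.2, 75.0)]
print(verify_species("Genus alpha", hits))
print(verify_species("Genus beta", hits))
```

A "mismatch" result in this sketch would correspond to a candidate for reclassification, while "inconclusive" means no type-strain genome passed the assumed cutoffs.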


Subjects
Databases, Nucleic Acid; Fatty Acids; Sequence Analysis, DNA; Reproducibility of Results; RNA, Ribosomal, 16S/genetics; Phylogeny; Base Composition; DNA, Bacterial/genetics; Bacterial Typing Techniques; Fatty Acids/chemistry
2.
Nucleic Acids Res ; 48(D1): D9-D16, 2020 Jan 08.
Article in English | MEDLINE | ID: mdl-31602479

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface, a sequence database search and a gene orthologs page. Additional resources that were updated in the past year include PMC, Bookshelf, My Bibliography, Assembly, RefSeq, viral genomes, the prokaryotic genome annotation pipeline, Genome Workbench, dbSNP, BLAST, Primer-BLAST, IgBLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
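As a rough illustration of the E-utilities acting as the programming interface to Entrez, the sketch below builds an ESearch request URL. The helper name `esearch_url` and the example query are assumptions made for illustration; the snippet only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public base URL of the E-utilities endpoints.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for an Entrez database (hypothetical helper)."""
    params = {
        "db": db,           # Entrez database name, e.g. "pubmed"
        "term": term,       # query text
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query string chosen for illustration only.
print(esearch_url("pubmed", "GRCh38 reference assembly"))
```

The resulting URL can be fetched with any HTTP client; anyone sending requests in bulk should consult NCBI's usage guidelines first.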


Subjects
Computational Biology/methods; Computational Biology/organization & administration; Databases, Genetic; National Library of Medicine (U.S.); Databases, Nucleic Acid; Genomics/methods; Humans; PubMed; United States; Web Browser
3.
Nucleic Acids Res ; 47(D1): D23-D28, 2019 Jan 08.
Article in English | MEDLINE | ID: mdl-30395293

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 38 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Labs and a new sequence database search. Resources that were updated in the past year include PubMed, PMC, Bookshelf, genome data viewer, Assembly, prokaryotic genomes, Genome, BioProject, dbSNP, dbVar, BLAST databases, igBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subjects
Biotechnology/organization & administration; Databases, Genetic; Animals; Biotechnology/methods; Databases, Chemical; Humans; Software; United States/epidemiology; Web Browser
4.
Genome Res ; 27(5): 849-864, 2017 May.
Article in English | MEDLINE | ID: mdl-28396521

ABSTRACT

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.


Subjects
Contig Mapping/methods; Genome, Human; Genomics/methods; Sequence Analysis, DNA/methods; Software; Contig Mapping/standards; Genomics/standards; Haploidy; Haplotypes; Humans; Polymorphism, Genetic; Reference Standards; Sequence Analysis, DNA/standards
5.
Bioinformatics ; 34(5): 755-759, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29069347

ABSTRACT

Motivation: Nucleic acid sequences in public databases should not contain vector contamination, but many sequences in GenBank do (or did) contain vectors. The National Center for Biotechnology Information uses the program VecScreen to screen submitted sequences for contamination. Additional tools are needed to distinguish true-positive (contamination) from false-positive (not contamination) VecScreen matches. Results: A principal reason for false-positive VecScreen matches is that the sequence and the matching vector subsequence originate from closely related or identical organisms (for example, both originate in Escherichia coli). We collected information on the taxonomy of sources of vector segments in the UniVec database used by VecScreen. We used that information in two overlapping software pipelines for retrospective analysis of contamination in GenBank and for prospective analysis of contamination in new sequence submissions. Using the retrospective pipeline, we identified and corrected over 8000 contaminated sequences in the nonredundant nucleotide database. The prospective analysis pipeline has been in production use since April 2017 to evaluate some new GenBank submissions. Availability and implementation: Data on the sources of UniVec entries were included in release 10.0 (ftp://ftp.ncbi.nih.gov/pub/UniVec/). The main software is freely available at https://github.com/aaschaffer/vecscreen_plus_taxonomy. Contact: aschaffe@helix.nih.gov. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Databases, Nucleic Acid/standards; Sequence Analysis, DNA/methods; Software; Bacteria; Eukaryota
6.
Nucleic Acids Res ; 44(D1): D73-80, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578580

ABSTRACT

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site.
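The contig N50 mentioned among the contiguity metrics can be computed from a list of contig lengths. The sketch below uses the common definition (the largest length L such that contigs of length at least L together cover at least half of the total assembly length); the function name `n50` is an assumption, and this is not taken from the Assembly database's own code.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety


# Total is 290, half is 145; 80 + 70 = 150 reaches it, so N50 is 70.
print(n50([80, 70, 50, 40, 30, 20]))  # → 70
```

Comparing `2 * running` with `total` keeps the arithmetic in integers and avoids rounding issues when the total length is odd.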


Subjects
Databases, Nucleic Acid; Genomics; Animals; Genome; Humans; Internet; Mice
7.
Nucleic Acids Res ; 42(Database issue): D756-63, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24259432

ABSTRACT

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.nih.gov/refseq/). We report here on growth of the mammalian and human subsets, changes to NCBI's eukaryotic annotation pipeline and modifications affecting transcript and protein records. Recent changes to NCBI's eukaryotic genome annotation pipeline provide higher throughput, and the addition of RNAseq data to the pipeline results in a significant expansion of the number of transcripts and novel exons annotated on mammalian RefSeq genomes. Recent annotation changes include reporting supporting evidence for transcript records, modification of exon feature annotation and the addition of a structured report of gene and sequence attributes of biological interest. We also describe a revised protein annotation policy for alternatively spliced transcripts with more divergent predicted proteins and we summarize the current status of the RefSeqGene project.


Subjects
Databases, Genetic; Genomics; Mammals/genetics; Animals; Eukaryota/genetics; Exons; Genome; Genomics/standards; Humans; Internet; Molecular Sequence Annotation; Proteins/chemistry; Proteins/genetics; RNA/chemistry; Reference Standards
8.
Genome Biol ; 16: 13, 2015 Jan 24.
Article in English | MEDLINE | ID: mdl-25651527

ABSTRACT

The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.


Subjects
Computational Biology/methods; Genome, Human; Genomics/methods; Databases, Genetic; Humans; Software
9.
Science ; 327(5963): 343-8, 2010 Jan 15.
Article in English | MEDLINE | ID: mdl-20075255

ABSTRACT

We report here genome sequences and comparative analyses of three closely related parasitoid wasps: Nasonia vitripennis, N. giraulti, and N. longicornis. Parasitoids are important regulators of arthropod populations, including major agricultural pests and disease vectors, and Nasonia is an emerging genetic model, particularly for evolutionary and developmental genetics. Key findings include the identification of a functional DNA methylation tool kit; hymenopteran-specific genes including diverse venoms; lateral gene transfers among Pox viruses, Wolbachia, and Nasonia; and the rapid evolution of genes involved in nuclear-mitochondrial interactions that are implicated in speciation. Newly developed genome resources advance Nasonia for genetic research, accelerate mapping and cloning of quantitative trait loci, and will ultimately provide tools and knowledge for further increasing the utility of parasitoids as pest insect-control agents.


Subjects
Biological Evolution; Genome, Insect; Wasps/genetics; Animals; Arthropods/parasitology; DNA Methylation; DNA Transposable Elements; Female; Gene Transfer, Horizontal; Genes, Insect; Genetic Speciation; Genetic Variation; Host-Parasite Interactions; Insect Proteins/genetics; Insect Proteins/metabolism; Insect Viruses/genetics; Insects/genetics; Male; Molecular Sequence Data; Quantitative Trait Loci; Recombination, Genetic; Sequence Analysis, DNA; Wasp Venoms/chemistry; Wasp Venoms/toxicity; Wasps/physiology; Wolbachia/genetics
10.
Science ; 324(5926): 522-8, 2009 Apr 24.
Article in English | MEDLINE | ID: mdl-19390049

ABSTRACT

To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.


Subjects
Biological Evolution; Genome; Alternative Splicing; Animals; Animals, Domestic; Cattle; Evolution, Molecular; Female; Genetic Variation; Humans; Male; MicroRNAs/genetics; Molecular Sequence Data; Proteins/genetics; Sequence Analysis, DNA; Species Specificity; Synteny
11.
Science ; 314(5801): 941-52, 2006 Nov 10.
Article in English | MEDLINE | ID: mdl-17095691

ABSTRACT

We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes.


Subjects
Genome; Sequence Analysis, DNA; Strongylocentrotus purpuratus/genetics; Animals; Calcification, Physiologic; Cell Adhesion Molecules/genetics; Cell Adhesion Molecules/physiology; Complement Activation/genetics; Computational Biology; Embryonic Development/genetics; Evolution, Molecular; Gene Expression Regulation, Developmental; Genes; Immunity, Innate/genetics; Immunologic Factors/genetics; Immunologic Factors/physiology; Male; Nervous System Physiological Phenomena; Proteins/genetics; Proteins/physiology; Signal Transduction; Strongylocentrotus purpuratus/embryology; Strongylocentrotus purpuratus/immunology; Strongylocentrotus purpuratus/physiology; Transcription Factors/genetics