Search | PAHO/WHO Uruguay

1.

NCBI prokaryotic genome annotation pipeline.

Tatusova, Tatiana; DiCuccio, Michael; Badretdin, Azat; Chetvernin, Vyacheslav; Nawrocki, Eric P; Zaslavsky, Leonid; Lomsadze, Alexandre; Pruitt, Kim D; Borodovsky, Mark; Ostell, James.

Nucleic Acids Res ; 44(14): 6614-24, 2016 Aug 19.

Article in English | MEDLINE | ID: mdl-27342282

ABSTRACT

Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.

Subject(s)

Bacterial Genome , Molecular Sequence Annotation , Prokaryotic Cells/metabolism , Bacteria/genetics , Bacterial Proteins/chemistry , Nucleic Acid Databases , Bacterial Genes

2.

Assembly: a resource for assembled genomes at NCBI.

Kitts, Paul A; Church, Deanna M; Thibaud-Nissen, Françoise; Choi, Jinna; Hem, Vichet; Sapojnikov, Victor; Smith, Robert G; Tatusova, Tatiana; Xiang, Charlie; Zherikov, Andrey; DiCuccio, Michael; Murphy, Terence D; Pruitt, Kim D; Kimchi, Avi.

Nucleic Acids Res ; 44(D1): D73-80, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26578580

ABSTRACT

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site.
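The contig N50 mentioned among the assembly statistics can be illustrated with a short sketch. It uses the common definition of N50 (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length); the function name `contig_n50` is hypothetical and this is not the Assembly database's own code.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumes the common definition: sort contigs from longest to
    shortest and report the length of the contig at which the running
    total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Example: nine contigs with lengths 2..10 (total length 54).
print(contig_n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because the statistic depends only on the list of lengths, the same function could be applied to scaffold lengths to obtain a scaffold N50.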

Subject(s)

Nucleic Acid Databases , Genomics , Animals , Genome , Humans , Internet , Mice

3.

Update on RefSeq microbial genomes resources.

Tatusova, Tatiana; Ciufo, Stacy; Federhen, Scott; Fedorov, Boris; McVeigh, Richard; O'Neill, Kathleen; Tolstoy, Igor; Zaslavsky, Leonid.

Nucleic Acids Res ; 43(Database issue): D599-605, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25510495

ABSTRACT

NCBI RefSeq genome collection http://www.ncbi.nlm.nih.gov/genome represents all three major domains of life: Eukarya, Bacteria and Archaea as well as Viruses. Prokaryotic genome sequences are the most rapidly growing part of the collection. During the year of 2014 more than 10,000 microbial genome assemblies have been publicly released bringing the total number of prokaryotic genomes close to 30,000. We continue to improve the quality and usability of the microbial genome resources by providing easy access to the data and the results of the pre-computed analysis, and improving analysis and visualization tools. A number of improvements have been incorporated into the Prokaryotic Genome Annotation Pipeline. Several new features have been added to RefSeq prokaryotic genomes data processing pipeline including the calculation of genome groups (clades) and the optimization of protein clusters generation using pan-genome approach.

Subject(s)

Nucleic Acid Databases , Archaeal Genome , Bacterial Genome , Internet , Molecular Sequence Annotation

4.

Gene: a gene-centered information resource at NCBI.

Brown, Garth R; Hem, Vichet; Katz, Kenneth S; Ovetsky, Michael; Wallin, Craig; Ermolaeva, Olga; Tolstoy, Igor; Tatusova, Tatiana; Pruitt, Kim D; Maglott, Donna R; Murphy, Terence D.

Nucleic Acids Res ; 43(Database issue): D36-42, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25355515

ABSTRACT

The National Center for Biotechnology Information's (NCBI) Gene database (www.ncbi.nlm.nih.gov/gene) integrates gene-specific information from multiple data sources. NCBI Reference Sequence (RefSeq) genomes for viruses, prokaryotes and eukaryotes are the primary foundation for Gene records in that they form the critical association between sequence and a tracked gene upon which additional functional and descriptive content is anchored. Additional content is integrated based on the genomic location and RefSeq transcript and protein sequence data. The content of a Gene record represents the integration of curation and automated processing from RefSeq, collaborating model organism databases, consortia such as Gene Ontology, and other databases within NCBI. Records in Gene are assigned unique, tracked integers as identifiers. The content (citations, nomenclature, genomic location, gene products and their attributes, phenotypes, sequences, interactions, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities and Entrez Direct) and for bulk transfer by FTP.
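Programmatic access through E-Utilities, as mentioned in the abstract, might start from a query URL like the one built below. This is a minimal sketch: the helper name `gene_esearch_url`, the example search term and the choice of JSON output are illustrative assumptions, and the code only constructs the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Public base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_esearch_url(term, retmax=20):
    """Build an ESearch URL that queries the Gene database.

    `term` is an Entrez query string; `retmax` caps the number of
    Gene identifiers returned. Hypothetical helper for illustration.
    """
    params = {
        "db": "gene",        # search the Gene database
        "term": term,        # Entrez query, URL-encoded below
        "retmode": "json",   # ask for a JSON response
        "retmax": retmax,
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query: gene symbol BRCA1 restricted to human records.
print(gene_esearch_url("BRCA1[sym] AND human[orgn]"))
```

The returned URL can be passed to any HTTP client; the identifiers in the response are the tracked integer Gene IDs the abstract describes.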

Subject(s)

Genetic Databases , Genes , Genetic Variation , Genomics , Internet , National Library of Medicine (U.S.) , Phenotype , United States

5.

Clustering analysis of proteins from microbial genomes at multiple levels of resolution.

Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana.

BMC Bioinformatics ; 17 Suppl 8: 276, 2016 Aug 31.

Article in English | MEDLINE | ID: mdl-27586436

ABSTRACT

BACKGROUND: Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. RESULTS: Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provides a robust representation and high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. CONCLUSION: The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data at different levels of detail to be obtained and data redundancy eliminated while keeping biologically interesting variations.
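The clustroid step in the second level can be sketched as follows. The sketch assumes the usual meaning of a clustroid (the cluster member with the smallest total distance to all other members) and uses a toy mismatch-fraction distance on equal-length sequences; both the function names and the distance are illustrative assumptions, not the similarity measure used in the paper.

```python
def mismatch_fraction(a, b):
    """Toy distance: fraction of differing positions in equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def clustroid(members, distance):
    """Return the member minimizing the summed distance to all members.

    `members` is a non-empty sequence of cluster members and `distance`
    is any symmetric function returning a non-negative number.
    """
    if not members:
        raise ValueError("cluster must not be empty")
    return min(
        members,
        key=lambda m: sum(distance(m, other) for other in members),
    )


# Three short, hypothetical protein fragments forming one tight cluster.
cluster = ["MKTA", "MKTG", "MRTG"]
print(clustroid(cluster, mismatch_fraction))
```

Representing each in-clade cluster by a single clustroid is what gives the compression the abstract refers to: downstream global clustering only needs to compare one representative per cluster.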

Subject(s)

Bacterial Proteins/metabolism , Microbial Genome , Algorithms , Cluster Analysis , Guanosine Triphosphate/metabolism , Humans , Phylogeny , Statistics as Topic

6.

RefSeq microbial genomes database: new representation and annotation strategy.

Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor.

Nucleic Acids Res ; 42(Database issue): D553-9, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24316578

ABSTRACT

The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.

Subject(s)

Genetic Databases , Microbial Genome , Molecular Sequence Annotation , Bacterial Proteins/genetics , Bacterial Genome , Genomics/standards , Internet , Reference Standards

7.

Virus Variation Resource--recent updates and future directions.

Brister, J Rodney; Bao, Yiming; Zhdanov, Sergey A; Ostapchuck, Yuri; Chetvernin, Vyacheslav; Kiryutin, Boris; Zaslavsky, Leonid; Kimelman, Michael; Tatusova, Tatiana A.

Nucleic Acids Res ; 42(Database issue): D660-5, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24304891

ABSTRACT

Virus Variation (http://www.ncbi.nlm.nih.gov/genomes/VirusVariation/) is a comprehensive, web-based resource designed to support the retrieval and display of large virus sequence datasets. The resource includes a value added database, a specialized search interface and a suite of sequence data displays. Virus-specific sequence annotation and database loading pipelines produce consistent protein and gene annotation and capture sequence descriptors from sequence records then map these metadata to a controlled vocabulary. The database supports a metadata driven, web-based search interface where sequences can be selected using a variety of biological and clinical criteria. Retrieved sequences can then be downloaded in a variety of formats or analyzed using a suite of tools and displays. Over the past 2 years, the pre-existing influenza and Dengue virus resources have been combined into a single construct and West Nile virus added to the resultant resource. A number of improvements were incorporated into the sequence annotation and database loading pipelines, and the virus-specific search interfaces were updated to support more advanced functions. Several new features have also been added to the sequence download options, and a new multiple sequence alignment viewer has been incorporated into the resource tool set. Together these enhancements should support enhanced usability and the inclusion of new viruses in the future.

Subject(s)

Genetic Databases , Viruses/genetics , Viral Genes , Viral Genome , Genomics , Internet , Molecular Sequence Annotation , Orthomyxoviridae/genetics , Sequence Alignment , Viral Proteins

8.

The Genomic Standards Consortium.

Field, Dawn; Amaral-Zettler, Linda; Cochrane, Guy; Cole, James R; Dawyndt, Peter; Garrity, George M; Gilbert, Jack; Glöckner, Frank Oliver; Hirschman, Lynette; Karsch-Mizrachi, Ilene; Klenk, Hans-Peter; Knight, Rob; Kottmann, Renzo; Kyrpides, Nikos; Meyer, Folker; San Gil, Inigo; Sansone, Susanna-Assunta; Schriml, Lynn M; Sterk, Peter; Tatusova, Tatiana; Ussery, David W; White, Owen; Wooley, John.

PLoS Biol ; 9(6): e1001088, 2011 Jun.

Article in English | MEDLINE | ID: mdl-21713030

ABSTRACT

A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.

Subject(s)

Genetic Databases , Genomics/standards , International Cooperation , Metagenome

9.

Improvements to pairwise sequence comparison (PASC): a genome-based web tool for virus classification.

Bao, Yiming; Chetvernin, Vyacheslav; Tatusova, Tatiana.

Arch Virol ; 159(12): 3293-304, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25119676

ABSTRACT

The number of viral genome sequences in the public databases is increasing dramatically, and these sequences are playing an important role in virus classification. Pairwise sequence comparison is a sequence-based virus classification method. A program using this method calculates the pairwise identities of virus sequences within a virus family and displays their distribution, and visual analysis helps to determine demarcations at different taxonomic levels such as strain, species, genus and subfamily. Subsequent comparison of new sequences against existing ones allows viruses from which the new sequences were derived to be classified. Although this method cannot be used as the only criterion for virus classification in some cases, it is a quantitative method and has many advantages over conventional virus classification methods. It has been applied to several virus families, and there is an increasing interest in using this method for other virus families/groups. The Pairwise Sequence Comparison (PASC) classification tool was created at the National Center for Biotechnology Information. The tool's database stores pairwise identities for complete genomes/segments of 56 virus families/groups. Data in the system are updated every day to reflect changes in virus taxonomy and additions of new virus sequences to the public database. The web interface of the tool ( http://www.ncbi.nlm.nih.gov/sutils/pasc/ ) makes it easy to navigate and perform analyses. Multiple new viral genome sequences can be tested simultaneously with this system to suggest the taxonomic position of virus isolates in a specific family. PASC eliminates potential discrepancies in the results caused by different algorithms and/or different data used by researchers.
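The core idea of the method, computing all pairwise identities within a family and examining their distribution, can be sketched in a few lines. The sketch assumes the sequences are already aligned to equal length with `-` as the gap character and bins identities into fixed-width intervals; these simplifications, the function names and the toy sequences are illustrative assumptions and do not reproduce how the PASC tool itself aligns genomes.

```python
from collections import Counter
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite
    a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_histogram(aligned, bin_width=10):
    """Count sequence pairs per identity bin (keyed by bin lower bound)."""
    counts = Counter()
    for (_, a), (_, b) in combinations(aligned.items(), 2):
        pid = percent_identity(a, b)
        # Place 100% identity in the top bin instead of a bin of its own.
        lower = min(int(pid // bin_width) * bin_width, 100 - bin_width)
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Hypothetical aligned fragments from three isolates.
toy = {
    "isolate_a": "ACGTACGTAC",
    "isolate_b": "ACGTACGTAA",
    "isolate_c": "TTTTACGTAA",
}
print(identity_histogram(toy))
```

In a real family the histogram typically shows several peaks, and the valleys between them are the candidate demarcation thresholds for taxonomic levels that the abstract describes.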

Subject(s)

Computational Biology/methods , Viral Genome , Internet , Sequence Homology , Viruses/classification , Viruses/genetics , National Institutes of Health (U.S.) , United States

10.

NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy.

Pruitt, Kim D; Tatusova, Tatiana; Brown, Garth R; Maglott, Donna R.

Nucleic Acids Res ; 40(Database issue): D130-5, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22121212

ABSTRACT

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16,000 organisms, 2.4 × 10(6) genomic records, 13 × 10(6) proteins and 2 × 10(6) RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).

Subject(s)

Genetic Databases , Molecular Sequence Annotation , Sequence Analysis/standards , Genomics/standards , Humans , Reference Standards , DNA Sequence Analysis/standards , Protein Sequence Analysis/standards , RNA Sequence Analysis/standards

11.

BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata.

Barrett, Tanya; Clark, Karen; Gevorgyan, Robert; Gorelenkov, Vyacheslav; Gribov, Eugene; Karsch-Mizrachi, Ilene; Kimelman, Michael; Pruitt, Kim D; Resenchuk, Sergei; Tatusova, Tatiana; Yaschenko, Eugene; Ostell, James.

Nucleic Acids Res ; 40(Database issue): D57-63, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22139929

ABSTRACT

As the volume and complexity of data sets archived at NCBI grow rapidly, so does the need to gather and organize the associated metadata. Although metadata has been collected for some archival databases, previously, there was no centralized approach at NCBI for collecting this information and using it across databases. The BioProject database was recently established to facilitate organization and classification of project data submitted to NCBI, EBI and DDBJ databases. It captures descriptive information about research projects that result in high volume submissions to archival databases, ties together related data across multiple archives and serves as a central portal by which to inform users of data availability. Concomitantly, the BioSample database is being developed to capture descriptive information about the biological samples investigated in projects. BioProject and BioSample records link to corresponding data stored in archival repositories. Submissions are supported by a web-based Submission Portal that guides users through a series of forms for input of rich metadata describing their projects and samples. Together, these databases offer improved ways for users to query, locate, integrate and interpret the masses of data held in NCBI's archival repositories. The BioProject and BioSample databases are available at http://www.ncbi.nlm.nih.gov/bioproject and http://www.ncbi.nlm.nih.gov/biosample, respectively.

Subject(s)

Genetic Databases , Genomics , Internet , Systems Integration , Transcriptome , User-Computer Interface

12.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian.

Nucleic Acids Res ; 40(Database issue): D13-25, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22140104

ABSTRACT

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Subject(s)

Databases as Topic , Genetic Databases , Protein Databases , Gene Expression , Genomics , Internet , Molecular Models , National Library of Medicine (U.S.) , Periodicals as Topic , PubMed , Sequence Alignment , DNA Sequence Analysis , Protein Sequence Analysis , RNA Sequence Analysis , Small Molecule Libraries , United States

13.

The complete mitochondrial genome data of Argania spinosa (L.) Skeels.

Idrissi Azami, Abdellah; Pirro, Stacy; Sehli, Sofia; Habib, Nihal; El Ghoubali, Douae; Al Idrissi, Najib; Rahim, Bouchra; Gaboun, Fatima; Msanda, Fouad; Zahidi, Abdelaziz; El Finti, Aissam; Legssyer, Abdelkhalek; Tatusova, Tatiana; Nejjari, Chakib; Amzazi, Saaid; Belyamani, Lahcen; El Mousadik, Abdelhamid; Ghazal, Hassan.

Data Brief ; 57: 110862, 2024 Dec.

Article in English | MEDLINE | ID: mdl-39290434

ABSTRACT

Argania spinosa (L.) Skeels, an endemic Moroccan plant species from the Sapotaceae family, holds significant ecological, pharmaceutical, and socioeconomic value in the arid mid-western region. However, it is facing rapid degradation. Therefore, understanding its genetic diversity is critical for preserving this national heritage. We sequenced, assembled, and annotated the mitochondrial genome of A. spinosa and compared it to other plants in the Ericales order. Mitochondrial-like sequences from the A. spinosa genome were assembled using GetOrganelle, resulting in a 707,441 base pair mitochondrial genome with 45.75 % GC content. Annotation identified 32 protein-coding genes, 16 transfer RNAs, and 2 ribosomal RNA genes. Phylogenetic analysis of 15 Ericales species affirms that A. spinosa is closely related to the Theaceae family, which is in accordance with results from the chloroplast genome.

14.

Entrez Gene: gene-centered information at NCBI.

Maglott, Donna; Ostell, Jim; Pruitt, Kim D; Tatusova, Tatiana.

Nucleic Acids Res ; 39(Database issue): D52-7, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21115458

ABSTRACT

Entrez Gene (http://www.ncbi.nlm.nih.gov/gene) is National Center for Biotechnology Information (NCBI)'s database for gene-specific information. Entrez Gene maintains records from genomes which have been completely sequenced, which have an active research community to submit gene-specific information, or which are scheduled for intense sequence analysis. The content represents the integration of curation and automated processing from NCBI's Reference Sequence project (RefSeq), collaborating model organism databases, consortia such as Gene Ontology and other databases within NCBI. Records in Entrez Gene are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, genomic location, gene products and their attributes, markers, phenotypes and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities) and for bulk transfer by FTP.

Subject(s)

Genetic Databases , Genes , Genomics , Internet , National Library of Medicine (U.S.) , United States , User-Computer Interface

15.

Cryptic splice sites and split genes.

Kapustin, Yuri; Chan, Elcie; Sarkar, Rupa; Wong, Frederick; Vorechovsky, Igor; Winston, Robert M; Tatusova, Tatiana; Dibb, Nick J.

Nucleic Acids Res ; 39(14): 5837-44, 2011 Aug.

Article in English | MEDLINE | ID: mdl-21470962

ABSTRACT

We describe a new program called cryptic splice finder (CSF) that can reliably identify cryptic splice sites (css), so providing a useful tool to help investigate splicing mutations in genetic disease. We report that many css are not entirely dormant and are often already active at low levels in normal genes prior to their enhancement in genetic disease. We also report a fascinating correlation between the positions of css and introns, whereby css within the exons of one species frequently match the exact position of introns in equivalent genes from another species. These results strongly indicate that many introns were inserted into css during evolution and they also imply that the splicing information that lies outside some introns can be independently recognized by the splicing machinery and was in place prior to intron insertion. This indicates that non-intronic splicing information had a key role in shaping the split structure of eukaryote genes.

Subject(s)

RNA Splice Sites , Software , Base Sequence , Consensus Sequence , Molecular Evolution , Expressed Sequence Tags/chemistry , Genes , Inborn Genetic Diseases/genetics , Genomics/methods , Humans , Introns , Sequence Alignment , Protein Sequence Analysis

16.

Towards BioDBcore: a community-defined information specification for biological databases.

Gaudet, Pascale; Bairoch, Amos; Field, Dawn; Sansone, Susanna-Assunta; Taylor, Chris; Attwood, Teresa K; Bateman, Alex; Blake, Judith A; Bult, Carol J; Cherry, J Michael; Chisholm, Rex L; Cochrane, Guy; Cook, Charles E; Eppig, Janan T; Galperin, Michael Y; Gentleman, Robert; Goble, Carole A; Gojobori, Takashi; Hancock, John M; Howe, Douglas G; Imanishi, Tadashi; Kelso, Janet; Landsman, David; Lewis, Suzanna E; Mizrachi, Ilene Karsch; Orchard, Sandra; Ouellette, B F Francis; Ranganathan, Shoba; Richardson, Lorna; Rocca-Serra, Philippe; Schofield, Paul N; Smedley, Damian; Southan, Christopher; Tan, Tin Wee; Tatusova, Tatiana; Whetzel, Patricia L; White, Owen; Yamasaki, Chisato.

Nucleic Acids Res ; 39(Database issue): D7-10, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21097465

ABSTRACT

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.

Subject(s)

Factual Databases/standards , Information Dissemination

17.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian.

Nucleic Acids Res ; 39(Database issue): D38-51, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21097890

ABSTRACT

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Subject(s)

Genetic Databases , Protein Databases , Gene Expression , Genomics , National Library of Medicine (U.S.) , Tertiary Protein Structure , PubMed , Sequence Alignment , DNA Sequence Analysis , RNA Sequence Analysis , Software , Systems Integration , United States

18.

Database resources of the National Center for Biotechnology Information.

Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian.

Nucleic Acids Res ; 38(Database issue): D5-16, 2010 Jan.

Article in English | MEDLINE | ID: mdl-19910364

ABSTRACT

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.

Subject(s)

Computational Biology/methods , Genetic Databases , Nucleic Acid Databases , Algorithms , Animals , Computational Biology/trends , Protein Databases , Bacterial Genome , Viral Genome , Humans , Information Storage and Retrieval/methods , Internet , National Institutes of Health (U.S.) , National Library of Medicine (U.S.) , Software , United States

19.

Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution.

Ghedin, Elodie; Sengamalay, Naomi A; Shumway, Martin; Zaborsky, Jennifer; Feldblyum, Tamara; Subbu, Vik; Spiro, David J; Sitz, Jeff; Koo, Hean; Bolotov, Pavel; Dernovoy, Dmitry; Tatusova, Tatiana; Bao, Yiming; St George, Kirsten; Taylor, Jill; Lipman, David J; Fraser, Claire M; Taubenberger, Jeffery K; Salzberg, Steven L.

Nature ; 437(7062): 1162-6, 2005 Oct 20.

Article in English | MEDLINE | ID: mdl-16208317

ABSTRACT

Influenza viruses are remarkably adept at surviving in the human population over a long timescale. The human influenza A virus continues to thrive even among populations with widespread access to vaccines, and continues to be a major cause of morbidity and mortality. The virus mutates from year to year, making the existing vaccines ineffective on a regular basis, and requiring that new strains be chosen for a new vaccine. Less-frequent major changes, known as antigenic shift, create new strains against which the human population has little protective immunity, thereby causing worldwide pandemics. The most recent pandemics include the 1918 'Spanish' flu, one of the most deadly outbreaks in recorded history, which killed 30-50 million people worldwide, the 1957 'Asian' flu, and the 1968 'Hong Kong' flu. Motivated by the need for a better understanding of influenza evolution, we have developed flexible protocols that make it possible to apply large-scale sequencing techniques to the highly variable influenza genome. Here we report the results of sequencing 209 complete genomes of the human influenza A virus, encompassing a total of 2,821,103 nucleotides. In addition to increasing markedly the number of publicly available, complete influenza virus genomes, we have discovered several anomalies in these first 209 genomes that demonstrate the dynamic nature of influenza transmission and evolution. This new, large-scale sequencing effort promises to provide a more comprehensive picture of the evolution of influenza viruses and of their pattern of transmission through human and animal populations. All data from this project are being deposited, without delay, in public archives.

Subject(s)

Molecular Evolution , Viral Genome , Influenza A Virus/genetics , Human Influenza/virology , Mutagenesis/genetics , Animals , Influenza Virus Hemagglutinin Glycoproteins/genetics , Influenza Virus Hemagglutinin Glycoproteins/immunology , 20th Century History , 21st Century History , Humans , Influenza A Virus/classification , Influenza A Virus/isolation & purification , Influenza A Virus/physiology , Influenza Vaccines/history , Influenza Vaccines/immunology , Human Influenza/epidemiology , Human Influenza/transmission , Human Influenza/veterinary , Mutation/genetics , Neuraminidase/genetics , Neuraminidase/metabolism , New York/epidemiology , Phylogeny , Public Sector , Reassortant Viruses/genetics , Sequence Analysis , Time Factors , Virus Replication

20.

NCBI Reference Sequences: current status, policy and new initiatives.

Pruitt, Kim D; Tatusova, Tatiana; Klimke, William; Maglott, Donna R.

Nucleic Acids Res ; 37(Database issue): D32-6, 2009 Jan.

Article in English | MEDLINE | ID: mdl-18927115

ABSTRACT

NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 × 10^6 proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.

Subject(s)

Genetic Databases , Sequence Analysis/standards , Animals , Exons , Genomics/standards , Humans , Mice , Proteins/chemistry , Pseudogenes , Untranslated RNA/chemistry , Reference Standards
