Results 1 - 20 of 36
1.
J Chem Inf Model ; 64(3): 690-696, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38230885

ABSTRACT

The Kováts retention index (RI) is a quantity measured using gas chromatography and is commonly used in the identification of chemical structures. Creating libraries of observed RI values is a laborious task, so we explore the use of a deep neural network for predicting RI values from structure for standard semipolar columns. This network generated predictions with a mean absolute error of 15.1 and, in a quantification of the tail of the error distribution, a 95th percentile absolute error of 46.5. Because of the Artificial Intelligence Retention Indices (AIRI) network's accuracy, it was used to predict RI values for the NIST EI-MS spectral libraries. These RI values are used to improve chemical identification methods and the quality of the library. Estimating uncertainty is an important practical need when using prediction models. To quantify the uncertainty of our network for each individual prediction, we used the outputs of an ensemble of 8 networks to calculate a predicted standard deviation for each RI value prediction. This predicted standard deviation was corrected to follow the error between the observed and predicted RI values. The Z scores using these predicted standard deviations had a standard deviation of 1.52 and a 95th percentile absolute Z score corresponding to a mean RI value of 42.6.
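The per-prediction uncertainty described in this abstract can be illustrated with a minimal sketch. It assumes the predicted standard deviation is the sample standard deviation across the ensemble members; the `scale` factor is a hypothetical stand-in for the correction step, whose actual form the abstract does not give, and the RI values are made up.

```python
import statistics


def ensemble_prediction(member_predictions, scale=1.0):
    """Return (mean, predicted_std) for one compound from ensemble outputs.

    `scale` is a hypothetical calibration factor standing in for the
    correction step mentioned in the abstract.
    """
    mean = statistics.fmean(member_predictions)
    spread = statistics.stdev(member_predictions)  # sample standard deviation
    return mean, scale * spread


def z_score(observed, predicted_mean, predicted_std):
    """Prediction error in units of the predicted standard deviation."""
    return (observed - predicted_mean) / predicted_std


# Made-up RI outputs from an ensemble of 8 networks for one compound.
members = [1190.0, 1194.0, 1198.0, 1200.0, 1200.0, 1202.0, 1206.0, 1210.0]
mean_ri, std_ri = ensemble_prediction(members)
z = z_score(1212.0, mean_ri, std_ri)  # 1212.0 is a made-up observed RI
print(f"mean={mean_ri:.1f} std={std_ri:.2f} z={z:.2f}")
```

If the predicted standard deviations were perfectly calibrated, Z scores collected over many compounds would have a standard deviation close to 1; the abstract reports 1.52.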


Subject(s)
Artificial Intelligence , Neural Networks, Computer , Uncertainty
2.
J Proteome Res ; 22(7): 2246-2255, 2023 07 07.
Article in English | MEDLINE | ID: mdl-37232537

ABSTRACT

The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.


Subject(s)
Peptide Library , Proteome , Proteome/metabolism , Artificial Intelligence , Tandem Mass Spectrometry , Algorithms , Phosphopeptides , Databases, Protein , Software
3.
Bioinformatics ; 36(1): 131-135, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31218344

ABSTRACT

MOTIVATION: Build a web-based 3D molecular structure viewer focusing on interactive structural analysis. RESULTS: iCn3D (I-see-in-3D) can simultaneously show 3D structure, 2D molecular contacts and 1D protein and nucleotide sequences through an integrated sequence/annotation browser. Pre-defined and arbitrary molecular features can be selected in any of the 1D/2D/3D windows as sets of residues and these selections are synchronized dynamically in all displays. Biological annotations such as protein domains, single nucleotide variations, etc. can be shown as tracks in the 1D sequence/annotation browser. These customized displays can be shared with colleagues or publishers via a simple URL. iCn3D can display structure-structure alignments obtained from NCBI's VAST+ service. It can also display the alignment of a sequence with a structure as identified by BLAST, and thus relate 3D structure to a large fraction of all known proteins. iCn3D can also display electron density maps or electron microscopy (EM) density maps, and export files for 3D printing. The following example URL exemplifies some of the 1D/2D/3D representations: https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?mmdbid=1TUP&showanno=1&show2d=1&showsets=1. AVAILABILITY AND IMPLEMENTATION: iCn3D is freely available to the public. Its source code is available at https://github.com/ncbi/icn3d. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
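The shareable URL quoted in this abstract can be assembled programmatically. The sketch below uses only the base address and the four query parameters that appear in the abstract's example; the helper name `icn3d_url` is hypothetical, and flags are simply omitted when not requested, since the abstract does not say how other values are interpreted.

```python
from urllib.parse import urlencode

ICN3D_BASE = "https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html"


def icn3d_url(mmdbid, showanno=True, show2d=True, showsets=True):
    """Build an iCn3D link using the parameters seen in the abstract's example."""
    params = {"mmdbid": mmdbid}
    # Include a display flag only when it is requested.
    if showanno:
        params["showanno"] = 1
    if show2d:
        params["show2d"] = 1
    if showsets:
        params["showsets"] = 1
    return f"{ICN3D_BASE}?{urlencode(params)}"


url = icn3d_url("1TUP")
print(url)
```

With the default arguments this reproduces the example URL from the abstract.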


Subject(s)
Base Sequence , Computational Biology , Internet , Models, Molecular , Proteins , Software , Computational Biology/methods , Databases, Genetic , Molecular Conformation , Proteins/chemistry
4.
Nucleic Acids Res ; 46(D1): D851-D860, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29112715

ABSTRACT

The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination. Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes. Antimicrobial resistance proteins have been reannotated comprehensively, improved structural annotation of insertion sequence transposases and selenoproteins is provided, curated complex domain architectures have given upgraded names to millions of multidomain proteins, and we introduce a new kind of annotation rule: BlastRules. Continual curation of supporting evidence, and propagation of improved names onto RefSeq proteins ensures that the functional annotation of genomes is kept current. An increasing share of our annotation now derives from HMMs and other sets of annotation rules that are portable by nature, and available for download and for reuse by other investigators. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.


Subject(s)
Data Curation , Databases, Nucleic Acid , Genome , Molecular Sequence Annotation , Prokaryotic Cells , Archaea/genetics , Bacteria/genetics , Databases, Protein , Eukaryota/genetics , Forecasting , Humans , Sequence Homology , Software , Viruses/genetics
5.
Nucleic Acids Res ; 45(D1): D200-D203, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899674

ABSTRACT

NCBI's Conserved Domain Database (CDD) aims at annotating biomolecular sequences with the location of evolutionarily conserved protein domain footprints, and functional sites inferred from such footprints. An archive of pre-computed domain annotation is maintained for proteins tracked by NCBI's Entrez database, and live search services are offered as well. CDD curation staff supplements a comprehensive collection of protein domain and protein family models, which have been imported from external providers, with representations of selected domain families that are curated in-house and organized into hierarchical classifications of functionally distinct families and sub-families. CDD also supports comparative analyses of protein families via conserved domain architectures, and a recent curation effort focuses on providing functional characterizations of distinct subfamily architectures using SPARCLE: Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Interaction Domains and Motifs , Proteins , Information Dissemination , Internet , Proteins/chemistry , Proteins/classification , Proteins/genetics
6.
J Cheminform ; 7: 55, 2015.
Article in English | MEDLINE | ID: mdl-26583046

ABSTRACT

BACKGROUND: The enriched biological activity information of compounds in large and freely-accessible chemical databases like the PubChem Bioassay Database has become a powerful research resource for the scientific research community. Currently, 2D fingerprint based conventional similarity search (CSS) is the most widely used approach for database screening, but it does not typically incorporate the relative importance of fingerprint bits to biological activity. RESULTS: In this study, a large-scale similarity search investigation has been carried out on 208 well-defined compound activity classes extracted from the PubChem Bioassay Database. An analysis was performed to compare the search performance of three types of 2D similarity search approaches: 2D fingerprint based conventional similarity search (CSS), iterative similarity search with multiple active compounds as references (ISS), and fingerprint based iterative similarity search with classification (ISC), which can be regarded as the combination of iterative similarity search with active references and a reversed iterative similarity search with inactive references. Compared to the search results returned by CSS, ISS improves recall but not precision. Although ISC causes the false rejection of active hits, it improves the precision with statistical significance, and outperforms both ISS and CSS. In the second part of this study, we introduce the profile concept into the three types of searches. We find that the profile based non-iterative search can significantly improve the search performance by increasing the recall rate. We also find that profile based ISS (PBISS) and profile based ISC (PBISC) significantly decrease ISS search time without sacrificing search performance. CONCLUSIONS: On the basis of our large-scale investigation directed against a wide spectrum of pharmaceutical targets, we conclude that ISC and ISS searches perform better than 2D fingerprint similarity searching and that profile based versions of these algorithms do nearly as well in less time. We also suggest that the profile versions of the iterative similarity searches are both better performing and potentially quicker than the standard algorithm.
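The difference between a conventional and an iterative similarity search can be sketched on toy data. This is an illustration only, not the authors' implementation: it assumes Tanimoto similarity on fingerprints stored as sets of on-bits, a hypothetical threshold of 0.5, and one simple reading of ISS in which hits already known to be active are fed back as additional references.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(references, database, threshold):
    """Ids of compounds whose best similarity to any reference meets the threshold."""
    return {
        cid for cid, fp in database.items()
        if max(tanimoto(fp, ref) for ref in references) >= threshold
    }


def iterative_search(seed, database, known_actives, threshold, max_rounds=5):
    """Repeat the search, adding active hits as references until no new hits appear."""
    hits, references = set(), [seed]
    for _ in range(max_rounds):
        new_hits = similarity_search(references, database, threshold) - hits
        if not new_hits:
            break
        hits |= new_hits
        references.extend(database[cid] for cid in new_hits if cid in known_actives)
    return hits


def precision_recall(hits, actives):
    """Precision and recall of a hit list against the set of true actives."""
    true_pos = len(hits & actives)
    precision = true_pos / len(hits) if hits else 0.0
    recall = true_pos / len(actives) if actives else 0.0
    return precision, recall


# Toy fingerprints (made up): a, b and c are active, d is inactive.
database = {"a": {1, 2, 3, 4}, "b": {2, 3, 4, 5}, "c": {3, 4, 5, 6}, "d": {7, 8, 9}}
actives = {"a", "b", "c"}
seed = {1, 2, 3, 4}

css_hits = similarity_search([seed], database, threshold=0.5)
iss_hits = iterative_search(seed, database, actives, threshold=0.5)
print(sorted(css_hits), precision_recall(css_hits, actives))
print(sorted(iss_hits), precision_recall(iss_hits, actives))
```

On this toy set the single-reference search misses compound c, while the iterative search reaches it through b, raising recall at unchanged precision.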

7.
Nucleic Acids Res ; 43(Database issue): D222-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25414356

ABSTRACT

NCBI's CDD, the Conserved Domain Database, enters its 15th year as a public resource for the annotation of proteins with the location of conserved domain footprints. Going forward, we strive to improve the coverage and consistency of domain annotation provided by CDD. We maintain a live search system as well as an archive of pre-computed domain annotation for sequences tracked in NCBI's Entrez protein database, which can be retrieved for single sequences or in bulk. We also maintain import procedures so that CDD contains domain models and domain definitions provided by several collections available in the public domain, as well as those produced by an in-house curation effort. The curation effort aims at increasing coverage and providing finer-grained classifications of common protein domains, for which a wealth of functional and structural data has become available. CDD curation generates alignment models of representative sequence fragments, which are in agreement with domain boundaries as observed in protein 3D structure, and which model the structurally conserved cores of domain families as well as annotate conserved features. CDD can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Amino Acid Motifs , Amino Acid Sequence , Conserved Sequence , Data Curation
8.
Nucleic Acids Res ; 41(Database issue): D348-52, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23197659

ABSTRACT

CDD, the Conserved Domain Database, is part of NCBI's Entrez query and retrieval system and is also accessible via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. CDD provides annotation of protein sequences with the location of conserved domain footprints and functional sites inferred from these footprints. Pre-computed annotation is available via Entrez, and interactive search services accept single protein or nucleotide queries, as well as batch submissions of protein query sequences, utilizing RPS-BLAST to rapidly identify putative matches. CDD incorporates several protein domain and full-length protein model collections, and maintains an active curation effort that aims at providing fine grained classifications for major and well-characterized protein domain families, as supported by available protein three-dimensional (3D) structure and the published literature. To this date, the majority of protein 3D structures are represented by models tracked by CDD, and CDD curators are characterizing novel families that emerge from protein structure determination efforts.


Subject(s)
Databases, Protein , Protein Conformation , Protein Structure, Tertiary , Amino Acid Sequence , Conserved Sequence , Internet , Models, Molecular , Molecular Sequence Annotation , Proteins/chemistry , Proteins/classification , Proteins/genetics , Sequence Analysis, Protein
9.
Anal Chem ; 84(6): 2875-82, 2012 Mar 20.
Article in English | MEDLINE | ID: mdl-22335612

ABSTRACT

We describe the first implementation of negative electron-transfer dissociation (NETD) on a hybrid ion trap-orbitrap mass spectrometer and its application to high-throughput sequencing of peptide anions. NETD, coupled with high pH separations, negative electrospray ionization (ESI), and an NETD compatible version of OMSSA, is part of a complete workflow that includes the formation, interrogation, and sequencing of peptide anions. Together these interlocking pieces facilitated the identification of more than 2000 unique peptides from Saccharomyces cerevisiae representing the most comprehensive analysis of peptide anions by tandem mass spectrometry to date. The same S. cerevisiae samples were interrogated using traditional, positive modes of peptide LC-MS/MS analysis (e.g., acidic LC separations, positive ESI, and collision activated dissociation), and the resulting peptide identifications of the different workflows were compared. Due to a decreased flux of peptide anions and a tendency to produce lowly charged precursors, the NETD-based LC-MS/MS workflow was not as sensitive as the positive mode methods. However, the use of NETD readily permits access to underrepresented acidic portions of the proteome by identifying peptides that tend to have lower pI values. As such, NETD improves sequence coverage, filling out the acidic portions of proteins that are often overlooked by the other methods.


Subject(s)
Fungal Proteins/analysis , Peptides/analysis , Proteome/analysis , Proteomics/methods , Saccharomyces cerevisiae/chemistry , Spectrometry, Mass, Electrospray Ionization/methods , Amino Acid Sequence , Chromatography, Liquid/methods , Hydrogen-Ion Concentration , Molecular Sequence Data
10.
Proteome Sci ; 10(1): 8, 2012 Feb 09.
Article in English | MEDLINE | ID: mdl-22321509

ABSTRACT

BACKGROUND: Electron Transfer Dissociation [ETD] can dissociate multiply charged precursor polypeptides, providing extensive peptide backbone cleavage. ETD spectra contain charge reduced precursor peaks, usually of high intensity, whose pattern depends on the parent precursor charge. These charge reduced precursor peaks and associated neutral loss peaks should be removed before these spectra are searched for peptide identifications. ETD spectra can also contain ion-types other than c and z˙. Modifying search strategies to accommodate these ion-types may help increase peptide identifications. Additionally, if the precursor mass is measured using a lower resolution instrument such as a linear ion trap, the charge of the precursor is often not known, reducing sensitivity and increasing search times. We implemented algorithms to remove these precursor peaks, to accommodate new ion-types in the noise filtering routine in OMSSA, and to estimate any unknown precursor charge using Linear Discriminant Analysis [LDA]. RESULTS: Spectral pre-processing to remove precursor peaks and their associated neutral losses prior to protein sequence library searches resulted in a 9.8% increase in peptide identifications at a 1% False Discovery Rate [FDR] compared to the previous OMSSA filter. Modifications to the OMSSA noise filter to accommodate various ion-types resulted in a further 4.2% increase in peptide identifications at 1% FDR. Moreover, searching ETD spectra with charge states obtained from the precursor charge determination algorithm is shown to be up to 3.5 times faster than the general range search method, with a minor 3.8% increase in sensitivity. CONCLUSION: Overall, there is an 18.8% increase in peptide identifications at 1% FDR by incorporating the new precursor filter, noise filter and by using the charge determination algorithm, when compared to previous versions of OMSSA.
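The precursor-removal step can be sketched as follows. This is a simplified illustration, not the OMSSA filter: it assumes that electron transfer leaves the ion mass essentially unchanged (the electron mass is neglected), so a precursor at a given m/z and charge z reappears at m/z times z divided by each lower charge state. The function names and the 2.0 m/z tolerance are hypothetical, and neutral-loss peaks are not handled.

```python
def charge_reduced_mz(precursor_mz, charge):
    """m/z of charge-reduced precursors for charge states charge-1 down to 1.

    Assumes the ion mass is unchanged by electron transfer, so only the
    charge in the m/z ratio changes.
    """
    ion_mass = precursor_mz * charge
    return [ion_mass / reduced for reduced in range(charge - 1, 0, -1)]


def remove_precursor_peaks(peaks, precursor_mz, charge, tolerance=2.0):
    """Drop peaks within `tolerance` m/z of the precursor or its charge-reduced forms."""
    targets = [precursor_mz] + charge_reduced_mz(precursor_mz, charge)
    return [
        (mz, intensity) for mz, intensity in peaks
        if all(abs(mz - target) > tolerance for target in targets)
    ]


# Made-up spectrum for a triply charged precursor at m/z 500.0.
peaks = [(300.2, 10.0), (500.1, 80.0), (750.3, 95.0), (900.5, 12.0), (1499.4, 60.0)]
filtered = remove_precursor_peaks(peaks, precursor_mz=500.0, charge=3)
print(filtered)
```

When the precursor charge is unknown, as the abstract notes for lower resolution instruments, the target m/z values cannot be computed directly, which is why a charge estimate is needed first.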

11.
Nucleic Acids Res ; 40(Database issue): D13-25, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22140104

ABSTRACT

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Databases as Topic , Databases, Genetic , Databases, Protein , Gene Expression , Genomics , Internet , Models, Molecular , National Library of Medicine (U.S.) , Periodicals as Topic , PubMed , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, Protein , Sequence Analysis, RNA , Small Molecule Libraries , United States
12.
Nucleic Acids Res ; 40(Database issue): D461-4, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22135289

ABSTRACT

Close to 60% of protein sequences tracked in comprehensive databases can be mapped to a known three-dimensional (3D) structure by standard sequence similarity searches. Potentially, a great deal can be learned about proteins or protein families of interest from considering 3D structure, and to this day 3D structure data may remain an underutilized resource. Here we present enhancements in the Molecular Modeling Database (MMDB) and its data presentation, specifically pertaining to biologically relevant complexes and molecular interactions. MMDB is tightly integrated with NCBI's Entrez search and retrieval system, and mirrors the contents of the Protein Data Bank. It links protein 3D structure data with sequence data, sequence classification resources and PubChem, a repository of small-molecule chemical structures and their biological activities, facilitating access to 3D structure data not only for structural biologists, but also for molecular biologists and chemists. MMDB provides a complete set of detailed and pre-computed structural alignments obtained with the VAST algorithm, and provides visualization tools for 3D structure and structure/sequence alignment via the molecular graphics viewer Cn3D. MMDB can be accessed at http://www.ncbi.nlm.nih.gov/structure.


Subject(s)
Databases, Protein , Models, Molecular , Protein Conformation , Sequence Analysis, Protein
13.
J Cheminform ; 3(1): 52, 2011 Nov 22.
Article in English | MEDLINE | ID: mdl-22107874

ABSTRACT

BACKGROUND: A significant portion of the biomedical and chemical literature refers to small molecules. The accurate identification and annotation of compound names that are relevant to the topic of the given literature can establish links between scientific publications and various chemical and life science databases. Manual annotation is the preferred method for these works because well-trained indexers can understand the paper topics as well as recognize key terms. However, considering the hundreds of thousands of new papers published annually, an automatic annotation system with high precision and relevance can be a useful complement to manual annotation. RESULTS: An automated chemical name annotation system, MeSH Automated Annotations (MAA), was developed to annotate small molecule names in scientific abstracts with tunable accuracy. This system aims to reproduce the MeSH term annotations on biomedical and chemical literature that would be created by indexers. When automated free text matching was compared with manual indexing on 26 thousand MEDLINE abstracts, more than 40% of the annotations were false-positive (FP) cases. To reduce the FP rate, MAA incorporated several filters to remove "incorrect" annotations caused by nonspecific, partial, and low relevance chemical names. In part, relevance was measured by the position of the chemical name in the text. Tunable accuracy was obtained by adding or restricting the sections of the text scanned for chemical names. The best precision obtained was 96% with a 28% recall rate. The best performance of MAA, as measured with the F statistic, was 66%, which compares favorably to other chemical name annotation systems. CONCLUSIONS: Accurate chemical name annotation can help researchers not only identify important chemical names in abstracts, but also match unindexed and unstructured abstracts to chemical records. The current work is tested against MEDLINE, but the algorithm is not specific to this corpus and it is possible that the algorithm can be applied to papers from chemical physics, materials, polymer and environmental science, as well as patents, biological assay descriptions and other textual data.
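The relationship between the reported precision, recall and F statistic can be checked with a short calculation. The sketch assumes the F statistic is the balanced F-measure (the harmonic mean of precision and recall), which the abstract does not state explicitly.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Operating point with the best reported precision (96%) and its recall (28%).
f_at_best_precision = f_measure(0.96, 0.28)
print(f"{f_at_best_precision:.3f}")
```

Under that assumption the high-precision setting scores roughly 0.43, well below the best reported value of 66%. This suggests the best F result comes from a different point on the tunable accuracy range, one that gives up some precision for more recall.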

14.
Database (Oxford) ; 2011: bar019, 2011.
Article in English | MEDLINE | ID: mdl-21571812

ABSTRACT

New generation sequencing technologies have resulted in significant increases in the number of complete genomes. Functional characterization of these genomes, such as by high-throughput proteomics, is an important but challenging task due to the difficulty of scaling up existing experimental techniques. By use of comparative genomics techniques, experimental results can be transferred from one genome to another, while at the same time minimizing errors by requiring discovery in multiple genomes. In this study, protein phosphorylation, an essential component of many cellular processes, is studied using data from large-scale proteomics analyses of the phosphoproteome. Phosphorylation sites from Homo sapiens, Mus musculus and Drosophila melanogaster phosphopeptide data sets were mapped onto conserved domains in NCBI's manually curated portion of the Conserved Domain Database (CDD). In this subset, 25 phosphorylation sites are found to be evolutionarily conserved between the three species studied. Transfer of phosphorylation annotation of these conserved sites onto sequences sharing the same conserved domains yields 3253 phosphosite annotations for proteins from coelomata, the taxonomic division that spans H. sapiens, M. musculus and D. melanogaster. The method scales automatically, so as the amount of experimental phosphoproteomics data increases, more conserved phosphorylation sites may be revealed.
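The requirement that a site be found in multiple genomes can be pictured as a set intersection. The sketch assumes each phosphorylation site has already been mapped to a pair of conserved domain and alignment column; the domain names, positions and the helper `conserved_sites` are all made up for illustration.

```python
def conserved_sites(sites_by_species):
    """Sites, as (domain, alignment column) pairs, observed in every species."""
    site_sets = list(sites_by_species.values())
    return set.intersection(*site_sets) if site_sets else set()


# Made-up phosphosites already mapped onto hypothetical domain models.
sites_by_species = {
    "H. sapiens": {("domain_A", 12), ("domain_A", 40), ("domain_B", 7)},
    "M. musculus": {("domain_A", 12), ("domain_B", 7), ("domain_B", 9)},
    "D. melanogaster": {("domain_A", 12), ("domain_B", 7)},
}

conserved = conserved_sites(sites_by_species)
print(sorted(conserved))
```

Adding a further species' data set only requires another dictionary entry, which matches the abstract's point that the method scales as more phosphoproteomics data become available.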


Subject(s)
Automation , Conserved Sequence/genetics , Evolution, Molecular , Genome/genetics , Molecular Sequence Annotation/methods , Protein Processing, Post-Translational/genetics , Algorithms , Amino Acid Sequence , Animals , Humans , Molecular Sequence Data , Protein Structure, Tertiary , Proteins/chemistry , Proteins/metabolism
15.
Nucleic Acids Res ; 39(Database issue): D38-51, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097890

ABSTRACT

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , Databases, Protein , Gene Expression , Genomics , National Library of Medicine (U.S.) , Protein Structure, Tertiary , PubMed , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, RNA , Software , Systems Integration , United States
16.
Nucleic Acids Res ; 39(Database issue): D225-9, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21109532

ABSTRACT

NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Amino Acid Sequence , Conserved Sequence , Models, Biological , Proteins/classification , Sequence Analysis, Protein
17.
Nucleic Acids Res ; 38(Database issue): D5-16, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19910364

ABSTRACT

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, Reference Sequence, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Peptidome, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Algorithms , Animals , Computational Biology/trends , Databases, Protein , Genome, Bacterial , Genome, Viral , Humans , Information Storage and Retrieval/methods , Internet , National Institutes of Health (U.S.) , National Library of Medicine (U.S.) , Software , United States
18.
Nucleic Acids Res ; 38(Database issue): D492-6, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19854944

ABSTRACT

The NCBI BioSystems database, found at http://www.ncbi.nlm.nih.gov/biosystems/, centralizes and cross-links existing biological systems databases, increasing their utility and target audience by integrating their pathways and systems into NCBI resources. This integration allows users of NCBI's Entrez databases to quickly categorize proteins, genes and small molecules by metabolic pathway, disease state or other BioSystem type, without requiring time-consuming inference of biological relationships from the literature or multiple experimental datasets.


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Systems Biology , Animals , Cell Membrane/metabolism , Computational Biology/trends , Databases, Protein , Genes , Genomics , Humans , Information Storage and Retrieval/methods , Internet , National Library of Medicine (U.S.) , Software , United States
19.
Nucleic Acids Res ; 37(Database issue): D5-15, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18940862

ABSTRACT

In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , Gene Expression , Genes , Genomics , Genotype , National Library of Medicine (U.S.) , Phenotype , Protein Structure, Tertiary , Proteomics , PubMed , Sequence Homology , Systems Integration , United States
20.
Nucleic Acids Res ; 37(Database issue): D205-10, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18984618

ABSTRACT

NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml, and is also part of NCBI's Entrez query and retrieval system, cross-linked to numerous other resources. CDD provides annotation of domain footprints and conserved functional sites on protein sequences. Precalculated domain annotation can be retrieved for protein sequences tracked in NCBI's Entrez system, and CDD's collection of models can be queried with novel protein sequences via the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Starting with the latest version of CDD, v2.14, information from redundant and homologous domain models is summarized at a superfamily level, and domain annotation on proteins is flagged as either 'specific' (identifying molecular function with high confidence) or as 'non-specific' (identifying superfamily membership only).


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Amino Acid Sequence , Conserved Sequence , Proteins/classification , Sequence Alignment , Sequence Analysis, Protein