Results 1 - 16 of 16
1.
Nucleic Acids Res ; 40(Database issue): D834-40, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22102591

ABSTRACT

We have recently developed the Inferred Biomolecular Interaction Server (IBIS) and database, which reports, predicts and integrates different types of interaction partners and locations of binding sites in proteins based on the analysis of homologous structural complexes. Here, we highlight several new IBIS features and options. The server's webpage is now redesigned to allow users easier access to data for different interaction types. An entry page is added to give a quick summary of available results and to now accept protein sequence accessions. To elucidate the formation of protein complexes, not just binary interactions, IBIS currently presents an expandable interaction network. Previously, IBIS provided annotations for four different types of binding partners: proteins, small molecules, nucleic acids and peptides; in the current version a new protein-ion interaction type has been added. Several options provide easy downloads of IBIS data for all Protein Data Bank (PDB) protein chains and the results for each query. In this study, we show that about one-third of all RefSeq sequences can be annotated with IBIS interaction partners and binding sites. The IBIS server is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi and updated biweekly.


Subjects
Databases, Protein , Protein Interaction Mapping , Proteins/chemistry , Binding Sites , Computer Graphics , Ions/chemistry , Molecular Sequence Annotation , Multiprotein Complexes/chemistry , Nucleic Acids/chemistry , Peptides/chemistry , Sequence Analysis, Protein , Systems Integration , User-Computer Interface
2.
Nucleic Acids Res ; 40(Database issue): D461-4, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22135289

ABSTRACT

Close to 60% of protein sequences tracked in comprehensive databases can be mapped to a known three-dimensional (3D) structure by standard sequence similarity searches. Potentially, a great deal can be learned about proteins or protein families of interest from considering 3D structure, and to this day 3D structure data may remain an underutilized resource. Here we present enhancements in the Molecular Modeling Database (MMDB) and its data presentation, specifically pertaining to biologically relevant complexes and molecular interactions. MMDB is tightly integrated with NCBI's Entrez search and retrieval system, and mirrors the contents of the Protein Data Bank. It links protein 3D structure data with sequence data, sequence classification resources and PubChem, a repository of small-molecule chemical structures and their biological activities, facilitating access to 3D structure data not only for structural biologists, but also for molecular biologists and chemists. MMDB provides a complete set of detailed and pre-computed structural alignments obtained with the VAST algorithm, and provides visualization tools for 3D structure and structure/sequence alignment via the molecular graphics viewer Cn3D. MMDB can be accessed at http://www.ncbi.nlm.nih.gov/structure.


Subjects
Databases, Protein , Models, Molecular , Protein Conformation , Sequence Analysis, Protein
3.
BMC Genomics ; 14: 654, 2013 Sep 25.
Article in English | MEDLINE | ID: mdl-24063302

ABSTRACT

BACKGROUND: Advances in high-throughput sequencing technology have yielded a large number of publicly available vertebrate genomes, many of which are selected for inclusion in NCBI's RefSeq project and subsequently processed by NCBI's eukaryotic annotation pipeline. Genome annotation results are affected by differences in available support evidence and may be impacted by annotation pipeline software changes over time. The RefSeq project has not previously assessed annotation trends across organisms or over time. To address this deficiency, we have developed a comparative protocol which integrates analysis of annotated protein-coding regions across a data set of vertebrate orthologs in genomic sequence coordinates, protein sequences, and protein features. RESULTS: We assessed an ortholog dataset that includes 34 annotated vertebrate RefSeq genomes including human. We confirm that RefSeq protein-coding gene annotations in mammals exhibit considerable similarity. Over 50% of the orthologous protein-coding genes in 20 organisms are supported at the level of splicing conservation with at least three selected reference genomes. Approximately 7,500 ortholog sets include at least half of the analyzed organisms, show highly similar sequence and conserved splicing, and may serve as a minimal set of mammalian "core proteins" for initial assessment of new mammalian genomes. Additionally, 80% of the proteins analyzed pass a suite of tests to detect proteins that lack splicing conservation and have unusual sequence or domain annotation. We use these tests to define an annotation quality metric that is based directly on the annotated proteins and thus operates independently of other quality metrics such as availability of transcripts or assembly quality measures. Results are available on the RefSeq FTP site [http://ftp.ncbi.nlm.nih.gov/refseq/supplemental/ProtCore/SM1.txt].
CONCLUSIONS: Our multi-factored analysis demonstrates a high level of consistency in RefSeq protein representation among vertebrates. We find that the majority of the RefSeq vertebrate proteins for which we have calculated orthology are good as measured by these metrics. The process flow described provides specific information on the scope and degree of conservation for the analyzed protein sequences and annotations and will be used to enrich the quality of RefSeq records by identifying targets for further improvement in the computational annotation pipeline, and by flagging specific genes for manual curation.


Subjects
Genome, Human/genetics , Genome/genetics , Open Reading Frames/genetics , Vertebrates/genetics , Amino Acid Sequence , Animals , Conserved Sequence/genetics , Databases, Genetic , Databases, Nucleic Acid , Humans , Molecular Sequence Annotation , Protein Structure, Tertiary , RNA Splicing/genetics , Sequence Homology, Nucleic Acid , Species Specificity , alpha-Macroglobulins/genetics
4.
Nucleic Acids Res ; 39(Database issue): D225-9, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21109532

ABSTRACT

NCBI's Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.


Subjects
Databases, Protein , Protein Structure, Tertiary , Amino Acid Sequence , Conserved Sequence , Models, Biological , Proteins/classification , Sequence Analysis, Protein
5.
Nucleic Acids Res ; 38(Database issue): D283-7, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19906708

ABSTRACT

Most of the proteins in a cell assemble into complexes to carry out their function. In this work, we have created a new database (named ComSin) of protein structures in bound (complex) and unbound (single) states to provide a researcher with exhaustive information on structures of the same or homologous proteins in bound and unbound states. From the complete Protein Data Bank (PDB), we selected 24 910 pairs of protein structures in bound and unbound states, and identified regions of intrinsic disorder. For 2448 pairs, the proteins in bound and unbound states are identical, while 7129 pairs have sequence identity 90% or larger. The developed server enables one to search for proteins in bound and unbound states with several options including sequence similarity between the corresponding proteins in bound and unbound states, and validation of interaction interfaces of protein complexes. Besides that, through our web server, one can obtain necessary information for studying disorder-to-order and order-to-disorder transitions upon complex formation, and analyze structural differences between proteins in bound and unbound states. The database is available at http://antares.protres.ru/comsin/.


Subjects
Bacterial Proteins/chemistry , Computational Biology/methods , Databases, Genetic , Animals , Computational Biology/trends , Databases, Protein , Humans , Information Storage and Retrieval/methods , Internet , Models, Molecular , Molecular Conformation , Protein Binding , Protein Conformation , Protein Structure, Tertiary , Software
6.
Nucleic Acids Res ; 38(Database issue): D518-24, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19843613

ABSTRACT

IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (protein, chemical, nucleic acid and peptides), and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein, and at the same time IBIS infers binding sites/interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms are used for verification in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, as well as evidence from the published literature. IBIS is updated regularly and is freely accessible via http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html.


Subjects
Computational Biology/methods , Databases, Genetic , Databases, Protein , Protein Interaction Mapping/methods , Protein Structure, Tertiary , Algorithms , Animals , Binding Sites , Catalytic Domain , Cluster Analysis , Computational Biology/trends , Humans , Information Storage and Retrieval/methods , Internet , Protein-Tyrosine Kinases/chemistry , Software
7.
Nucleic Acids Res ; 37(Database issue): D205-10, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18984618

ABSTRACT

NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml, and is also part of NCBI's Entrez query and retrieval system, cross-linked to numerous other resources. CDD provides annotation of domain footprints and conserved functional sites on protein sequences. Precalculated domain annotation can be retrieved for protein sequences tracked in NCBI's Entrez system, and CDD's collection of models can be queried with novel protein sequences via the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Starting with the latest version of CDD, v2.14, information from redundant and homologous domain models is summarized at a superfamily level, and domain annotation on proteins is flagged as either 'specific' (identifying molecular function with high confidence) or as 'non-specific' (identifying superfamily membership only).


Subjects
Databases, Protein , Protein Structure, Tertiary , Amino Acid Sequence , Conserved Sequence , Proteins/classification , Sequence Alignment , Sequence Analysis, Protein
8.
Bioinformatics ; 25(15): 1862-8, 2009 Aug 01.
Article in English | MEDLINE | ID: mdl-19470584

ABSTRACT

MOTIVATION: Homologous protein families share highly conserved sequence and structure regions that are frequent targets for comparative analysis of related proteins and families. Many protein families, such as the curated domain families in the Conserved Domain Database (CDD), exhibit similar structural cores. To improve accuracy in aligning such protein families, we propose a profile-profile method CORAL that aligns individual core regions as gap-free units. RESULTS: CORAL computes optimal local alignment of two profiles with heuristics to preserve continuity within core regions. We benchmarked its performance on curated domains in CDD, which have pre-defined core regions, against COMPASS, HHalign and PSI-BLAST, using structure superpositions and comprehensive curator-optimized alignments as standards of truth. CORAL improves alignment accuracy on core regions over general profile methods, returning a balanced score of 0.57 for over 80% of all domain families in CDD, compared with the highest balanced score of 0.45 from other methods. Further, CORAL provides E-values to aid in detecting homologous protein families and, by respecting block boundaries, produces alignments with improved 'readability' that facilitate manual refinement. AVAILABILITY: CORAL will be included in future versions of the NCBI Cn3D/CDTree software, which can be downloaded at http://www.ncbi.nlm.nih.gov/Structure/cdtree/cdtree.shtml. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acid Sequence , Conserved Sequence , Databases, Protein , Protein Structure, Tertiary
9.
PLoS Comput Biol ; 5(3): e1000316, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19282967

ABSTRACT

We perform a large-scale study of intrinsically disordered regions in proteins and protein complexes using a non-redundant set of hundreds of different protein complexes. In accordance with the conventional view that folding and binding are coupled, in many of our cases the disorder-to-order transition occurs upon complex formation and can be localized to binding interfaces. Moreover, analysis of disorder in protein complexes depicts a significant fraction of intrinsically disordered regions, with up to one third of all residues being disordered. We find that the disorder in homodimers, especially in symmetrical homodimers, is significantly higher than in heterodimers and offer an explanation for this interesting phenomenon. We argue that the mechanisms of regulation of binding specificity through disordered regions in complexes can be as common as for unbound monomeric proteins. The fascinating diversity of roles of disordered regions in various biological processes and protein oligomeric forms shown in our study may be a subject of future endeavors in this area.


Subjects
Models, Chemical , Models, Molecular , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/ultrastructure , Sequence Analysis, Protein/methods , Amino Acid Sequence , Binding Sites , Computer Simulation , Molecular Sequence Data , Protein Binding , Protein Structure, Tertiary
10.
J Mol Biol ; 366(1): 307-15, 2007 Feb 09.
Article in English | MEDLINE | ID: mdl-17166515

ABSTRACT

Domains are basic evolutionary units of proteins and most proteins have more than one domain. Advances in domain modeling and collection are making it possible to annotate a large fraction of known protein sequences by a linear ordering of their domains, yielding their architecture. Protein domain architectures link evolutionarily related proteins and underscore their shared functions. Here, we attempt to better understand this association by identifying the evolutionary pathways by which extant architectures may have evolved. We propose a model of evolution in which architectures arise through rearrangements of inferred precursor architectures and acquisition of new domains. These pathways are ranked using a parsimony principle, whereby scenarios requiring the fewest number of independent recombination events, namely fission and fusion operations, are assumed to be more likely. Using a data set of domain architectures present in 159 proteomes that represent all three major branches of the tree of life allows us to estimate the history of over 85% of all architectures in the sequence database. We find that the distribution of rearrangement classes is robust with respect to alternative parsimony rules for inferring the presence of precursor architectures in ancestral species. Analyzing the most parsimonious pathways, we find 87% of architectures to gain complexity over time through simple changes, among which fusion events account for 5.6 times as many architectures as fission. Our results may be used to compute domain architecture similarities, for example, based on the number of historical recombination events separating them. Domain architecture "neighbors" identified in this way may lead to new insights about the evolution of protein function.
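
The fusion and fission operations mentioned in this abstract can be illustrated with a toy sketch. It assumes an architecture is an ordered tuple of domain names and asks whether a target architecture can be explained by a single event from a pool of precursor architectures. The function name, the labels, and the domain names are hypothetical, and this is a simplification for illustration, not the parsimony model used in the paper (which also covers acquisition of new domains and ranks multi-step pathways).

```python
def classify_origin(target, precursors):
    """Classify how `target` could arise from `precursors` in one event.

    Architectures are sequences of domain names, N- to C-terminus.
    Returns one of: "inherited", "fusion", "fission", "unexplained".
    """
    target = tuple(target)
    pool = {tuple(p) for p in precursors}
    if not target:
        return "unexplained"
    if target in pool:
        return "inherited"
    # Fusion: two precursor architectures joined end to end.
    for cut in range(1, len(target)):
        if target[:cut] in pool and target[cut:] in pool:
            return "fusion"
    # Fission: target is an N- or C-terminal fragment of a longer precursor.
    n = len(target)
    for p in pool:
        if len(p) > n and (p[:n] == target or p[-n:] == target):
            return "fission"
    return "unexplained"


# Hypothetical precursor pool, for illustration only.
precursors = [("SH3",), ("SH2",), ("SH3", "SH2", "Kinase")]
fusion_case = classify_origin(("SH3", "SH2"), precursors)
fission_case = classify_origin(("SH2", "Kinase"), precursors)
```

Checking fusion before fission is an arbitrary choice in this sketch; a real parsimony analysis would count events along whole pathways rather than label a single step.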


Subjects
Evolution, Molecular , Gene Rearrangement , Genome, Bacterial/genetics , Models, Genetic , Cluster Analysis , Computer Simulation , Phylogeny , Protein Structure, Tertiary
11.
Mol Biosyst ; 9(7): 1620-6, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23364837

ABSTRACT

Phosphorylation offers a dynamic way to regulate protein activity, subcellular localization, and stability. The majority of signaling pathways involve an extensive set of protein-protein interactions, and phosphorylation is widely used to regulate protein-protein binding by affecting the stability, kinetics and specificity of interactions. Previously it was found that phosphorylation sites tend to be located on protein-protein binding interfaces and may orthosterically modulate the strength of interactions. Here we studied the effect of phosphorylation on protein binding in relation to intrinsic disorder for different types of human protein complexes with known structure of the binding interface. Our results suggest that the processes of phosphorylation, binding and disorder-order transitions are coupled to each other, with about one quarter of all disordered interface Ser/Thr/Tyr sites being phosphorylated. Namely, residue site disorder and interfacial states significantly affect the phosphorylation of serine and to a lesser extent of threonine. Tyrosine phosphorylation might not be directly associated with binding through disorder, and is often observed in ordered interface regions which are not predicted to be disordered in the unbound state. We analyze possible mechanisms of how phosphorylation might regulate protein-protein binding via intrinsic disorder, and specifically focus on how phosphorylation could prevent disorder-order transitions upon binding.


Subjects
Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Models, Biological , Proteins/chemistry , Proteins/metabolism , Analysis of Variance , Cluster Analysis , Humans , Models, Molecular , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , Phosphorylation , Protein Binding , Protein Conformation , Protein Multimerization
12.
Mol Biosyst ; 8(1): 320-6, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22012032

ABSTRACT

We analyze human-specific KEGG pathways trying to understand the functional role of intrinsic disorder in proteins. Pathways provide a comprehensive picture of biological processes and allow better understanding of a protein's function within the specific context of its surroundings. Our study pinpoints a few specific pathways significantly enriched in disorder-containing proteins and identifies the role of these proteins within the framework of pathway relationships. Three major categories of relations are shown to be significantly enriched in disordered proteins: gene expression, protein binding and to a lesser degree, protein phosphorylation. Finally we find that relations involving protein activation and to some extent inhibition are characterized by low disorder content.


Subjects
Metabolic Networks and Pathways , Protein Folding , Proteins/chemistry , Proteins/metabolism , Humans , Protein Binding
13.
Mol Biosyst ; 7(3): 784-92, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21127809

ABSTRACT

Most eukaryotic proteins are composed of two or more domains. These assemble in a modular manner to create new proteins usually by the acquisition of one or more domains to an existing protein. Promiscuous domains which are found embedded in a variety of proteins and co-exist with many other domains are of particular interest and were shown to have roles in signaling pathways and mediating network communication. The evolution of domain promiscuity is still an open problem, mostly due to the lack of sequenced ancestral genomes. Here we use inferred domain architectures of ancestral genomes to trace the evolution of domain promiscuity in eukaryotic genomes. We find an increase in average promiscuity along many branches of the eukaryotic tree. Moreover, domain promiscuity can proceed at almost a steady rate over long evolutionary time or exhibit lineage-specific acceleration. We also observe that many signaling and regulatory domains gained domain promiscuity around the Bilateria divergence. In addition we show that those domains that played a role in the creation of two body axes and existed before the divergence of the bilaterians from fungi/metazoan achieve a boost in their promiscuities during the bilaterian evolution.


Subjects
Eukaryotic Cells/metabolism , Evolution, Molecular , Proteins/chemistry , Proteins/genetics , Animals , Chi-Square Distribution , Genome , Humans , Phylogeny , Protein Structure, Tertiary
14.
Mol Biosyst ; 6(10): 1821-8, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20544079

ABSTRACT

Intrinsic disorder is believed to contribute to the ability of some proteins to interact with multiple partners which is important for protein functional promiscuity and regulation of the cross-talk between pathways. To better understand the mechanisms of molecular recognition through disordered regions, here, we systematically investigate the coupling between disorder and binding within domain families in a structure interaction network and in terminal and inter-domain linker regions. We showed that the canonical domain-domain interaction model should take into account contributions of N- and C-termini and inter-domain linkers, which may form all or part of the binding interfaces. For the majority of proteins, binding interfaces on domain and terminal regions were predicted to be less disordered than non-interface regions. Analysis of all domain families revealed several exceptions, such as kinases, DNA/RNA binding proteins, certain enzymes, and regulatory proteins, which are candidates for disorder-to-order transitions that can occur upon binding. Domain interfaces that bind single or multiple partners do not exhibit significant difference in disorder content if normalized by the number of interactions. In general, protein families with more diverse interactions exhibit less average disorder over all members of the family. Our results shed light on recent controversies regarding the relationship between disorder and binding of multiple partners at common interfaces. In particular, they support the hypothesis that protein domains with many interacting partners should have a pleiotropic effect on functional pathways and consequently might be more constrained in evolution.


Subjects
Proteins/metabolism , Protein Binding , Proteins/chemistry
15.
BMC Res Notes ; 1: 114, 2008 Nov 14.
Article in English | MEDLINE | ID: mdl-19014584

ABSTRACT

BACKGROUND: Domains, evolutionarily conserved units of proteins, are widely used to classify protein sequences and infer protein function. Often, two or more overlapping domain models match a region of a protein sequence. Therefore, procedures are required to choose appropriate domain annotations for the protein. Here, we propose a method for assigning NCBI-curated domains from the Conserved Domain Database (CDD) that takes into account the organization of the domains into hierarchies of homologous domain models. FINDINGS: Our analysis of alignment scores from NCBI-curated domain assignments suggests that identifying the correct model among closely related models is more difficult than choosing between non-overlapping domain models. We find that simple heuristics based on sorting scores and domain-specific thresholds are effective at reducing classification error. In fact, in our test set, the heuristics result in almost 90% of current misclassifications due to missing domain subfamilies being replaced by more generic domain assignments, thereby eliminating a significant amount of error within the database. CONCLUSION: Our proposed domain subfamily assignment rule has been incorporated into the CD-Search software for assigning CDD domains to query protein sequences and has significantly improved pre-calculated domain annotations on protein sequences in NCBI's Entrez resource.
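
The idea of combining sorted scores with domain-specific thresholds, and falling back to a more generic model when a subfamily match is not convincing, might be sketched as follows. This is an illustrative guess at the general shape of such a heuristic, not the rule implemented in CD-Search; the function name, the model names, the scores, and the thresholds are all hypothetical.

```python
def assign_domain(hits, thresholds, parent):
    """Choose one domain model for a protein region.

    hits:       list of (model_name, score) for overlapping models.
    thresholds: dict of model_name -> minimum score for assignment
                (models without a threshold are never assigned).
    parent:     dict of model_name -> more generic parent model;
                root models have no entry.
    Returns the assigned model name, or None if nothing qualifies.
    """
    scores = dict(hits)
    # Try hits from best to worst score.
    for model, _ in sorted(hits, key=lambda h: h[1], reverse=True):
        candidate = model
        # Walk up the hierarchy until a model passes its own threshold.
        while candidate is not None:
            own_score = scores.get(candidate, float("-inf"))
            if own_score >= thresholds.get(candidate, float("inf")):
                return candidate
            candidate = parent.get(candidate)
    return None


# Hypothetical two-level hierarchy, for illustration only.
thresholds = {"tyr_kinase": 120.0, "kinase_family": 50.0}
parent = {"tyr_kinase": "kinase_family"}
weak = assign_domain([("tyr_kinase", 90.0), ("kinase_family", 80.0)], thresholds, parent)
strong = assign_domain([("tyr_kinase", 150.0), ("kinase_family", 80.0)], thresholds, parent)
```

In the first call the subfamily score falls short of its threshold, so the sketch returns the parent model; in the second call the subfamily is assigned directly.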

16.
Genome Biol ; 5(2): R11, 2004.
Article in English | MEDLINE | ID: mdl-14759261

ABSTRACT

We present a method for predicting protein-protein interactions mediated by the coiled-coil motif. When tested on interactions between nearly all human and yeast bZIP proteins, our method identifies 70% of strong interactions while maintaining that 92% of predictions are correct. Furthermore, cross-validation testing shows that including the bZIP experimental data significantly improves performance. Our method can be used to predict bZIP interactions in other genomes and is a promising approach for predicting coiled-coil interactions more generally.
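
The two figures quoted in this abstract read as recall (the share of true strong interactions that are identified) and precision (the share of predictions that are correct). A minimal sketch of how the two are computed from confusion counts follows. The counts are hypothetical, chosen only so the arithmetic lands near 70% and 92%; they are not taken from the paper.

```python
def recall_and_precision(true_positives, false_negatives, false_positives):
    """Return (recall, precision) computed from confusion counts."""
    recall = true_positives / (true_positives + false_negatives)
    precision = true_positives / (true_positives + false_positives)
    return recall, precision


# Hypothetical counts, for illustration only:
# 100 true strong interactions, 70 of them predicted, plus 6 wrong predictions.
recall, precision = recall_and_precision(
    true_positives=70, false_negatives=30, false_positives=6
)
```

Reporting both numbers matters because a method can raise one at the expense of the other, for example by predicting fewer interactions to keep precision high.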


Subjects
Computational Biology/methods , DNA-Binding Proteins/chemistry , Transcription Factors/chemistry , Amino Acid Motifs , Basic-Leucine Zipper Transcription Factors , DNA-Binding Proteins/metabolism , Fungal Proteins/chemistry , Fungal Proteins/metabolism , G-Box Binding Factors , Humans , Models, Molecular , Protein Binding , Protein Structure, Secondary , Reproducibility of Results , Static Electricity , Transcription Factors/metabolism