ABSTRACT
BACKGROUND: We have recently released a comprehensive, manually curated database of mammalian protein complexes called CORUM. Combining CORUM with other resources, we assembled a dataset of over 2700 mammalian complexes. The availability of such a rich information resource allows us to search for organizational properties of these complexes. RESULTS: As the complexity of a protein complex, measured by its number of unique subunits, increases, we observed that the number of such complexes and the mean non-synonymous to synonymous substitution ratio of the associated genes tend to decrease. Similarly, as the number of different complexes a given protein participates in increases, the number of such proteins and the substitution ratio of the associated gene also tend to decrease. These observations provide evidence relating natural selection to the organization of mammalian complexes. We also observed greater homogeneity in predicted protein isoelectric points, secondary structure and substitution ratio in annotated complexes than in randomly generated ones. A large proportion of the protein content and interactions in the complexes could be predicted from known binary protein-protein and domain-domain interactions. In particular, we found that large proteins interact preferentially with much smaller proteins. CONCLUSION: We observed similar trends in yeast and other data. Our results support the existence of conserved relations associated with the mammalian protein complexes.
Subjects
Protein Databases, Molecular Evolution, Multiprotein Complexes/analysis, Protein Interaction Mapping, Animals, Computational Biology/methods, Linear Models, Mammals, Molecular Models, Protein Secondary Structure, Proteomics/methods, Protein Sequence Analysis
ABSTRACT
Conserved domains carry many of the functional features found in the proteins of an organism. These include not only catalytic activity, substrate binding, and structural features but also molecular adapters, which mediate physical interactions between proteins or between proteins and other molecules. In addition, two conserved domains can be linked not by physical contact but by a common function, such as forming a binding pocket. Although a wealth of experimental data has been collected and carefully curated for protein-protein interactions, little useful data on relations at the domain level is currently available from major databases. This lack of data makes computational prediction of domain-domain interactions a very important endeavor. In this chapter, we discuss the available experimental data (iPfam) and describe some important approaches to the problem of identifying interacting and/or functionally linked domain pairs from different kinds of input data. Specifically, we discuss phylogenetic profiling at the level of conserved protein domains on the one hand and inference of domain interactions from observed or predicted protein-protein interaction datasets on the other. We explore the predictive power of these methods and point out the importance of deploying as many different methods as possible for the best results.
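
To make the idea of domain-level phylogenetic profiling concrete, the following is a minimal sketch and not the chapter's own method. It assumes each domain is represented by the set of genomes in which it occurs and scores domain pairs by the Jaccard similarity of those sets; the domain names, genome labels, function names, and the 0.75 threshold are all hypothetical and chosen only for illustration. Published approaches typically use more elaborate scores such as mutual information.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def linked_domain_pairs(profiles, threshold=0.75):
    """Return domain pairs whose presence/absence profiles are similar.

    profiles: dict mapping a domain name to the set of genomes containing it.
    threshold: hypothetical cutoff on the Jaccard score (an assumption).
    """
    hits = []
    for d1, d2 in combinations(sorted(profiles), 2):
        score = jaccard(profiles[d1], profiles[d2])
        if score >= threshold:
            hits.append((d1, d2, score))
    # Highest-scoring pairs first; ties keep alphabetical order.
    return sorted(hits, key=lambda hit: -hit[2])


# Toy, made-up profiles: which genomes (g1..g5) contain each domain.
toy_profiles = {
    "DomA": {"g1", "g2", "g3", "g4"},
    "DomB": {"g1", "g2", "g3", "g4"},
    "DomC": {"g1", "g5"},
    "DomD": {"g2", "g3", "g4"},
}

candidate_pairs = linked_domain_pairs(toy_profiles)
```

In this toy data, DomA and DomB co-occur in exactly the same genomes and therefore rank first as a candidate functionally linked pair, while DomC shares too few genomes with any other domain to pass the cutoff. A similar profile is only a hint of functional linkage, not evidence of physical contact, which is one reason for combining several prediction methods.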