Results 1 - 10 of 10
1.
PLoS Comput Biol ; 4(7): e1000105, 2008 Jul 04.
Article in English | MEDLINE | ID: mdl-18604264

ABSTRACT

We assess the variability of protein function in protein sequence and structure space. Various regions in this space exhibit considerable difference in the local conservation of molecular function. We analyze and capture local function conservation by means of logistic curves. Based on this analysis, we propose a method for predicting molecular function of a query protein with known structure but unknown function. The prediction method is rigorously assessed and compared with a previously published function predictor. Furthermore, we apply the method to 500 functionally unannotated PDB structures and discuss selected examples. The proposed approach provides a simple yet consistent statistical model for the complex relations between protein sequence, structure, and function. The GOdot method is available online (http://godot.bioinf.mpi-inf.mpg.de).
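As a rough illustration of how a logistic curve can model local function conservation, the sketch below maps a similarity score to the probability that two proteins share a molecular function. The function name, the single-score input, and the parameter values are assumptions made for illustration; the abstract does not state how GOdot parameterizes its curves.

```python
import math

def function_conservation_probability(similarity, slope, midpoint):
    """Logistic curve: probability that function is conserved at a given similarity."""
    return 1.0 / (1.0 + math.exp(-slope * (similarity - midpoint)))

# Hypothetical parameters for one region of sequence and structure space.
for s in (0.2, 0.5, 0.8):
    p = function_conservation_probability(s, slope=10.0, midpoint=0.5)
    print(s, round(p, 3))
```

In a scheme of this kind, fitting a separate slope and midpoint per region would let each region express its own degree of function conservation.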


Subjects
Computational Biology/methods , Evolution, Molecular , Proteins/classification , Proteins/metabolism , Structural Homology, Protein , Structure-Activity Relationship , Amino Acid Sequence/physiology , Artificial Intelligence , Data Interpretation, Statistical , Databases, Protein , Models, Molecular , Pattern Recognition, Automated/methods , Protein Conformation , Proteins/chemistry , Sequence Alignment , Sequence Analysis, Protein
2.
Bioinformatics ; 23(23): 3139-46, 2007 Dec 01.
Article in English | MEDLINE | ID: mdl-17977888

ABSTRACT

MOTIVATION: An approach for identifying similarities of protein-protein binding sites is presented. The geometric shape of a binding site is described by computing a feature vector based on moment invariants. In order to search for similarities, feature vectors of binding sites are compared. Similar feature vectors indicate binding sites with similar shapes. RESULTS: The approach is validated on a representative set of protein-protein binding sites, extracted from the SCOPPI database. When querying binding sites from a representative set, we search for known similarities among 2819 binding sites. A median area under the ROC curve of 0.98 is observed. For half of the queries, a similar binding site is identified among the first two of 2819 when sorting all binding sites according to the proposed similarity measure. Typical examples identified by this method are analyzed and discussed. The nitrogenase iron protein-like SCOP family is clustered hierarchically according to the proposed similarity measure as a case study. AVAILABILITY: Python code is available on request from the authors.
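The idea of comparing shapes through moment invariants can be sketched as follows. The abstract does not say which invariants or which moment order the authors use, so this example substitutes the three classical invariants of the second-order central moment matrix (trace, sum of principal minors, determinant). All function names and coordinates are hypothetical.

```python
import math

def second_order_moment_invariants(points):
    """Return three translation- and rotation-invariant values of a 3D point set."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    # Symmetric 3x3 matrix of second-order central moments.
    m = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)]
         for i in range(3)]
    trace = m[0][0] + m[1][1] + m[2][2]
    minors = (m[0][0] * m[1][1] - m[0][1] ** 2
              + m[0][0] * m[2][2] - m[0][2] ** 2
              + m[1][1] * m[2][2] - m[1][2] ** 2)
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] ** 2)
           - m[0][1] * (m[0][1] * m[2][2] - m[1][2] * m[0][2])
           + m[0][2] * (m[0][1] * m[1][2] - m[1][1] * m[0][2]))
    return [trace, minors, det]

def feature_distance(a, b):
    """Euclidean distance between feature vectors; small values mean similar shapes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rotate_z(points, angle):
    """Rotate points about the z axis by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

# Hypothetical binding-site coordinates and a rotated, shifted copy of them.
site = [(0.0, 0.0, 0.0), (1.5, 0.2, 0.0), (0.3, 2.1, 0.4), (1.0, 1.0, 3.0)]
moved = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in rotate_z(site, 0.7)]

distance = feature_distance(second_order_moment_invariants(site),
                            second_order_moment_invariants(moved))
print(distance)
```

Because the features do not change under rotation or translation, two sites can be compared without first superimposing them, which is what makes a search over thousands of sites cheap.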


Subjects
Algorithms , Models, Chemical , Models, Molecular , Pattern Recognition, Automated/methods , Protein Interaction Mapping/methods , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Artificial Intelligence , Binding Sites , Computer Simulation , Molecular Sequence Data , Protein Binding , Protein Conformation , Regression Analysis
3.
PLoS Comput Biol ; 3(3): e58, 2007 Mar 30.
Article in English | MEDLINE | ID: mdl-17397254

ABSTRACT

HIV-1 cell entry commonly uses, in addition to CD4, one of the chemokine receptors CCR5 or CXCR4 as coreceptor. Knowledge of coreceptor usage is critical for monitoring disease progression as well as for supporting therapy with the novel drug class of coreceptor antagonists. Predictive methods for inferring coreceptor usage based on the third hypervariable (V3) loop region of the viral gene coding for the envelope protein gp120 can provide us with these monitoring facilities while avoiding expensive phenotypic tests. All simple heuristics (such as the 11/25 rule) as well as statistical learning methods proposed to date predict coreceptor usage based on sequence features of the V3 loop exclusively. Here, we show, based on a recently resolved structure of gp120 with an untruncated V3 loop, that using structural information on the V3 loop in combination with sequence features of V3 variants improves prediction of coreceptor usage. In particular, we propose a distance-based descriptor of the spatial arrangement of physicochemical properties that increases discriminative performance. For a fixed specificity of 0.95, a sensitivity of 0.77 was achieved, improving further to 0.80 when combined with a sequence-based representation using amino acid indicators. This compares favorably with the sensitivities of 0.62 for the traditional 11/25 rule and 0.73 for a prediction based on sequence information as input to a support vector machine and constitutes a statistically significant improvement. A detailed analysis and interpretation of structural features important for classification shows the relevance of several specific hydrogen-bond donor sites and aliphatic side chains to coreceptor specificity towards CCR5 or CXCR4. Furthermore, an analysis of side chain orientation of the specificity-determining residues suggests a major role of one side of the V3 loop in the selection of the coreceptor. 
The proposed method constitutes the first approach to an improved prediction of coreceptor usage based on an original integration of structural bioinformatics methods with statistical learning.
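For context, the 11/25 rule used as a baseline above can be written in a few lines: a V3 loop is predicted to use CXCR4 if position 11 or position 25 carries a positively charged residue (arginine or lysine), and CCR5 otherwise. The sketch assumes an aligned, gap-free V3 sequence with 1-based rule positions; the function name and the example sequences are illustrative only and are not taken from the paper.

```python
def predict_coreceptor_11_25(v3_sequence):
    """Classify a V3 loop as 'CXCR4' or 'CCR5' using the 11/25 rule."""
    if len(v3_sequence) < 25:
        raise ValueError("V3 sequence too short for positions 11 and 25")
    basic = {"R", "K"}
    # The rule counts positions from 1, so check indices 10 and 24.
    if v3_sequence[10] in basic or v3_sequence[24] in basic:
        return "CXCR4"
    return "CCR5"

# Illustrative 35-residue V3 sequence and a variant with arginine at position 11.
example_v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
variant_v3 = example_v3[:10] + "R" + example_v3[11:]

print(predict_coreceptor_11_25(example_v3))
print(predict_coreceptor_11_25(variant_v3))
```

The rule inspects only two sequence positions, which is why the abstract contrasts it with descriptors that also encode the spatial arrangement of physicochemical properties.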


Subjects
HIV-1/physiology , Receptors, CCR5/chemistry , Receptors, CCR5/metabolism , Receptors, CXCR4/chemistry , Receptors, CXCR4/metabolism , Sequence Analysis, Protein/methods , Virus Attachment , Amino Acid Sequence , Molecular Sequence Data , Sequence Alignment/methods , Structure-Activity Relationship , Virus Internalization
4.
BMC Bioinformatics ; 7: 14, 2006 Jan 11.
Article in English | MEDLINE | ID: mdl-16405736

ABSTRACT

BACKGROUND: In recent years protein structure prediction methods using local structure information have shown promising improvements. The quality of new fold predictions has risen significantly and in fold recognition incorporation of local structure predictions led to improvements in the accuracy of results. We developed a local structure prediction method to be integrated into either fold recognition or new fold prediction methods. For each local sequence window of a protein sequence the method predicts probability estimates for the sequence to attain particular local structures from a set of predefined local structure candidates. The first step is to define a set of local structure representatives based on clustering recurrent local structures. In the second step a discriminative model is trained to predict the local structure representative given local sequence information. RESULTS: The step of clustering local structures yields an average RMSD quantization error of 1.19 Å for 27 structural representatives (for a fragment length of 7 residues). In the prediction step the area under the ROC curve for detection of the 27 classes ranges from 0.68 to 0.88. CONCLUSION: The described method yields probability estimates for local protein structure candidates, giving signals for all kinds of local structure. These local structure predictions can be incorporated either into fold recognition algorithms to improve alignment quality and the overall prediction accuracy or into new fold prediction methods.


Subjects
Algorithms , Artificial Intelligence , Models, Chemical , Models, Molecular , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acid Sequence , Computer Simulation , Discriminant Analysis , Molecular Sequence Data , Protein Conformation , Proteins/classification , Sequence Alignment/methods
5.
BMC Bioinformatics ; 7: 27, 2006 Jan 19.
Article in English | MEDLINE | ID: mdl-16423290

ABSTRACT

BACKGROUND: Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available. RESULTS: Six interface properties have been investigated on a dataset of 243 protein interactions. The six properties have been combined using a support vector machine algorithm, resulting in NOXclass, a classifier for distinguishing obligate, non-obligate and crystal packing interactions. We achieve an accuracy of 91.8% for the classification of these three types of interactions using a leave-one-out cross-validation procedure. CONCLUSION: NOXclass allows the interpretation and analysis of protein quaternary structures. In particular, it generates testable hypotheses regarding the nature of protein-protein interactions, when experimental results are not available. We expect this server will benefit the users of protein structural models, as well as protein crystallographers and NMR spectroscopists. A web server based on the method and the datasets used in this study are available at http://noxclass.bioinf.mpi-inf.mpg.de/.


Subjects
Crystallography/methods , Models, Chemical , Models, Molecular , Protein Interaction Mapping/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Algorithms , Artificial Intelligence , Binding Sites , Computer Simulation , Online Systems , Pattern Recognition, Automated , Protein Binding , Protein Conformation , Protein Structure, Quaternary , Proteins/classification , Proteins/ultrastructure
6.
BMC Bioinformatics ; 7: 364, 2006 Jul 27.
Article in English | MEDLINE | ID: mdl-16872519

ABSTRACT

BACKGROUND: In the area of protein structure prediction, recently a lot of effort has gone into the development of Model Quality Assessment Programs (MQAPs). MQAPs distinguish high quality protein structure models from inferior models. Here, we propose a new method to use an MQAP to improve the quality of models. With a given target sequence and template structure, we construct a number of different alignments and corresponding models for the sequence. The quality of these models is scored with an MQAP and used to choose the most promising model. An SVM-based selection scheme is suggested for combining MQAP partial potentials, in order to optimize for improved model selection. RESULTS: The approach has been tested on a representative set of proteins. The ability of the method to improve models was validated by comparing the MQAP-selected structures to the native structures with the model quality evaluation program TM-score. Using the SVM-based model selection, a significant increase in model quality is obtained (as shown with a Wilcoxon signed rank test yielding p-values below 10^-15). The average increase in TM-score is 0.016, the maximum observed increase in TM-score is 0.29. CONCLUSION: In template-based protein structure prediction alignment is known to be a bottleneck limiting the overall model quality. Here we show that a combination of systematic alignment variation and modern model scoring functions can significantly improve the quality of alignment-based models.


Subjects
Models, Molecular , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein , Computer Simulation , Databases, Protein , Protein Conformation
7.
PLoS One ; 3(4): e1926, 2008 Apr 02.
Article in English | MEDLINE | ID: mdl-18382693

ABSTRACT

BACKGROUND: The study and comparison of protein-protein interfaces is essential for the understanding of the mechanisms of interaction between proteins. While there are many methods for comparing protein structures and protein binding sites, so far no methods have been reported for comparing the geometry of non-covalent interactions occurring at protein-protein interfaces. METHODOLOGY/PRINCIPAL FINDINGS: Here we present a method for aligning non-covalent interactions between different protein-protein interfaces. The method aligns the vector representations of van der Waals interactions and hydrogen bonds based on their geometry. The method has been applied to a dataset which comprises a variety of protein-protein interfaces. The alignments are consistent to a large extent with the results obtained using two other complementary approaches. In addition, we apply the method to three examples of protein mimicry. The method successfully aligns respective interfaces and allows for recognizing conserved interface regions. CONCLUSIONS/SIGNIFICANCE: The Galinter method has been validated in the comparison of interfaces in which homologous subunits are involved, including cases of mimicry. The method is also applicable to comparing interfaces involving non-peptidic compounds. Galinter assists users in identifying local interface regions with similar patterns of non-covalent interactions. This is particularly relevant to the investigation of the molecular basis of interaction mimicry.


Subjects
Biochemistry/methods , Protein Interaction Mapping , Binding Sites , CD4 Antigens/chemistry , Cluster Analysis , Databases, Protein , HIV Envelope Protein gp120/chemistry , Humans , Hydrogen Bonding , Interleukin-2/chemistry , Interleukin-2 Receptor alpha Subunit/chemistry , Models, Molecular , Models, Statistical , Models, Theoretical , Reproducibility of Results
8.
Pac Symp Biocomput ; : 252-63, 2003.
Article in English | MEDLINE | ID: mdl-12603033

ABSTRACT

The problem of computing the tertiary structure of a protein from a given amino acid sequence has been a major subject of bioinformatics research during the last decade. Many different approaches have been taken to tackle the problem, the most successful of which are based on searching databases to identify a similar amino acid sequence in the PDB and using the corresponding structure as a template for modeling the structure of the query sequence. An important advance for the evaluation of sequence similarity in this context has been the use of a frequency profile that represents a part of the protein sequence space close to the query sequence instead of the query sequence itself. In this paper, we present a further extension of this principle by using profiles instead of the template sequences, also. We show that, by using our newly developed scoring model, the profile-profile alignment approach is able to significantly outperform current state of the art methods like PSI-BLAST, HMMs, or threading methods in a fold recognition setup. This is especially interesting since we show that it holds for closely related sequences as well as for very distantly related ones.


Subjects
Proteins/chemistry , Proteins/genetics , Sequence Alignment/statistics & numerical data , Algorithms , Computational Biology , Databases, Protein , Protein Folding , Protein Structure, Tertiary
9.
Bioinformatics ; 20(14): 2228-35, 2004 Sep 22.
Article in English | MEDLINE | ID: mdl-15059818

ABSTRACT

MOTIVATION: Arby is a new server for protein structure prediction that combines several homology-based methods for predicting the three-dimensional structure of a protein, given its sequence. The methods used include a threading approach, which makes use of structural information, and a profile-profile alignment approach that incorporates secondary structure predictions. The combination of the different methods with the help of empirically derived confidence measures affords reliable template selection. RESULTS: According to the recent CAFASP3 experiment, the server is one of the most sensitive methods for predicting the structure of single domain proteins. The quality of template selection is assessed using a fold-recognition experiment. AVAILABILITY: The Arby server is available through the portal of the Helmholtz Network for Bioinformatics at http://www.hnbioinfo.de under the protein structure category.


Subjects
Algorithms , Models, Molecular , Proteins/chemistry , Proteins/classification , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Computer Simulation , Protein Conformation , Protein Folding , Protein Structure, Secondary , Proteins/analysis , Reproducibility of Results , Sensitivity and Specificity , Software
10.
Bioinformatics ; 18(6): 802-12, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12075015

ABSTRACT

MOTIVATION: We present an extensive evaluation of different methods and criteria to detect remote homologs of a given protein sequence. We investigate two associated problems: first, to develop a sensitive searching method to identify possible candidates and, second, to assign a confidence to the putative candidates in order to select the best one. For searching methods where the score distributions are known, p-values are used as confidence measure with great success. For the cases where such theoretical backing is absent, we propose empirical approximations to p-values for searching procedures. RESULTS: As a baseline, we review the performances of different methods for detecting remote protein folds (sequence alignment and threading, with and without sequence profiles, global and local). The analysis is performed on a large representative set of protein structures. For fold recognition, we find that methods using sequence profiles generally perform better than methods using plain sequences, and that threading methods perform better than sequence alignment methods. In order to assess the quality of the predictions made, we establish and compare several confidence measures, including raw scores, z-scores, raw score gaps, z-score gaps, and different methods of p-value estimation. We work our way from the theoretically well backed local scores towards more explorative global and threading scores. The methods for assessing the statistical significance of predictions are compared using specificity-sensitivity plots. For local alignment techniques we find that p-value methods work best, although computationally cheaper methods such as those based on score gaps achieve similar performance. For global methods where no theory is available, methods based on score gaps work best. By using the score gap functions as the measure of confidence we improve the more powerful fold recognition methods for which p-values are unavailable.
AVAILABILITY: The benchmark set is available upon request.
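Two of the confidence measures named above, the z-score and the raw score gap, can be sketched as follows. The definitions here are common textbook forms and are assumptions: the abstract does not state how the paper normalizes its scores. Higher scores are assumed to be better, and the template scores are invented for illustration.

```python
import statistics

def z_score(score, all_scores):
    """Number of standard deviations by which a score exceeds the mean of all scores."""
    mean = statistics.mean(all_scores)
    stdev = statistics.pstdev(all_scores)
    return (score - mean) / stdev

def score_gap(all_scores):
    """Difference between the best and the second-best score."""
    ranked = sorted(all_scores, reverse=True)
    return ranked[0] - ranked[1]

# Hypothetical alignment scores of one query against five templates.
template_scores = [12.0, 15.5, 48.0, 14.0, 13.5]
best = max(template_scores)

print(round(z_score(best, template_scores), 2))
print(score_gap(template_scores))
```

A score gap needs only a sort over the template scores, which fits the abstract's point that gap-based measures are computationally cheaper than p-value estimation.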


Subjects
Protein Folding , Amino Acid Sequence , Computational Biology , Confidence Intervals , Protein Structure, Tertiary , Sequence Alignment/statistics & numerical data