ABSTRACT
Many proteins interact with short linear regions of target proteins. For some proteins, however, it is difficult to identify a well-defined sequence motif that defines their target peptides. To overcome this difficulty, we used supervised machine learning to train a model that treats each peptide as a collection of easily calculated biochemical features rather than as an amino acid sequence. As a test case, we dissected the peptide-recognition rules for human S100A5 (hA5), a low-specificity calcium-binding protein. We trained a Random Forest model against a recently released, high-throughput phage display dataset collected for hA5. The model identifies hydrophobicity and shape complementarity, rather than polar contacts, as the primary determinants of peptide binding specificity in hA5. We tested this hypothesis by solving a crystal structure of hA5 and through computational docking studies of diverse peptides onto hA5. These structural studies revealed that peptides exhibit multiple binding modes at the hA5 peptide interface, all of which have few polar contacts with hA5. Finally, we used our trained model to predict new, plausible binding targets in the human proteome. This revealed a fragment of the protein α-1-syntrophin that binds to hA5. Our work improves understanding of the biochemistry and biology of hA5 and demonstrates how high-throughput experiments coupled with machine learning of biochemical features can reveal the determinants of binding specificity in low-specificity proteins.
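The idea of treating a peptide as a collection of biochemical features rather than as a sequence can be illustrated with a minimal Python sketch. The function name `peptide_features` and the four features below are hypothetical choices made for illustration; they are not the feature set used in the study. Hydropathy values come from the standard Kyte-Doolittle scale, and net charge is approximated by counting K/R against D/E, ignoring histidine and the termini.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
POSITIVE = set("KR")   # assumption: His and termini are ignored
NEGATIVE = set("DE")
AROMATIC = set("FWY")


def peptide_features(peptide):
    """Return a dict of simple, sequence-order-independent features.

    Hypothetical illustration only; not the authors' feature set.
    """
    seq = peptide.upper()
    if not seq or any(aa not in KD for aa in seq):
        raise ValueError("peptide must contain only the 20 standard amino acids")
    n = len(seq)
    return {
        "length": n,
        "mean_hydropathy": sum(KD[aa] for aa in seq) / n,
        "net_charge": sum(aa in POSITIVE for aa in seq)
        - sum(aa in NEGATIVE for aa in seq),
        "aromatic_fraction": sum(aa in AROMATIC for aa in seq) / n,
    }


if __name__ == "__main__":
    # Two made-up peptides: one hydrophobic, one charged.
    for pep in ("LLFWAIL", "KKDERSG"):
        print(pep, peptide_features(pep))
```

A table of such feature vectors, one row per peptide, labeled by whether the peptide was enriched in the phage display data, is the kind of input a Random Forest implementation (for example, scikit-learn's `RandomForestClassifier`) could be trained on.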
Subjects
Calcium-Binding Proteins/chemistry, Membrane Proteins/chemistry, Molecular Models, Muscle Proteins/chemistry, Peptides/chemistry, S100 Proteins/chemistry, Calcium-Binding Proteins/genetics, Calcium-Binding Proteins/metabolism, X-Ray Crystallography, Humans, Membrane Proteins/genetics, Membrane Proteins/metabolism, Muscle Proteins/genetics, Muscle Proteins/metabolism, Peptide Library, Peptides/genetics, Peptides/metabolism, Protein Binding, S100 Proteins/genetics, S100 Proteins/metabolism
ABSTRACT
Many regulatory proteins bind peptide regions of target proteins and modulate their activity. Such regulatory proteins can often interact with highly diverse target peptides. In many instances, it is not known whether the peptide-binding interface discriminates targets in a biological context, or whether biological specificity is achieved exclusively through external factors such as subcellular localization. We used an evolutionary biochemical approach to distinguish these possibilities for two such low-specificity proteins: S100A5 and S100A6. We used isothermal titration calorimetry to study the binding of peptides with diverse sequences and biochemical properties to human S100A5 and S100A6. These proteins bound distinct, but overlapping, sets of peptide targets. We then studied the peptide binding properties of orthologs sampled from five amniote species. Binding specificity was conserved along all lineages, for the last 320 million years, despite the low specificity of each protein. We used ancestral sequence reconstruction to determine the binding specificity of the last common ancestor of the paralogs. The ancestor bound the entire set of peptides bound by modern S100A5 and S100A6 proteins, suggesting that paralog specificity evolved via subfunctionalization. To rule out the possibility that specificity is conserved because it is difficult to modify, we identified a single historical mutation that, when reverted in human S100A5, gave it the ability to bind an S100A6-specific peptide. These results reveal strong evolutionary constraints on peptide binding specificity. Despite being able to bind a large number of targets, the specificity of S100 peptide interfaces is likely important for the biology of these proteins.