ABSTRACT
BACKGROUND: Mutations occurring in nucleic acids or proteins may affect the binding affinities of protein-nucleic acid interactions. Although considerable effort has been devoted to the impact of protein mutations, few computational studies have addressed the effect of nucleic acid mutations or explored whether the same methodology can be applied to predicting the binding affinity changes caused by both mutation types. RESULTS: Here, we developed a generalized algorithm named PNBACE for both DNA and protein mutations. We first demonstrated from multiple perspectives that DNA mutations can induce varying degrees of change in binding affinity. We then designed a group of energy-based topological features derived from different energy networks and combined them with our previous partition-based energy features to construct individual prediction models through feature selection. Furthermore, we created an ensemble model by integrating the outputs of the individual models using a differential evolution algorithm. In addition to predicting the impact of single-point mutations, PNBACE can predict the influence of multiple-point mutations and identify mutations that significantly reduce binding affinity. Extensive comparisons indicated that PNBACE generally outperformed existing methods on both regression and classification tasks. CONCLUSIONS: PNBACE is an effective method for estimating the binding affinity changes of protein-nucleic acid complexes induced by DNA or protein mutations, thereby improving our understanding of the interactions between proteins and DNA/RNA.
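The ensemble step mentioned in the abstract, integrating the outputs of individual models with differential evolution, can be illustrated with a minimal sketch. The abstract does not give the objective function, the weight encoding, or any hyperparameters, so everything below is an assumption: the weighted-average combination, the RMSE objective, the bounds, the function names (`combine`, `differential_evolution`), and the toy numbers, which are invented for illustration and are not results from PNBACE.

```python
import random

LOW, HIGH = 1e-6, 1.0  # assumed bounds on each model weight


def rmse(pred, target):
    """Root-mean-square error between two equal-length lists."""
    return (sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)) ** 0.5


def combine(weights, model_outputs):
    """Weighted average of per-model predictions (weights normalised by their sum)."""
    total = sum(weights)
    n_samples = len(model_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, model_outputs)) / total
        for i in range(n_samples)
    ]


def differential_evolution(loss, dim, pop_size=20, f=0.6, cr=0.9,
                           generations=100, seed=0):
    """Basic DE/rand/1/bin minimiser with greedy selection."""
    rng = random.Random(seed)
    pop = [[rng.uniform(LOW, HIGH) for _ in range(dim)] for _ in range(pop_size)]
    pop[0] = [HIGH] * dim  # seed one individual with equal weights
    scores = [loss(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                else:
                    value = pop[i][d]
                trial.append(min(HIGH, max(LOW, value)))  # clip to bounds
            trial_score = loss(trial)
            if trial_score <= scores[i]:  # keep the trial only if it is no worse
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy data (invented): observed affinity changes and three models' predictions.
observed = [0.5, 1.2, -0.3, 2.0, 0.8]
model_outputs = [
    [0.7, 1.0, -0.1, 1.6, 1.1],
    [0.2, 1.5, -0.6, 2.3, 0.5],
    [0.9, 0.8, 0.2, 1.5, 1.0],
]

weights, best_loss = differential_evolution(
    lambda w: rmse(combine(w, model_outputs), observed), dim=len(model_outputs)
)
equal_loss = rmse(combine([1.0] * 3, model_outputs), observed)
print("weights:", weights)
print("ensemble RMSE:", best_loss, "equal-weight RMSE:", equal_loss)
```

Because one individual starts at equal weights and selection is greedy, the returned weights can never score worse than a plain average on the data they were fitted to. In practice the weights would need to be fitted on held-out predictions to avoid overfitting.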
Subjects
Algorithms , DNA , Mutation , Protein Binding , DNA/metabolism , Computational Biology/methods , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/genetics
ABSTRACT
The biological functions of DNA and RNA generally depend on their interactions with other molecules, such as small ligands, proteins and nucleic acids. However, our knowledge of the nucleic acid binding sites for different interaction partners is very limited, and identifying these critical binding regions is not a trivial task. Herein, we performed a comprehensive comparison between binding and nonbinding sites, and among different categories of binding sites, in these two nucleic acid classes. From the structural perspective, RNA may interact with ligands by forming binding pockets and may contact proteins and nucleic acids through protruding surfaces, while DNA may use regions closer to the middle of the chain to contact other molecules. Based on structural information, we established a feature-based ensemble learning classifier to identify the binding sites by fully exploiting the interplay among different machine learning algorithms, feature spaces and sample spaces. Meanwhile, we designed a template-based classifier by exploiting structural conservation. The complementarity between the two classifiers motivated us to build an integrative framework to improve prediction performance. Moreover, we used a post-processing procedure based on the random walk algorithm to further correct the integrative predictions. Our unified prediction framework yielded promising results for different binding sites and outperformed existing methods.
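One way to picture the random-walk post-processing is as score smoothing over a nucleotide contact graph. The abstract does not specify the variant used, so the sketch below assumes a random walk with restart; the graph (a simple six-nucleotide chain), the restart probability, the function name `random_walk_with_restart`, and the scores are all hypothetical and chosen only for illustration.

```python
def random_walk_with_restart(adjacency, initial_scores, restart=0.5,
                             tol=1e-9, max_iter=1000):
    """Propagate scores over a graph until the distribution stops changing.

    adjacency: symmetric 0/1 matrix (list of lists); every node needs a neighbour.
    initial_scores: non-negative per-node scores, e.g. predicted binding propensities.
    Returns a probability distribution over nodes (sums to 1).
    """
    n = len(initial_scores)
    degree = [sum(row) for row in adjacency]
    if min(degree) == 0:
        raise ValueError("every node must have at least one neighbour")
    total = sum(initial_scores)
    p0 = [s / total for s in initial_scores]  # restart distribution
    p = p0[:]
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            # Probability mass arriving at node i from each of its neighbours j.
            flow = sum(adjacency[j][i] * p[j] / degree[j] for j in range(n))
            new_p.append((1 - restart) * flow + restart * p0[i])
        change = sum(abs(x - y) for x, y in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return p


# Toy example (invented): a chain of six nucleotides, neighbours are i and i+1.
n = 6
adjacency = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
scores = [0.9, 0.1, 0.8, 0.1, 0.1, 0.1]  # hypothetical classifier outputs

smoothed = random_walk_with_restart(adjacency, scores)
print([round(x, 3) for x in smoothed])
```

In this toy chain, the second nucleotide starts with a low score but sits between two high-scoring neighbours, so its share of the total rises after smoothing. This is the kind of correction a neighbourhood-aware post-processing step is meant to provide.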