Search | VHL Regional Portal

1.

PANDORA v2.0: Benchmarking peptide-MHC II models and software improvements.

Parizi, Farzaneh M; Marzella, Dario F; Ramakrishnan, Gayatri; 't Hoen, Peter A C; Karimi-Jafari, Mohammad Hossein; Xue, Li C.

Front Immunol ; 14: 1285899, 2023.

Article in English | MEDLINE | ID: mdl-38143769

ABSTRACT

T-cell specificity to differentiate between self and non-self relies on T-cell receptor (TCR) recognition of peptides presented by the Major Histocompatibility Complex (MHC). Investigations into the three-dimensional (3D) structures of peptide:MHC (pMHC) complexes have provided valuable insights into MHC functions. The limited availability of experimental pMHC structures and the considerable diversity of peptides and MHC alleles call for the development of efficient and reliable computational approaches for modeling pMHC structures. Here we present an update of PANDORA and the systematic evaluation of its performance in modelling 3D structures of pMHC class II complexes (pMHC-II), which play a key role in the cancer immune response. PANDORA is a modelling software that can build low-energy models in a few minutes by restraining peptide residues inside the MHC-II binding groove. We benchmarked PANDORA on 136 experimentally determined pMHC-II structures covering 44 unique αβ chain pairs. Our pipeline achieves a median backbone Ligand-Root Mean Squared Deviation (L-RMSD) of 0.42 Å on the binding core and 0.88 Å on the whole peptide for the benchmark dataset. We incorporated software improvements to make PANDORA a pan-allele framework and improved the user interface and software quality. Its computational efficiency allows enriching the wealth of pMHC binding affinity and mass spectrometry data with 3D models. These models can be used as a starting point for molecular dynamics simulations or structure-boosted deep learning algorithms to identify MHC-binding peptides. PANDORA is available as a Python package through Conda or as a source installation at https://github.com/X-lab-3D/PANDORA.
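
To illustrate the accuracy metric quoted in this abstract, the following is a minimal Python sketch of a backbone RMSD between a modelled and a reference peptide. It assumes that both structures have already been superimposed on the MHC and that the atoms are listed in the same order. The function name `backbone_rmsd` and the toy coordinates are hypothetical and are not part of PANDORA.

```python
import numpy as np

def backbone_rmsd(model: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between two (N, 3) arrays of matched atoms, without refitting."""
    if model.shape != reference.shape:
        raise ValueError("coordinate arrays must have the same shape")
    diff = model - reference
    # Mean of the squared per-atom distances, then the square root.
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Toy peptide backbone (hypothetical coordinates, in Å).
reference = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0]])
# Every atom is displaced by a vector of length 0.5 Å.
model = reference + np.array([0.0, 0.3, 0.4])
print(f"L-RMSD: {backbone_rmsd(model, reference):.2f} Å")
```

Skipping the refit is deliberate: if the peptide were superimposed on itself, displacements relative to the binding groove would be hidden.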

Subject(s)

Benchmarking , Peptides , Peptides/metabolism , Major Histocompatibility Complex , Histocompatibility Antigens , Software

2.

Understanding structure-guided variant effect predictions using 3D convolutional neural networks.

Ramakrishnan, Gayatri; Baakman, Coos; Heijl, Stephan; Vroling, Bas; van Horck, Ragna; Hiraki, Jeffrey; Xue, Li C; Huynen, Martijn A.

Front Mol Biosci ; 10: 1204157, 2023.

Article in English | MEDLINE | ID: mdl-37475887

ABSTRACT

Predicting pathogenicity of missense variants in molecular diagnostics remains a challenge despite the available wealth of data, such as evolutionary information, and the wealth of tools to integrate that data. We describe DeepRank-Mut, a configurable framework designed to extract and learn from physicochemically relevant features of amino acids surrounding missense variants in 3D space. For each variant, various atomic and residue-level features are extracted from its structural environment, including sequence conservation scores of the surrounding amino acids, and stored in multi-channel 3D voxel grids which are then used to train a 3D convolutional neural network (3D-CNN). The resultant model gives a probabilistic estimate of whether a given input variant is disease-causing or benign. We find that the performance of our 3D-CNN model, on independent test datasets, is comparable to other widely used resources which also combine sequence and structural features. Based on the 10-fold cross-validation experiments, we achieve an average accuracy of 0.77 on the independent test datasets. We discuss the contribution of the variant neighborhood to the model's predictive power, in addition to the impact of individual features on the model's performance. Two key features, the evolutionary information of residues in the variant neighborhood and their solvent accessibilities, were observed to influence the predictions. We also highlight how predictions are impacted by the underlying disease mechanisms of missense mutations and offer insights into understanding these to improve pathogenicity predictions. Our study presents aspects to take into consideration when adopting deep learning approaches for protein structure-guided pathogenicity predictions.

3.

GradPose: a very fast and memory-efficient gradient descent-based tool for superimposing millions of protein structures from computational simulations.

Rademaker, Daniel T; van Geemen, Kevin J; Xue, Li C.

Bioinformatics ; 39(8), 2023 08 01.

Article in English | MEDLINE | ID: mdl-37471594

ABSTRACT

SUMMARY: Computational simulations like molecular dynamics and docking are providing crucial insights into the dynamics and interaction conformations of proteins, complementing experimental methods for determining protein structures. These methods often generate millions of protein conformations, necessitating highly efficient structure comparison and clustering methods to analyze the results. In this article, we introduce GradPose, a fast and memory-efficient structural superimposition tool for models generated by these large-scale simulations. GradPose uses gradient descent to optimally superimpose structures by optimizing rotation quaternions and can handle insertions and deletions compared to the reference structure. It is capable of superimposing thousands to millions of protein structures on standard hardware and utilizes multiple CPU cores and, if available, CUDA acceleration to further decrease superimposition time. Our results indicate that GradPose generally outperforms traditional methods, with a speed improvement of 2-65 times and memory requirement reduction of 1.7-48 times, with larger protein structures benefiting the most. We observed that traditional methods outperformed GradPose only with very small proteins consisting of ~20 residues. The prerequisite of GradPose is that residue-residue correspondence is predetermined. With GradPose, we aim to provide a computationally efficient way to meet the demand for structural alignment in the computational simulation field. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://github.com/X-lab-3D/GradPose; doi:10.5281/zenodo.7671922.
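
The idea of superimposing structures by gradient descent on a rotation quaternion can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not GradPose's implementation: it handles a single pair of structures, uses finite-difference gradients instead of an optimised batched scheme, and assumes the residue correspondence is already given. All function names, the learning rate and the step count are hypothetical choices.

```python
import numpy as np

def quat_to_matrix(q):
    """Rotation matrix for a quaternion (w, x, y, z); q is normalised first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def mean_sq_dev(q, mobile, target):
    """Mean squared deviation after rotating `mobile` by quaternion q."""
    rotated = mobile @ quat_to_matrix(q).T
    return ((rotated - target) ** 2).sum(axis=1).mean()

def superimpose(mobile, target, steps=500, lr=0.05, eps=1e-6):
    """Fit `mobile` onto `target`; returns the quaternion and the final RMSD."""
    # Centring removes the translation, so only the rotation is optimised.
    mob = mobile - mobile.mean(axis=0)
    tgt = target - target.mean(axis=0)
    # Rescale so that the step size does not depend on the size of the protein.
    scale = np.sqrt((mob ** 2).sum(axis=1).mean())
    mob, tgt = mob / scale, tgt / scale
    q = np.array([1.0, 0.0, 0.0, 0.0])  # start from the identity rotation
    for _ in range(steps):
        grad = np.zeros(4)
        for i in range(4):  # central finite differences (assumption of this sketch)
            d = np.zeros(4)
            d[i] = eps
            grad[i] = (mean_sq_dev(q + d, mob, tgt)
                       - mean_sq_dev(q - d, mob, tgt)) / (2 * eps)
        q = q - lr * grad
        q /= np.linalg.norm(q)  # keep q a unit quaternion
    return q, scale * np.sqrt(mean_sq_dev(q, mob, tgt))

# Toy data: the mobile structure is the target rotated by 60 degrees about z and shifted.
rng = np.random.default_rng(0)
target = rng.normal(size=(20, 3))
true_rot = quat_to_matrix(np.array([np.cos(np.pi / 6), 0.0, 0.0, np.sin(np.pi / 6)]))
mobile = target @ true_rot.T + np.array([5.0, -2.0, 1.0])
q, rmsd = superimpose(mobile, target)
print(f"RMSD after superimposition: {rmsd:.2e} Å")
```

A start from the identity quaternion can stall when the required rotation is exactly 180 degrees, so a practical tool would need a more careful initialisation.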

Subject(s)

Proteins , Software , Proteins/chemistry , Protein Conformation , Molecular Dynamics Simulation , Cluster Analysis , Algorithms

4.

The PANDORA Software for Anchor-Restrained Peptide:MHC Modeling.

Marzella, Dario F; Crocioni, Giulia; Parizi, Farzaneh M; Xue, Li C.

Methods Mol Biol ; 2673: 251-271, 2023.

Article in English | MEDLINE | ID: mdl-37258920

ABSTRACT

Major histocompatibility complexes (MHC) play a key role in the immune surveillance system in all jawed vertebrates. MHC class I molecules randomly sample cytosolic peptides from inside the cell, while MHC class II molecules sample exogenous peptides. Both types of peptide:MHC complex are then presented on the cell surface for recognition by αβ T cells (CD8+ and CD4+, respectively). The three-dimensional structure of such complexes can give crucial insights into the presentation and recognition mechanisms. For this reason, software such as PANDORA has been developed to rapidly and accurately generate peptide:MHC (pMHC) 3D structures. In this chapter, we describe the protocol of PANDORA. PANDORA exploits the structural knowledge on anchor pockets that MHC molecules use to dock peptides. PANDORA provides anchor positions as restraints to guide the modeling process. This allows PANDORA to generate twenty 3D models in just about 5 min. PANDORA is highly customizable, easy to install, supports parallel processing, and is suitable to provide large datasets for deep learning algorithms.

Subject(s)

Histocompatibility Antigens , Major Histocompatibility Complex , Animals , Histocompatibility Antigens Class I/genetics , Peptides/chemistry , Software

5.

MetaScore: A Novel Machine-Learning-Based Approach to Improve Traditional Scoring Functions for Scoring Protein-Protein Docking Conformations.

Jung, Yong; Geng, Cunliang; Bonvin, Alexandre M J J; Xue, Li C; Honavar, Vasant G.

Biomolecules ; 13(1), 2023 01 06.

Article in English | MEDLINE | ID: mdl-36671507

ABSTRACT

Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining the 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking (the so-called scoring problem) still has considerable room for improvement. We present MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using their protein-protein interfacial features. The features include physicochemical properties, energy terms, interaction-propensity-based features, geometric properties, interface topology features, evolutionary conservation, and also scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of the nine traditional SFs included in this work in terms of success rate and hit rate evaluated over conformations ranked among the top 10; (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by using machine learning to judiciously leverage protein-protein interfacial features and by using ensemble methods to combine multiple scoring functions.
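
The averaging step described in this abstract can be sketched as follows. The abstract does not say how the two scores are brought onto a common scale, so the min-max normalisation and the sign flip for energy-like scores below are assumptions of this sketch, and the names `min_max` and `meta_score` are hypothetical.

```python
import numpy as np

def min_max(values):
    """Rescale scores to [0, 1]; a constant vector maps to zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

def meta_score(rf_prob, sf_score, sf_lower_is_better=True):
    """Average an RF near-native probability with a normalised traditional score."""
    sf = min_max(sf_score)
    if sf_lower_is_better:
        sf = 1.0 - sf  # energy-like scores: lower energy means a better model
    return 0.5 * (np.asarray(rf_prob, dtype=float) + sf)

# Three docked models with hypothetical RF probabilities and SF energies.
rf_prob = [0.9, 0.2, 0.6]
sf_energy = [-120.0, -80.0, -100.0]
combined = meta_score(rf_prob, sf_energy)
ranking = np.argsort(-combined)  # best model first
print(combined, ranking)
```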

Subject(s)

Machine Learning , Proteins , Proteins/chemistry , Protein Binding , Ligands , Protein Conformation

6.

DeepRank-GNN: a graph neural network framework to learn patterns in protein-protein interfaces.

Réau, Manon; Renaud, Nicolas; Xue, Li C; Bonvin, Alexandre M J J.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36420989

ABSTRACT

MOTIVATION: Gaining structural insights into the protein-protein interactome is essential to understand biological phenomena and extract knowledge for rational drug design or protein engineering. We have previously developed DeepRank, a deep-learning framework to facilitate pattern learning from protein-protein interfaces using convolutional neural network (CNN) approaches. However, CNNs are not rotation invariant, and data augmentation is required to desensitize the network to the input data orientation, which dramatically impairs the computational performance. Representing protein-protein complexes as atomic- or residue-scale rotation invariant graphs instead enables using graph neural network (GNN) approaches, bypassing those limitations. RESULTS: We have developed DeepRank-GNN, a framework that converts protein-protein interfaces from PDB 3D coordinates files into graphs that are further provided to a pre-defined or user-defined GNN architecture to learn problem-specific interaction patterns. DeepRank-GNN is designed to be highly modular and easily customized, and is wrapped into a user-friendly python3 package. Here, we showcase DeepRank-GNN's performance on two applications using a dedicated graph interaction neural network: (i) the scoring of docking poses and (ii) the discrimination of biological and crystal interfaces. In addition to the highly competitive performance obtained in those tasks as compared to state-of-the-art methods, we show a significant improvement in speed and storage requirement using DeepRank-GNN as compared to DeepRank. AVAILABILITY AND IMPLEMENTATION: DeepRank-GNN is freely available from https://github.com/DeepRank/DeepRank-GNN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Neural Networks, Computer , Proteins , Proteins/chemistry

7.

Entropy and Variability: A Second Opinion by Deep Learning.

Rademaker, Daniel T; Xue, Li C; 't Hoen, Peter A C; Vriend, Gert.

Biomolecules ; 12(12), 2022 11 23.

Article in English | MEDLINE | ID: mdl-36551168

ABSTRACT

BACKGROUND: Analysis of the distribution of amino acid types found at equivalent positions in multiple sequence alignments has found applications in human genetics, protein engineering, drug design, protein structure prediction, and many other fields. These analyses tend to revolve around measures of the distribution of the twenty amino acid types found at evolutionary equivalent positions: the columns in multiple sequence alignments. Commonly used measures are variability, average hydrophobicity, or Shannon entropy. One of these techniques, called entropy-variability analysis, as the name already suggests, reduces the distribution of observed residue types in one column to two numbers: the Shannon entropy and the variability as defined by the number of residue types observed. RESULTS: We applied a deep learning, unsupervised feature extraction method to analyse the multiple sequence alignments of all human proteins. An auto-encoder neural architecture was trained on 27,835 multiple sequence alignments for human proteins to obtain the two features that best describe the seven million variability patterns. These two unsupervised learned features strongly resemble entropy and variability, indicating that these are the projections that retain most information when reducing the dimensionality of the information hidden in columns in multiple sequence alignments.
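
The two numbers that entropy-variability analysis assigns to an alignment column can be computed as in the sketch below. The logarithm base (2 here) and the decision to ignore gap characters are assumptions of this illustration, and the function name `entropy_variability` is hypothetical.

```python
from collections import Counter
import math

def entropy_variability(column, gap="-"):
    """Return (Shannon entropy, number of residue types) for one MSA column."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0, 0
    counts = Counter(residues)
    total = len(residues)
    # Sum of p * log2(1/p) over the observed residue types.
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return entropy, len(counts)

print(entropy_variability("AAAA"))  # fully conserved column
print(entropy_variability("AALV"))  # three residue types
```

A fully conserved column has zero entropy and a variability of 1, while the second column has a variability of 3 and an entropy of 1.5 bits.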

Subject(s)

Deep Learning , Humans , Amino Acid Sequence , Proteins/chemistry , Amino Acids , Referral and Consultation , Algorithms

8.

PANDORA: A Fast, Anchor-Restrained Modelling Protocol for Peptide: MHC Complexes.

Marzella, Dario F; Parizi, Farzaneh M; van Tilborg, Derek; Renaud, Nicolas; Sybrandi, Daan; Buzatu, Rafaella; Rademaker, Daniel T; 't Hoen, Peter A C; Xue, Li C.

Front Immunol ; 13: 878762, 2022.

Article in English | MEDLINE | ID: mdl-35619705

ABSTRACT

Deeper understanding of T-cell-mediated adaptive immune responses is important for the design of cancer immunotherapies and antiviral vaccines against pandemic outbreaks. T-cells are activated when they recognize foreign peptides that are presented on the cell surface by Major Histocompatibility Complexes (MHC), forming peptide:MHC (pMHC) complexes. 3D structures of pMHC complexes provide fundamental insight into T-cell recognition mechanisms and aid immunotherapy design. High MHC and peptide diversities necessitate efficient computational modelling to enable whole proteome structural analysis. We developed PANDORA, a generic modelling pipeline for pMHC class I and II (pMHC-I and pMHC-II), and present its performance on pMHC-I here. Given a query, PANDORA searches for structural templates in its extensive database and then applies anchor restraints to the modelling process. This restrained energy minimization ensures one of the fastest pMHC modelling pipelines so far. On a set of 835 pMHC-I complexes over 78 MHC types, PANDORA generated models with a median RMSD of 0.70 Å and achieved a 93% success rate in top 10 models. PANDORA performs competitively with three state-of-the-art pMHC-I modelling approaches and outperforms AlphaFold2 in terms of accuracy while being superior to it in speed. PANDORA is a modularized and user-configurable python package with easy installation. We envision PANDORA to fuel deep learning algorithms with large-scale high-quality 3D models to tackle long-standing immunology challenges.

Subject(s)

Histocompatibility Antigens , Major Histocompatibility Complex , Histocompatibility Antigens/chemistry , Models, Molecular , Peptides , Receptors, Antigen, T-Cell

9.

DeepRank: a deep learning framework for data mining 3D protein-protein interfaces.

Renaud, Nicolas; Geng, Cunliang; Georgievska, Sonja; Ambrosetti, Francesco; Ridder, Lars; Marzella, Dario F; Réau, Manon F; Bonvin, Alexandre M J J; Xue, Li C.

Nat Commun ; 12(1): 7068, 2021 12 03.

Article in English | MEDLINE | ID: mdl-34862392

ABSTRACT

Three-dimensional (3D) structures of protein complexes provide fundamental information to decipher biological processes at the molecular scale. The vast amount of experimentally and computationally resolved protein-protein interfaces (PPIs) offers the possibility of training deep learning models to aid the predictions of their biological relevance. We present here DeepRank, a general, configurable deep learning framework for data mining PPIs using 3D convolutional neural networks (CNNs). DeepRank maps features of PPIs onto 3D grids and trains a user-specified CNN on these 3D grids. DeepRank allows for efficient training of 3D CNNs with data sets containing millions of PPIs and supports both classification and regression. We demonstrate the performance of DeepRank on two distinct challenges: The classification of biological versus crystallographic PPIs, and the ranking of docking models. For both problems DeepRank is competitive with, or outperforms, state-of-the-art methods, demonstrating the versatility of the framework for research in structural biology.
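
Mapping per-atom features onto a 3D grid, as this abstract describes, can be illustrated with a simple voxel binning. The sketch assigns each atom's value to the voxel that contains it, which is a simplification chosen for brevity and not necessarily the mapping DeepRank uses. The function name `map_to_grid`, the box size and the grid resolution are hypothetical.

```python
import numpy as np

def map_to_grid(coords, values, center, box=20.0, n=10):
    """Sum per-atom feature values into an n x n x n grid centred on `center`."""
    half = box / 2.0
    ranges = [(c - half, c + half) for c in center]
    # Atoms outside the box are ignored by histogramdd.
    grid, _ = np.histogramdd(coords, bins=(n, n, n), range=ranges, weights=values)
    return grid

# Three hypothetical interface atoms carrying one feature (e.g. a partial charge).
coords = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [50.0, 0.0, 0.0]])
values = np.array([1.0, 2.0, 5.0])
grid = map_to_grid(coords, values, center=(0.0, 0.0, 0.0))
print(grid.shape, grid.sum())
```

One grid of this kind per feature, stacked as channels, gives the multi-channel 3D input that a 3D CNN expects.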

Subject(s)

Data Mining/methods , Deep Learning , Protein Interaction Mapping/methods , Crystallography , Datasets as Topic , Molecular Docking Simulation , Protein Interaction Domains and Motifs , Protein Interaction Maps

10.

iScore: An MPI supported software for ranking protein-protein docking models based on a random walk graph kernel and support vector machines.

Renaud, Nicolas; Jung, Yong; Honavar, Vasant; Geng, Cunliang; Bonvin, Alexandre M J J; Xue, Li C.

SoftwareX ; 11, 2020.

Article in English | MEDLINE | ID: mdl-35419466

ABSTRACT

Computational docking is a promising tool to model three-dimensional (3D) structures of protein-protein complexes, which provide fundamental insights into protein functions in the cell. Singling out near-native models from the huge pool of generated docking models (referred to as the scoring problem) remains a major challenge in computational docking. We recently published iScore, a novel graph kernel based scoring function. iScore ranks docking models based on their interface graph similarities to the training interface graph set. iScore uses a support vector machine approach with random-walk graph kernels to classify and rank protein-protein interfaces. Here, we present the software for iScore. The software provides executable scripts that fully automate the computational workflow. In addition, the creation and analysis of the interface graph can be distributed across different processes using the Message Passing Interface (MPI) and can be offloaded to GPUs thanks to dedicated CUDA kernels.

11.

iScore: a novel graph kernel-based function for scoring protein-protein docking models.

Geng, Cunliang; Jung, Yong; Renaud, Nicolas; Honavar, Vasant; Bonvin, Alexandre M J J; Xue, Li C.

Bioinformatics ; 36(1): 112-121, 2020 01 01.

Article in English | MEDLINE | ID: mdl-31199455

ABSTRACT

MOTIVATION: Protein complexes play critical roles in many aspects of biological functions. Three-dimensional (3D) structures of protein complexes are critical for gaining insights into structural bases of interactions and their roles in the biomolecular pathways that orchestrate key cellular processes. Because of the expense and effort associated with experimental determinations of 3D protein complex structures, computational docking has evolved as a valuable tool to predict 3D structures of biomolecular complexes. Despite recent progress, reliably distinguishing near-native docking conformations from a large number of candidate conformations, the so-called scoring problem, remains a major challenge. RESULTS: Here we present iScore, a novel approach to scoring docked conformations that combines HADDOCK energy terms with a score obtained using a graph representation of the protein-protein interfaces and a measure of evolutionary conservation. It achieves a scoring performance competitive with, or superior to, that of state-of-the-art scoring functions on two independent datasets: (i) docking software-specific models and (ii) the CAPRI score set generated by a wide variety of docking approaches (i.e. docking software-non-specific). iScore ranks among the top scoring approaches on the CAPRI score set (13 targets) when compared with the 37 scoring groups in CAPRI. The results demonstrate the utility of combining evolutionary, topological and energetic information for scoring docked conformations. This work represents the first successful application of graph kernels to protein interfaces for effective discrimination of near-native and non-native conformations of protein complexes. AVAILABILITY AND IMPLEMENTATION: The iScore code is freely available from GitHub: https://github.com/DeepRank/iScore (DOI: 10.5281/zenodo.2630567). The docking models used are available from SBGrid: https://data.sbgrid.org/dataset/684. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Computational Biology , Molecular Docking Simulation , Proteins , Computational Biology/methods , Molecular Docking Simulation/methods , Protein Binding , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Software

12.

An overview of data-driven HADDOCK strategies in CAPRI rounds 38-45.

Koukos, Panagiotis I; Roel-Touris, Jorge; Ambrosetti, Francesco; Geng, Cunliang; Schaarschmidt, Jörg; Trellet, Mikael E; Melquiond, Adrien S J; Xue, Li C; Honorato, Rodrigo V; Moreira, Irina; Kurkcuoglu, Zeynep; Vangone, Anna; Bonvin, Alexandre M J J.

Proteins ; 88(8): 1029-1036, 2020 08.

Article in English | MEDLINE | ID: mdl-31886559

ABSTRACT

Our information-driven docking approach HADDOCK has demonstrated a sustained performance since the start of its participation in CAPRI. This is due, in part, to its ability to integrate data into the modeling process, and to the robustness of its scoring function. We participated in CAPRI both as server and manual predictors. In CAPRI rounds 38-45, we have used various strategies depending on the available information. These ranged from imposing restraints on a few residues identified from literature as being important for the interaction, to binding pockets identified from homologous complexes or template-based refinement/CA-CA restraint-guided docking from identified templates. When relevant, symmetry restraints were used to limit the conformational sampling. For a large decamer target, we also tested a new implementation of the MARTINI coarse-grained force field in HADDOCK. Overall, we obtained acceptable or better predictions for 13 and 11 server and manual submissions, respectively, out of the 22 interfaces. Our server performance (acceptable or higher-quality models when considering the top 10) was better (59%) than the manual (50%) one, in which we typically experiment with various combinations of protocols and data sources. Again, our simple scoring function based on a linear combination of intermolecular van der Waals and electrostatic energies and an empirical desolvation term demonstrated a good performance in the scoring experiment with a 63% success rate across all 22 interfaces. An analysis of model quality indicates that, while we are consistently performing well in generating acceptable models, there is room for improvement for generating/identifying higher quality models.

Subject(s)

Molecular Docking Simulation , Peptides/chemistry , Proteins/chemistry , Software , Amino Acid Sequence , Binding Sites , Humans , Ligands , Peptides/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Interaction Mapping , Protein Multimerization , Proteins/metabolism , Research Design , Structural Homology, Protein , Thermodynamics

13.

Large-scale prediction of binding affinity in protein-small ligand complexes: the PRODIGY-LIG web server.

Vangone, Anna; Schaarschmidt, Joerg; Koukos, Panagiotis; Geng, Cunliang; Citro, Nevia; Trellet, Mikael E; Xue, Li C; Bonvin, Alexandre M J J.

Bioinformatics ; 35(9): 1585-1587, 2019 05 01.

Article in English | MEDLINE | ID: mdl-31051038

ABSTRACT

SUMMARY: Recently we published PROtein binDIng enerGY (PRODIGY), a web-server for the prediction of binding affinity in protein-protein complexes. By using a combination of simple structural properties, such as the residue-contacts made at the interface, PRODIGY has demonstrated a top performance compared with other state-of-the-art predictors in the literature. Here we present an extension of it, named PRODIGY-LIG, aimed at the prediction of affinity in protein-small ligand complexes. The predictive method, adapted for small ligands by making use of atomic instead of residue contacts, has been successfully applied for the blind prediction of 102 protein-ligand complexes during the D3R Grand Challenge 2. PRODIGY-LIG has the advantage of being simple, generic and applicable to any kind of protein-ligand complex. It provides an automatic, fast and user-friendly tool ensuring broad accessibility. AVAILABILITY AND IMPLEMENTATION: PRODIGY-LIG is freely available without registration requirements at http://milou.science.uu.nl/services/PRODIGY-LIG.
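
Counting atomic contacts between a protein and a ligand, the kind of structural property this abstract refers to, can be sketched as below. The distance cutoff is an illustrative placeholder and not the value used by PRODIGY-LIG. The function name `count_atomic_contacts` is hypothetical, and the sketch does not classify contacts by atom type.

```python
import numpy as np

def count_atomic_contacts(protein_xyz, ligand_xyz, cutoff=5.0):
    """Number of protein-ligand atom pairs closer than `cutoff` (in Å)."""
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    # Pairwise distance matrix of shape (n_protein_atoms, n_ligand_atoms).
    deltas = protein_xyz[:, None, :] - ligand_xyz[None, :, :]
    distances = np.linalg.norm(deltas, axis=-1)
    return int((distances <= cutoff).sum())

# Two protein atoms and one ligand atom (hypothetical coordinates, in Å).
protein = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
ligand = [[3.0, 0.0, 0.0]]
print(count_atomic_contacts(protein, ligand))
```

A contact-based predictor would then feed such counts, typically split by atom type, into a regression model.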

Subject(s)

Computers , Software , Binding Sites , Internet , Ligands , Protein Binding , Protein Conformation

14.

Protein-ligand pose and affinity prediction: Lessons from D3R Grand Challenge 3.

Koukos, Panagiotis I; Xue, Li C; Bonvin, Alexandre M J J.

J Comput Aided Mol Des ; 33(1): 83-91, 2019 01.

Article in English | MEDLINE | ID: mdl-30128928

ABSTRACT

We report the performance of HADDOCK in the 2018 iteration of the Grand Challenge organised by the D3R consortium. Building on the findings of our participation in last year's challenge, we significantly improved our pose prediction protocol which resulted in a mean RMSD for the top scoring pose of 3.04 and 2.67 Å for the cross-docking and self-docking experiments respectively, which corresponds to an overall success rate of 63% and 71% when considering the top1 and top5 models respectively. This performance ranks HADDOCK as the 6th and 3rd best performing group (excluding multiple submissions from the same group) out of a total of 44 and 47 submissions respectively. Our ligand-based binding affinity predictor is the 3rd best predictor overall, behind only the two leading structure-based implementations, and the best ligand-based one with a Kendall's Tau correlation of 0.36 for the Cathepsin challenge. It also performed well in the classification part of the Kinase challenges, with Matthews Correlation Coefficients of 0.49 (ranked 1st), 0.39 (ranked 4th) and 0.21 (ranked 4th) for the JAK2, vEGFR2 and p38a targets respectively. Through our participation in last year's competition we came to the conclusion that template selection is of critical importance for the successful outcome of the docking. This year we have made improvements in two additional areas of importance: ligand conformer selection and initial positioning, which have been key to our excellent pose prediction performance this year.

Subject(s)

Cathepsins/chemistry , Molecular Docking Simulation/methods , Protein Kinases/chemistry , Binding Sites , Computer-Aided Design , Crystallography, X-Ray , Databases, Protein , Drug Design , Ligands , Molecular Conformation , Protein Binding , Thermodynamics

15.

iSEE: Interface structure, evolution, and energy-based machine learning predictor of binding affinity changes upon mutations.

Geng, Cunliang; Vangone, Anna; Folkers, Gert E; Xue, Li C; Bonvin, Alexandre M J J.

Proteins ; 87(2): 110-119, 2019 02.

Article in English | MEDLINE | ID: mdl-30417935

ABSTRACT

Quantitative evaluation of binding affinity changes upon mutations is crucial for protein engineering and drug design. Machine learning-based methods are gaining increasing momentum in this field. Due to the limited amount of experimental data, using a small number of sensitive predictive features is vital to the generalization and robustness of such machine learning methods. Here we introduce a fast and reliable predictor of binding affinity changes upon single point mutation, based on a random forest approach. Our method, iSEE, uses a limited number of interface Structure, Evolution, and Energy-based features for the prediction. iSEE achieves, using only 31 features, a high prediction performance with a Pearson correlation coefficient (PCC) of 0.80 and a root mean square error of 1.41 kcal/mol on a diverse training dataset consisting of 1102 mutations in 57 protein-protein complexes. It competes with existing state-of-the-art methods on two blind test datasets. Predictions for a new dataset of 487 mutations in 56 protein complexes from the recently published SKEMPI 2.0 database reveal that none of the current methods perform well (PCC < 0.42), although their combination does improve the predictions. Feature analysis for iSEE underlines the significance of evolutionary conservation for quantitative prediction of mutation effects. As an application example, we perform a full mutation scanning of the interface residues in the MDM2-p53 complex.

Subject(s)

Computational Biology/methods , Machine Learning , Mutation , Proteins/genetics , Binding, Competitive , Evolution, Molecular , Models, Molecular , Protein Binding , Protein Domains , Proteins/chemistry , Proteins/metabolism , Proto-Oncogene Proteins c-mdm2/chemistry , Proto-Oncogene Proteins c-mdm2/genetics , Proto-Oncogene Proteins c-mdm2/metabolism , Thermodynamics , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism

16.

Performance of HADDOCK and a simple contact-based protein-ligand binding affinity predictor in the D3R Grand Challenge 2.

Kurkcuoglu, Zeynep; Koukos, Panagiotis I; Citro, Nevia; Trellet, Mikael E; Rodrigues, J P G L M; Moreira, Irina S; Roel-Touris, Jorge; Melquiond, Adrien S J; Geng, Cunliang; Schaarschmidt, Jörg; Xue, Li C; Vangone, Anna; Bonvin, A M J J.

J Comput Aided Mol Des ; 32(1): 175-185, 2018 01.

Article in English | MEDLINE | ID: mdl-28831657

ABSTRACT

We present the performance of HADDOCK, our information-driven docking software, in the second edition of the D3R Grand Challenge. In this blind experiment, participants were requested to predict the structures and binding affinities of complexes between the Farnesoid X nuclear receptor and 102 different ligands. The models obtained in Stage1 with HADDOCK and a ligand-specific protocol show an average ligand RMSD of 5.1 Å from the crystal structure. Only 6/35 targets were within 2.5 Å RMSD from the reference, which prompted us to investigate the limiting factors and revise our protocol for Stage2. The choice of the receptor conformation appeared to have the strongest influence on the results. Our Stage2 models were of higher quality (13 out of 35 were within 2.5 Å), with an average RMSD of 4.1 Å. The docking protocol was applied to all 102 ligands to generate poses for binding affinity prediction. We developed a modified version of our contact-based binding affinity predictor PRODIGY, using the number of interatomic contacts classified by their type and the intermolecular electrostatic energy. This simple structure-based binding affinity predictor shows a Kendall's Tau correlation of 0.37 in ranking the ligands (7th best out of 77 methods, 5th/25 groups). Those results were obtained from the average prediction over the top10 poses, irrespective of their similarity/correctness, underscoring the robustness of our simple predictor. This results in an enrichment factor of 2.5 compared to a random predictor for ranking ligands within the top 25%, making it a promising approach to identify lead compounds in virtual screening.
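
The enrichment factor mentioned at the end of this abstract compares the fraction of true binders in the top-ranked subset with the fraction expected from random selection. A minimal sketch follows. It assumes that lower predicted values mean stronger binding, and the function name `enrichment_factor` and the toy data are hypothetical.

```python
def enrichment_factor(predicted, is_active, fraction=0.25):
    """Enrichment of actives in the top `fraction` of the ranking versus random."""
    n = len(predicted)
    total_actives = sum(is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need at least one ligand and one active")
    n_top = max(1, int(round(n * fraction)))
    # Assumption: lower predicted value = stronger binder, so sort ascending.
    order = sorted(range(n), key=lambda i: predicted[i])
    top_actives = sum(is_active[i] for i in order[:n_top])
    return (top_actives / n_top) / (total_actives / n)

# Eight hypothetical ligands, two of which are true strong binders.
predicted = [-9.0, -8.5, -7.0, -6.5, -6.0, -5.5, -5.0, -4.5]
is_active = [1, 0, 0, 1, 0, 0, 0, 0]
print(enrichment_factor(predicted, is_active))
```

In this toy case one of the two actives is in the top quarter, so the hit rate there (1/2) is twice the overall rate (2/8), giving an enrichment factor of 2.0.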

Subject(s)

Drug Discovery , Molecular Docking Simulation , Cytoplasmic and Nuclear Receptors/metabolism , Software , Binding Sites , Computer-Aided Design , X-Ray Crystallography , Drug Design , Humans , Ligands , Protein Binding , Protein Conformation , Cytoplasmic and Nuclear Receptors/agonists , Cytoplasmic and Nuclear Receptors/antagonists & inhibitors , Cytoplasmic and Nuclear Receptors/chemistry , Thermodynamics

17.

Template-based protein-protein docking exploiting pairwise interfacial residue restraints.

Xue, Li C; Rodrigues, João P G L M; Dobbs, Drena; Honavar, Vasant; Bonvin, Alexandre M J J.

Brief Bioinform ; 18(3): 458-466, 2017 05 01.

Article in English | MEDLINE | ID: mdl-27013645

ABSTRACT

Although many advanced and sophisticated ab initio approaches for modeling protein-protein complexes have been proposed in past decades, template-based modeling (TBM) remains the most accurate and widely used approach, given a reliable template is available. However, there are many different ways to exploit template information in the modeling process. Here, we systematically evaluate and benchmark a TBM method that uses conserved interfacial residue pairs as docking distance restraints [referred to as alpha carbon-alpha carbon (CA-CA)-guided docking]. We compare it with two other template-based protein-protein modeling approaches, including a conserved non-pairwise interfacial residue restrained docking approach [referred to as the ambiguous interaction restraint (AIR)-guided docking] and a simple superposition-based modeling approach. Our results show that, for most cases, the CA-CA-guided docking method outperforms both superposition with refinement and the AIR-guided docking method. We emphasize the superiority of the CA-CA-guided docking on cases with medium to large conformational changes, and interactions mediated through loops, tails or disordered regions. Our results also underscore the importance of a proper refinement of superimposition models to reduce steric clashes. In summary, we provide a benchmarked TBM protocol that uses conserved pairwise interface distance as restraints in generating realistic 3D protein-protein interaction models, when reliable templates are available. The described CA-CA-guided docking protocol is based on the HADDOCK platform, which allows users to incorporate additional prior knowledge of the target system to further improve the quality of the resulting models.
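As a rough illustration of the pairwise restraint idea described in this abstract, the Python sketch below lists CA-CA distances between interface residue pairs of a template complex. It is a sketch under stated assumptions, not the published protocol: the function name, the input layout (residue number mapped to CA coordinates) and the cutoff value are hypothetical, the mapping of template residues onto the target via sequence alignment is omitted, and the output is not in HADDOCK's restraint file format.

```python
import math


def ca_ca_restraints(template_a, template_b, cutoff=15.0):
    """Collect CA-CA distances between the two chains of a template complex.

    template_a, template_b: dict mapping residue number -> (x, y, z) of its CA atom.
    cutoff: hypothetical distance threshold (in Angstrom) defining an interface pair.
    Returns a list of (residue_a, residue_b, distance) tuples.
    """
    restraints = []
    for res_a, xyz_a in sorted(template_a.items()):
        for res_b, xyz_b in sorted(template_b.items()):
            distance = math.dist(xyz_a, xyz_b)
            if distance <= cutoff:
                restraints.append((res_a, res_b, round(distance, 2)))
    return restraints


# Toy coordinates, purely illustrative.
chain_a = {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0)}
chain_b = {50: (0.0, 6.0, 0.0), 51: (0.0, 30.0, 0.0)}
pairs = ca_ca_restraints(chain_a, chain_b, cutoff=8.0)
```

Each returned tuple could then serve as a target distance between the corresponding residues of the two proteins being docked.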

Subject(s)

Proteins/metabolism , Molecular Models , Protein Binding

18.

PRODIGY: a web server for predicting the binding affinity of protein-protein complexes.

Xue, Li C; Rodrigues, João P G L M; Kastritis, Panagiotis L; Bonvin, Alexandre M J J; Vangone, Anna.

Bioinformatics ; 32(23): 3676-3678, 2016 12 01.

Article in English | MEDLINE | ID: mdl-27503228

ABSTRACT

Gaining insights into the structural determinants of protein-protein interactions holds the key for a deeper understanding of biological functions, diseases and development of therapeutics. An important aspect of this is the ability to accurately predict the binding strength for a given protein-protein complex. Here we present PROtein binDIng enerGY prediction (PRODIGY), a web server to predict the binding affinity of protein-protein complexes from their 3D structure. The PRODIGY server implements our simple but highly effective predictive model based on intermolecular contacts and properties derived from non-interface surface. AVAILABILITY AND IMPLEMENTATION: PRODIGY is freely available at http://milou.science.uu.nl/services/PRODIGY . CONTACT: a.m.j.j.bonvin@uu.nl, a.vangone@uu.nl.
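The "intermolecular contacts" feature mentioned in this abstract can be pictured with a short Python sketch that counts residue-residue contacts across an interface and groups them by residue class. This is an assumption-laden illustration, not the PRODIGY implementation: the function name, the input layout, the distance cutoff and the reduced residue classification are all hypothetical here, and no affinity model or coefficients are included.

```python
import math
from collections import Counter

# Illustrative residue classes (an assumption; only a few residue types listed).
RESIDUE_CLASS = {
    "LYS": "charged", "ARG": "charged", "ASP": "charged", "GLU": "charged",
    "SER": "polar", "THR": "polar", "ASN": "polar", "GLN": "polar",
    "LEU": "apolar", "ILE": "apolar", "VAL": "apolar", "PHE": "apolar",
}


def count_contacts(atoms_a, atoms_b, cutoff=5.5):
    """Count residue pairs with any atom pair within `cutoff` Angstrom.

    atoms_a, atoms_b: lists of (residue_number, residue_name, (x, y, z)).
    Returns a Counter keyed by the sorted pair of residue classes.
    """
    seen = set()
    counts = Counter()
    for res_a, name_a, xyz_a in atoms_a:
        for res_b, name_b, xyz_b in atoms_b:
            if (res_a, res_b) in seen:
                continue  # this residue pair is already counted
            if math.dist(xyz_a, xyz_b) <= cutoff:
                seen.add((res_a, res_b))
                key = tuple(sorted((RESIDUE_CLASS[name_a], RESIDUE_CLASS[name_b])))
                counts[key] += 1
    return counts


# Toy interface, purely illustrative.
partner_a = [(1, "LYS", (0.0, 0.0, 0.0)), (1, "LYS", (1.0, 0.0, 0.0)), (2, "LEU", (20.0, 0.0, 0.0))]
partner_b = [(10, "ASP", (3.0, 0.0, 0.0)), (11, "SER", (22.0, 0.0, 0.0))]
contacts = count_contacts(partner_a, partner_b)
```

Counts of this kind are the sort of input a simple regression model could combine with surface-derived properties to estimate binding affinity.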

Subject(s)

Computational Biology/methods , Internet , Protein Interaction Mapping/methods , Software , Protein Binding , Protein Conformation

19.

Computational prediction of protein interfaces: A review of data driven methods.

Xue, Li C; Dobbs, Drena; Bonvin, Alexandre M J J; Honavar, Vasant.

FEBS Lett ; 589(23): 3516-26, 2015 Nov 30.

Article in English | MEDLINE | ID: mdl-26460190

ABSTRACT

Reliably pinpointing which specific amino acid residues form the interface(s) between a protein and its binding partner(s) is critical for understanding the structural and physicochemical determinants of protein recognition and binding affinity, and has wide applications in modeling and validating protein interactions predicted by high-throughput methods, in engineering proteins, and in prioritizing drug targets. Here, we review the basic concepts, principles and recent advances in computational approaches to the analysis and prediction of protein-protein interfaces. We point out caveats for objectively evaluating interface predictors, and discuss various applications of data-driven interface predictors for improving energy model-driven protein-protein docking. Finally, we stress the importance of exploiting binding partner information in reliably predicting interfaces and highlight recent advances in this emerging direction.

Subject(s)

Computational Biology/methods , Proteins/metabolism , Molecular Docking Simulation , Protein Binding , Proteins/chemistry , Substrate Specificity

20.

RNABindRPlus: a predictor that combines machine learning and sequence homology-based methods to improve the reliability of predicted RNA-binding residues in proteins.

Walia, Rasna R; Xue, Li C; Wilkins, Katherine; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant.

PLoS One ; 9(5): e97725, 2014.

Article in English | MEDLINE | ID: mdl-24846307

ABSTRACT

Protein-RNA interactions are central to essential cellular processes such as protein synthesis and regulation of gene expression and play roles in human infectious and genetic diseases. Reliable identification of protein-RNA interfaces is critical for understanding the structural bases and functional implications of such interactions and for developing effective approaches to rational drug design. Sequence-based computational methods offer a viable, cost-effective way to identify putative RNA-binding residues in RNA-binding proteins. Here we report two novel approaches: (i) HomPRIP, a sequence homology-based method for predicting RNA-binding sites in proteins; (ii) RNABindRPlus, a new method that combines predictions from HomPRIP with those from an optimized Support Vector Machine (SVM) classifier trained on a benchmark dataset of 198 RNA-binding proteins. Although highly reliable, HomPRIP cannot make predictions for the unaligned parts of query proteins and its coverage is limited by the availability of close sequence homologs of the query protein with experimentally determined RNA-binding sites. RNABindRPlus overcomes these limitations. We compared the performance of HomPRIP and RNABindRPlus with that of several state-of-the-art predictors on two test sets, RB44 and RB111. On a subset of proteins for which homologs with experimentally determined interfaces could be reliably identified, HomPRIP outperformed all other methods achieving an MCC of 0.63 on RB44 and 0.83 on RB111. RNABindRPlus was able to predict RNA-binding residues of all proteins in both test sets, achieving an MCC of 0.55 and 0.37, respectively, and outperforming all other methods, including those that make use of structure-derived features of proteins. More importantly, RNABindRPlus outperforms all other methods for any choice of tradeoff between precision and recall. An important advantage of both HomPRIP and RNABindRPlus is that they rely on readily available sequence and sequence-derived features of RNA-binding proteins. A webserver implementation of both methods is freely available at http://einstein.cs.iastate.edu/RNABindRPlus/.
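The MCC (Matthews correlation coefficient) values reported in this abstract summarize per-residue binary predictions. A minimal Python sketch of the standard formula follows; the function name and the convention of returning 0.0 when the denominator vanishes are choices made here for illustration, not details taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal count is zero (a common convention, assumed here).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Here `tp` would be the number of residues correctly predicted as RNA-binding, `tn` those correctly predicted as non-binding, and `fp` and `fn` the two kinds of error; the score ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction).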

Subject(s)

Artificial Intelligence , Theoretical Models , RNA-Binding Proteins/genetics , Protein Sequence Analysis/methods , RNA Sequence Analysis/methods , Animals , Humans
