Search | Secretaria de Estado da Saúde (State Department of Health)

1.

GradPose: a very fast and memory-efficient gradient descent-based tool for superimposing millions of protein structures from computational simulations.

Rademaker, Daniel T; van Geemen, Kevin J; Xue, Li C.

Bioinformatics ; 39(8), 2023 08 01.

Article in English | MEDLINE | ID: mdl-37471594

ABSTRACT

SUMMARY: Computational simulations like molecular dynamics and docking are providing crucial insights into the dynamics and interaction conformations of proteins, complementing experimental methods for determining protein structures. These methods often generate millions of protein conformations, necessitating highly efficient structure comparison and clustering methods to analyze the results. In this article, we introduce GradPose, a fast and memory-efficient structural superimposition tool for models generated by these large-scale simulations. GradPose uses gradient descent to optimally superimpose structures by optimizing rotation quaternions and can handle insertions and deletions compared to the reference structure. It is capable of superimposing thousands to millions of protein structures on standard hardware and utilizes multiple CPU cores and, if available, CUDA acceleration to further decrease superimposition time. Our results indicate that GradPose generally outperforms traditional methods, with a speed improvement of 2-65 times and memory requirement reduction of 1.7-48 times, with larger protein structures benefiting the most. We observed that traditional methods outperformed GradPose only with very small proteins consisting of ~20 residues. The prerequisite of GradPose is that residue-residue correspondence is predetermined. With GradPose, we aim to provide a computationally efficient solution for handling the demand for structural alignment in the computational simulation field. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://github.com/X-lab-3D/GradPose; doi:10.5281/zenodo.7671922.

Subjects

Proteins , Software , Proteins/chemistry , Protein Conformation , Molecular Dynamics Simulation , Cluster Analysis , Algorithms
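
The idea summarized in the GradPose abstract, superimposing structures by optimizing a rotation quaternion with gradient descent, can be illustrated with a small NumPy sketch. This is not GradPose's implementation: the function names, the finite-difference gradient, the learning rate, and the step count are all assumptions chosen for readability, and the sketch presumes that residue-residue correspondence is already known (row i of one array matches row i of the other).

```python
import numpy as np

def quat_to_matrix(q):
    """Rotation matrix for quaternion q = (w, x, y, z); q is normalized first."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def msd(q, mobile, ref):
    """Mean squared deviation after rotating `mobile` by quaternion q."""
    rotated = mobile @ quat_to_matrix(q).T
    return np.mean(np.sum((rotated - ref) ** 2, axis=1))

def superimpose(mobile, ref, steps=500, lr=0.05, eps=1e-6):
    """Hypothetical helper: fit a rotation quaternion by plain gradient descent."""
    # Centering both coordinate sets removes the translation component.
    mobile_c = mobile - mobile.mean(axis=0)
    ref_c = ref - ref.mean(axis=0)
    q = np.array([1.0, 0.0, 0.0, 0.0])  # start from the identity rotation
    for _ in range(steps):
        grad = np.zeros(4)
        for k in range(4):
            d = np.zeros(4)
            d[k] = eps
            # Central finite difference (assumption: stands in for autodiff).
            grad[k] = (msd(q + d, mobile_c, ref_c) - msd(q - d, mobile_c, ref_c)) / (2 * eps)
        q = q - lr * grad
        q /= np.linalg.norm(q)  # keep q a unit quaternion
    return q, np.sqrt(msd(q, mobile_c, ref_c))
```

The step size assumes coordinates of roughly unit scale; real atomic coordinates would likely need a smaller learning rate or a rescaling step. A production tool would also batch many structures at once and use analytic or automatic differentiation rather than finite differences.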

2.

DeepRank-GNN: a graph neural network framework to learn patterns in protein-protein interfaces.

Réau, Manon; Renaud, Nicolas; Xue, Li C; Bonvin, Alexandre M J J.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36420989

ABSTRACT

MOTIVATION: Gaining structural insights into the protein-protein interactome is essential to understand biological phenomena and extract knowledge for rational drug design or protein engineering. We have previously developed DeepRank, a deep-learning framework to facilitate pattern learning from protein-protein interfaces using convolutional neural network (CNN) approaches. However, CNN is not rotation invariant and data augmentation is required to desensitize the network to the input data orientation which dramatically impairs the computation performance. Representing protein-protein complexes as atomic- or residue-scale rotation invariant graphs instead enables using graph neural networks (GNN) approaches, bypassing those limitations. RESULTS: We have developed DeepRank-GNN, a framework that converts protein-protein interfaces from PDB 3D coordinates files into graphs that are further provided to a pre-defined or user-defined GNN architecture to learn problem-specific interaction patterns. DeepRank-GNN is designed to be highly modularizable, easily customized and is wrapped into a user-friendly python3 package. Here, we showcase DeepRank-GNN's performance on two applications using a dedicated graph interaction neural network: (i) the scoring of docking poses and (ii) the discriminating of biological and crystal interfaces. In addition to the highly competitive performance obtained in those tasks as compared to state-of-the-art methods, we show a significant improvement in speed and storage requirement using DeepRank-GNN as compared to DeepRank. AVAILABILITY AND IMPLEMENTATION: DeepRank-GNN is freely available from https://github.com/DeepRank/DeepRank-GNN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Neural Networks, Computer , Proteins , Proteins/chemistry

3.

iScore: a novel graph kernel-based function for scoring protein-protein docking models.

Geng, Cunliang; Jung, Yong; Renaud, Nicolas; Honavar, Vasant; Bonvin, Alexandre M J J; Xue, Li C.

Bioinformatics ; 36(1): 112-121, 2020 01 01.

Article in English | MEDLINE | ID: mdl-31199455

ABSTRACT

MOTIVATION: Protein complexes play critical roles in many aspects of biological functions. Three-dimensional (3D) structures of protein complexes are critical for gaining insights into structural bases of interactions and their roles in the biomolecular pathways that orchestrate key cellular processes. Because of the expense and effort associated with experimental determinations of 3D protein complex structures, computational docking has evolved as a valuable tool to predict 3D structures of biomolecular complexes. Despite recent progress, reliably distinguishing near-native docking conformations from a large number of candidate conformations, the so-called scoring problem, remains a major challenge. RESULTS: Here we present iScore, a novel approach to scoring docked conformations that combines HADDOCK energy terms with a score obtained using a graph representation of the protein-protein interfaces and a measure of evolutionary conservation. It achieves a scoring performance competitive with, or superior to, that of state-of-the-art scoring functions on two independent datasets: (i) docking software-specific models and (ii) the CAPRI score set generated by a wide variety of docking approaches (i.e. docking software-non-specific). iScore ranks among the top scoring approaches on the CAPRI score set (13 targets) when compared with the 37 scoring groups in CAPRI. The results demonstrate the utility of combining evolutionary, topological and energetic information for scoring docked conformations. This work represents the first successful application of graph kernels to protein interfaces for effective discrimination of near-native and non-native conformations of protein complexes. AVAILABILITY AND IMPLEMENTATION: The iScore code is freely available from GitHub: https://github.com/DeepRank/iScore (DOI: 10.5281/zenodo.2630567). The docking models used are available from SBGrid: https://data.sbgrid.org/dataset/684. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Computational Biology , Molecular Docking Simulation , Proteins , Computational Biology/methods , Molecular Docking Simulation/methods , Protein Binding , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Software

4.

An overview of data-driven HADDOCK strategies in CAPRI rounds 38-45.

Koukos, Panagiotis I; Roel-Touris, Jorge; Ambrosetti, Francesco; Geng, Cunliang; Schaarschmidt, Jörg; Trellet, Mikael E; Melquiond, Adrien S J; Xue, Li C; Honorato, Rodrigo V; Moreira, Irina; Kurkcuoglu, Zeynep; Vangone, Anna; Bonvin, Alexandre M J J.

Proteins ; 88(8): 1029-1036, 2020 08.

Article in English | MEDLINE | ID: mdl-31886559

ABSTRACT

Our information-driven docking approach HADDOCK has demonstrated a sustained performance since the start of its participation to CAPRI. This is due, in part, to its ability to integrate data into the modeling process, and to the robustness of its scoring function. We participated in CAPRI both as server and manual predictors. In CAPRI rounds 38-45, we have used various strategies depending on the available information. These ranged from imposing restraints to a few residues identified from literature as being important for the interaction, to binding pockets identified from homologous complexes or template-based refinement/CA-CA restraint-guided docking from identified templates. When relevant, symmetry restraints were used to limit the conformational sampling. We also tested for a large decamer target a new implementation of the MARTINI coarse-grained force field in HADDOCK. Overall, we obtained acceptable or better predictions for 13 and 11 server and manual submissions, respectively, out of the 22 interfaces. Our server performance (acceptable or higher-quality models when considering the top 10) was better (59%) than the manual (50%) one, in which we typically experiment with various combinations of protocols and data sources. Again, our simple scoring function based on a linear combination of intermolecular van der Waals and electrostatic energies and an empirical desolvation term demonstrated a good performance in the scoring experiment with a 63% success rate across all 22 interfaces. An analysis of model quality indicates that, while we are consistently performing well in generating acceptable models, there is room for improvement for generating/identifying higher quality models.

Subjects

Molecular Docking Simulation , Peptides/chemistry , Proteins/chemistry , Software , Amino Acid Sequence , Binding Sites , Humans , Ligands , Peptides/metabolism , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Interaction Mapping , Protein Multimerization , Proteins/metabolism , Research Design , Structural Homology, Protein , Thermodynamics

5.

Large-scale prediction of binding affinity in protein-small ligand complexes: the PRODIGY-LIG web server.

Vangone, Anna; Schaarschmidt, Joerg; Koukos, Panagiotis; Geng, Cunliang; Citro, Nevia; Trellet, Mikael E; Xue, Li C; Bonvin, Alexandre M J J.

Bioinformatics ; 35(9): 1585-1587, 2019 05 01.

Article in English | MEDLINE | ID: mdl-31051038

ABSTRACT

SUMMARY: Recently we published PROtein binDIng enerGY (PRODIGY), a web-server for the prediction of binding affinity in protein-protein complexes. By using a combination of simple structural properties, such as the residue-contacts made at the interface, PRODIGY has demonstrated a top performance compared with other state-of-the-art predictors in the literature. Here we present an extension of it, named PRODIGY-LIG, aimed at the prediction of affinity in protein-small ligand complexes. The predictive method, properly readapted for small ligand by making use of atomic instead of residue contacts, has been successfully applied for the blind prediction of 102 protein-ligand complexes during the D3R Grand Challenge 2. PRODIGY-LIG has the advantage of being simple, generic and applicable to any kind of protein-ligand complex. It provides an automatic, fast and user-friendly tool ensuring broad accessibility. AVAILABILITY AND IMPLEMENTATION: PRODIGY-LIG is freely available without registration requirements at http://milou.science.uu.nl/services/PRODIGY-LIG.

Subjects

Computers , Software , Binding Sites , Internet , Ligands , Protein Binding , Protein Conformation

6.

iSEE: Interface structure, evolution, and energy-based machine learning predictor of binding affinity changes upon mutations.

Geng, Cunliang; Vangone, Anna; Folkers, Gert E; Xue, Li C; Bonvin, Alexandre M J J.

Proteins ; 87(2): 110-119, 2019 02.

Article in English | MEDLINE | ID: mdl-30417935

ABSTRACT

Quantitative evaluation of binding affinity changes upon mutations is crucial for protein engineering and drug design. Machine learning-based methods are gaining increasing momentum in this field. Due to the limited amount of experimental data, using a small number of sensitive predictive features is vital to the generalization and robustness of such machine learning methods. Here we introduce a fast and reliable predictor of binding affinity changes upon single point mutation, based on a random forest approach. Our method, iSEE, uses a limited number of interface Structure, Evolution, and Energy-based features for the prediction. iSEE achieves, using only 31 features, a high prediction performance with a Pearson correlation coefficient (PCC) of 0.80 and a root mean square error of 1.41 kcal/mol on a diverse training dataset consisting of 1102 mutations in 57 protein-protein complexes. It competes with existing state-of-the-art methods on two blind test datasets. Predictions for a new dataset of 487 mutations in 56 protein complexes from the recently published SKEMPI 2.0 database reveal that none of the current methods perform well (PCC < 0.42), although their combination does improve the predictions. Feature analysis for iSEE underlines the significance of evolutionary conservation for quantitative prediction of mutation effects. As an application example, we perform a full mutation scanning of the interface residues in the MDM2-p53 complex.

Subjects

Computational Biology/methods , Machine Learning , Mutation , Proteins/genetics , Binding, Competitive , Evolution, Molecular , Models, Molecular , Protein Binding , Protein Domains , Proteins/chemistry , Proteins/metabolism , Proto-Oncogene Proteins c-mdm2/chemistry , Proto-Oncogene Proteins c-mdm2/genetics , Proto-Oncogene Proteins c-mdm2/metabolism , Thermodynamics , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism

7.

Template-based protein-protein docking exploiting pairwise interfacial residue restraints.

Xue, Li C; Rodrigues, João P G L M; Dobbs, Drena; Honavar, Vasant; Bonvin, Alexandre M J J.

Brief Bioinform ; 18(3): 458-466, 2017 05 01.

Article in English | MEDLINE | ID: mdl-27013645

ABSTRACT

Although many advanced and sophisticated ab initio approaches for modeling protein-protein complexes have been proposed in past decades, template-based modeling (TBM) remains the most accurate and widely used approach, given a reliable template is available. However, there are many different ways to exploit template information in the modeling process. Here, we systematically evaluate and benchmark a TBM method that uses conserved interfacial residue pairs as docking distance restraints [referred to as alpha carbon-alpha carbon (CA-CA)-guided docking]. We compare it with two other template-based protein-protein modeling approaches, including a conserved non-pairwise interfacial residue restrained docking approach [referred to as the ambiguous interaction restraint (AIR)-guided docking] and a simple superposition-based modeling approach. Our results show that, for most cases, the CA-CA-guided docking method outperforms both superposition with refinement and the AIR-guided docking method. We emphasize the superiority of the CA-CA-guided docking on cases with medium to large conformational changes, and interactions mediated through loops, tails or disordered regions. Our results also underscore the importance of a proper refinement of superimposition models to reduce steric clashes. In summary, we provide a benchmarked TBM protocol that uses conserved pairwise interface distance as restraints in generating realistic 3D protein-protein interaction models, when reliable templates are available. The described CA-CA-guided docking protocol is based on the HADDOCK platform, which allows users to incorporate additional prior knowledge of the target system to further improve the quality of the resulting models.

Subjects

Proteins/metabolism , Models, Molecular , Protein Binding

8.

Protein-ligand pose and affinity prediction: Lessons from D3R Grand Challenge 3.

Koukos, Panagiotis I; Xue, Li C; Bonvin, Alexandre M J J.

J Comput Aided Mol Des ; 33(1): 83-91, 2019 01.

Article in English | MEDLINE | ID: mdl-30128928

ABSTRACT

We report the performance of HADDOCK in the 2018 iteration of the Grand Challenge organised by the D3R consortium. Building on the findings of our participation in last year's challenge, we significantly improved our pose prediction protocol which resulted in a mean RMSD for the top scoring pose of 3.04 and 2.67 Å for the cross-docking and self-docking experiments respectively, which corresponds to an overall success rate of 63% and 71% when considering the top1 and top5 models respectively. This performance ranks HADDOCK as the 6th and 3rd best performing group (excluding multiple submissions from a same group) out of a total of 44 and 47 submissions respectively. Our ligand-based binding affinity predictor is the 3rd best predictor overall, behind only the two leading structure-based implementations, and the best ligand-based one with a Kendall's Tau correlation of 0.36 for the Cathepsin challenge. It also performed well in the classification part of the Kinase challenges, with Matthews Correlation Coefficients of 0.49 (ranked 1st), 0.39 (ranked 4th) and 0.21 (ranked 4th) for the JAK2, vEGFR2 and p38a targets respectively. Through our participation in last year's competition we came to the conclusion that template selection is of critical importance for the successful outcome of the docking. This year we have made improvements in two additional areas of importance: ligand conformer selection and initial positioning, which have been key to our excellent pose prediction performance this year.

Subjects

Cathepsins/chemistry , Molecular Docking Simulation/methods , Protein Kinases/chemistry , Binding Sites , Computer-Aided Design , Crystallography, X-Ray , Databases, Protein , Drug Design , Ligands , Molecular Conformation , Protein Binding , Thermodynamics

9.

Performance of HADDOCK and a simple contact-based protein-ligand binding affinity predictor in the D3R Grand Challenge 2.

Kurkcuoglu, Zeynep; Koukos, Panagiotis I; Citro, Nevia; Trellet, Mikael E; Rodrigues, J P G L M; Moreira, Irina S; Roel-Touris, Jorge; Melquiond, Adrien S J; Geng, Cunliang; Schaarschmidt, Jörg; Xue, Li C; Vangone, Anna; Bonvin, A M J J.

J Comput Aided Mol Des ; 32(1): 175-185, 2018 01.

Article in English | MEDLINE | ID: mdl-28831657

ABSTRACT

We present the performance of HADDOCK, our information-driven docking software, in the second edition of the D3R Grand Challenge. In this blind experiment, participants were requested to predict the structures and binding affinities of complexes between the Farnesoid X nuclear receptor and 102 different ligands. The models obtained in Stage1 with HADDOCK and ligand-specific protocol show an average ligand RMSD of 5.1 Å from the crystal structure. Only 6/35 targets were within 2.5 Å RMSD from the reference, which prompted us to investigate the limiting factors and revise our protocol for Stage2. The choice of the receptor conformation appeared to have the strongest influence on the results. Our Stage2 models were of higher quality (13 out of 35 were within 2.5 Å), with an average RMSD of 4.1 Å. The docking protocol was applied to all 102 ligands to generate poses for binding affinity prediction. We developed a modified version of our contact-based binding affinity predictor PRODIGY, using the number of interatomic contacts classified by their type and the intermolecular electrostatic energy. This simple structure-based binding affinity predictor shows a Kendall's Tau correlation of 0.37 in ranking the ligands (7th best out of 77 methods, 5th/25 groups). Those results were obtained from the average prediction over the top10 poses, irrespective of their similarity/correctness, underscoring the robustness of our simple predictor. This results in an enrichment factor of 2.5 compared to a random predictor for ranking ligands within the top 25%, making it a promising approach to identify lead compounds in virtual screening.

Subjects

Drug Discovery , Molecular Docking Simulation , Receptors, Cytoplasmic and Nuclear/metabolism , Software , Binding Sites , Computer-Aided Design , Crystallography, X-Ray , Drug Design , Humans , Ligands , Protein Binding , Protein Conformation , Receptors, Cytoplasmic and Nuclear/agonists , Receptors, Cytoplasmic and Nuclear/antagonists & inhibitors , Receptors, Cytoplasmic and Nuclear/chemistry , Thermodynamics

10.

PRODIGY: a web server for predicting the binding affinity of protein-protein complexes.

Xue, Li C; Rodrigues, João P G L M; Kastritis, Panagiotis L; Bonvin, Alexandre M J J; Vangone, Anna.

Bioinformatics ; 32(23): 3676-3678, 2016 12 01.

Article in English | MEDLINE | ID: mdl-27503228

ABSTRACT

Gaining insights into the structural determinants of protein-protein interactions holds the key for a deeper understanding of biological functions, diseases and development of therapeutics. An important aspect of this is the ability to accurately predict the binding strength for a given protein-protein complex. Here we present PROtein binDIng enerGY prediction (PRODIGY), a web server to predict the binding affinity of protein-protein complexes from their 3D structure. The PRODIGY server implements our simple but highly effective predictive model based on intermolecular contacts and properties derived from non-interface surface. AVAILABILITY AND IMPLEMENTATION: PRODIGY is freely available at: http://milou.science.uu.nl/services/PRODIGY CONTACT: a.m.j.j.bonvin@uu.nl, a.vangone@uu.nl.

Subjects

Computational Biology/methods , Internet , Protein Interaction Mapping/methods , Software , Protein Binding , Protein Conformation

11.

DockRank: ranking docked conformations using partner-specific sequence homology-based protein interface prediction.

Xue, Li C; Jordan, Rafael A; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant.

Proteins ; 82(2): 250-67, 2014 Feb.

Article in English | MEDLINE | ID: mdl-23873600

ABSTRACT

Selecting near-native conformations from the immense number of conformations generated by docking programs remains a major challenge in molecular docking. We introduce DockRank, a novel approach to scoring docked conformations based on the degree to which the interface residues of the docked conformation match a set of predicted interface residues. DockRank uses interface residues predicted by partner-specific sequence homology-based protein-protein interface predictor (PS-HomPPI), which predicts the interface residues of a query protein with a specific interaction partner. We compared the performance of DockRank with several state-of-the-art docking scoring functions using Success Rate (the percentage of cases that have at least one near-native conformation among the top m conformations) and Hit Rate (the percentage of near-native conformations that are included among the top m conformations). In cases where it is possible to obtain partner-specific (PS) interface predictions from PS-HomPPI, DockRank consistently outperforms both (i) ZRank and IRAD, two state-of-the-art energy-based scoring functions (improving Success Rate by up to 4-fold); and (ii) Variants of DockRank that use predicted interface residues obtained from several protein interface predictors that do not take into account the binding partner in making interface predictions (improving success rate by up to 39-fold). The latter result underscores the importance of using partner-specific interface residues in scoring docked conformations. We show that DockRank, when used to re-rank the conformations returned by ClusPro, improves upon the original ClusPro rankings in terms of both Success Rate and Hit Rate. DockRank is available as a server at http://einstein.cs.iastate.edu/DockRank/.

Subjects

Molecular Docking Simulation , Software , Ligands , Protein Interaction Domains and Motifs , Protein Structure, Quaternary , Receptors, Cell Surface/chemistry , Sequence Homology, Amino Acid , Structural Homology, Protein , Thermodynamics
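
The two evaluation measures defined in the DockRank abstract, Success Rate and Hit Rate over the top m conformations, can be written down directly. The sketch below is an illustration only; the function names and the input layout (one rank-ordered list of near-native flags per docking case) are assumptions, and the Hit Rate is pooled over all cases, which is one possible reading of the abstract's definition.

```python
def success_rate(cases, m):
    """Percentage of cases with at least one near-native conformation in the top m.

    `cases` is a list of lists of booleans; each inner list is ordered by rank
    (best first) and flags whether that conformation is near-native.
    """
    successes = sum(1 for ranked in cases if any(ranked[:m]))
    return 100.0 * successes / len(cases)

def hit_rate(cases, m):
    """Percentage of all near-native conformations that appear in the top m.

    Assumption: counts are pooled over all cases rather than averaged per case.
    """
    in_top = sum(sum(ranked[:m]) for ranked in cases)
    total = sum(sum(ranked) for ranked in cases)
    return 100.0 * in_top / total if total else 0.0

# Three hypothetical docking cases, each with four ranked conformations.
example_cases = [
    [False, True, False, True],
    [False, False, False, True],
    [True, True, False, False],
]
```

With `m = 2`, two of the three example cases have a near-native conformation in the top two, and three of the five near-native conformations are captured.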

12.

The PANDORA Software for Anchor-Restrained Peptide:MHC Modeling.

Marzella, Dario F; Crocioni, Giulia; Parizi, Farzaneh M; Xue, Li C.

Methods Mol Biol ; 2673: 251-271, 2023.

Article in English | MEDLINE | ID: mdl-37258920

ABSTRACT

Major histocompatibility complexes (MHC) play a key role in the immune surveillance system in all jawed vertebrates. MHC class I molecules randomly sample cytosolic peptides from inside the cell, while MHC class II molecules sample exogenous peptides. Both types of peptide:MHC complex are then presented on the cell surface for recognition by αβ T cells (CD8+ and CD4+, respectively). The three-dimensional structure of such complexes can give crucial insights into the presentation and recognition mechanisms. For this reason, software such as PANDORA has been developed to rapidly and accurately generate peptide:MHC (pMHC) 3D structures. In this chapter, we describe the protocol of PANDORA. PANDORA exploits the structural knowledge on anchor pockets that MHC molecules use to dock peptides. PANDORA provides anchor positions as restraints to guide the modeling process. This allows PANDORA to generate twenty 3D models in about 5 min. PANDORA is highly customizable, easy to install, supports parallel processing, and is suitable to provide large datasets for deep learning algorithms.

Subjects

Histocompatibility Antigens , Major Histocompatibility Complex , Animals , Histocompatibility Antigens Class I/genetics , Peptides/chemistry , Software

13.

MetaScore: A Novel Machine-Learning-Based Approach to Improve Traditional Scoring Functions for Scoring Protein-Protein Docking Conformations.

Jung, Yong; Geng, Cunliang; Bonvin, Alexandre M J J; Xue, Li C; Honavar, Vasant G.

Biomolecules ; 13(1), 2023 01 06.

Article in English | MEDLINE | ID: mdl-36671507

ABSTRACT

Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining the 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking-the so-called scoring problem-still has considerable room for improvement. We present MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using their protein-protein interfacial features. The features include physicochemical properties, energy terms, interaction-propensity-based features, geometric properties, interface topology features, evolutionary conservation, and also scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of the nine traditional SFs included in this work in terms of success rate and hit rate evaluated over conformations ranked among the top 10; (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by using machine learning to judiciously leverage protein-protein interfacial features and by using ensemble methods to combine multiple scoring functions.

Subjects

Machine Learning , Proteins , Proteins/chemistry , Protein Binding , Ligands , Protein Conformation
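
The MetaScore abstract states that a conformation's score is obtained by simply averaging the random forest score with the score of a traditional scoring function. A rough sketch of that averaging step follows. The abstract does not say how the two scores are brought onto a common scale, so the min-max normalization, the assumption that lower traditional scores are better (as with energies), and the names `min_max` and `meta_score` are all hypothetical.

```python
def min_max(scores, higher_is_better=True):
    """Rescale scores to [0, 1] so that 1 is always the best value (assumption)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)  # no spread: every model gets a neutral value
    scaled = [(s - lo) / (hi - lo) for s in scores]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def meta_score(rf_probs, sf_scores, sf_higher_is_better=False):
    """Average the RF near-native probability with a rescaled traditional score."""
    sf_scaled = min_max(sf_scores, sf_higher_is_better)
    return [(p + s) / 2.0 for p, s in zip(rf_probs, sf_scaled)]

# Hypothetical values: RF probabilities and energy-like scores for three models.
combined = meta_score([0.9, 0.2, 0.6], [-120.0, -40.0, -80.0])
```

Models would then be ranked by the combined value, highest first.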

14.

Understanding structure-guided variant effect predictions using 3D convolutional neural networks.

Ramakrishnan, Gayatri; Baakman, Coos; Heijl, Stephan; Vroling, Bas; van Horck, Ragna; Hiraki, Jeffrey; Xue, Li C; Huynen, Martijn A.

Front Mol Biosci ; 10: 1204157, 2023.

Article in English | MEDLINE | ID: mdl-37475887

ABSTRACT

Predicting pathogenicity of missense variants in molecular diagnostics remains a challenge despite the available wealth of data, such as evolutionary information, and the wealth of tools to integrate that data. We describe DeepRank-Mut, a configurable framework designed to extract and learn from physicochemically relevant features of amino acids surrounding missense variants in 3D space. For each variant, various atomic and residue-level features are extracted from its structural environment, including sequence conservation scores of the surrounding amino acids, and stored in multi-channel 3D voxel grids which are then used to train a 3D convolutional neural network (3D-CNN). The resultant model gives a probabilistic estimate of whether a given input variant is disease-causing or benign. We find that the performance of our 3D-CNN model, on independent test datasets, is comparable to other widely used resources which also combine sequence and structural features. Based on the 10-fold cross-validation experiments, we achieve an average accuracy of 0.77 on the independent test datasets. We discuss the contribution of the variant neighborhood in the model's predictive power, in addition to the impact of individual features on the model's performance. Two key features: evolutionary information of residues in the variant neighborhood and their solvent accessibilities were observed to influence the predictions. We also highlight how predictions are impacted by the underlying disease mechanisms of missense mutations and offer insights into understanding these to improve pathogenicity predictions. Our study presents aspects to take into consideration when adopting deep learning approaches for protein structure-guided pathogenicity predictions.

15.

PANDORA v2.0: Benchmarking peptide-MHC II models and software improvements.

Parizi, Farzaneh M; Marzella, Dario F; Ramakrishnan, Gayatri; 't Hoen, Peter A C; Karimi-Jafari, Mohammad Hossein; Xue, Li C.

Front Immunol ; 14: 1285899, 2023.

Article in English | MEDLINE | ID: mdl-38143769

ABSTRACT

T-cell specificity to differentiate between self and non-self relies on T-cell receptor (TCR) recognition of peptides presented by the Major Histocompatibility Complex (MHC). Investigations into the three-dimensional (3D) structures of peptide:MHC (pMHC) complexes have provided valuable insights into MHC functions. The limited availability of experimental pMHC structures and the considerable diversity of peptides and MHC alleles call for the development of efficient and reliable computational approaches for modeling pMHC structures. Here we present an update of PANDORA and the systematic evaluation of its performance in modelling 3D structures of pMHC class II complexes (pMHC-II), which play a key role in the cancer immune response. PANDORA is a modelling software that can build low-energy models in a few minutes by restraining peptide residues inside the MHC-II binding groove. We benchmarked PANDORA on 136 experimentally determined pMHC-II structures covering 44 unique αβ chain pairs. Our pipeline achieves a median backbone Ligand-Root Mean Squared Deviation (L-RMSD) of 0.42 Å on the binding core and 0.88 Å on the whole peptide for the benchmark dataset. We incorporated software improvements to make PANDORA a pan-allele framework and improved the user interface and software quality. Its computational efficiency allows enriching the wealth of pMHC binding affinity and mass spectrometry data with 3D models. These models can be used as a starting point for molecular dynamics simulations or structure-boosted deep learning algorithms to identify MHC-binding peptides. PANDORA is available as a Python package through Conda or as a source installation at https://github.com/X-lab-3D/PANDORA.

Subjects

Benchmarking , Peptides , Peptides/metabolism , Major Histocompatibility Complex , Histocompatibility Antigens , Software

16.

Entropy and Variability: A Second Opinion by Deep Learning.

Rademaker, Daniel T; Xue, Li C; 't Hoen, Peter A C; Vriend, Gert.

Biomolecules ; 12(12)2022 11 23.

Article in English | MEDLINE | ID: mdl-36551168

ABSTRACT

BACKGROUND: Analysis of the distribution of amino acid types found at equivalent positions in multiple sequence alignments has found applications in human genetics, protein engineering, drug design, protein structure prediction, and many other fields. These analyses tend to revolve around measures of the distribution of the twenty amino acid types found at evolutionarily equivalent positions: the columns in multiple sequence alignments. Commonly used measures are variability, average hydrophobicity, or Shannon entropy. One of these techniques, called entropy-variability analysis, reduces the distribution of observed residue types in one column to two numbers: the Shannon entropy and the variability, defined as the number of residue types observed. RESULTS: We applied a deep learning, unsupervised feature extraction method to analyse the multiple sequence alignments of all human proteins. An auto-encoder neural architecture was trained on 27,835 multiple sequence alignments for human proteins to obtain the two features that best describe the seven million variability patterns. These two unsupervised learned features strongly resemble entropy and variability, indicating that these are the projections that retain most information when reducing the dimensionality of the information hidden in columns in multiple sequence alignments.

Subjects

Deep Learning, Humans, Amino Acid Sequence, Proteins/chemistry, Amino Acids, Referral and Consultation, Algorithms
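
The two numbers used in entropy-variability analysis, as described in the abstract above, can be computed per alignment column with a short sketch. The base-2 logarithm, the gap characters, and the toy alignment are assumptions made for illustration; the paper may use a different logarithm base or gap treatment.

```python
import math
from collections import Counter


def column_entropy_variability(column, gap_chars="-."):
    """Return (Shannon entropy in bits, number of residue types) for one column."""
    residues = [c.upper() for c in column if c not in gap_chars]
    if not residues:
        return 0.0, 0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy, len(counts)


def alignment_columns(sequences):
    """Transpose equal-length aligned sequences into column strings."""
    return ["".join(col) for col in zip(*sequences)]


# Hypothetical toy alignment of four sequences and five columns.
msa = ["MKVLA", "MKILA", "MRVLG", "MKVL-"]
profile = [column_entropy_variability(col) for col in alignment_columns(msa)]
```

A fully conserved column yields an entropy of 0 and a variability of 1, while a column with three lysines and one arginine yields a variability of 2 and an entropy of about 0.81 bits.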

17.

PANDORA: A Fast, Anchor-Restrained Modelling Protocol for Peptide: MHC Complexes.

Marzella, Dario F; Parizi, Farzaneh M; van Tilborg, Derek; Renaud, Nicolas; Sybrandi, Daan; Buzatu, Rafaella; Rademaker, Daniel T; 't Hoen, Peter A C; Xue, Li C.

Front Immunol ; 13: 878762, 2022.

Article in English | MEDLINE | ID: mdl-35619705

ABSTRACT

A deeper understanding of T-cell-mediated adaptive immune responses is important for the design of cancer immunotherapies and antiviral vaccines against pandemic outbreaks. T-cells are activated when they recognize foreign peptides that are presented on the cell surface by Major Histocompatibility Complexes (MHC), forming peptide:MHC (pMHC) complexes. 3D structures of pMHC complexes provide fundamental insight into the T-cell recognition mechanism and aid immunotherapy design. High MHC and peptide diversities necessitate efficient computational modelling to enable whole-proteome structural analysis. We developed PANDORA, a generic modelling pipeline for pMHC class I and II (pMHC-I and pMHC-II), and present its performance on pMHC-I here. Given a query, PANDORA searches for structural templates in its extensive database and then applies anchor restraints to the modelling process. This restrained energy minimization ensures one of the fastest pMHC modelling pipelines so far. On a set of 835 pMHC-I complexes over 78 MHC types, PANDORA generated models with a median RMSD of 0.70 Å and achieved a 93% success rate in the top 10 models. PANDORA performs competitively with three state-of-the-art pMHC-I modelling approaches and outperforms AlphaFold2 in terms of accuracy while being superior to it in speed. PANDORA is a modularized and user-configurable Python package with easy installation. We envision PANDORA to fuel deep learning algorithms with large-scale high-quality 3D models to tackle long-standing immunology challenges.

Subjects

Histocompatibility Antigens, Major Histocompatibility Complex, Histocompatibility Antigens/chemistry, Models, Molecular, Peptides, Receptors, Antigen, T-Cell

18.

HomPPI: a class of sequence homology based protein-protein interface prediction methods.

Xue, Li C; Dobbs, Drena; Honavar, Vasant.

BMC Bioinformatics ; 12: 244, 2011 Jun 17.

Article in English | MEDLINE | ID: mdl-21682895

ABSTRACT

BACKGROUND: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. RESULTS: We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI, can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. CONCLUSIONS: Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners.

Subjects

Proteins/metabolism, Sequence Analysis, Protein/methods, Software, Algorithms, Amino Acid Sequence, Humans, Proteins/chemistry, Sequence Homology
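
The abstract above reports a correlation coefficient (CC), sensitivity, and specificity without defining them. The sketch below uses the textbook confusion-matrix definitions as an assumption: CC is taken to be the Matthews correlation coefficient and specificity the true-negative rate. The paper's own definitions may differ, since some interface-prediction studies use "specificity" for TP/(TP+FP). The function name and the toy labels are hypothetical.

```python
import math


def interface_metrics(predicted, actual):
    """Per-residue metrics for binary interface labels (1 = interface residue)."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have equal length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0  # assumed definition
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"cc": cc, "sensitivity": sensitivity, "specificity": specificity}


# Hypothetical labels for a six-residue protein.
predicted = [1, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 0, 1, 1]
metrics = interface_metrics(predicted, actual)
```

With these toy labels there are two true positives, one false positive, one false negative, and two true negatives, so sensitivity and specificity are both 2/3 and CC is 1/3.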

19.

DeepRank: a deep learning framework for data mining 3D protein-protein interfaces.

Renaud, Nicolas; Geng, Cunliang; Georgievska, Sonja; Ambrosetti, Francesco; Ridder, Lars; Marzella, Dario F; Réau, Manon F; Bonvin, Alexandre M J J; Xue, Li C.

Nat Commun ; 12(1): 7068, 2021 12 03.

Article in English | MEDLINE | ID: mdl-34862392

ABSTRACT

Three-dimensional (3D) structures of protein complexes provide fundamental information to decipher biological processes at the molecular scale. The vast amount of experimentally and computationally resolved protein-protein interfaces (PPIs) offers the possibility of training deep learning models to aid the predictions of their biological relevance. We present here DeepRank, a general, configurable deep learning framework for data mining PPIs using 3D convolutional neural networks (CNNs). DeepRank maps features of PPIs onto 3D grids and trains a user-specified CNN on these 3D grids. DeepRank allows for efficient training of 3D CNNs with data sets containing millions of PPIs and supports both classification and regression. We demonstrate the performance of DeepRank on two distinct challenges: The classification of biological versus crystallographic PPIs, and the ranking of docking models. For both problems DeepRank is competitive with, or outperforms, state-of-the-art methods, demonstrating the versatility of the framework for research in structural biology.

Subjects

Data Mining/methods, Deep Learning, Protein Interaction Mapping/methods, Crystallography, Datasets as Topic, Molecular Docking Simulation, Protein Interaction Domains and Motifs, Protein Interaction Maps

20.

iScore: An MPI supported software for ranking protein-protein docking models based on a random walk graph kernel and support vector machines.

Renaud, Nicolas; Jung, Yong; Honavar, Vasant; Geng, Cunliang; Bonvin, Alexandre M J J; Xue, Li C.

SoftwareX ; 11, 2020.

Article in English | MEDLINE | ID: mdl-35419466

ABSTRACT

Computational docking is a promising tool to model three-dimensional (3D) structures of protein-protein complexes, which provide fundamental insights into protein functions in cellular life. Singling out near-native models from the huge pool of generated docking models (referred to as the scoring problem) remains a major challenge in computational docking. We recently published iScore, a novel graph kernel-based scoring function. iScore ranks docking models based on their interface graph similarities to the training interface graph set. iScore uses a support vector machine approach with random-walk graph kernels to classify and rank protein-protein interfaces. Here, we present the software for iScore. The software provides executable scripts that fully automate the computational workflow. In addition, the creation and analysis of the interface graph can be distributed across different processes using the Message Passing Interface (MPI) and can be offloaded to GPUs thanks to dedicated CUDA kernels.
