Search | BVS CLAP/SMR-OPS/OMS

Quantification of biases in predictions of protein-protein binding affinity changes upon mutations.

Tsishyn, Matsvei; Pucci, Fabrizio; Rooman, Marianne.

Brief Bioinform; 25(1), 2023 Nov 22.

Article in English | MEDLINE | ID: mdl-38197311

ABSTRACT

Understanding the impact of mutations on protein-protein binding affinity is a key objective for a wide range of biotechnological applications and for shedding light on disease-causing mutations, which are often located at protein-protein interfaces. Over the past decade, many computational methods using physics-based and/or machine learning approaches have been developed to predict how protein binding affinity changes upon mutations. They all claim to achieve astonishing accuracy on both training and test sets, with performances on standard benchmarks such as SKEMPI 2.0 that seem overly optimistic. Here we benchmarked eight well-known and widely used predictors and identified their biases and dataset dependencies, using not only SKEMPI 2.0 as a test set but also deep mutagenesis data on the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein in complex with the human angiotensin-converting enzyme 2 (ACE2). We showed that, even though most of the tested methods reach a significant degree of robustness and accuracy, they generalize poorly and struggle to predict unseen mutations. Interestingly, the generalizability problems are more severe for pure machine learning approaches, while physics-based methods are less affected by this issue. Moreover, undesirable prediction biases toward specific mutation properties, the most marked being toward destabilizing mutations, are also observed and should be carefully considered by method developers. We conclude from our analyses that there is room for improvement in the prediction models and suggest ways to check, assess and improve their generalizability and robustness.

Subject(s)

Spike Glycoprotein, Coronavirus; Humans; Protein Binding; Mutation; Bias

FiTMuSiC: leveraging structural and (co)evolutionary data for protein fitness prediction.

Tsishyn, Matsvei; Cia, Gabriel; Hermans, Pauline; Kwasigroch, Jean; Rooman, Marianne; Pucci, Fabrizio.

Hum Genomics; 18(1): 36, 2024 Apr 16.

Article in English | MEDLINE | ID: mdl-38627807

ABSTRACT

Systematically predicting the effects of mutations on protein fitness is essential for the understanding of genetic diseases. Indeed, predictions complement experimental efforts in analyzing how variants lead to dysfunctional proteins that in turn can cause diseases. Here we present our new fitness predictor, FiTMuSiC, which leverages structural, evolutionary and coevolutionary information. We show that FiTMuSiC predicts fitness with high accuracy despite the simplicity of its underlying model: it was among the top predictors on the hydroxymethylbilane synthase (HMBS) target of the sixth round of the Critical Assessment of Genome Interpretation challenge (CAGI6) and performs as well as much more complex deep learning models such as AlphaMissense. To further demonstrate FiTMuSiC's robustness, we compared its predictions with in vitro activity data on HMBS, variant fitness data on human glucokinase (GCK), and variant deleteriousness data on HMBS and GCK. These analyses further confirm FiTMuSiC's qualities and accuracy, which compare favorably with those of other predictors. Additionally, FiTMuSiC returns two scores that separately describe the functional and structural effects of the variant, thus providing mechanistic insight into why the variant leads to fitness loss or gain. We also provide an easy-to-use webserver at https://babylone.ulb.ac.be/FiTMuSiC. It is freely available for academic use and requires no bioinformatics expertise, which makes our tool accessible to the entire scientific community.

Subject(s)

Proteins; Humans; Mutation

Assessing predictions on fitness effects of missense variants in HMBS in CAGI6.

Zhang, Jing; Kinch, Lisa; Katsonis, Panagiotis; Lichtarge, Olivier; Jagota, Milind; Song, Yun S; Sun, Yuanfei; Shen, Yang; Kuru, Nurdan; Dereli, Onur; Adebali, Ogun; Alladin, Muttaqi Ahmad; Pal, Debnath; Capriotti, Emidio; Turina, Maria Paola; Savojardo, Castrense; Martelli, Pier Luigi; Babbi, Giulia; Casadio, Rita; Pucci, Fabrizio; Rooman, Marianne; Cia, Gabriel; Tsishyn, Matsvei; Strokach, Alexey; Hu, Zhiqiang; van Loggerenberg, Warren; Roth, Frederick P; Radivojac, Predrag; Brenner, Steven E; Cong, Qian; Grishin, Nick V.

Hum Genet; 2024 Aug 07.

Article in English | MEDLINE | ID: mdl-39110250

ABSTRACT

This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Despite the application of a variety of algorithms and methods, the performance of predictors was relatively similar, with Kendall's tau correlation coefficients between predictions and experimental scores around 0.3 for a majority of submissions. Notably, the median correlation (≥ 0.34) observed among these predictors, especially the top predictions from different groups, was greater than the correlation observed between their predictions and the actual experimental results. Most predictors were moderately successful in distinguishing between deleterious and benign variants, as evidenced by an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two most recent rounds of CAGI competitions, we noticed that more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of predictions is still far short of the positive control, which is derived from experimental scores, indicating the necessity for considerable improvements in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting that the predictors still rely heavily on information from multiple sequence alignments.
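As an illustration of the two evaluation metrics named in this abstract, the following minimal Python sketch computes Kendall's tau between predicted and experimental scores, and the ROC AUC for separating deleterious from benign variants. It is not the assessors' code: the function names `kendall_tau_a` and `roc_auc` are hypothetical, the tau-a variant (tied pairs contribute zero) is an assumption because the abstract does not say which variant was used, and the numbers in the usage lines are invented toy values, not CAGI6 data.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumption: tied pairs contribute 0 (tau-a); the abstract does not
    state which tau variant the assessment used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            score += 1   # concordant pair
        elif product < 0:
            score -= 1   # discordant pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative
    (ties count as 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (hypothetical, not from the paper).
experimental = [1, 2, 3]
predicted = [1, 3, 2]
tau = kendall_tau_a(experimental, predicted)

is_deleterious = [1, 1, 0, 0]
deleteriousness_score = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(is_deleterious, deleteriousness_score)
```

On real data, library routines such as `scipy.stats.kendalltau` and `sklearn.metrics.roc_auc_score` would normally be preferred; the pairwise loops here are quadratic and are meant only to make the definitions explicit.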
