Search | BVS Professional Education in Health

Quantification of biases in predictions of protein-protein binding affinity changes upon mutations.

Tsishyn, Matsvei; Pucci, Fabrizio; Rooman, Marianne.

Brief Bioinform; 25(1), 2023 Nov 22.

Article in English | MEDLINE | ID: mdl-38197311

ABSTRACT

Understanding the impact of mutations on protein-protein binding affinity is a key objective for a wide range of biotechnological applications and for shedding light on disease-causing mutations, which are often located at protein-protein interfaces. Over the past decade, many computational methods using physics-based and/or machine learning approaches have been developed to predict how protein binding affinity changes upon mutations. They all claim to achieve astonishing accuracy on both training and test sets, with performances on standard benchmarks such as SKEMPI 2.0 that seem overly optimistic. Here we benchmarked eight well-known and well-used predictors and identified their biases and dataset dependencies, using not only SKEMPI 2.0 as a test set but also deep mutagenesis data on the severe acute respiratory syndrome coronavirus 2 spike protein in complex with the human angiotensin-converting enzyme 2. We showed that, even though most of the tested methods reach a significant degree of robustness and accuracy, they suffer from limited generalizability properties and struggle to predict unseen mutations. Interestingly, the generalizability problems are more severe for pure machine learning approaches, while physics-based methods are less affected by this issue. Moreover, undesirable prediction biases toward specific mutation properties, the most marked being toward destabilizing mutations, are also observed and should be carefully considered by method developers. We conclude from our analyses that there is room for improvement in the prediction models and suggest ways to check, assess and improve their generalizability and robustness.

Subjects

Spike Glycoprotein, Coronavirus; Humans; Protein Binding; Mutation; Bias

pycofitness: Evaluating the fitness landscape of RNA and protein sequences.

Pucci, Fabrizio; Zerihun, Mehari B; Rooman, Marianne; Schug, Alexander.

Bioinformatics; 40(2), 2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38335928

ABSTRACT

MOTIVATION: The accurate prediction of how mutations change biophysical properties of proteins or RNA is a major goal in computational biology with tremendous impacts on protein design and genetic variant interpretation. Evolutionary approaches such as coevolution can help solve this problem. RESULTS: We present pycofitness, a standalone Python-based software package for the in silico mutagenesis of protein and RNA sequences. It is based on coevolution and, more specifically, on a popular inverse statistical approach, namely direct coupling analysis by pseudo-likelihood maximization. Its efficient implementation and user-friendly command line interface make it an easy-to-use tool even for researchers with no bioinformatics background. To illustrate its strengths, we present three applications in which pycofitness efficiently predicts the deleteriousness of genetic variants and the effect of mutations on protein fitness and thermodynamic stability. AVAILABILITY AND IMPLEMENTATION: https://github.com/KIT-MBS/pycofitness.

Subjects

RNA; Software; RNA/genetics; Amino Acid Sequence; Computational Biology; Proteins

FiTMuSiC: leveraging structural and (co)evolutionary data for protein fitness prediction.

Tsishyn, Matsvei; Cia, Gabriel; Hermans, Pauline; Kwasigroch, Jean; Rooman, Marianne; Pucci, Fabrizio.

Hum Genomics; 18(1): 36, 2024 Apr 16.

Article in English | MEDLINE | ID: mdl-38627807

ABSTRACT

Systematically predicting the effects of mutations on protein fitness is essential for the understanding of genetic diseases. Indeed, predictions complement experimental efforts in analyzing how variants lead to dysfunctional proteins that in turn can cause diseases. Here we present our new fitness predictor, FiTMuSiC, which leverages structural, evolutionary and coevolutionary information. We show that FiTMuSiC predicts fitness with high accuracy despite the simplicity of its underlying model: it was among the top predictors on the hydroxymethylbilane synthase (HMBS) target of the sixth round of the Critical Assessment of Genome Interpretation challenge (CAGI6) and performs as well as much more complex deep learning models such as AlphaMissense. To further demonstrate FiTMuSiC's robustness, we compared its predictions with in vitro activity data on HMBS, variant fitness data on human glucokinase (GCK), and variant deleteriousness data on HMBS and GCK. These analyses further confirm FiTMuSiC's qualities and accuracy, which compare favorably with those of other predictors. Additionally, FiTMuSiC returns two scores that separately describe the functional and structural effects of the variant, thus providing mechanistic insight into why the variant leads to fitness loss or gain. We also provide an easy-to-use webserver at https://babylone.ulb.ac.be/FiTMuSiC, which is freely available for academic use and requires no bioinformatics expertise, making our tool accessible to the entire scientific community.

Subjects

Proteins; Humans; Mutation

Assessing predictions on fitness effects of missense variants in HMBS in CAGI6.

Zhang, Jing; Kinch, Lisa; Katsonis, Panagiotis; Lichtarge, Olivier; Jagota, Milind; Song, Yun S; Sun, Yuanfei; Shen, Yang; Kuru, Nurdan; Dereli, Onur; Adebali, Ogun; Alladin, Muttaqi Ahmad; Pal, Debnath; Capriotti, Emidio; Turina, Maria Paola; Savojardo, Castrense; Martelli, Pier Luigi; Babbi, Giulia; Casadio, Rita; Pucci, Fabrizio; Rooman, Marianne; Cia, Gabriel; Tsishyn, Matsvei; Strokach, Alexey; Hu, Zhiqiang; van Loggerenberg, Warren; Roth, Frederick P; Radivojac, Predrag; Brenner, Steven E; Cong, Qian; Grishin, Nick V.

Hum Genet; 2024 Aug 07.

Article in English | MEDLINE | ID: mdl-39110250

ABSTRACT

This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Despite the application of a variety of algorithms and methods, the performance of predictors was relatively similar, with Kendall's tau correlation coefficients between predictions and experimental scores around 0.3 for a majority of submissions. Notably, the median correlation (≥ 0.34) observed among these predictors, especially the top predictions from different groups, was greater than the correlation observed between their predictions and the actual experimental results. Most predictors were moderately successful in distinguishing between deleterious and benign variants, as evidenced by an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two most recent rounds of CAGI competitions, we noticed that more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of predictions is still far short of the positive control, which is derived from experimental scores, indicating the need for considerable improvements in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting that the predictors still rely heavily on information from multiple sequence alignments.
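
For readers unfamiliar with the two evaluation metrics named in the abstract above, the following Python sketch computes Kendall's tau and the ROC AUC from first principles. It is an illustration only, not the assessors' code: the function names and the toy data are hypothetical, and the tau variant shown is tau-a, which ignores ties (the abstract does not state which variant was used).

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    labels: 1 for the positive class, 0 for the negative class.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: predicted vs. measured scores for four variants.
    predicted = [1, 3, 2, 4]
    measured = [1, 2, 3, 4]
    print(kendall_tau_a(predicted, measured))

    # Hypothetical labels (1 = positive class) with a score where higher
    # means "more likely positive".
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Both functions are quadratic in the number of variants, which is fine for illustration; in practice, library implementations would normally be used instead.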

Prediction of Paratope-Epitope Pairs Using Convolutional Neural Networks.

Li, Dong; Pucci, Fabrizio; Rooman, Marianne.

Int J Mol Sci; 25(10), 2024 May 16.

Article in English | MEDLINE | ID: mdl-38791470

ABSTRACT

Antibodies play a central role in the adaptive immune response of vertebrates through the specific recognition of exogenous or endogenous antigens. The rational design of antibodies has a wide range of biotechnological and medical applications, such as in disease diagnosis and treatment. However, there are currently no reliable methods for predicting the antibodies that recognize a specific antigen region (or epitope) and, conversely, epitopes that recognize the binding region of a given antibody (or paratope). To fill this gap, we developed ImaPEp, a machine learning-based tool for predicting the binding probability of paratope-epitope pairs, where the epitope and paratope patches were simplified into interacting two-dimensional patches, which were colored according to the values of selected features, and pixelated. The specific recognition of an epitope image by a paratope image was achieved by using a convolutional neural network-based model, which was trained on a set of two-dimensional paratope-epitope images derived from experimental structures of antibody-antigen complexes. Our method performs well in cross-validation, with a balanced accuracy of 0.8. Finally, we showcase examples of application of ImaPEp, including extensive screening of large libraries to identify paratope candidates that bind to a selected epitope, and rescoring and refining antibody-antigen docking poses.

Subjects

Epitopes; Neural Networks, Computer; Epitopes/immunology; Epitopes/chemistry; Machine Learning; Antigen-Antibody Complex/chemistry; Antigen-Antibody Complex/immunology; Humans; Molecular Docking Simulation; Antibodies/immunology; Antibodies/chemistry; Antigens/immunology; Binding Sites, Antibody
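
The balanced accuracy quoted in the abstract of this record is, for a binary classifier, the mean of sensitivity and specificity. The short Python sketch below shows that definition on made-up labels; it is not taken from ImaPEp, and the function name and example data are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Hypothetical example: 4 binding pairs (1) and 2 non-binding pairs (0).
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 1]
    print(balanced_accuracy(y_true, y_pred))
```

Unlike plain accuracy, this measure is not inflated when one class greatly outnumbers the other, which is a common reason to report it for binding classifiers.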

MHCII-peptide presentation: an assessment of the state-of-the-art prediction methods.

Yang, Yaqing; Wei, Zhonghui; Cia, Gabriel; Song, Xixi; Pucci, Fabrizio; Rooman, Marianne; Xue, Fuzhong; Hou, Qingzhen.

Front Immunol; 15: 1293706, 2024.

Article in English | MEDLINE | ID: mdl-38646540

ABSTRACT

Major histocompatibility complex Class II (MHCII) proteins initiate and regulate immune responses by presentation of antigenic peptides to CD4+ T-cells and self-restriction. The interactions between MHCII and peptides determine the specificity of the immune response and are crucial in immunotherapy and cancer vaccine design. With the ever-increasing amount of MHCII-peptide binding data available, many computational approaches have been developed for MHCII-peptide interaction prediction over the last decade. There is thus an urgent need to provide an up-to-date overview and assessment of these newly developed computational methods. To benchmark the prediction performance of these methods, we constructed an independent dataset containing binding and non-binding peptides to 20 human MHCII protein allotypes from the Immune Epitope Database, covering DP, DR and DQ alleles. After collecting 11 known predictors up to January 2022, we evaluated those available through a webserver or standalone packages on this independent dataset. The benchmarking results show that MixMHC2pred and NetMHCIIpan-4.1 achieve the best performance among all predictors. In general, newly developed methods perform better than older ones due to the rapid expansion of data on which they are trained and the development of deep learning algorithms. Our manuscript not only draws a full picture of the state of the art of MHCII-peptide binding prediction, but also guides researchers in the choice among the different predictors. More importantly, it will inspire biomedical researchers in both academia and industry toward future developments in this field.

Subjects

Antigen Presentation; Computational Biology; Histocompatibility Antigens Class II; Peptides; Humans; Histocompatibility Antigens Class II/immunology; Histocompatibility Antigens Class II/metabolism; Peptides/immunology; Computational Biology/methods; Protein Binding; Deep Learning; Algorithms

Genomic basis of environmental adaptation in the widespread poly-extremophilic Exiguobacterium group.

Shen, Liang; Liu, Yongqin; Chen, Liangzhong; Lei, Tingting; Ren, Ping; Ji, Mukan; Song, Weizhi; Lin, Hao; Su, Wei; Wang, Sheng; Rooman, Marianne; Pucci, Fabrizio.

ISME J; 18(1), 2024 Jan 08.

Article in English | MEDLINE | ID: mdl-38365240

ABSTRACT

Delineating cohesive ecological units and determining the genetic basis for their environmental adaptation are among the most important objectives in microbiology. In the last decade, many studies have been devoted to characterizing the genetic diversity in microbial populations to address these issues. However, the impact of extreme environmental conditions, such as temperature and salinity, on microbial ecology and evolution remains unclear. In order to better understand the mechanisms of adaptation, we studied the (pan)genome of Exiguobacterium, a poly-extremophile bacterium able to grow in a wide range of environments, from permafrost to hot springs. To obtain genomes for all known Exiguobacterium type strains, we first sequenced those that were not yet available. Using a reverse-ecology approach, we showed how the integration of phylogenomic information, genomic features, gene and pathway enrichment data, regulatory element analyses, protein amino acid composition, and protein structure analyses of the entire Exiguobacterium pangenome allows us to sharply delineate ecological units consisting of mesophilic, psychrophilic, halophilic-mesophilic, and halophilic-thermophilic ecotypes. This in-depth study clarified the genetic basis of the defined ecotypes and identified some key mechanisms driving adaptation to extreme environments. Our study points the way to organizing the vast microbial diversity into ecologically meaningful units, which, in turn, provides insight into how microbial communities adapt and respond to different environmental conditions in a changing world.

Subjects

Exiguobacterium; Extremophiles; Genomics; Phylogeny; Proteins

Critical assessment of missense variant effect predictors on disease-relevant variant data.

Rastogi, Ruchir; Chung, Ryan; Li, Sindy; Li, Chang; Lee, Kyoungyeul; Woo, Junwoo; Kim, Dong-Wook; Keum, Changwon; Babbi, Giulia; Martelli, Pier Luigi; Savojardo, Castrense; Casadio, Rita; Chennen, Kirsley; Weber, Thomas; Poch, Olivier; Ancien, François; Cia, Gabriel; Pucci, Fabrizio; Raimondi, Daniele; Vranken, Wim; Rooman, Marianne; Marquet, Céline; Olenyi, Tobias; Rost, Burkhard; Andreoletti, Gaia; Kamandula, Akash; Peng, Yisu; Bakolitsa, Constantina; Mort, Matthew; Cooper, David N; Bergquist, Timothy; Pejaver, Vikas; Liu, Xiaoming; Radivojac, Predrag; Brenner, Steven E; Ioannidis, Nilah M.

bioRxiv; 2024 Jun 08.

Article in English | MEDLINE | ID: mdl-38895200

ABSTRACT

Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction. To explore a variety of settings that are relevant for different clinical and research applications, we assess performance within different subsets of the evaluation data and within high-specificity and high-sensitivity regimes. We find strong performance of many predictors across multiple settings. Meta-predictors tend to outperform their constituent individual predictors; however, several individual predictors have performance similar to that of commonly used meta-predictors. The relative performance of predictors differs in high-specificity and high-sensitivity regimes, suggesting that different methods may be best suited to different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors supervised on pathogenicity labels from curated variant databases often learn label imbalances within genes. Overall, we find notable advances over the oldest and most cited missense variant effect predictors and continued improvements among the most recently developed tools, and the CAGI Annotate-All-Missense challenge (also termed the Missense Marathon) will continue to assess state-of-the-art methods as the field progresses. Together, our results help illuminate the current clinical and research utility of missense variant effect predictors and identify potential areas for future development.

Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A.

Jain, Shantanu; Trinidad, Marena; Nguyen, Thanh Binh; Jones, Kaiya; Neto, Santiago Diaz; Ge, Fang; Glagovsky, Ailin; Jones, Cameron; Moran, Giankaleb; Wang, Boqi; Rahimi, Kobra; Çalici, Sümeyra Zeynep; Cedillo, Luis R; Berardelli, Silvia; Özden, Buse; Chen, Ken; Katsonis, Panagiotis; Williams, Amanda; Lichtarge, Olivier; Rana, Sadhna; Pradhan, Swatantra; Srinivasan, Rajgopal; Sajeed, Rakshanda; Joshi, Dinesh; Faraggi, Eshel; Jernigan, Robert; Kloczkowski, Andrzej; Xu, Jierui; Song, Zigang; Özkan, Selen; Padilla, Natàlia; de la Cruz, Xavier; Acuna-Hidalgo, Rocio; Grafmüller, Andrea; Jiménez Barrón, Laura T; Manfredi, Matteo; Savojardo, Castrense; Babbi, Giulia; Martelli, Pier Luigi; Casadio, Rita; Sun, Yuanfei; Zhu, Shaowen; Shen, Yang; Pucci, Fabrizio; Rooman, Marianne; Cia, Gabriel; Raimondi, Daniele; Hermans, Pauline; Kwee, Sofia; Chen, Ella.

bioRxiv; 2024 Jun 17.

Article in English | MEDLINE | ID: mdl-38798479

ABSTRACT

Continued advances in variant effect prediction are necessary to demonstrate the ability of machine learning methods to accurately determine the clinical impact of variants of unknown significance (VUS). Towards this goal, the ARSA Critical Assessment of Genome Interpretation (CAGI) challenge was designed to characterize progress by utilizing 219 experimentally assayed missense VUS in the Arylsulfatase A (ARSA) gene to assess the performance of community-submitted predictions of variant functional effects. The challenge involved 15 teams, and evaluated additional predictions from established and recently released models. Notably, a model developed by participants of a genetics and coding bootcamp, trained with standard machine-learning tools in Python, demonstrated superior performance among submissions. Furthermore, the study observed that state-of-the-art deep learning methods provided small but statistically significant improvement in predictive performance compared to less elaborate techniques. These findings underscore the utility of variant effect prediction, and the potential for models trained with modest resources to accurately classify VUS in genetic and clinical research.
