Search | VHL Search Portal

1.

MultifacetedProtDB: a database of human proteins with multiple functions.

Bertolini, Elisa; Babbi, Giulia; Savojardo, Castrense; Martelli, Pier Luigi; Casadio, Rita.

Nucleic Acids Res ; 52(D1): D494-D501, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37791887

ABSTRACT

MultifacetedProtDB is a database of multifunctional human proteins that derives information from other databases, including UniProt, GeneCards, Human Protein Atlas (HPA), Human Phenotype Ontology (HPO) and MONDO. Under the label 'multifaceted' it collects multitasking proteins addressed in the literature as pleiotropic, multidomain, promiscuous (in relation to enzymes catalysing multiple substrates) and moonlighting (with two or more molecular functions), which are difficult to retrieve with a direct search in existing non-specific databases. The study of multifunctional proteins is an expanding research area aiming to elucidate the complexities of biological processes, particularly in humans, where multifunctional proteins play roles in various processes, including signal transduction, metabolism, gene regulation and cellular communication, and are often involved in disease onset and progression. The webserver allows searching by gene, protein and any associated structural and functional information, like available structures from PDB, structural models and interactors, using multiple filters. Protein entries are supplemented with comprehensive annotations including EC number, GO terms (biological pathways, molecular functions, and cellular components), pathways from Reactome, subcellular localization from UniProt, tissue and cell type expression from HPA, and associated diseases following MONDO, Orphanet and OMIM classification. MultifacetedProtDB is freely available as a web server at: https://multifacetedprotdb.biocomp.unibo.it/.

Subject(s)

Databases, Protein , Proteins , Humans , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Databases as Topic

2.

Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project.

Stenton, Sarah L; O'Leary, Melanie C; Lemire, Gabrielle; VanNoy, Grace E; DiTroia, Stephanie; Ganesh, Vijay S; Groopman, Emily; O'Heir, Emily; Mangilog, Brian; Osei-Owusu, Ikeoluwa; Pais, Lynn S; Serrano, Jillian; Singer-Berk, Moriel; Weisburd, Ben; Wilson, Michael W; Austin-Tse, Christina; Abdelhakim, Marwa; Althagafi, Azza; Babbi, Giulia; Bellazzi, Riccardo; Bovo, Samuele; Carta, Maria Giulia; Casadio, Rita; Coenen, Pieter-Jan; De Paoli, Federica; Floris, Matteo; Gajapathy, Manavalan; Hoehndorf, Robert; Jacobsen, Julius O B; Joseph, Thomas; Kamandula, Akash; Katsonis, Panagiotis; Kint, Cyrielle; Lichtarge, Olivier; Limongelli, Ivan; Lu, Yulan; Magni, Paolo; Mamidi, Tarun Karthik Kumar; Martelli, Pier Luigi; Mulargia, Marta; Nicora, Giovanna; Nykamp, Keith; Pejaver, Vikas; Peng, Yisu; Pham, Thi Hong Cam; Podda, Maurizio S; Rao, Aditya; Rizzo, Ettore; Saipradeep, Vangala G; Savojardo, Castrense.

Hum Genomics ; 18(1): 44, 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38685113

ABSTRACT

BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.
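The maximum F-measure mentioned in the METHODS can be illustrated with a minimal Python sketch. It sweeps a threshold over the submitted EPCR values and keeps the best harmonic mean of precision and recall. The function name, the data layout and the toy values are hypothetical, and the challenge's actual scoring code may differ in detail.

```python
def max_f_measure(predictions, causal):
    """Best F-measure over all EPCR thresholds.

    predictions: dict mapping a variant id to its EPCR value (assumed in [0, 1]).
    causal: set of variant ids known to be causal.
    """
    best = 0.0
    for threshold in sorted(set(predictions.values())):
        # Variants called causal at this threshold.
        called = {v for v, epcr in predictions.items() if epcr >= threshold}
        true_positives = len(called & causal)
        if true_positives == 0:
            continue
        precision = true_positives / len(called)
        recall = true_positives / len(causal)
        f_measure = 2 * precision * recall / (precision + recall)
        best = max(best, f_measure)
    return best


# Hypothetical toy submission: four ranked variants, two of them causal.
toy_predictions = {"var_a": 0.9, "var_b": 0.8, "var_c": 0.4, "var_d": 0.1}
toy_causal = {"var_a", "var_c"}
print(max_f_measure(toy_predictions, toy_causal))
```

In this toy case the best trade-off is reached at the threshold 0.4, where both causal variants are recovered at the cost of one false positive.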

Subject(s)

Rare Diseases , Humans , Rare Diseases/genetics , Rare Diseases/diagnosis , Genome, Human/genetics , Genetic Variation/genetics , Computational Biology/methods , Phenotype

3.

Assessing predictions on fitness effects of missense variants in HMBS in CAGI6.

Zhang, Jing; Kinch, Lisa; Katsonis, Panagiotis; Lichtarge, Olivier; Jagota, Milind; Song, Yun S; Sun, Yuanfei; Shen, Yang; Kuru, Nurdan; Dereli, Onur; Adebali, Ogun; Alladin, Muttaqi Ahmad; Pal, Debnath; Capriotti, Emidio; Turina, Maria Paola; Savojardo, Castrense; Martelli, Pier Luigi; Babbi, Giulia; Casadio, Rita; Pucci, Fabrizio; Rooman, Marianne; Cia, Gabriel; Tsishyn, Matsvei; Strokach, Alexey; Hu, Zhiqiang; van Loggerenberg, Warren; Roth, Frederick P; Radivojac, Predrag; Brenner, Steven E; Cong, Qian; Grishin, Nick V.

Hum Genet ; 2024 Aug 07.

Article in English | MEDLINE | ID: mdl-39110250

ABSTRACT

This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Despite the application of a variety of algorithms and methods, the performance of predictors was relatively similar, with Kendall's tau correlation coefficients between predictions and experimental scores around 0.3 for a majority of submissions. Notably, the median correlation (≥ 0.34) observed among these predictors, especially the top predictions from different groups, was greater than the correlation observed between their predictions and the actual experimental results. Most predictors were moderately successful in distinguishing between deleterious and benign variants, as evidenced by an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two most recent rounds of CAGI competitions, we noticed that more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of predictions still falls far short of the positive control, which is derived from experimental scores, indicating the need for considerable improvements in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting that the predictors still rely heavily on information from multiple sequence alignments.
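The two kinds of metric named in this abstract, a rank correlation and a ROC AUC, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the assessors' code: the tau variant shown is tau-a (no tie correction), the 0.5 cut-off used to label a variant as benign is hypothetical, and all scores are invented toy values.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: experimental fitness scores and one set of predictions.
experimental = [0.1, 0.4, 0.5, 0.9, 1.0]
predicted = [0.2, 0.6, 0.1, 0.7, 0.5]
# Assumed cut-off: variants with experimental score >= 0.5 count as benign.
is_benign = [score >= 0.5 for score in experimental]

print(kendall_tau_a(experimental, predicted))
print(roc_auc(predicted, is_benign))
```

Applying `kendall_tau_a` to two sets of predictions, instead of to predictions and experimental scores, gives the predictor-to-predictor correlation that the abstract compares against.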

4.

Huntingtin: A Protein with a Peculiar Solvent Accessible Surface.

Babbi, Giulia; Savojardo, Castrense; Martelli, Pier Luigi; Casadio, Rita.

Int J Mol Sci ; 22(6), 2021 Mar 12.

Article in English | MEDLINE | ID: mdl-33809039

ABSTRACT

Taking advantage of the most recent cryogenic electron microscopy structure of human huntingtin, we explored its physicochemical properties with computational methods, focusing on the solvent accessible surface of the protein and highlighting an interesting mix of hydrophobic and hydrophilic patterns, with the latter prevailing. We then evaluated the probability of exposed residues to be in contact with other proteins, discovering that they tend to cluster in specific regions of the protein. We then found that the remaining portions of the protein surface can contain calcium-binding sites that we propose here as putative mediators for the protein to interact with membranes. Our findings are justified in relation to the present knowledge of huntingtin functional annotation.

Subject(s)

Calcium/metabolism , Computational Biology , Huntingtin Protein/chemistry , Proteins/genetics , Binding Sites/genetics , Humans , Huntingtin Protein/genetics , Huntingtin Protein/ultrastructure , Hydrophobic and Hydrophilic Interactions , Models, Molecular , Protein Binding/genetics , Solvents/chemistry , Surface Properties

5.

A Glance into MTHFR Deficiency at a Molecular Level.

Savojardo, Castrense; Babbi, Giulia; Baldazzi, Davide; Martelli, Pier Luigi; Casadio, Rita.

Int J Mol Sci ; 23(1), 2021 Dec 23.

Article in English | MEDLINE | ID: mdl-35008593

ABSTRACT

MTHFR deficiency still deserves investigation to associate the phenotype with protein structure variations. To this aim, considering the MTHFR wild type protein structure, with a catalytic and a regulatory domain, and taking advantage of state-of-the-art computational tools, we explore the properties of 72 missense variations known to be disease associated. By computing the thermodynamic ΔΔG change according to a consensus method that we recently introduced, we find that 61% of the disease-related variations destabilize the protein, are present in both the catalytic and regulatory domains and correspond to known biochemical deficiencies. The propensity of solvent accessible residues to be involved in protein-protein interaction sites indicates that most of the interacting residues are located in the regulatory domain, and that only three of them, located at the interface of the functional protein homodimer, are both disease-related and destabilizing. Finally, we compute the protein architecture with Hidden Markov Models, one from Pfam for the catalytic domain and the second computed in house for the regulatory domain. We show that patterns of disease-associated, physicochemical variation types, both in the catalytic and regulatory domains, are unique for the MTHFR deficiency when mapped into the protein architecture.

Subject(s)

Homocystinuria/genetics , Methylenetetrahydrofolate Reductase (NADPH2)/deficiency , Muscle Spasticity/genetics , Catalytic Domain/genetics , Humans , Methylenetetrahydrofolate Reductase (NADPH2)/genetics , Protein Interaction Maps/genetics , Psychotic Disorders/genetics

6.

Are machine learning based methods suited to address complex biological problems? Lessons from CAGI-5 challenges.

Savojardo, Castrense; Babbi, Giulia; Bovo, Samuele; Capriotti, Emidio; Martelli, Pier Luigi; Casadio, Rita.

Hum Mutat ; 40(9): 1455-1462, 2019 09.

Article in English | MEDLINE | ID: mdl-31066146

ABSTRACT

In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The critical assessment of genome interpretation (CAGI) has established a common framework for the assessment of available predictors of variant effects on specific problems and our group has been an active participant in CAGI since its first edition. In this paper, we summarize our experience and lessons learned from the last edition of the experiment (CAGI-5). In particular, we analyze prediction performances of our tools on five CAGI-5 selected challenges grouped into three different categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze in detail the performance of our tools, highlighting their strengths and drawbacks. The aim is to better define the application boundaries of each tool.

Subject(s)

Computational Biology/methods , Genetic Variation , Proteins/chemistry , Proteins/genetics , Algorithms , Computer Simulation , Databases, Genetic , Genetic Predisposition to Disease , Humans , Machine Learning , Phenotype , Protein Stability

7.

Assessment of methods for predicting the effects of PTEN and TPMT protein variants.

Pejaver, Vikas; Babbi, Giulia; Casadio, Rita; Folkman, Lukas; Katsonis, Panagiotis; Kundu, Kunal; Lichtarge, Olivier; Martelli, Pier Luigi; Miller, Maximilian; Moult, John; Pal, Lipika R; Savojardo, Castrense; Yin, Yizhou; Zhou, Yaoqi; Radivojac, Predrag; Bromberg, Yana.

Hum Mutat ; 40(9): 1495-1506, 2019 09.

Article in English | MEDLINE | ID: mdl-31184403

ABSTRACT

Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerged as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.

Subject(s)

Computational Biology/methods , Methyltransferases/chemistry , Mutation , PTEN Phosphohydrolase/chemistry , High-Throughput Nucleotide Sequencing , Humans , Methyltransferases/genetics , PTEN Phosphohydrolase/genetics , Protein Stability

8.

Assessing predictions on fitness effects of missense variants in calmodulin.

Zhang, Jing; Kinch, Lisa N; Cong, Qian; Katsonis, Panagiotis; Lichtarge, Olivier; Savojardo, Castrense; Babbi, Giulia; Martelli, Pier Luigi; Capriotti, Emidio; Casadio, Rita; Garg, Aditi; Pal, Debnath; Weile, Jochen; Sun, Song; Verby, Marta; Roth, Frederick P; Grishin, Nick V.

Hum Mutat ; 40(9): 1463-1473, 2019 09.

Article in English | MEDLINE | ID: mdl-31283071

ABSTRACT

This paper reports the evaluation of predictions for the "CALM1" challenge in the fifth round of the Critical Assessment of Genome Interpretation held in 2018. In the challenge, the participants were asked to predict effects on yeast growth caused by missense variants of human calmodulin, a highly conserved protein in eukaryotic cells sensing calcium concentration. The performance of predictors implementing different algorithms and methods is similar. Most predictors are able to identify the deleterious or tolerated variants with modest accuracy, with a baseline predictor based purely on sequence conservation slightly outperforming the submitted predictions. Nevertheless, we think that the accuracy of predictions remains far from satisfactory, and the field awaits substantial improvements. The most poorly predicted variants in this round surround functional CALM1 sites that bind calcium or peptide, which suggests that better incorporation of structural analysis may help improve predictions.

Subject(s)

Calmodulin/chemistry , Calmodulin/genetics , Computational Biology/methods , Mutation, Missense , Yeasts/growth & development , Algorithms , Binding Sites , Calcium/metabolism , Calmodulin/metabolism , Evolution, Molecular , Fungal Proteins/chemistry , Fungal Proteins/genetics , Fungal Proteins/metabolism , Genetic Fitness , Humans , Models, Genetic , Models, Molecular , Protein Conformation , Protein Engineering , Yeasts/genetics

9.

Evaluating the predictions of the protein stability change upon single amino acid substitutions for the FXN CAGI5 challenge.

Savojardo, Castrense; Petrosino, Maria; Babbi, Giulia; Bovo, Samuele; Corbi-Verge, Carles; Casadio, Rita; Fariselli, Piero; Folkman, Lukas; Garg, Aditi; Karimi, Mostafa; Katsonis, Panagiotis; Kim, Philip M; Lichtarge, Olivier; Martelli, Pier Luigi; Pasquo, Alessandra; Pal, Debnath; Shen, Yang; Strokach, Alexey V; Turina, Paola; Zhou, Yaoqi; Andreoletti, Gaia; Brenner, Steven E; Chiaraluce, Roberta; Consalvi, Valerio; Capriotti, Emidio.

Hum Mutat ; 40(9): 1392-1399, 2019 09.

Article in English | MEDLINE | ID: mdl-31209948

ABSTRACT

Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions in FXN with Friedreich Ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant (ΔΔGH2O). The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate the performance of the current computational methods for predicting the ΔΔGH2O value associated with the variants and for classifying them as destabilizing or not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.
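The quantity being predicted can be illustrated with a short Python sketch. It assumes that ΔΔGH2O is taken as the variant's unfolding free energy minus that of the wild type, so that negative values mean a less stable variant, and it uses a hypothetical cut-off of -1.0 kcal/mol for the "destabilizing" class. Neither the sign convention nor the cut-off is stated in the abstract, and all numbers are invented.

```python
def ddg_h2o(dg_variant, dg_wild_type):
    """Change in unfolding free energy (assumed: variant minus wild type)."""
    return dg_variant - dg_wild_type


def classify(ddg, threshold=-1.0):
    """Label a variant; the threshold (kcal/mol) is a hypothetical choice."""
    return "destabilizing" if ddg <= threshold else "not destabilizing"


# Hypothetical unfolding free energies in kcal/mol, not measured values.
dg_wild_type = 9.0
dg_variants = {"variant_1": 7.5, "variant_2": 8.8}

for name, dg in dg_variants.items():
    ddg = ddg_h2o(dg, dg_wild_type)
    print(name, round(ddg, 2), classify(ddg))
```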

Subject(s)

Amino Acid Substitution , Iron-Binding Proteins/chemistry , Iron-Binding Proteins/genetics , Algorithms , Circular Dichroism , Humans , Models, Molecular , Protein Conformation , Protein Folding , Protein Stability , Frataxin

10.

Performance of computational methods for the evaluation of pericentriolar material 1 missense variants in CAGI-5.

Monzon, Alexander Miguel; Carraro, Marco; Chiricosta, Luigi; Reggiani, Francesco; Han, James; Ozturk, Kivilcim; Wang, Yanran; Miller, Maximilian; Bromberg, Yana; Capriotti, Emidio; Savojardo, Castrense; Babbi, Giulia; Martelli, Pier L; Casadio, Rita; Katsonis, Panagiotis; Lichtarge, Olivier; Carter, Hannah; Kousi, Maria; Katsanis, Nicholas; Andreoletti, Gaia; Moult, John; Brenner, Steven E; Ferrari, Carlo; Leonardi, Emanuela; Tosatto, Silvio C E.

Hum Mutat ; 40(9): 1474-1485, 2019 09.

Article in English | MEDLINE | ID: mdl-31260570

ABSTRACT

The CAGI-5 pericentriolar material 1 (PCM1) challenge aimed to predict the effect of 38 transgenic human missense mutations in the PCM1 protein implicated in schizophrenia. Participants were provided with 16 benign variants (negative controls), 10 hypomorphic, and 12 loss of function variants. Six groups participated and were asked to predict the probability of effect and standard deviation associated with each mutation. Here, we present the challenge assessment. Prediction performance was evaluated using different measures, which were combined into a final ranking that highlights the strengths and weaknesses of each group. The results show a great variety of predictions where some methods performed significantly better than others. Benign variants played an important role as negative controls, highlighting predictors biased to identify disease phenotypes. The best predictor, Bromberg lab, used a neural-network-based method able to discriminate between neutral and non-neutral single nucleotide polymorphisms. The CAGI-5 PCM1 challenge allowed us to evaluate the state-of-the-art techniques for interpreting the effect of novel variants for a difficult target protein.

Subject(s)

Autoantigens/genetics , Cell Cycle Proteins/genetics , Computational Biology/methods , Mutation, Missense , Schizophrenia/genetics , Databases, Genetic , Genetic Predisposition to Disease , Humans , Neural Networks, Computer , Phenotype , Polymorphism, Single Nucleotide

11.

Assessing the performance of in silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer.

Voskanian, Alin; Katsonis, Panagiotis; Lichtarge, Olivier; Pejaver, Vikas; Radivojac, Predrag; Mooney, Sean D; Capriotti, Emidio; Bromberg, Yana; Wang, Yanran; Miller, Max; Martelli, Pier Luigi; Savojardo, Castrense; Babbi, Giulia; Casadio, Rita; Cao, Yue; Sun, Yuanfei; Shen, Yang; Garg, Aditi; Pal, Debnath; Yu, Yao; Huff, Chad D; Tavtigian, Sean V; Young, Erin; Neuhausen, Susan L; Ziv, Elad; Pal, Lipika R; Andreoletti, Gaia; Brenner, Steven E; Kann, Maricel G.

Hum Mutat ; 40(9): 1612-1622, 2019 09.

Article in English | MEDLINE | ID: mdl-31241222

ABSTRACT

The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment of Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.

Subject(s)

Breast Neoplasms/genetics , Checkpoint Kinase 2/genetics , Computational Biology/methods , Hispanic or Latino/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Breast Neoplasms/ethnology , Case-Control Studies , Computer Simulation , Female , Genetic Predisposition to Disease , Humans , Linear Models , Middle Aged , United States/ethnology , Exome Sequencing

12.

Assessment of blind predictions of the clinical significance of BRCA1 and BRCA2 variants.

Cline, Melissa S; Babbi, Giulia; Bonache, Sandra; Cao, Yue; Casadio, Rita; de la Cruz, Xavier; Díez, Orland; Gutiérrez-Enríquez, Sara; Katsonis, Panagiotis; Lai, Carmen; Lichtarge, Olivier; Martelli, Pier L; Mishne, Gilad; Moles-Fernández, Alejandro; Montalban, Gemma; Mooney, Sean D; O'Conner, Robert; Ootes, Lars; Özkan, Selen; Padilla, Natalia; Pagel, Kymberleigh A; Pejaver, Vikas; Radivojac, Predrag; Riera, Casandra; Savojardo, Castrense; Shen, Yang; Sun, Yuanfei; Topper, Scott; Parsons, Michael T; Spurdle, Amanda B; Goldgar, David E.

Hum Mutat ; 40(9): 1546-1556, 2019 09.

Article in English | MEDLINE | ID: mdl-31294896

ABSTRACT

Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to an increasing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly-interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights on the state of the art of variant prediction and specific steps to further advance this state of the art.

Subject(s)

BRCA1 Protein/genetics , BRCA2 Protein/genetics , Breast Neoplasms/diagnosis , Computational Biology/methods , Ovarian Neoplasms/diagnosis , Breast Neoplasms/genetics , Early Detection of Cancer , Female , Genetic Predisposition to Disease , Genetic Testing , Genetic Variation , Humans , Models, Genetic , Ovarian Neoplasms/genetics

13.

Assessment of predicted enzymatic activity of α-N-acetylglucosaminidase variants of unknown significance for CAGI 2016.

Clark, Wyatt T; Kasak, Laura; Bakolitsa, Constantina; Hu, Zhiqiang; Andreoletti, Gaia; Babbi, Giulia; Bromberg, Yana; Casadio, Rita; Dunbrack, Roland; Folkman, Lukas; Ford, Colby T; Jones, David; Katsonis, Panagiotis; Kundu, Kunal; Lichtarge, Olivier; Martelli, Pier L; Mooney, Sean D; Nodzak, Conor; Pal, Lipika R; Radivojac, Predrag; Savojardo, Castrense; Shi, Xinghua; Zhou, Yaoqi; Uppal, Aneeta; Xu, Qifang; Yin, Yizhou; Pejaver, Vikas; Wang, Meng; Wei, Liping; Moult, John; Yu, Guoying Karen; Brenner, Steven E; LeBowitz, Jonathan H.

Hum Mutat ; 40(9): 1519-1529, 2019 09.

Article in English | MEDLINE | ID: mdl-31342580

ABSTRACT

The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to 0.61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.
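The comparison described here, method-to-method correlation versus method-to-experiment correlation, can be sketched with a hand-written Pearson coefficient. The activity values and the two "methods" below are hypothetical toy data chosen only to show the pattern. They are not taken from the challenge.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical relative enzymatic activities and two sets of predictions.
observed = [0.1, 0.5, 0.3, 0.9]
method_a = [0.2, 0.4, 0.6, 0.8]
method_b = [0.25, 0.35, 0.65, 0.75]

print("method_a vs observed:", pearson(method_a, observed))
print("method_a vs method_b:", pearson(method_a, method_b))
```

In this toy example the two methods agree with each other more closely than either agrees with the observed values, which is the situation the abstract reports for the top submissions.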

Subject(s)

Acetylglucosaminidase/metabolism , Computational Biology/methods , Mutation, Missense , Acetylglucosaminidase/genetics , Humans , Models, Genetic , Regression Analysis

14.

CAGI SickKids challenges: Assessment of phenotype and variant predictions derived from clinical and genomic data of children with undiagnosed diseases.

Kasak, Laura; Hunter, Jesse M; Udani, Rupa; Bakolitsa, Constantina; Hu, Zhiqiang; Adhikari, Aashish N; Babbi, Giulia; Casadio, Rita; Gough, Julian; Guerrero, Rafael F; Jiang, Yuxiang; Joseph, Thomas; Katsonis, Panagiotis; Kotte, Sujatha; Kundu, Kunal; Lichtarge, Olivier; Martelli, Pier Luigi; Mooney, Sean D; Moult, John; Pal, Lipika R; Poitras, Jennifer; Radivojac, Predrag; Rao, Aditya; Sivadasan, Naveen; Sunderam, Uma; Saipradeep, V G; Yin, Yizhou; Zaucha, Jan; Brenner, Steven E; Meyn, M Stephen.

Hum Mutat ; 40(9): 1373-1391, 2019 09.

Article in English | MEDLINE | ID: mdl-31322791

ABSTRACT

Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective), and separately to each patient. The performance of matching genomes to categories was no better than random but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes.

Subject(s)

Computational Biology/methods , Genetic Variation , Undiagnosed Diseases/diagnosis , Adolescent , Child , Child, Preschool , Computer Simulation , Databases, Genetic , Female , Genetic Predisposition to Disease , Humans , Male , Phenotype , Undiagnosed Diseases/genetics , Whole Genome Sequencing

15.

PhenPath: a tool for characterizing biological functions underlying different phenotypes.

Babbi, Giulia; Martelli, Pier Luigi; Casadio, Rita.

BMC Genomics ; 20(Suppl 8): 548, 2019 Jul 16.

Article in English | MEDLINE | ID: mdl-31307376

ABSTRACT

BACKGROUND: Many diseases are associated with complex patterns of symptoms and phenotypic manifestations. Parsimonious explanations aim at reconciling the multiplicity of phenotypic traits with the perturbation of one or few biological functions. For this, it is necessary to characterize human phenotypes at the molecular and functional levels, by exploiting gene annotations and known relations among genes, diseases and phenotypes. This characterization makes it possible to implement tools for retrieving functions shared among phenotypes, co-occurring in the same patient and facilitating the formulation of hypotheses about the molecular causes of the disease. RESULTS: We introduce PhenPath, a new resource consisting of two parts: PhenPathDB and PhenPathTOOL. The former is a database collecting the human genes associated with the phenotypes described in Human Phenotype Ontology (HPO) and OMIM Clinical Synopses. Phenotypes are then associated with biological functions and pathways by means of NET-GE, a network-based method for functional enrichment of sets of genes. The present version considers only phenotypes related to diseases. PhenPathDB collects information for 18 OMIM Clinical synopses and 7137 HPO phenotypes, related to 4292 diseases and 3446 genes. Enrichment of Gene Ontology annotations endows some 87.7, 86.9 and 73.6% of HPO phenotypes with Biological Process, Molecular Function and Cellular Component terms, respectively. Furthermore, 58.8 and 77.8% of HPO phenotypes are also enriched for KEGG and Reactome pathways, respectively. Based on PhenPathDB, PhenPathTOOL analyzes user-defined sets of phenotypes retrieving diseases, genes and functional terms which they share. This information can provide clues for interpreting the co-occurrence of phenotypes in a patient. 
CONCLUSIONS: The resource allows finding molecular features useful to investigate diseases characterized by multiple phenotypes, and by this, it can help researchers and physicians in identifying molecular mechanisms and biological functions underlying the concomitant manifestation of phenotypes. The resource is freely available at http://phenpath.biocomp.unibo.it.

Subject(s)

Biological Ontologies , Computational Biology/methods , Databases, Genetic , Phenotype , Disease/genetics , Humans

16.

Functional and Structural Features of Disease-Related Protein Variants.

Savojardo, Castrense; Babbi, Giulia; Martelli, Pier Luigi; Casadio, Rita.

Int J Mol Sci ; 20(7), 2019 Mar 27.

Article in English | MEDLINE | ID: mdl-30934684

ABSTRACT

Modern sequencing technologies provide an unprecedented amount of data of single-nucleotide variations occurring in coding regions and leading to changes in the expressed protein sequences. A significant fraction of these single-residue variations is linked to disease onset and collected in public databases. In recent years, many scientific studies have been focusing on the dissection of salient features of disease-related variations from different perspectives. In this work, we complement previous analyses by updating a dataset of disease-related variations occurring in proteins with 3D structure. Within this dataset, we describe functional and structural features that can be of interest for characterizing disease-related variations, including major chemico-physical properties, the strength of association to disease of variation types, their effect on protein stability, their location on the protein structure, and their distribution in Pfam structural/functional protein models. Our results support previous findings obtained in different data sets and introduce Pfam models as possible fingerprints of patterns of disease related single-nucleotide variations.

Subject(s)

Disease/genetics , Mutant Proteins/chemistry , Mutant Proteins/metabolism , Mutation/genetics , Databases, Protein , Humans , Protein Domains , Solvents

17.

Mutant MYO1F alters the mitochondrial network and induces tumor proliferation in thyroid cancer.

Diquigiovanni, Chiara; Bergamini, Christian; Evangelisti, Cecilia; Isidori, Federica; Vettori, Andrea; Tiso, Natascia; Argenton, Francesco; Costanzini, Anna; Iommarini, Luisa; Anbunathan, Hima; Pagotto, Uberto; Repaci, Andrea; Babbi, Giulia; Casadio, Rita; Lenaz, Giorgio; Rhoden, Kerry J; Porcelli, Anna Maria; Fato, Romana; Bowcock, Anne; Seri, Marco; Romeo, Giovanni; Bonora, Elena.

Int J Cancer ; 143(7): 1706-1719, 2018 10 01.

Article in English | MEDLINE | ID: mdl-29672841

ABSTRACT

Familial aggregation is a significant risk factor for the development of thyroid cancer and familial non-medullary thyroid cancer (FNMTC) accounts for 5-7% of all NMTC. Whole exome sequencing analysis in the family affected by FNMTC with oncocytic features where our group previously identified a predisposing locus on chromosome 19p13.2, revealed a novel heterozygous mutation (c.400G > A, NM_012335; p.Gly134Ser) in exon 5 of MYO1F, mapping to the linkage locus. In the thyroid FRTL-5 cell model stably expressing the mutant MYO1F p.Gly134Ser protein, we observed an altered mitochondrial network, with increased mitochondrial mass and a significant increase in both intracellular and extracellular reactive oxygen species, compared to cells expressing the wild-type (wt) protein or carrying the empty vector. The mutation conferred a significant advantage in colony formation, invasion and anchorage-independent growth. These data were corroborated by in vivo studies in zebrafish, since we demonstrated that the mutant MYO1F p.Gly134Ser, when overexpressed, can induce proliferation in whole vertebrate embryos, compared to the wt one. MYO1F screening in an additional 192 FNMTC families identified another variant in exon 7, which leads to exon skipping, and is predicted to alter the ATP-binding domain in MYO1F. Our study identified for the first time a role for MYO1F in NMTC.

Subject(s)

Cell Proliferation , Embryo, Nonmammalian/pathology , Mitochondria/pathology , Mutation , Myosin Type I/genetics , Thyroid Cancer, Papillary/pathology , Thyroid Neoplasms/pathology , Adolescent , Adult , Aged , Aged, 80 and over , Animals , Apoptosis , Cells, Cultured , Child , Chromosomes, Human, Pair 19 , Embryo, Nonmammalian/metabolism , Female , Genetic Predisposition to Disease , Genotype , Humans , Male , Middle Aged , Mitochondria/genetics , Mitochondria/metabolism , Myosin Type I/chemistry , Myosin Type I/metabolism , Oxygen Consumption , Pedigree , Protein Conformation , Thyroid Cancer, Papillary/genetics , Thyroid Cancer, Papillary/metabolism , Thyroid Neoplasms/genetics , Thyroid Neoplasms/metabolism , Young Adult , Zebrafish

18.

Benchmarking predictions of allostery in liver pyruvate kinase in CAGI4.

Xu, Qifang; Tang, Qingling; Katsonis, Panagiotis; Lichtarge, Olivier; Jones, David; Bovo, Samuele; Babbi, Giulia; Martelli, Pier L; Casadio, Rita; Lee, Gyu Rie; Seok, Chaok; Fenton, Aron W; Dunbrack, Roland L.

Hum Mutat ; 38(9): 1123-1131, 2017 09.

Article in English | MEDLINE | ID: mdl-28370845

ABSTRACT

The Critical Assessment of Genome Interpretation (CAGI) is a global community experiment to objectively assess computational methods for predicting phenotypic impacts of genomic variation. One of the 2015-2016 competitions focused on predicting the influence of mutations on the allosteric regulation of human liver pyruvate kinase. More than 30 different researchers accessed the challenge data. However, only four groups accepted the challenge. Features used for predictions ranged from evolutionary constraints and mutant site locations relative to active and effector binding sites to computational docking outputs. Despite the range of expertise and strategies used by predictors, the best predictions were marginally greater than random for modified allostery resulting from mutations. In contrast, several groups successfully predicted which mutations severely reduced enzymatic activity. Nonetheless, the poor predictions of allostery stand in stark contrast to the impression left by more than 700 PubMed entries identified using the identifiers "computational + allosteric." This contrast highlights a specialized need for new computational tools and utilization of benchmarks that focus on allosteric regulation.

Subject(s)

Benchmarking/methods , Pyruvate Kinase/chemistry , Pyruvate Kinase/genetics , Allosteric Regulation , Allosteric Site , Computational Biology/methods , Databases, Genetic , Fructosediphosphates/metabolism , Humans , Models, Molecular , Mutation , Pyruvate Kinase/metabolism

19.

Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges.

Daneshjou, Roxana; Wang, Yanran; Bromberg, Yana; Bovo, Samuele; Martelli, Pier L; Babbi, Giulia; Lena, Pietro Di; Casadio, Rita; Edwards, Matthew; Gifford, David; Jones, David T; Sundaram, Laksshman; Bhat, Rajendra Rana; Li, Xiaolin; Pal, Lipika R; Kundu, Kunal; Yin, Yizhou; Moult, John; Jiang, Yuxiang; Pejaver, Vikas; Pagel, Kymberleigh A; Li, Biao; Mooney, Sean D; Radivojac, Predrag; Shah, Sohela; Carraro, Marco; Gasparini, Alessandra; Leonardi, Emanuela; Giollo, Manuel; Ferrari, Carlo; Tosatto, Silvio C E; Bachar, Eran; Azaria, Johnathan R; Ofran, Yanay; Unger, Ron; Niroula, Abhishek; Vihinen, Mauno; Chang, Billy; Wang, Maggie H; Franke, Andre; Petersen, Britt-Sabina; Pirooznia, Mehdi; Zandi, Peter; McCombie, Richard; Potash, James B; Altman, Russ B; Klein, Teri E; Hoskins, Roger A; Repo, Susanna; Brenner, Steven E.

Hum Mutat ; 38(9): 1182-1192, 2017 09.

Article in English | MEDLINE | ID: mdl-28634997

ABSTRACT

Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype-phenotype relationships.

Subject(s)

Bipolar Disorder/genetics , Crohn Disease/genetics , Exome Sequencing/methods , Precision Medicine/methods , Warfarin/therapeutic use , Computational Biology/methods , Databases, Genetic , Genetic Predisposition to Disease , Humans , Information Dissemination , Pharmacogenomic Variants , Phenotype , Warfarin/pharmacology

20.

eDGAR: a database of Disease-Gene Associations with annotated Relationships among genes.

Babbi, Giulia; Martelli, Pier Luigi; Profiti, Giuseppe; Bovo, Samuele; Savojardo, Castrense; Casadio, Rita.

BMC Genomics ; 18(Suppl 5): 554, 2017 08 11.

Article in English | MEDLINE | ID: mdl-28812536

ABSTRACT

BACKGROUND: Genetic investigations, boosted by modern sequencing techniques, allow dissecting the genetic component of different phenotypic traits. These efforts result in the compilation of lists of genes related to diseases and show that an increasing number of diseases is associated with multiple genes. Investigating functional relations among genes associated with the same disease contributes to highlighting molecular mechanisms of the pathogenesis. RESULTS: We present eDGAR, a database collecting and organizing the data on gene/disease associations as derived from OMIM, Humsavar and ClinVar. For each disease-associated gene, eDGAR collects information on its annotation. Specifically, for lists of genes, eDGAR provides information on: i) interactions retrieved from PDB, BIOGRID and STRING; ii) co-occurrence in stable and functional structural complexes; iii) shared Gene Ontology annotations; iv) shared KEGG and REACTOME pathways; v) enriched functional annotations computed with NET-GE; vi) regulatory interactions derived from TRRUST; vii) localization on chromosomes and/or co-localisation in neighboring loci. The present release of eDGAR includes 2672 diseases, related to 3658 different genes, for a total number of 5729 gene-disease associations. 71% of the genes are linked to 621 multigenic diseases and eDGAR highlights their common GO terms, KEGG/REACTOME pathways, physical and regulatory interactions. eDGAR includes a network-based enrichment method for detecting statistically significant functional terms associated with groups of genes. CONCLUSIONS: eDGAR offers a resource to analyze disease-gene associations. In multigenic diseases genes can share physical interactions and/or co-occurrence in the same functional processes. eDGAR is freely available at: edgar.biocomp.unibo.it.

Subject(s)

Databases, Genetic , Genetic Diseases, Inborn/genetics , Genomics/methods , Protein Interaction Maps , Genetic Diseases, Inborn/metabolism , Humans , Metabolic Networks and Pathways , Molecular Sequence Annotation
