Results 1 - 20 of 34
1.
Hum Genomics ; 18(1): 36, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627807

ABSTRACT

Systematically predicting the effects of mutations on protein fitness is essential for understanding genetic diseases. Indeed, predictions complement experimental efforts in analyzing how variants lead to dysfunctional proteins that in turn can cause diseases. Here we present our new fitness predictor, FiTMuSiC, which leverages structural, evolutionary and coevolutionary information. We show that FiTMuSiC predicts fitness with high accuracy despite the simplicity of its underlying model: it was among the top predictors on the hydroxymethylbilane synthase (HMBS) target of the sixth round of the Critical Assessment of Genome Interpretation challenge (CAGI6) and performs as well as much more complex deep learning models such as AlphaMissense. To further demonstrate FiTMuSiC's robustness, we compared its predictions with in vitro activity data on HMBS, variant fitness data on human glucokinase (GCK), and variant deleteriousness data on HMBS and GCK. These analyses further confirm FiTMuSiC's qualities and accuracy, which compare favorably with those of other predictors. Additionally, FiTMuSiC returns two scores that separately describe the functional and structural effects of the variant, thus providing mechanistic insight into why the variant leads to fitness loss or gain. We also provide an easy-to-use webserver at https://babylone.ulb.ac.be/FiTMuSiC, which is freely available for academic use and requires no bioinformatics expertise, making the tool accessible to the entire scientific community.


Subject(s)
Proteins , Humans , Mutation
2.
Hum Mutat ; 41(2): 347-362, 2020 02.
Article in English | MEDLINE | ID: mdl-31680375

ABSTRACT

Precise identification of causative variants from whole-genome sequencing data, including both coding and noncoding variants, is challenging. The Critical Assessment of Genome Interpretation 5 SickKids clinical genome challenge provided an opportunity to assess our ability to extract such information. Participants in the challenge were required to match each of the 24 whole-genome sequences to the correct phenotypic profile and to identify the disease class of each genome. These are all rare disease cases that have resisted genetic diagnosis in a state-of-the-art pipeline. The patients have a range of eye, neurological, and connective-tissue disorders. We used a gene-centric approach to address this problem, assigning each gene a multiphenotype-matching score. Mutations in the top-scoring genes for each phenotype profile were ranked on a 6-point scale of pathogenicity probability, resulting in an approximately equal number of top-ranked coding and noncoding candidate variants overall. We were able to assign the correct disease class for 12 cases and the correct genome to a clinical profile for five cases. The challenge assessor judged the genes proposed in three of these five cases to be likely appropriate. In the postsubmission phase, after careful screening of the genes in the correct genome, we identified additional potential diagnostic variants, a high proportion of which are noncoding.


Subject(s)
Genetic Association Studies/methods , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , Rare Diseases , Algorithms , Alleles , Genetic Variation , Genome-Wide Association Study/methods , Genotype , Humans , Models, Theoretical , Phenotype , Whole Genome Sequencing , Workflow
3.
Hum Mutat ; 40(9): 1546-1556, 2019 09.
Article in English | MEDLINE | ID: mdl-31294896

ABSTRACT

Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to an increasing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights on the state of the art of variant prediction and specific steps to further advance this state of the art.


Subject(s)
BRCA1 Protein/genetics , BRCA2 Protein/genetics , Breast Neoplasms/diagnosis , Computational Biology/methods , Ovarian Neoplasms/diagnosis , Breast Neoplasms/genetics , Early Detection of Cancer , Female , Genetic Predisposition to Disease , Genetic Testing , Genetic Variation , Humans , Models, Genetic , Ovarian Neoplasms/genetics
4.
Hum Mutat ; 40(9): 1463-1473, 2019 09.
Article in English | MEDLINE | ID: mdl-31283071

ABSTRACT

This paper reports the evaluation of predictions for the "CALM1" challenge in the fifth round of the Critical Assessment of Genome Interpretation held in 2018. In the challenge, the participants were asked to predict effects on yeast growth caused by missense variants of human calmodulin, a highly conserved protein in eukaryotic cells sensing calcium concentration. The performance of predictors implementing different algorithms and methods is similar. Most predictors are able to identify the deleterious or tolerated variants with modest accuracy, with a baseline predictor based purely on sequence conservation slightly outperforming the submitted predictions. Nevertheless, we think that the accuracy of predictions remains far from satisfactory, and the field awaits substantial improvements. The most poorly predicted variants in this round surround functional CALM1 sites that bind calcium or peptide, which suggests that better incorporation of structural analysis may help improve predictions.
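The abstract mentions a baseline predictor "based purely on sequence conservation" but does not specify its scoring function. A minimal sketch of one common choice, assumed here rather than taken from the paper: score each alignment column by its Shannon entropy, rescaled so that 1.0 means fully conserved; variants at highly conserved positions would then be ranked as more likely deleterious. The toy alignment and the functions `column_entropy` and `conservation_scores` are hypothetical, and gaps are simply ignored.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_scores(alignment: list[str]) -> list[float]:
    """Per-position conservation: 1 - H / log2(20), so 1.0 = invariant column."""
    max_entropy = math.log2(20)  # entropy of a uniform 20-amino-acid column
    length = len(alignment[0])
    return [
        1.0 - column_entropy("".join(seq[i] for seq in alignment)) / max_entropy
        for i in range(length)
    ]


# Hypothetical toy alignment (not CALM1 data)
alignment = ["MADQL", "MADQI", "MSDQL", "MAEQV"]
scores = conservation_scores(alignment)
```

Columns 0 and 3 are invariant and score 1.0, while the variable last column (L/I/L/V) scores lower than the mostly conserved second column (A/A/S/A). A real baseline would typically add sequence weighting and pseudocounts, which are omitted here.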


Subject(s)
Calmodulin/chemistry , Calmodulin/genetics , Computational Biology/methods , Mutation, Missense , Yeasts/growth & development , Algorithms , Binding Sites , Calcium/metabolism , Calmodulin/metabolism , Evolution, Molecular , Fungal Proteins/chemistry , Fungal Proteins/genetics , Fungal Proteins/metabolism , Genetic Fitness , Humans , Models, Genetic , Models, Molecular , Protein Conformation , Protein Engineering , Yeasts/genetics
5.
Hum Mutat ; 40(9): 1321-1329, 2019 09.
Article in English | MEDLINE | ID: mdl-31144782

ABSTRACT

Venous thromboembolism (VTE) is a common hematological disorder. VTE affects millions of people around the world each year and can be fatal. Earlier studies have revealed the possible VTE genetic risk factors in Europeans. The 2018 Critical Assessment of Genome Interpretation (CAGI) challenge asked participants to distinguish between 66 VTE and 37 non-VTE African American (AA) individuals based on their exome sequencing data. We used variants from AA VTE association studies and VTE genes from the DisGeNET database to evaluate VTE risk via four different approaches; two of these methods were most successful at the task. Our best performing method represented each exome as a vector of predicted functional effect scores of variants within the known genes. These exome vectors were then clustered with k-means. This approach achieved 70.8% precision and 69.7% recall in identifying VTE patients. Our second-best ranked method collapsed the variant effect scores into gene-level function changes, using the same vector clustering approach for patient/control identification. These results show that VTE risk is predictable in the AA population and highlight the importance of variant-driven gene functional changes in judging disease status. Of course, a more in-depth understanding of AA VTE pathogenicity is still needed for more precise predictions.
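The best-performing method above clusters per-exome score vectors with k-means and reports precision and recall for the VTE calls. A minimal sketch of that pipeline, under loudly stated assumptions: plain Lloyd's k-means with k = 2 and explicitly chosen seed vectors, a hypothetical rule that the cluster seeded from the first exome is read as the "VTE" cluster, and made-up two-dimensional vectors (the paper's features, initialization and cluster-to-label mapping are not given here). Precision is TP / (TP + FP) and recall is TP / (TP + FN).

```python
import math


def kmeans(vectors, init_indices, max_iter=100):
    """Lloyd's k-means; centroids start at vectors[i] for i in init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid (Euclidean distance)
        new_labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member vectors
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def precision_recall(predicted, actual):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical score vectors for six exomes (not CAGI data)
vectors = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.1, 0.2), (0.2, 0.1), (0.15, 0.3)]
is_case = [True, True, False, True, False, False]
labels = kmeans(vectors, init_indices=[0, 3])
# Hypothetical mapping: the cluster containing exome 0 is called "VTE"
predicted = [lab == labels[0] for lab in labels]
precision, recall = precision_recall(predicted, is_case)
```

Here the first three exomes form one cluster, giving two true positives, one false positive and one false negative, so both precision and recall are 2/3. Because k-means labels are arbitrary, some rule for mapping clusters to "case" and "control" is always needed; the one above is purely illustrative.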


Subject(s)
Computational Biology/methods , Exome Sequencing/methods , Polymorphism, Single Nucleotide , Venous Thromboembolism/genetics , Black or African American/genetics , Case-Control Studies , Female , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Male , Principal Component Analysis , United States/ethnology , Venous Thromboembolism/drug therapy , Venous Thromboembolism/ethnology , Warfarin
6.
Hum Mutat ; 40(9): 1215-1224, 2019 09.
Article in English | MEDLINE | ID: mdl-31301154

ABSTRACT

Precision medicine and sequence-based clinical diagnostics seek to predict disease risk or to identify causative variants from sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. In the past, few CAGI challenges have addressed the impact of sequence variants on splicing. In CAGI5, two challenges (Vex-seq and MaPSY) involved prediction of the effect of variants, primarily single-nucleotide changes, on splicing. Although there are significant differences between these two challenges, both involved prediction of results from high-throughput exon inclusion assays. Here, we discuss the methods used to predict the impact of these variants on splicing, their performance, strengths, and weaknesses, and prospects for predicting the impact of sequence variation on splicing and disease phenotypes.


Subject(s)
Alternative Splicing , Computational Biology/methods , Mutation , Proteins/genetics , Animals , Congresses as Topic , Genetic Fitness , Humans , Models, Genetic , Sequence Homology, Nucleic Acid
7.
Hum Mutat ; 40(9): 1486-1494, 2019 09.
Article in English | MEDLINE | ID: mdl-31268618

ABSTRACT

Recent years have seen a drastic increase in the amount of available genomic sequences. Alongside this explosion, hundreds of computational tools were developed to assess the impact of observed genetic variation. Critical Assessment of Genome Interpretation (CAGI) provides a platform to evaluate the performance of these tools in experimentally relevant contexts. In the CAGI-5 challenge assessing the 38 missense variants affecting the human Pericentriolar material 1 protein (PCM1), our SNAP-based submission was the top performer, although it did worse than expected from other evaluations. Here, we compare the CAGI-5 submissions, and 24 additional commonly used variant effect predictors, to analyze the reasons for this observation. We identified per-residue conservation, structural, and functional PCM1 characteristics, which may be responsible. As expected, predictors had a hard time distinguishing effect variants in nonconserved positions. They were also better able to call effect variants in a structurally rich region than in a less-structured one; in the latter, they more often correctly identified benign than effect variants. Curiously, most of the protein was predicted to be functionally robust to mutation, a feature that likely makes it a harder problem for generalized variant effect predictors.


Subject(s)
Autoantigens/genetics , Cell Cycle Proteins/genetics , Computational Biology/methods , Mutation, Missense , Algorithms , Autoantigens/metabolism , Cell Cycle Proteins/metabolism , Databases, Genetic , Genetic Predisposition to Disease , Humans
8.
Hum Mutat ; 40(9): 1612-1622, 2019 09.
Article in English | MEDLINE | ID: mdl-31241222

ABSTRACT

The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment of Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that, though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.


Subject(s)
Breast Neoplasms/genetics , Checkpoint Kinase 2/genetics , Computational Biology/methods , Hispanic or Latino/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Breast Neoplasms/ethnology , Case-Control Studies , Computer Simulation , Female , Genetic Predisposition to Disease , Humans , Linear Models , Middle Aged , United States/ethnology , Exome Sequencing
9.
Hum Mutat ; 40(9): 1261-1269, 2019 09.
Article in English | MEDLINE | ID: mdl-31090248

ABSTRACT

Single nucleotide mutations in exonic regions can significantly affect gene function through a disruption of splicing, and various computational methods have been developed to predict the splicing-related effects of a single nucleotide mutation. We implemented a new method using ensemble learning that combines two types of predictive models: (a) base sequence-based deep neural networks (DNNs) and (b) machine learning models based on genomic attributes. This method was applied to the Massively Parallel Splicing Assay challenge of the Fifth Critical Assessment of Genome Interpretation, in which challenge participants predicted various experimentally defined exonic splicing mutations, and achieved a promising result. We showed that combining different predictive models based upon the stacked generalization method led to significant improvement in prediction performance. In addition, whereas most of the genomic features adopted in constructing machine learning models were previously reported, feature values generated with DSSP, a DNN-based splice site prediction tool, were novel and helpful for the prediction. Learning the sequence patterns associated with normal splicing and the change in splicing site probabilities caused by a mutation was presumed to be helpful in predicting splicing disruption.


Subject(s)
Computational Biology/methods , Polymorphism, Single Nucleotide , RNA Splicing , Deep Learning , Exons , Genomics , Humans , Models, Genetic
10.
Hum Mutat ; 40(9): 1270-1279, 2019 09.
Article in English | MEDLINE | ID: mdl-31074545

ABSTRACT

Accurate interpretation of genomic variants that alter RNA splicing is critical to precision medicine. We present a computational framework, Prediction of variant Effect on Percent Spliced In (PEPSI), that predicts the splicing impact of coding and noncoding variants for the Fifth Critical Assessment of Genome Interpretation (CAGI5) "Vex-seq" challenge. PEPSI is a random forest regression model trained on multiple layers of features associated with sequence conservation and regulatory sequence elements. Compared to other splicing defect prediction tools from the literature, our framework integrates secondary structure information in predicting variants that disrupt splicing regulatory elements (SREs). We applied our model to classify splice-disrupting variants among 2,094 single-nucleotide polymorphisms from the Exome Aggregation Consortium using model-predicted changes in percent spliced in (ΔPSI) associated with tested variants. Benchmarking our model against widely used state-of-the-art tools, we demonstrate that PEPSI achieves comparable performance in terms of sensitivity and precision. Moreover, we also show that using secondary structure context can help resolve several cases where changes in the counts of SREs do not correspond with the directionality of ΔPSI measured for tested variants.
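PEPSI's target quantity is the change in percent spliced in (ΔPSI) between the variant and reference alleles. A minimal sketch of that quantity, assuming PSI is estimated simply as inclusion reads over inclusion plus exclusion reads (real pipelines often normalize junction counts, which is skipped here). The `is_splice_disrupting` rule and its 20-point cutoff are hypothetical, not the threshold used in the paper.

```python
def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Percent spliced in (0-100) from inclusion/exclusion read counts."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no informative reads for this exon")
    return 100.0 * inclusion_reads / total


def delta_psi(ref_counts: tuple[int, int], alt_counts: tuple[int, int]) -> float:
    """Delta PSI = PSI(variant allele) - PSI(reference allele)."""
    return psi(*alt_counts) - psi(*ref_counts)


def is_splice_disrupting(dpsi: float, threshold: float = 20.0) -> bool:
    """Hypothetical rule: flag a variant when |Delta PSI| reaches the threshold."""
    return abs(dpsi) >= threshold


# Hypothetical counts: reference allele 80 inclusion / 20 exclusion reads,
# variant allele 50 / 50, so PSI drops from 80 to 50 (Delta PSI = -30)
dpsi = delta_psi((80, 20), (50, 50))
flagged = is_splice_disrupting(dpsi)
```

The sign of ΔPSI carries the directionality the abstract refers to: negative values mean the variant promotes exon skipping, positive values mean it promotes inclusion.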


Subject(s)
Alternative Splicing , Polymorphism, Single Nucleotide , Proteins/chemistry , Proteins/genetics , Animals , Computational Biology , Humans , Protein Structure, Secondary , RNA Splice Sites , Regression Analysis , Exome Sequencing
11.
Hum Mutat ; 40(9): 1530-1545, 2019 09.
Article in English | MEDLINE | ID: mdl-31301157

ABSTRACT

Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.


Subject(s)
Amino Acid Substitution , Computational Biology/methods , Cystathionine beta-Synthase/genetics , Cystathionine/metabolism , Cystathionine beta-Synthase/metabolism , Homocysteine/metabolism , Humans , Phenotype , Precision Medicine
12.
Hum Mutat ; 40(9): 1197-1201, 2019 09.
Article in English | MEDLINE | ID: mdl-31334884

ABSTRACT

Interpretation of genomic variation plays an essential role in the analysis of cancer and monogenic disease, and increasingly also in complex trait disease, with applications ranging from basic research to clinical decisions. Many computational impact prediction methods have been developed, yet the field lacks a clear consensus on their appropriate use and interpretation. The Critical Assessment of Genome Interpretation (CAGI, /'ka-je/) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make blind predictions of resulting phenotype. Independent assessors evaluate the predictions by comparing with experimental and clinical data. CAGI has completed five editions with the goals of establishing the state of the art in genome interpretation and of encouraging new methodological developments. This special issue (https://onlinelibrary.wiley.com/toc/10981004/2019/40/9) comprises reports from CAGI, focusing on the fifth edition that culminated in a conference that took place 5 to 7 July 2018. CAGI5 comprised 14 challenges and engaged hundreds of participants from a dozen countries. This edition had a notable increase in splicing and expression regulatory variant challenges, while also continuing challenges on clinical genomics, as well as complex disease datasets and missense variants in diseases ranging from cancer to Pompe disease to schizophrenia. Full information about CAGI is at https://genomeinterpretation.org.


Subject(s)
Computational Biology/methods , Genome, Human , Algorithms , Congresses as Topic , Data Interpretation, Statistical , Genomics , Humans , Precision Medicine
13.
Hum Mutat ; 40(9): 1373-1391, 2019 09.
Article in English | MEDLINE | ID: mdl-31322791

ABSTRACT

Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective), and separately to each patient. The performance of matching genomes to categories was no better than random but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes.


Subject(s)
Computational Biology/methods , Genetic Variation , Undiagnosed Diseases/diagnosis , Adolescent , Child , Child, Preschool , Computer Simulation , Databases, Genetic , Female , Genetic Predisposition to Disease , Humans , Male , Phenotype , Undiagnosed Diseases/genetics , Whole Genome Sequencing
14.
Hum Mutat ; 40(9): 1414-1423, 2019 09.
Article in English | MEDLINE | ID: mdl-31243847

ABSTRACT

Predicting the impact of mutations on proteins remains an important problem. As part of the CAGI5 frataxin challenge, we evaluate the accuracy with which Provean, FoldX, and ELASPIC can predict changes in the Gibbs free energy of a protein using a limited data set of eight mutations. We find that different methods have distinct strengths and limitations, with no method being strictly superior to other methods on all metrics. ELASPIC achieves the highest accuracy while also providing a web interface which simplifies the evaluation and analysis of mutations. FoldX is slightly less accurate than ELASPIC but is easier to run locally, as it does not depend on external tools or datasets. Provean achieves reasonable results while being computationally less expensive than the other methods and not requiring a structure of the protein. In addition to methods submitted to the CAGI5 community experiment, and with the aim to inform about other methods with high accuracy, we also evaluate predictions made by Rosetta's ddg_monomer protocol, Rosetta's cartesian_ddg protocol, and thermodynamic integration calculations using the Amber package. ELASPIC still achieves the highest accuracy, while Rosetta's cartesian_ddg protocol appears to perform best in capturing the overall trend in the data.


Subject(s)
Computational Biology/methods , Iron-Binding Proteins/chemistry , Iron-Binding Proteins/genetics , Mutation , Humans , Models, Molecular , Protein Conformation , Protein Folding , Protein Stability , Thermodynamics , Frataxin
15.
Hum Mutat ; 40(9): 1455-1462, 2019 09.
Article in English | MEDLINE | ID: mdl-31066146

ABSTRACT

In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The critical assessment of genome interpretation (CAGI) has established a common framework for the assessment of available predictors of variant effects on specific problems and our group has been an active participant of CAGI since its first edition. In this paper, we summarize our experience and lessons learned from the last edition of the experiment (CAGI-5). In particular, we analyze prediction performances of our tools on five CAGI-5 selected challenges grouped into three different categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze in detail the performance of our tools, highlighting their potentialities and drawbacks. The aim is to better define the application boundaries of each tool.


Subject(s)
Computational Biology/methods , Genetic Variation , Proteins/chemistry , Proteins/genetics , Algorithms , Computer Simulation , Databases, Genetic , Genetic Predisposition to Disease , Humans , Machine Learning , Phenotype , Protein Stability
16.
Hum Mutat ; 40(9): 1495-1506, 2019 09.
Article in English | MEDLINE | ID: mdl-31184403

ABSTRACT

Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. With different methods emerging as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.
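The abstract notes that predictors did better as classifiers of extreme stability effects. One common way to score such a classifier, assumed here rather than taken from the paper, is to keep only clearly destabilized and clearly wild-type-like variants and compute the ROC AUC as the probability that a random positive outscores a random negative. In this sketch the abundance thresholds (`low`, `high`), the convention that a higher predicted score means "more destabilizing", and all data are hypothetical.

```python
def extreme_labels(abundance, low, high):
    """True = destabilized (abundance <= low), False = wild-type-like
    (abundance >= high); intermediate variants get None and are dropped."""
    return [True if a <= low else False if a >= high else None for a in abundance]


def roc_auc(scores, labels):
    """AUC = P(random positive scores above random negative), ties count 1/2.
    Entries labelled None are ignored."""
    pos = [s for s, lab in zip(scores, labels) if lab is True]
    neg = [s for s, lab in zip(scores, labels) if lab is False]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative variant")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical VSP-style abundance scores and predicted destabilization scores
abundance = [0.1, 0.2, 0.5, 0.9, 1.0]
predicted = [0.8, 0.6, 0.5, 0.3, 0.7]
labels = extreme_labels(abundance, low=0.3, high=0.8)
auc = roc_auc(predicted, labels)
```

With these toy values the middle variant is excluded, and three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75. The pairwise form is quadratic in the number of variants; a rank-based (Mann-Whitney) computation scales better for thousands of variants.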


Subject(s)
Computational Biology/methods , Methyltransferases/chemistry , Mutation , PTEN Phosphohydrolase/chemistry , High-Throughput Nucleotide Sequencing , Humans , Methyltransferases/genetics , PTEN Phosphohydrolase/genetics , Protein Stability
17.
Hum Mutat ; 40(9): 1519-1529, 2019 09.
Article in English | MEDLINE | ID: mdl-31342580

ABSTRACT

The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to 0.61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.
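The NAGLU assessment reports Pearson's correlation both between predictions and measured enzymatic activity and between pairs of methods. A minimal sketch of the coefficient, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), in plain Python; the activity values and method scores below are hypothetical, not challenge data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical relative activities and two methods' predicted activities
observed = [0.05, 0.40, 0.90, 0.30, 1.00]
method_a = [0.20, 0.35, 0.70, 0.60, 0.80]
method_b = [0.25, 0.30, 0.75, 0.55, 0.85]
r_a_obs = pearson(method_a, observed)   # method vs experiment
r_a_b = pearson(method_a, method_b)     # method vs method
```

Computing both kinds of correlation in the same way makes the abstract's observation easy to check on real submissions: if methods share the same conservation signal, `r_a_b` can exceed either method's correlation with the observed values.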


Subject(s)
Acetylglucosaminidase/metabolism , Computational Biology/methods , Mutation, Missense , Acetylglucosaminidase/genetics , Humans , Models, Genetic , Regression Analysis
18.
Hum Mutat ; 38(9): 1051-1063, 2017 09.
Article in English | MEDLINE | ID: mdl-28817247

ABSTRACT

The exponential growth of genomic variants uncovered by next-generation sequencing necessitates efficient and accurate computational analyses to predict their functional effects. A number of computational methods have been developed for the task, but few unbiased comparisons of their performance are available. To fill the gap, the Critical Assessment of Genome Interpretation (CAGI) comprehensively assesses phenotypic predictions on newly collected experimental datasets. Here, we present the results of the SUMO conjugase challenge, in which participants predicted functional effects of missense mutations in the human SUMO-conjugating enzyme UBE2I. The performance of the predictors is similar across methods and far from perfect. Evolutionary information from sequence alignments dominates the success: deleterious mutations at conserved positions and benign mutations at variable positions are accurately predicted. Prediction accuracy of other mutations remains unsatisfactory, and this fast-growing field of research is yet to learn the use of spatial structure information to improve the predictions significantly.


Subject(s)
Computational Biology/methods , Mutation, Missense , Ubiquitin-Conjugating Enzymes/genetics , Ubiquitin-Conjugating Enzymes/metabolism , Databases, Genetic , Evolution, Molecular , High-Throughput Nucleotide Sequencing , Humans , Models, Molecular , Protein Binding , Selection, Genetic , Sequence Alignment , Ubiquitin-Conjugating Enzymes/chemistry
19.
Hum Mutat ; 38(9): 1155-1168, 2017 09.
Article in English | MEDLINE | ID: mdl-28397312

ABSTRACT

The CAGI-4 Hopkins clinical panel challenge was an attempt to assess state-of-the-art methods for clinical phenotype prediction from DNA sequence. Participants were provided with exonic sequences of 83 genes for 106 patients from the Johns Hopkins DNA Diagnostic Laboratory. Five groups participated in the challenge, predicting both the probability that each patient had each of the 14 possible classes of disease, as well as one or more causal variants. In cases where the Hopkins laboratory reported a variant, at least one predictor correctly identified the disease class in 36 of the 43 patients (84%). Even in cases where the Hopkins laboratory did not find a variant, at least one predictor correctly identified the class in 39 of the 63 patients (62%). Each prediction group correctly diagnosed at least one patient that was not successfully diagnosed by any other group. We discuss the causal variant predictions by different groups and their implications for further development of methods to assess variants of unknown significance. Our results suggest that clinically relevant variants may be missed when physicians order small panels targeted on a specific phenotype. We also quantify the false-positive rate of DNA-guided analysis in the absence of prior phenotypic indication.


Subject(s)
Computational Biology/methods , Sequence Analysis, DNA/methods , Databases, Genetic , Genetic Predisposition to Disease , Genetic Testing , Humans , Phenotype
20.
Hum Mutat ; 38(9): 1123-1131, 2017 09.
Article in English | MEDLINE | ID: mdl-28370845

ABSTRACT

The Critical Assessment of Genome Interpretation (CAGI) is a global community experiment to objectively assess computational methods for predicting phenotypic impacts of genomic variation. One of the 2015-2016 competitions focused on predicting the influence of mutations on the allosteric regulation of human liver pyruvate kinase. More than 30 different researchers accessed the challenge data. However, only four groups accepted the challenge. Features used for predictions included evolutionary constraints, mutant site locations relative to active and effector binding sites, and computational docking outputs. Despite the range of expertise and strategies used by predictors, the best predictions were only marginally better than random for modified allostery resulting from mutations. In contrast, several groups successfully predicted which mutations severely reduced enzymatic activity. Nonetheless, the poor predictions of allostery stand in stark contrast to the impression left by more than 700 PubMed entries identified using the search terms "computational + allosteric." This contrast highlights a specialized need for new computational tools and utilization of benchmarks that focus on allosteric regulation.


Subject(s)
Benchmarking/methods , Pyruvate Kinase/chemistry , Pyruvate Kinase/genetics , Allosteric Regulation , Allosteric Site , Computational Biology/methods , Databases, Genetic , Fructosediphosphates/metabolism , Humans , Models, Molecular , Mutation , Pyruvate Kinase/metabolism