ABSTRACT
Pathogenic variants in multiple genes on the X chromosome have been implicated in syndromic and non-syndromic intellectual disability disorders. ZFX on Xp22.11 encodes a transcription factor that has been linked to diverse processes including oncogenesis and development, but germline variants have not been characterized in association with disease. Here, we present clinical and molecular characterization of 18 individuals with germline ZFX variants. Exome or genome sequencing revealed 11 variants in 18 subjects (14 males and 4 females) from 16 unrelated families. Four missense variants were identified in 11 subjects, with seven truncation variants in the remaining individuals. Clinical findings included developmental delay/intellectual disability, behavioral abnormalities, hypotonia, and congenital anomalies. Overlapping and recurrent facial features were identified in all subjects, including thickening and medial broadening of eyebrows, variations in the shape of the face, external eye abnormalities, smooth and/or long philtrum, and ear abnormalities. Hyperparathyroidism was found in four families with missense variants, and enrichment of different tumor types was observed. In molecular studies, DNA-binding domain variants elicited differential expression of a small set of target genes relative to wild-type ZFX in cultured cells, suggesting a gain or loss of transcriptional activity. Additionally, a zebrafish model of ZFX loss displayed an altered behavioral phenotype, providing additional evidence for the functional significance of ZFX. Our clinical and experimental data support that variants in ZFX are associated with an X-linked intellectual disability syndrome characterized by a recurrent facial gestalt, neurocognitive and behavioral abnormalities, and an increased risk for congenital anomalies and hyperparathyroidism.
Subject(s)
Hyperparathyroidism, Intellectual Disability, Neurodevelopmental Disorders, Male, Female, Animals, Humans, Intellectual Disability/pathology, Zebrafish/genetics, Missense Mutation/genetics, Transcription Factors/genetics, Phenotype, Neurodevelopmental Disorders/genetics
ABSTRACT
Activating signal co-integrator complex 1 (ASCC1) acts with ASCC-ALKBH3 complex in alkylation damage responses. ASCC1 uniquely combines two evolutionarily ancient domains: nucleotide-binding K-Homology (KH) (associated with regulating splicing, transcription, and translation) and two-histidine phosphodiesterase (PDE; associated with hydrolysis of cyclic nucleotide phosphate bonds). Germline mutations link loss of ASCC1 function to spinal muscular atrophy with congenital bone fractures 2 (SMABF2). Herein analysis of The Cancer Genome Atlas (TCGA) suggests ASCC1 RNA overexpression in certain tumors correlates with poor survival, Signatures 29 and 3 mutations, and genetic instability markers. We determined crystal structures of Alvinella pompejana (Ap) ASCC1 and human (Hs) PDE domain revealing high-resolution details and features conserved over 500 million years of evolution. Extending our understanding of the KH domain Gly-X-X-Gly sequence motif, we define a novel structural Helix-Clasp-Helix (HCH) nucleotide binding motif and show ASCC1 sequence-specific binding to CGCG-containing RNA. The V-shaped PDE nucleotide binding channel has two His-Φ-Ser/Thr-Φ (HXT) motifs (Φ being hydrophobic) positioned to initiate cyclic phosphate bond hydrolysis. A conserved atypical active-site histidine torsion angle implies a novel PDE substrate. A flexible active-site loop and an arginine-rich domain linker appear regulatory. Small-angle X-ray scattering (SAXS) revealed aligned KH-PDE RNA binding sites with limited flexibility in solution. Quantitative evolutionary bioinformatic analyses of disease and cancer-associated mutations support implied functional roles for RNA binding, phosphodiesterase activity, and regulation. Collective results inform ASCC1's roles in transactivation and alkylation damage responses, its targeting by structure-based inhibitors, and how ASCC1 mutations may impact inherited disease and cancer.
Subject(s)
Phosphoric Diester Hydrolases, Humans, Computational Biology/methods, X-Ray Crystallography, Phosphoric Diester Hydrolases/metabolism, Phosphoric Diester Hydrolases/chemistry, Phosphoric Diester Hydrolases/genetics, RNA-Binding Motifs/genetics
ABSTRACT
Genetic variants drive the evolution of traits and diseases. We previously modeled these variants as small displacements in fitness landscapes and estimated their functional impact by differentiating the evolutionary relationship between genotype and phenotype. Conversely, here we integrate these derivatives to identify genes steering specific traits. Over cancer cohorts, integration identified 460 likely tumor-driving genes. Many have literature and experimental support but had eluded prior genomic searches for positive selection in tumors. Beyond providing cancer insights, these results introduce a general calculus of evolution to quantify the genotype-phenotype relationship and discover genes associated with complex traits and diseases.
Subject(s)
Calculi, Neoplasms, Biological Evolution, Genetic Fitness, Genotype, Humans, Genetic Models, Neoplasms/genetics, Phenotype, Genetic Selection
ABSTRACT
BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.
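The maximum F-measure named in the METHODS above can be illustrated with a minimal sketch. It assumes each prediction is a (variant ID, EPCR) pair and that every distinct EPCR value is tried as a cutoff. The function name, the variant IDs, and the use of the balanced F1 form are assumptions made for illustration; the abstract does not specify the challenge's exact scoring code.

```python
def max_f_measure(predictions, causal):
    """Return the best F1 over all EPCR cutoffs.

    predictions: list of (variant_id, epcr) pairs (hypothetical format).
    causal: set of variant ids known to be causal.
    """
    if not causal:
        raise ValueError("at least one causal variant is required")
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    best = 0.0
    true_pos = 0
    for i, (variant, score) in enumerate(ranked):
        if variant in causal:
            true_pos += 1
        # Only evaluate once all variants tied at this score are included.
        if i + 1 < len(ranked) and ranked[i + 1][1] == score:
            continue
        if true_pos == 0:
            continue
        precision = true_pos / (i + 1)   # causal calls / all calls at cutoff
        recall = true_pos / len(causal)  # causal calls / all causal variants
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data with made-up variant ids, for illustration only.
toy_predictions = [("v1", 0.9), ("v2", 0.8), ("v3", 0.4), ("v4", 0.1)]
toy_causal = {"v1", "v3"}
print(max_f_measure(toy_predictions, toy_causal))
```

Scanning every cutoff rather than fixing one means a model is not penalized for how its EPCR values are calibrated, only for how well they rank causal variants.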
Subject(s)
Rare Diseases, Humans, Rare Diseases/genetics, Rare Diseases/diagnosis, Human Genome/genetics, Genetic Variation/genetics, Computational Biology/methods, Phenotype
ABSTRACT
Variants which disrupt splicing are a frequent cause of rare disease that have been under-ascertained clinically. Accurate and efficient methods to predict a variant's impact on splicing are needed to interpret the growing number of variants of unknown significance (VUS) identified by exome and genome sequencing. Here, we present the results of the CAGI6 Splicing VUS challenge, which invited predictions of the splicing impact of 56 variants ascertained clinically and functionally validated to determine splicing impact. The performance of 12 prediction methods, along with SpliceAI and CADD, was compared on the 56 functionally validated variants. The maximum accuracy achieved was 82% from two different approaches, one weighting SpliceAI scores by minor allele frequency, and one applying the recently published Splicing Prediction Pipeline (SPiP). SPiP performed optimally in terms of sensitivity, while an ensemble method combining multiple prediction tools and information from databases exceeded all others for specificity. Several challenge methods equalled or exceeded the performance of SpliceAI, with ultimate choice of prediction method likely to depend on experimental or clinical aims. One quarter of the variants were incorrectly predicted by at least 50% of the methods, highlighting the need for further improvements to splicing prediction methods for successful clinical application.
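The accuracy, sensitivity, and specificity figures compared above follow from a confusion matrix over binary calls. The sketch below assumes each variant has a boolean prediction ("disrupts splicing") and a boolean validated outcome. The function name and the toy values are illustrative assumptions, not data from the challenge.

```python
def binary_metrics(predicted, observed):
    """Compute accuracy, sensitivity, and specificity for boolean calls."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)          # true positives
    tn = sum(1 for p, o in pairs if not p and not o)  # true negatives
    fp = sum(1 for p, o in pairs if p and not o)      # false positives
    fn = sum(1 for p, o in pairs if not p and o)      # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of truly splice-disrupting variants detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of neutral variants correctly left uncalled.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up calls for five variants, for illustration only.
print(binary_metrics([True, True, False, False, True],
                     [True, False, False, True, True]))
```

Reporting all three together matters here because, as the abstract notes, the method with the best sensitivity was not the one with the best specificity.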
ABSTRACT
This paper presents an evaluation of predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for the biosynthesis of heme in eukaryotic cells, is highly conserved among eukaryotes. Despite the application of a variety of algorithms and methods, the performance of predictors was relatively similar, with Kendall's tau correlation coefficients between predictions and experimental scores around 0.3 for a majority of submissions. Notably, the median correlation (≥ 0.34) observed among these predictors, especially the top predictions from different groups, was greater than the correlation observed between their predictions and the actual experimental results. Most predictors were moderately successful in distinguishing between deleterious and benign variants, as evidenced by an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two most recent rounds of CAGI competitions, we noticed that more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of predictions still falls far short of the positive control, which is derived from experimental scores, indicating the necessity for considerable improvements in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting the predictors still rely heavily on information from multiple sequence alignments.
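Kendall's tau, the rank correlation reported above, can be sketched in a few lines. This is the simple tau-a form, which counts concordant and discordant pairs and applies no tie correction. Which tau variant the assessors used is not stated in the abstract, so the choice here is an assumption, as are the function name and the toy scores.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both lists order this pair the same way
        elif sign < 0:
            discordant += 1   # the lists disagree on the order of this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up predicted and experimental scores, for illustration only.
predicted = [1, 2, 3, 4]
experimental = [1, 3, 2, 4]
print(kendall_tau_a(predicted, experimental))
```

Because tau depends only on the ordering of variants, it lets predictors that output scores on different scales be compared against the same experimental measurements.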
ABSTRACT
SUMMARY: In any population under selective pressure, a central challenge is to distinguish the genes that drive adaptation from others which, subject to population variation, harbor many neutral mutations de novo. We recently showed that such genes could be identified by supplementing information on mutational frequency with an evolutionary analysis of the likely functional impact of coding variants. This approach improved the discovery of driver genes in both lab-evolved and environmental Escherichia coli strains. To facilitate general adoption, we have now developed ShinyBioHEAT, an R Shiny web-based application that enables identification of phenotype-driving genes in two commonly used model bacteria, E. coli and Bacillus subtilis, with no specific computational skill requirements. ShinyBioHEAT not only supports transparent and interactive analysis of lab evolution data in E. coli and B. subtilis, but it also creates dynamic visualizations of mutational impact on protein structures, which add orthogonal checks on predicted drivers. AVAILABILITY AND IMPLEMENTATION: Code for ShinyBioHEAT is available at https://github.com/LichtargeLab/ShinyBioHEAT. The Shiny application is additionally hosted at http://bioheat.lichtargelab.org/.
Subject(s)
Escherichia coli, Mobile Applications, Escherichia coli/genetics, Software, Mutation, Statistical Data Interpretation, Mutation Rate
ABSTRACT
BACKGROUND: Cisplatin (CDDP) is a mainstay treatment for advanced head and neck squamous cell carcinomas (HNSCC) despite a high frequency of innate and acquired resistance. We hypothesised that tumours acquire CDDP resistance through an enhanced reductive state dependent on metabolic rewiring. METHODS: To validate this model and understand how an adaptive metabolic programme might be imprinted, we performed an integrated analysis of CDDP-resistant HNSCC clones from multiple genomic backgrounds by whole-exome sequencing, RNA-seq, mass spectrometry, steady state and flux metabolomics. RESULTS: Inactivating KEAP1 mutations or reductions in KEAP1 RNA correlated with Nrf2 activation in CDDP-resistant cells, which functionally contributed to resistance. Proteomics identified elevation of downstream Nrf2 targets and the enrichment of enzymes involved in generation of biomass and reducing equivalents, metabolism of glucose, glutathione, NAD(P), and oxoacids. This was accompanied by biochemical and metabolic evidence of an enhanced reductive state dependent on coordinated glucose and glutamine catabolism, associated with reduced energy production and proliferation, despite normal mitochondrial structure and function. CONCLUSIONS: Our analysis identified coordinated metabolic changes associated with CDDP resistance that may provide new therapeutic avenues through targeting of these convergent pathways.
Subject(s)
Antineoplastic Agents, Head and Neck Neoplasms, Humans, Cisplatin/metabolism, Squamous Cell Carcinoma of Head and Neck, Kelch-Like ECH-Associated Protein 1/genetics, NF-E2-Related Factor 2/genetics, Antineoplastic Drug Resistance/genetics, Tumor Cell Line, Glucose, Antineoplastic Agents/pharmacology
ABSTRACT
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Subject(s)
Genetic Testing, Human Genome, Computational Biology, Genetic Variation, Humans, Reproducibility of Results
ABSTRACT
MOTIVATION: Since the first recognized case of COVID-19, more than 100 million people have been infected worldwide. Global efforts in drug and vaccine development to fight the disease have yielded vaccines and drug candidates to cure COVID-19. However, the spread of SARS-CoV-2 variants threatens the continued efficacy of these treatments. In order to address this, we interrogate the evolutionary history of the entire SARS-CoV-2 proteome to identify evolutionarily conserved functional sites that can inform the search for treatments with broader coverage across the coronavirus family. RESULTS: Combining coronavirus family sequence information with the mutations observed in the current COVID-19 outbreak, we systematically and comprehensively define evolutionarily stable sites that may provide useful drug and vaccine targets and which are less likely to be compromised by the emergence of new virus strains. Several experimentally validated effective drugs interact with these proposed target sites. In addition, the same evolutionary information can prioritize cross-reactive antigens that are useful in directing multi-epitope vaccine strategies to elicit broadly neutralizing immune responses to the betacoronavirus family. Although the results are focused on SARS-CoV-2, these approaches stem from evolutionary principles that are agnostic to the organism or infective agent. AVAILABILITY AND IMPLEMENTATION: The results of this work are made interactively available at http://cov.lichtargelab.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
COVID-19, Viral Vaccines, Humans, SARS-CoV-2/genetics, Proteome, COVID-19 Vaccines, Viral Vaccines/genetics
ABSTRACT
PURPOSE: We characterize the clinical and molecular phenotypes of six unrelated individuals with intellectual disability and autism spectrum disorder who carry heterozygous missense variants of the PRKAR1B gene, which encodes the R1β subunit of the cyclic AMP-dependent protein kinase A (PKA). METHODS: Variants of PRKAR1B were identified by single- or trio-exome analysis. We contacted the families and physicians of the six individuals to collect phenotypic information, performed in vitro analyses of the identified PRKAR1B-variants, and investigated PRKAR1B expression during embryonic development. RESULTS: Recent studies of large patient cohorts with neurodevelopmental disorders found significant enrichment of de novo missense variants in PRKAR1B. In our cohort, de novo origin of the PRKAR1B variants could be confirmed in five of six individuals, and four carried the same heterozygous de novo variant c.1003C>T (p.Arg335Trp; NM_001164760). Global developmental delay, autism spectrum disorder, and apraxia/dyspraxia have been reported in all six, and reduced pain sensitivity was found in three individuals carrying the c.1003C>T variant. PRKAR1B expression in the brain was demonstrated during human embryonal development. Additionally, in vitro analyses revealed altered basal PKA activity in cells transfected with variant-harboring PRKAR1B expression constructs. CONCLUSION: Our study provides strong evidence for a PRKAR1B-related neurodevelopmental disorder.
Subject(s)
Apraxias, Autism Spectrum Disorder, Intellectual Disability, Neurodevelopmental Disorders, Autism Spectrum Disorder/genetics, Cyclic AMP-Dependent Protein Kinase RIbeta Subunit, Female, Humans, Intellectual Disability/genetics, Neurodevelopmental Disorders/genetics, Pain, Pregnancy
ABSTRACT
Many computational approaches estimate the effect of coding variants, but their predictions often disagree with each other. These contradictions confound users and raise questions regarding reliability. Performance assessments can indicate the expected accuracy for each method and highlight advantages and limitations. The Critical Assessment of Genome Interpretation (CAGI) community aims to organize objective and systematic assessments: They challenge predictors on unpublished experimental and clinical data and assign independent assessors to evaluate the submissions. We participated in CAGI experiments as predictors, using the Evolutionary Action (EA) method to estimate the fitness effect of coding mutations. EA is untrained, uses homology information, and relies on a formal equation: The fitness effect equals the functional sensitivity to residue changes multiplied by the magnitude of the substitution. In previous CAGI experiments (between 2011 and 2016), our submissions aimed to predict the protein activity of single mutants. In 2018 (CAGI5), we also submitted predictions regarding clinical associations, folding stability, and matching genomic data with phenotype. For all these diverse challenges, we used EA to predict the fitness effect of variants, adjusted to specifically address each question. Our submissions had consistently good performance, suggesting that EA predicts reliably the effects of genetic variants.
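The verbal equation above can be written compactly. The symbols are notational assumptions chosen for illustration and are not given in this abstract: $f$ is the function mapping genotype to fitness, $r_i$ is the residue at sequence position $i$, $\partial f/\partial r_i$ is the functional sensitivity of that position, and $\Delta r_{i,\,X \to Y}$ is the magnitude of substituting amino acid $X$ with $Y$ there.

```latex
\Delta \varphi \;\approx\; \frac{\partial f}{\partial r_i}\,\Delta r_{i,\,X \to Y}
```

Read this way, a substitution has a large predicted fitness effect $\Delta\varphi$ only when both factors are large: a drastic amino acid change at an insensitive position, or a conservative change at a sensitive one, would each be predicted to have a modest effect.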
Subject(s)
Computational Biology/methods, Genetic Variation, DNA Sequence Analysis/methods, Genetic Databases, Molecular Evolution, Genetic Fitness, Genetic Predisposition to Disease, Humans, Phenotype, Sequence Alignment
ABSTRACT
Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerged as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.
Subject(s)
Computational Biology/methods, Methyltransferases/chemistry, Mutation, PTEN Phosphohydrolase/chemistry, High-Throughput Nucleotide Sequencing, Humans, Methyltransferases/genetics, PTEN Phosphohydrolase/genetics, Protein Stability
ABSTRACT
This paper reports the evaluation of predictions for the "CALM1" challenge in the fifth round of the Critical Assessment of Genome Interpretation held in 2018. In the challenge, the participants were asked to predict effects on yeast growth caused by missense variants of human calmodulin, a highly conserved protein in eukaryotic cells sensing calcium concentration. The performance of predictors implementing different algorithms and methods is similar. Most predictors are able to identify the deleterious or tolerated variants with modest accuracy, with a baseline predictor based purely on sequence conservation slightly outperforming the submitted predictions. Nevertheless, we think that the accuracy of predictions remains far from satisfactory, and the field awaits substantial improvements. The most poorly predicted variants in this round surround functional CALM1 sites that bind calcium or peptide, which suggests that better incorporation of structural analysis may help improve predictions.
Subject(s)
Calmodulin/chemistry, Calmodulin/genetics, Computational Biology/methods, Missense Mutation, Yeasts/growth & development, Algorithms, Binding Sites, Calcium/metabolism, Calmodulin/metabolism, Molecular Evolution, Fungal Proteins/chemistry, Fungal Proteins/genetics, Fungal Proteins/metabolism, Genetic Fitness, Humans, Genetic Models, Molecular Models, Protein Conformation, Protein Engineering, Yeasts/genetics
ABSTRACT
The Critical Assessment of Genome Interpretation-5 intellectual disability challenge asked participants to use computational methods to predict patient clinical phenotypes and the causal variant(s) based on an analysis of their gene panel sequence data. Sequence data for 74 genes associated with intellectual disability (ID) and/or autism spectrum disorders (ASD) from a cohort of 150 patients with a range of neurodevelopmental manifestations (i.e. ID, autism, epilepsy, microcephaly, macrocephaly, hypotonia, ataxia) were made available for this challenge. For each patient, predictors had to report the causative variants and which of the seven phenotypes were present. Since neurodevelopmental disorders are characterized by strong comorbidity, tested individuals often present more than one pathological condition. Considering the overall clinical manifestation of each patient, the correct phenotype was predicted by at least one group for 93 individuals (62%). ID and ASD were the best predicted among the seven phenotypic traits. Also, causative or potentially pathogenic variants were predicted correctly by at least one group. However, the prediction of the correct causative variant seems to be insufficient to predict the correct phenotype. In some cases, the correct prediction was supported by rare or common variants in genes different from the causative one.
Subject(s)
Autism Spectrum Disorder/genetics, Computational Biology/methods, Intellectual Disability/genetics, DNA Sequence Analysis/methods, Female, Genetic Predisposition to Disease, Humans, Male, Phenotype, Quantitative Trait Loci
ABSTRACT
The CAGI-5 pericentriolar material 1 (PCM1) challenge aimed to predict the effect of 38 transgenic human missense mutations in the PCM1 protein implicated in schizophrenia. Participants were provided with 16 benign variants (negative controls), 10 hypomorphic, and 12 loss of function variants. Six groups participated and were asked to predict the probability of effect and the standard deviation associated with each mutation. Here, we present the challenge assessment. Prediction performance was evaluated using different measures to arrive at a final ranking which highlights the strengths and weaknesses of each group. The results show a great variety of predictions where some methods performed significantly better than others. Benign variants played an important role as negative controls, highlighting predictors biased to identify disease phenotypes. The best predictor, Bromberg lab, used a neural-network-based method able to discriminate between neutral and non-neutral single nucleotide polymorphisms. The CAGI-5 PCM1 challenge allowed us to evaluate the state-of-the-art techniques for interpreting the effect of novel variants for a difficult target protein.
Subject(s)
Autoantigens/genetics, Cell Cycle Proteins/genetics, Computational Biology/methods, Missense Mutation, Schizophrenia/genetics, Genetic Databases, Genetic Predisposition to Disease, Humans, Computer Neural Networks, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions in FXN with Friedreich ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant (ΔΔGH2O). The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate the performance of the current computational methods for predicting the ΔΔGH2O value associated with the variants and to classify them as destabilizing or not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.
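The quantity being predicted can be written out explicitly. The sign convention below (variant minus wild type) is an assumption for illustration, since the abstract states only that the value is the change in unfolding free energy between the variant and wild-type proteins at zero denaturant concentration.

```latex
\Delta\Delta G^{\mathrm{H_2O}}
  \;=\; \Delta G^{\mathrm{H_2O}}_{\text{variant}}
  \;-\; \Delta G^{\mathrm{H_2O}}_{\text{wild type}}
```

Under this assumed convention, a negative value means the variant unfolds more easily than the wild type and would be classed as destabilizing. Some tools report the opposite sign, so the convention should be checked before comparing predictions.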
Subject(s)
Amino Acid Substitution, Iron-Binding Proteins/chemistry, Iron-Binding Proteins/genetics, Algorithms, Circular Dichroism, Humans, Molecular Models, Protein Conformation, Protein Folding, Protein Stability, Frataxin
ABSTRACT
The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment of Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.
Subject(s)
Breast Neoplasms/genetics, Checkpoint Kinase 2/genetics, Computational Biology/methods, Hispanic or Latino/genetics, Single Nucleotide Polymorphism, Adult, Aged, Breast Neoplasms/ethnology, Case-Control Studies, Computer Simulation, Female, Genetic Predisposition to Disease, Humans, Linear Models, Middle Aged, United States/ethnology, Exome Sequencing
ABSTRACT
Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.
Subject(s)
Amino Acid Substitution, Computational Biology/methods, Cystathionine beta-Synthase/genetics, Cystathionine/metabolism, Cystathionine beta-Synthase/metabolism, Homocysteine/metabolism, Humans, Phenotype, Precision Medicine
ABSTRACT
Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to an increasing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights on the state of the art of variant prediction and specific steps to further advance this state of the art.