Search | BVS Economia da Saúde (VHL Health Economics)

Critical assessment of missense variant effect predictors on disease-relevant variant data.

Rastogi, Ruchir; Chung, Ryan; Li, Sindy; Li, Chang; Lee, Kyoungyeul; Woo, Junwoo; Kim, Dong-Wook; Keum, Changwon; Babbi, Giulia; Martelli, Pier Luigi; Savojardo, Castrense; Casadio, Rita; Chennen, Kirsley; Weber, Thomas; Poch, Olivier; Ancien, François; Cia, Gabriel; Pucci, Fabrizio; Raimondi, Daniele; Vranken, Wim; Rooman, Marianne; Marquet, Céline; Olenyi, Tobias; Rost, Burkhard; Andreoletti, Gaia; Kamandula, Akash; Peng, Yisu; Bakolitsa, Constantina; Mort, Matthew; Cooper, David N; Bergquist, Timothy; Pejaver, Vikas; Liu, Xiaoming; Radivojac, Predrag; Brenner, Steven E; Ioannidis, Nilah M.

bioRxiv ; 2024 Jun 08.

Article in English | MEDLINE | ID: mdl-38895200

ABSTRACT

Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense challenge, predictors commonly used by the clinical genetics community, and recently developed deep learning methods for variant effect prediction. To explore a variety of settings that are relevant for different clinical and research applications, we assess performance within different subsets of the evaluation data and within high-specificity and high-sensitivity regimes. We find strong performance of many predictors across multiple settings. Meta-predictors tend to outperform their constituent individual predictors; however, several individual predictors have performance similar to that of commonly used meta-predictors. The relative performance of predictors differs in high-specificity and high-sensitivity regimes, suggesting that different methods may be best suited to different use cases. We also characterize two potential sources of bias. Predictors that incorporate allele frequency as a predictive feature tend to have reduced performance when distinguishing pathogenic variants from very rare benign variants, and predictors supervised on pathogenicity labels from curated variant databases often learn label imbalances within genes. 
Overall, we find notable advances over the oldest and most cited missense variant effect predictors and continued improvements among the most recently developed tools, and the CAGI Annotate-All-Missense challenge (also termed the Missense Marathon) will continue to assess state-of-the-art methods as the field progresses. Together, our results help illuminate the current clinical and research utility of missense variant effect predictors and identify potential areas for future development.
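
The high-specificity regime mentioned in the abstract above can be illustrated with a small sketch. It assumes that predictor scores increase with predicted pathogenicity and that labels are 1 for pathogenic and 0 for benign; the function name `sensitivity_at_specificity` and the thresholding rule are illustrative assumptions, not the evaluation procedure used in the assessment.

```python
def sensitivity_at_specificity(scores, labels, min_specificity=0.95):
    """Best sensitivity achievable while keeping specificity >= min_specificity.

    Hypothetical helper: a variant is called pathogenic when score >= threshold.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    best = 0.0
    for t in sorted(set(scores)):
        # Specificity: fraction of benign variants scored below the threshold.
        specificity = sum(s < t for s in neg) / len(neg)
        # Sensitivity: fraction of pathogenic variants at or above the threshold.
        sensitivity = sum(s >= t for s in pos) / len(pos)
        if specificity >= min_specificity:
            best = max(best, sensitivity)
    return best


# Toy data (made up for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
strict = sensitivity_at_specificity(toy_scores, toy_labels, min_specificity=1.0)
relaxed = sensitivity_at_specificity(toy_scores, toy_labels, min_specificity=0.75)
```

On this toy data, requiring perfect specificity recovers fewer pathogenic variants than the relaxed setting, which is the kind of trade-off that can change the relative ranking of predictors between regimes.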

Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project.

Stenton, Sarah L; O'Leary, Melanie C; Lemire, Gabrielle; VanNoy, Grace E; DiTroia, Stephanie; Ganesh, Vijay S; Groopman, Emily; O'Heir, Emily; Mangilog, Brian; Osei-Owusu, Ikeoluwa; Pais, Lynn S; Serrano, Jillian; Singer-Berk, Moriel; Weisburd, Ben; Wilson, Michael W; Austin-Tse, Christina; Abdelhakim, Marwa; Althagafi, Azza; Babbi, Giulia; Bellazzi, Riccardo; Bovo, Samuele; Carta, Maria Giulia; Casadio, Rita; Coenen, Pieter-Jan; De Paoli, Federica; Floris, Matteo; Gajapathy, Manavalan; Hoehndorf, Robert; Jacobsen, Julius O B; Joseph, Thomas; Kamandula, Akash; Katsonis, Panagiotis; Kint, Cyrielle; Lichtarge, Olivier; Limongelli, Ivan; Lu, Yulan; Magni, Paolo; Mamidi, Tarun Karthik Kumar; Martelli, Pier Luigi; Mulargia, Marta; Nicora, Giovanna; Nykamp, Keith; Pejaver, Vikas; Peng, Yisu; Pham, Thi Hong Cam; Podda, Maurizio S; Rao, Aditya; Rizzo, Ettore; Saipradeep, Vangala G; Savojardo, Castrense.

Hum Genomics ; 18(1): 44, 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38685113

ABSTRACT

BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. 
Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria are needed.

Subjects

Rare Diseases , Humans , Rare Diseases/genetics , Rare Diseases/diagnosis , Human Genome/genetics , Genetic Variation/genetics , Computational Biology/methods , Phenotype
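
The maximum F-measure described in the abstract above (precision and recall of causal variants across EPCR values) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `max_f_measure`, the input layout, and the rule "call a variant causal when its EPCR is at or above the threshold" are hypothetical, and the challenge's actual scoring code may differ.

```python
def max_f_measure(predictions, causal):
    """Maximum F1 over all EPCR thresholds.

    predictions: list of (variant_id, epcr) pairs (hypothetical layout).
    causal: set of variant ids known to be causal.
    """
    if not causal:
        raise ValueError("need at least one known causal variant")
    best = 0.0
    # Every distinct EPCR value is tried as a threshold.
    for threshold in sorted({epcr for _, epcr in predictions}):
        called = {vid for vid, epcr in predictions if epcr >= threshold}
        true_positives = len(called & causal)
        if true_positives == 0:
            continue  # F-measure is zero at this threshold
        precision = true_positives / len(called)
        recall = true_positives / len(causal)
        f_measure = 2 * precision * recall / (precision + recall)
        best = max(best, f_measure)
    return best


# Toy submission (made-up identifiers and scores, for illustration only).
toy_predictions = [("v1", 0.9), ("v2", 0.8), ("v3", 0.4), ("v4", 0.1)]
toy_causal = {"v1", "v3"}
toy_fmax = max_f_measure(toy_predictions, toy_causal)
```

In the toy submission, the best trade-off is reached at the threshold 0.4, where both causal variants are recovered alongside one false positive.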

CAGI6 ID-Challenge: Assessment of phenotype and variant predictions in 415 children with Neurodevelopmental Disorders (NDDs).

Aspromonte, Maria Cristina; Conte, Alessio Del; Zhu, Shaowen; Tan, Wuwei; Shen, Yang; Zhang, Yexian; Li, Qi; Wang, Maggie Haitian; Babbi, Giulia; Bovo, Samuele; Martelli, Pier Luigi; Casadio, Rita; Althagafi, Azza; Toonsi, Sumyyah; Kulmanov, Maxat; Hoehndorf, Robert; Katsonis, Panagiotis; Williams, Amanda; Lichtarge, Olivier; Xian, Su; Surento, Wesley; Pejaver, Vikas; Mooney, Sean D; Sunderam, Uma; Srinivasan, Rajgopal; Murgia, Alessandra; Piovesan, Damiano; Tosatto, Silvio C E; Leonardi, Emanuela.

Res Sq ; 2023 Aug 02.

Article in English | MEDLINE | ID: mdl-37577579

ABSTRACT

As part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI6), the Genetics of Neurodevelopmental Disorders Lab in Padua proposed a new ID-challenge to encourage the development of computational methods for predicting patients' phenotypes and causal variants. Eight research teams and 30 models had access to the phenotype details and real genetic data, based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions with onset in infancy. In this study, we evaluate the ability and accuracy of computational methods to predict comorbid phenotypes, based on the clinical features described for each patient, and causal variants. Finally, we asked participants to develop a method to find new possible genetic causes for patients without a genetic diagnosis. As in CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia) and variants (causative, putative pathogenic, and contributing factors) were provided. Considering the overall clinical manifestation of our cohort, we provided the variant data and phenotypic traits of the 150 patients from the CAGI5 ID-Challenge for training and validation during the development of the prediction methods.

Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project.

Stenton, Sarah L; O'Leary, Melanie; Lemire, Gabrielle; VanNoy, Grace E; DiTroia, Stephanie; Ganesh, Vijay S; Groopman, Emily; O'Heir, Emily; Mangilog, Brian; Osei-Owusu, Ikeoluwa; Pais, Lynn S; Serrano, Jillian; Singer-Berk, Moriel; Weisburd, Ben; Wilson, Michael; Austin-Tse, Christina; Abdelhakim, Marwa; Althagafi, Azza; Babbi, Giulia; Bellazzi, Riccardo; Bovo, Samuele; Carta, Maria Giulia; Casadio, Rita; Coenen, Pieter-Jan; De Paoli, Federica; Floris, Matteo; Gajapathy, Manavalan; Hoehndorf, Robert; Jacobsen, Julius O B; Joseph, Thomas; Kamandula, Akash; Katsonis, Panagiotis; Kint, Cyrielle; Lichtarge, Olivier; Limongelli, Ivan; Lu, Yulan; Magni, Paolo; Mamidi, Tarun Karthik Kumar; Martelli, Pier Luigi; Mulargia, Marta; Nicora, Giovanna; Nykamp, Keith; Pejaver, Vikas; Peng, Yisu; Pham, Thi Hong Cam; Podda, Maurizio S; Rao, Aditya; Rizzo, Ettore; Saipradeep, Vangala G; Savojardo, Castrense.

medRxiv ; 2023 Aug 04.

Article in English | MEDLINE | ID: mdl-37577678

ABSTRACT

Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery. Families are consented for sharing of sequence and phenotype data with researchers, allowing development of a Critical Assessment of Genome Interpretation (CAGI) community challenge, placing variant prioritization models head-to-head in a real-life clinical diagnostic setting. Methods: Predictors were provided a dataset of phenotype terms and variant calls from GS of 175 RGP individuals (65 families), including 35 solved training set families, with causal variants specified, and 30 test set families (14 solved, 16 unsolved). The challenge tasked teams with identifying the causal variants in as many test set families as possible. Ranked variant predictions were submitted with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on rank position of true positive causal variants and maximum F-measure, based on precision and recall of causal variants across EPCR thresholds. Results: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performing teams recalled the causal variants in up to 13 of 14 solved families by prioritizing high quality variant calls that were rare, predicted deleterious, segregating correctly, and consistent with reported phenotype. In unsolved families, newly discovered diagnostic variants were returned to two families following confirmatory RNA sequencing, and two prioritized novel disease gene candidates were entered into Matchmaker Exchange. 
In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant, in an unsolved proband with phenotype overlap with asparagine synthetase deficiency. Conclusions: By objective assessment of variant predictions, we provide insights into current state-of-the-art algorithms and platforms for genome sequencing analysis for rare disease diagnosis and explore areas for future optimization. Identification of diagnostic variants in unsolved families promotes synergy between researchers with clinical and computational expertise as a means of advancing the field of clinical genome interpretation.

Assessment of blind predictions of the clinical significance of BRCA1 and BRCA2 variants.

Cline, Melissa S; Babbi, Giulia; Bonache, Sandra; Cao, Yue; Casadio, Rita; de la Cruz, Xavier; Díez, Orland; Gutiérrez-Enríquez, Sara; Katsonis, Panagiotis; Lai, Carmen; Lichtarge, Olivier; Martelli, Pier L; Mishne, Gilad; Moles-Fernández, Alejandro; Montalban, Gemma; Mooney, Sean D; O'Conner, Robert; Ootes, Lars; Özkan, Selen; Padilla, Natalia; Pagel, Kymberleigh A; Pejaver, Vikas; Radivojac, Predrag; Riera, Casandra; Savojardo, Castrense; Shen, Yang; Sun, Yuanfei; Topper, Scott; Parsons, Michael T; Spurdle, Amanda B; Goldgar, David E.

Hum Mutat ; 40(9): 1546-1556, 2019 09.

Article in English | MEDLINE | ID: mdl-31294896

ABSTRACT

Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to an increasing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights into the state of the art of variant prediction and into specific steps to further advance it.

Subjects

BRCA1 Protein/genetics , BRCA2 Protein/genetics , Breast Neoplasms/diagnosis , Computational Biology/methods , Ovarian Neoplasms/diagnosis , Breast Neoplasms/genetics , Early Detection of Cancer , Female , Genetic Predisposition to Disease , Genetic Testing , Genetic Variation , Humans , Genetic Models , Ovarian Neoplasms/genetics

Assessment of predicted enzymatic activity of α-N-acetylglucosaminidase variants of unknown significance for CAGI 2016.

Clark, Wyatt T; Kasak, Laura; Bakolitsa, Constantina; Hu, Zhiqiang; Andreoletti, Gaia; Babbi, Giulia; Bromberg, Yana; Casadio, Rita; Dunbrack, Roland; Folkman, Lukas; Ford, Colby T; Jones, David; Katsonis, Panagiotis; Kundu, Kunal; Lichtarge, Olivier; Martelli, Pier L; Mooney, Sean D; Nodzak, Conor; Pal, Lipika R; Radivojac, Predrag; Savojardo, Castrense; Shi, Xinghua; Zhou, Yaoqi; Uppal, Aneeta; Xu, Qifang; Yin, Yizhou; Pejaver, Vikas; Wang, Meng; Wei, Liping; Moult, John; Yu, Guoying Karen; Brenner, Steven E; LeBowitz, Jonathan H.

Hum Mutat ; 40(9): 1519-1529, 2019 09.

Article in English | MEDLINE | ID: mdl-31342580

ABSTRACT

The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to 0.61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.

Subjects

Acetylglucosaminidase/metabolism , Computational Biology/methods , Missense Mutation , Acetylglucosaminidase/genetics , Humans , Genetic Models , Regression Analysis
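
The Pearson correlation used in the abstract above to compare predicted and measured enzymatic activity can be written out in a few lines. The sketch below assumes two equal-length lists of numbers; the function name `pearson` and the toy values are hypothetical and are not data from the challenge.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations.
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Toy values (made up): predicted vs. measured relative activity.
toy_predicted = [0.1, 0.4, 0.5, 0.9]
toy_measured = [0.2, 0.3, 0.7, 0.8]
toy_r = pearson(toy_predicted, toy_measured)
```

The same function applied to two sets of predictions, rather than to predictions and measurements, gives the method-to-method correlation that the abstract contrasts with the correlation against observed activity.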

CAGI SickKids challenges: Assessment of phenotype and variant predictions derived from clinical and genomic data of children with undiagnosed diseases.

Kasak, Laura; Hunter, Jesse M; Udani, Rupa; Bakolitsa, Constantina; Hu, Zhiqiang; Adhikari, Aashish N; Babbi, Giulia; Casadio, Rita; Gough, Julian; Guerrero, Rafael F; Jiang, Yuxiang; Joseph, Thomas; Katsonis, Panagiotis; Kotte, Sujatha; Kundu, Kunal; Lichtarge, Olivier; Martelli, Pier Luigi; Mooney, Sean D; Moult, John; Pal, Lipika R; Poitras, Jennifer; Radivojac, Predrag; Rao, Aditya; Sivadasan, Naveen; Sunderam, Uma; Saipradeep, V G; Yin, Yizhou; Zaucha, Jan; Brenner, Steven E; Meyn, M Stephen.

Hum Mutat ; 40(9): 1373-1391, 2019 09.

Article in English | MEDLINE | ID: mdl-31322791

ABSTRACT

Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective), and separately to each patient. The performance of matching genomes to categories was no better than random but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes.

Subjects

Computational Biology/methods , Genetic Variation , Undiagnosed Diseases/diagnosis , Adolescent , Child , Preschool Child , Computer Simulation , Genetic Databases , Female , Genetic Predisposition to Disease , Humans , Male , Phenotype , Undiagnosed Diseases/genetics , Whole Genome Sequencing

Assessment of methods for predicting the effects of PTEN and TPMT protein variants.

Pejaver, Vikas; Babbi, Giulia; Casadio, Rita; Folkman, Lukas; Katsonis, Panagiotis; Kundu, Kunal; Lichtarge, Olivier; Martelli, Pier Luigi; Miller, Maximilian; Moult, John; Pal, Lipika R; Savojardo, Castrense; Yin, Yizhou; Zhou, Yaoqi; Radivojac, Predrag; Bromberg, Yana.

Hum Mutat ; 40(9): 1495-1506, 2019 09.

Article in English | MEDLINE | ID: mdl-31184403

ABSTRACT

Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerged as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.

Subjects

Computational Biology/methods , Methyltransferases/chemistry , Mutation , PTEN Phosphohydrolase/chemistry , High-Throughput Nucleotide Sequencing , Humans , Methyltransferases/genetics , PTEN Phosphohydrolase/genetics , Protein Stability

Working toward precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges.

Daneshjou, Roxana; Wang, Yanran; Bromberg, Yana; Bovo, Samuele; Martelli, Pier L; Babbi, Giulia; Lena, Pietro Di; Casadio, Rita; Edwards, Matthew; Gifford, David; Jones, David T; Sundaram, Laksshman; Bhat, Rajendra Rana; Li, Xiaolin; Pal, Lipika R; Kundu, Kunal; Yin, Yizhou; Moult, John; Jiang, Yuxiang; Pejaver, Vikas; Pagel, Kymberleigh A; Li, Biao; Mooney, Sean D; Radivojac, Predrag; Shah, Sohela; Carraro, Marco; Gasparini, Alessandra; Leonardi, Emanuela; Giollo, Manuel; Ferrari, Carlo; Tosatto, Silvio C E; Bachar, Eran; Azaria, Johnathan R; Ofran, Yanay; Unger, Ron; Niroula, Abhishek; Vihinen, Mauno; Chang, Billy; Wang, Maggie H; Franke, Andre; Petersen, Britt-Sabina; Pirooznia, Mehdi; Zandi, Peter; McCombie, Richard; Potash, James B; Altman, Russ B; Klein, Teri E; Hoskins, Roger A; Repo, Susanna; Brenner, Steven E.

Hum Mutat ; 38(9): 1182-1192, 2017 09.

Article in English | MEDLINE | ID: mdl-28634997

ABSTRACT

Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype-phenotype relationships.

Subjects

Bipolar Disorder/genetics , Crohn Disease/genetics , Exome Sequencing/methods , Precision Medicine/methods , Warfarin/therapeutic use , Computational Biology/methods , Genetic Databases , Genetic Predisposition to Disease , Humans , Information Dissemination , Pharmacogenomic Variants , Phenotype , Warfarin/pharmacology