
Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project.

Stenton, Sarah L; O'Leary, Melanie C; Lemire, Gabrielle; VanNoy, Grace E; DiTroia, Stephanie; Ganesh, Vijay S; Groopman, Emily; O'Heir, Emily; Mangilog, Brian; Osei-Owusu, Ikeoluwa; Pais, Lynn S; Serrano, Jillian; Singer-Berk, Moriel; Weisburd, Ben; Wilson, Michael W; Austin-Tse, Christina; Abdelhakim, Marwa; Althagafi, Azza; Babbi, Giulia; Bellazzi, Riccardo; Bovo, Samuele; Carta, Maria Giulia; Casadio, Rita; Coenen, Pieter-Jan; De Paoli, Federica; Floris, Matteo; Gajapathy, Manavalan; Hoehndorf, Robert; Jacobsen, Julius O B; Joseph, Thomas; Kamandula, Akash; Katsonis, Panagiotis; Kint, Cyrielle; Lichtarge, Olivier; Limongelli, Ivan; Lu, Yulan; Magni, Paolo; Mamidi, Tarun Karthik Kumar; Martelli, Pier Luigi; Mulargia, Marta; Nicora, Giovanna; Nykamp, Keith; Pejaver, Vikas; Peng, Yisu; Pham, Thi Hong Cam; Podda, Maurizio S; Rao, Aditya; Rizzo, Ettore; Saipradeep, Vangala G; Savojardo, Castrense.

Hum Genomics ; 18(1): 44, 2024 Apr 29.

Article En | MEDLINE | ID: mdl-38685113

BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria are needed.
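
The maximum F-measure described in the abstract above can be illustrated with a minimal Python sketch. It is not the challenge's scoring code: the function name max_f_measure, the tuple layout, and the toy family and variant identifiers are all assumptions made for illustration, and F1 (equal weighting of precision and recall) is assumed as the F-measure. Each distinct EPCR value is tried as a threshold, and the best F-measure over all thresholds is kept.

```python
from typing import List, Set, Tuple

# Hypothetical record layout: (family_id, variant_id, epcr)
Prediction = Tuple[str, str, float]


def max_f_measure(
    predictions: List[Prediction], truth: Set[Tuple[str, str]]
) -> Tuple[float, float]:
    """Return (best F1, EPCR threshold at which it is reached)."""
    best_f, best_t = 0.0, 1.0
    # Every distinct EPCR value is a candidate threshold.
    thresholds = sorted({epcr for _, _, epcr in predictions}, reverse=True)
    for t in thresholds:
        # Variants "called" causal at this threshold.
        called = {(fam, var) for fam, var, epcr in predictions if epcr >= t}
        true_positives = len(called & truth)
        if true_positives == 0:
            continue
        precision = true_positives / len(called)
        recall = true_positives / len(truth)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f:
            best_f, best_t = f1, t
    return best_f, best_t


# Toy data (invented): two families, one causal variant each.
toy_predictions = [
    ("FAM1", "varA", 0.9),  # causal
    ("FAM1", "varB", 0.4),
    ("FAM2", "varC", 0.8),
    ("FAM2", "varD", 0.7),  # causal
    ("FAM3", "varE", 0.2),
]
toy_truth = {("FAM1", "varA"), ("FAM2", "varD")}

best_f, best_t = max_f_measure(toy_predictions, toy_truth)
print(f"max F-measure = {best_f:.2f} at EPCR >= {best_t}")
```

In this toy case, lowering the threshold from 0.9 to 0.7 recovers the second causal variant at the cost of one false positive, which is where the F-measure peaks; lowering it further only adds false positives.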

Rare Diseases , Humans , Rare Diseases/genetics , Rare Diseases/diagnosis , Genome, Human/genetics , Genetic Variation/genetics , Computational Biology/methods , Phenotype

Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project.

Stenton, Sarah L; O'Leary, Melanie; Lemire, Gabrielle; VanNoy, Grace E; DiTroia, Stephanie; Ganesh, Vijay S; Groopman, Emily; O'Heir, Emily; Mangilog, Brian; Osei-Owusu, Ikeoluwa; Pais, Lynn S; Serrano, Jillian; Singer-Berk, Moriel; Weisburd, Ben; Wilson, Michael; Austin-Tse, Christina; Abdelhakim, Marwa; Althagafi, Azza; Babbi, Giulia; Bellazzi, Riccardo; Bovo, Samuele; Carta, Maria Giulia; Casadio, Rita; Coenen, Pieter-Jan; De Paoli, Federica; Floris, Matteo; Gajapathy, Manavalan; Hoehndorf, Robert; Jacobsen, Julius O B; Joseph, Thomas; Kamandula, Akash; Katsonis, Panagiotis; Kint, Cyrielle; Lichtarge, Olivier; Limongelli, Ivan; Lu, Yulan; Magni, Paolo; Mamidi, Tarun Karthik Kumar; Martelli, Pier Luigi; Mulargia, Marta; Nicora, Giovanna; Nykamp, Keith; Pejaver, Vikas; Peng, Yisu; Pham, Thi Hong Cam; Podda, Maurizio S; Rao, Aditya; Rizzo, Ettore; Saipradeep, Vangala G; Savojardo, Castrense.

medRxiv ; 2023 Aug 04.

Article En | MEDLINE | ID: mdl-37577678

Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery. Families are consented for sharing of sequence and phenotype data with researchers, allowing development of a Critical Assessment of Genome Interpretation (CAGI) community challenge, placing variant prioritization models head-to-head in a real-life clinical diagnostic setting. Methods: Predictors were provided with a dataset of phenotype terms and variant calls from GS of 175 RGP individuals (65 families), including 35 solved training set families, with causal variants specified, and 30 test set families (14 solved, 16 unsolved). The challenge tasked teams with identifying the causal variants in as many test set families as possible. Ranked variant predictions were submitted with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on rank position of true positive causal variants, and maximum F-measure, based on precision and recall of causal variants across EPCR thresholds. Results: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performing teams recalled the causal variants in up to 13 of 14 solved families by prioritizing high quality variant calls that were rare, predicted deleterious, segregating correctly, and consistent with reported phenotype. In unsolved families, newly discovered diagnostic variants were returned to two families following confirmatory RNA sequencing, and two prioritized novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant, in an unsolved proband with phenotype overlap with asparagine synthetase deficiency. Conclusions: By objective assessment of variant predictions, we provide insights into current state-of-the-art algorithms and platforms for genome sequencing analysis for rare disease diagnosis and explore areas for future optimization. Identification of diagnostic variants in unsolved families promotes synergy between researchers with clinical and computational expertise as a means of advancing the field of clinical genome interpretation.

Benchmarked approaches for reconstruction of in vitro cell lineages and in silico models of C. elegans and M. musculus developmental trees.

Gong, Wuming; Granados, Alejandro A; Hu, Jingyuan; Jones, Matthew G; Raz, Ofir; Salvador-Martínez, Irepan; Zhang, Hanrui; Chow, Ke-Huan K; Kwak, Il-Youp; Retkute, Renata; Prusokiene, Alisa; Prusokas, Augustinas; Khodaverdian, Alex; Zhang, Richard; Rao, Suhas; Wang, Robert; Rennert, Phil; Saipradeep, Vangala G; Sivadasan, Naveen; Rao, Aditya; Joseph, Thomas; Srinivasan, Rajgopal; Peng, Jiajie; Han, Lu; Shang, Xuequn; Garry, Daniel J; Yu, Thomas; Chung, Verena; Mason, Michael; Liu, Zhandong; Guan, Yuanfang; Yosef, Nir; Shendure, Jay; Telford, Maximilian J; Shapiro, Ehud; Elowitz, Michael B; Meyer, Pablo.

Cell Syst ; 12(8): 810-826.e4, 2021 Aug 18.

Article En | MEDLINE | ID: mdl-34146472

The recent advent of CRISPR and other molecular tools has enabled the reconstruction of cell lineages based on induced DNA mutations and promises to resolve the lineages of more complex organisms. To date, no lineage reconstruction algorithms have been rigorously examined for their performance and robustness across dataset types and numbers of cells. To benchmark such methods, we organized a DREAM challenge using in vitro experimental intMEMOIR recordings and in silico data for a C. elegans lineage tree of about 1,000 cells and a Mus musculus tree of 10,000 cells. Some of the 22 approaches submitted had excellent performance, but structural features of the trees prevented optimal reconstructions. Using smaller sub-trees as training sets proved to be a good approach for tuning algorithms to reconstruct larger trees. The simulation and reconstruction methods generated here delineate a potential way forward for solving larger cell lineage trees, such as those of the mouse.

Benchmarking , Caenorhabditis elegans , Algorithms , Animals , Caenorhabditis elegans/genetics , Cell Lineage/genetics , Computer Simulation , Mice

PRIORI-T: A tool for rare disease gene prioritization using MEDLINE.

Rao, Aditya; Joseph, Thomas; Saipradeep, Vangala G; Kotte, Sujatha; Sivadasan, Naveen; Srinivasan, Rajgopal.

PLoS One ; 15(4): e0231728, 2020.

Article En | MEDLINE | ID: mdl-32315351

INTRODUCTION: Phenotype-driven rare disease gene prioritization relies on high quality curated resources containing disease, gene and phenotype annotations. However, the effectiveness of gene prioritization tools is constrained by the incomplete coverage of rare disease, phenotype and gene annotations in such curated resources. METHODS: We extracted rare disease correlation pairs involving diseases, phenotypes and genes from MEDLINE abstracts and used the information propagation algorithm GCAS to build an association network. We built a tool called PRIORI-T for rare disease gene prioritization that uses this network for phenotype-driven rare disease gene prioritization. The quality of disease-gene associations in PRIORI-T was compared with resources such as DisGeNET and Open Targets in the context of rare diseases. The gene prioritization performance of PRIORI-T was evaluated using phenotype descriptions of 230 real-world rare disease clinical cases collated from recent publications, as well as compared to other gene prioritization tools such as HANRD and Orphamizer. RESULTS: PRIORI-T contains qualitatively better associations than DisGeNET and Open Targets. Furthermore, the causal genes were captured within Top-50 for more than 40% of the real-world clinical cases and within Top-300 for more than 72% of the cases when PRIORI-T was used for gene prioritization. It outperformed other gene prioritization tools such as HANRD and Orphamizer that primarily rely on curated resources. CONCLUSIONS: PRIORI-T exhibited improved gene prioritization performance without requiring high quality curated data. Thus, it holds great promise in phenotype-driven gene prioritization for rare disease studies.
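
The Top-50 and Top-300 figures reported above are top-k recall values: the fraction of clinical cases whose causal gene appears within the first k ranked genes. The Python sketch below illustrates that calculation only; it is not PRIORI-T code, and the function name top_k_recall, the case identifiers, and the placeholder gene labels are assumptions made for illustration.

```python
from typing import Dict, List


def top_k_recall(
    rankings: Dict[str, List[str]], causal: Dict[str, str], k: int
) -> float:
    """Fraction of cases whose causal gene is among the top-k ranked genes."""
    hits = 0
    for case_id, gene in causal.items():
        # A case with no ranking at all counts as a miss.
        if gene in rankings.get(case_id, [])[:k]:
            hits += 1
    return hits / len(causal)


# Toy data (invented): ranked gene lists per clinical case.
toy_rankings = {
    "case1": ["GENE_A", "GENE_B", "GENE_C"],
    "case2": ["GENE_D", "GENE_E", "GENE_F"],
    "case3": ["GENE_G", "GENE_H", "GENE_I"],
    "case4": ["GENE_J", "GENE_K", "GENE_L"],
}
toy_causal = {
    "case1": "GENE_A",  # ranked 1st
    "case2": "GENE_F",  # ranked 3rd
    "case3": "GENE_H",  # ranked 2nd
    "case4": "GENE_Z",  # not ranked at all
}

recall_at_1 = top_k_recall(toy_rankings, toy_causal, k=1)
recall_at_3 = top_k_recall(toy_rankings, toy_causal, k=3)
print(f"top-1 recall = {recall_at_1:.2f}, top-3 recall = {recall_at_3:.2f}")
```

Reporting recall at more than one cutoff, as the abstract does, shows both how often the causal gene is near the top and how often it is captured at all.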

Algorithms , Computational Biology/methods , MEDLINE , Rare Diseases/genetics , Humans , Phenotype

TPX: Biomedical literature search made easy.

Joseph, Thomas; Saipradeep, Vangala G; Raghavan, Ganesh Sekar Venkat; Srinivasan, Rajgopal; Rao, Aditya; Kotte, Sujatha; Sivadasan, Naveen.

Bioinformation ; 8(12): 578-80, 2012.

Article En | MEDLINE | ID: mdl-22829734

UNLABELLED: TPX is a web-based PubMed search enhancement tool that enables faster article searching using analysis and exploration features. These features include identification of relevant biomedical concepts from search results with linkouts to source databases, concept-based article categorization, concept-assisted search and filtering, and query refinement. A distinguishing feature is the ability to add user-defined concept names and/or concept types for named entity recognition. The tool allows contextual exploration of knowledge sources by providing concept association maps derived from the MEDLINE repository. It also has a full-text search mode that can be configured on request to access local text repositories, incorporating entity co-occurrence search at sentence/paragraph levels. Local text files can also be analyzed on-the-fly. AVAILABILITY: http://tpx.atc.tcs.com
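
Sentence-level entity co-occurrence search, as mentioned above, can be illustrated with a short Python sketch. This is not how TPX is implemented: the function name, the naive regular-expression sentence splitter, and the case-insensitive substring matching are all simplifying assumptions, standing in for proper sentence segmentation and named entity recognition.

```python
import re
from typing import List


def sentences_with_cooccurrence(text: str, term_a: str, term_b: str) -> List[str]:
    """Return the sentences in which both terms occur (case-insensitive)."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    a, b = term_a.lower(), term_b.lower()
    return [s for s in sentences if a in s.lower() and b in s.lower()]


# Toy text (invented for illustration).
toy_text = (
    "RNA sequencing demonstrated aberrant splicing in ASNS. "
    "The proband carried a frameshift variant in ASNS. "
    "Aberrant splicing can be caused by deep intronic variants."
)

matches = sentences_with_cooccurrence(toy_text, "ASNS", "splicing")
for sentence in matches:
    print(sentence)
```

Only the first sentence is returned, because it is the only one that mentions both terms; a paragraph-level search would apply the same test to larger text units.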