Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus.

Lyons, Erica L; Watson, Daniel; Alodadi, Mohammad S; Haugabook, Sharie J; Tawa, Gregory J; Hannah-Shmouni, Fady; Porter, Forbes D; Collins, Jack R; Ottinger, Elizabeth A; Mudunuri, Uma S

Affiliation

Lyons EL; Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA.
Watson D; Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA.
Alodadi MS; Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA.
Haugabook SJ; Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA.
Tawa GJ; Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA.
Hannah-Shmouni F; Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, 20892, USA.
Porter FD; Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, 20892, USA.
Collins JR; Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA.
Ottinger EA; Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA. elizabeth.ottinger@nih.gov.
Mudunuri US; Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA. uma.mudunuri@nih.gov.

BMC Genomics ; 24(1): 460, 2023 Aug 16.

Article in En | MEDLINE | ID: mdl-37587458

ABSTRACT

BACKGROUND: Approximately 4-8% of the world's population suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches, but it is limited by the difficulty of interpreting the pathogenicity of novel variants and of communicating known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain.

RESULTS: This study investigated the translation of knowledge of variants reported in published manuscripts to publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function from literature on the SLC6A8 gene associated with X-linked Creatine Transporter Deficiency (CTD) were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed, and pathogenicity assignments were compared with impact algorithm predictions. Of the pathogenic variants found in PubMed articles, 24% were not captured in any database used in this analysis, while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm.

CONCLUSIONS: Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important for making pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single nucleotide exonic variants; however, such predictions are less accurate or unavailable for intronic and multi-nucleotide variants. Text mining workflows that use natural language processing to identify diseases, genes, and variants, combined with impact prediction algorithms and integrated with details on clinical phenotypes and functional assessments, might be a promising approach to scaling literature mining of variants and assigning correct pathogenicity. The curated variant list created by this effort includes contextual details to support such variant curation efforts for rare diseases.
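As a rough illustration of the kind of check the abstract describes (extracting variant mentions from text and asking how many are absent from every database), the following Python sketch may help. It is not the authors' pipeline: the regular expression covers only simple HGVS-style coding substitutions, the function names are hypothetical, and the sentence, variant names, and database contents are made-up placeholders rather than real SLC6A8 data or study results.

```python
import re

# Simplified pattern for HGVS-style coding DNA substitutions such as "c.100A>G".
# Assumption: real HGVS syntax is far broader (deletions, duplications,
# intronic offsets, protein-level names) and is not handled here.
CDNA_SUBSTITUTION = re.compile(r"\bc\.(\d+)([ACGT])>([ACGT])\b")


def extract_variants(text):
    """Return the set of c.-style substitution mentions found in text."""
    return {m.group(0) for m in CDNA_SUBSTITUTION.finditer(text)}


def missing_from_all(literature_variants, databases):
    """Return the variants that appear in none of the given databases."""
    known = set().union(*databases.values()) if databases else set()
    return literature_variants - known


# Toy inputs: placeholder variants and hypothetical database names.
sentence = "Patient 1 carried c.100A>G; patient 2 carried c.200C>T and c.300G>A."
toy_databases = {
    "db_a": {"c.100A>G"},
    "db_b": {"c.100A>G", "c.300G>A"},
}

found = extract_variants(sentence)
missing = missing_from_all(found, toy_databases)
share_missing = len(missing) / len(found)

print(sorted(missing), round(share_missing, 2))
```

A pattern-based extractor like this would miss intronic and multi-nucleotide variants, which are also the classes the abstract notes as harder for impact prediction, so a realistic workflow would need a fuller variant-name parser and a harmonization step before any database comparison.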

Key words

CTD; Gene variant; Literature curation; Rare disease; SLC6A8; Text mining; Variant database

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Creatine / Rare Diseases | Limits: Humans | Language: En | Journal: BMC Genomics | Journal subject: GENETICA | Year: 2023 | Document type: Article