Genet Med; 24(10): 2091-2102, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35976265

ABSTRACT

PURPOSE: Cohort building is a powerful foundation for improving clinical care, performing biomedical research, recruiting for clinical trials, and many other applications. We set out to build a cohort of all monogenic patients with a definitive causal gene diagnosis in a 3-million patient hospital system.

METHODS: We define a subset (4461) of OMIM diseases that have at least 1 known monogenic causal gene. We then introduce MonoMiner, a natural language processing framework to identify molecularly confirmed monogenic patients from free-text clinical notes.

RESULTS: We show that ICD-10-CM codes cover only a fraction of monogenic diseases and that even where available, ICD-10-CM code-based patient retrieval offers 0.14 precision. Searching by causal gene symbol offers great recall but has an even worse 0.07 precision. MonoMiner achieves 6 to 11 times higher precision (0.80), with 0.87 precision on disease diagnosis alone, tagging 4259 patients with 560 monogenic diseases and 534 causal genes, at 0.48 recall.

CONCLUSION: MonoMiner enables the discovery of a large, high-precision cohort of patients with monogenic diseases with an established molecular diagnosis, empowering numerous downstream uses. Because it relies solely on clinical notes, MonoMiner is highly portable, and its approach is adaptable to other domains and languages.
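The "6 to 11 times higher precision" figure can be checked directly from the precision values reported in the abstract. The short Python sketch below does that arithmetic and also shows the standard definitions of precision and recall. The function names and the true/false positive counts in the last lines are hypothetical illustrations, not data or code from the study.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of retrieved patients that are correct."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of all true patients that were retrieved."""
    return true_pos / (true_pos + false_neg)


# Precision values as reported in the abstract.
icd_precision = 0.14        # ICD-10-CM code-based retrieval
gene_precision = 0.07       # causal gene symbol search
monominer_precision = 0.80  # MonoMiner

# Fold improvement of MonoMiner over each baseline.
fold_vs_icd = monominer_precision / icd_precision
fold_vs_gene = monominer_precision / gene_precision

print(f"vs ICD-10-CM codes: {fold_vs_icd:.1f}x")
print(f"vs gene symbol:     {fold_vs_gene:.1f}x")

# Hypothetical counts (not from the study) that would give 0.80 precision
# and 0.48 recall: 48 correct hits, 12 wrong hits, 52 missed patients.
example_precision = precision(true_pos=48, false_pos=12)
example_recall = recall(true_pos=48, false_neg=52)
```

The two ratios come out to roughly 5.7 and 11.4, which matches the rounded "6 to 11 times" range stated in the results.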


Subject(s)
Electronic Health Records, Natural Language Processing, Cohort Studies, Humans