Detection of probable dementia cases in undiagnosed patients using structured and unstructured electronic health records.

Shao, Yijun; Zeng, Qing T; Chen, Kathryn K; Shutes-David, Andrew; Thielke, Stephen M; Tsuang, Debby W

Affiliations

Shao Y; George Washington University, 800 22nd St. NW, Science and Engineering Hall, Ste. #8390, Washington, DC, 20052, USA.
Zeng QT; Washington DC VA Medical Center, 50 Irving St. NW, Washington, DC, 20422, USA.
Chen KK; George Washington University, 800 22nd St. NW, Science and Engineering Hall, Ste. #8390, Washington, DC, 20052, USA.
Shutes-David A; Washington DC VA Medical Center, 50 Irving St. NW, Washington, DC, 20422, USA.
Thielke SM; Geriatric Research, Education, and Clinical Center, S182 GRECC, VA Puget Sound Health Care System, 1660 S. Columbian Way, Seattle, WA, 98108, USA.
Tsuang DW; Department of Psychiatry and Behavioral Sciences, University of Washington, 1959 NE Pacific St., Box 356560, Seattle, WA, 98195, USA.

BMC Med Inform Decis Mak ; 19(1): 128, 2019 07 09.

Article in English | MEDLINE | ID: mdl-31288818

ABSTRACT

BACKGROUND:

Dementia is underdiagnosed in both the general population and among Veterans. This underdiagnosis decreases quality of life, reduces opportunities for interventions, and increases health-care costs. New approaches are therefore necessary to facilitate the timely detection of dementia. This study seeks to identify cases of undiagnosed dementia by developing and validating a weakly supervised machine-learning approach that incorporates the analysis of both structured and unstructured electronic health record (EHR) data.

METHODS:

A topic modeling approach that included latent Dirichlet allocation, stable topic extraction, and random sampling was applied to VHA EHRs. Topic features from unstructured data and features from structured data were compared between Veterans with (n = 1861) and without (n = 9305) ICD-9 dementia codes. A logistic regression model was used to develop dementia prediction scores, and manual reviews were conducted to validate the machine-learning results.

RESULTS:

A total of 853 features were identified (290 topics, 174 non-dementia ICD codes, 159 CPT codes, 59 medications, and 171 note types) for the development of logistic regression prediction scores. These scores were validated in a subset of Veterans without ICD-9 dementia codes (n = 120) by experts in dementia who performed manual record reviews and achieved a high level of inter-rater agreement. The manual reviews were used to develop a receiver operating characteristic (ROC) curve with different thresholds for case detection, including a threshold of 0.061, which produced an optimal sensitivity (0.825) and specificity (0.832).
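
As an illustration of how a case-detection threshold might be read off an ROC curve, the following minimal sketch scans candidate thresholds and keeps the one with the best balance of sensitivity and specificity. The abstract does not state which optimality criterion was used; Youden's J (sensitivity + specificity - 1) is assumed here, and the scores and labels are made-up toy values, not study data. The function names are hypothetical.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is flagged as a case.

    Assumes labels contain at least one positive (1) and one negative (0).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    Youden's J is an assumed criterion; ties keep the lowest threshold.
    """
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Toy prediction scores and manual-review labels (1 = probable dementia).
# These values are invented for illustration only.
scores = [0.01, 0.02, 0.03, 0.05, 0.07, 0.08, 0.20, 0.40]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, sensitivity, specificity = best_threshold(scores, labels)
print(threshold, sensitivity, specificity)
```

On real data, the labels would come from the expert record reviews and the scores from the fitted prediction model; a different criterion (for example, a minimum required sensitivity) would select a different threshold.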

CONCLUSIONS:

Dementia is underdiagnosed, and thus, ICD codes alone cannot serve as a gold standard for diagnosis. However, this study suggests that imperfect data (e.g., ICD codes in combination with other EHR features) can serve as a silver standard to develop a risk model, apply that model to patients without dementia codes, and then select a case-detection threshold. The study is one of the first to utilize both structured and unstructured EHRs to develop risk scores for the diagnosis of dementia.

Subjects

Delayed Diagnosis; Dementia/diagnosis; Electronic Health Records; International Classification of Diseases; Machine Learning; Aged; Aged, 80 and over; Female; Humans; Male; Veterans

Keywords

Dementia; Diagnosis; Machine learning; Medical records; Veterans

Full text: 1 | Database: MEDLINE | Main subject: International Classification of Diseases / Dementia / Delayed Diagnosis / Electronic Health Records / Machine Learning | Language: English | Publication year: 2019 | Document type: Article