Natural language processing to identify lupus nephritis phenotype in electronic health records.

Deng, Yu; Pacheco, Jennifer A; Ghosh, Anika; Chung, Anh; Mao, Chengsheng; Smith, Joshua C; Zhao, Juan; Wei, Wei-Qi; Barnado, April; Dorn, Chad; Weng, Chunhua; Liu, Cong; Cordon, Adam; Yu, Jingzhi; Tedla, Yacob; Kho, Abel; Ramsey-Goldman, Rosalind; Walunas, Theresa; Luo, Yuan

Deng Y; Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Pacheco JA; Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Ghosh A; Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Chung A; Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Mao C; Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Smith JC; Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Zhao J; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.
Wei WQ; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.
Barnado A; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.
Dorn C; Department of Medicine, Vanderbilt University Medical Center, Nashville, USA.
Weng C; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA.
Liu C; Department of Biomedical Informatics, Columbia University, New York City, USA.
Cordon A; Department of Biomedical Informatics, Columbia University, New York City, USA.
Yu J; Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Tedla Y; Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Kho A; Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Ramsey-Goldman R; Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Walunas T; Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, USA.
Luo Y; Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA. t-walunas@northwestern.edu.

BMC Med Inform Decis Mak ; 22(Suppl 2): 348, 2024 Mar 03.

Article in English | MEDLINE | ID: mdl-38433189

ABSTRACT

BACKGROUND:

Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major manifestations of SLE contributing to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials, where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to extract from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW).

METHODS:

We developed five algorithms: a rule-based algorithm using only structured data (the baseline algorithm) and four algorithms using different NLP models. The first NLP model applied simple regular expressions for keyword search, combined with structured data. The other three NLP models were based on regularized logistic regression and used different feature sets: positive mentions of concept unique identifiers (CUIs), the number of appearances of CUIs, and a mixture of three components (a curated list of CUIs, regular expression concepts, and structured data), respectively. The baseline algorithm and the best-performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC).
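As an illustration of the ideas named above, the following is a minimal sketch and not the study's implementation. The keyword patterns, the negation cue list, the OR-combination with a structured-data flag, and the placeholder identifiers `CUI_A`, `CUI_B`, and `CUI_C` are all assumptions made for this example; the abstract does not specify any of them.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; the study's actual expressions are not given.
NEPHRITIS_PATTERNS = [
    re.compile(r"\blupus\s+nephritis\b", re.IGNORECASE),
    re.compile(r"\blupus\s+glomerulonephritis\b", re.IGNORECASE),
]
# Hypothetical, deliberately small list of negation cues.
NEGATION = re.compile(r"\b(?:no|not|denies|without|negative for)\b", re.IGNORECASE)


def has_positive_mention(note: str) -> bool:
    """Return True if the note has a keyword match with no negation cue
    earlier in the same sentence."""
    for pattern in NEPHRITIS_PATTERNS:
        for match in pattern.finditer(note):
            # Text between the previous sentence break and the match.
            start = max(note.rfind(".", 0, match.start()),
                        note.rfind("\n", 0, match.start())) + 1
            if not NEGATION.search(note[start:match.start()]):
                return True
    return False


def classify_with_regex(notes, structured_flag: bool) -> bool:
    """Assumed combination rule: positive if structured data OR any note is positive."""
    return structured_flag or any(has_positive_mention(n) for n in notes)


def cui_features(extracted_cuis, vocabulary, use_counts=False):
    """Build a feature vector over a fixed CUI vocabulary.

    use_counts=False gives binary 'mentioned at least once' features;
    use_counts=True gives the number of appearances of each CUI.
    """
    counts = Counter(extracted_cuis)
    if use_counts:
        return [counts[c] for c in vocabulary]
    return [1 if counts[c] > 0 else 0 for c in vocabulary]
```

Vectors like those returned by `cui_features` could then be passed to any regularized logistic regression implementation. A mixed feature set would simply concatenate CUI features, regular expression flags, and structured-data indicators into one vector.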

RESULTS:

Our best-performing NLP model incorporated features from structured data, regular expression concepts, and mapped concept unique identifiers (CUIs). Compared to the baseline lupus nephritis algorithm, it showed an improved F-measure in both the NMEDW (baseline 0.41 vs NLP 0.79) and VUMC (baseline 0.52 vs NLP 0.93) datasets.
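For readers unfamiliar with the metric, the F-measure combines precision and recall into a single score. The sketch below assumes the balanced F1 form (beta = 1), which the abstract does not state explicitly, and uses made-up counts purely for illustration; they are not the study's data.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from true positives, false positives, and false negatives.

    With beta = 1 this is the harmonic mean of precision and recall.
    """
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (hypothetical, not from the study).
score = f_measure(tp=8, fp=2, fn=2)
```

Because the score falls when either precision or recall is low, it is useful for rare phenotypes where accuracy alone could look high even if most true cases were missed.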

CONCLUSION:

Our NLP MetaMap mixed model greatly improved the F-measure compared to the structured-data-only algorithm in both the internal and external validation datasets. The NLP algorithms can serve as powerful tools for accurately identifying the lupus nephritis phenotype in EHRs for clinical research and better-targeted therapies.

Subject(s)

Lupus Erythematosus, Systemic; Lupus Nephritis; Humans; Lupus Nephritis/diagnosis; Electronic Health Records; Natural Language Processing; Phenotype; Rare Diseases

Keywords

Computational phenotyping; Electronic health records; Lupus nephritis; Natural language processing

Full text: 1 | Database: MEDLINE | Main subject: Lupus Nephritis / Lupus Erythematosus, Systemic | Limits: Humans | Language: English | Year: 2024 | Document type: Article
