Ensemble Approaches to Recognize Protected Health Information in Radiology Reports.

Horng, Hannah; Steinkamp, Jackson; Kahn, Charles E; Cook, Tessa S

Affiliation

Horng H; Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.
Steinkamp J; Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA.
Kahn CE; Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA. ckahn@upenn.edu.
Cook TS; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA. ckahn@upenn.edu.

J Digit Imaging; 35(6): 1694-1698, 2022 Dec.

Article in English | MEDLINE | ID: mdl-35715655

ABSTRACT

Natural language processing (NLP) techniques for electronic health records (EHRs) have shown great potential to improve the quality of medical care. The text of radiology reports frequently constitutes a large fraction of EHR data and can provide valuable information about patients' diagnoses, medical history, and imaging findings. The lack of a major public repository of radiology reports severely limits the development, testing, and application of new NLP tools. De-identification of protected health information (PHI) presents a major challenge to building such repositories, as many automated de-identification tools were trained or designed for clinical notes and do not perform well enough to support a public database of radiology reports. We developed and evaluated six ensemble models based on three publicly available de-identification tools: MIT de-id, NeuroNER, and Philter. A set of 1023 reports was set aside as the testing partition. Two individuals with medical training annotated the test set for PHI; differences were resolved by consensus. Ensemble methods included simple voting schemes (1-Vote, 2-Votes, and 3-Votes), a decision tree, a naïve Bayesian classifier, and AdaBoost. The 1-Vote ensemble achieved recall of 998 / 1043 (95.7%); the 3-Votes ensemble had precision of 1035 / 1043 (99.2%). F1 scores were 93.4% for the decision tree, 71.2% for the naïve Bayesian classifier, and 87.5% for the boosting method. Basic voting algorithms and machine learning classifiers that incorporate the predictions of multiple tools can outperform each tool acting alone in de-identifying radiology reports. Ensemble methods hold substantial potential to improve automated de-identification of radiology reports, making such reports more available for research that improves patient care and outcomes.
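The voting schemes named in the abstract can be illustrated with a minimal sketch. It assumes each tool emits one PHI/non-PHI flag per token, which the abstract does not specify, and the tool names, toy flags, and function names below are hypothetical placeholders rather than the authors' code or the real output of MIT de-id, NeuroNER, or Philter.

```python
from typing import Dict, List, Tuple


def k_vote(predictions: Dict[str, List[bool]], k: int) -> List[bool]:
    """Flag a token as PHI when at least k tools flag it (assumed scheme)."""
    # zip(*...) groups the per-tool flags for each token position.
    return [sum(votes) >= k for votes in zip(*predictions.values())]


def precision_recall_f1(
    predicted: List[bool], gold: List[bool]
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 against gold annotations."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical per-token flags from three tools (illustrative data only).
tool_flags = {
    "tool_a": [True, True, False, False, True],
    "tool_b": [True, False, False, True, True],
    "tool_c": [True, False, False, False, False],
}
# Hypothetical consensus annotation for the same five tokens.
gold = [True, True, False, False, True]

for k in (1, 2, 3):
    p, r, f1 = precision_recall_f1(k_vote(tool_flags, k), gold)
    print(f"{k}-Vote: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

On this toy data the 1-Vote rule catches every PHI token at the cost of one false positive, while the 3-Votes rule makes no false positives but misses two PHI tokens. That is the same direction of trade-off the abstract reports, with 1-Vote favoring recall and 3-Votes favoring precision.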

Subjects

Natural Language Processing; Radiology; Humans; Bayes Theorem; Electronic Health Records; Machine Learning

Keywords

De-identification; Ensemble models; Machine learning; Natural language processing; Protected health information (PHI); Reporting

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Radiology / Natural Language Processing Study type: Prognostic_studies Limit: Humans Language: En Journal: J Digit Imaging Journal subject: DIAGNOSTIC IMAGING / MEDICAL INFORMATICS / RADIOLOGY Publication year: 2022 Document type: Article Affiliation country: United States
