An evaluation of existing text de-identification tools for use with patient progress notes from Australian general practice.
El-Hayek, Carol; Barzegar, Siamak; Faux, Noel; Doyle, Kim; Pillai, Priyanka; Mutch, Simon J; Vaisey, Alaina; Ward, Roger; Sanci, Lena; Dunn, Adam G; Hellard, Margaret E; Hocking, Jane S; Verspoor, Karin; Boyle, Douglas Ir.
Affiliation
  • El-Hayek C; Burnet Institute, Melbourne, Australia; Melbourne School of Population and Global Health, University of Melbourne, Australia; School of Public Health and Preventive Medicine, Monash University, Australia. Electronic address: carol.el-hayek@burnet.edu.au.
  • Barzegar S; School of Computing and Information Systems, University of Melbourne, Australia.
  • Faux N; Melbourne Data Analytics Platform, University of Melbourne, Australia; Florey Institute of Neuroscience and Mental Health, University of Melbourne, Australia.
  • Doyle K; Melbourne Data Analytics Platform, University of Melbourne, Australia.
  • Pillai P; Melbourne Data Analytics Platform, University of Melbourne, Australia; The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  • Mutch SJ; Melbourne Data Analytics Platform, University of Melbourne, Australia.
  • Vaisey A; Melbourne School of Population and Global Health, University of Melbourne, Australia.
  • Ward R; Department of General Practice and Primary Care, University of Melbourne, Australia.
  • Sanci L; Department of General Practice and Primary Care, University of Melbourne, Australia.
  • Dunn AG; School of Medical Sciences, University of Sydney, Australia.
  • Hellard ME; Burnet Institute, Melbourne, Australia; Melbourne School of Population and Global Health, University of Melbourne, Australia; School of Public Health and Preventive Medicine, Monash University, Australia; The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia.
  • Hocking JS; Melbourne School of Population and Global Health, University of Melbourne, Australia.
  • Verspoor K; School of Computing and Information Systems, University of Melbourne, Australia; School of Computing Technologies, RMIT University, Melbourne, Australia.
  • Boyle DI; Department of General Practice and Primary Care, University of Melbourne, Australia.
Int J Med Inform; 173: 105021, 2023 May.
Article in English | MEDLINE | ID: mdl-36870249
ABSTRACT

INTRODUCTION:

Digitized patient progress notes from general practice represent a significant resource for clinical and public health research but cannot feasibly and ethically be used for these purposes without automated de-identification. Internationally, several open-source natural language processing tools have been developed; however, given wide variations in clinical documentation practices, these cannot be utilized without appropriate review. We evaluated the performance of four de-identification tools and assessed their suitability for customization to Australian general practice progress notes.

METHODS:

Four tools were selected: three rule-based (HMS Scrubber, MIT De-id, Philter) and one machine learning-based (MIST). A total of 300 patient progress notes from three general practice clinics were manually annotated with personally identifying information. We conducted a pairwise comparison between the manual annotations and the patient identifiers automatically detected by each tool, measuring recall (sensitivity), precision (positive predictive value), F1-score (harmonic mean of precision and recall), and F2-score (which weights recall twice as heavily as precision). Error analysis was also conducted to better understand each tool's structure and performance.
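
As an illustration of the metrics named above, the following is a minimal sketch of how precision, recall, F1 and F2 could be computed from a comparison of manual and automatic annotations. It assumes each identifier is represented as a (start, end, category) tuple and that only exact matches count as true positives; the abstract does not state the matching criterion used in the study. The function name and the example spans are hypothetical and are not data from the study.

```python
def score_annotations(gold, predicted, beta=1.0):
    """Return (precision, recall, F-beta) for two sets of annotation spans.

    Assumption: a predicted span is a true positive only if it matches a
    manually annotated span exactly (same start, end, and category).
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # identifiers found by both annotator and tool
    fp = len(predicted - gold)   # tool flagged text that is not an identifier
    fn = len(gold - predicted)   # identifiers the tool missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical spans for illustration only (not study data).
gold = [(0, 5, "NAME"), (10, 20, "DATE"), (30, 42, "LOCATION"), (50, 58, "NAME")]
predicted = [(0, 5, "NAME"), (10, 20, "DATE"), (60, 66, "NAME")]

p, r, f1 = score_annotations(gold, predicted, beta=1.0)
_, _, f2 = score_annotations(gold, predicted, beta=2.0)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f} F2={f2:.3f}")
```

With beta set to 2, missed identifiers (false negatives) lower the score more than false positives do, which suits de-identification, where leaking an identifier is usually costlier than over-redacting.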

RESULTS:

Manual annotation detected 701 identifiers in seven categories. The rule-based tools detected identifiers in six categories and MIST in three. Philter achieved the highest aggregate recall (67%) and the highest recall for NAME (87%). HMS Scrubber achieved the highest recall for DATE (94%), and all tools performed poorly on LOCATION. MIST achieved the highest precision for NAME and DATE while also achieving similar recall to the rule-based tools for DATE and the highest recall for LOCATION. Philter had the lowest aggregate precision (37%); however, preliminary adjustments of its rules and dictionaries showed a substantial reduction in false positives.

CONCLUSION:

Existing off-the-shelf solutions for automated de-identification of clinical text are not immediately suitable for our context without modification. Philter is the most promising candidate due to its high recall and flexibility; however, it will require extensive revision of its pattern matching rules and dictionaries.

Full text: 1 Database: MEDLINE Main subject: Electronic Health Records / General Practice Study type: Diagnostic_studies / Evaluation_studies / Guideline / Prognostic_studies Limits: Humans Country as subject: Oceania Language: En Year of publication: 2023 Document type: Article