Building large-scale registries from unstructured clinical notes using a low-resource natural language processing pipeline.
Tavabi, Nazgol; Pruneski, James; Golchin, Shahriar; Singh, Mallika; Sanborn, Ryan; Heyworth, Benton; Landschaft, Assaf; Kimia, Amir; Kiapour, Ata.
Affiliation
  • Tavabi N; Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA. Electronic address: Nazgol.Tavabi@childrens.harvard.edu.
  • Pruneski J; Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
  • Golchin S; Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA.
  • Singh M; Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA.
  • Sanborn R; Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA.
  • Heyworth B; Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
  • Landschaft A; Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, USA.
  • Kimia A; Harvard Medical School, Boston, MA, USA; Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, USA.
  • Kiapour A; Department of Orthopaedic Surgery and Sports Medicine, Boston Children's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA. Electronic address: Ata.Kiapour@childrens.harvard.edu.
Artif Intell Med ; 151: 102847, 2024 May.
Article in En | MEDLINE | ID: mdl-38658131
ABSTRACT
Building clinical registries is an important step in clinical research and in improving the quality of patient care. Natural Language Processing (NLP) methods have shown promising results in extracting valuable information from unstructured clinical notes. However, clinical notes differ substantially in structure and nature from the regular text on which state-of-the-art NLP models are trained and tested, and they pose their own set of challenges. In this study, we propose Sentence Extractor with Keywords (SE-K), an efficient and interpretable classification approach for extracting information from clinical notes, and show that it outperforms more computationally expensive methods in text classification. Following Institutional Review Board (IRB) approval, we used SE-K and two embedding-based NLP approaches (Sentence Extractor with Embeddings (SE-E) and Bidirectional Encoder Representations from Transformers (BERT)) to develop a comprehensive registry of anterior cruciate ligament surgeries from 20 years of unstructured clinical data at a multi-site tertiary-care regional children's hospital. The low-resource approach (SE-K) performed better (average AUROC of 0.94 ± 0.04) than the embedding-based approaches (SE-E 0.93 ± 0.04 and BERT 0.87 ± 0.09) in out-of-sample validation, and it showed the smallest performance drop between test and out-of-sample validation. Moreover, the SE-K approach was at least six times faster (on CPU) than SE-E (on CPU) and BERT (on GPU), and it provides interpretability. Our proposed approach, SE-K, can be effectively used to extract relevant variables from clinical notes to build large-scale registries, with consistently better performance than the more resource-intensive approaches (e.g., BERT). Such approaches can facilitate information extraction from unstructured notes for registry building, quality improvement and adverse event monitoring.
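The abstract does not describe how SE-K works internally, so the following is only a minimal sketch of the general idea its name suggests: pulling out the sentences of a note that mention given keywords, which could then be passed to a lightweight classifier. It is not the authors' implementation. The function names, the naive sentence splitter, the keyword list and the sample note are all hypothetical, and the downstream classification step is omitted.

```python
import re


def split_sentences(note: str) -> list[str]:
    """Naive sentence splitter (an assumption, not the paper's method):
    break after '.', '!' or '?' followed by whitespace, or on newlines."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", note)
    return [p.strip() for p in parts if p and p.strip()]


def extract_keyword_sentences(note: str, keywords: list[str]) -> list[tuple[str, list[str]]]:
    """Return (sentence, matched_keywords) pairs for sentences that contain
    at least one keyword as a whole word, case-insensitively."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for sentence in split_sentences(note):
        text = sentence.lower()
        matched = [
            k for k in lowered
            if re.search(r"\b" + re.escape(k) + r"\b", text)
        ]
        if matched:
            hits.append((sentence, matched))
    return hits


# Hypothetical note and keywords, for illustration only.
sample_note = (
    "Patient seen in clinic today. "
    "ACL reconstruction with hamstring autograft was performed in 2019. "
    "No fever. Meniscus repair was also noted."
)
sample_keywords = ["autograft", "meniscus"]
extracted = extract_keyword_sentences(sample_note, sample_keywords)
```

Because each extracted sentence is returned together with the keywords that triggered it, a reviewer can trace a downstream label back to specific text in the note, which is one plausible reading of the interpretability the abstract attributes to a keyword-based approach.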

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Natural Language Processing / Registries Limits: Humans Language: En Journal: Artif Intell Med Journal subject: INFORMATICA MEDICA Year: 2024 Document type: Article
