Improving model transferability for clinical note section classification models using continued pretraining.
Zhou, Weipeng; Yetisgen, Meliha; Afshar, Majid; Gao, Yanjun; Savova, Guergana; Miller, Timothy A.
Affiliation
  • Zhou W; Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington-Seattle, Seattle, WA, United States.
  • Yetisgen M; Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington-Seattle, Seattle, WA, United States.
  • Afshar M; Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States.
  • Gao Y; Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States.
  • Savova G; Computational Health Informatics Program, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, MA, United States.
  • Miller TA; Computational Health Informatics Program, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, MA, United States.
J Am Med Inform Assoc; 31(1): 89-97, 2023 Dec 22.
Article in English | MEDLINE | ID: mdl-37725927
ABSTRACT

OBJECTIVE:

Classifying clinical note sections is a critical step before more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Clinical note section classification models that achieve high accuracy at one institution often suffer a large drop in accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP ("Subjective," "Objective," "Assessment," and "Plan") framework with improved transferability.

MATERIALS AND METHODS:

We trained baseline models by fine-tuning BERT-based models and enhanced their transferability with continued pretraining, including domain-adaptive pretraining and task-adaptive pretraining. We then added in-domain annotated samples during fine-tuning and observed model performance over varying annotated sample sizes. Finally, we quantified the impact of continued pretraining as an equivalent number of in-domain annotated samples.
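The continued-pretraining step described above amounts to masked language modeling (MLM) over unlabeled in-domain text before fine-tuning. The sketch below is a toy illustration of that objective only: the vocabulary, model, and two-sentence "corpus" are hypothetical stand-ins, not the authors' BERT pipeline or data.

```python
# Toy sketch of task-adaptive continued pretraining via masked language
# modeling: the model learns to recover a masked token from its context,
# using only unlabeled in-domain text (no section labels needed).
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB = ["[PAD]", "[MASK]", "patient", "denies", "pain", "plan", "follow", "up"]
stoi = {w: i for i, w in enumerate(VOCAB)}
MASK_ID = stoi["[MASK]"]

# Hypothetical unlabeled in-domain sentences, each of length 3.
corpus = [["patient", "denies", "pain"], ["plan", "follow", "up"]]
batch = torch.tensor([[stoi[w] for w in sent] for sent in corpus])

class TinyMLM(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.enc = nn.TransformerEncoderLayer(dim, nhead=2, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)  # predict a token per position
    def forward(self, x):
        return self.head(self.enc(self.emb(x)))

model = TinyMLM(len(VOCAB))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

losses = []
for step in range(50):
    inputs = batch.clone()
    inputs[:, 1] = MASK_ID           # mask the middle token of each sentence
    labels = batch[:, 1]             # the model must recover the masked token
    logits = model(inputs)[:, 1, :]  # logits at the masked position only
    loss = loss_fn(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

After this unlabeled MLM phase, the adapted encoder weights would be carried over into supervised fine-tuning on the section-classification task.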

RESULTS:

We found that continued pretraining improved models only when combined with in-domain annotated samples, raising the F1 score from 0.756 to 0.808, averaged across 3 datasets. This improvement was equivalent to adding 35 in-domain annotated samples.
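The sample-equivalence figure above (continued pretraining worth about 35 annotated samples) can be obtained by locating, on the baseline model's learning curve, the sample count at which the baseline's F1 matches the continued-pretraining model's F1. A minimal sketch with a hypothetical learning curve; the curve values below are illustrative only, not taken from the paper:

```python
# Hypothetical baseline learning curve (fine-tuning without continued
# pretraining): pairs of (in-domain annotated samples, F1).
baseline_curve = [(0, 0.700), (25, 0.780), (50, 0.820), (100, 0.850)]

def sample_equivalent(curve, target_f1):
    """Return the interpolated number of annotated samples at which the
    baseline's F1 reaches `target_f1` (the continued-pretraining score)."""
    for (n0, f0), (n1, f1) in zip(curve, curve[1:]):
        if f0 <= target_f1 <= f1:
            # linear interpolation between the two bracketing measurements
            return n0 + (target_f1 - f0) / (f1 - f0) * (n1 - n0)
    raise ValueError("target F1 lies outside the observed learning curve")

# With these illustrative numbers, an F1 of 0.808 is "worth" ~42.5 samples.
print(sample_equivalent(baseline_curve, 0.808))
```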

DISCUSSION:

Although considered a straightforward task when performed in-domain, section classification remains considerably difficult when performed cross-domain, even with highly sophisticated neural network-based methods.

CONCLUSION:

Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small number of in-domain labeled samples.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Information Storage and Retrieval / Health Facilities Study type: Prognostic_studies Aspects: Determinantes_sociais_saude / Equity_inequality Language: En Journal: J Am Med Inform Assoc Journal subject: MEDICAL INFORMATICS Year: 2023 Document type: Article Country of affiliation: United States