Exploring the performance and explainability of fine-tuned BERT models for neuroradiology protocol assignment.

Talebi, Salmonn; Tong, Elizabeth; Li, Anna; Yamin, Ghiam; Zaharchuk, Greg; Mofrad, Mohammad R K

Affiliation

Talebi S; University of California, 208A Stanley Hall #1762, Berkeley, CA, 94720-1762, USA.
Tong E; Stanford University, Stanford, CA, USA.
Li A; Stanford University, Stanford, CA, USA.
Yamin G; Stanford University, Stanford, CA, USA.
Zaharchuk G; Stanford University, Stanford, CA, USA.
Mofrad MRK; University of California, 208A Stanley Hall #1762, Berkeley, CA, 94720-1762, USA. mofrad@berkeley.edu.

BMC Med Inform Decis Mak; 24(1): 40, 2024 Feb 07.

Article in En | MEDLINE | ID: mdl-38326769

ABSTRACT

BACKGROUND:

Deep learning has demonstrated significant advancements across various domains. However, its implementation in specialized areas, such as medical settings, is still approached with caution. In these high-stakes environments, understanding the model's decision-making process is critical. This study assesses the performance of different pretrained Bidirectional Encoder Representations from Transformers (BERT) models and examines their decision-making in the context of medical image protocol assignment.

METHODS:

Four different pre-trained BERT models (BERT, BioBERT, ClinicalBERT, RoBERTa) were fine-tuned for the medical image protocol classification task. Word importance was measured by attributing the classification output to every word using a gradient-based method. Subsequently, a trained radiologist reviewed the resulting word importance scores to assess the model's decision-making process relative to human reasoning.
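The abstract does not say which gradient-based method was used, so the following is only a minimal sketch of one common variant, gradient × input, to illustrate the idea of attributing a class score to individual words. It assumes a toy mean-of-embeddings linear classifier in place of a fine-tuned BERT model; the vocabulary, embedding values, weights, and protocol names are all hypothetical and chosen purely for illustration.

```python
# Toy illustration of gradient-x-input word attribution.
# Assumption: a tiny mean-of-embeddings linear classifier stands in for BERT.
# The vocabulary, embeddings, weights, and protocol names are all made up.

EMBEDDINGS = {
    "headache": [0.2, 0.1],
    "seizure": [0.9, -0.3],
    "tumor": [-0.4, 0.8],
    "followup": [-0.1, 0.5],
}
PROTOCOLS = ["seizure_protocol", "tumor_protocol"]  # hypothetical labels
WEIGHTS = [
    [1.5, -0.5],  # weight row for seizure_protocol
    [-0.5, 1.5],  # weight row for tumor_protocol
]


def logits(words):
    """Class scores: WEIGHTS applied to the mean of the word embeddings."""
    n = len(words)
    dim = len(WEIGHTS[0])
    mean = [sum(EMBEDDINGS[w][k] for w in words) / n for k in range(dim)]
    return [sum(wk * mk for wk, mk in zip(row, mean)) for row in WEIGHTS]


def word_attributions(words, class_index):
    """Gradient x input score of each word for one class logit.

    For this linear model the gradient of the class logit with respect to
    each word embedding is WEIGHTS[class_index] / n, so it is written out
    analytically. A real network would obtain it by backpropagation.
    """
    n = len(words)
    grad = [wk / n for wk in WEIGHTS[class_index]]
    return [
        (w, sum(g * x for g, x in zip(grad, EMBEDDINGS[w])))
        for w in words
    ]


words = ["seizure", "followup", "headache"]
scores = logits(words)
predicted = max(range(len(scores)), key=scores.__getitem__)
attributions = word_attributions(words, predicted)

print("predicted:", PROTOCOLS[predicted])
for word, score in sorted(attributions, key=lambda pair: -pair[1]):
    print(f"{word}: {score:+.3f}")
```

In this toy setting the per-word scores sum to the predicted class logit, which makes them easy to sanity-check. That property does not carry over automatically to a nonlinear model such as BERT, where attribution scores are approximations and still need the kind of expert review the study describes.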

RESULTS:

The BERT model came close to human performance on our test set and successfully identified relevant words indicative of the target protocol. Analysis of important words in misclassifications revealed potential systematic errors in the model.

CONCLUSIONS:

The BERT model shows promise in medical image protocol assignment by reaching near-human-level performance and identifying key words effectively. The detection of systematic errors paves the way for further refinements to enhance its safety and utility in clinical settings.

Subject(s)

Natural Language Processing; Problem Solving; Humans

Key words

BERT; Explanations; Healthcare; Interpretability; Machine learning

Full text: 1; Collection: 01-internacional; Database: MEDLINE; Main subject: Problem Solving / Natural Language Processing; Type of study: Guideline / Prognostic_studies; Limits: Humans; Language: En; Journal: BMC Med Inform Decis Mak; Journal subject: Medical Informatics; Year: 2024; Document type: Article; Affiliation country: United States; Country of publication: United Kingdom
