An investigation into the risk of population bias in deep learning autocontouring.
McQuinlan, Yasmin; Brouwer, Charlotte L; Lin, Zhixiong; Gan, Yong; Sung Kim, Jin; van Elmpt, Wouter; Gooding, Mark J.
Affiliation
  • McQuinlan Y; Mirada Medical Ltd, Oxford, United Kingdom. Electronic address: yasmin.mcquinlan@mirada-medical.com.
  • Brouwer CL; University of Groningen, University Medical Center Groningen, Department of Radiation Oncology, Groningen, The Netherlands. Electronic address: c.l.brouwer@umcg.nl.
  • Lin Z; Shantou University Medical Centre, Guangdong, China. Electronic address: zxlin5@qq.com.
  • Gan Y; Shantou University Medical Centre, Guangdong, China. Electronic address: y.gan@umcg.nl.
  • Sung Kim J; Yonsei University Health System, Seoul, Republic of Korea. Electronic address: jinsung@yuhs.ac.
  • van Elmpt W; Department of Radiation Oncology (MAASTRO), GROW - School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, The Netherlands. Electronic address: wouter.vanelmpt@maastro.nl.
  • Gooding MJ; Mirada Medical Ltd, Oxford, United Kingdom; Inpictura Ltd, Oxford, United Kingdom. Electronic address: mark.gooding@inpicturamedica.com.
Radiother Oncol; 186: 109747, 2023 09.
Article in En | MEDLINE | ID: mdl-37330053
ABSTRACT
BACKGROUND AND PURPOSE:

To date, data used in the development of Deep Learning-based automatic contouring (DLC) algorithms have been largely sourced from single geographic populations. This study aimed to evaluate the risk of population-based bias by determining whether the performance of an autocontouring system is impacted by geographic population.

MATERIALS AND METHODS:

Eighty deidentified head and neck CT scans were collected from four clinics in Europe (n = 2) and Asia (n = 2). A single observer manually delineated 16 organs-at-risk in each scan. Subsequently, the scans were contoured using a DLC solution trained on single-institution (European) data. Autocontours were compared to manual delineations using quantitative measures. A Kruskal-Wallis test was used to test for any difference between populations. Clinical acceptability of automatic and manual contours to observers from each participating institution was assessed using a blinded subjective evaluation.
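
The quantitative comparison described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the abstract does not name the similarity measures used, so the Dice coefficient is assumed here as one common choice. The function names `dice` and `kruskal_wallis_h` and the per-clinic scores are hypothetical and made up for illustration; none of the numbers come from the study.

```python
from itertools import chain


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel indices."""
    if not a and not b:
        return 1.0  # two empty contours are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks and a tie correction."""
    values = sorted(chain.from_iterable(groups))
    n = len(values)
    ranks = {}      # distinct value -> average rank
    tie_term = 0    # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i
        ranks[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    rank_sum_term = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        return 0.0  # every value is tied, so no difference can be detected
    return h / correction


# Made-up Dice scores for one organ at four hypothetical clinics.
scores_by_clinic = [
    [0.81, 0.84, 0.79, 0.86],
    [0.83, 0.80, 0.85, 0.82],
    [0.74, 0.78, 0.76, 0.77],
    [0.75, 0.73, 0.79, 0.72],
]
h = kruskal_wallis_h(scores_by_clinic)
# Under the null hypothesis H is approximately chi-square distributed with
# (number of groups - 1) degrees of freedom; for 4 groups the 5% critical
# value is about 7.815. The approximation is rough for small samples.
print("H =", round(h, 3), "exceeds 5% critical value:", h > 7.815)
```

In practice a library routine such as `scipy.stats.kruskal` would normally be used instead of a hand-written statistic; the version above only makes the ranking step explicit.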

RESULTS:

Seven organs showed a significant difference in volume between groups. Four organs showed statistically significant differences in quantitative similarity measures. The qualitative test showed greater variation in acceptance of contouring between observers than between data from different origins, with greater acceptance by the South Korean observers.

CONCLUSION:

Much of the statistical difference in quantitative performance could be explained by the difference in organ volume impacting the contour similarity measures and the small sample size. However, the qualitative assessment suggests that observer perception bias has a greater impact on the apparent clinical acceptability than quantitatively observed differences. This investigation of potential geographic bias should extend to more patients, populations, and anatomical regions in the future.
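
The dependence of contour similarity measures on organ volume can be illustrated with a toy calculation. The sketch below assumes the Dice coefficient as the measure (the abstract does not say which measures were used) and uses hypothetical cubic "organs": the same one-voxel contouring shift gives a lower score for the smaller structure. The helpers `cube` and `dice` are illustrative names only.

```python
def cube(side, offset=0):
    """Set of voxel coordinates for a cube, optionally shifted along x."""
    return {
        (x + offset, y, z)
        for x in range(side)
        for y in range(side)
        for z in range(side)
    }


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel indices."""
    return 2 * len(a & b) / (len(a) + len(b))


# Identical one-voxel shift applied to a small and a large structure.
for side in (5, 20):
    score = dice(cube(side), cube(side, offset=1))
    print(f"side={side:2d} voxels  Dice={score:.2f}")
```

For a cube of side s shifted by one voxel, the overlap is s²(s − 1) voxels, so the Dice score is (s − 1)/s: an identical boundary error costs a small organ more than a large one.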

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Deep Learning Type of study: Etiology_studies / Guideline / Qualitative_research / Risk_factors_studies Limits: Humans Country/Region as subject: Europa Language: En Journal: Radiother Oncol Year: 2023 Document type: Article
