Gross failure rates and failure modes for a commercial AI-based auto-segmentation algorithm in head and neck cancer patients.

Temple, Simon W P; Rowbottom, Carl G

Affiliation

Temple SWP; Medical Physics Department, The Clatterbridge Cancer Centre NHS Foundation Trust, Liverpool, UK.
Rowbottom CG; Medical Physics Department, The Clatterbridge Cancer Centre NHS Foundation Trust, Liverpool, UK.

J Appl Clin Med Phys ; 25(6): e14273, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38263866

ABSTRACT

PURPOSE:

Artificial intelligence (AI)-based commercial software can automatically delineate organs at risk (OARs), with potential efficiency savings in the radiotherapy treatment planning pathway and reduced inter- and intra-observer variability. Little research has investigated the gross failure rates and failure modes of such systems.

METHOD:

Fifty head and neck (H&N) patient data sets with "gold standard" contours were compared to AI-generated contours to establish the expected mean and standard deviation of the Dice Similarity Coefficient (DSC) for four common H&N OARs (brainstem, mandible, left parotid and right parotid). An AI-based commercial system was applied to 500 H&N patients. The AI-generated contours were compared to manual contours outlined by an expert human, and the gross-failure threshold was set at three standard deviations below the expected mean DSC. Failures were inspected to determine why the AI-based system had failed, and failures attributable to suboptimal manual contouring were censored. True failures were classified into four subtypes (setup position, anatomy, image artefacts and unknown).
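
The threshold rule above can be sketched in a few lines. This is a minimal illustration, not the authors' or the vendor's code: the function names (`dice`, `gross_failure_threshold`, `flag_gross_failures`) are hypothetical, contours are represented as sets of voxel indices, the sample standard deviation is assumed (the abstract does not say sample or population), and "below the threshold" is taken as a strict inequality. In the study, a separate threshold would be derived for each OAR from its 50 reference DSCs. The manual inspection and censoring step is not modelled.

```python
from statistics import mean, stdev


def dice(a: set[int], b: set[int]) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two sets of voxel indices."""
    if not a and not b:
        return 1.0  # assumed convention: two empty contours agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def gross_failure_threshold(reference_dscs: list[float], k: float = 3.0) -> float:
    """Expected mean DSC minus k standard deviations (k = 3 in the study).

    Uses the sample standard deviation; this choice is an assumption.
    """
    return mean(reference_dscs) - k * stdev(reference_dscs)


def flag_gross_failures(dscs: dict[str, float], threshold: float) -> list[str]:
    """Return case IDs whose AI-vs-manual DSC falls strictly below the threshold."""
    return [case for case, d in dscs.items() if d < threshold]


# Hypothetical reference DSCs for one OAR (illustrative values, not study data).
reference = [0.86, 0.88, 0.90, 0.87, 0.89]
threshold = gross_failure_threshold(reference)

# Hypothetical AI-vs-manual DSCs for three patients.
cases = {"pt001": 0.88, "pt002": 0.42, "pt003": 0.85}
print(flag_gross_failures(cases, threshold))  # → ['pt002']
```

Flagged cases would then go to manual review, where those caused by suboptimal manual contouring are censored before the true failure rate is computed.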

RESULTS:

There were 24 true failures of the AI-based commercial software, a gross failure rate of 1.2%. Fifteen failures were due to patient anatomy, four were due to dental image artefacts, three were due to patient position and two had an unknown cause. True failure rates by OAR were 0.4% (brainstem), 2.2% (mandible), 1.4% (left parotid) and 0.8% (right parotid).
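
The headline rate of 1.2% is not 24 out of 500 patients (which would be 4.8%). The short check below shows the figures are consistent if the denominator is every OAR contour, that is 500 patients × 4 OARs = 2,000 contours, with 500 contours per OAR for the per-OAR rates. That denominator is an inference from the reported numbers rather than something the abstract states, and the per-OAR counts are derived from the percentages, not reported directly.

```python
# Figures reported in the abstract.
N_PATIENTS = 500
OARS = ["brainstem", "mandible", "left parotid", "right parotid"]
TRUE_FAILURES = 24
REPORTED_RATE_PCT = {"brainstem": 0.4, "mandible": 2.2, "left parotid": 1.4, "right parotid": 0.8}
SUBTYPE_COUNTS = {"anatomy": 15, "dental image artefacts": 4, "patient position": 3, "unknown": 2}

# Assumption: each OAR was contoured once per patient, so the overall
# denominator is 500 x 4 contours and each per-OAR denominator is 500.
total_contours = N_PATIENTS * len(OARS)
overall_rate_pct = 100 * TRUE_FAILURES / total_contours

# Per-OAR failure counts implied by the reported percentages (rate x 500 / 100).
implied_counts = {oar: round(rate * N_PATIENTS / 100) for oar, rate in REPORTED_RATE_PCT.items()}

print(total_contours, overall_rate_pct)
print(implied_counts)
print(sum(implied_counts.values()), sum(SUBTYPE_COUNTS.values()))
```

Under this assumption the implied per-OAR counts (2, 11, 7 and 4) and the four failure subtypes (15 + 4 + 3 + 2) both sum to the 24 reported true failures.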

CONCLUSION:

True failures of the AI-based system were predominantly associated with a non-standard element within the CT scan. These non-standard elements were likely the cause of the gross failures, which suggests that the patient datasets used to train the AI model did not contain sufficiently heterogeneous data. Regardless of the reasons for failure, the true failure rate of the AI-based system in the H&N region for the OARs investigated was low (∼1%).

Subjects

Algorithms; Artificial Intelligence; Head and Neck Neoplasms; Organs at Risk; Radiotherapy Dosage; Radiotherapy Planning, Computer-Assisted; Radiotherapy, Intensity-Modulated; Humans; Head and Neck Neoplasms/radiotherapy; Head and Neck Neoplasms/diagnostic imaging; Radiotherapy Planning, Computer-Assisted/methods; Organs at Risk/radiation effects; Radiotherapy, Intensity-Modulated/methods; Software; Image Processing, Computer-Assisted/methods; Tomography, X-Ray Computed/methods

Keywords

autosegmentation; deep learning; failure modes

Collections: 01-internacional | Database: MEDLINE | Main subject: Radiotherapy Dosage / Algorithms / Radiotherapy Planning, Computer-Assisted / Artificial Intelligence / Radiotherapy, Intensity-Modulated / Organs at Risk / Head and Neck Neoplasms | Study type: Guideline / Prognostic_studies | Limit: Humans | Language: English | Journal: J Appl Clin Med Phys | Journal subject: Biophysics | Publication year: 2024 | Document type: Article | Country of affiliation: United Kingdom