Clinical validation of deep learning algorithms for radiotherapy targeting of non-small-cell lung cancer: an observational study.
Hosny, Ahmed; Bitterman, Danielle S; Guthier, Christian V; Qian, Jack M; Roberts, Hannah; Perni, Subha; Saraf, Anurag; Peng, Luke C; Pashtan, Itai; Ye, Zezhong; Kann, Benjamin H; Kozono, David E; Christiani, David; Catalano, Paul J; Aerts, Hugo J W L; Mak, Raymond H.
Affiliation
  • Hosny A; Artificial Intelligence in Medicine Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Department of Radiation Oncology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
  • Bitterman DS; Artificial Intelligence in Medicine Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Department of Radiation Oncology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Computational Health Informatics Program, Boston Child
  • Guthier CV; Department of Radiation Oncology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
  • Qian JM; Harvard Radiation Oncology Program, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Mass General Brigham, Boston, MA, USA.
  • Roberts H; Harvard Radiation Oncology Program, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Mass General Brigham, Boston, MA, USA.
  • Perni S; Harvard Radiation Oncology Program, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Mass General Brigham, Boston, MA, USA.
  • Saraf A; Harvard Radiation Oncology Program, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Mass General Brigham, Boston, MA, USA.
  • Peng LC; Harvard Radiation Oncology Program, Brigham and Women's Hospital, Dana-Farber Cancer Institute, Mass General Brigham, Boston, MA, USA.
  • Pashtan I; Department of Radiation Oncology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
  • Ye Z; Artificial Intelligence in Medicine Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Department of Radiation Oncology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
  • Kann BH; Artificial Intelligence in Medicine Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Department of Radiation Oncology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
  • Kozono DE; Department of Radiation Oncology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
  • Christiani D; Harvard T H Chan School of Public Health, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
  • Catalano PJ; Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
  • Aerts HJWL; Artificial Intelligence in Medicine Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Department of Radiation Oncology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Radiology and Nuclear Medicine, CARIM & GROW, Maas
  • Mak RH; Artificial Intelligence in Medicine Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA; Department of Radiation Oncology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA. Electronic address: rmak@partners.org.
Lancet Digit Health ; 4(9): e657-e666, 2022 09.
Article in English | MEDLINE | ID: mdl-36028289
ABSTRACT

BACKGROUND:

Artificial intelligence (AI) and deep learning have shown great potential in streamlining clinical tasks. However, most studies remain confined to in silico validation in small internal cohorts, without external validation or data on real-world clinical utility. We developed a strategy for the clinical validation of deep learning models for segmenting primary non-small-cell lung cancer (NSCLC) tumours and involved lymph nodes in CT images, which is a time-intensive step in radiation treatment planning, with large variability among experts.

METHODS:

In this observational study, CT images and segmentations were collected from eight internal and external sources from the USA, the Netherlands, Canada, and China, with patients from the Maastro and Harvard-RT1 datasets used for model discovery (segmented by a single expert). Validation consisted of interobserver and intraobserver benchmarking, primary validation, functional validation, and end-user testing on the following datasets: multi-delineation, Harvard-RT1, Harvard-RT2, RTOG-0617, NSCLC-radiogenomics, Lung-PET-CT-Dx, RIDER, and thorax phantom. Primary validation consisted of stepwise testing on increasingly external datasets using measures of overlap, including volumetric Dice (VD) and surface Dice (SD). Functional validation explored dosimetric effect, model failure modes, test-retest stability, and accuracy. End-user testing with eight experts assessed automated segmentations in a simulated clinical setting.
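The volumetric Dice coefficient used in primary validation measures the overlap between two binary segmentation masks as twice the intersection divided by the sum of the volumes. The sketch below is an illustrative NumPy implementation, not the authors' code; the toy masks and the empty-mask convention are assumptions for the example. (Surface Dice additionally requires a distance tolerance and is typically computed with a dedicated library rather than from scratch.)

```python
import numpy as np

def volumetric_dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Volumetric Dice: 2|A ∩ B| / (|A| + |B|) for binary 3-D masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example: two 8-voxel "tumour" masks sharing 4 voxels
expert = np.zeros((4, 4, 4), dtype=bool)
expert[1:3, 1:3, 1:3] = True   # expert contour
model = np.zeros((4, 4, 4), dtype=bool)
model[1:3, 1:3, 0:2] = True    # model contour, shifted along one axis
print(volumetric_dice(expert, model))  # 2*4 / (8+8) = 0.5
```

A VD of 1.0 means identical contours and 0.0 means no overlap, which is why the study benchmarks model scores against the interobserver range rather than against perfection.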

FINDINGS:

We included 2208 patients imaged between 2001 and 2015, with 787 patients used for model discovery and 1421 for model validation, including 28 patients for end-user testing. Models showed an improvement over the interobserver benchmark (multi-delineation dataset; VD 0·91 [IQR 0·83-0·92], p=0·0062; SD 0·86 [0·71-0·91], p=0·0005), and were within the intraobserver benchmark. For primary validation, AI performance on internal Harvard-RT1 data (segmented by the same expert who segmented the discovery data) was VD 0·83 (IQR 0·76-0·88) and SD 0·79 (0·68-0·88), within the interobserver benchmark. Performance on internal Harvard-RT2 data segmented by other experts was VD 0·70 (0·56-0·80) and SD 0·50 (0·34-0·71). Performance on RTOG-0617 clinical trial data was VD 0·71 (0·60-0·81) and SD 0·47 (0·35-0·59), with similar results on diagnostic radiology datasets NSCLC-radiogenomics and Lung-PET-CT-Dx. Despite these geometric overlap results, models yielded target volumes with equivalent radiation dose coverage to those of experts. We also found non-significant differences between de novo expert and AI-assisted segmentations. AI assistance led to a 65% reduction in segmentation time (5·4 min; p<0·0001) and a 32% reduction in interobserver variability (SD; p=0·013).

INTERPRETATION:

We present a clinical validation strategy for AI models. We found that in silico geometric segmentation metrics might not correlate with clinical utility of the models. Experts' segmentation style and preference might affect model performance.

FUNDING:

US National Institutes of Health and EU European Research Council.
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Carcinoma, Non-Small-Cell Lung / Deep Learning / Lung Neoplasms Study type: Observational_studies / Prognostic_studies Limits: Humans Country/Region as subject: North America Language: En Journal: Lancet Digit Health Year: 2022 Document type: Article Affiliation country: United States