ABSTRACT
The aim of this study was to validate a previously developed deep learning model in 5 independent clinical trials. The predictive performance of this model was compared with the international prognostic index (IPI) and 2 models incorporating radiomic PET/CT features (clinical PET and PET models). Methods: In total, 1,132 diffuse large B-cell lymphoma patients were included: 296 for training and 836 for external validation. The primary outcome was 2-y time to progression. The deep learning model was trained on maximum-intensity projections from PET/CT scans. The clinical PET model included metabolic tumor volume, maximum distance from the bulkiest lesion to another lesion, SUVpeak, age, and performance status. The PET model included metabolic tumor volume, maximum distance from the bulkiest lesion to another lesion, and SUVpeak. Model performance was assessed using the area under the curve (AUC) and Kaplan-Meier curves. Results: The IPI yielded an AUC of 0.60 on all external data. The deep learning model yielded a significantly higher AUC of 0.66 (P < 0.01). For each individual clinical trial, the model was consistently better than IPI. Radiomic model AUCs remained higher for all clinical trials. The deep learning and clinical PET models showed equivalent performance (AUC, 0.69; P > 0.05). The PET model yielded the highest AUC of all models (AUC, 0.71; P < 0.05). Conclusion: The deep learning model predicted outcome in all trials with a higher performance than IPI and better survival curve separation. This model can predict treatment outcome in diffuse large B-cell lymphoma without tumor delineation but at the cost of a lower prognostic performance than with radiomics.
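As a rough illustration of the AUC metric used above, the following minimal sketch computes it as the probability that a randomly chosen progressor receives a higher predicted probability than a randomly chosen non-progressor. The function name `auc` and the toy data are hypothetical, and the sketch ignores censoring, so it is not the study's actual evaluation code.

```python
def auc(scores, labels):
    """Rank-based AUC: fraction of (progressor, non-progressor) pairs
    in which the progressor has the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up predictions (not study data): 1 = progression within 2 y.
probs = [0.9, 0.7, 0.4, 0.35, 0.2, 0.1]
progressed = [1, 0, 1, 0, 0, 0]
print(auc(probs, progressed))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.60 to 0.71 should be read.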
Subjects
Lymphoma, Large B-Cell, Diffuse , Positron Emission Tomography Computed Tomography , Humans , Lymphoma, Large B-Cell, Diffuse/diagnostic imaging , Male , Female , Middle Aged , Aged , Deep Learning , Image Processing, Computer-Assisted , Prognosis , Artificial Intelligence , Adult
ABSTRACT
BACKGROUND: Convolutional neural networks (CNNs), applied to baseline [18F]FDG PET/CT maximum intensity projections (MIPs), show potential for treatment outcome prediction in diffuse large B-cell lymphoma (DLBCL). The aim of this study is to investigate the robustness of CNN predictions to different image reconstruction protocols. METHODS: Baseline [18F]FDG PET/CT scans were collected from 20 DLBCL patients. EARL1, EARL2 and high-resolution (HR) protocols were applied per scan, generating three images with different image qualities. Image-based transformation was applied by blurring EARL2 and HR images to generate EARL1 compliant images using a Gaussian filter of 5 and 7 mm, respectively. MIPs were generated for each of the reconstructions, before and after image transformation. An in-house developed CNN predicted the probability of tumor progression within 2 years for each MIP. The difference in probabilities per patient was then calculated between both EARL2 and HR with respect to EARL1 (delta probabilities or ΔP). We compared these to the probabilities obtained after aligning the data with ComBat using the difference in median and interquartile range (IQR). RESULTS: CNN probabilities were found to be sensitive to different reconstruction protocols (EARL2 ΔP: median = 0.09, IQR = [0.06, 0.10] and HR ΔP: median = 0.1, IQR = [0.08, 0.16]). Moreover, higher resolution images (EARL2 and HR) led to higher probability values. After image-based and ComBat transformation, an improved agreement of CNN probabilities among reconstructions was found for all patients. This agreement was slightly better after image-based transformation (transformed EARL2 ΔP: median = 0.022, IQR = [0.01, 0.02] and transformed HR ΔP: median = 0.029, IQR = [0.01, 0.03]). CONCLUSION: Our CNN-based outcome predictions are affected by the applied reconstruction protocols, yet in a predictable manner. Image-based harmonization is a suitable approach to harmonize CNN predictions across image reconstruction protocols.
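Two steps described above, projecting a volume to a MIP and summarizing per-patient ΔP by median and IQR, can be sketched as follows. This is a minimal illustration only: the helper names `mip_along_y` and `delta_summary`, the `volume[z][y][x]` axis order, the inclusive quartile method, and all numbers are assumptions, not details taken from the study.

```python
from statistics import median, quantiles


def mip_along_y(volume):
    """Maximum intensity projection of volume[z][y][x] along y.
    Returns mip[z][x]. The axis order is an assumption of this sketch."""
    return [
        [max(row[x] for row in slab) for x in range(len(slab[0]))]
        for slab in volume
    ]


def delta_summary(p_reference, p_other):
    """Median and IQR of per-patient differences p_other - p_reference."""
    deltas = [b - a for a, b in zip(p_reference, p_other)]
    q1, _, q3 = quantiles(deltas, n=4, method="inclusive")
    return median(deltas), (q1, q3)


# Toy 2x2x2 volume (made-up values).
toy_volume = [[[1, 2], [3, 0]], [[0, 5], [4, 1]]]
print(mip_along_y(toy_volume))

# Made-up CNN probabilities for five patients under two reconstructions.
p_earl1 = [0.30, 0.50, 0.20, 0.60, 0.40]
p_earl2 = [0.40, 0.58, 0.30, 0.72, 0.46]
print(delta_summary(p_earl1, p_earl2))
```

A ΔP distribution centred near zero with a narrow IQR would indicate that the predictions agree across reconstructions, which is the criterion the abstract uses to compare harmonization approaches.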
ABSTRACT
Convolutional neural networks (CNNs) may improve response prediction in diffuse large B-cell lymphoma (DLBCL). The aim of this study was to investigate the feasibility of a CNN using maximum intensity projection (MIP) images from 18F-fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) baseline scans to predict the probability of time-to-progression (TTP) within 2 years and compare it with the International Prognostic Index (IPI), i.e. a clinically used score. A total of 296 DLBCL 18F-FDG PET/CT baseline scans collected from a prospective clinical trial (HOVON-84) were analysed. Cross-validation was performed using coronal and sagittal MIPs. An external dataset (340 DLBCL patients) was used to validate the model. Association between the probabilities, metabolic tumour volume and Dmaxbulk was assessed. Probabilities for PET scans with synthetically removed tumours were also assessed. The CNN provided a 2-year TTP prediction with an area under the curve (AUC) of 0.74, outperforming the IPI-based model (AUC = 0.68). Furthermore, high probabilities (> 0.6) of the original MIPs were considerably decreased after removing the tumours (< 0.4, generally). These findings suggest that MIP-based CNNs are able to predict treatment outcome in DLBCL.
Subjects
Fluorodeoxyglucose F18 , Lymphoma, Large B-Cell, Diffuse , Humans , Artificial Intelligence , Lymphoma, Large B-Cell, Diffuse/diagnostic imaging , Lymphoma, Large B-Cell, Diffuse/drug therapy , Lymphoma, Large B-Cell, Diffuse/metabolism , Positron Emission Tomography Computed Tomography/methods , Positron-Emission Tomography , Prognosis , Prospective Studies , Retrospective Studies , Tomography, X-Ray Computed , Treatment Outcome , Clinical Trials as Topic
ABSTRACT
INTRODUCTION: Although visual and quantitative assessments of [18F]FDG PET/CT studies typically rely on liver uptake value as a reference or normalisation factor, consensus or consistency in measuring [18F]FDG uptake is lacking. Therefore, we evaluate the variation of several liver standardised uptake value (SUV) measurements in lymphoma [18F]FDG PET/CT studies using different uptake metrics. METHODS: PET/CT scans from 34 lymphoma patients were used to calculate SUVmaxliver, SUVpeakliver and SUVmeanliver as a function of (1) volume-of-interest (VOI) size, (2) location, (3) imaging time point and (4) total metabolic tumour volume (MTV). The impact of reconstruction protocol on liver uptake was studied on 15 baseline lymphoma patient scans. The effect of noise on liver SUV was assessed using full and 25% count images of 15 lymphoma scans. RESULTS: Generally, SUVmaxliver and SUVpeakliver were 38% and 16% higher compared to SUVmeanliver. SUVmaxliver and SUVpeakliver increased up to 31% and 15% with VOI size while SUVmeanliver remained unchanged with the lowest variability for the largest VOI size. Liver uptake metrics were not affected by VOI location. Compared to baseline, liver uptake metrics were 15-18% and 9-18% higher at interim and EoT PET, respectively. SUVliver decreased with larger total MTVs. SUVmaxliver and SUVpeakliver were affected by reconstruction protocol up to 62%. SUVmax and SUVpeak increased by 22% and 11% between full and 25% count images. CONCLUSION: SUVmeanliver was most robust against VOI size, location, reconstruction protocol and image noise level, and is thus the most reproducible metric for liver uptake. The commonly recommended 3 cm diameter spherical VOI-based SUVmeanliver values were only slightly more variable than those seen with larger VOI sizes and are sufficient for SUVmeanliver measurements in future studies. TRIAL REGISTRATION: EudraCT: 2006-005174-42, 01-08-2008.
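The qualitative behaviour reported above (a maximum that grows with VOI size while the mean stays stable) can be reproduced with a toy simulation. The sketch below assumes a uniform liver with Gaussian voxel noise; the true SUV of 2.0, the noise level, the voxel counts and the function name `suv_max_and_mean` are all hypothetical choices, and SUVpeak is omitted because it needs spatial information.

```python
import random
from statistics import mean


def suv_max_and_mean(voxels):
    """Return (SUVmax, SUVmean) for a list of voxel SUVs inside a VOI."""
    return max(voxels), mean(voxels)


random.seed(0)  # fixed seed so the toy example is repeatable

# Simulated uniform liver: true SUV 2.0 with Gaussian noise (assumed values).
liver = [random.gauss(2.0, 0.2) for _ in range(5000)]

small_voi = liver[:100]   # small VOI nested inside the large one
large_voi = liver         # large VOI

small_max, small_mean = suv_max_and_mean(small_voi)
large_max, large_mean = suv_max_and_mean(large_voi)

print(f"small VOI: max={small_max:.2f} mean={small_mean:.2f}")
print(f"large VOI: max={large_max:.2f} mean={large_mean:.2f}")
```

Because the maximum over a larger set of noisy voxels can only stay equal or increase, SUVmax is biased upward with VOI size and noise, whereas the mean converges on the underlying uptake. This is consistent with the abstract's conclusion that SUVmeanliver is the most reproducible metric.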
Subjects
Fluorodeoxyglucose F18 , Positron Emission Tomography Computed Tomography , Humans , Positron Emission Tomography Computed Tomography/methods , Radiopharmaceuticals , Reproducibility of Results , Liver/diagnostic imaging
ABSTRACT
BACKGROUND: [18F]FDG PET-based metabolic tumor volume (MTV) is a promising prognostic marker for lymphoma patients. The aim of this study is to assess the sensitivity of several MTV segmentation methods to variations in image reconstruction methods and the ability of ComBat to improve MTV reproducibility. METHODS: Fifty-six lesions were segmented from baseline [18F]FDG PET scans of 19 lymphoma patients. For each scan, EARL1 and EARL2 standards and locally clinically preferred reconstruction protocols were applied. Lesions were delineated using 9 semiautomatic segmentation methods: fixed thresholds based on standardized uptake value (SUV = 4.0 and SUV = 2.5), relative thresholds (41% of SUVmax [41M] and 50% of SUVpeak [A50P]), majority vote-based methods that select voxels detected by at least 2 (MV2) or 3 (MV3) of the latter 4 methods, Nestle thresholding, and methods that identify the optimal method based on SUVmax (L2A, L2B). MTVs from EARL2 and locally clinically preferred reconstructions were compared to those from EARL1. Finally, different versions of ComBat were explored to harmonize the data. RESULTS: MTVs from the SUV4.0 method were least sensitive to the use of different reconstructions (MTV ratio: median = 1.01, interquartile range = [0.96-1.10]). After ComBat harmonization, an improved agreement of MTVs among different reconstructions was found for most segmentation methods. The regular implementation of ComBat ('Regular ComBat') using non-transformed distributions resulted in less accurate and precise MTV alignments than a version using log-transformed datasets ('Log-transformed ComBat'). CONCLUSION: MTV depends on both segmentation method and reconstruction methods. ComBat reduces reconstruction dependent MTV variability, especially when log-transformation is used to account for the non-normal distribution of MTVs.
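To illustrate why a log transform helps with skewed, strictly positive quantities such as MTV, the sketch below aligns one batch of values to a reference batch by matching the mean and standard deviation in log space. This is a deliberately simplified location-scale stand-in, not ComBat itself (no empirical Bayes shrinkage, no covariates); the function name `align_in_log_space` and the volumes are hypothetical.

```python
import math
from statistics import mean, stdev


def align_in_log_space(values, reference):
    """Shift and scale log(values) to match the mean and SD of
    log(reference), then transform back. All inputs must be > 0."""
    log_v = [math.log(v) for v in values]
    log_r = [math.log(r) for r in reference]
    mu_v, sd_v = mean(log_v), stdev(log_v)
    mu_r, sd_r = mean(log_r), stdev(log_r)
    if sd_v == 0:
        raise ValueError("values have zero spread in log space")
    return [math.exp((x - mu_v) / sd_v * sd_r + mu_r) for x in log_v]


# Made-up MTVs (mL): the second protocol reads 20% larger than the reference.
mtv_reference = [10.0, 20.0, 40.0, 80.0]
mtv_other = [12.0, 24.0, 48.0, 96.0]

aligned = align_in_log_space(mtv_other, mtv_reference)
print([round(v, 3) for v in aligned])
```

Working in log space turns a multiplicative offset between protocols into an additive one and guarantees that the harmonized volumes remain positive, which an untransformed shift-and-scale does not.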
ABSTRACT
Window studies are gaining traction to assess (molecular) changes in short timeframes. Decreased tumor cell positivity for the proliferation marker Ki67 is often used as a proxy for treatment response. Immunohistochemistry (IHC)-based Ki67 on tissue from neo-adjuvant trials was previously reported to be predictive for long-term response to endocrine therapy for breast cancer in postmenopausal women, but none of these trials enrolled premenopausal women. Nonetheless, the marker is being used on this subpopulation. We compared pathologist assessed IHC-based Ki67 in samples from pre- and postmenopausal women in a neo-adjuvant, endocrine therapy focused trial (NCT00738777), randomized between tamoxifen, anastrozole, or fulvestrant. These results were compared with (1) IHC-based Ki67 scoring by AI, (2) mitotic figures, (3) mRNA-based Ki67, (4) five independent gene expression signatures capturing proliferation, and (5) blood levels for tamoxifen and its metabolites as well as estradiol. Upon tamoxifen, IHC-based Ki67 levels were decreased in both pre- and postmenopausal breast cancer patients, which was confirmed using mRNA-based cell proliferation markers. The magnitude of decrease of Ki67 IHC was smaller in pre- versus postmenopausal women. We found a direct relationship between post-treatment estradiol levels and the magnitude of the Ki67 decrease in tumors. These data suggest IHC-based Ki67 may be an appropriate biomarker for tamoxifen response in premenopausal breast cancer patients, but anti-proliferative effect size depends on estradiol levels.