ABSTRACT
Radiation therapy (RT) is a crucial treatment for head and neck squamous cell carcinoma (HNSCC); however, it can have adverse effects on patients' long-term function and quality of life. Biomarkers that can predict tumor response to RT are being explored to personalize treatment and improve outcomes. While tissue and blood biomarkers have limitations, imaging biomarkers derived from magnetic resonance imaging (MRI) offer detailed information. The integration of MRI and a linear accelerator in the MR-Linac system allows for MR-guided radiation therapy (MRgRT), offering precise visualization and treatment delivery. This data descriptor offers a valuable repository for weekly intra-treatment diffusion-weighted imaging (DWI) data obtained from head and neck cancer patients. By analyzing the sequential DWI changes and their correlation with treatment response, as well as oncological and survival outcomes, the study provides valuable insights into the clinical implications of DWI in HNSCC.
Subjects
Diffusion Magnetic Resonance Imaging , Head and Neck Neoplasms , Humans , Head and Neck Neoplasms/diagnostic imaging , Head and Neck Neoplasms/radiotherapy , Radiotherapy, Image-Guided , Squamous Cell Carcinoma of Head and Neck/diagnostic imaging , Squamous Cell Carcinoma of Head and Neck/radiotherapy , Particle Accelerators
ABSTRACT
BACKGROUND: Radiotherapy is a core treatment modality for oropharyngeal cancer (OPC), where the primary gross tumor volume (GTVp) is manually segmented with high interobserver variability. This calls for reliable and trustworthy automated tools in clinician workflow. Therefore, accurate uncertainty quantification and its downstream utilization is critical. METHODS: Here we propose uncertainty-aware deep learning for OPC GTVp segmentation, and illustrate the utility of uncertainty in multiple applications. We examine two Bayesian deep learning (BDL) models and eight uncertainty measures, and utilize a large multi-institute dataset of 292 PET/CT scans to systematically analyze our approach. RESULTS: We show that our uncertainty-based approach accurately predicts the quality of the deep learning segmentation in 86.6% of cases, identifies low performance cases for semi-automated correction, and visualizes regions of the scans where the segmentations likely fail. CONCLUSIONS: Our BDL-based analysis provides a first-step towards more widespread implementation of uncertainty quantification in OPC GTVp segmentation.
Radiotherapy is used as a treatment for people with oropharyngeal cancer. It is important to distinguish the areas where cancer is present so the radiotherapy treatment can be targeted at the cancer. Computational methods based on artificial intelligence can automate this task but need to be able to distinguish areas where it is unclear whether cancer is present. In this study we compare these computational methods that are able to highlight areas where it is unclear whether or not cancer is present. Our approach accurately predicts how well these areas are distinguished by the models. Our results could be applied to improve the computational methods used during radiotherapy treatment. This could enable more targeted treatment to be used in the future, which could result in better outcomes for people with oropharyngeal cancer.
ABSTRACT
BACKGROUND/PURPOSE: The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there exists a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas of improvement, and determine future directions. METHODS: We followed the PRISMA-ScR scoping review reporting guidelines. We utilized the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. RESULTS: We identified 56 articles published from 2015 to 2024. 10 domains of RT applications were represented; most studies evaluated auto-contouring (50 %), followed by image-synthesis (13 %), and multiple applications simultaneously (11 %). 12 disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32 %). Imaging data was used in 91 % of studies, while only 13 % incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60 %), with Monte Carlo dropout being the most commonly implemented UQ method (32 %) followed by ensembling (16 %). 55 % of studies did not share code or datasets. CONCLUSION: Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, we identified a clear need to study additional UQ methods, such as conformal prediction.
Our results may incentivize the development of guidelines for reporting and implementation of UQ in RT.
ABSTRACT
Background/purpose: The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there exists a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas of improvement, and determine future directions. Methods: We followed the PRISMA-ScR scoping review reporting guidelines. We utilized the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. Results: We identified 56 articles published from 2015-2024. 10 domains of RT applications were represented; most studies evaluated auto-contouring (50%), followed by image-synthesis (13%), and multiple applications simultaneously (11%). 12 disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32%). Imaging data was used in 91% of studies, while only 13% incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60%), with Monte Carlo dropout being the most commonly implemented UQ method (32%) followed by ensembling (16%). 55% of studies did not share code or datasets. Conclusion: Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, there was a clear need to study additional UQ methods, such as conformal prediction.
Our results may incentivize the development of guidelines for reporting and implementation of UQ in RT.
ABSTRACT
BACKGROUND/OBJECTIVES: Pain is a challenging multifaceted symptom reported by most cancer patients. This systematic review aims to explore applications of artificial intelligence/machine learning (AI/ML) in predicting pain-related outcomes and pain management in cancer. METHODS: A comprehensive search of Ovid MEDLINE, EMBASE and Web of Science databases was conducted for studies published up to September 7, 2023, using the terms "Cancer," "Pain," "Pain Management," "Analgesics," "Artificial Intelligence," "Machine Learning," and "Neural Networks." AI/ML models, their validation and performance were summarized. Quality assessment was conducted using the PROBAST risk-of-bias tool and adherence to TRIPOD guidelines. RESULTS: Forty-four studies from 2006 to 2023 were included. Nineteen studies used AI/ML for classifying pain after cancer therapy [median AUC 0.80 (range 0.76-0.94)]. Eighteen studies focused on cancer pain research [median AUC 0.86 (range 0.50-0.99)], and 7 focused on applying AI/ML for cancer pain management [median AUC 0.71 (range 0.47-0.89)]. The median AUC of models across all studies was 0.77. Random forest models demonstrated the highest performance (median AUC 0.81), lasso models had the highest median sensitivity (1), while Support Vector Machine had the highest median specificity (0.74). Overall adherence to TRIPOD guidelines was 70.7%. Overall, a high risk of bias (77.3%) and a lack of external validation (14%) and clinical application (23%) were detected. Reporting of model calibration was also missing (5%). CONCLUSION: Implementation of AI/ML tools promises significant advances in the classification, risk stratification, and management decisions for cancer pain. Further research focusing on quality improvement, model calibration, and rigorous external clinical validation in real healthcare settings is imperative for ensuring its practical and reliable application in clinical practice.
ABSTRACT
OBJECTIVES: Lung metastases in adenoid cystic carcinoma (ACC) usually have indolent growth and the optimal timing to start systemic therapy is not established. We assessed ACC lung metastasis tumor growth dynamics and compared the prognostic value of time to progression (TTP) and tumor volume doubling time (TVDT). METHODS: The study included ACC patients with ≥1 pulmonary metastasis (≥5 mm) and at least 2 chest computed tomography (CT) scans. Radiology assessment was performed from the first scan showing metastasis until treatment initiation or death. Up to 5 lung nodules per patient were segmented for TVDT calculation. To assess tumor growth rate (TGR), the correlation coefficient (r) and coefficient of determination (R²) were calculated for measured lung nodules. TTP was assessed per RECIST 1.1; TVDT was calculated using the Schwartz formula. Overall survival (OS) was analyzed using the Kaplan-Meier method. RESULTS: The study included 75 patients. Sixty-seven patients (89%) had lung-only metastasis on the first CT scan. The TGR was overall constant (median R² = 0.974). Median TTP and TVDT were 11.2 months and 7.5 months, respectively. Shorter TVDT (<6 months) was associated with poor overall survival (HR = 0.48; p = 0.037), but TTP was not associated with survival (HR = 1.02; p = 0.96). Cox regression showed that TVDT but not TTP significantly correlated with OS. TVDT calculated using estimated tumor volume correlated with TVDT obtained by segmentation. CONCLUSION: Most ACC lung metastases have a constant TGR. TVDT may be a better prognostic indicator than TTP in lung-metastatic ACC. TVDT can be estimated by single longitudinal measurement in clinical practice.
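The abstract above names the Schwartz formula without stating it. In its usual form it is TVDT = t · ln 2 / ln(V2/V1), where V1 and V2 are the volumes at two time points separated by an interval t. The following is a minimal Python sketch of that calculation; the function name and the example volumes are illustrative assumptions, not values or code from the study.

```python
import math


def tumor_volume_doubling_time(v1, v2, interval):
    """Schwartz formula: TVDT = interval * ln(2) / ln(v2 / v1).

    v1, v2   -- tumor volumes at the first and second scan (same unit)
    interval -- time between the scans; the result uses the same time unit
    Returns math.inf when the volume did not change, and a negative value
    when the tumor shrank (v2 < v1).
    """
    if v1 <= 0 or v2 <= 0:
        raise ValueError("volumes must be positive")
    if v1 == v2:
        return math.inf
    return interval * math.log(2) / math.log(v2 / v1)


if __name__ == "__main__":
    # Hypothetical nodule: 100 mm^3 growing to 150 mm^3 over 4 months.
    tvdt_months = tumor_volume_doubling_time(100.0, 150.0, 4.0)
    # The abstract uses a 6-month cutoff for "shorter" TVDT.
    print(f"TVDT = {tvdt_months:.1f} months, short TVDT: {0 < tvdt_months < 6}")
```

Because the formula assumes exponential growth between the two scans, it is most meaningful when growth rate is roughly constant, which is what the abstract reports for most nodules.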
Subjects
Carcinoma, Adenoid Cystic , Lung Neoplasms , Humans , Prognosis , Carcinoma, Adenoid Cystic/pathology , Tumor Burden , Time Factors , Lung Neoplasms/diagnostic imaging , Lung/pathology , Retrospective Studies
ABSTRACT
Background: Acute pain is a common and debilitating symptom experienced by oral cavity and oropharyngeal cancer (OC/OPC) patients undergoing radiation therapy (RT). Uncontrolled pain can result in opioid overuse and increased risks of long-term opioid dependence. The specific aim of this exploratory analysis was the prediction of severe acute pain and opioid use in the acute on-treatment setting, to develop risk-stratification models for pragmatic clinical trials. Materials and Methods: A retrospective study was conducted on 900 OC/OPC patients treated with RT from 2017 to 2023. Clinical data including demographics, tumor data, pain scores and medication data were extracted from patient records. On-treatment pain intensity scores were assessed using a numeric rating scale (0-none, 10-worst) and total opioid doses were calculated using morphine equivalent daily dose (MEDD) conversion factors. Analgesic efficacy was assessed based on the combined pain intensity and the total required MEDD. ML models, including Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Model (GBM), were developed and validated using ten-fold cross-validation. Model performance was evaluated using discrimination and calibration metrics. Feature importance was investigated using bootstrap and permutation techniques. Results: For predicting acute pain intensity, the GBM demonstrated superior area under the receiver operating curve (AUC) (0.71), recall (0.39), and F1 score (0.48). For predicting the total MEDD, LR outperformed other models in the AUC (0.67). For predicting analgesic efficacy, SVM achieved the highest specificity (0.97) and the best calibration (expected calibration error [ECE] of 0.06), while RF and GBM achieved the same highest AUC, 0.68. The RF model emerged as the best calibrated model, with an ECE of 0.02 for pain intensity prediction and 0.05 for MEDD prediction.
Baseline pain scores and vital signs were the features that contributed most to the different predictive models. Conclusion: These ML models are promising in predicting end-of-treatment acute pain, opioid requirements, and analgesic efficacy in OC/OPC patients undergoing RT. Baseline pain score and vital sign changes were identified as crucial predictors. Implementation of these models in clinical practice could facilitate early risk stratification and personalized pain management. Prospective multicentric studies and external validation are essential for further refinement and generalizability.
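The calibration results above are reported as ECE values. The abstract does not say how the metric was binned, so the following Python sketch shows one common form for a binary classifier, assuming ten equal-width probability bins: within each bin, the mean predicted probability is compared with the observed event rate, and the gaps are averaged with weights proportional to bin size. The function name and the bin count are illustrative assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary predictions (assumed equal-width bins).

    probs  -- predicted probabilities of the positive class, each in [0, 1]
    labels -- observed outcomes, each 0 or 1
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(event_rate - mean_confidence)
    return ece
```

A value of 0 means the predicted probabilities match the observed frequencies in every occupied bin; larger values indicate poorer calibration.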
ABSTRACT
BACKGROUND: In order to accurately accumulate delivered dose for head and neck cancer patients treated with the Adapt to Position workflow on the 1.5T magnetic resonance imaging (MRI)-linear accelerator (MR-linac), the low-resolution T2-weighted MRIs used for daily setup must be segmented to enable reconstruction of the delivered dose at each fraction. PURPOSE: In this pilot study, we evaluate various autosegmentation methods for head and neck organs at risk (OARs) on on-board setup MRIs from the MR-linac for off-line reconstruction of delivered dose. METHODS: Seven OARs (parotid glands, submandibular glands, mandible, spinal cord, and brainstem) were contoured on 43 images by seven observers each. Ground truth contours were generated using a simultaneous truth and performance level estimation (STAPLE) algorithm. Twenty total autosegmentation methods were evaluated in ADMIRE: 1-9) atlas-based autosegmentation using a population atlas library (PAL) of 5/10/15 patients with STAPLE, patch fusion (PF), random forest (RF) for label fusion; 10-19) autosegmentation using images from a patient's 1-4 prior fractions (individualized patient prior [IPP]) using STAPLE/PF/RF; 20) deep learning (DL) (3D ResUNet trained on 43 ground truth structure sets plus 45 contoured by one observer). Execution time was measured for each method. Autosegmented structures were compared to ground truth structures using the Dice similarity coefficient, mean surface distance (MSD), Hausdorff distance (HD), and Jaccard index (JI). For each metric and OAR, performance was compared to the inter-observer variability using Dunn's test with control. Methods were compared pairwise using the Steel-Dwass test for each metric pooled across all OARs. Further dosimetric analysis was performed on three high-performing autosegmentation methods (DL, IPP with RF and 4 fractions [IPP_RF_4], IPP with 1 fraction [IPP_1]), and one low-performing (PAL with STAPLE and 5 atlases [PAL_ST_5]). 
For five patients, delivered doses from clinical plans were recalculated on setup images with ground truth and autosegmented structure sets. Differences in maximum and mean dose to each structure between the ground truth and autosegmented structures were calculated and correlated with geometric metrics. RESULTS: DL and IPP methods performed best overall, all significantly outperforming inter-observer variability and with no significant difference between methods in pairwise comparison. PAL methods performed worst overall; most were not significantly different from the inter-observer variability or from each other. DL was the fastest method (33 s per case) and PAL methods the slowest (3.7-13.8 min per case). Execution time increased with the number of prior fractions/atlases for IPP and PAL. For DL, IPP_1, and IPP_RF_4, the majority (95%) of dose differences were within ±250 cGy from ground truth, but outlier differences up to 785 cGy occurred. Dose differences were much higher for PAL_ST_5, with outlier differences up to 1920 cGy. Dose differences showed weak but significant correlations with all geometric metrics (R² between 0.030 and 0.314). CONCLUSIONS: The autosegmentation methods offering the best combination of performance and execution time are DL and IPP_1. Dose reconstruction on on-board T2-weighted MRIs is feasible with autosegmented structures with minimal dosimetric variation from ground truth, but contours should be visually inspected prior to dose reconstruction in an end-to-end dose accumulation workflow.
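Two of the overlap metrics used in this abstract, the Dice similarity coefficient and the Jaccard index, have simple set definitions: Dice = 2|A∩B| / (|A| + |B|) and Jaccard = |A∩B| / |A∪B|. The Python sketch below illustrates them on segmentations represented as sets of voxel coordinates. That representation, the function names, and the convention of returning 1.0 for two empty structures are assumptions made for illustration; the study computed its metrics with its own tooling.

```python
def dice_coefficient(seg_a, seg_b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # assumed convention: two empty structures agree fully
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard_index(seg_a, seg_b):
    """Jaccard index (intersection over union) between two voxel sets."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # same assumed convention as above
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical three-voxel contours that share two voxels.
    reference = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
    automatic = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
    print(dice_coefficient(reference, automatic))
    print(jaccard_index(reference, automatic))
```

The two metrics are monotonically related (J = D / (2 − D)), so they rank methods identically for a given structure; the surface-based metrics in the abstract (MSD, HD) capture boundary errors that overlap metrics can miss.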
Subjects
Head and Neck Neoplasms , Radiotherapy Planning, Computer-Assisted , Humans , Pilot Projects , Workflow , Radiotherapy Planning, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Head and Neck Neoplasms/diagnostic imaging , Head and Neck Neoplasms/radiotherapy , Magnetic Resonance Imaging/methods , Organs at Risk
ABSTRACT
Background and Purpose: Prior work on adaptive organ-at-risk (OAR)-sparing radiation therapy has typically reported outcomes based on fixed-number or fixed-interval re-plannings, which represent a one-size-fits-all approach and do not account for the variable progression of individual patients' toxicities. The purpose of this study was to determine the personalized optimal timing for re-planning in adaptive OAR-sparing radiation therapy, considering limited re-planning resources, specifically for patients with head and neck cancer (HNC). Methods and Materials: A novel Markov decision process (MDP) model was developed to determine optimal timing of re-plannings based on the patient's expected toxicity, characterized by normal tissue complication probability (NTCP), for four toxicities. The MDP parameters were derived from a dataset comprising 52 HNC patients treated at the University of Texas MD Anderson Cancer Center between 2007 and 2013. Optimal replanning strategies were obtained when the permissible number of re-plannings throughout the treatment was limited to 1, 2, and 3. Results: The MDP (optimal) solution recommended re-planning when the difference between planned and actual NTCPs (ΔNTCP) was greater than or equal to 1%, 2%, 2%, and 4% at treatment fractions 10, 15, 20, and 25, respectively, exhibiting a temporally increasing pattern. The ΔNTCP thresholds remained constant across the number of re-planning allowances (1, 2, and 3). Conclusion: The MDP model determines the optimal timing for implementing patient-specific adaptive re-planning. This approach incorporates ΔNTCP thresholds and considers varying total re-plannings. The methods are versatile and applicable across cancer types, institutional settings, and different OARs and NTCP models.
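The reported solution reduces to a fraction-dependent threshold rule, which can be sketched as a small lookup in Python. This is not the MDP itself, only the decision rule stated in the Results. Two points are assumptions made for illustration: that ΔNTCP is expressed in percentage points, and that it is taken as actual minus planned NTCP. The function and variable names are hypothetical.

```python
# ΔNTCP thresholds (assumed percentage points) at the decision fractions
# reported in the abstract.
REPLAN_THRESHOLDS = {10: 1.0, 15: 2.0, 20: 2.0, 25: 4.0}


def should_replan(fraction, planned_ntcp, actual_ntcp, replans_left):
    """Apply the reported threshold rule at a single treatment fraction.

    planned_ntcp, actual_ntcp -- NTCP values in percent (assumed unit)
    replans_left              -- remaining re-planning allowance
    Returns False at fractions that have no reported threshold.
    """
    threshold = REPLAN_THRESHOLDS.get(fraction)
    if threshold is None or replans_left <= 0:
        return False
    # Assumed sign convention: positive when toxicity risk exceeds the plan.
    delta_ntcp = actual_ntcp - planned_ntcp
    return delta_ntcp >= threshold
```

For example, a 3-point excess over the planned NTCP would trigger re-planning at fraction 15 but not at fraction 25, which reflects the temporally increasing thresholds described above.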
ABSTRACT
PURPOSE: The quality of radiotherapy auto-segmentation training data, primarily derived from clinician observers, is of utmost importance. However, the factors influencing the quality of clinician-derived segmentations are poorly understood; our study aims to quantify these factors. METHODS: Organ at risk (OAR) and tumor-related segmentations provided by radiation oncologists from the Contouring Collaborative for Consensus in Radiation Oncology data set were used. Segmentations were derived from five disease sites: breast, sarcoma, head and neck (H&N), gynecologic (GYN), and GI. Segmentation quality was determined on a structure-by-structure basis by comparing the observer segmentations with an expert-derived consensus, which served as a reference standard benchmark. The Dice similarity coefficient (DSC) was primarily used as a metric for the comparisons. DSC was stratified into binary groups on the basis of structure-specific expert-derived interobserver variability (IOV) cutoffs. Generalized linear mixed-effects models using Bayesian estimation were used to investigate the association between demographic variables and the binarized DSC for each disease site. Variables with a highest density interval excluding zero were considered to substantially affect the outcome measure. RESULTS: Five hundred seventy-four, 110, 452, 112, and 48 segmentations were used for the breast, sarcoma, H&N, GYN, and GI cases, respectively. The median percentage of segmentations that crossed the expert DSC IOV cutoff when stratified by structure type was 55% and 31% for OARs and tumors, respectively. Regression analysis revealed that the structure being tumor-related had a substantial negative impact on binarized DSC for the breast, sarcoma, H&N, and GI cases. There were no recurring relationships between segmentation quality and demographic variables across the cases, with most variables demonstrating large standard deviations. 
CONCLUSION: Our study highlights substantial uncertainty surrounding conventionally presumed factors influencing segmentation quality relative to benchmarks.
Subjects
Bayes Theorem , Benchmarking , Radiation Oncologists , Humans , Benchmarking/methods , Female , Radiotherapy Planning, Computer-Assisted/methods , Neoplasms/epidemiology , Neoplasms/radiotherapy , Organs at Risk , Male , Radiation Oncology/standards , Radiation Oncology/methods , Demography , Observer Variation
ABSTRACT
Background: Demand for head and neck cancer (HNC) radiotherapy data in algorithmic development has prompted increased image dataset sharing. Medical images must comply with data protection requirements so that re-use is enabled without disclosing patient identifiers. Defacing, i.e., the removal of facial features from images, is often considered a reasonable compromise between data protection and re-usability for neuroimaging data. While defacing tools have been developed by the neuroimaging community, their acceptability for radiotherapy applications has not been explored. Therefore, this study systematically investigated the impact of available defacing algorithms on HNC organs at risk (OARs). Methods: A publicly available dataset of magnetic resonance imaging scans for 55 HNC patients with eight segmented OARs (bilateral submandibular glands, parotid glands, level II neck lymph nodes, level III neck lymph nodes) was utilized. Eight publicly available defacing algorithms were investigated: afni_refacer, DeepDefacer, defacer, fsl_deface, mask_face, mri_deface, pydeface, and quickshear. Using a subset of scans where defacing succeeded (N=29), a 5-fold cross-validation 3D U-net based OAR auto-segmentation model was utilized to perform two main experiments: (1) comparing original and defaced data for training when evaluated on original data; (2) using original data for training and comparing the model evaluation on original and defaced data. Models were primarily assessed using the Dice similarity coefficient (DSC). Results: Most defacing methods were unable to produce any usable images for evaluation, while mask_face, fsl_deface, and pydeface were unable to remove the face for 29%, 18%, and 24% of subjects, respectively.
When using the original data for evaluation, the composite OAR DSC was statistically higher (p ≤ 0.05) for the model trained with the original data with a DSC of 0.760 compared to the mask_face, fsl_deface, and pydeface models with DSCs of 0.742, 0.736, and 0.449, respectively. Moreover, the model trained with original data had decreased performance (p ≤ 0.05) when evaluated on the defaced data with DSCs of 0.673, 0.693, and 0.406 for mask_face, fsl_deface, and pydeface, respectively. Conclusion: Defacing algorithms may have a significant impact on HNC OAR auto-segmentation model training and testing. This work highlights the need for further development of HNC-specific image anonymization methods.
ABSTRACT
Purpose: Contouring Collaborative for Consensus in Radiation Oncology (C3RO) is a crowdsourced challenge engaging radiation oncologists across various expertise levels in segmentation. An obstacle to artificial intelligence (AI) development is the paucity of multiexpert datasets; consequently, we sought to characterize whether aggregate segmentations generated from multiple nonexperts could meet or exceed recognized expert agreement. Approach: Participants who contoured ≥1 region of interest (ROI) for the breast, sarcoma, head and neck (H&N), gynecologic (GYN), or gastrointestinal (GI) cases were identified as a nonexpert or recognized expert. Cohort-specific ROIs were combined into single simultaneous truth and performance level estimation (STAPLE) consensus segmentations. STAPLE_nonexpert ROIs were evaluated against STAPLE_expert contours using Dice similarity coefficient (DSC). The expert interobserver DSC (IODSC_expert) was calculated as an acceptability threshold between STAPLE_nonexpert and STAPLE_expert. To determine the number of nonexperts required to match the IODSC_expert for each ROI, a single consensus contour was generated using variable numbers of nonexperts and then compared to the IODSC_expert. Results: For all cases, the DSC values for STAPLE_nonexpert versus STAPLE_expert were higher than the comparator IODSC_expert for most ROIs. The minimum number of nonexpert segmentations needed for a consensus ROI to achieve IODSC_expert acceptability criteria ranged between 2 and 4 for breast, 3 and 5 for sarcoma, 3 and 5 for H&N, 3 and 5 for GYN, and 3 for GI. Conclusions: Multiple nonexpert-generated consensus ROIs met or exceeded expert-derived acceptability thresholds. Five nonexperts could potentially generate consensus segmentations for most ROIs with performance approximating experts, suggesting nonexpert segmentations as feasible cost-effective AI inputs.
ABSTRACT
PURPOSE: To determine DWI parameters associated with tumor response and oncologic outcomes in head and neck cancer (HNC) patients treated with radiotherapy (RT). METHODS: HNC patients in a prospective study were included. Patients had MRIs pre-, mid-, and post-RT completion. We used T2-weighted sequences for tumor segmentation, which were co-registered to respective DWIs for extraction of apparent diffusion coefficient (ADC) measurements. Treatment response was assessed at mid- and post-RT and was defined as complete response (CR) vs. non-complete response (non-CR). The Mann-Whitney U test was used to compare ADC between CR and non-CR. Recursive partitioning analysis (RPA) was performed to identify the ADC threshold associated with relapse. Cox proportional hazards models were fitted for clinical vs. clinical and imaging parameters, and internal validation was done using a bootstrapping technique. RESULTS: Eighty-one patients were included. Median follow-up was 31 months. For patients with post-RT CR, there was a significant increase in mean ADC at mid-RT compared to baseline ((1.8 ± 0.29) × 10⁻³ mm²/s vs. (1.37 ± 0.22) × 10⁻³ mm²/s, p < 0.0001), while patients with non-CR had no significant increase (p > 0.05). RPA identified GTV-P delta (Δ)ADCmean < 7% at mid-RT as the most significant parameter associated with worse LC and RFS (p = 0.01). Uni- and multi-variable analysis showed that GTV-P ΔADCmean at mid-RT ≥ 7% was significantly associated with better LC and RFS. The addition of ΔADCmean significantly improved the c-indices of LC and RFS models compared with standard clinical variables (0.85 vs. 0.77 and 0.74 vs. 0.68 for LC and RFS, respectively, p < 0.0001 for both). CONCLUSION: ΔADCmean at mid-RT is a strong predictor of oncologic outcomes in HNC. Patients with no significant increase of primary tumor ADC at mid-RT are at high risk of disease relapse.
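The 7% cutoff above is applied to the change in mean ADC between baseline and mid-RT. A minimal Python sketch of that stratification follows, assuming ΔADCmean is the percent change relative to baseline (the abstract implies this but does not give the formula). The function names are hypothetical.

```python
def delta_adc_percent(baseline_adc, mid_rt_adc):
    """Percent change in mean ADC from baseline to mid-RT (assumed definition)."""
    if baseline_adc <= 0:
        raise ValueError("baseline ADC must be positive")
    return 100.0 * (mid_rt_adc - baseline_adc) / baseline_adc


def below_reported_cutoff(baseline_adc, mid_rt_adc, cutoff_percent=7.0):
    """True when ΔADCmean falls below the cutoff reported in the abstract,
    the group associated there with worse LC and RFS."""
    return delta_adc_percent(baseline_adc, mid_rt_adc) < cutoff_percent


if __name__ == "__main__":
    # Hypothetical patient values in units of 10^-3 mm^2/s.
    print(below_reported_cutoff(1.40, 1.45))  # small rise, below the cutoff
    print(below_reported_cutoff(1.40, 1.80))  # large rise, above the cutoff
```

As a rough consistency check, the group means reported for complete responders (1.37 rising to 1.8) correspond to a change of about 31% under this definition, well above the 7% cutoff.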
Subjects
Head and Neck Neoplasms , Neoplasm Recurrence, Local , Humans , Prospective Studies , Neoplasm Recurrence, Local/diagnostic imaging , Diffusion Magnetic Resonance Imaging/methods , Head and Neck Neoplasms/diagnostic imaging , Head and Neck Neoplasms/radiotherapy , Magnetic Resonance Imaging , Biomarkers
ABSTRACT
Background: Oropharyngeal cancer (OPC) is a widespread disease, with radiotherapy being a core treatment modality. Manual segmentation of the primary gross tumor volume (GTVp) is currently employed for OPC radiotherapy planning, but is subject to significant interobserver variability. Deep learning (DL) approaches have shown promise in automating GTVp segmentation, but comparative (auto)confidence metrics of these models' predictions have not been well explored. Quantifying instance-specific DL model uncertainty is crucial to improving clinician trust and facilitating broad clinical implementation. Therefore, in this study, probabilistic DL models for GTVp auto-segmentation were developed using large-scale PET/CT datasets, and various uncertainty auto-estimation methods were systematically investigated and benchmarked. Methods: We utilized the publicly available 2021 HECKTOR Challenge training dataset with 224 co-registered PET/CT scans of OPC patients with corresponding GTVp segmentations as a development set. A separate set of 67 co-registered PET/CT scans of OPC patients with corresponding GTVp segmentations was used for external validation. Two approximate Bayesian deep learning methods, the MC Dropout Ensemble and Deep Ensemble, both with five submodels, were evaluated for GTVp segmentation and uncertainty performance. The segmentation performance was evaluated using the volumetric Dice similarity coefficient (DSC), mean surface distance (MSD), and Hausdorff distance at 95% (95HD). The uncertainty was evaluated using four measures from literature: coefficient of variation (CV), structure expected entropy, structure predictive entropy, and structure mutual information, and additionally with our novel Dice-risk measure.
The utility of uncertainty information was evaluated with the accuracy of uncertainty-based segmentation performance prediction using the Accuracy vs Uncertainty (AvU) metric, and by examining the linear correlation between uncertainty estimates and DSC. In addition, batch-based and instance-based referral processes were examined, where the patients with high uncertainty were rejected from the set. In the batch referral process, the area under the referral curve with DSC (R-DSC AUC) was used for evaluation, whereas in the instance referral process, the DSC at various uncertainty thresholds was examined. Results: Both models behaved similarly in terms of the segmentation performance and uncertainty estimation. Specifically, the MC Dropout Ensemble had 0.776 DSC, 1.703 mm MSD, and 5.385 mm 95HD. The Deep Ensemble had 0.767 DSC, 1.717 mm MSD, and 5.477 mm 95HD. The uncertainty measure with the highest DSC correlation was structure predictive entropy with correlation coefficients of 0.699 and 0.692 for the MC Dropout Ensemble and the Deep Ensemble, respectively. The highest AvU value was 0.866 for both models. The best performing uncertainty measure for both models was the CV, which had R-DSC AUC of 0.783 and 0.782 for the MC Dropout Ensemble and Deep Ensemble, respectively. When patients were referred based on uncertainty thresholds from 0.85 validation DSC for all uncertainty measures, on average the DSC improved from the full dataset by 4.7% and 5.0% while referring 21.8% and 22% of patients for the MC Dropout Ensemble and Deep Ensemble, respectively. Conclusion: We found that many of the investigated methods provide overall similar but distinct utility in terms of predicting segmentation quality and referral performance. These findings are a critical first step towards more widespread implementation of uncertainty quantification in OPC GTVp segmentation.
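Three of the uncertainty measures named above are related by a standard decomposition: mutual information equals predictive entropy minus expected entropy. The Python sketch below illustrates this for a binary (tumor vs. background) ensemble. It assumes that the "structure" variants are plain averages of the voxel-wise quantities over a set of voxels; the paper's exact definitions, such as which voxels are included, may differ. The function names and input layout are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in nats of a Bernoulli distribution with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def ensemble_uncertainty(member_probs):
    """Average entropy-based uncertainty over voxels for an ensemble.

    member_probs -- one list per submodel, each holding that submodel's
                    foreground probability for the same N voxels
    """
    n_members = len(member_probs)
    n_voxels = len(member_probs[0])
    predictive = 0.0
    expected = 0.0
    for v in range(n_voxels):
        voxel_probs = [member[v] for member in member_probs]
        mean_prob = sum(voxel_probs) / n_members
        # Entropy of the averaged prediction (total uncertainty).
        predictive += binary_entropy(mean_prob)
        # Average of the per-member entropies (data uncertainty).
        expected += sum(binary_entropy(p) for p in voxel_probs) / n_members
    predictive /= n_voxels
    expected /= n_voxels
    return {
        "predictive_entropy": predictive,
        "expected_entropy": expected,
        # The difference reflects disagreement between the submodels.
        "mutual_information": predictive - expected,
    }
```

When all submodels output the same probabilities, mutual information is zero even if each prediction is itself uncertain; it grows only when the submodels disagree, which is why it is often read as model (epistemic) uncertainty.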
ABSTRACT
Radiation therapy (RT) is a crucial treatment for head and neck squamous cell carcinoma (HNSCC); however, it can have adverse effects on patients' long-term function and quality of life. Biomarkers that can predict tumor response to RT are being explored to personalize treatment and improve outcomes. While tissue and blood biomarkers have limitations, imaging biomarkers derived from magnetic resonance imaging (MRI) offer detailed information. The integration of MRI and a linear accelerator in the MR-Linac system allows for MR-guided radiation therapy (MRgRT), offering precise visualization and treatment delivery. This data descriptor offers a valuable repository for weekly intra-treatment diffusion-weighted imaging (DWI) data obtained from head and neck cancer patients. By analyzing the sequential DWI changes and their correlation with treatment response, as well as oncological and survival outcomes, the study provides valuable insights into the clinical implications of DWI in HNSCC. [Table: see text].
ABSTRACT
Background/objective: Pain is a challenging multifaceted symptom reported by most cancer patients, resulting in a substantial burden on both patients and healthcare systems. This systematic review aims to explore applications of artificial intelligence/machine learning (AI/ML) in predicting pain-related outcomes and supporting decision-making processes in pain management in cancer. Methods: A comprehensive search of Ovid MEDLINE, EMBASE and Web of Science databases was conducted for studies published up to September 7, 2023, using terms including "Cancer", "Pain", "Pain Management", "Analgesics", "Opioids", "Artificial Intelligence", "Machine Learning", "Deep Learning", and "Neural Networks". The screening process was performed using the Covidence screening tool. Only original studies conducted in human cohorts were included. AI/ML models, their validation and performance, and adherence to TRIPOD guidelines were summarized from the final included studies. Results: This systematic review included 44 studies from 2006-2023. Most studies were prospective and uni-institutional. The number of AI/ML studies in cancer pain has trended upward over the last 4 years. Nineteen studies used AI/ML for classifying cancer patients' pain development after cancer therapy, with median AUC 0.80 (range 0.76-0.94). Eighteen studies focused on cancer pain research with median AUC 0.86 (range 0.50-0.99), and 7 focused on applying AI/ML for cancer pain management decisions with median AUC 0.71 (range 0.47-0.89). Multiple ML models were investigated, with a median AUC of 0.77 across all models in all studies. Random forest models demonstrated the highest performance (median AUC 0.81), lasso models had the highest median sensitivity (1), while Support Vector Machine had the highest median specificity (0.74). Overall adherence of included studies to TRIPOD guidelines was 70.7%. Lack of external validation (14%) and clinical application (23%) of most included studies was detected. 
Reporting of model calibration was also missing in the majority of studies (5%). Conclusion: Implementation of various novel AI/ML tools promises significant advances in the classification, risk stratification, and management decisions for cancer pain. These advanced tools will integrate big health-related data for personalized pain management in cancer patients. Further research focusing on model calibration and rigorous external clinical validation in real healthcare settings is imperative for ensuring its practical and reliable application in clinical practice.
ABSTRACT
BACKGROUND/PURPOSE: Adequate image registration of anatomical and functional magnetic resonance imaging (MRI) scans is necessary for MR-guided head and neck cancer (HNC) adaptive radiotherapy planning. Despite the quantitative capabilities of diffusion-weighted imaging (DWI) MRI for treatment plan adaptation, geometric distortion remains a considerable limitation. Therefore, we systematically investigated various deformable image registration (DIR) methods to co-register DWI and T2-weighted (T2W) images. MATERIALS/METHODS: We compared three commercial (ADMIRE, Velocity, Raystation) and three open-source (Elastix with default settings [Elastix Default], Elastix with parameter set 23 [Elastix 23], Demons) post-acquisition DIR methods applied to T2W and DWI MRI images acquired during the same imaging session in twenty immobilized HNC patients. In addition, we used the non-registered images (None) as a control comparator. Ground-truth segmentations of radiotherapy structures (tumour and organs at risk) were generated by a physician expert on both image sequences. For each registration approach, structures were propagated from T2W to DWI images. These propagated structures were then compared with ground-truth DWI structures using the Dice similarity coefficient and mean surface distance. RESULTS: 19 left submandibular glands, 18 right submandibular glands, 20 left parotid glands, 20 right parotid glands, 20 spinal cords, and 12 tumours were delineated. Most DIR methods took <30 s to execute per case, with the exception of Elastix 23, which took ~458 s to execute per case. ADMIRE and Elastix 23 demonstrated improved performance over None for all metrics and structures (Bonferroni-corrected p < 0.05), while the other methods did not. Moreover, ADMIRE and Elastix 23 significantly improved performance in individual and pooled analysis compared to all other methods. 
CONCLUSIONS: The ADMIRE DIR method offers improved geometric performance with reasonable execution time so should be favoured for registering T2W and DWI images acquired during the same scan session in HNC patients. These results are important to ensure the appropriate selection of registration strategies for MR-guided radiotherapy.
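The mean surface distance used above to compare propagated and ground-truth structures can be sketched as follows. This is a minimal Python illustration under stated assumptions: each surface is given as a list of point coordinates, and the symmetric form (the average of the two directed mean nearest-point distances) is used. Definitions of this metric vary between toolkits, and the function names here are hypothetical rather than taken from the study.

```python
from math import dist


def directed_mean_distance(a, b):
    """Mean distance from each point in surface a to its nearest point in b."""
    return sum(min(dist(p, q) for q in b) for p in a) / len(a)


def mean_surface_distance(a, b):
    """Symmetric mean surface distance between two surface point sets.

    Assumption: the two directed means are averaged with equal weight.
    """
    return 0.5 * (directed_mean_distance(a, b) + directed_mean_distance(b, a))


if __name__ == "__main__":
    # Hypothetical 2D contours one unit apart (units follow the input, e.g. mm).
    propagated = [(0.0, 0.0), (1.0, 0.0)]
    ground_truth = [(0.0, 1.0), (1.0, 1.0)]
    print(mean_surface_distance(propagated, ground_truth))
```

The brute-force nearest-point search is quadratic in the number of surface points, which is acceptable for a sketch but would normally be replaced by a distance transform or a spatial index for full-resolution structures.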
Subjects
Head and Neck Neoplasms , Radiotherapy Planning, Computer-Assisted , Humans , Radiotherapy Planning, Computer-Assisted/methods , Head and Neck Neoplasms/diagnostic imaging , Head and Neck Neoplasms/radiotherapy , Magnetic Resonance Imaging/methods , Diffusion Magnetic Resonance Imaging , Radiotherapy Dosage , Image Processing, Computer-Assisted/methods , Algorithms
ABSTRACT
Clinician generated segmentation of tumor and healthy tissue regions of interest (ROIs) on medical images is crucial for radiotherapy. However, interobserver segmentation variability has long been considered a significant detriment to the implementation of high-quality and consistent radiotherapy dose delivery. This has prompted the increasing development of automated segmentation approaches. However, extant segmentation datasets typically only provide segmentations generated by a limited number of annotators with varying, and often unspecified, levels of expertise. In this data descriptor, numerous clinician annotators manually generated segmentations for ROIs on computed tomography images across a variety of cancer sites (breast, sarcoma, head and neck, gynecologic, gastrointestinal; one patient per cancer site) for the Contouring Collaborative for Consensus in Radiation Oncology challenge. In total, over 200 annotators (experts and non-experts) contributed using a standardized annotation platform (ProKnow). Subsequently, we converted Digital Imaging and Communications in Medicine data into Neuroimaging Informatics Technology Initiative format with standardized nomenclature for ease of use. In addition, we generated consensus segmentations for experts and non-experts using the Simultaneous Truth and Performance Level Estimation method. These standardized, structured, and easily accessible data are a valuable resource for systematically studying variability in segmentation applications.
Subjects
Crowdsourcing , Neoplasms , Radiation Oncology , Humans , Female , Neoplasms/diagnostic imaging , Neoplasms/radiotherapy , Tomography, X-Ray Computed , Radiotherapy Planning, Computer-Assisted/methods , Image Processing, Computer-Assisted/methods
ABSTRACT
BACKGROUND: Medical image auto-segmentation is poised to revolutionize radiotherapy workflows. The quality of auto-segmentation training data, primarily derived from clinician observers, is of utmost importance. However, the factors influencing the quality of these clinician-derived segmentations have yet to be fully understood or quantified. Therefore, the purpose of this study was to determine the role of common observer demographic variables on quantitative segmentation performance. METHODS: Organ at risk (OAR) and tumor volume segmentations provided by radiation oncologist observers from the Contouring Collaborative for Consensus in Radiation Oncology public dataset were utilized for this study. Segmentations were derived from five separate disease sites comprised of one patient case each: breast, sarcoma, head and neck (H&N), gynecologic (GYN), and gastrointestinal (GI). Segmentation quality was determined on a structure-by-structure basis by comparing the observer segmentations with an expert-derived consensus gold standard primarily using the Dice Similarity Coefficient (DSC); surface DSC was investigated as a secondary metric. Metrics were stratified into binary groups based on previously established structure-specific expert-derived interobserver variability (IOV) cutoffs. Generalized linear mixed-effects models using Markov chain Monte Carlo Bayesian estimation were used to investigate the association between demographic variables and the binarized segmentation quality for each disease site separately. Variables with a highest density interval excluding zero - loosely analogous to frequentist significance - were considered to substantially impact the outcome measure. RESULTS: After filtering by practicing radiation oncologists, 574, 110, 452, 112, and 48 structure observations remained for the breast, sarcoma, H&N, GYN, and GI cases, respectively. 
The median percentage of observations that crossed the expert DSC IOV cutoff when stratified by structure type was 55% and 31% for OARs and tumor volumes, respectively. Bayesian regression analysis revealed tumor category had a substantial negative impact on binarized DSC for the breast (coefficient mean ± standard deviation: -0.97 ± 0.20), sarcoma (-1.04 ± 0.54), H&N (-1.00 ± 0.24), and GI (-2.95 ± 0.98) cases. There were no clear recurring relationships between segmentation quality and demographic variables across the cases, with most variables demonstrating large standard deviations and wide highest density intervals. CONCLUSION: Our study highlights substantial uncertainty surrounding conventionally presumed factors influencing segmentation quality. Future studies should investigate additional demographic variables, more patients and imaging modalities, and alternative metrics of segmentation acceptability.
ABSTRACT
Purpose: Sarcopenia is an established prognostic factor in patients diagnosed with head and neck squamous cell carcinoma (HNSCC). The quantification of sarcopenia assessed by imaging is typically achieved through the skeletal muscle index (SMI), which can be derived from cervical neck skeletal muscle (SM) segmentation and cross-sectional area. However, manual SM segmentation is labor-intensive, prone to inter-observer variability, and impractical for large-scale clinical use. To overcome this challenge, we have developed and externally validated a fully-automated image-based deep learning (DL) platform for cervical vertebral SM segmentation and SMI calculation, and evaluated the relevance of this with survival and toxicity outcomes. Materials and Methods: 899 patients diagnosed as having HNSCC with CT scans from multiple institutes were included, with 335 cases utilized for training, 96 for validation, 48 for internal testing and 393 for external testing. Ground truth single-slice segmentations of SM at the C3 vertebra level were manually generated by experienced radiation oncologists. To develop an efficient method of segmenting the SM, a multi-stage DL pipeline was implemented, consisting of a 2D convolutional neural network (CNN) to select the middle slice of C3 section and a 2D U-Net to segment SM areas. The model performance was evaluated using the Dice Similarity Coefficient (DSC) as the primary metric for the internal test set, and for the external test set the quality of automated segmentation was assessed manually by two experienced radiation oncologists. The L3 skeletal muscle area (SMA) and SMI were then calculated from the C3 cross sectional area (CSA) of the auto-segmented SM. Finally, established SMI cut-offs were used to perform further analyses to assess the correlation with survival and toxicity endpoints in the external institution with univariable and multivariable Cox regression. 
Results: DSCs for the validation set (n = 96) and internal test set (n = 48) were 0.90 (95% CI: 0.90 - 0.91) and 0.90 (95% CI: 0.89 - 0.91), respectively. The predicted CSA was highly correlated with the ground-truth CSA in both the validation (r = 0.99, p < 0.0001) and test sets (r = 0.96, p < 0.0001). In the external test set (n = 377), 96.2% of the SM segmentations were deemed acceptable by consensus expert review. Predicted SMA and SMI values were highly correlated with the ground-truth values, with Pearson r ≥ 0.99 (p < 0.0001) for both the female and male patients in all datasets. Sarcopenia was associated with worse OS (HR 2.05 [95% CI 1.04 - 4.04], p = 0.04) and longer PEG tube duration (median 162 days vs. 134 days, HR 1.51 [95% CI 1.12 - 2.08], p = 0.006) in multivariate analysis. Conclusion: We developed and externally validated a fully-automated platform that strongly correlates with imaging-assessed sarcopenia in patients with H&N cancer that correlates with survival and toxicity outcomes. This study constitutes a significant stride towards the integration of sarcopenia assessment into decision-making for individuals diagnosed with HNSCC. SUMMARY STATEMENT: In this study, we developed and externally validated a deep learning model to investigate the impact of sarcopenia, defined as the loss of skeletal muscle mass, on patients with head and neck squamous cell carcinoma (HNSCC) undergoing radiotherapy. We demonstrated an efficient, fully-automated deep learning pipeline that can accurately segment C3 skeletal muscle area, calculate cross-sectional area, and derive a skeletal muscle index to diagnose sarcopenia from a standard of care CT scan. In multi-institutional data, we found that pre-treatment sarcopenia was associated with significantly reduced overall survival and an increased risk of adverse events. 
Given the increased vulnerability of patients with HNSCC, the assessment of sarcopenia prior to radiotherapy may aid in informed treatment decision-making and serve as a predictive marker for the necessity of early supportive measures.
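The final step of the pipeline described above, deriving the skeletal muscle index from the skeletal muscle area and applying a cut-off, can be sketched in a few lines of Python. The sketch assumes the conventional definition of SMI as L3 skeletal muscle area in cm² divided by height in m² squared terms (cm²/m²). The conversion from C3 cross-sectional area to L3 area is not given in the abstract and is deliberately left out; the function names and the numeric cut-off in the usage example are hypothetical placeholders, not the established cut-offs the authors used.

```python
def skeletal_muscle_index(sma_cm2, height_m):
    """Skeletal muscle index in cm^2/m^2: muscle area normalized by height squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return sma_cm2 / (height_m ** 2)


def is_sarcopenic(smi, cutoff):
    """Flag sarcopenia when the SMI falls below a caller-supplied cut-off.

    The cut-off is an input on purpose: published thresholds are typically
    sex-specific and are not stated in the abstract.
    """
    return smi < cutoff


if __name__ == "__main__":
    # Hypothetical patient values and a placeholder cut-off, for illustration only.
    smi = skeletal_muscle_index(sma_cm2=150.0, height_m=1.75)
    print(round(smi, 2), is_sarcopenic(smi, cutoff=50.0))
```

Keeping the cut-off as a parameter makes it straightforward to apply different thresholds for female and male patients without changing the calculation itself.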