Search | VHL Search Portal

1.

Prognostic value of tumor volume doubling time in lung-metastatic adenoid cystic carcinoma.

Dal Lago, Eduardo A; Sousa, Luana G; Yang, Zixi; Hoff, Camilla O; Bonini, Flavia; Sawyer, Matthew; Wang, Kaiwen; Lewis, Whitney; Wahid, Kareem A; Hanna, Ehab Y; El-Naggar, Adel; Fuller, Clifton D; Kundu, Suprateek; Godoy, Myrna; Ferrarotto, Renata.

Oral Oncol ; 151: 106759, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38507991

ABSTRACT

OBJECTIVES: Lung metastases in adenoid cystic carcinoma (ACC) usually grow indolently, and the optimal timing for starting systemic therapy is not established. We assessed ACC lung metastasis tumor growth dynamics and compared the prognostic value of time to progression (TTP) and tumor volume doubling time (TVDT). METHODS: The study included ACC patients with ≥1 pulmonary metastasis (≥5 mm) and at least 2 chest computed tomography (CT) scans. Radiology assessment was performed from the first scan showing metastasis until treatment initiation or death. Up to 5 lung nodules per patient were segmented for TVDT calculation. To assess tumor growth rate (TGR), the correlation coefficient (r) and coefficient of determination (R²) were calculated for the measured lung nodules. TTP was assessed per RECIST 1.1; TVDT was calculated using the Schwartz formula. Overall survival (OS) was analyzed using the Kaplan-Meier method. RESULTS: The study included 75 patients. Sixty-seven patients (89%) had lung-only metastasis on the first CT scan. The TGR was overall constant (median R² = 0.974). Median TTP and TVDT were 11.2 months and 7.5 months, respectively. Shorter TVDT (<6 months) was associated with poor OS (HR = 0.48; p = 0.037), but TTP was not associated with survival (HR = 1.02; p = 0.96). Cox regression showed that TVDT, but not TTP, significantly correlated with OS. TVDT calculated using estimated tumor volume correlated with TVDT obtained by segmentation. CONCLUSION: Most ACC lung metastases have a constant TGR. TVDT may be a better prognostic indicator than TTP in lung-metastatic ACC. TVDT can be estimated from a single longitudinal measurement in clinical practice.
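The Schwartz formula named above relates two volumes measured Δt apart: TVDT = Δt · ln 2 / ln(V2/V1). The sketch below implements that formula, plus two pieces the abstract mentions without specifying: a volume estimate from a single diameter and an R² check for constant growth. The spherical approximation (V = πd³/6) and fitting log-volume against time (i.e., assuming exponential growth) are illustrative assumptions, not necessarily the authors' procedure, and all function names are hypothetical.

```python
import math

def tvdt_schwartz(v1, v2, dt_days):
    """Tumor volume doubling time via the Schwartz formula:
    TVDT = dt * ln(2) / ln(V2 / V1), in the same time unit as dt_days."""
    if v1 <= 0 or v2 <= 0:
        raise ValueError("volumes must be positive")
    if v2 == v1:
        return math.inf  # no growth: doubling time is unbounded
    return dt_days * math.log(2) / math.log(v2 / v1)  # negative if the nodule shrank

def sphere_volume(diameter_mm):
    """Hypothetical volume estimate: treat the nodule as a sphere, V = pi * d^3 / 6."""
    return math.pi * diameter_mm ** 3 / 6

def r_squared(x, y):
    """R^2 of an ordinary least-squares line y ~ a + b*x (equal to r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy * sxy / (sxx * syy)

# A nodule growing from 100 to 400 mm^3 over 360 days doubles every 180 days.
print(tvdt_schwartz(100, 400, 360))

# Under constant exponential growth, log-volume is linear in time, so R^2 is ~1.
days = [0, 90, 180, 270]
vols = [100 * 2 ** (t / 180) for t in days]
print(r_squared(days, [math.log(v) for v in vols]))
```

Working on log-volume is what makes "constant growth rate" a linear-fit question; a median R² near 1, as reported, is consistent with that picture.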

Subject(s)

Carcinoma, Adenoid Cystic , Lung Neoplasms , Humans , Prognosis , Carcinoma, Adenoid Cystic/pathology , Tumor Burden , Time Factors , Lung Neoplasms/diagnostic imaging , Lung/pathology , Retrospective Studies

2.

Artificial Intelligence Uncertainty Quantification in Radiotherapy Applications - A Scoping Review.

Wahid, Kareem A; Kaffey, Zaphanlene Y; Farris, David P; Humbert-Vidan, Laia; Moreno, Amy C; Rasmussen, Mathis; Ren, Jintao; Naser, Mohamed A; Netherton, Tucker J; Korreman, Stine; Balakrishnan, Guha; Fuller, Clifton D; Fuentes, David; Dohopolski, Michael J.

medRxiv ; 2024 May 13.

Article in English | MEDLINE | ID: mdl-38798581

ABSTRACT

Background/purpose: The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there is a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas for improvement, and determine future directions. Methods: We followed the PRISMA-ScR scoping review reporting guidelines. We used the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. Results: We identified 56 articles published from 2015 to 2024. Ten domains of RT applications were represented; most studies evaluated auto-contouring (50%), followed by image synthesis (13%) and multiple applications simultaneously (11%). Twelve disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32%). Imaging data were used in 91% of studies, while only 13% incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60%), with Monte Carlo dropout being the most commonly implemented UQ method (32%), followed by ensembling (16%). Fifty-five percent of studies did not share code or datasets. Conclusion: Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, there was a clear need to study additional UQ methods, such as conformal prediction. Our results may incentivize the development of guidelines for reporting and implementation of UQ in RT.
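Conformal prediction, which the review flags as understudied in RT, can be illustrated with its split (inductive) variant for a continuous output: nonconformity scores on a held-out calibration set fix the half-width of a prediction interval. This is a generic textbook sketch, not a method from any reviewed study; the absolute-residual score and the function names are assumptions.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample quantile used by split conformal prediction:
    the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]

def split_conformal_interval(y_cal, yhat_cal, yhat_new, alpha=0.1):
    """Interval around a new prediction with ~(1 - alpha) marginal coverage,
    using absolute residuals as nonconformity scores (an assumed choice)."""
    scores = [abs(y - yh) for y, yh in zip(y_cal, yhat_cal)]
    q = conformal_quantile(scores, alpha)
    return yhat_new - q, yhat_new + q

# Seven calibration residuals of 1..7; at alpha = 0.25 the 6th smallest sets the width.
print(split_conformal_interval(list(range(1, 8)), [0] * 7, 5, alpha=0.25))
```

The coverage guarantee is marginal and rests on exchangeability between calibration and test cases, which is exactly the assumption a multi-institution RT deployment may strain.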

3.

Application of simultaneous uncertainty quantification and segmentation for oropharyngeal cancer use-case with Bayesian deep learning.

Sahlsten, Jaakko; Jaskari, Joel; Wahid, Kareem A; Ahmed, Sara; Glerean, Enrico; He, Renjie; Kann, Benjamin H; Mäkitie, Antti; Fuller, Clifton D; Naser, Mohamed A; Kaski, Kimmo.

Commun Med (Lond) ; 4(1): 110, 2024 Jun 08.

Article in English | MEDLINE | ID: mdl-38851837

ABSTRACT

BACKGROUND: Radiotherapy is a core treatment modality for oropharyngeal cancer (OPC), in which the primary gross tumor volume (GTVp) is manually segmented with high interobserver variability. This calls for reliable and trustworthy automated tools in clinician workflows, which makes accurate uncertainty quantification and its downstream use critical. METHODS: Here we propose uncertainty-aware deep learning for OPC GTVp segmentation and illustrate the utility of uncertainty in multiple applications. We examine two Bayesian deep learning (BDL) models and eight uncertainty measures, and use a large multi-institutional dataset of 292 PET/CT scans to systematically analyze our approach. RESULTS: We show that our uncertainty-based approach accurately predicts the quality of the deep learning segmentation in 86.6% of cases, identifies low-performance cases for semi-automated correction, and visualizes regions of the scans where the segmentations are likely to fail. CONCLUSIONS: Our BDL-based analysis provides a first step toward more widespread implementation of uncertainty quantification in OPC GTVp segmentation.

Radiotherapy is used to treat people with oropharyngeal cancer. It is important to identify the areas where cancer is present so that the radiotherapy can be targeted at the cancer. Computational methods based on artificial intelligence can automate this task, but they also need to indicate areas where it is unclear whether cancer is present. In this study, we compare computational methods that can highlight such uncertain areas. Our approach accurately predicts how well the models outline the cancer. Our results could be applied to improve the computational methods used during radiotherapy treatment. This could enable more targeted treatment in the future, which could result in better outcomes for people with oropharyngeal cancer.
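The abstract does not list its eight uncertainty measures. As a hedged illustration of the general idea only, the sketch below takes T stochastic forward passes (as produced by Monte Carlo dropout or an ensemble), averages them per voxel, and scores a case by the mean binary predictive entropy over predicted tumor voxels; cases above a cutoff would be flagged for manual correction. The aggregation rule, the 0.5 foreground threshold, the cutoff, and the function names are hypothetical, not the paper's.

```python
import math

def predictive_entropy(p):
    """Binary predictive entropy (nats) of a mean foreground probability p."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # avoid log(0)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def voxelwise_mean(samples):
    """Average T stochastic passes; samples[t][v] is P(tumor) at voxel v in pass t."""
    t = len(samples)
    return [sum(s[v] for s in samples) / t for v in range(len(samples[0]))]

def case_uncertainty(samples, fg_threshold=0.5):
    """Hypothetical case-level score: mean entropy over voxels predicted as tumor."""
    mean_p = voxelwise_mean(samples)
    fg = [p for p in mean_p if p >= fg_threshold]
    if not fg:
        return 0.0
    return sum(predictive_entropy(p) for p in fg) / len(fg)

def flag_for_review(samples, cutoff):
    """True if the case is uncertain enough to route to a clinician for correction."""
    return case_uncertainty(samples) > cutoff

confident = [[0.99, 0.01], [0.99, 0.01]]  # passes agree
uncertain = [[0.90, 0.10], [0.20, 0.10]]  # passes disagree on voxel 0
print(flag_for_review(confident, 0.3), flag_for_review(uncertain, 0.3))
```

Disagreement between passes pulls the mean probability toward 0.5, where entropy peaks at ln 2, which is why this kind of score tends to track segmentation failures.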

4.

Dataset of weekly intra-treatment diffusion weighted imaging in head and neck cancer patients treated with MR-Linac.

El-Habashy, Dina M; Wahid, Kareem A; He, Renjie; McDonald, Brigid; Mulder, Samuel J; Ding, Yao; Salzillo, Travis; Lai, Stephen Y; Christodouleas, John; Dresner, Alex; Wang, Jihong; Naser, Mohamed A; Fuller, Clifton D; Mohamed, Abdallah Sherif Radwan.

Sci Data ; 11(1): 487, 2024 May 11.

Article in English | MEDLINE | ID: mdl-38734679

ABSTRACT

Radiation therapy (RT) is a crucial treatment for head and neck squamous cell carcinoma (HNSCC); however, it can have adverse effects on patients' long-term function and quality of life. Biomarkers that can predict tumor response to RT are being explored to personalize treatment and improve outcomes. While tissue and blood biomarkers have limitations, imaging biomarkers derived from magnetic resonance imaging (MRI) offer detailed information. The integration of MRI and a linear accelerator in the MR-Linac system allows for MR-guided radiation therapy (MRgRT), offering precise visualization and treatment delivery. This data descriptor offers a valuable repository for weekly intra-treatment diffusion-weighted imaging (DWI) data obtained from head and neck cancer patients. By analyzing the sequential DWI changes and their correlation with treatment response, as well as oncological and survival outcomes, the study provides valuable insights into the clinical implications of DWI in HNSCC.

Subject(s)

Diffusion Magnetic Resonance Imaging , Head and Neck Neoplasms , Humans , Head and Neck Neoplasms/diagnostic imaging , Head and Neck Neoplasms/radiotherapy , Radiotherapy, Image-Guided , Squamous Cell Carcinoma of Head and Neck/diagnostic imaging , Squamous Cell Carcinoma of Head and Neck/radiotherapy , Particle Accelerators

5.

Artificial intelligence uncertainty quantification in radiotherapy applications - A scoping review.

Wahid, Kareem A; Kaffey, Zaphanlene Y; Farris, David P; Humbert-Vidan, Laia; Moreno, Amy C; Rasmussen, Mathis; Ren, Jintao; Naser, Mohamed A; Netherton, Tucker J; Korreman, Stine; Balakrishnan, Guha; Fuller, Clifton D; Fuentes, David; Dohopolski, Michael J.

Radiother Oncol ; : 110542, 2024 Sep 17.

Article in English | MEDLINE | ID: mdl-39299574

ABSTRACT

BACKGROUND/PURPOSE: The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there is a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas for improvement, and determine future directions. METHODS: We followed the PRISMA-ScR scoping review reporting guidelines. We used the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. RESULTS: We identified 56 articles published from 2015 to 2024. Ten domains of RT applications were represented; most studies evaluated auto-contouring (50%), followed by image synthesis (13%) and multiple applications simultaneously (11%). Twelve disease sites were represented, with head and neck cancer being the most common disease site independent of application space (32%). Imaging data were used in 91% of studies, while only 13% incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60%), with Monte Carlo dropout being the most commonly implemented UQ method (32%), followed by ensembling (16%). Fifty-five percent of studies did not share code or datasets. CONCLUSION: Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, we identified a clear need to study additional UQ methods, such as conformal prediction. Our results may incentivize the development of guidelines for reporting and implementation of UQ in RT.

6.

Artificial Intelligence and Machine Learning in Cancer Pain: A Systematic Review.

Salama, Vivian; Godinich, Brandon; Geng, Yimin; Humbert-Vidan, Laia; Maule, Laura; Wahid, Kareem A; Naser, Mohamed A; He, Renjie; Mohamed, Abdallah S R; Fuller, Clifton D; Moreno, Amy C.

J Pain Symptom Manage ; 2024 Aug 03.

Article in English | MEDLINE | ID: mdl-39097246

ABSTRACT

BACKGROUND/OBJECTIVES: Pain is a challenging, multifaceted symptom reported by most cancer patients. This systematic review aims to explore applications of artificial intelligence/machine learning (AI/ML) in predicting pain-related outcomes and in pain management in cancer. METHODS: A comprehensive search of the Ovid MEDLINE, EMBASE, and Web of Science databases was conducted for publications up to September 7, 2023, using the terms "Cancer," "Pain," "Pain Management," "Analgesics," "Artificial Intelligence," "Machine Learning," and "Neural Networks." AI/ML models, their validation, and their performance were summarized. Quality assessment was conducted using the PROBAST risk-of-bias tool and adherence to TRIPOD guidelines. RESULTS: Forty-four studies from 2006 to 2023 were included. Nineteen studies used AI/ML for classifying pain after cancer therapy [median AUC 0.80 (range 0.76-0.94)]. Eighteen studies focused on cancer pain research [median AUC 0.86 (range 0.50-0.99)], and 7 focused on applying AI/ML to cancer pain management [median AUC 0.71 (range 0.47-0.89)]. The median AUC of models across all studies was 0.77. Random forest models demonstrated the highest performance (median AUC 0.81), lasso models had the highest median sensitivity (1), and Support Vector Machines had the highest median specificity (0.74). Overall adherence to TRIPOD guidelines was 70.7%. Overall, a high risk of bias (77.3%) and a lack of external validation (14%) and clinical application (23%) were detected. Reporting of model calibration was also missing (5%). CONCLUSION: Implementation of AI/ML tools promises significant advances in the classification, risk stratification, and management decisions for cancer pain. Further research focusing on quality improvement, model calibration, and rigorous external clinical validation in real healthcare settings is imperative to ensure practical and reliable application in clinical practice.

7.

Comparison of Machine Learning Models for Prediction of Acute Pain Severity and On-Treatment Opioid Utilization in Oral Cavity and Oropharyngeal Cancer Patients Receiving Radiation Therapy: Exploratory Analysis from a Large-Scale Retrospective Cohort.

Salama, Vivian; Humbert-Vidan, Laia; Godinich, Brandon; Wahid, Kareem A; ElHabashy, Dina M; Naser, Mohamed A; He, Renjie; Mohamed, Abdallah S R; Sahli, Ariana J; Hutcheson, Katherine A; Gunn, Gary Brandon; Rosenthal, David I; Fuller, Clifton D; Moreno, Amy C.

medRxiv ; 2024 Feb 08.

Article in English | MEDLINE | ID: mdl-38370746

ABSTRACT

Background: Acute pain is a common and debilitating symptom experienced by oral cavity and oropharyngeal cancer (OC/OPC) patients undergoing radiation therapy (RT). Uncontrolled pain can result in opioid overuse and an increased risk of long-term opioid dependence. The specific aim of this exploratory analysis was to predict severe acute pain and opioid use in the acute on-treatment setting, in order to develop risk-stratification models for pragmatic clinical trials. Materials and Methods: A retrospective study was conducted on 900 OC/OPC patients treated with RT between 2017 and 2023. Clinical data, including demographics, tumor data, pain scores, and medication data, were extracted from patient records. On-treatment pain intensity scores were assessed using a numeric rating scale (0 = none, 10 = worst), and total opioid doses were calculated using morphine equivalent daily dose (MEDD) conversion factors. Analgesic efficacy was assessed based on the combined pain intensity and the total required MEDD. Machine learning (ML) models, including Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Model (GBM), were developed and validated using ten-fold cross-validation. Model performance was evaluated using discrimination and calibration metrics. Feature importance was investigated using bootstrap and permutation techniques. Results: For predicting acute pain intensity, the GBM demonstrated the best area under the receiver operating characteristic curve (AUC) (0.71), recall (0.39), and F1 score (0.48). For predicting the total MEDD, LR outperformed the other models in AUC (0.67). For predicting analgesic efficacy, SVM achieved the highest specificity (0.97) and the best calibration (expected calibration error [ECE] of 0.06), while RF and GBM tied for the highest AUC (0.68). The RF model was the best-calibrated model for pain intensity prediction (ECE of 0.02) and for MEDD prediction (ECE of 0.05). Baseline pain scores and vital signs were the most important contributing features across the predictive models. Conclusion: These ML models are promising for predicting end-of-treatment acute pain, opioid requirements, and analgesic efficacy in OC/OPC patients undergoing RT. Baseline pain scores and vital sign changes were identified as key predictors. Implementation of these models in clinical practice could facilitate early risk stratification and personalized pain management. Prospective multicenter studies and external validation are essential for further refinement and generalizability.
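Expected calibration error (ECE), reported above, has several variants. A common binned version for a binary outcome compares, within each probability bin, the mean predicted probability with the observed event rate, weighted by the fraction of cases in the bin. The sketch below assumes 10 equal-width bins and a hypothetical function name; the study's exact binning is not stated in the abstract.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for a binary classifier:
    sum over bins of (bin size / N) * |mean predicted probability - observed positive rate|."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_pred = sum(p for p, _ in b) / len(b)
            event_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(mean_pred - event_rate)
    return ece

# Predicting 0.9 for ten patients of whom only half had severe pain is badly calibrated.
print(expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5))
```

Lower is better: an ECE of 0.02 means predicted risks and observed rates differ by about two percentage points on average across bins.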

8.

Investigation of autosegmentation techniques on T2-weighted MRI for off-line dose reconstruction in MR-linac workflow for head and neck cancers.

McDonald, Brigid A; Cardenas, Carlos E; O'Connell, Nicolette; Ahmed, Sara; Naser, Mohamed A; Wahid, Kareem A; Xu, Jiaofeng; Thill, Dan; Zuhour, Raed J; Mesko, Shane; Augustyn, Alexander; Buszek, Samantha M; Grant, Stephen; Chapman, Bhavana V; Bagley, Alexander F; He, Renjie; Mohamed, Abdallah S R; Christodouleas, John; Brock, Kristy K; Fuller, Clifton D.

Med Phys ; 51(1): 278-291, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37475466

ABSTRACT

BACKGROUND: To accurately accumulate delivered dose for head and neck cancer patients treated with the Adapt to Position workflow on the 1.5T magnetic resonance imaging (MRI)-linear accelerator (MR-linac), the low-resolution T2-weighted MRIs used for daily setup must be segmented to enable reconstruction of the delivered dose at each fraction. PURPOSE: In this pilot study, we evaluate various autosegmentation methods for head and neck organs at risk (OARs) on on-board setup MRIs from the MR-linac for off-line reconstruction of delivered dose. METHODS: Seven OARs (bilateral parotid glands, bilateral submandibular glands, mandible, spinal cord, and brainstem) were contoured on 43 images by seven observers each. Ground truth contours were generated using the simultaneous truth and performance level estimation (STAPLE) algorithm. Twenty autosegmentation methods in total were evaluated in ADMIRE: (1-9) atlas-based autosegmentation using a population atlas library (PAL) of 5/10/15 patients with STAPLE, patch fusion (PF), or random forest (RF) for label fusion; (10-19) autosegmentation using images from a patient's 1-4 prior fractions (individualized patient prior [IPP]) with STAPLE/PF/RF; and (20) deep learning (DL) (a 3D ResUNet trained on the 43 ground truth structure sets plus 45 contoured by one observer). Execution time was measured for each method. Autosegmented structures were compared with ground truth structures using the Dice similarity coefficient, mean surface distance (MSD), Hausdorff distance (HD), and Jaccard index (JI). For each metric and OAR, performance was compared with the inter-observer variability using Dunn's test with control. Methods were compared pairwise using the Steel-Dwass test for each metric pooled across all OARs. Further dosimetric analysis was performed on three high-performing autosegmentation methods (DL, IPP with RF and 4 fractions [IPP_RF_4], and IPP with 1 fraction [IPP_1]) and one low-performing method (PAL with STAPLE and 5 atlases [PAL_ST_5]). For five patients, delivered doses from clinical plans were recalculated on setup images with ground truth and autosegmented structure sets. Differences in maximum and mean dose to each structure between the ground truth and autosegmented structures were calculated and correlated with geometric metrics. RESULTS: DL and IPP methods performed best overall; all significantly outperformed the inter-observer variability, with no significant difference between methods in pairwise comparison. PAL methods performed worst overall; most were not significantly different from the inter-observer variability or from each other. DL was the fastest method (33 s per case) and PAL methods the slowest (3.7-13.8 min per case). Execution time increased with the number of prior fractions/atlases for IPP and PAL. For DL, IPP_1, and IPP_RF_4, the majority (95%) of dose differences were within ±250 cGy of ground truth, but outlier differences of up to 785 cGy occurred. Dose differences were much larger for PAL_ST_5, with outlier differences of up to 1920 cGy. Dose differences showed weak but significant correlations with all geometric metrics (R² between 0.030 and 0.314). CONCLUSIONS: The autosegmentation methods offering the best combination of performance and execution time are DL and IPP_1. Dose reconstruction on on-board T2-weighted MRIs is feasible with autosegmented structures with minimal dosimetric variation from ground truth, but contours should be visually inspected prior to dose reconstruction in an end-to-end dose accumulation workflow.
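To make the geometric metrics above concrete, the sketch below computes the Dice similarity coefficient, the Jaccard index, and the symmetric Hausdorff distance for masks represented as sets of voxel coordinates. It is a brute-force illustration assuming unit voxel spacing and hypothetical function names; clinical tools typically compute distance metrics on surface points in physical units, and mean surface distance (which has several competing definitions) is omitted.

```python
import math

def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B|, which equals Dice / (2 - Dice)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets (brute force)."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))

truth = {(0, 0), (0, 1), (1, 0), (1, 1)}  # ground truth voxels
auto = {(0, 0), (0, 1)}                   # autosegmentation misses one row
print(dice(truth, auto), jaccard(truth, auto), hausdorff(truth, auto))
```

Dice and Jaccard are overlap measures and move together, whereas the Hausdorff distance reports the single worst boundary error, which is why the abstract tracks both kinds.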

Subject(s)

Head and Neck Neoplasms , Radiotherapy Planning, Computer-Assisted , Humans , Pilot Projects , Workflow , Radiotherapy Planning, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Head and Neck Neoplasms/diagnostic imaging , Head and Neck Neoplasms/radiotherapy , Magnetic Resonance Imaging/methods , Organs at Risk

9.

Associations Between Radiation Oncologist Demographic Factors and Segmentation Similarity Benchmarks: Insights From a Crowd-Sourced Challenge Using Bayesian Estimation.

Wahid, Kareem A; Sahin, Onur; Kundu, Suprateek; Lin, Diana; Alanis, Anthony; Tehami, Salik; Kamel, Serageldin; Duke, Simon; Sherer, Michael V; Rasmussen, Mathis; Korreman, Stine; Fuentes, David; Cislo, Michael; Nelms, Benjamin E; Christodouleas, John P; Murphy, James D; Mohamed, Abdallah S R; He, Renjie; Naser, Mohammed A; Gillespie, Erin F; Fuller, Clifton D.

JCO Clin Cancer Inform ; 8: e2300174, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38870441

ABSTRACT

PURPOSE: The quality of radiotherapy auto-segmentation training data, primarily derived from clinician observers, is of utmost importance. However, the factors influencing the quality of clinician-derived segmentations are poorly understood; our study aims to quantify these factors. METHODS: Organ at risk (OAR) and tumor-related segmentations provided by radiation oncologists from the Contouring Collaborative for Consensus in Radiation Oncology data set were used. Segmentations were derived from five disease sites: breast, sarcoma, head and neck (H&N), gynecologic (GYN), and gastrointestinal (GI). Segmentation quality was determined on a structure-by-structure basis by comparing the observer segmentations with an expert-derived consensus, which served as a reference standard benchmark. The Dice similarity coefficient (DSC) was the primary metric for these comparisons. DSC was dichotomized into binary groups on the basis of structure-specific expert-derived interobserver variability (IOV) cutoffs. Generalized linear mixed-effects models using Bayesian estimation were used to investigate the association between demographic variables and the binarized DSC for each disease site. Variables whose highest density interval excluded zero were considered to substantially affect the outcome measure. RESULTS: A total of 574, 110, 452, 112, and 48 segmentations were used for the breast, sarcoma, H&N, GYN, and GI cases, respectively. When stratified by structure type, the median percentage of segmentations that crossed the expert DSC IOV cutoff was 55% for OARs and 31% for tumors. Regression analysis revealed that a structure being tumor-related had a substantial negative impact on binarized DSC for the breast, sarcoma, H&N, and GI cases. There were no recurring relationships between segmentation quality and demographic variables across the cases, with most variables demonstrating large standard deviations. CONCLUSION: Our study highlights substantial uncertainty surrounding conventionally presumed factors influencing segmentation quality relative to benchmarks.
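Two steps in the methods above can be sketched: dichotomizing each observer's DSC against a structure-specific IOV cutoff, and checking whether a coefficient's highest density interval (HDI) excludes zero. Here the HDI is taken as the narrowest interval covering a given fraction of posterior samples; treating a DSC equal to the cutoff as passing is an assumption, and the function names are hypothetical rather than the study's code.

```python
import math

def binarize_dsc(dsc, iov_cutoff):
    """1 if an observer's DSC reaches the structure-specific IOV cutoff, else 0
    (treating equality as passing is an assumption)."""
    return int(dsc >= iov_cutoff)

def hdi(samples, mass=0.95):
    """Narrowest interval spanning floor(mass * n) + 1 of the sorted posterior samples."""
    s = sorted(samples)
    n = len(s)
    span = min(int(math.floor(mass * n)), n - 1)
    widths = [s[i + span] - s[i] for i in range(n - span)]
    best = min(range(len(widths)), key=widths.__getitem__)
    return s[best], s[best + span]

def excludes_zero(interval):
    """A coefficient is deemed substantial if its HDI lies entirely on one side of zero."""
    lo, hi = interval
    return lo > 0 or hi < 0

# Posterior draws for a hypothetical coefficient that straddles zero.
draws = [-0.8, -0.3, -0.1, 0.0, 0.2, 0.4, 0.9]
print(hdi(draws), excludes_zero(hdi(draws)))
```

Unlike an equal-tailed interval, the HDI hugs the densest region of a skewed posterior, which is what makes "excludes zero" a reasonable screen for substantial effects.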

Subject(s)

Bayes Theorem , Benchmarking , Radiation Oncologists , Humans , Benchmarking/methods , Female , Radiotherapy Planning, Computer-Assisted/methods , Neoplasms/epidemiology , Neoplasms/radiotherapy , Organs at Risk , Male , Radiation Oncology/standards , Radiation Oncology/methods , Demography , Observer Variation

10.

Mandibular dose-volume predicts time-to-osteoradionecrosis in an actuarial normal-tissue complication probability (NTCP) model: External validation of right-censored clinico-dosimetric and competing risk application across international multi-institutional observational cohorts and online graphical user interface clinical support tool assessment.

Humbert-Vidan, Laia; Kamel, Serageldin; Wentzel, Andrew; Kaffey, Zaphanlene; Abdelaal, Moamen; Spier, Kyle B; West, Natalie A; Marai, G Elisabeta; Canahuate, Guadalupe; Zhang, Xinhua; Chen, Melissa M; Wahid, Kareem A; Rigert, Jillian; Hosseinian, Seyedmohammadhossein; Schaefer, Andrew J; Brock, Kristy K; Chambers, Mark; Otun, Adegbenga O; Aponte-Wesson, Ruth; Patel, Vinod; Hope, Andrew; Phan, Jack; Garden, Adam S; Frank, Steven J; Morrison, William H; Spiotto, Michael T; Rosenthal, David; Lee, Anna; He, Renjie; Naser, Mohamed A; Watson, Erin; Hutcheson, Katherine A; Mohamed, Abdallah S R; Sandulache, Vlad C; van Dijk, Lisanne V; Moreno, Amy C; Guerrero Urbano, Teresa; Lai, Stephen Y; Fuller, Clifton D.

medRxiv ; 2024 Aug 20.

Article in English | MEDLINE | ID: mdl-39228724

ABSTRACT

Background: Existing studies on osteoradionecrosis of the jaw (ORNJ) have primarily used cross-sectional data, assessing risk factors at a single time point. Determining the time-to-event profile of ORNJ has important implications for monitoring oral health in long-term survivors of head and neck cancer (HNC). Methods: Demographic, clinical, and dosimetric data were retrospectively obtained for a clinical observational cohort of 1129 patients with HNC treated with radiotherapy (RT) at The University of Texas MD Anderson Cancer Center. ORNJ was diagnosed in 198 patients (18%). A multivariable logistic regression analysis with forward stepwise variable selection identified significant predictors of ORNJ. These predictors were then used to train a Weibull accelerated failure time (AFT) model, which was externally validated using an independent cohort of 265 patients (92 ORNJ cases and 173 controls) treated at Guy's and St. Thomas' Hospitals. Findings: Our model identified that each unit increase in D25% was significantly associated with a 12% shorter time to ORNJ (adjusted time ratio [ATR] 0.88, p<0.005); pre-RT dental extractions were associated with a 27% faster onset of ORNJ (ATR 0.73, p=0.13); and male patients experienced a 38% shorter time to ORNJ (ATR 0.62, p=0.11). The model demonstrated strong internal calibration (integrated Brier score of 0.133, D-calibration p-value 0.998) and optimal discrimination at 72 months (Harrell's C-index of 0.72). The model also generalized well to the independent cohort, despite a slight drop in performance. Interpretation: This study is the first to demonstrate a direct relationship between radiation dose and the time to ORNJ onset, providing a novel characterization of the impact of delivered dose not only on the probability of a late effect (ORNJ) but also on the conditional risk during survivorship.
Funding: This work was supported by various funding sources including NIH, NIDCR, NCI, NAPT, NASA, BCM, Affirmed Pharma, CRUK, KWF Dutch Cancer Society, NWO ZonMw, and the Apache Corporation.
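The time ratios above come from an accelerated failure time (AFT) model, in which covariates stretch or compress time. In one common Weibull AFT parameterization, S(t | x) = exp(-(t/λ(x))^k) with λ(x) = exp(β0 + βᵀx), so a one-unit covariate increase multiplies every survival-time quantile by exp(β). The sketch below uses a hypothetical intercept and shape (only the 0.88 time ratio comes from the abstract) to show why ATR 0.88 reads as a 12% shorter time to ORNJ; it is not the fitted model.

```python
import math

def _scale(x, beta0, betas):
    # Weibull scale parameter: lambda(x) = exp(beta0 + sum(beta_j * x_j))
    return math.exp(beta0 + sum(b * xi for b, xi in zip(betas, x)))

def weibull_aft_survival(t, x, beta0, betas, shape):
    """S(t | x) = exp(-(t / lambda(x)) ** shape)."""
    return math.exp(-((t / _scale(x, beta0, betas)) ** shape))

def weibull_aft_median(x, beta0, betas, shape):
    """Median survival time: lambda(x) * ln(2) ** (1 / shape)."""
    return _scale(x, beta0, betas) * math.log(2) ** (1 / shape)

# Hypothetical intercept (5.0) and shape (1.3); beta for D25% chosen so exp(beta) = 0.88.
beta_d25 = math.log(0.88)
m0 = weibull_aft_median([0.0], 5.0, [beta_d25], 1.3)
m1 = weibull_aft_median([1.0], 5.0, [beta_d25], 1.3)
print(m1 / m0)  # the time ratio exp(beta), about 0.88
```

Because the ratio does not depend on the intercept or the shape, the same 0.88 applies to the median, the 25th percentile, or any other quantile of time to ORNJ.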

11.

Segmentation stability of human head and neck cancer medical images for radiotherapy applications under de-identification conditions: Benchmarking data sharing and artificial intelligence use-cases.

Sahlsten, Jaakko; Wahid, Kareem A; Glerean, Enrico; Jaskari, Joel; Naser, Mohamed A; He, Renjie; Kann, Benjamin H; Mäkitie, Antti; Fuller, Clifton D; Kaski, Kimmo.

Front Oncol ; 13: 1120392, 2023.

Article in English | MEDLINE | ID: mdl-36925936

ABSTRACT

Background: Demand for head and neck cancer (HNC) radiotherapy data in algorithmic development has prompted increased image dataset sharing. Medical images must comply with data protection requirements so that re-use is enabled without disclosing patient identifiers. Defacing, i.e., the removal of facial features from images, is often considered a reasonable compromise between data protection and re-usability for neuroimaging data. While defacing tools have been developed by the neuroimaging community, their acceptability for radiotherapy applications has not been explored. Therefore, this study systematically investigated the impact of available defacing algorithms on HNC organs at risk (OARs). Methods: A publicly available dataset of magnetic resonance imaging scans for 55 HNC patients with eight segmented OARs (bilateral submandibular glands, parotid glands, level II neck lymph nodes, and level III neck lymph nodes) was used. Eight publicly available defacing algorithms were investigated: afni_refacer, DeepDefacer, defacer, fsl_deface, mask_face, mri_deface, pydeface, and quickshear. Using a subset of scans where defacing succeeded (N=29), a 5-fold cross-validated 3D U-Net-based OAR auto-segmentation model was used to perform two main experiments: (1) comparing original and defaced data for training when evaluated on original data; and (2) using original data for training and comparing model evaluation on original and defaced data. Models were primarily assessed using the Dice similarity coefficient (DSC). Results: Most defacing methods were unable to produce any usable images for evaluation, while mask_face, fsl_deface, and pydeface were unable to remove the face for 29%, 18%, and 24% of subjects, respectively. When using the original data for evaluation, the composite OAR DSC was statistically higher (p ≤ 0.05) for the model trained with the original data (DSC of 0.760) than for the mask_face, fsl_deface, and pydeface models (DSCs of 0.742, 0.736, and 0.449, respectively). Moreover, the model trained with original data had decreased performance (p ≤ 0.05) when evaluated on the defaced data, with DSCs of 0.673, 0.693, and 0.406 for mask_face, fsl_deface, and pydeface, respectively. Conclusion: Defacing algorithms may have a significant impact on HNC OAR auto-segmentation model training and testing. This work highlights the need for further development of HNC-specific image anonymization methods.

12.

Weekly Intra-Treatment Diffusion Weighted Imaging Dataset for Head and Neck Cancer Patients Undergoing MR-linac Treatment.

El-Habashy, Dina M; Wahid, Kareem A; He, Renjie; McDonald, Brigid; Mulder, Samuel J; Ding, Yao; Salzillo, Travis; Lai, Stephen; Christodouleas, John; Dresner, Alex; Wang, Jihong; Naser, Mohamed A; Fuller, Clifton D; Mohamed, Abdallah Sherif Radwan.

medRxiv ; 2023 Aug 20.

Article in English | MEDLINE | ID: mdl-37645931

ABSTRACT

Radiation therapy (RT) is a crucial treatment for head and neck squamous cell carcinoma (HNSCC); however, it can have adverse effects on patients' long-term function and quality of life. Biomarkers that can predict tumor response to RT are being explored to personalize treatment and improve outcomes. While tissue and blood biomarkers have limitations, imaging biomarkers derived from magnetic resonance imaging (MRI) offer detailed information. The integration of MRI and a linear accelerator in the MR-Linac system allows for MR-guided radiation therapy (MRgRT), offering precise visualization and treatment delivery. This data descriptor offers a valuable repository for weekly intra-treatment diffusion-weighted imaging (DWI) data obtained from head and neck cancer patients. By analyzing the sequential DWI changes and their correlation with treatment response, as well as oncological and survival outcomes, the study provides valuable insights into the clinical implications of DWI in HNSCC. [Table: see text].

13.

E pluribus unum: prospective acceptability benchmarking from the Contouring Collaborative for Consensus in Radiation Oncology crowdsourced initiative for multiobserver segmentation.

Lin, Diana; Wahid, Kareem A; Nelms, Benjamin E; He, Renjie; Naser, Mohammed A; Duke, Simon; Sherer, Michael V; Christodouleas, John P; Mohamed, Abdallah S R; Cislo, Michael; Murphy, James D; Fuller, Clifton D; Gillespie, Erin F.

J Med Imaging (Bellingham) ; 10(Suppl 1): S11903, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36761036

ABSTRACT

Purpose: Contouring Collaborative for Consensus in Radiation Oncology (C3RO) is a crowdsourced challenge engaging radiation oncologists across various expertise levels in segmentation. An obstacle to artificial intelligence (AI) development is the paucity of multiexpert datasets; consequently, we sought to characterize whether aggregate segmentations generated from multiple nonexperts could meet or exceed recognized expert agreement. Approach: Participants who contoured ≥ 1 region of interest (ROI) for the breast, sarcoma, head and neck (H&N), gynecologic (GYN), or gastrointestinal (GI) cases were identified as a nonexpert or recognized expert. Cohort-specific ROIs were combined into single simultaneous truth and performance level estimation (STAPLE) consensus segmentations. STAPLE_nonexpert ROIs were evaluated against STAPLE_expert contours using the Dice similarity coefficient (DSC). The expert interobserver DSC (IODSC_expert) was calculated as an acceptability threshold between STAPLE_nonexpert and STAPLE_expert. To determine the number of nonexperts required to match the IODSC_expert for each ROI, a single consensus contour was generated using variable numbers of nonexperts and then compared to the IODSC_expert. Results: For all cases, the DSC values for STAPLE_nonexpert versus STAPLE_expert were higher than the comparator IODSC_expert for most ROIs. The minimum number of nonexpert segmentations needed for a consensus ROI to achieve the IODSC_expert acceptability criteria ranged between 2 and 4 for breast, 3 and 5 for sarcoma, 3 and 5 for H&N, 3 and 5 for GYN, and 3 for GI. Conclusions: Multiple nonexpert-generated consensus ROIs met or exceeded expert-derived acceptability thresholds. Five nonexperts could potentially generate consensus segmentations for most ROIs with performance approximating experts, suggesting nonexpert segmentations as feasible cost-effective AI inputs.
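The consensus contours above are STAPLE estimates. As a hedged illustration of how STAPLE fuses several raters' binary masks, the sketch below implements the expectation-maximization formulation in NumPy, assuming a single global foreground prior and 0/1 masks on a shared grid. It is not the implementation the authors used, and the function name and defaults (initial sensitivity/specificity of 0.99, tolerance, iteration cap) are assumptions.

```python
import numpy as np


def staple_binary(masks, max_iter=100, tol=1e-7, init=0.99, eps=1e-10):
    """Binary STAPLE consensus via expectation-maximization.

    masks: sequence of R binary masks (same shape), one per rater.
    Returns (posterior foreground probability per voxel,
             per-rater sensitivity p, per-rater specificity q).
    """
    masks = np.asarray(masks, dtype=float)
    shape = masks.shape[1:]
    D = masks.reshape(masks.shape[0], -1)           # (R raters, N voxels)
    prior = float(np.clip(D.mean(), eps, 1 - eps))  # single global foreground prior
    p = np.full(D.shape[0], init)                   # sensitivities
    q = np.full(D.shape[0], init)                   # specificities
    W = np.full(D.shape[1], prior)
    for _ in range(max_iter):
        # E-step: log-likelihood of each voxel's votes if truly foreground / background
        log_fg = np.log(prior) + (D * np.log(p)[:, None]
                                  + (1 - D) * np.log(1 - p)[:, None]).sum(axis=0)
        log_bg = np.log(1 - prior) + ((1 - D) * np.log(q)[:, None]
                                      + D * np.log(1 - q)[:, None]).sum(axis=0)
        W_new = 1.0 / (1.0 + np.exp(np.clip(log_bg - log_fg, -700, 700)))
        # M-step: re-estimate each rater's sensitivity and specificity
        p = np.clip((D * W_new).sum(axis=1) / max(W_new.sum(), eps), eps, 1 - eps)
        q = np.clip(((1 - D) * (1 - W_new)).sum(axis=1) / max((1 - W_new).sum(), eps),
                    eps, 1 - eps)
        converged = np.max(np.abs(W_new - W)) < tol
        W = W_new
        if converged:
            break
    return W.reshape(shape), p, q


# Three hypothetical raters contouring the same 1-D "structure"
raters = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
W, sens, spec = staple_binary(raters)
consensus = (W >= 0.5).astype(int)
```

Thresholding the posterior at 0.5 is a common way to obtain a binary consensus. The per-rater sensitivity and specificity estimates are a by-product that indicates how each rater deviates from the consensus.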

14.

Prospective validation of diffusion-weighted MRI as a biomarker of tumor response and oncologic outcomes in head and neck cancer: Results from an observational biomarker pre-qualification study.

Mohamed, Abdallah S R; Abusaif, Abdelrahman; He, Renjie; Wahid, Kareem A; Salama, Vivian; Youssef, Sara; McDonald, Brigid A; Naser, Mohamed; Ding, Yao; Salzillo, Travis C; AboBakr, Moamen A; Wang, Jihong; Lai, Stephen Y; Fuller, Clifton D.

Radiother Oncol ; 183: 109641, 2023 06.

Article in English | MEDLINE | ID: mdl-36990394

ABSTRACT

PURPOSE: To determine DWI parameters associated with tumor response and oncologic outcomes in head and neck cancer (HNC) patients treated with radiotherapy (RT). METHODS: HNC patients enrolled in a prospective study were included. Patients had MRIs pre-RT, mid-RT, and after RT completion. We used T2-weighted sequences for tumor segmentation, which were co-registered to the respective DWIs for extraction of apparent diffusion coefficient (ADC) measurements. Treatment response was assessed at mid- and post-RT and was defined as complete response (CR) vs. non-complete response (non-CR). The Mann-Whitney U test was used to compare ADC between CR and non-CR. Recursive partitioning analysis (RPA) was performed to identify the ADC threshold associated with relapse. Cox proportional hazards models were fitted for clinical versus combined clinical and imaging parameters, and internal validation was performed using bootstrapping. RESULTS: Eighty-one patients were included. Median follow-up was 31 months. For patients with post-RT CR, there was a significant increase in mean ADC at mid-RT compared to baseline ((1.8 ± 0.29) × 10⁻³ mm²/s vs. (1.37 ± 0.22) × 10⁻³ mm²/s, p < 0.0001), while patients with non-CR had no significant increase (p > 0.05). RPA identified GTV-P delta (Δ)ADCmean < 7% at mid-RT as the most significant parameter associated with worse LC and RFS (p = 0.01). Univariable and multivariable analyses showed that GTV-P ΔADCmean at mid-RT ≥ 7% was significantly associated with better LC and RFS. The addition of ΔADCmean significantly improved the c-indices of LC and RFS models compared with standard clinical variables (0.85 vs. 0.77 and 0.74 vs. 0.68 for LC and RFS, respectively, p < 0.0001 for both). CONCLUSION: ΔADCmean at mid-RT is a strong predictor of oncologic outcomes in HNC. Patients with no significant increase of primary tumor ADC at mid-RT are at high risk of disease relapse.
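The abstract reports mean ADC values, a 7% mid-RT ΔADCmean threshold, and model comparisons by c-index. The sketch below works through that arithmetic under several assumptions: a mono-exponential signal model with two b-values (the abstract gives neither the b-values nor the fitting method), and a simplified Harrell's C that skips pairs with tied times. All function names are hypothetical.

```python
import numpy as np


def adc_map(s_low, s_high, b_low, b_high):
    """Voxel-wise ADC (mm^2/s) from two b-values (s/mm^2), mono-exponential model.

    S(b) = S0 * exp(-b * ADC)  =>  ADC = ln(S_low / S_high) / (b_high - b_low)
    """
    s_low = np.asarray(s_low, dtype=float)
    s_high = np.asarray(s_high, dtype=float)
    return np.log(s_low / s_high) / (b_high - b_low)


def delta_adc_percent(adc_baseline, adc_mid):
    """Relative change in mean ADC from baseline to mid-RT, in percent."""
    return 100.0 * (adc_mid - adc_baseline) / adc_baseline


def harrell_c_index(times, events, risks):
    """Harrell's concordance index: share of comparable pairs in which the
    patient with the higher predicted risk has the earlier observed event.
    Pairs with tied times are skipped in this simplified version."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Group means reported above for post-RT complete responders (mm^2/s)
change = delta_adc_percent(1.37e-3, 1.8e-3)
favourable = change >= 7.0  # mid-RT threshold identified by RPA in the abstract
```

Applied to the CR group means, `delta_adc_percent(1.37e-3, 1.8e-3)` is about 31.4%, well above the 7% cutoff. In the study the threshold is applied per patient, so the group means here only illustrate the arithmetic.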

Subject(s)

Head and Neck Neoplasms , Neoplasm Recurrence, Local , Humans , Prospective Studies , Neoplasm Recurrence, Local/diagnostic imaging , Diffusion Magnetic Resonance Imaging/methods , Head and Neck Neoplasms/diagnostic imaging , Head and Neck Neoplasms/radiotherapy , Magnetic Resonance Imaging , Biomarkers

15.

Application of simultaneous uncertainty quantification for image segmentation with probabilistic deep learning: Performance benchmarking of oropharyngeal cancer target delineation as a use-case.

Sahlsten, Jaakko; Jaskari, Joel; Wahid, Kareem A; Ahmed, Sara; Glerean, Enrico; He, Renjie; Kann, Benjamin H; Mäkitie, Antti; Fuller, Clifton D; Naser, Mohamed A; Kaski, Kimmo.

medRxiv ; 2023 Feb 24.

Article in English | MEDLINE | ID: mdl-36865296

ABSTRACT

Background: Oropharyngeal cancer (OPC) is a widespread disease, with radiotherapy being a core treatment modality. Manual segmentation of the primary gross tumor volume (GTVp) is currently employed for OPC radiotherapy planning but is subject to significant interobserver variability. Deep learning (DL) approaches have shown promise in automating GTVp segmentation, but comparative (auto)confidence metrics of these models' predictions have not been well explored. Quantifying instance-specific DL model uncertainty is crucial to improving clinician trust and facilitating broad clinical implementation. Therefore, in this study, probabilistic DL models for GTVp auto-segmentation were developed using large-scale PET/CT datasets, and various uncertainty auto-estimation methods were systematically investigated and benchmarked. Methods: We utilized the publicly available 2021 HECKTOR Challenge training dataset with 224 co-registered PET/CT scans of OPC patients with corresponding GTVp segmentations as a development set. A separate set of 67 co-registered PET/CT scans of OPC patients with corresponding GTVp segmentations was used for external validation. Two approximate Bayesian deep learning methods, the MC Dropout Ensemble and the Deep Ensemble, both with five submodels, were evaluated for GTVp segmentation and uncertainty performance. The segmentation performance was evaluated using the volumetric Dice similarity coefficient (DSC), mean surface distance (MSD), and 95th-percentile Hausdorff distance (95HD). The uncertainty was evaluated using four measures from the literature: coefficient of variation (CV), structure expected entropy, structure predictive entropy, and structure mutual information, and additionally with our novel Dice-risk measure.
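The entropy-based measures named above can be derived from the submodels' voxel-wise foreground probabilities. The sketch below is a minimal NumPy illustration under stated assumptions: the voxel measures are averaged over the ensemble's predicted structure (mean probability ≥ 0.5), and CV is taken over the submodels' predicted volumes with the population standard deviation. These aggregation choices are assumptions, not necessarily the study's definitions. The novel Dice-risk measure is not defined in the abstract and is omitted.

```python
import numpy as np


def binary_entropy(p, eps=1e-12):
    """Entropy (nats) of a Bernoulli variable with foreground probability p."""
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def ensemble_uncertainty(probs, threshold=0.5):
    """Structure-level uncertainty summaries for an ensemble of M submodels.

    probs: array (M, ...) of foreground probabilities from each submodel.
    Voxel-wise measures are averaged over the ensemble's predicted structure
    (mean probability >= threshold); that aggregation is an assumption.
    """
    probs = np.asarray(probs, dtype=float)
    mean_p = probs.mean(axis=0)
    structure = mean_p >= threshold

    predictive = binary_entropy(mean_p)            # entropy of the mean prediction
    expected = binary_entropy(probs).mean(axis=0)  # mean entropy of each submodel
    mutual_info = predictive - expected            # disagreement between submodels

    # Coefficient of variation of the predicted volume (voxel counts) across submodels
    volumes = (probs >= threshold).reshape(probs.shape[0], -1).sum(axis=1)
    cv = float(volumes.std() / volumes.mean()) if volumes.mean() > 0 else float("nan")

    def over_structure(u):
        return float(u[structure].mean()) if structure.any() else 0.0

    return {
        "predictive_entropy": over_structure(predictive),
        "expected_entropy": over_structure(expected),
        "mutual_information": over_structure(mutual_info),
        "volume_cv": cv,
    }
```

Mutual information is zero when all submodels agree and grows with their disagreement. Predictive entropy, by contrast, also captures uncertainty that every submodel shares.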
The utility of uncertainty information was evaluated by the accuracy of uncertainty-based segmentation performance prediction using the Accuracy vs Uncertainty (AvU) metric, and by examining the linear correlation between uncertainty estimates and DSC. In addition, batch-based and instance-based referral processes were examined, in which patients with high uncertainty were rejected from the set. In the batch referral process, the area under the referral curve with DSC (R-DSC AUC) was used for evaluation, whereas in the instance referral process, the DSC at various uncertainty thresholds was examined. Results: Both models behaved similarly in terms of segmentation performance and uncertainty estimation. Specifically, the MC Dropout Ensemble had 0.776 DSC, 1.703 mm MSD, and 5.385 mm 95HD. The Deep Ensemble had 0.767 DSC, 1.717 mm MSD, and 5.477 mm 95HD. The uncertainty measure with the highest DSC correlation was structure predictive entropy, with correlation coefficients of 0.699 and 0.692 for the MC Dropout Ensemble and the Deep Ensemble, respectively. The highest AvU value was 0.866 for both models. The best-performing uncertainty measure for both models was the CV, which had an R-DSC AUC of 0.783 and 0.782 for the MC Dropout Ensemble and Deep Ensemble, respectively. When patients were referred based on uncertainty thresholds derived from a validation DSC of 0.85 for all uncertainty measures, the DSC improved on average by 4.7% and 5.0% relative to the full dataset, while 21.8% and 22% of patients were referred for the MC Dropout Ensemble and Deep Ensemble, respectively. Conclusion: We found that many of the investigated methods provide overall similar but distinct utility in terms of predicting segmentation quality and referral performance. These findings are a critical first step toward more widespread implementation of uncertainty quantification in OPC GTVp segmentation.
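The batch referral evaluation can be sketched as follows. Patients are ranked by uncertainty, the most uncertain are referred one at a time, and the mean DSC of those retained is tracked. The area under that curve summarizes how well the uncertainty measure identifies poor segmentations. This is a hedged sketch: the referral fractions (k/n, always keeping at least one patient) and the normalization of the trapezoidal area by the fraction range are assumptions, and the study's exact R-DSC AUC definition may differ.

```python
import numpy as np


def referral_curve(dsc, uncertainty):
    """Mean DSC of the retained patients as the most uncertain ones are referred.

    Returns (referred fractions k/n for k = 0..n-1, mean DSC of the rest).
    """
    dsc = np.asarray(dsc, dtype=float)
    order = np.argsort(-np.asarray(uncertainty, dtype=float), kind="stable")
    ranked = dsc[order]  # most uncertain patient first
    n = ranked.size
    fractions = np.arange(n) / n
    retained_mean = np.array([ranked[k:].mean() for k in range(n)])
    return fractions, retained_mean


def referral_auc(dsc, uncertainty):
    """Trapezoidal area under the referral curve, divided by the fraction range
    so the result is on the DSC scale (this normalization is an assumption)."""
    f, m = referral_curve(dsc, uncertainty)
    if f.size < 2:
        return float(m[0])
    area = np.sum((m[1:] + m[:-1]) / 2.0 * np.diff(f))
    return float(area / f[-1])


# Hypothetical per-patient DSC values and two uncertainty rankings
dsc = [0.9, 0.8, 0.5, 0.3]
informative = [0.1, 0.2, 0.6, 0.9]  # high uncertainty on the worst cases
uninformative = informative[::-1]
```

An informative measure refers the worst segmentations first, so the retained mean DSC rises quickly and the area is larger than for an uninformative ranking.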

16.

Artificial Intelligence and Machine Learning in Cancer Related Pain: A Systematic Review.

Salama, Vivian; Godinich, Brandon; Geng, Yimin; Humbert-Vidan, Laia; Maule, Laura; Wahid, Kareem A; Naser, Mohamed A; He, Renjie; Mohamed, Abdallah S R; Fuller, Clifton D; Moreno, Amy C.

medRxiv ; 2023 Dec 08.

Article in English | MEDLINE | ID: mdl-38105979

ABSTRACT

Background/objective: Pain is a challenging, multifaceted symptom reported by most cancer patients, resulting in a substantial burden on both patients and healthcare systems. This systematic review aims to explore applications of artificial intelligence/machine learning (AI/ML) in predicting pain-related outcomes and supporting decision-making in cancer pain management. Methods: A comprehensive search of the Ovid MEDLINE, EMBASE, and Web of Science databases was conducted for studies published up to September 7, 2023, using terms including "Cancer", "Pain", "Pain Management", "Analgesics", "Opioids", "Artificial Intelligence", "Machine Learning", "Deep Learning", and "Neural Networks". The screening process was performed using the Covidence screening tool. Only original studies conducted in human cohorts were included. AI/ML models, their validation and performance, and adherence to TRIPOD guidelines were summarized from the final included studies. Results: This systematic review included 44 studies from 2006-2023. Most studies were prospective and uni-institutional. The number of AI/ML studies in cancer pain increased over the last 4 years. Nineteen studies used AI/ML for classifying cancer patients' pain development after cancer therapy, with a median AUC of 0.80 (range 0.76-0.94). Eighteen studies focused on cancer pain research, with a median AUC of 0.86 (range 0.50-0.99), and 7 focused on applying AI/ML to cancer pain management decisions, with a median AUC of 0.71 (range 0.47-0.89). Multiple ML models were investigated, with a median AUC of 0.77 across all models in all studies. Random forest models demonstrated the highest performance (median AUC 0.81), lasso models had the highest median sensitivity (1), and Support Vector Machine had the highest median specificity (0.74). Overall adherence of included studies to TRIPOD guidelines was 70.7%. Most included studies lacked external validation (14%) and clinical application (23%).
Reporting of model calibration was also missing in the majority of studies (5%). Conclusion: Implementation of various novel AI/ML tools promises significant advances in the classification, risk stratification, and management decisions for cancer pain. These advanced tools will integrate big health-related data for personalized pain management in cancer patients. Further research focusing on model calibration and rigorous external clinical validation in real healthcare settings is imperative to ensure their practical and reliable application in clinical practice.

17.

Large scale crowdsourced radiotherapy segmentations across a variety of cancer anatomic sites.

Wahid, Kareem A; Lin, Diana; Sahin, Onur; Cislo, Michael; Nelms, Benjamin E; He, Renjie; Naser, Mohammed A; Duke, Simon; Sherer, Michael V; Christodouleas, John P; Mohamed, Abdallah S R; Murphy, James D; Fuller, Clifton D; Gillespie, Erin F.

Sci Data ; 10(1): 161, 2023 03 22.

Article in English | MEDLINE | ID: mdl-36949088

ABSTRACT

Clinician-generated segmentation of tumor and healthy tissue regions of interest (ROIs) on medical images is crucial for radiotherapy. However, interobserver segmentation variability has long been considered a significant detriment to the implementation of high-quality and consistent radiotherapy dose delivery. This has prompted the increasing development of automated segmentation approaches. However, extant segmentation datasets typically provide only segmentations generated by a limited number of annotators with varying, and often unspecified, levels of expertise. In this data descriptor, numerous clinician annotators manually generated segmentations for ROIs on computed tomography images across a variety of cancer sites (breast, sarcoma, head and neck, gynecologic, gastrointestinal; one patient per cancer site) for the Contouring Collaborative for Consensus in Radiation Oncology challenge. In total, over 200 annotators (experts and non-experts) contributed using a standardized annotation platform (ProKnow). Subsequently, we converted Digital Imaging and Communications in Medicine (DICOM) data into Neuroimaging Informatics Technology Initiative (NIfTI) format with standardized nomenclature for ease of use. In addition, we generated consensus segmentations for experts and non-experts using the Simultaneous Truth and Performance Level Estimation (STAPLE) method. These standardized, structured, and easily accessible data are a valuable resource for systematically studying variability in segmentation applications.
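The descriptor mentions converting DICOM data to NIfTI with standardized structure names. The conversion itself needs a DICOM toolkit, but the renaming step can be sketched in plain Python. The lookup table, function names, and normalization rules below are hypothetical and do not reproduce the dataset's actual nomenclature.

```python
import re
from typing import Dict, Optional

# Hypothetical lookup table for illustration only; it is not the dataset's mapping.
STANDARD_NAMES: Dict[str, str] = {
    "parotid l": "Parotid_L",
    "lt parotid": "Parotid_L",
    "left parotid": "Parotid_L",
    "parotid r": "Parotid_R",
    "rt parotid": "Parotid_R",
    "spinal cord": "SpinalCord",
    "cord": "SpinalCord",
}


def normalize_label(raw: str) -> str:
    """Lower-case a raw ROI name, map '_' and '-' to spaces, collapse whitespace."""
    text = re.sub(r"[_\-]+", " ", raw.strip().lower())
    return re.sub(r"\s+", " ", text).strip()


def standardize_roi_name(raw: str, table: Dict[str, str] = STANDARD_NAMES) -> Optional[str]:
    """Return the standardized name, or None so unmatched ROIs can be reviewed manually."""
    return table.get(normalize_label(raw))
```

Returning `None` for unmatched labels, rather than guessing, keeps ambiguous annotator-entered names visible for manual curation.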

Subject(s)

Crowdsourcing , Neoplasms , Radiation Oncology , Humans , Female , Neoplasms/diagnostic imaging , Neoplasms/radiotherapy , Tomography, X-Ray Computed , Radiotherapy Planning, Computer-Assisted/methods , Image Processing, Computer-Assisted/methods

18.

Quality assurance assessment of intra-acquisition diffusion-weighted and T2-weighted magnetic resonance imaging registration and contour propagation for head and neck cancer radiotherapy.

Naser, Mohamed A; Wahid, Kareem A; Ahmed, Sara; Salama, Vivian; Dede, Cem; Edwards, Benjamin W; Lin, Ruitao; McDonald, Brigid; Salzillo, Travis C; He, Renjie; Ding, Yao; Abdelaal, Moamen Abobakr; Thill, Daniel; O'Connell, Nicolette; Willcut, Virgil; Christodouleas, John P; Lai, Stephen Y; Fuller, Clifton D; Mohamed, Abdallah S R.

Med Phys ; 50(4): 2089-2099, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36519973

ABSTRACT

BACKGROUND/PURPOSE: Adequate image registration of anatomical and functional magnetic resonance imaging (MRI) scans is necessary for MR-guided head and neck cancer (HNC) adaptive radiotherapy planning. Despite the quantitative capabilities of diffusion-weighted imaging (DWI) MRI for treatment plan adaptation, geometric distortion remains a considerable limitation. Therefore, we systematically investigated various deformable image registration (DIR) methods to co-register DWI and T2-weighted (T2W) images. MATERIALS/METHODS: We compared three commercial (ADMIRE, Velocity, Raystation) and three open-source (Elastix with default settings [Elastix Default], Elastix with parameter set 23 [Elastix 23], Demons) post-acquisition DIR methods applied to T2W and DWI MRI images acquired during the same imaging session in twenty immobilized HNC patients. In addition, we used the non-registered images (None) as a control comparator. Ground-truth segmentations of radiotherapy structures (tumour and organs at risk) were generated by a physician expert on both image sequences. For each registration approach, structures were propagated from T2W to DWI images. These propagated structures were then compared with ground-truth DWI structures using the Dice similarity coefficient and mean surface distance. RESULTS: 19 left submandibular glands, 18 right submandibular glands, 20 left parotid glands, 20 right parotid glands, 20 spinal cords, and 12 tumours were delineated. Most DIR methods took <30 s to execute per case, with the exception of Elastix 23, which took ~458 s to execute per case. ADMIRE and Elastix 23 demonstrated improved performance over None for all metrics and structures (Bonferroni-corrected p < 0.05), while the other methods did not. Moreover, ADMIRE and Elastix 23 significantly improved performance in individual and pooled analysis compared to all other methods.
CONCLUSIONS: The ADMIRE DIR method offers improved geometric performance with reasonable execution time and should therefore be favoured for registering T2W and DWI images acquired during the same scan session in HNC patients. These results are important to ensure the appropriate selection of registration strategies for MR-guided radiotherapy.
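Propagated structures were scored with mean surface distance in addition to Dice. The sketch below is a minimal NumPy version under stated assumptions: boundary voxels are defined with face connectivity, distances use the voxel spacing in mm, and the symmetric MSD pools all nearest-surface distances from both directions. Some tools instead average the two directed means, and the abstract does not say which definition or software was used. The brute-force pairwise distance is only suitable for small masks.

```python
import numpy as np


def surface_points(mask, spacing):
    """Physical coordinates (mm) of boundary voxels of a binary mask.

    A voxel is on the surface if it is foreground and at least one face
    neighbour is background (face connectivity, n-dimensional).
    """
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    core = tuple(slice(1, -1) for _ in range(m.ndim))
    interior = m.copy()
    for axis in range(m.ndim):
        for shift in (-1, 1):
            # After padding, rolling brings each face neighbour into place
            interior &= np.roll(padded, shift, axis=axis)[core]
    return np.argwhere(m & ~interior) * np.asarray(spacing, dtype=float)


def mean_surface_distance(a, b, spacing):
    """Symmetric mean surface distance: mean of all nearest-surface distances
    from A to B and from B to A (brute force, fine for small masks)."""
    pa, pb = surface_points(a, spacing), surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("both masks need at least one foreground voxel")
    d = np.sqrt(((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=-1))
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(pa) + len(pb)))
```

For clinical-size volumes, a distance transform (for example from SciPy) would replace the pairwise distance matrix.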

Subject(s)

Head and Neck Neoplasms , Radiotherapy Planning, Computer-Assisted , Humans , Radiotherapy Planning, Computer-Assisted/methods , Head and Neck Neoplasms/diagnostic imaging , Head and Neck Neoplasms/radiotherapy , Magnetic Resonance Imaging/methods , Diffusion Magnetic Resonance Imaging , Radiotherapy Dosage , Image Processing, Computer-Assisted/methods , Algorithms

19.

Development and Validation of an Automated Image-Based Deep Learning Platform for Sarcopenia Assessment in Head and Neck Cancer.

Ye, Zezhong; Saraf, Anurag; Ravipati, Yashwanth; Hoebers, Frank; Catalano, Paul J; Zha, Yining; Zapaishchykova, Anna; Likitlersuang, Jirapat; Guthier, Christian; Tishler, Roy B; Schoenfeld, Jonathan D; Margalit, Danielle N; Haddad, Robert I; Mak, Raymond H; Naser, Mohamed; Wahid, Kareem A; Sahlsten, Jaakko; Jaskari, Joel; Kaski, Kimmo; Mäkitie, Antti A; Fuller, Clifton D; Aerts, Hugo J W L; Kann, Benjamin H.

JAMA Netw Open ; 6(8): e2328280, 2023 08 01.

Article in English | MEDLINE | ID: mdl-37561460

ABSTRACT

Importance: Sarcopenia is an established prognostic factor in patients with head and neck squamous cell carcinoma (HNSCC); the quantification of sarcopenia assessed by imaging is typically achieved through the skeletal muscle index (SMI), which can be derived from cervical skeletal muscle segmentation and cross-sectional area. However, manual muscle segmentation is labor intensive, prone to interobserver variability, and impractical for large-scale clinical use. Objective: To develop and externally validate a fully automated image-based deep learning platform for cervical vertebral muscle segmentation and SMI calculation and evaluate associations with survival and treatment toxicity outcomes. Design, Setting, and Participants: For this prognostic study, a model development data set was curated from publicly available and deidentified data from patients with HNSCC treated at MD Anderson Cancer Center between January 1, 2003, and December 31, 2013. A total of 899 patients undergoing primary radiation for HNSCC with abdominal computed tomography scans and complete clinical information were selected. An external validation data set was retrospectively collected from patients undergoing primary radiation therapy between January 1, 1996, and December 31, 2013, at Brigham and Women's Hospital. The data analysis was performed between May 1, 2022, and March 31, 2023. Exposure: C3 vertebral skeletal muscle segmentation during radiation therapy for HNSCC. Main Outcomes and Measures: Overall survival and treatment toxicity outcomes of HNSCC. Results: The total patient cohort comprised 899 patients with HNSCC (median [range] age, 58 [24-90] years; 140 female [15.6%] and 755 male [84.0%]). Dice similarity coefficients for the validation set (n = 96) and internal test set (n = 48) were 0.90 (95% CI, 0.90-0.91) and 0.90 (95% CI, 0.89-0.91), respectively, with a mean 96.2% acceptable rate between 2 reviewers on external clinical testing (n = 377). 
Estimated cross-sectional area and SMI values were associated with manually annotated values (Pearson r = 0.99; P < .001) across data sets. On multivariable Cox proportional hazards regression, SMI-derived sarcopenia was associated with worse overall survival (hazard ratio, 2.05; 95% CI, 1.04-4.04; P = .04) and longer feeding tube duration (median [range], 162 [6-1477] vs 134 [15-1255] days; hazard ratio, 0.66; 95% CI, 0.48-0.89; P = .006) than no sarcopenia. Conclusions and Relevance: This prognostic study's findings show external validation of a fully automated deep learning pipeline to accurately measure sarcopenia in HNSCC and an association with important disease outcomes. The pipeline could enable the integration of sarcopenia assessment into clinical decision making for individuals with HNSCC.
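The skeletal muscle index is commonly defined as muscle cross-sectional area divided by height squared (cm²/m²). The sketch below shows that arithmetic for a 2-D segmentation mask, assuming known pixel spacing. The abstract does not state how the C3 area is converted before computing SMI or which sarcopenia cutoff was used, so no conversion is applied here and the cutoff is a caller-supplied hypothetical parameter.

```python
import numpy as np


def muscle_area_cm2(mask_2d, pixel_spacing_mm):
    """Cross-sectional area of a 2-D binary muscle mask in cm^2."""
    row_mm, col_mm = pixel_spacing_mm
    return np.count_nonzero(mask_2d) * row_mm * col_mm / 100.0  # 100 mm^2 = 1 cm^2


def skeletal_muscle_index(area_cm2, height_m):
    """SMI = skeletal muscle cross-sectional area / height^2, in cm^2/m^2."""
    return area_cm2 / height_m ** 2


def is_sarcopenic(smi, cutoff):
    """`cutoff` is a caller-supplied, study-specific threshold (hypothetical here)."""
    return smi < cutoff


# Hypothetical slice: 5000 muscle pixels at 0.8 mm x 0.8 mm, patient height 1.75 m
mask = np.zeros((100, 100), dtype=bool)
mask[:50, :] = True
area = muscle_area_cm2(mask, (0.8, 0.8))  # 32 cm^2
smi = skeletal_muscle_index(area, 1.75)
```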

Subject(s)

Deep Learning , Head and Neck Neoplasms , Sarcopenia , Humans , Male , Female , Middle Aged , Squamous Cell Carcinoma of Head and Neck/diagnostic imaging , Retrospective Studies , Sarcopenia/diagnostic imaging , Sarcopenia/complications , Head and Neck Neoplasms/complications , Head and Neck Neoplasms/diagnostic imaging

20.

Determining The Role Of Radiation Oncologist Demographic Factors On Segmentation Quality: Insights From A Crowd-Sourced Challenge Using Bayesian Estimation.

Wahid, Kareem A; Sahin, Onur; Kundu, Suprateek; Lin, Diana; Alanis, Anthony; Tehami, Salik; Kamel, Serageldin; Duke, Simon; Sherer, Michael V; Rasmussen, Mathis; Korreman, Stine; Fuentes, David; Cislo, Michael; Nelms, Benjamin E; Christodouleas, John P; Murphy, James D; Mohamed, Abdallah S R; He, Renjie; Naser, Mohammed A; Gillespie, Erin F; Fuller, Clifton D.

medRxiv ; 2023 Sep 05.

Article in English | MEDLINE | ID: mdl-37693394

ABSTRACT

BACKGROUND: Medical image auto-segmentation is poised to revolutionize radiotherapy workflows. The quality of auto-segmentation training data, primarily derived from clinician observers, is of utmost importance. However, the factors influencing the quality of these clinician-derived segmentations have yet to be fully understood or quantified. Therefore, the purpose of this study was to determine the role of common observer demographic variables in quantitative segmentation performance. METHODS: Organ at risk (OAR) and tumor volume segmentations provided by radiation oncologist observers from the Contouring Collaborative for Consensus in Radiation Oncology public dataset were utilized for this study. Segmentations were derived from five separate disease sites comprising one patient case each: breast, sarcoma, head and neck (H&N), gynecologic (GYN), and gastrointestinal (GI). Segmentation quality was determined on a structure-by-structure basis by comparing the observer segmentations with an expert-derived consensus gold standard, primarily using the Dice Similarity Coefficient (DSC); surface DSC was investigated as a secondary metric. Metrics were stratified into binary groups based on previously established structure-specific expert-derived interobserver variability (IOV) cutoffs. Generalized linear mixed-effects models using Markov chain Monte Carlo Bayesian estimation were used to investigate the association between demographic variables and the binarized segmentation quality for each disease site separately. Variables with a highest density interval excluding zero (loosely analogous to frequentist significance) were considered to substantially impact the outcome measure. RESULTS: After filtering by practicing radiation oncologists, 574, 110, 452, 112, and 48 structure observations remained for the breast, sarcoma, H&N, GYN, and GI cases, respectively.
The median percentage of observations that crossed the expert DSC IOV cutoff when stratified by structure type was 55% and 31% for OARs and tumor volumes, respectively. Bayesian regression analysis revealed tumor category had a substantial negative impact on binarized DSC for the breast (coefficient mean ± standard deviation: -0.97 ± 0.20), sarcoma (-1.04 ± 0.54), H&N (-1.00 ± 0.24), and GI (-2.95 ± 0.98) cases. There were no clear recurring relationships between segmentation quality and demographic variables across the cases, with most variables demonstrating large standard deviations and wide highest density intervals. CONCLUSION: Our study highlights substantial uncertainty surrounding conventionally presumed factors influencing segmentation quality. Future studies should investigate additional demographic variables, more patients and imaging modalities, and alternative metrics of segmentation acceptability.
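The decision rule above flags a variable when the highest density interval (HDI) of its coefficient's posterior excludes zero. The sketch below estimates the HDI from MCMC draws as the narrowest window containing a given share of the sorted samples. It assumes a unimodal posterior and defaults to 95% mass; the abstract does not state the interval width used, and the function names are hypothetical.

```python
import numpy as np


def highest_density_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws.

    Assumes a unimodal posterior; for multimodal posteriors the true HDI
    can be a union of several intervals.
    """
    x = np.sort(np.asarray(samples, dtype=float).ravel())
    n = x.size
    k = int(np.ceil(mass * n))  # number of draws inside the interval
    if not 1 <= k <= n:
        raise ValueError("mass must lie in (0, 1] and samples must be non-empty")
    widths = x[k - 1:] - x[: n - k + 1]  # width of every window holding k draws
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])


def hdi_excludes_zero(samples, mass=0.95):
    """True when the HDI lies entirely above or entirely below zero."""
    low, high = highest_density_interval(samples, mass)
    return low > 0 or high < 0
```

Given the MCMC draws for one regression coefficient, `hdi_excludes_zero(draws)` applies the rule described in the Methods. Unlike an equal-tailed interval, the HDI shifts toward the bulk of a skewed posterior.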
