ABSTRACT
OBJECTIVES: To use diffusion basis spectrum imaging (DBSI) to examine the microstructural changes in the substantia nigra (SN) and global white matter (WM) tracts of patients with early-stage Parkinson's disease (PD). METHODS: Thirty-seven age- and sex-matched patients with early-stage PD and 22 healthy controls (HCs) were enrolled in this study. All participants underwent clinical assessments and diffusion-weighted MRI scans, which were analyzed by diffusion tensor imaging (DTI) and DBSI to assess the pathologies of PD in the SN and global WM tracts. RESULTS: Lower DTI fractional anisotropy (FA) was seen in the SN of PD patients (PD: 0.316 ± 0.034 vs HCs: 0.331 ± 0.019, p = 0.015). The putative cell marker, DBSI restricted fraction (PD: 0.132 ± 0.051 vs HCs: 0.105 ± 0.039, p = 0.031), and the edema/extracellular space marker, DBSI non-restricted-fraction (PD: 0.150 ± 0.052 vs HCs: 0.122 ± 0.052, p = 0.020), were both significantly higher, and the axon/dendrite density marker, DBSI fiber-fraction (PD: 0.718 ± 0.073 vs HCs: 0.773 ± 0.071, p = 0.003), was significantly lower in the SN of PD patients. DBSI restricted fraction in the SN was negatively correlated with HAMA scores (r = -0.501, p = 0.005), whereas DTI-FA was not correlated with any clinical scale. In WM tracts, higher axial diffusivity (AD) was the only DTI finding in PD and was seen in multiple WM regions, while lower DBSI fiber-fraction and higher DBSI non-restricted-fraction were detected in multiple WM regions. DBSI non-restricted-fraction in both the left fornix (cres)/stria terminalis (r = -0.472, p = 0.004) and the right posterior thalamic radiation (r = -0.467, p = 0.005) was negatively correlated with MMSE scores. CONCLUSION: DBSI could potentially detect and quantify the extent of inflammatory cell infiltration, fiber/dendrite loss, and edema in both the SN and WM tracts in patients with early-stage PD, a finding that remains to be further investigated through more extensive longitudinal DBSI analysis.
CLINICAL RELEVANCE STATEMENT: Our study shows that DBSI indexes can potentially detect early-stage PD's pathological changes, with a notable ability to distinguish between inflammation and edema. This implies that DBSI has the potential to be an imaging biomarker for early PD diagnosis. KEY POINTS: • Diffusion basis spectrum imaging detected higher restricted-fraction in Parkinson's disease, potentially reflecting inflammatory cell infiltration. • Diffusion basis spectrum imaging detected higher non-restricted-fraction and lower fiber-fraction in Parkinson's disease, indicating the presence of edema and/or dopaminergic neuronal/dendritic loss. • Diffusion basis spectrum imaging metrics correlated with non-motor symptoms, suggesting its potential diagnostic role to detect early-stage PD dysfunctions.
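For readers less familiar with the DTI metrics quoted above (FA and AD), the following is a minimal sketch of how they are conventionally derived from the three eigenvalues of a diffusion tensor. It uses the standard textbook definitions; the function name `dti_scalars` and the example eigenvalues are illustrative assumptions and are not taken from the study, and DBSI fractions are not computed here because the abstract does not describe that fitting procedure.

```python
import math


def dti_scalars(eigenvalues):
    """Standard DTI scalars from the three eigenvalues of one diffusion tensor.

    Hypothetical helper for illustration; units follow the input (e.g. um^2/ms).
    """
    l1, l2, l3 = sorted((float(v) for v in eigenvalues), reverse=True)
    md = (l1 + l2 + l3) / 3.0                      # mean diffusivity
    norm = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    if norm == 0.0:
        fa = 0.0                                   # no diffusion signal: define FA as 0
    else:
        spread = math.sqrt((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
        fa = math.sqrt(1.5) * spread / norm        # fractional anisotropy, in [0, 1]
    return {
        "FA": fa,
        "AD": l1,                                  # axial diffusivity: largest eigenvalue
        "RD": (l2 + l3) / 2.0,                     # radial diffusivity: mean of the other two
        "MD": md,
    }


if __name__ == "__main__":
    # Made-up eigenvalues for a coherent fiber bundle.
    print(dti_scalars([1.7, 0.3, 0.2]))
```

FA is 0 for perfectly isotropic diffusion and approaches 1 when diffusion is confined to a single direction, which is why edema or crossing fibers inside a voxel can lower FA without any axonal loss.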
Subjects
Parkinson Disease; White Matter; Humans; Diffusion Tensor Imaging/methods; White Matter/pathology; Parkinson Disease/pathology; Substantia Nigra/diagnostic imaging; Substantia Nigra/pathology; Edema/pathology
ABSTRACT
Diffusion tensor imaging (DTI) has been employed for over 2 decades to noninvasively quantify central nervous system diseases/injuries. However, DTI is an inadequate simplification of diffusion modeling in the presence of coexisting inflammation, edema and crossing nerve fibers. We employed a tissue phantom using fixed mouse trigeminal nerves coated with various amounts of agarose gel to mimic crossing fibers in the presence of vasogenic edema. Diffusivity measures derived by DTI and diffusion basis spectrum imaging (DBSI) were compared at increasing levels of simulated edema and degrees of fiber crossing. Furthermore, we assessed the ability of DBSI, diffusion kurtosis imaging (DKI), generalized q-sampling imaging (GQI), q-ball imaging (QBI) and neurite orientation dispersion and density imaging to resolve fiber crossing, in reference to the gold standard angles measured from structural images. DTI-computed diffusivities and fractional anisotropy were significantly confounded by gel-mimicked edema and crossing fibers. Conversely, DBSI calculated accurate diffusivities of individual fibers regardless of the extent of simulated edema and degrees of fiber crossing angles. Additionally, DBSI accurately and consistently estimated crossing angles in various conditions of gel-mimicked edema when compared with the gold standard (r² = 0.92, P = 1.9 × 10⁻⁹, bias = 3.9°). Small crossing angles and edema significantly impact the diffusion orientation distribution function, making DKI, GQI and QBI less accurate in detecting and estimating fiber crossing angles. Lastly, we used diffusion tensor ellipsoids to demonstrate that DBSI resolves the confounds of edema and crossing fibers in the peritumoral edema region from a patient with lung cancer metastasis, while DTI failed. In summary, DBSI is able to separate two crossing fibers and accurately recover their diffusivities in a complex environment characterized by increasing crossing angles and amounts of gel-mimicked edema. 
DBSI also indicated better angular resolution compared with DKI, QBI and GQI.
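The comparison against gold-standard crossing angles can be illustrated with a small sketch. It assumes each fiber is summarized by a 3D direction vector and that "bias" means the mean difference between estimated and reference angles; both are assumptions made for illustration, since the abstract does not spell out the computation. The names `crossing_angle_deg` and `mean_bias` and the sample angles are hypothetical.

```python
import math


def crossing_angle_deg(u, v):
    """Acute angle (degrees) between two fiber directions.

    Fiber orientations are axial (a vector and its negative describe the same
    fiber), so the absolute value of the dot product is used.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("direction vectors must be non-zero")
    cos_angle = min(1.0, abs(dot) / (norm_u * norm_v))  # clamp rounding overshoot
    return math.degrees(math.acos(cos_angle))


def mean_bias(estimated, reference):
    """Mean of (estimated - reference); one common definition of bias."""
    diffs = [e - r for e, r in zip(estimated, reference)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    print(crossing_angle_deg((1, 0, 0), (1, 1, 0)))   # roughly 45 degrees
    # Made-up angle pairs, only to show the call pattern.
    print(mean_bias([32.0, 58.0, 91.0], [30.0, 55.0, 90.0]))
```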
Subjects
Diffusion Magnetic Resonance Imaging; Edema/diagnostic imaging; Models, Biological; Nerve Fibers/pathology; Phantoms, Imaging; Trigeminal Nerve/diagnostic imaging; Trigeminal Nerve/pathology; Animals; Anisotropy; Diffusion Tensor Imaging; Edema/pathology; Female; Humans; Mice, Inbred C57BL; White Matter/diagnostic imaging
ABSTRACT
LEVEL OF EVIDENCE: 4. TECHNICAL EFFICACY: Stage 2. J. Magn. Reson. Imaging 2020;52:269-270.
Subjects
Image Processing, Computer-Assisted; Ovarian Neoplasms; Diffusion Magnetic Resonance Imaging; Female; Humans; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
Unenhanced CT scans exhibit high specificity in detecting moderate-to-severe hepatic steatosis. Although many CT scans are acquired for health screening and in various diagnostic contexts, their potential for hepatic steatosis detection has remained largely unexplored. The accuracy of previous methodologies has been limited by the inclusion of non-parenchymal liver regions. To overcome this limitation, we present a novel deep-learning (DL) based method tailored for the automatic selection of parenchymal portions in CT images. This method automatically delineates circular regions for effectively detecting hepatic steatosis. We used 1,014 multinational CT images to develop a DL model for segmenting the liver and selecting the parenchymal regions. The results demonstrate outstanding performance in both tasks. By excluding non-parenchymal portions, our DL-based method surpasses previous limitations, achieving radiologist-level accuracy in liver attenuation measurements and hepatic steatosis detection. To ensure reproducibility, we have openly shared the 1,014 annotated CT images and the DL system code. Our research contributes to refining automated detection of hepatic steatosis on CT images, enhancing the accuracy and efficiency of healthcare screening processes.
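The measurement step described above (mean attenuation inside a circular region, with non-parenchymal pixels left out) can be sketched as follows. This is only an illustration under stated assumptions: the slice is a 2D grid of Hounsfield units, the exclusion mask is supplied by some upstream segmentation, and the function name `circular_roi_mean_hu` is hypothetical. It does not reproduce the authors' DL model, and no steatosis threshold is implied.

```python
def circular_roi_mean_hu(ct_slice, center, radius, exclude=None):
    """Mean attenuation (HU) inside a circular ROI on one axial slice.

    ct_slice : 2D list of HU values (rows x columns)
    center   : (row, col) of the ROI center, in pixels
    radius   : ROI radius, in pixels
    exclude  : optional 2D list of booleans; True marks non-parenchymal
               pixels (e.g. vessels) that must not enter the average
    """
    center_row, center_col = center
    values = []
    for r, row in enumerate(ct_slice):
        for c, hu in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            if inside and not (exclude is not None and exclude[r][c]):
                values.append(hu)
    if not values:
        raise ValueError("ROI contains no usable pixels")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Synthetic slice: uniform parenchyma at 55 HU with one bright vessel pixel.
    ct = [[55.0] * 9 for _ in range(9)]
    ct[4][4] = 150.0
    vessel_mask = [[hu > 100 for hu in row] for row in ct]
    print(circular_roi_mean_hu(ct, (4, 4), 2))               # pulled up by the vessel
    print(circular_roi_mean_hu(ct, (4, 4), 2, vessel_mask))  # vessel excluded
```

The toy example shows why excluding non-parenchymal pixels matters: a single bright vessel pixel inside the ROI shifts the mean attenuation upward.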
Subjects
Deep Learning; Fatty Liver; Liver; Tomography, X-Ray Computed; Humans; Tomography, X-Ray Computed/methods; Fatty Liver/diagnostic imaging; Fatty Liver/pathology; Liver/diagnostic imaging; Liver/pathology; Male; Reproducibility of Results; Female
ABSTRACT
Background and purpose: Lung cancer is a leading cause of cancer-related mortality, and stereotactic body radiotherapy (SBRT) has become a standard treatment for early-stage lung cancer. However, the heterogeneous response to radiation at the tumor level poses challenges. Currently, standardized dosage regimens lack adaptation based on individual patient or tumor characteristics. Thus, we explore the potential of delta radiomics from on-treatment magnetic resonance (MR) imaging to track radiation dose response, inform personalized radiotherapy dosing, and predict outcomes. Materials and methods: A retrospective study of 47 MR-guided lung SBRT treatments for 39 patients was conducted. Radiomic features were extracted using Pyradiomics, and stability was evaluated temporally and spatially. Delta radiomics were correlated with radiation dose delivery and assessed for associations with tumor control and survival with Cox regressions. Results: Among 107 features, 49 demonstrated temporal stability, and 57 showed spatial stability. Fifteen stable and non-collinear features were analyzed. Median skewness and surface-to-volume ratio decreased with radiation dose fraction delivery, while coarseness and 90th percentile values increased. Skewness had the largest relative median absolute changes (22%-45%) per fraction from baseline and was associated with locoregional failure (p = 0.012) by analysis of covariance. Skewness, elongation, and flatness were significantly associated with local recurrence-free survival, while tumor diameter and volume were not. Conclusions: Our study establishes the feasibility and stability of delta radiomics analysis for MR-guided lung SBRT. Findings suggest that MR delta radiomics can capture short-term radiographic manifestations of the intra-tumoral radiation effect.
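A minimal sketch of the delta-radiomics idea described above: compute a feature (here, skewness of the intensities inside the tumor) at each treatment fraction and express it as a relative change from baseline. The moment-based skewness formula is the usual textbook definition; the helper names and the sample values are illustrative assumptions, and the study's actual feature extraction was done with Pyradiomics, which this sketch does not call.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def relative_delta(feature_by_fraction):
    """Relative change of a feature from baseline (first entry) at each later fraction."""
    baseline = feature_by_fraction[0]
    if baseline == 0:
        raise ValueError("baseline feature value must be non-zero")
    return [(value - baseline) / abs(baseline) for value in feature_by_fraction[1:]]


if __name__ == "__main__":
    # Made-up skewness values at baseline and two later fractions.
    print(relative_delta([2.0, 1.5, 1.0]))
```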
ABSTRACT
Pediatric glioma recurrence can cause morbidity and mortality; however, recurrence pattern and severity are heterogeneous and challenging to predict with established clinical and genomic markers. Resultingly, almost all children undergo frequent, long-term, magnetic resonance (MR) brain surveillance regardless of individual recurrence risk. Deep learning analysis of longitudinal MR may be an effective approach for improving individualized recurrence prediction in gliomas and other cancers but has thus far been infeasible with current frameworks. Here, we propose a self-supervised, deep learning approach to longitudinal medical imaging analysis, temporal learning, that models the spatiotemporal information from a patient's current and prior brain MRs to predict future recurrence. We apply temporal learning to pediatric glioma surveillance imaging for 715 patients (3,994 scans) from four distinct clinical settings. We find that longitudinal imaging analysis with temporal learning improves recurrence prediction performance by up to 41% compared to traditional approaches, with improvements in performance in both low- and high-grade glioma. We find that recurrence prediction accuracy increases incrementally with the number of historical scans available per patient. Temporal deep learning may enable point-of-care decision-support for pediatric brain tumors and be adaptable more broadly to patients with other cancers and chronic diseases undergoing surveillance imaging.
ABSTRACT
Purpose To develop, externally test, and evaluate clinical acceptability of a deep learning pediatric brain tumor segmentation model using stepwise transfer learning. Materials and Methods In this retrospective study, the authors leveraged two T2-weighted MRI datasets (May 2001 through December 2015) from a national brain tumor consortium (n = 184; median age, 7 years [range, 1-23 years]; 94 male patients) and a pediatric cancer center (n = 100; median age, 8 years [range, 1-19 years]; 47 male patients) to develop and evaluate deep learning neural networks for pediatric low-grade glioma segmentation using a stepwise transfer learning approach to maximize performance in a limited data scenario. The best model was externally tested on an independent test set and subjected to randomized blinded evaluation by three clinicians, wherein they assessed clinical acceptability of expert- and artificial intelligence (AI)-generated segmentations via 10-point Likert scales and Turing tests. Results The best AI model used in-domain stepwise transfer learning (median Dice score coefficient, 0.88 [IQR, 0.72-0.91] vs 0.812 [IQR, 0.56-0.89] for baseline model; P = .049). With external testing, the AI model yielded excellent accuracy using reference standards from three clinical experts (median Dice similarity coefficients: expert 1, 0.83 [IQR, 0.75-0.90]; expert 2, 0.81 [IQR, 0.70-0.89]; expert 3, 0.81 [IQR, 0.68-0.88]; mean accuracy, 0.82). For clinical benchmarking (n = 100 scans), experts rated AI-based segmentations higher on average compared with other experts (median Likert score, 9 [IQR, 7-9] vs 7 [IQR 7-9]) and rated more AI segmentations as clinically acceptable (80.2% vs 65.4%). Experts correctly predicted the origin of AI segmentations in an average of 26.0% of cases. Conclusion Stepwise transfer learning enabled expert-level automated pediatric brain tumor autosegmentation and volumetric measurement with a high level of clinical acceptability. 
Keywords: Stepwise Transfer Learning, Pediatric Brain Tumors, MRI Segmentation, Deep Learning Supplemental material is available for this article. © RSNA, 2024.
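The Dice score reported above measures the overlap between two segmentations. A minimal sketch, assuming the masks are given as flattened sequences of 0/1 voxel labels (the function name `dice` and the toy masks are illustrative):

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks.

    Masks are flat sequences of truthy/falsy voxel labels of equal length.
    Returns 1.0 when both masks are empty (a common convention).
    """
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    expert = [1, 1, 0, 0]
    model = [1, 0, 0, 0]
    print(dice(expert, model))
```

A value of 1 means the two masks coincide exactly and 0 means they share no voxels, so the reported medians in the 0.8 to 0.9 range indicate substantial but not perfect overlap.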
Subjects
Brain Neoplasms; Deep Learning; Magnetic Resonance Imaging; Humans; Child; Brain Neoplasms/diagnostic imaging; Brain Neoplasms/pathology; Magnetic Resonance Imaging/methods; Male; Adolescent; Child, Preschool; Retrospective Studies; Female; Infant; Young Adult; Glioma/diagnostic imaging; Glioma/pathology; Image Interpretation, Computer-Assisted/methods
ABSTRACT
Purpose To develop and externally test a scan-to-prediction deep learning pipeline for noninvasive, MRI-based BRAF mutational status classification for pediatric low-grade glioma. Materials and Methods This retrospective study included two pediatric low-grade glioma datasets with linked genomic and diagnostic T2-weighted MRI data of patients: Dana-Farber/Boston Children's Hospital (development dataset, n = 214 [113 (52.8%) male; 104 (48.6%) BRAF wild type, 60 (28.0%) BRAF fusion, and 50 (23.4%) BRAF V600E]) and the Children's Brain Tumor Network (external testing, n = 112 [55 (49.1%) male; 35 (31.2%) BRAF wild type, 60 (53.6%) BRAF fusion, and 17 (15.2%) BRAF V600E]). A deep learning pipeline was developed to classify BRAF mutational status (BRAF wild type vs BRAF fusion vs BRAF V600E) via a two-stage process: (a) three-dimensional tumor segmentation and extraction of axial tumor images and (b) section-wise, deep learning-based classification of mutational status. Knowledge-transfer and self-supervised approaches were investigated to prevent model overfitting, with a primary end point of the area under the receiver operating characteristic curve (AUC). To enhance model interpretability, a novel metric, center of mass distance, was developed to quantify the model attention around the tumor. Results A combination of transfer learning from a pretrained medical imaging-specific network and self-supervised label cross-training (TransferX) coupled with consensus logic yielded the highest classification performance with an AUC of 0.82 (95% CI: 0.72, 0.91), 0.87 (95% CI: 0.61, 0.97), and 0.85 (95% CI: 0.66, 0.95) for BRAF wild type, BRAF fusion, and BRAF V600E, respectively, on internal testing. On external testing, the pipeline yielded an AUC of 0.72 (95% CI: 0.64, 0.86), 0.78 (95% CI: 0.61, 0.89), and 0.72 (95% CI: 0.64, 0.88) for BRAF wild type, BRAF fusion, and BRAF V600E, respectively. 
Conclusion Transfer learning and self-supervised cross-training improved classification performance and generalizability for noninvasive pediatric low-grade glioma mutational status prediction in a limited data scenario. Keywords: Pediatrics, MRI, CNS, Brain/Brain Stem, Oncology, Feature Detection, Diagnosis, Supervised Learning, Transfer Learning, Convolutional Neural Network (CNN) Supplemental material is available for this article. © RSNA, 2024.
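Two ideas from this abstract lend themselves to a short sketch: a center-of-mass distance between the model's attention map and the tumor mask, and the aggregation of section-wise predictions into one patient-level call. The abstract does not define either precisely, so the versions below are assumptions: the distance is taken as the Euclidean distance (in pixels) between intensity-weighted centers of mass on a 2D slice, and the "consensus" is taken as the class with the highest mean probability across sections. All function names are hypothetical.

```python
import math


def center_of_mass(grid):
    """Intensity-weighted (row, col) center of mass of a 2D grid of non-negative weights."""
    total = sum(sum(row) for row in grid)
    if total <= 0:
        raise ValueError("grid must contain positive weight")
    row_com = sum(i * w for i, row in enumerate(grid) for w in row) / total
    col_com = sum(j * w for row in grid for j, w in enumerate(row)) / total
    return row_com, col_com


def com_distance(attention_map, tumor_mask):
    """Assumed metric: Euclidean distance between the two centers of mass, in pixels."""
    att_r, att_c = center_of_mass(attention_map)
    tum_r, tum_c = center_of_mass(tumor_mask)
    return math.hypot(att_r - tum_r, att_c - tum_c)


def patient_level_class(section_probs):
    """Assumed consensus rule: average class probabilities over sections, take the argmax."""
    n_sections = len(section_probs)
    mean_probs = [sum(col) / n_sections for col in zip(*section_probs)]
    return max(range(len(mean_probs)), key=mean_probs.__getitem__)


if __name__ == "__main__":
    attention = [[0.0] * 6 for _ in range(6)]
    tumor = [[0] * 6 for _ in range(6)]
    attention[1][1] = 1.0
    tumor[4][5] = 1
    print(com_distance(attention, tumor))
    # Columns: wild type, fusion, V600E (order chosen for this example only).
    print(patient_level_class([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]))
```

A smaller distance would indicate that the model's attention is centered closer to the tumor, which is the interpretability property the metric is meant to capture.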
Subjects
Brain Neoplasms; Glioma; Humans; Child; Male; Female; Brain Neoplasms/diagnostic imaging; Retrospective Studies; Proto-Oncogene Proteins B-raf/genetics; Glioma/diagnosis; Machine Learning
ABSTRACT
BACKGROUND: Postoperative recurrence risk for pediatric low-grade gliomas (pLGGs) is challenging to predict by conventional clinical, radiographic, and genomic factors. We investigated whether deep learning of MRI tumor features could improve postoperative pLGG risk stratification. METHODS: We used a pre-trained deep learning (DL) tool designed for pLGG segmentation to extract pLGG imaging features from preoperative T2-weighted MRI from patients who underwent surgery (DL-MRI features). Patients were pooled from two institutions: Dana Farber/Boston Children's Hospital (DF/BCH) and the Children's Brain Tumor Network (CBTN). We trained three DL logistic hazard models to predict postoperative event-free survival (EFS) probabilities with 1) clinical features, 2) DL-MRI features, and 3) multimodal (clinical and DL-MRI) features. We evaluated the models with a time-dependent Concordance Index (Ctd) and risk group stratification with Kaplan-Meier plots and log-rank tests. We developed an automated pipeline integrating pLGG segmentation and EFS prediction with the best model. RESULTS: Of the 396 patients analyzed (median follow-up: 85 months, range: 1.5-329 months), 214 (54%) underwent gross total resection and 110 (28%) recurred. The multimodal model improved EFS prediction compared to the DL-MRI and clinical models (Ctd: 0.85 (95% CI: 0.81-0.93), 0.79 (95% CI: 0.70-0.88), and 0.72 (95% CI: 0.57-0.77), respectively). The multimodal model improved risk-group stratification (3-year EFS for predicted high-risk: 31% versus low-risk: 92%, p<0.0001). CONCLUSIONS: DL extracts imaging features that can inform postoperative recurrence prediction for pLGG. Multimodal DL improves postoperative risk stratification for pLGG and may guide postoperative decision-making. Larger, multicenter training data may be needed to improve model generalizability.
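The risk-group comparison above relies on Kaplan-Meier estimates of event-free survival. A minimal sketch of the product-limit estimator follows, with a hypothetical function name and made-up follow-up data; it illustrates the general technique only and is not the authors' pipeline (which also used logistic hazard models and log-rank tests not shown here).

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times  : follow-up time for each patient
    events : True/1 if the event (e.g. recurrence) occurred at that time,
             False/0 if the patient was censored
    Returns a list of (event_time, survival_probability) steps.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)                # still under observation
        n_events = sum(1 for ti, e in zip(times, events) if e and ti == t)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Four hypothetical patients: events at months 1, 3, 4; one censored at month 2.
    for month, prob in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
        print(month, prob)
```

Reading the curve at 36 months for each predicted risk group would give a 3-year EFS figure of the kind quoted in the results.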
ABSTRACT
Background: Following chemoradiotherapy for high-grade glioma (HGG), it is often challenging to distinguish treatment changes from true tumor progression using conventional MRI. The diffusion basis spectrum imaging (DBSI) hindered fraction is associated with tissue edema or necrosis, which are common treatment-related changes. We hypothesized that DBSI hindered fraction may augment conventional imaging for earlier diagnosis of progression versus treatment effect. Methods: Adult patients were prospectively recruited if they had a known histologic diagnosis of HGG and completed standard-of-care chemoradiotherapy. DBSI and conventional MRI data were acquired longitudinally beginning 4 weeks post-radiation. Conventional MRI and DBSI metrics were compared with respect to their ability to diagnose progression versus treatment effect. Results: Twelve HGG patients were enrolled between August 2019 and February 2020, and 9 were ultimately analyzed (5 progression, 4 treatment effect). Within new or enlarging contrast-enhancing regions, DBSI hindered fraction was significantly higher in the treatment effect group compared to progression group (P = .0004). Compared to serial conventional MRI alone, inclusion of DBSI would have led to earlier diagnosis of either progression or treatment effect in 6 (66.7%) patients by a median of 7.7 (interquartile range = 0-20.1) weeks. Conclusions: In the first longitudinal prospective study of DBSI in adult HGG patients, we found that in new or enlarging contrast-enhancing regions following therapy, DBSI hindered fraction is elevated in cases of treatment effect compared to those with progression. Hindered fraction map may be a valuable adjunct to conventional MRI to distinguish tumor progression from treatment effect.
ABSTRACT
Purpose: To develop and externally validate a scan-to-prediction deep-learning pipeline for noninvasive, MRI-based BRAF mutational status classification for pLGG. Materials and Methods: We conducted a retrospective study of two pLGG datasets with linked genomic and diagnostic T2-weighted MRI of patients: BCH (development dataset, n=214 [60 (28%) BRAF fusion, 50 (23%) BRAF V600E, 104 (49%) wild-type]), and Children's Brain Tumor Network (CBTN) (external validation, n=112 [60 (53%) BRAF fusion, 17 (15%) BRAF V600E, 35 (32%) wild-type]). We developed a deep learning pipeline to classify BRAF mutational status (V600E vs. fusion vs. wild-type) via a two-stage process: 1) 3D tumor segmentation and extraction of axial tumor images, and 2) slice-wise, deep learning-based classification of mutational status. We investigated knowledge-transfer and self-supervised approaches to prevent model overfitting with a primary endpoint of the area under the receiver operating characteristic curve (AUC). To enhance model interpretability, we developed a novel metric, COMDist, that quantifies the accuracy of model attention around the tumor. Results: A combination of transfer learning from a pretrained medical imaging-specific network and self-supervised label cross-training (TransferX) coupled with consensus logic yielded the highest macro-average AUC (0.82 [95% CI: 0.70-0.90]) and accuracy (77%) on internal validation, with an AUC improvement of +17.7% and a COMDist improvement of +6.4% versus training from scratch. On external validation, the TransferX model yielded an AUC of 0.73 (95% CI: 0.68-0.88) and an accuracy of 75%. Conclusion: Transfer learning and self-supervised cross-training improved classification performance and generalizability for noninvasive pLGG mutational status prediction in a limited data scenario.
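The macro-average AUC used as the primary endpoint can be sketched as the unweighted mean of one-vs-rest AUCs, one per class. The version below computes each AUC from pairwise score comparisons (equivalent to the normalized Mann-Whitney U statistic). The function names, class ordering, and toy probabilities are assumptions for illustration; the abstract does not state how the authors computed the average.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_auc(class_labels, class_probs):
    """Unweighted mean of one-vs-rest AUCs across classes.

    class_labels : integer class index per patient
    class_probs  : per-patient list of predicted probabilities, one per class
    """
    n_classes = len(class_probs[0])
    aucs = [
        binary_auc([y == k for y in class_labels], [p[k] for p in class_probs])
        for k in range(n_classes)
    ]
    return sum(aucs) / n_classes


if __name__ == "__main__":
    print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```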
ABSTRACT
Artificial intelligence (AI) and machine learning (ML) are becoming critical in developing and deploying personalized medicine and targeted clinical trials. Recent advances in ML have enabled the integration of wider ranges of data including both medical records and imaging (radiomics). However, the development of prognostic models is complex as no modeling strategy is universally superior to others and validation of developed models requires large and diverse datasets to demonstrate that prognostic models developed (regardless of method) from one dataset are applicable to other datasets both internally and externally. Using a retrospective dataset of 2,552 patients from a single institution and a strict evaluation framework that included external validation on three external patient cohorts (873 patients), we crowdsourced the development of ML models to predict overall survival in head and neck cancer (HNC) using electronic medical records (EMR) and pretreatment radiological images. To assess the relative contributions of radiomics in predicting HNC prognosis, we compared 12 different models using imaging and/or EMR data. The model with the highest accuracy used multitask learning on clinical data and tumor volume, achieving high prognostic accuracy for 2-year and lifetime survival prediction, outperforming models relying on clinical data only, engineered radiomics, or complex deep neural network architecture. However, when we attempted to extend the best performing models from this large training dataset to other institutions, we observed significant reductions in the performance of the model in those datasets, highlighting the importance of detailed population-based reporting for AI/ML model utility and stronger validation frameworks. 
We have developed highly prognostic models for overall survival in HNC using EMRs and pretreatment radiological images based on a large, retrospective dataset of 2,552 patients from our institution. Diverse ML approaches were used by independent investigators. The model with the highest accuracy used multitask learning on clinical data and tumor volume. External validation of the top three performing models on three datasets (873 patients) with significant differences in the distributions of clinical and demographic variables demonstrated significant decreases in model performance. Significance: ML combined with simple prognostic factors outperformed multiple advanced CT radiomics and deep learning methods. ML models provided diverse solutions for prognosis of patients with HNC, but their prognostic value is affected by differences in patient populations and requires extensive validation.
Subjects
Deep Learning; Head and Neck Neoplasms; Humans; Prognosis; Retrospective Studies; Artificial Intelligence; Head and Neck Neoplasms/diagnostic imaging
ABSTRACT
Purpose: Artificial intelligence (AI)-automated tumor delineation for pediatric gliomas would enable real-time volumetric evaluation to support diagnosis, treatment response assessment, and clinical decision-making. Auto-segmentation algorithms for pediatric tumors are rare, due to limited data availability, and algorithms have yet to demonstrate clinical translation. Methods: We leveraged two datasets from a national brain tumor consortium (n=184) and a pediatric cancer center (n=100) to develop, externally validate, and clinically benchmark deep learning neural networks for pediatric low-grade glioma (pLGG) segmentation using a novel in-domain, stepwise transfer learning approach. The best model [via Dice similarity coefficient (DSC)] was externally validated and subjected to randomized, blinded evaluation by three expert clinicians, wherein clinicians assessed clinical acceptability of expert- and AI-generated segmentations via 10-point Likert scales and Turing tests. Results: The best AI model utilized in-domain, stepwise transfer learning (median DSC: 0.877 [IQR 0.715-0.914]) versus baseline model (median DSC 0.812 [IQR 0.559-0.888]; p<0.05). On external testing (n=60), the AI model yielded accuracy comparable to inter-expert agreement (median DSC: 0.834 [IQR 0.726-0.901] vs. 0.861 [IQR 0.795-0.905], p=0.13). On clinical benchmarking (n=100 scans, 300 segmentations from 3 experts), the experts rated the AI model higher on average compared to other experts (median Likert rating: 9 [IQR 7-9] vs. 7 [IQR 7-9], p<0.05 for each). Additionally, the AI segmentations had significantly higher (p<0.05) overall acceptability compared to experts on average (80.2% vs. 65.4%). Experts correctly predicted the origins of AI segmentations in an average of 26.0% of cases. Conclusions: Stepwise transfer learning enabled expert-level, automated pediatric brain tumor auto-segmentation and volumetric measurement with a high level of clinical acceptability. 
This approach may enable development and translation of AI imaging segmentation algorithms in limited data scenarios.
ABSTRACT
Purpose: Sarcopenia is an established prognostic factor in patients diagnosed with head and neck squamous cell carcinoma (HNSCC). The quantification of sarcopenia assessed by imaging is typically achieved through the skeletal muscle index (SMI), which can be derived from cervical neck skeletal muscle (SM) segmentation and cross-sectional area. However, manual SM segmentation is labor-intensive, prone to inter-observer variability, and impractical for large-scale clinical use. To overcome this challenge, we have developed and externally validated a fully-automated image-based deep learning (DL) platform for cervical vertebral SM segmentation and SMI calculation, and evaluated the relevance of this with survival and toxicity outcomes. Materials and Methods: 899 patients diagnosed as having HNSCC with CT scans from multiple institutes were included, with 335 cases utilized for training, 96 for validation, 48 for internal testing and 393 for external testing. Ground truth single-slice segmentations of SM at the C3 vertebra level were manually generated by experienced radiation oncologists. To develop an efficient method of segmenting the SM, a multi-stage DL pipeline was implemented, consisting of a 2D convolutional neural network (CNN) to select the middle slice of C3 section and a 2D U-Net to segment SM areas. The model performance was evaluated using the Dice Similarity Coefficient (DSC) as the primary metric for the internal test set, and for the external test set the quality of automated segmentation was assessed manually by two experienced radiation oncologists. The L3 skeletal muscle area (SMA) and SMI were then calculated from the C3 cross sectional area (CSA) of the auto-segmented SM. Finally, established SMI cut-offs were used to perform further analyses to assess the correlation with survival and toxicity endpoints in the external institution with univariable and multivariable Cox regression. 
Results: DSCs for the validation set (n = 96) and internal test set (n = 48) were 0.90 (95% CI: 0.90 - 0.91) and 0.90 (95% CI: 0.89 - 0.91), respectively. The predicted CSA was highly correlated with the ground-truth CSA in both the validation (r = 0.99, p < 0.0001) and test sets (r = 0.96, p < 0.0001). In the external test set (n = 377), 96.2% of the SM segmentations were deemed acceptable by consensus expert review. Predicted SMA and SMI values were highly correlated with the ground-truth values, with Pearson r ≥ 0.99 (p < 0.0001) for both the female and male patients in all datasets. Sarcopenia was associated with worse OS (HR 2.05 [95% CI 1.04 - 4.04], p = 0.04) and longer PEG tube duration (median 162 days vs. 134 days, HR 1.51 [95% CI 1.12 - 2.08], p = 0.006) in multivariate analysis. Conclusion: We developed and externally validated a fully-automated platform whose output strongly correlates with imaging-assessed sarcopenia in patients with H&N cancer and is associated with survival and toxicity outcomes. This study constitutes a significant stride towards the integration of sarcopenia assessment into decision-making for individuals diagnosed with HNSCC. SUMMARY STATEMENT: In this study, we developed and externally validated a deep learning model to investigate the impact of sarcopenia, defined as the loss of skeletal muscle mass, on patients with head and neck squamous cell carcinoma (HNSCC) undergoing radiotherapy. We demonstrated an efficient, fully-automated deep learning pipeline that can accurately segment C3 skeletal muscle area, calculate cross-sectional area, and derive a skeletal muscle index to diagnose sarcopenia from a standard of care CT scan. In multi-institutional data, we found that pre-treatment sarcopenia was associated with significantly reduced overall survival and an increased risk of adverse events. 
Given the increased vulnerability of patients with HNSCC, the assessment of sarcopenia prior to radiotherapy may aid in informed treatment decision-making and serve as a predictive marker for the necessity of early supportive measures.
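The last step of the pipeline, turning a muscle area into a skeletal muscle index and a sarcopenia flag, can be sketched as below. SMI is conventionally the L3 skeletal muscle area divided by height squared. The conversion from the C3 cross-sectional area to an L3-equivalent area relies on a published regression that the abstract does not reproduce, so this sketch takes the L3 area as an input; the cutoff is likewise left as a parameter, and the value used in the example is a placeholder, not the threshold applied in the study.

```python
def skeletal_muscle_index(l3_sma_cm2, height_m):
    """Skeletal muscle index (cm^2/m^2): L3 skeletal muscle area divided by height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return l3_sma_cm2 / height_m ** 2


def is_sarcopenic(smi, cutoff):
    """True when SMI falls below the chosen cutoff (often sex-specific; caller supplies it)."""
    return smi < cutoff


if __name__ == "__main__":
    PLACEHOLDER_CUTOFF = 40.0  # hypothetical value for illustration only
    smi = skeletal_muscle_index(l3_sma_cm2=150.0, height_m=2.0)
    print(smi, is_sarcopenic(smi, PLACEHOLDER_CUTOFF))
```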
ABSTRACT
BACKGROUND: Pretreatment identification of pathological extranodal extension (ENE) would guide therapy de-escalation strategies for human papillomavirus (HPV)-associated oropharyngeal carcinoma but is diagnostically challenging. ECOG-ACRIN Cancer Research Group E3311 was a multicentre trial wherein patients with HPV-associated oropharyngeal carcinoma were treated surgically and assigned to a pathological risk-based adjuvant strategy of observation, radiation, or concurrent chemoradiation. Despite protocol exclusion of patients with overt radiographic ENE, more than 30% had pathological ENE and required postoperative chemoradiation. We aimed to evaluate a CT-based deep learning algorithm for prediction of ENE in E3311, a diagnostically challenging cohort wherein algorithm use would be impactful in guiding decision-making. METHODS: For this retrospective evaluation of deep learning algorithm performance, we obtained pretreatment CTs and corresponding surgical pathology reports from the multicentre, randomised de-escalation trial E3311. All enrolled patients on E3311 required pretreatment and diagnostic head and neck imaging; patients with radiographically overt ENE were excluded per study protocol. The lymph node with largest short-axis diameter and up to two additional nodes were segmented on each scan and annotated for ENE per pathology reports. Deep learning algorithm performance for ENE prediction was compared with four board-certified head and neck radiologists. The primary endpoint was the area under the curve (AUC) of the receiver operating characteristic. FINDINGS: From 178 collected scans, 313 nodes were annotated: 71 (23%) with ENE in general, 39 (13%) with ENE larger than 1 mm. The deep learning algorithm AUC for ENE classification was 0·86 (95% CI 0·82-0·90), outperforming all readers (p<0·0001 for each). Among radiologists, there was high variability in specificity (43-86%) and sensitivity (45-96%) with poor inter-reader agreement (κ 0·32). 
Matching the algorithm specificity to that of the reader with highest AUC (R2, false positive rate 22%) yielded improved sensitivity to 75% (+ 13%). Setting the algorithm false positive rate to 30% yielded 90% sensitivity. The algorithm showed improved performance compared with radiologists for ENE larger than 1 mm (p<0·0001) and in nodes with short-axis diameter 1 cm or larger. INTERPRETATION: The deep learning algorithm outperformed experts in predicting pathological ENE on a challenging cohort of patients with HPV-associated oropharyngeal carcinoma from a randomised clinical trial. Deep learning algorithms should be evaluated prospectively as a treatment selection tool. FUNDING: ECOG-ACRIN Cancer Research Group and the National Cancer Institute of the US National Institutes of Health.
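The two quantities reported above, the AUC of the receiver operating characteristic and the sensitivity obtained once the false positive rate is matched to a reader, can be illustrated with a minimal sketch. This is not the trial's analysis code: the function names `roc_auc` and `sensitivity_at_fpr` and the toy labels and scores are assumptions made up for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
from typing import List, Tuple

def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the chance that a random positive node outscores a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity_at_fpr(labels: List[int], scores: List[float],
                       max_fpr: float) -> Tuple[float, float, float]:
    """Lower the threshold as far as possible while keeping FPR <= max_fpr.

    Returns (threshold, sensitivity, false positive rate) at that operating point.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    best = (float("inf"), 0.0, 0.0)  # threshold above every score: nothing called positive
    for t in sorted(set(scores), reverse=True):
        fpr = sum(s >= t for s in neg) / len(neg)
        if fpr > max_fpr:
            break  # FPR only grows as the threshold drops further
        sens = sum(s >= t for s in pos) / len(pos)
        best = (t, sens, fpr)
    return best

# Hypothetical toy data: 1 = ENE present on pathology, scores = model outputs.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)
threshold, sensitivity, fpr = sensitivity_at_fpr(labels, scores, max_fpr=0.25)
```

Fixing the false positive rate first and then reading off sensitivity is what makes a model and a human reader comparable at the same operating point.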
Subjects
Carcinoma , Deep Learning , Oropharyngeal Neoplasms , Papillomavirus Infections , Humans , Human Papillomavirus Viruses , Retrospective Studies , Papillomavirus Infections/diagnostic imaging , Papillomavirus Infections/complications , Extranodal Extension , Oropharyngeal Neoplasms/diagnostic imaging , Oropharyngeal Neoplasms/pathology , Algorithms , Carcinoma/complications , Tomography, X-Ray Computed
ABSTRACT
Importance: Sarcopenia is an established prognostic factor in patients with head and neck squamous cell carcinoma (HNSCC); the quantification of sarcopenia assessed by imaging is typically achieved through the skeletal muscle index (SMI), which can be derived from cervical skeletal muscle segmentation and cross-sectional area. However, manual muscle segmentation is labor intensive, prone to interobserver variability, and impractical for large-scale clinical use. Objective: To develop and externally validate a fully automated image-based deep learning platform for cervical vertebral muscle segmentation and SMI calculation and evaluate associations with survival and treatment toxicity outcomes. Design, Setting, and Participants: For this prognostic study, a model development data set was curated from publicly available and deidentified data from patients with HNSCC treated at MD Anderson Cancer Center between January 1, 2003, and December 31, 2013. A total of 899 patients undergoing primary radiation for HNSCC with abdominal computed tomography scans and complete clinical information were selected. An external validation data set was retrospectively collected from patients undergoing primary radiation therapy between January 1, 1996, and December 31, 2013, at Brigham and Women's Hospital. The data analysis was performed between May 1, 2022, and March 31, 2023. Exposure: C3 vertebral skeletal muscle segmentation during radiation therapy for HNSCC. Main Outcomes and Measures: Overall survival and treatment toxicity outcomes of HNSCC. Results: The total patient cohort comprised 899 patients with HNSCC (median [range] age, 58 [24-90] years; 140 female [15.6%] and 755 male [84.0%]). Dice similarity coefficients for the validation set (n = 96) and internal test set (n = 48) were 0.90 (95% CI, 0.90-0.91) and 0.90 (95% CI, 0.89-0.91), respectively, with a mean 96.2% acceptable rate between 2 reviewers on external clinical testing (n = 377). 
Estimated cross-sectional area and SMI values were associated with manually annotated values (Pearson r = 0.99; P < .001) across data sets. On multivariable Cox proportional hazards regression, SMI-derived sarcopenia was associated with worse overall survival (hazard ratio, 2.05; 95% CI, 1.04-4.04; P = .04) and longer feeding tube duration (median [range], 162 [6-1477] vs 134 [15-1255] days; hazard ratio, 0.66; 95% CI, 0.48-0.89; P = .006) than no sarcopenia. Conclusions and Relevance: This prognostic study's findings show external validation of a fully automated deep learning pipeline to accurately measure sarcopenia in HNSCC and an association with important disease outcomes. The pipeline could enable the integration of sarcopenia assessment into clinical decision making for individuals with HNSCC.
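For readers unfamiliar with the segmentation metric, the Dice similarity coefficient between an automated and a manual mask is 2|A∩B| / (|A| + |B|). The sketch below shows that calculation together with a cross-sectional area derived from the mask. It is illustrative only: the function names, the toy 3×3 masks, and the 1 mm pixel spacing are assumptions, and the study's conversion from C3 muscle area to SMI is not reproduced here.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # 2D binary mask, 1 = muscle pixel

def _flatten(mask: Mask) -> List[bool]:
    return [bool(v) for row in mask for v in row]

def dice_coefficient(mask_a: Mask, mask_b: Mask) -> float:
    """Dice similarity: 2 * |A and B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    a, b = _flatten(mask_a), _flatten(mask_b)
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * intersection / total

def cross_sectional_area_cm2(mask: Mask, spacing_x_mm: float, spacing_y_mm: float) -> float:
    """Area of the segmented region: pixel count times pixel area, converted from mm^2 to cm^2."""
    return sum(_flatten(mask)) * spacing_x_mm * spacing_y_mm / 100.0

# Hypothetical toy masks standing in for a manual and an automated segmentation.
manual_mask = [[0, 1, 1],
               [0, 1, 1],
               [0, 0, 0]]
auto_mask = [[0, 1, 1],
             [0, 1, 0],
             [0, 1, 0]]

dice = dice_coefficient(manual_mask, auto_mask)
area = cross_sectional_area_cm2(auto_mask, spacing_x_mm=1.0, spacing_y_mm=1.0)
```

A Dice value of 1.0 means perfect overlap and 0.0 means none, so the reported 0.90 indicates close agreement between automated and manual segmentations.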
Subjects
Deep Learning , Head and Neck Neoplasms , Sarcopenia , Humans , Male , Female , Middle Aged , Squamous Cell Carcinoma of Head and Neck/diagnostic imaging , Retrospective Studies , Sarcopenia/diagnostic imaging , Sarcopenia/complications , Head and Neck Neoplasms/complications , Head and Neck Neoplasms/diagnostic imaging
ABSTRACT
Lean muscle mass (LMM) is an important aspect of human health. Temporalis muscle thickness is a promising LMM marker but has had limited utility due to its unknown normal growth trajectory and reference ranges and lack of standardized measurement. Here, we develop an automated deep learning pipeline to accurately measure temporalis muscle thickness (iTMT) from routine brain magnetic resonance imaging (MRI). We apply iTMT to 23,876 MRIs of healthy subjects, ages 4 through 35, and generate sex-specific iTMT normal growth charts with percentiles. We find that iTMT was associated with specific physiologic traits, including caloric intake, physical activity, sex hormone levels, and presence of malignancy. We validate iTMT across multiple demographic groups and in children with brain tumors and demonstrate feasibility for individualized longitudinal monitoring. The iTMT pipeline provides unprecedented insights into temporalis muscle growth during human development and enables the use of LMM tracking to inform clinical decision-making.
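How a growth chart with percentiles might be used for an individual measurement can be sketched as follows. This is a deliberately simple empirical percentile rank within a sex and age group, not the modelling approach of the study, which the abstract does not describe. The function name `percentile_rank`, the grouping key, and all thickness values are hypothetical.

```python
from typing import Dict, List, Tuple

def percentile_rank(reference: List[float], value: float) -> float:
    """Percent of reference values below `value`, counting ties as half."""
    if not reference:
        raise ValueError("reference group is empty")
    below = sum(r < value for r in reference)
    equal = sum(r == value for r in reference)
    return 100.0 * (below + 0.5 * equal) / len(reference)

# Hypothetical reference thicknesses in mm, keyed by (sex, age in years).
reference_by_group: Dict[Tuple[str, int], List[float]] = {
    ("F", 10): [5.0, 6.0, 6.5, 7.0, 8.0],
    ("M", 10): [5.5, 6.5, 7.0, 7.5, 8.5],
}

# Where does a hypothetical 10-year-old girl with a 6.5 mm measurement fall?
rank = percentile_rank(reference_by_group[("F", 10)], 6.5)
```

Repeating the lookup at successive ages for the same individual gives the kind of longitudinal percentile tracking the abstract refers to.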