ABSTRACT
PURPOSE: Attenuation correction and scatter compensation (AC/SC) are two main steps toward quantitative PET imaging, which remain challenging in PET-only and PET/MRI systems. These can be effectively tackled via deep learning (DL) methods. However, trustworthy and generalizable DL models commonly require well-curated, heterogeneous, and large datasets from multiple clinical centers. At the same time, owing to legal/ethical issues and privacy concerns, forming a large collective, centralized dataset poses significant challenges. In this work, we aimed to develop a DL-based model in a multicenter setting, without direct sharing of data, using federated learning (FL) for AC/SC of PET images. METHODS: Non-attenuation/scatter-corrected and CT-based attenuation/scatter-corrected (CT-ASC) 18F-FDG PET images of 300 patients were enrolled in this study. The dataset came from 6 different centers, each contributing 50 patients, with scanner, image acquisition, and reconstruction protocols varying across the centers. CT-based ASC PET images served as the standard reference. All images were reviewed to include only high-quality, artifact-free PET images. Both corrected and uncorrected PET images were converted to standardized uptake values (SUVs). We used a modified nested U-Net utilizing residual U-blocks in a U-shaped architecture. We evaluated two FL models, namely sequential (FL-SQ) and parallel (FL-PL), and compared their performance with a baseline centralized (CZ) learning model, wherein the data were pooled on one server, as well as with center-based (CB) models, in which a separate model was built and evaluated for each center. Data from each center were divided into training (30 patients), validation (10 patients), and test (10 patients) sets. Final evaluations and reports were performed on 60 patients (10 patients from each center). RESULTS: In terms of percent SUV absolute relative error (ARE%), both the FL-SQ (CI: 12.21-14.81%) and FL-PL (CI: 11.82-13.84%) models demonstrated excellent agreement with the centralized framework (CI: 10.32-12.00%), while FL-based algorithms improved model performance by over 11% compared to the CB training strategy (CI: 22.34-26.10%). Furthermore, the Mann-Whitney test between different strategies revealed no significant differences between CZ and FL-based algorithms (p-value > 0.05) in center-categorized mode. At the same time, a significant difference was observed between the different training approaches on the overall dataset (p-value < 0.05). In addition, voxel-wise comparison with respect to the reference CT-ASC exhibited similar performance for images predicted by CZ (R2 = 0.94), FL-SQ (R2 = 0.93), and FL-PL (R2 = 0.92), while the CB model achieved a far lower coefficient of determination (R2 = 0.74). Despite the strong correlations between CZ and FL-based methods and the reference CT-ASC, a slight underestimation of predicted voxel values was observed. CONCLUSION: Deep learning-based models provide promising results toward quantitative PET image reconstruction. Specifically, we developed two FL models and compared their performance with center-based and centralized models. The proposed FL-based models achieved higher performance than center-based models and performance comparable with centralized models. Our work provides strong empirical evidence that the FL framework can fully benefit from the generalizability and robustness of DL models used for AC/SC in PET, while obviating the need for direct sharing of datasets between clinical imaging centers.
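As a rough illustration of how the two FL strategies compared above differ, the sketch below contrasts parallel aggregation (FedAvg-style averaging of locally updated weights) with sequential, center-to-center weight transfer. It is a toy example using numpy vectors in place of network weights; local_update is a hypothetical placeholder, not the authors' training code.

```python
# Minimal sketch (not the authors' code): parallel federated averaging (FL-PL)
# versus sequential/cyclic weight transfer (FL-SQ), with numpy vectors standing
# in for network weights.
import numpy as np

rng = np.random.default_rng(0)
n_centers, n_params = 6, 10           # six centers, toy parameter vector

def local_update(weights, center_id):
    """Placeholder for one round of local training at a center (assumption)."""
    return weights - 0.1 * rng.normal(size=weights.shape)  # fake gradient step

def fl_parallel(global_w, rounds=5):
    """FL-PL: every center starts from the same global weights; server averages."""
    for _ in range(rounds):
        local_ws = [local_update(global_w.copy(), c) for c in range(n_centers)]
        global_w = np.mean(local_ws, axis=0)                # FedAvg-style aggregation
    return global_w

def fl_sequential(global_w, rounds=5):
    """FL-SQ: weights are passed from center to center in a fixed order."""
    for _ in range(rounds):
        for c in range(n_centers):
            global_w = local_update(global_w, c)            # cyclic weight transfer
    return global_w

w0 = np.zeros(n_params)
print("parallel:  ", fl_parallel(w0)[:3])
print("sequential:", fl_sequential(w0)[:3])
```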
Subjects
Deep Learning, Computer-Assisted Image Processing, Humans, Computer-Assisted Image Processing/methods, Positron Emission Tomography Computed Tomography, Positron Emission Tomography/methods, Magnetic Resonance Imaging/methods
ABSTRACT
PURPOSE: Image artefacts continue to pose challenges in clinical molecular imaging, resulting in misdiagnoses, additional radiation doses to patients and financial costs. Mismatch and halo artefacts occur frequently in whole-body PET/CT imaging with gallium-68 (68Ga)-labelled compounds. Correcting for these artefacts is not straightforward and requires algorithmic developments, given that conventional techniques have failed to address them adequately. In the current study, we employed differential privacy-preserving federated transfer learning (FTL) to manage clinical data sharing and tackle privacy issues for building centre-specific models that detect and correct artefacts present in PET images. METHODS: Altogether, 1413 patients with 68Ga prostate-specific membrane antigen (PSMA)/DOTA-TATE (TOC) PET/CT scans from 3 countries, including 8 different centres, were enrolled in this study. CT-based attenuation and scatter correction (CT-ASC) was used in all centres for quantitative PET reconstruction. Prior to model training, an experienced nuclear medicine physician reviewed all images to ensure the use of high-quality, artefact-free PET images (421 patients' images). A deep neural network (modified U2Net) was trained on 80% of the artefact-free PET images using centre-based (CeBa), centralized (CeZe) and the proposed differential privacy FTL frameworks. Quantitative analysis was performed on 20% of the clean data (with no artefacts) in each centre. A panel of two nuclear medicine physicians conducted qualitative assessment of image quality, diagnostic confidence and image artefacts in 128 patients with artefacts (256 images for CT-ASC and FTL-ASC). RESULTS: The three approaches investigated in this study for 68Ga-PET imaging (CeBa, CeZe and FTL) resulted in a mean absolute error (MAE) of 0.42 ± 0.21 (CI 95%: 0.38 to 0.47), 0.32 ± 0.23 (CI 95%: 0.27 to 0.37) and 0.28 ± 0.15 (CI 95%: 0.25 to 0.31), respectively. Statistical analysis using the Wilcoxon test revealed significant differences between the three approaches, with FTL outperforming CeBa and CeZe (p-value < 0.05) on the clean test set. The qualitative assessment demonstrated that FTL-ASC significantly improved image quality and diagnostic confidence and decreased image artefacts, compared to CT-ASC in 68Ga-PET imaging. In addition, mismatch and halo artefacts were successfully detected and disentangled in the chest, abdomen and pelvic regions in 68Ga-PET imaging. CONCLUSION: The proposed approach benefits from using large datasets from multiple centres while preserving patient privacy. Qualitative assessment by nuclear medicine physicians showed that the proposed model correctly addressed two main challenging artefacts in 68Ga-PET imaging. This technique could be integrated into clinical practice for 68Ga-PET imaging artefact detection and disentanglement using multicentric heterogeneous datasets.
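The following toy sketch illustrates the general idea behind differentially private aggregation of client updates (clip each update's norm, then add Gaussian noise before averaging). The clip norm and noise multiplier are arbitrary assumptions for illustration, not the privacy parameters used in this study.

```python
# Minimal sketch (illustrative only): differentially private aggregation of
# client model updates via per-client L2 clipping and Gaussian noise.
import numpy as np

def dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    rng = rng or np.random.default_rng(0)
    clipped = []
    for u in updates:
        scale = min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))  # per-client clipping
        clipped.append(u * scale)
    mean_update = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(updates),
                       size=mean_update.shape)                      # Gaussian mechanism
    return mean_update + noise

client_updates = [np.random.default_rng(i).normal(size=8) for i in range(8)]
print(dp_aggregate(client_updates))
```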
Subjects
Positron Emission Tomography Computed Tomography, Prostatic Neoplasms, Male, Humans, Positron Emission Tomography Computed Tomography/methods, Artifacts, Gallium Radioisotopes, Privacy, Positron Emission Tomography/methods, Machine Learning, Computer-Assisted Image Processing/methods
ABSTRACT
In this study, an inter-fraction organ deformation simulation framework for locally advanced cervical cancer (LACC), which accounts for anatomical flexibility, rigidity, and motion within an image deformation, was proposed. Data included 57 CT scans (7202 2D slices) of patients with LACC, randomly divided into training (n = 42) and test (n = 15) datasets. In addition to the CT images and the corresponding RT structures (bladder, cervix, and rectum), the bone was segmented and the treatment couches were removed. A correlated stochastic field of the same size as the target image (used for deformation) was simulated to produce a general random deformation. The deformation field was optimized to have a maximum amplitude in the rectum region, a moderate amplitude in the bladder region, and as low an amplitude as possible within bony structures. DIRNet, a convolutional neural network consisting of convolutional regressor, spatial transformation, and resampling blocks, was implemented with different parameters. Mean Dice indices of 0.89 ± 0.02, 0.96 ± 0.01, and 0.93 ± 0.02 were obtained for the cervix, bladder, and rectum (defined as organs at risk), respectively. Furthermore, mean average symmetric surface distances of 1.61 ± 0.46 mm for the cervix, 1.17 ± 0.15 mm for the bladder, and 1.06 ± 0.42 mm for the rectum were achieved. In addition, mean Jaccard indices of 0.86 ± 0.04 for the cervix, 0.93 ± 0.01 for the bladder, and 0.88 ± 0.04 for the rectum were observed on the test dataset (15 subjects). Deep learning-based non-rigid image registration is therefore proposed for inter-fraction high-dose-rate brachytherapy of cervical cancer, since it outperformed conventional algorithms.
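For reference, the overlap metrics reported above can be computed as in the minimal sketch below, assuming binary numpy masks for the predicted and reference organ contours.

```python
# Minimal sketch: Dice and Jaccard indices for a predicted vs. reference
# binary segmentation mask (toy 3D arrays).
import numpy as np

def dice(pred, ref):
    pred, ref = pred.astype(bool), ref.astype(bool)
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum())

def jaccard(pred, ref):
    pred, ref = pred.astype(bool), ref.astype(bool)
    inter = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    return inter / union

ref = np.zeros((32, 32, 32), dtype=bool); ref[8:24, 8:24, 8:24] = True
pred = np.zeros_like(ref);                pred[10:26, 8:24, 8:24] = True
print(f"Dice={dice(pred, ref):.3f}, Jaccard={jaccard(pred, ref):.3f}")
```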
Subjects
Brachytherapy, Deep Learning, Uterine Cervical Neoplasms, Female, Humans, Brachytherapy/methods, Radiotherapy Dosage, Computer-Assisted Radiotherapy Planning/methods, Rectum, Uterine Cervical Neoplasms/diagnostic imaging, Uterine Cervical Neoplasms/radiotherapy
ABSTRACT
Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations on compiling test datasets. We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries? The recommendations are intended to help AI developers demonstrate the utility of their products and to help pathologists and regulatory agencies verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.
Subjects
Artificial Intelligence, Pathology, Humans, Forecasting, Datasets as Topic
ABSTRACT
[Figure: see text].
Subjects
COVID-19/complications, Intracranial Hemorrhages/complications, Ischemic Stroke/complications, Intracranial Sinus Thrombosis/complications, Venous Thrombosis/complications, Adult, Aged, COVID-19/epidemiology, Female, Geography, Health Expenditures, Humans, International Cooperation, Intracranial Hemorrhages/epidemiology, Ischemic Stroke/epidemiology, Male, Middle Aged, Prospective Studies, Risk, Intracranial Sinus Thrombosis/epidemiology, Treatment Outcome, Venous Thrombosis/epidemiology, Young Adult
ABSTRACT
(1) Background: Atrial fibrillation (AF) is a major risk factor for stroke and is often underdiagnosed, despite being present in 13-26% of ischemic stroke patients. Recently, a significant number of machine learning (ML)-based models have been proposed for AF prediction and detection for primary and secondary stroke prevention. However, clinical translation of these technological innovations to close the AF care gap has been scant. Herein, we sought to systematically examine studies employing ML models to predict incident AF in a population without prior AF or to detect paroxysmal AF in stroke cohorts, to identify key reasons for the lack of translation into the clinical workflow. We conclude with a set of recommendations to improve the clinical translatability of ML-based models for AF. (2) Methods: MEDLINE, Embase, Web of Science, Clinicaltrials.gov, and ICTRP databases were searched for relevant articles from the inception of the databases up to September 2022 to identify peer-reviewed articles in English that used ML methods to predict incident AF or detect AF after stroke and reported adequate performance metrics. The search yielded 2815 articles, of which 16 studies using ML models to predict incident AF and three studies focusing on ML models to detect AF post-stroke were included. (3) Conclusions: This study highlights that (1) many models utilized only a limited subset of variables available from patients' health records; (2) only 37% of models were externally validated, and stratified analysis was often lacking; (3) 0% of models and 53% of datasets were explicitly made available, limiting reproducibility and transparency; and (4) data pre-processing did not include bias mitigation or sufficient detail, leading to potential selection bias. Low generalizability, high false alarm rate, and lack of interpretability were identified as additional factors to be addressed before ML models can be widely deployed in the clinical care setting. Given these limitations, our recommendations to improve the uptake of ML models for better AF outcomes include improving generalizability, reducing potential systemic biases, and investing in external validation studies whilst developing a transparent modeling pipeline to ensure reproducibility.
ABSTRACT
BACKGROUND: Deep learning is a promising way to improve health care. Image-processing medical disciplines, such as pathology, are expected to be transformed by deep learning. The first clinically applicable deep-learning diagnostic support tools are already available in cancer pathology, and their number is increasing. However, data on the environmental sustainability of these tools are scarce. We aimed to conduct an environmental-sustainability analysis of a theoretical implementation of deep learning in patient-care pathology. METHODS: For this modelling study, we first assembled and calculated relevant data and parameters of a digital-pathology workflow. Data were breast and prostate specimens from the university clinic at the Institute of Pathology of the Rheinisch-Westfälische Technische Hochschule Aachen (Aachen, Germany), for which commercially available deep-learning models already existed. Only specimens collected between Jan 1 and Dec 31, 2019, were used, to avoid potential biases due to the COVID-19 pandemic. Our final selection was based on two representative weeks outside holiday periods, covering different types of specimens. To calculate carbon dioxide (CO2) or CO2 equivalent (CO2 eq) emissions of deep learning in pathology, we gathered relevant data for exact numbers and sizes of whole-slide images (WSIs), which were generated by scanning histopathology samples of prostate and breast specimens. We also evaluated different data input scenarios (including all slide tiles, only tiles containing tissue, or only tiles containing regions of interest). To convert estimated energy consumption from kWh to CO2 eq, we used the internet protocol address of the computational server and the Electricity Maps database to obtain information on the sources of the local electricity grid (ie, renewable vs non-renewable), and estimated the number of trees and proportion of the local and world's forests needed to sequester the CO2 eq emissions. We calculated the computational requirements and CO2 eq emissions of 30 deep-learning models that varied in task and size. The first scenario represented the use of one commercially available deep-learning model for one task in one case (1-task); the second scenario considered two deep-learning models for two tasks per case (2-task); the third scenario represented a future, potentially automated workflow that could handle 7 tasks per case (7-task); and the fourth scenario represented the use of a single potential, large, computer-vision model that could conduct multiple tasks (multitask). We also compared the performance (ie, accuracy) and CO2 eq emissions of different deep-learning models for the classification of renal cell carcinoma on WSIs, also from Rheinisch-Westfälische Technische Hochschule Aachen. We also tested other approaches to reducing CO2 eq emissions, including model pruning and an alternative method for histopathology analysis (pathomics). FINDINGS: The pathology database contained 35 552 specimens (237 179 slides), 6420 of which were prostate specimens (10 115 slides) and 11 801 of which were breast specimens (19 763 slides). We selected and subsequently digitised 140 slides from eight breast-cancer cases and 223 slides from five prostate-cancer cases.
Applying large deep-learning models on all WSI tiles of prostate and breast pathology cases would result in yearly CO2 eq emissions of 7·65 metric tons (t; 95% CI 7·62-7·68) with the use of a single deep-learning model per case; yearly CO2 eq emissions were up to 100·56 t (100·21-100·99) with the use of seven deep-learning models per case. CO2 eq emissions for different deep-learning model scenarios, data inputs, and deep-learning model sizes for all slides varied from 3·61 t (3·59-3·63) to 2795·30 t (1177·51-6482·13). For the estimated number of overall pathology cases worldwide, the yearly CO2 eq emissions varied, reaching up to 16 megatons (Mt) of CO2 eq, requiring up to 86 590 km2 (0·22%) of the world's forest to sequester the CO2 eq emissions. Use of the 7-task scenario and small deep-learning models on slides containing tissue only could substantially reduce worldwide CO2 eq emissions, by a factor of up to 141 (0·1 Mt, 95% CI 0·1-0·1). Considering the local environment in Aachen, Germany, the maximum CO2 eq emission from the use of deep learning in digital pathology alone would require 32·8% (95% CI 13·8-76·6) of the local forest to sequester the CO2 eq emissions. A single pathomics run on a tissue sample could provide information that was comparable to or even better than the output of multitask deep-learning models, but with 147-times lower CO2 eq emissions. INTERPRETATION: Our findings suggest that widespread use of deep learning in pathology might have considerable global-warming potential. The medical community, policy decision makers, and the public should be aware of this potential and encourage the use of CO2 eq emissions reduction strategies where possible. FUNDING: German Research Foundation, European Research Council, German Federal Ministry of Education and Research, Health, Economic Affairs and Climate Action, and the Innovation Fund of the Federal Joint Committee.
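The kind of conversion chain described above (energy in kWh to CO2 eq to the forest area needed for sequestration) can be illustrated with the back-of-the-envelope sketch below; all constants are rough, assumed values for illustration, not the figures used in the study.

```python
# Back-of-the-envelope sketch: inference energy (kWh) -> CO2 eq emissions ->
# forest area needed to sequester them. All constants are illustrative
# assumptions.
KWH_PER_SLIDE = 0.05           # assumed energy per whole-slide inference (kWh)
SLIDES_PER_YEAR = 30_000       # assumed annual slide volume for one department
GRID_KG_CO2_PER_KWH = 0.4      # assumed grid emission factor (kg CO2 eq / kWh)
SEQUESTRATION_T_PER_KM2 = 500  # assumed forest sequestration (t CO2 eq / km2 / year)

energy_kwh = KWH_PER_SLIDE * SLIDES_PER_YEAR
co2_eq_t = energy_kwh * GRID_KG_CO2_PER_KWH / 1000.0   # kg -> metric tons
forest_km2 = co2_eq_t / SEQUESTRATION_T_PER_KM2
print(f"{energy_kwh:.0f} kWh/year -> {co2_eq_t:.2f} t CO2 eq/year "
      f"-> {forest_km2:.4f} km2 of forest to sequester")
```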
Subjects
Deep Learning, Greenhouse Gases, Neoplasms, Humans, Greenhouse Gases/analysis, Carbon Dioxide/analysis, Pandemics
ABSTRACT
Background: Self-management among stroke survivors is effective in mitigating the risk of a recurrent stroke. This study aims to determine the prevalence of self-management and its associated factors among stroke survivors in the United States. Methods: We analyzed the Behavioral Risk Factor Surveillance System (BRFSS) data from 2016 to 2021, a nationally representative health survey. A new outcome variable, stroke self-management (SSM = low or SSM = high), was defined based on five AHA guideline-recommended self-management practices, including regular physical activity, maintaining a healthy body mass index, regular doctor checkups, smoking cessation, and limiting alcohol consumption. A low level of self-management was defined as adherence to three or fewer practices. Results: Among 95,645 American stroke survivors, 46.7% have low self-management. Stroke survivors aged under 65 are less likely to self-manage (low SSM: 56.8% vs. 42.3%; p < 0.0001). Black survivors are less likely to self-manage than non-Hispanic White survivors (low SSM: 52.0% vs. 48.6%; p < 0.0001); however, when adjusted for demographic and clinical factors, the difference dissipated. Higher education and income levels are associated with better self-management (OR: 2.49, [95%CI: 2.16-2.88] and OR: 1.45, [95%CI: 1.26-1.67], respectively). Further sub-analysis revealed that women are less likely to be physically active (OR: 0.88, [95%CI: 0.81-0.95]) but more likely to manage their alcohol consumption (OR: 1.57, [95%CI: 1.29-1.92]). Stroke survivors residing in the Stroke Belt did not self-manage as well as their counterparts (low SSM: 53.1% vs. 48.0%; p < 0.001). Conclusions: The substantial diversity in self-management practices emphasizes the need for tailored interventions. In particular, multi-modal interventions should be targeted toward specific populations, including younger stroke survivors with lower education and income.
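A minimal sketch of how the derived outcome could be coded from the five practices listed above is shown below; the variable names are hypothetical rather than actual BRFSS field names.

```python
# Minimal sketch: derive the binary self-management outcome from five practice
# indicators; adherence to three or fewer practices is coded as low.
practices = ["physical_activity", "healthy_bmi", "regular_checkup",
             "no_smoking", "limited_alcohol"]

def stroke_self_management(record):
    """record: dict of practice -> bool; returns 'low' or 'high'."""
    n_adherent = sum(bool(record[p]) for p in practices)
    return "low" if n_adherent <= 3 else "high"

survivor = {"physical_activity": True, "healthy_bmi": False,
            "regular_checkup": True, "no_smoking": True,
            "limited_alcohol": False}
print(stroke_self_management(survivor))   # -> 'low' (3 of 5 practices)
```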
ABSTRACT
BACKGROUND: PET/CT images combining anatomic and metabolic data provide complementary information that can improve clinical task performance. PET image segmentation algorithms exploiting the available multi-modal information are still lacking. PURPOSE: Our study aimed to assess the performance of PET and CT image fusion for gross tumor volume (GTV) segmentation of head and neck cancers (HNCs) utilizing conventional, deep learning (DL), and output-level voting-based fusions. METHODS: The current study is based on a total of 328 histologically confirmed HNCs from six different centers. The images were automatically cropped to a 200 × 200 head and neck region box, and CT and PET images were normalized for further processing. Eighteen conventional image-level fusions were implemented. In addition, a modified U2-Net architecture was used as the baseline DL fusion model. Three different levels of information fusion (input, layer, and decision) were used. Simultaneous truth and performance level estimation (STAPLE) and majority voting were employed to merge the different segmentation outputs (from PET and from image-level and network-level fusions), that is, output-level information fusion (voting-based fusion). The different networks were trained in a 2D manner with a batch size of 64. Twenty percent of the dataset, stratified by center (20% from each center), was used for final result reporting. Different standard segmentation metrics and conventional PET metrics, such as SUV, were calculated. RESULTS: Among the single modalities, PET had a reasonable performance with a Dice score of 0.77 ± 0.09, while CT did not perform acceptably and reached a Dice score of only 0.38 ± 0.22. Conventional fusion algorithms obtained a Dice score range of [0.76-0.81], with guided-filter-based context enhancement (GFCE) at the low end, and anisotropic diffusion and Karhunen-Loeve transform fusion (ADF), multi-resolution singular value decomposition (MSVD), and multi-level image decomposition based on latent low-rank representation (MDLatLRR) at the high end. All DL fusion models achieved Dice scores of 0.80. Output-level voting-based models outperformed all other models, achieving superior results with a Dice score of 0.84 for Majority_ImgFus, Majority_All, and Majority_Fast. A mean error of almost zero was achieved for all fusions using SUVpeak, SUVmean, and SUVmedian. CONCLUSION: PET/CT information fusion adds significant value to segmentation tasks, considerably outperforming PET-only and CT-only methods. In addition, both conventional image-level and DL fusions achieve competitive results. Meanwhile, output-level voting-based fusion using majority voting of several algorithms results in statistically significant improvements in the segmentation of HNC.
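The output-level (voting-based) fusion idea can be illustrated with the short sketch below, which performs voxel-wise majority voting over binary masks; the masks here are random toy arrays, not study outputs.

```python
# Minimal sketch: voxel-wise majority voting over binary segmentation masks
# produced by different models/fusions.
import numpy as np

def majority_vote(masks):
    """masks: list of equally shaped binary arrays -> fused binary mask."""
    stack = np.stack([m.astype(np.uint8) for m in masks], axis=0)
    votes = stack.sum(axis=0)
    return votes * 2 > len(masks)   # strict majority per voxel

rng = np.random.default_rng(1)
masks = [rng.random((64, 64)) > 0.5 for _ in range(5)]
fused = majority_vote(masks)
print(fused.shape, fused.mean())
```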
Subjects
Head and Neck Neoplasms, Positron Emission Tomography Computed Tomography, Humans, Positron Emission Tomography Computed Tomography/methods, Algorithms, Head and Neck Neoplasms/diagnostic imaging, Computer-Assisted Image Processing/methods
ABSTRACT
BACKGROUND: Notwithstanding the encouraging results of previous studies reporting on the efficiency of deep learning (DL) in COVID-19 prognostication, clinical adoption of the developed methodology still needs to be improved. To overcome this limitation, we set out to predict the prognosis of a large multi-institutional cohort of patients with COVID-19 using a DL-based model. PURPOSE: This study aimed to evaluate the performance of deep privacy-preserving federated learning (DPFL) in predicting COVID-19 outcomes using chest CT images. METHODS: After applying inclusion and exclusion criteria, 3055 patients from 19 centers, including 1599 alive and 1456 deceased, were enrolled in this study. Data from all centers were split (randomly, stratified by center and class) into a training/validation set (70%/10%) and a hold-out test set (20%). For the DL model, feature extraction was performed on 2D slices, and averaging was performed at the final layer to construct a 3D model for each scan. The DenseNet model was used for feature extraction. The model was developed using centralized and FL approaches; for FL, we employed the DPFL approach. Membership inference attacks were also evaluated in the FL strategy. For model evaluation, different metrics were reported on the hold-out test set. In addition, the models trained in the two scenarios, centralized and FL, were compared using the DeLong test for statistical differences. RESULTS: The centralized model achieved an accuracy of 0.76, while the DPFL model had an accuracy of 0.75. Both the centralized and DPFL models achieved a specificity of 0.77. The centralized model achieved a sensitivity of 0.74, while the DPFL model had a sensitivity of 0.73. Mean AUCs of 0.82 and 0.81, with 95% confidence intervals of (95% CI: 0.79-0.85) and (95% CI: 0.77-0.84), were achieved by the centralized model and the DPFL model, respectively. The DeLong test did not reveal statistically significant differences between the two models (p-value = 0.98). The AUC values for the inference attacks fluctuated between 0.49 and 0.51, with an average of 0.50 ± 0.003 and a 95% CI for the mean AUC of 0.500 to 0.501. CONCLUSION: The performance of the proposed model was comparable to that of centralized models while operating on large and heterogeneous multi-institutional datasets. In addition, the model was resistant to inference attacks, ensuring the privacy of shared data during the training process.
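The slice-level strategy described above (2D feature extraction followed by averaging to obtain a scan-level representation) is sketched below with toy data; extract_2d_features is a hypothetical stand-in for the DenseNet backbone.

```python
# Minimal sketch: apply a 2D feature extractor to each CT slice, then average
# the feature vectors to obtain one representation for the whole scan.
import numpy as np

rng = np.random.default_rng(0)

def extract_2d_features(slice_2d):
    """Placeholder for a pretrained 2D CNN feature extractor (assumption)."""
    return np.array([slice_2d.mean(), slice_2d.std(), slice_2d.max()])

def scan_representation(volume):
    """volume: (n_slices, H, W) -> averaged feature vector for the whole scan."""
    feats = np.stack([extract_2d_features(s) for s in volume], axis=0)
    return feats.mean(axis=0)            # averaging at the final layer

ct_scan = rng.normal(size=(120, 64, 64))  # toy 3D scan
print(scan_representation(ct_scan))
```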
Subjects
COVID-19, Deep Learning, X-Ray Computed Tomography, COVID-19/diagnostic imaging, Humans, Prognosis, Male, Female, Aged, Middle Aged, Privacy, Thoracic Radiography, Datasets as Topic
ABSTRACT
Anomaly detection is challenging, especially for large datasets in high dimensions. Here, we explore a general anomaly detection framework based on dimensionality reduction and unsupervised clustering. DRAMA is released as a general-purpose Python package that implements this framework with a wide range of built-in options. The approach identifies the primary prototypes in the data, with anomalies detected by their large distances from the prototypes, either in the latent space or in the original, high-dimensional space. DRAMA is tested on a wide variety of simulated and real datasets, in up to 3000 dimensions, and is found to be robust and highly competitive with commonly used anomaly detection algorithms, especially in high dimensions. The flexibility of the DRAMA framework allows for significant optimization once some examples of anomalies are available, making it ideal for online anomaly detection, active learning, and highly unbalanced datasets. In addition, DRAMA naturally provides clustering of outliers for subsequent analysis.
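A minimal sketch of the general prototype-distance idea (not the DRAMA package API) is shown below: reduce dimensionality, find cluster prototypes, and score each point by its distance to the nearest prototype.

```python
# Minimal sketch: dimensionality reduction + clustering prototypes, with the
# anomaly score taken as the distance to the nearest prototype.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
X[:5] += 8.0                                   # inject a few obvious outliers

Z = PCA(n_components=5, random_state=0).fit_transform(X)   # latent space
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Z)
dist_to_prototypes = km.transform(Z)           # distances to each prototype
scores = dist_to_prototypes.min(axis=1)        # anomaly score
print("top-5 anomaly indices:", np.argsort(scores)[-5:])
```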
ABSTRACT
BACKGROUND AND OBJECTIVE: Generalizable and trustworthy deep learning models for PET/CT image segmentation necessitate large, diverse, multi-institutional datasets. However, legal, ethical, and patient privacy issues challenge the sharing of datasets between different centers. To overcome these challenges, we developed a federated learning (FL) framework for multi-institutional PET/CT image segmentation. METHODS: A dataset consisting of 328 head and neck (HN) cancer patients who underwent clinical PET/CT examinations, gathered from six different centers, was enrolled. A pure transformer network was implemented as the core segmentation algorithm using dual-channel PET/CT images. We evaluated different frameworks (single center-based, centralized baseline, as well as seven different FL algorithms) using 68 PET/CT images (20% of each center's data). In particular, the implemented FL algorithms included clipping with the quantile estimator (ClQu), zeroing with the quantile estimator (ZeQu), federated averaging (FedAvg), lossy compression (LoCo), robust aggregation (RoAg), secure aggregation (SeAg), and Gaussian differentially private FedAvg with adaptive quantile clipping (GDP-AQuCl). RESULTS: The Dice coefficient was 0.80 ± 0.11 for both the centralized and SeAg FL algorithms. All FL approaches achieved centralized learning model performance with no statistically significant differences. Among the FL algorithms, SeAg and GDP-AQuCl performed better than the other techniques, although the difference was not statistically significant. All centralized and FL algorithms, except the center-based approach, resulted in relative errors of less than 5% for SUVmax and SUVmean. Centralized and FL algorithms significantly outperformed the single center-based baseline. CONCLUSIONS: The developed FL algorithms, achieving performance on par with the centralized method, exhibited promising results for HN tumor segmentation from PET/CT images.
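As an illustration of one ingredient of the FL variants listed above, the sketch below performs FedAvg aggregation with each client update clipped to a quantile-estimated norm threshold, in the spirit of the ClQu variant; it is not the exact algorithm evaluated in the study.

```python
# Minimal sketch: FedAvg aggregation with per-client clipping at a
# quantile-estimated norm threshold (illustrative, not the study's code).
import numpy as np

def fedavg_quantile_clip(updates, q=0.5):
    norms = np.array([np.linalg.norm(u) for u in updates])
    clip = np.quantile(norms, q)                       # quantile-estimated threshold
    clipped = [u * min(1.0, clip / (n + 1e-12)) for u, n in zip(updates, norms)]
    return np.mean(clipped, axis=0)                    # federated averaging

rng = np.random.default_rng(0)
client_updates = [rng.normal(scale=s, size=16) for s in (0.5, 1.0, 1.5, 5.0)]
print(np.linalg.norm(fedavg_quantile_clip(client_updates)))
```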
Subjects
Deep Learning, Neoplasms, Humans, Algorithms, Computer-Assisted Image Processing/methods, Neoplasms/diagnostic imaging, Positron Emission Tomography Computed Tomography/methods
ABSTRACT
Mandibular retrognathia (C2Rm) is one of the most common oral pathologies. A better understanding of the points of impact of C2Rm on the entire skull is of major interest for the diagnosis, treatment, and management of this dysmorphism, and also contributes to the debate on the changes undergone by the shape of the skull during human evolution. However, conventional methods have limits in meeting these challenges, insofar as they require defining in advance the structures to be studied and identifying them using landmarks. In this context, our work aims to answer these questions using AI tools, in particular machine learning, with the objective of carrying out this processing automatically. We propose an innovative methodology coupling convolutional neural networks (CNNs) and interpretability algorithms. Applied to a set of radiographs classified into physiological versus pathological categories, our methodology made it possible to: discuss the structures impacted by retrognathia that have already been identified in the literature; identify new structures of potential medical interest; highlight the dynamic evolution of impacted structures according to the severity of C2Rm; and provide insights into the evolution of human anatomy. The results are discussed in terms of the major interest of this approach in the field of orthodontics and, more generally, in the field of automated processing of medical images.
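The abstract does not specify which interpretability algorithms were used; as a generic example of the kind of technique that can be coupled with a CNN classifier, the sketch below computes an occlusion-sensitivity map with a hypothetical stand-in model.

```python
# Minimal sketch: occlusion sensitivity as a generic, model-agnostic
# interpretability technique; predict_pathological is a hypothetical stand-in
# for a trained classifier.
import numpy as np

def predict_pathological(image):
    """Placeholder classifier: returns a pseudo-probability (assumption)."""
    return float(image[40:60, 40:60].mean())   # pretends one region drives the score

def occlusion_map(image, patch=16, stride=16):
    base = predict_pathological(image)
    h, w = image.shape
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.0
            heat[i, j] = base - predict_pathological(occluded)  # drop in score
    return heat

radiograph = np.random.default_rng(0).random((96, 96))
print(occlusion_map(radiograph).round(3))
```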
Subjects
Retrognathia, Humans, Machine Learning, Neural Networks (Computer), Algorithms, Skull/diagnostic imaging
ABSTRACT
PURPOSE: Absorbed dose calculation in magnetic resonance-guided radiation therapy (MRgRT) is commonly based on pseudo-CT (pCT) images. This study investigated the feasibility of unsupervised pCT generation from MRI using a cycle generative adversarial network (CycleGAN) and a heterogeneous multicentric dataset. A dosimetric analysis in three-dimensional conformal radiotherapy (3DCRT) planning was also performed. MATERIAL AND METHODS: Overall, 87 T1-weighted and 102 T2-weighted MR images, along with their corresponding computed tomography (CT) images, of brain cancer patients from multiple centers were used. Initially, images underwent a number of preprocessing steps, including rigid registration, a novel CT Masker, N4 bias field correction, resampling, resizing, and rescaling. To overcome the vanishing gradient problem, residual blocks and the mean squared error (MSE) loss function were utilized in the generator and in both networks (generator and discriminator), respectively. The CycleGAN was trained and validated in an unsupervised manner using 70 T1 and 80 T2 randomly selected patients. The remaining patients were used as a holdout test set to report the final evaluation metrics. The generated pCTs were validated in the context of 3DCRT. RESULTS: The CycleGAN model using masked T2 images achieved better performance, with a mean absolute error (MAE) of 61.87 ± 22.58 HU, peak signal-to-noise ratio (PSNR) of 27.05 ± 2.25 dB, and structural similarity index metric (SSIM) of 0.84 ± 0.05 on the test dataset. Dosimetric assessment using T1-weighted MR images revealed gamma pass rates of 98.96% ± 1.1%, 95% ± 3.68%, and 90.1% ± 6.05% for acceptance criteria of 3%/3 mm, 2%/2 mm, and 1%/1 mm, respectively. The DVH differences between CTs and pCTs were within 2%. CONCLUSIONS: A promising pCT generation model capable of handling heterogeneous multicentric datasets was proposed. All MR sequences performed competitively, with no significant difference in pCT generation. The proposed CT Masker proved promising in improving model accuracy and robustness. There was no significant difference between using T1-weighted and T2-weighted MR images for pCT generation.
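The image-similarity metrics reported above (MAE, PSNR, SSIM) can be computed as in the toy sketch below, which compares a noisy "pseudo-CT" against a reference CT array in Hounsfield units.

```python
# Minimal sketch: MAE (HU), PSNR, and SSIM between a generated pseudo-CT and
# the reference CT, using toy arrays.
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
ct = rng.uniform(-1000, 1500, size=(64, 64))                # toy HU image
pct = ct + rng.normal(scale=60.0, size=ct.shape)             # noisy "pseudo-CT"

data_range = ct.max() - ct.min()
mae = np.abs(pct - ct).mean()
mse = ((pct - ct) ** 2).mean()
psnr = 10 * np.log10(data_range ** 2 / mse)
ssim = structural_similarity(ct, pct, data_range=data_range)
print(f"MAE={mae:.1f} HU, PSNR={psnr:.1f} dB, SSIM={ssim:.3f}")
```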
ABSTRACT
PURPOSE: The generalizability and trustworthiness of deep learning (DL)-based algorithms depend on the size and heterogeneity of training datasets. However, because of patient privacy concerns and ethical and legal issues, sharing medical images between different centers is restricted. Our objective was to build a federated DL-based framework for PET image segmentation utilizing a multicentric dataset and to compare its performance with the centralized DL approach. METHODS: PET images from 405 head and neck cancer patients from 9 different centers formed the basis of this study. All tumors were segmented manually. PET images converted to SUV maps were resampled to isotropic voxels (3 × 3 × 3 mm3) and then normalized. PET image subvolumes (12 × 12 × 12 cm3) consisting of whole tumors and background were analyzed. Data from each center were divided into train/validation (80% of patients) and test (20% of patients) sets. The modified R2U-Net was used as the core DL model. A parallel federated DL model was developed and compared with the centralized approach, in which the datasets are pooled on one server. Segmentation metrics, including the Dice similarity and Jaccard coefficients, as well as percent relative errors (RE%) of SUVpeak, SUVmean, SUVmedian, SUVmax, metabolic tumor volume, and total lesion glycolysis, were computed and compared with manual delineations. RESULTS: The performance of the centralized versus federated DL methods was nearly identical for the segmentation metrics: Dice (0.84 ± 0.06 vs 0.84 ± 0.05) and Jaccard (0.73 ± 0.08 vs 0.73 ± 0.07). For quantitative PET parameters, we obtained comparable RE% for SUVmean (6.43% ± 4.72% vs 6.61% ± 5.42%), metabolic tumor volume (12.2% ± 16.2% vs 12.1% ± 15.89%), and total lesion glycolysis (6.93% ± 9.6% vs 7.07% ± 9.85%), and negligible RE% for SUVmax and SUVpeak. No significant differences in performance (P > 0.05) between the 2 frameworks (centralized vs federated) were observed. CONCLUSION: The developed federated DL model achieved comparable quantitative performance with respect to the centralized DL model. Federated DL models could provide robust and generalizable segmentation, while addressing patient privacy and legal and ethical issues in clinical data sharing.
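The quantitative comparison described above can be illustrated with the toy sketch below, which computes percent relative errors (RE%) of SUV statistics and metabolic tumor volume inside a predicted mask versus a manual delineation.

```python
# Minimal sketch: RE% of SUVmax, SUVmean, and MTV for a predicted mask versus
# a manual delineation, using toy data and 3 x 3 x 3 mm voxels.
import numpy as np

def re_percent(pred_val, ref_val):
    return 100.0 * (pred_val - ref_val) / ref_val

def pet_metrics(suv_map, mask, voxel_volume_ml=0.027):   # 27 mm3 = 0.027 mL
    vals = suv_map[mask]
    return {"SUVmax": vals.max(), "SUVmean": vals.mean(),
            "MTV_ml": mask.sum() * voxel_volume_ml}

rng = np.random.default_rng(0)
suv = rng.gamma(shape=2.0, scale=2.0, size=(40, 40, 40))
manual = np.zeros(suv.shape, dtype=bool);  manual[10:30, 10:30, 10:30] = True
predicted = np.zeros_like(manual);         predicted[11:30, 10:29, 10:30] = True

m_ref, m_pred = pet_metrics(suv, manual), pet_metrics(suv, predicted)
for k in m_ref:
    print(f"{k}: RE% = {re_percent(m_pred[k], m_ref[k]):+.2f}")
```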
Subjects
Deep Learning, Head and Neck Neoplasms, Algorithms, Humans, Computer-Assisted Image Processing/methods, Positron Emission Tomography
ABSTRACT
Background and purpose. Accurate volume delineation plays an essential role in radiotherapy. Contouring is a potential source of uncertainty in radiotherapy treatment planning that could affect treatment outcomes; therefore, reducing the degree of contouring uncertainty is crucial. The role of the imaging modality used in organ delineation uncertainties has been investigated. This systematic review explores the factors influencing inter- and intra-observer uncertainties in target volume and organs at risk (OARs) delineation, focusing on the imaging modality used to reduce these uncertainties and on the reported subsequent histopathology and follow-up assessment. Methods and materials. An inclusive search strategy was conducted to query the available online databases (Scopus, Google Scholar, PubMed, and Medline). 'Organ at risk', 'target', 'delineation', 'uncertainties', 'radiotherapy', and related terms were combined using each database's search syntax. Final article extraction was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline. Included studies were limited to those published in English between 1995 and 2020 that dealt only with computed tomography (CT) and magnetic resonance imaging (MRI) modalities. Results. A total of 923 studies were screened and 78 were included, of which 31 related to the prostate, 20 to the breast, 18 to the head and neck, and 9 to the brain tumor site. Ninety-eight percent of the extracted studies performed volumetric analysis. Only 24% of the publications reported the dose deviations resulting from variation in volume delineation. Also, heterogeneity in the studied populations and in the reported geometric and volumetric parameters was identified, such that quantitative synthesis was not appropriate. Conclusion. This review highlights the inter- and intra-observer variations that could lead to contouring uncertainties and impede tumor control in radiotherapy. For improving volume delineation and reducing inter-observer variability, the implementation of well-structured training programs, homogeneity in following consensus and guidelines, reliable ground truth selection, and proper imaging modality utilization could be clinically beneficial.
Subjects
Radiation Oncology, Humans, Male, Observer Variation, Organs at Risk, Prostate, Computer-Assisted Radiotherapy Planning
ABSTRACT
The future of healthcare is an organic blend of technology, innovation, and human connection. As artificial intelligence (AI) gradually becomes a go-to technology in healthcare to improve efficiency and outcomes, we must understand our limitations. We should realize that our goal is not only to provide faster and more efficient care, but also to deliver an integrated solution that ensures the care is fair and not biased against any subpopulation. In this context, the field of cardio-cerebrovascular diseases, which encompasses a wide range of conditions, from heart failure to stroke, has seen some advances in providing assistive tools to care providers. This article aimed to provide an overall thematic review of recent developments, focusing on various AI applications in cardio-cerebrovascular diseases, to identify gaps and potential areas of improvement. If well designed, technological engines have the potential to improve healthcare access and equity while reducing overall costs, diagnostic errors, and disparities in a system that affects patients and providers and strives for efficiency.
ABSTRACT
BACKGROUND: SARS-CoV-2-infected patients are suggested to have a higher incidence of thrombotic events such as acute ischemic stroke (AIS). This study aimed to explore vascular comorbidity patterns among SARS-CoV-2-infected patients with subsequent stroke. We also investigated whether the comorbidities and their frequencies under each subclass of the TOAST criteria were similar to those in AIS population studies prior to the pandemic. METHODS: This is a report from the Multinational COVID-19 Stroke Study Group. We present an original dataset of SARS-CoV-2-infected patients who had a subsequent stroke, recorded through our multicenter prospective study. In addition, we built a dataset of previously reported patients by conducting a systematic literature review. We identified distinct subgroups using clinical risk-scoring models and unsupervised machine learning algorithms, including hierarchical K-means (ML-K) and spectral clustering (ML-S). RESULTS: This study included 323 AIS patients from 71 centers in 17 countries from the original dataset and 145 patients reported in the literature. The unsupervised clustering methods suggest a distinct cohort of patients (ML-K: 36% and ML-S: 42%) with no or few comorbidities. These patients were more than 6 years younger than those in the other subgroups and were more likely to be men (ML-K: 59% and ML-S: 60%). The majority of patients in this subgroup suffered an embolic-appearing stroke on imaging (ML-K: 83% and ML-S: 85%) and had about a 50% risk of large vessel occlusion (ML-K: 50% and ML-S: 53%). In addition, there were two cohorts of patients with large-artery atherosclerosis (ML-K: 30% and ML-S: 43% of patients) and cardioembolic strokes (ML-K: 34% and ML-S: 15%) with consistent comorbidity and imaging patterns. Binomial logistic regression demonstrated that ischemic heart disease (odds ratio (OR), 4.9; 95% confidence interval (CI), 1.6-14.7), atrial fibrillation (OR, 14.0; 95% CI, 4.8-40.8), and active neoplasm (OR, 7.1; 95% CI, 1.4-36.2) were associated with cardioembolic stroke. CONCLUSIONS: Although a cohort of young and healthy men with cardioembolic and large vessel occlusion strokes can be distinguished using both clinical sub-grouping and unsupervised clustering, stroke in the other patients may be explained by their existing comorbidities.
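The unsupervised sub-grouping idea can be illustrated with the sketch below, which clusters synthetic binary comorbidity profiles (not the study data) and reports per-cluster comorbidity prevalence.

```python
# Minimal sketch: cluster patients by binary comorbidity profiles and inspect
# the prevalence of each comorbidity within each cluster (synthetic data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
comorbidities = ["hypertension", "diabetes", "atrial_fibrillation",
                 "ischemic_heart_disease", "active_neoplasm"]
# Toy construction: one subgroup with few comorbidities, one with many.
X = np.vstack([rng.binomial(1, 0.1, size=(150, 5)),
               rng.binomial(1, 0.6, size=(150, 5))]).astype(float)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for c in np.unique(labels):
    prevalence = X[labels == c].mean(axis=0)
    print(f"cluster {c} (n={np.sum(labels == c)}):",
          dict(zip(comorbidities, prevalence.round(2))))
```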
ABSTRACT
BACKGROUND: Assessment of health-related quality of life (HRQOL or utility) is a complex issue, especially in children with temporary health states. OBJECTIVES: To assess the utility of prone positioning as a prophylactic postsurgical approach with the aid of 5 frequently used general instruments. METHODS: Visual analogue scale (VAS), time trade-off (TTO), modified TTO (m-TTO), standard gamble (SG), and chain of gambles (ChGs) instruments and interviews with the parent caregivers were used to measure the HRQOL (utility value) of patients who were admitted to the surgical wards of Children's Medical Center Hospital between July and November 2015. RESULTS: A total of 74 parent caregivers with a mean age of 30.48 ± 6.66 years were enrolled. On the basis of a Gaussian model of the repeated VAS measures, we classified the behavior of the participants into 4 clusters. Cumulative analysis of all these clusters demonstrated that TTO yielded the highest utility measure for prone positioning (0.682 ± 0.359), whereas the lowest utility value was measured by VAS2 (0.132 ± 0.569). In addition, all VAS measures underestimated the preferences. Overall, the values of TTO, m-TTO, and ChGs remained consistent within each cluster (intracluster consistency) and across the 4 clusters (intercluster consistency). The adopted utility value of prone positioning based on these 3 instruments was estimated as 0.68 ± 0.21. CONCLUSIONS: We recommend a model for the assessment of HRQOL in children with temporary health states to overcome the challenges of each isolated instrument and used this model to measure the utility value of prone positioning in pediatric patients.
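For context, the standard arithmetic behind two of the instruments named above is sketched below with generic formulas; this is an illustration of how TTO and VAS ratings map to utilities, not the study's exact elicitation protocol.

```python
# Minimal sketch of generic utility formulas: time trade-off (TTO) utility =
# time in full health deemed equivalent / time in the health state; VAS utility
# = rating rescaled between the "dead" and "full health" anchors.
def tto_utility(years_full_health: float, years_in_state: float) -> float:
    return years_full_health / years_in_state

def vas_utility(rating: float, dead_anchor: float = 0.0,
                full_health_anchor: float = 100.0) -> float:
    return (rating - dead_anchor) / (full_health_anchor - dead_anchor)

print(tto_utility(6.8, 10.0))   # 0.68, in the range reported above
print(vas_utility(13.2))        # 0.132
```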