ABSTRACT
Survival prediction post-cystectomy is essential for the follow-up care of bladder cancer patients. This study aimed to evaluate artificial intelligence (AI) large language models (LLMs) for extracting clinical information and improving image analysis, with an initial application involving prediction of five-year survival rates of patients after radical cystectomy for bladder cancer. Data were retrospectively collected from medical records and CT urograms (CTUs) of bladder cancer patients between 2001 and 2020. Of 781 patients, 163 underwent chemotherapy, had pre- and post-chemotherapy CTUs, underwent radical cystectomy, and had an available post-surgery five-year survival follow-up. Five AI-LLMs (Dolly-v2, Vicuna-13b, Llama-2.0-13b, GPT-3.5, and GPT-4.0) were used to extract clinical descriptors from each patient's medical records. As a reference standard, clinical descriptors were also extracted manually. Radiomics and deep learning descriptors were extracted from CTU images. The developed multi-modal predictive model, CRD, was based on the clinical (C), radiomics (R), and deep learning (D) descriptors. The LLM retrieval accuracy was assessed. The performances of the survival predictive models were evaluated using AUC and Kaplan-Meier analysis. For the 163 patients (mean age 64 ± 9 years; M:F 131:32), the LLMs achieved extraction accuracies of 74-87% (Dolly), 76-83% (Vicuna), 82-93% (Llama), 85-91% (GPT-3.5), and 94-97% (GPT-4.0). For a test dataset of 64 patients, the CRD model achieved AUCs of 0.89 ± 0.04 (manually extracted information), 0.87 ± 0.05 (Dolly), 0.83 ± 0.06 to 0.84 ± 0.05 (Vicuna), 0.81 ± 0.06 to 0.86 ± 0.05 (Llama), 0.85 ± 0.05 to 0.88 ± 0.05 (GPT-3.5), and 0.87 ± 0.05 to 0.88 ± 0.05 (GPT-4.0). This study demonstrates the use of LLM-extracted clinical information, in conjunction with imaging analysis, to improve the prediction of clinical outcomes, with bladder cancer as an initial example.
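The survival analysis above relies on the Kaplan-Meier product-limit estimator. As a minimal illustration of how such a curve is computed (the follow-up times and event flags below are fabricated, not study data):

```python
# Kaplan-Meier product-limit estimator sketch; (time, event) pairs are
# fabricated for illustration (event: 1 = death observed, 0 = censored).

def kaplan_meier(times, events):
    """Return (time, survival) points at each observed event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve = len(data), 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk           # product-limit update
            curve.append((t, round(surv, 3)))
        at_risk -= deaths + censored
    return curve

print(kaplan_meier([6, 12, 12, 18, 24, 30], [1, 1, 0, 1, 0, 1]))
# → [(6, 0.833), (12, 0.667), (18, 0.444), (30, 0.0)]
```

Groups such as the predicted alive and deceased cohorts can then be compared by plotting their two curves and applying a log-rank test.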
ABSTRACT
Early diagnosis of lung cancer can significantly improve patient outcomes. We developed a Growth Predictive model based on the Wasserstein Generative Adversarial Network framework (GP-WGAN) to predict the nodule growth patterns in the follow-up LDCT scans. The GP-WGAN was trained with a training set (N = 776) containing 1121 pairs of nodule images acquired at approximately 1-year intervals and deployed to an independent test set of 450 nodules on baseline LDCT scans to predict nodule images (GP-nodules) in their 1-year follow-up scans. The 450 GP-nodules were finally classified as malignant or benign by a lung cancer risk prediction (LCRP) model, achieving a test AUC of 0.827 ± 0.028, which was comparable to the AUC of 0.862 ± 0.028 achieved by the same LCRP model classifying real follow-up nodule images (p = 0.071). The net reclassification index yielded consistent outcomes (NRI = 0.04; p = 0.62). Other baseline methods, including Lung-RADS and the Brock model, achieved significantly lower performance (p < 0.05). The results demonstrated that the GP-nodules predicted by our GP-WGAN model achieved performance comparable to the nodules in the real follow-up scans for lung cancer diagnosis, indicating the potential to detect lung cancer earlier when coupled with accelerated clinical management, versus the current approach of waiting until the next screening exam.
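The abstract compares classifiers with the net reclassification index. A minimal sketch of the categorical NRI, assuming two risk categories and purely illustrative labels:

```python
# Categorical net reclassification index (NRI) sketch for comparing an old and
# a new risk classifier; the toy categories/labels are illustrative, not study data.

def nri(old_cat, new_cat, labels):
    """NRI = (up - down)/n_events + (down - up)/n_nonevents; categories are ints."""
    ev_up = ev_down = ne_up = ne_down = n_ev = n_ne = 0
    for o, n, y in zip(old_cat, new_cat, labels):
        if y == 1:                       # event (e.g., cancer)
            n_ev += 1
            ev_up += n > o               # correctly reclassified upward
            ev_down += n < o
        else:                            # non-event
            n_ne += 1
            ne_up += n > o
            ne_down += n < o             # correctly reclassified downward
    return (ev_up - ev_down) / n_ev + (ne_down - ne_up) / n_ne

old = [0, 0, 1, 1, 0, 1]
new = [1, 0, 1, 0, 0, 0]
y   = [1, 0, 1, 0, 0, 1]
print(round(nri(old, new, y), 3))   # → 0.333
```

A positive NRI means the new classifier moves events up and non-events down in risk category more often than the reverse.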
ABSTRACT
Purpose To evaluate the feasibility of leveraging serial low-dose CT (LDCT) scans to develop a radiomics-based reinforcement learning (RRL) model for improving early diagnosis of lung cancer at baseline screening. Materials and Methods In this retrospective study, 1951 participants (822 female patients: median age, 61 years [range, 55-74 years]; 1129 male patients: median age, 62 years [range, 55-74 years]) were randomly selected from the National Lung Screening Trial between August 2002 and April 2004. An RRL model using serial LDCT scans (S-RRL) was trained and validated using data from 1404 participants (372 with lung cancer) containing 2525 available serial LDCT scans over up to 3 years. A baseline RRL (B-RRL) model was trained with only LDCT scans acquired at baseline screening for comparison. The 547 held-out individuals (150 with lung cancer) were used as an independent test set for performance evaluation. The area under the receiver operating characteristic curve (AUC) and the net reclassification index (NRI) were used to assess the performances of the models in the classification of screen-detected nodules. Results Deployment to the held-out baseline scans showed that the S-RRL model achieved a significantly higher test AUC (0.88 [95% CI: 0.85, 0.91]) than both the Brock model (AUC, 0.84 [95% CI: 0.81, 0.88]; P = .02) and the B-RRL model (AUC, 0.86 [95% CI: 0.83, 0.90]; P = .02). Lung cancer risk stratification was significantly improved by the S-RRL model as compared with the Lung CT Screening Reporting and Data System (NRI, 0.29; P < .001) and the Brock model (NRI, 0.12; P = .008). Conclusion The S-RRL model demonstrated the potential to improve early diagnosis and risk stratification for lung cancer at baseline screening as compared with the B-RRL model and clinical models. Keywords: Radiomics-based Reinforcement Learning, Lung Cancer Screening, Low-Dose CT, Machine Learning © RSNA, 2024 Supplemental material is available for this article.
Subjects
Early Detection of Cancer; Lung Neoplasms; Tomography, X-Ray Computed; Humans; Lung Neoplasms/diagnostic imaging; Lung Neoplasms/diagnosis; Middle Aged; Male; Female; Early Detection of Cancer/methods; Aged; Tomography, X-Ray Computed/methods; Retrospective Studies; Radiation Dosage; Feasibility Studies; Machine Learning; Mass Screening/methods; Lung/diagnostic imaging; Radiomics
ABSTRACT
This review focuses on the principles, applications, and performance of mpMRI for bladder imaging. Quantitative imaging biomarkers (QIBs) derived from mpMRI are increasingly used in oncological applications, including tumor staging, prognosis, and assessment of treatment response. To standardize mpMRI acquisition and interpretation, an expert panel developed the Vesical Imaging-Reporting and Data System (VI-RADS). Many studies have confirmed the standardization and the high degree of inter-reader agreement of VI-RADS in discriminating muscle invasiveness in bladder cancer, supporting its implementation in routine clinical practice. The standard MRI sequences for VI-RADS scoring are anatomical imaging, including T2-weighted (T2w) images, and physiological imaging with diffusion-weighted MRI (DW-MRI) and dynamic contrast-enhanced MRI (DCE-MRI). Physiological QIBs derived from analysis of DW- and DCE-MRI data and radiomic image features extracted from mpMRI images play an important role in bladder cancer. The current development of AI tools for analyzing mpMRI data and their potential impact on bladder imaging are surveyed. AI architectures are often implemented as convolutional neural networks (CNNs) focused on narrow, specific tasks. The application of AI can substantially impact bladder imaging clinical workflows; for example, manual tumor segmentation, which demands a high time commitment and has inter-reader variability, can be replaced by an autosegmentation tool. The use of mpMRI and AI is projected to drive the field toward the personalized management of bladder cancer patients.
ABSTRACT
Accurate survival prediction for bladder cancer patients who have undergone radical cystectomy can improve their treatment management. However, existing predictive models do not take advantage of both clinical and radiological imaging data. This study aimed to fill this gap by developing an approach that leverages the strengths of clinical (C), radiomics (R), and deep-learning (D) descriptors to improve survival prediction. The dataset comprised 163 patients, including clinical and histopathological information and CT urography scans. The data were divided by patient into training, validation, and test sets. We analyzed the clinical data by a nomogram and the image data by radiomics and deep-learning models. The descriptors were input into a back-propagation neural network (BPNN) for survival prediction. The AUCs on the test set were (C): 0.82 ± 0.06, (R): 0.73 ± 0.07, (D): 0.71 ± 0.07, (CR): 0.86 ± 0.05, (CD): 0.86 ± 0.05, and (CRD): 0.87 ± 0.05. The predictions based on the D and CRD descriptors showed a significant difference (p = 0.007). In Kaplan-Meier survival analysis, the deceased and alive groups were stratified successfully by C (p < 0.001) and CRD (p < 0.001), with CRD predicting the alive group more accurately. The results highlight the potential of combining C, R, and D descriptors to accurately predict the survival of bladder cancer patients after cystectomy.
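The CRD fusion idea, concatenating clinical, radiomics, and deep-learning descriptor vectors per patient and feeding them to a small back-propagation network, can be sketched as follows. The single logistic unit trained by gradient descent here is a deliberately minimal stand-in for the BPNN, and all data are synthetic:

```python
# Late-fusion sketch: concatenate clinical (C), radiomics (R), and deep (D)
# descriptor vectors and train a minimal gradient-descent logistic unit.
# The descriptors and labels below are synthetic, not study data.
import math, random

def fuse(c, r, d):
    return c + r + d  # simple concatenation into one descriptor vector

def train(X, y, lr=0.5, epochs=500):
    """Stochastic gradient descent on log-loss (the core of back-propagation)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))
            g = p - yi                              # gradient of log-loss wrt z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

random.seed(0)
# two separable toy classes (class mean 0 vs 1) in the fused descriptor space
X = [fuse([random.gauss(m, 0.3)], [random.gauss(m, 0.3)], [random.gauss(m, 0.3)])
     for m in (0, 0, 0, 1, 1, 1)]
y = [0, 0, 0, 1, 1, 1]
w, b = train(X, y)
preds = [int(sum(wj * xj for wj, xj in zip(w, xi)) + b > 0) for xi in X]
print(preds)
```

In the study, the full BPNN replaces this single unit, but the fusion step, concatenation before the classifier, is the same.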
ABSTRACT
BACKGROUND: The noise in digital breast tomosynthesis (DBT) includes x-ray quantum noise and detector readout noise. The total radiation dose of a DBT scan is kept at about the level of a digital mammogram but the detector noise is increased due to acquisition of multiple projections. The high noise can degrade the detectability of subtle lesions, specifically microcalcifications (MCs). PURPOSE: We previously developed a deep-learning-based denoiser to improve the image quality of DBT. In the current study, we conducted an observer performance study with breast radiologists to investigate the feasibility of using deep-learning-based denoising to improve the detection of MCs in DBT. METHODS: We have a modular breast phantom set containing seven 1-cm-thick heterogeneous 50% adipose/50% fibroglandular slabs custom-made by CIRS, Inc. (Norfolk, VA). We made six 5-cm-thick breast phantoms embedded with 144 simulated MC clusters of four nominal speck sizes (0.125-0.150, 0.150-0.180, 0.180-0.212, 0.212-0.250 mm) at random locations. The phantoms were imaged with a GE Pristina DBT system using the automatic standard (STD) mode. The phantoms were also imaged with the STD+ mode that increased the average glandular dose by 54% to be used as a reference condition for comparison of radiologists' reading. Our previously trained and validated denoiser was deployed to the STD images to obtain a denoised DBT set (dnSTD). Seven breast radiologists participated as readers to detect the MCs in the DBT volumes of the six phantoms under the three conditions (STD, STD+, dnSTD), totaling 18 DBT volumes. Each radiologist read all the 18 DBT volumes sequentially, which were arranged in a different order for each reader in a counter-balanced manner to minimize any potential reading order effects. They marked the location of each detected MC cluster and provided a conspicuity rating and their confidence level for the perceived cluster. 
The visual grading characteristics (VGC) analysis was used to compare the conspicuity ratings and the confidence levels of the radiologists for the detection of MCs. RESULTS: The average sensitivities over all MC speck sizes were 65.3%, 73.2%, and 72.3%, respectively, for the radiologists reading the STD, dnSTD, and STD+ volumes. The sensitivity for dnSTD was significantly higher than that for STD (p < 0.005, two-tailed Wilcoxon signed rank test) and comparable to that for STD+. The average false positive rates were 3.9 ± 4.6, 2.8 ± 3.7, and 2.7 ± 3.9 marks per DBT volume, respectively, for reading the STD, dnSTD, and STD+ images but the difference between dnSTD and STD or STD+ did not reach statistical significance. The overall conspicuity ratings and confidence levels by VGC analysis for dnSTD were significantly higher than those for both STD and STD+ (p ≤ 0.001). The critical alpha value for significance was adjusted to be 0.025 with Bonferroni correction. CONCLUSIONS: This observer study using breast phantom images showed that deep-learning-based denoising has the potential to improve the detection of MCs in noisy DBT images and increase radiologists' confidence in differentiating noise from MCs without increasing radiation dose. Further studies are needed to evaluate the generalizability of these results to the wide range of DBTs from human subjects and patient populations in clinical settings.
Subjects
Breast Diseases; Calcinosis; Mammography; Female; Humans; Breast/diagnostic imaging; Breast/pathology; Breast Diseases/diagnostic imaging; Breast Diseases/pathology; Calcinosis/diagnostic imaging; Calcinosis/pathology; Deep Learning; Mammography/methods; Phantoms, Imaging
ABSTRACT
A murine model of myelofibrosis in the tibia was used in a co-clinical trial to evaluate segmentation methods for application of image-based biomarkers to assess disease status. The dataset (32 mice with 157 3D MRI scans, including 49 test-retest pairs scanned on consecutive days) was split into approximately 70% training, 10% validation, and 20% test subsets. Two expert annotators (EA1 and EA2) performed manual segmentations of the mouse tibia (EA1: all data; EA2: test and validation). Attention U-net (A-U-net) model performance was assessed for accuracy with respect to the EA1 reference using the average Jaccard index (AJI), volume intersection ratio (AVI), volume error (AVE), and Hausdorff distance (AHD) for four training scenarios: full training, two half-splits, and a single-mouse subset. The repeatability of computer versus expert segmentations of tibia volume for test-retest pairs was assessed by the within-subject coefficient of variation (%wCV). A-U-net models trained on the full and half-split training sets achieved similar average accuracy (with respect to EA1 annotations) on the test set: AJI = 83-84%, AVI = 89-90%, AVE = 2-3%, and AHD = 0.5-0.7 mm, exceeding EA2 accuracy: AJI = 81%, AVI = 83%, AVE = 14%, and AHD = 0.3 mm. The A-U-net model repeatability, wCV [95% CI]: 3 [2, 5]%, was notably better than that of expert annotators EA1: 5 [4, 9]% and EA2: 8 [6, 13]%. The developed deep learning model effectively automates murine bone marrow segmentation with accuracy comparable to human annotators and substantially improved repeatability.
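Test-retest repeatability in the study is summarized by the within-subject coefficient of variation. One common way to estimate %wCV from paired scans (the volume pairs below are made up, not study measurements):

```python
# Within-subject coefficient of variation (%wCV) estimated from test-retest
# pairs: per-subject variance is d^2/2 for a pair differing by d, normalized
# by the squared pair mean and averaged. The volume pairs are illustrative.
import math

def percent_wcv(pairs):
    """pairs: list of (scan1, scan2) measurements, one tuple per subject."""
    acc = 0.0
    for a, b in pairs:
        m = (a + b) / 2
        acc += ((a - b) ** 2 / 2) / m ** 2   # per-subject CV^2 estimate
    return 100 * math.sqrt(acc / len(pairs))

pairs = [(100, 104), (95, 99), (110, 107), (102, 102)]
print(round(percent_wcv(pairs), 2))   # → 2.24
```

A smaller %wCV means repeated measurements of the same subject agree more closely, which is why the model's 3% beats the annotators' 5-8%.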
Subjects
Deep Learning; Primary Myelofibrosis; Humans; Animals; Mice; Image Processing, Computer-Assisted/methods; Primary Myelofibrosis/diagnostic imaging; Tibia/diagnostic imaging; Magnetic Resonance Imaging/methods
ABSTRACT
Importance: An accurate and robust artificial intelligence (AI) algorithm for detecting cancer in digital breast tomosynthesis (DBT) could significantly improve detection accuracy and reduce health care costs worldwide. Objectives: To make training and evaluation data for the development of AI algorithms for DBT analysis available, to develop well-defined benchmarks, and to create publicly available code for existing methods. Design, Setting, and Participants: This diagnostic study is based on a multi-institutional international grand challenge in which research teams developed algorithms to detect lesions in DBT. A data set of 22 032 reconstructed DBT volumes was made available to research teams. Phase 1, in which teams were provided 700 scans from the training set, 120 from the validation set, and 180 from the test set, took place from December 2020 to January 2021, and phase 2, in which teams were given the full data set, took place from May to July 2021. Main Outcomes and Measures: The overall performance was evaluated by mean sensitivity for biopsied lesions using only DBT volumes with biopsied lesions; ties were broken by including all DBT volumes. Results: A total of 8 teams participated in the challenge. The team with the highest mean sensitivity for biopsied lesions was the NYU B-Team, with 0.957 (95% CI, 0.924-0.984), and the second-place team, ZeDuS, had a mean sensitivity of 0.926 (95% CI, 0.881-0.964). When the results were aggregated, the mean sensitivity for all submitted algorithms was 0.879; for only those who participated in phase 2, it was 0.926. Conclusions and Relevance: In this diagnostic study, an international competition produced algorithms with high sensitivity for using AI to detect lesions on DBT images.
A standardized performance benchmark for the detection task using publicly available clinical imaging data was released, with detailed descriptions and analyses of submitted algorithms accompanied by a public release of their predictions and code for selected methods. These resources will serve as a foundation for future research on computer-assisted diagnosis methods for DBT, significantly lowering the barrier of entry for new researchers.
Subjects
Artificial Intelligence; Breast Neoplasms; Humans; Female; Benchmarking; Mammography/methods; Algorithms; Radiographic Image Interpretation, Computer-Assisted/methods; Breast Neoplasms/diagnostic imaging
ABSTRACT
This study developed a recursive training strategy to train a deep learning model for nuclei detection and segmentation using incomplete annotation. A dataset of 141 H&E stained breast cancer pathologic images with incomplete annotation was randomly split into training/validation set and test set of 89 and 52 images, respectively. The positive training samples were extracted at each annotated cell and augmented with affine translation. The negative training samples were selected from the non-cellular regions free of nuclei using a histogram-based semi-automatic method. A U-Net model was initially trained by minimizing a custom loss function. After the first stage of training, the trained U-Net model was applied to the images in the training set in an inference mode. The U-Net segmented objects with high quality were selected by a semi-automated method. Combining the newly selected high quality objects with the annotated nuclei and the previously generated negative samples, the U-Net model was retrained recursively until the stopping criteria were satisfied. For the 52 test images, the U-Net trained with and without using our recursive training method achieved a sensitivity of 90.3% and 85.3% for nuclei detection, respectively. For nuclei segmentation, the average Dice coefficient and average Jaccard index were 0.831±0.213 and 0.750±0.217, 0.780±0.270 and 0.697±0.264, for U-Net with and without recursive training, respectively. The improvement achieved by our proposed method was statistically significant (P < 0.05). In conclusion, our recursive training method effectively enlarged the set of annotated objects for training the deep learning model and further improved the detection and segmentation performance.
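The recursive training strategy described above, train, run inference on the training images, promote confidently segmented objects into the label set, and retrain until a stopping criterion, can be sketched schematically. The one-dimensional "model" and confidence rule below are stand-ins for the U-Net and the semi-automated quality selection, not the actual pipeline:

```python
# Schematic recursive (self-)training loop: fit on the labeled pool, accept
# high-confidence unlabeled samples, enlarge the pool, and refit until no new
# objects are added. Model and confidence rule are toy stand-ins.
import random

def train_model(labeled):
    # stand-in "model": the mean of the labeled values
    return sum(labeled) / len(labeled)

def recursive_train(labeled, unlabeled, threshold=0.2, max_rounds=5):
    labeled = list(labeled)
    model = train_model(labeled)
    for _ in range(max_rounds):
        model = train_model(labeled)
        # "inference" + quality check: accept samples close to the model's estimate
        new = [u for u in unlabeled
               if abs(u - model) < threshold and u not in labeled]
        if not new:                      # stopping criterion: nothing new found
            break
        labeled.extend(new)              # enlarge the annotated set
    return model, len(labeled)

random.seed(1)
seed_labels = [0.50, 0.52]
pool = [round(random.uniform(0, 1), 2) for _ in range(50)]
model, n_labeled = recursive_train(seed_labels, pool)
print(n_labeled > len(seed_labels))      # the label set grew over the rounds
```

The study's version replaces the toy confidence rule with semi-automated selection of high-quality U-Net segmentations, but the enlarging-label-set loop is the same shape.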
ABSTRACT
OBJECTIVE: Accurate segmentation of the lung nodule in computed tomography images is a critical component of a computer-assisted lung cancer detection/diagnosis system. However, lung nodule segmentation is a challenging task due to the heterogeneity of nodules. The aim of this study was to develop a hybrid deep learning (H-DL) model for the segmentation of lung nodules with a wide variety of sizes, shapes, margins, and opacities. MATERIALS AND METHODS: A dataset collected from the Lung Image Database Consortium image collection, containing 847 cases with lung nodules manually annotated by at least two radiologists and with nodule diameters greater than 7 mm and less than 45 mm, was randomly split into 683 training/validation and 164 independent test cases. The 50% consensus consolidation of the radiologists' annotations was used as the reference standard for each nodule. We designed a new H-DL model combining two deep convolutional neural networks (DCNNs) with different structures as encoders to increase the learning capabilities for the segmentation of complex lung nodules. Leveraging the basic symmetric U-shaped architecture of U-Net, we redesigned two new U-shaped deep learning (U-DL) models that were expanded to six levels of convolutional layers. One U-DL model used a shallow DCNN structure containing 16 convolutional layers adapted from VGG-19 as the encoder, and the other used a deep DCNN structure containing 200 layers adapted from DenseNet-201 as the encoder; the same decoder, with only one convolutional layer at each level, was used in both U-DL models. We refer to these as the shallow and deep U-DL models. Finally, an ensemble layer was used to combine the two U-DL models into the H-DL model. We compared the effectiveness of the H-DL, the shallow U-DL, and the deep U-DL models by deploying them separately to the test set.
The accuracy of volume segmentation for each nodule was evaluated by the 3D Dice coefficient and Jaccard index (JI) relative to the reference standard. For comparison, we calculated the median and minimum of the 3D Dice and JI over the individual radiologists who segmented each nodule, referred to as M-Dice, min-Dice, M-JI, and min-JI. RESULTS: For the 164 test cases with 327 nodules, our H-DL model achieved an average 3D Dice coefficient of 0.750 ± 0.135 and an average JI of 0.617 ± 0.159. The radiologists' average M-Dice was 0.778 ± 0.102, and the average M-JI was 0.651 ± 0.127; both were significantly higher than those achieved by the H-DL model (p < 0.05). The radiologists' average min-Dice (0.685 ± 0.139) and the average min-JI (0.537 ± 0.153) were significantly lower than those achieved by the H-DL model (p < 0.05). The results indicated that the H-DL model approached the average performance of radiologists and was superior to the radiologist whose manual segmentation had the min-Dice and min-JI. Moreover, the average Dice and average JI achieved by the H-DL model were significantly higher than those achieved by the individual shallow U-DL model (Dice of 0.745 ± 0.139, JI of 0.611 ± 0.161; p < 0.05) or the individual deep U-DL model alone (Dice of 0.739 ± 0.145, JI of 0.604 ± 0.163; p < 0.05). CONCLUSION: Our newly developed H-DL model outperformed the individual shallow or deep U-DL models. The H-DL method combining multilevel features learned by both the shallow and deep DCNNs could achieve segmentation accuracy comparable to radiologists' segmentation for nodules with wide ranges of image characteristics.
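Segmentation accuracy above is scored with the 3D Dice coefficient and Jaccard index. On binary masks both are straightforward to compute (toy flattened masks shown; note the identity J = D / (2 - D)):

```python
# Dice coefficient and Jaccard index on binary segmentation masks; the tiny
# "volumes" here are flattened toy masks, not CT data.

def dice_jaccard(seg, ref):
    """seg, ref: equal-length sequences of 0/1 voxel labels."""
    inter = sum(s and r for s, r in zip(seg, ref))   # |seg ∩ ref|
    s_sum, r_sum = sum(seg), sum(ref)
    dice = 2 * inter / (s_sum + r_sum)
    jaccard = inter / (s_sum + r_sum - inter)        # |∩| / |∪|
    return dice, jaccard

seg = [1, 1, 1, 0, 0, 1, 0, 0]
ref = [1, 1, 0, 0, 1, 1, 0, 0]
d, j = dice_jaccard(seg, ref)
print(d, j)   # → 0.75 0.6
```

Both metrics range from 0 (no overlap) to 1 (perfect agreement); Dice weights the intersection more heavily, which is why it is always at least as large as Jaccard.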
Subjects
Deep Learning; Solitary Pulmonary Nodule; Solitary Pulmonary Nodule/diagnosis; Humans
ABSTRACT
This observer study investigates the effect of computerized artificial intelligence (AI)-based decision support system (CDSS-T) on physicians' diagnostic accuracy in assessing bladder cancer treatment response. The performance of 17 observers was evaluated when assessing bladder cancer treatment response without and with CDSS-T using pre- and post-chemotherapy CTU scans in 123 patients having 157 pre- and post-treatment cancer pairs. The impact of cancer case difficulty, observers' clinical experience, institution affiliation, specialty, and the assessment times on the observers' diagnostic performance with and without using CDSS-T were analyzed. It was found that the average performance of the 17 observers was significantly improved (p = 0.002) when aided by the CDSS-T. The cancer case difficulty, institution affiliation, specialty, and the assessment times influenced the observers' performance without CDSS-T. The AI-based decision support system has the potential to improve the diagnostic accuracy in assessing bladder cancer treatment response and result in more consistent performance among all physicians.
Subjects
Decision Support Systems, Clinical; Urinary Bladder Neoplasms; Artificial Intelligence; Humans; Tomography, X-Ray Computed; Urinary Bladder Neoplasms/diagnostic imaging; Urinary Bladder Neoplasms/therapy; Urography
ABSTRACT
OBJECTIVES: To compare radiologists' sensitivity, confidence level, and reading efficiency in detecting microcalcifications in digital breast tomosynthesis (DBT) at two clinically relevant dose levels. MATERIALS AND METHODS: Six 5-cm-thick heterogeneous breast phantoms embedded with a total of 144 simulated microcalcification clusters of four speck sizes were imaged at two dose modes by a clinical DBT system. The DBT volumes at the two dose levels were read independently by six Mammography Quality Standards Act (MQSA) radiologists and one fellow with 1-33 years (median, 12 years) of experience in a fully crossed, counter-balanced manner. Each radiologist located each potential cluster and rated its conspicuity and his/her confidence that the marked location contained a cluster. The differences in the results between the two dose modes were analyzed by a two-tailed paired t-test. RESULTS: Compared to the lower-dose mode, the average glandular dose in the higher-dose mode for the 5-cm phantoms increased from 1.34 to 2.07 mGy. The detection sensitivity increased for all speck sizes and significantly for the two smaller sizes (p < 0.05). An average of 13.8% fewer false positive clusters was marked. The average conspicuity rating and the radiologists' confidence level were higher for all speck sizes and reached significance (p < 0.05) for the three larger sizes. The average reading time per detected cluster was reduced significantly (p < 0.05), by an average of 13.2%. CONCLUSION: For a 5-cm-thick breast, an increase in average glandular dose from 1.34 to 2.07 mGy for DBT imaging increased the conspicuity of microcalcifications, improved the detection sensitivity of radiologists, increased their confidence levels, reduced false positive detections, and increased reading efficiency.
Subjects
Breast Neoplasms; Calcinosis; Breast/diagnostic imaging; Calcinosis/diagnostic imaging; Female; Humans; Male; Mammography/methods; Phantoms, Imaging; Radiologists
ABSTRACT
Lung cancer is by far the leading cause of cancer death in the US. Recent studies have demonstrated the effectiveness of screening using low-dose CT (LDCT) in reducing lung cancer-related mortality. While lung nodules are detected with a high rate of sensitivity, the exam has low specificity, and it is still difficult to separate benign from malignant lesions. The ISBI 2018 Lung Nodule Malignancy Prediction Challenge, developed by a team from the Quantitative Imaging Network of the National Cancer Institute, was focused on the prediction of lung nodule malignancy from two sequential LDCT screening exams using automated (non-manual) algorithms. We curated a cohort of 100 subjects who participated in the National Lung Screening Trial and had established pathological diagnoses. Data from 30 subjects were randomly selected for training, and the remaining 70 were used for testing. Participants were evaluated based on the area under the receiver operating characteristic curve (AUC) of nodule-wise malignancy scores generated by their algorithms on the test set. The challenge had 17 participants, with 11 teams submitting reports with method descriptions, as mandated by the challenge rules. Participants used quantitative methods, with reported test AUCs ranging from 0.698 to 0.913. The top five contestants used deep learning approaches, reporting AUCs between 0.87 and 0.91. The top teams' predictors did not differ significantly from each other or from a volume-change estimate (p = .05 with Bonferroni-Holm correction).
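Challenge entries were ranked by nodule-wise AUC. The AUC equals the normalized Mann-Whitney U statistic, the probability that a randomly chosen malignant nodule receives a higher score than a randomly chosen benign one, and can be computed directly from scores and labels (the values below are illustrative only):

```python
# AUC as the normalized Mann-Whitney U statistic over malignancy scores;
# ties between a positive and a negative score count as half a win.
# Scores and labels are illustrative, not challenge data.

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    0,   1,   0]
print(auc(scores, labels))   # → 0.75
```

An AUC of 0.5 corresponds to chance-level ranking; the top challenge entries reached 0.87-0.91 on this measure.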
Subjects
Lung Neoplasms; Solitary Pulmonary Nodule; Algorithms; Humans; Lung; Lung Neoplasms/diagnostic imaging; ROC Curve; Solitary Pulmonary Nodule/diagnostic imaging; Tomography, X-Ray Computed
ABSTRACT
(1) Purpose: The objective was to evaluate CT perfusion and radiomic features for the prediction of one-year disease-free survival in laryngeal and hypopharyngeal cancer. (2) Method and Materials: This retrospective study included pre- and post-therapy neck CT studies in 36 patients with laryngeal/hypopharyngeal cancer. Tumor contouring was performed semi-automatically by the computer and manually by two radiologists. Twenty-six radiomic features, including morphological and gray-level features, were extracted by an internally developed and validated computer-aided image analysis system. The five perfusion features analyzed were permeability surface area product (PS), blood flow (flow), blood volume (BV), mean transit time (MTT), and time-to-maximum (Tmax). One-year persistent/recurrent disease data were obtained following the final treatment of definitive chemoradiation or after total laryngectomy. We used two-loop leave-one-out feature selection with a linear discriminant analysis classifier, generating receiver operating characteristic (ROC) curves and confidence intervals (CIs). (3) Results: Ten patients (28%) had recurrent/persistent disease at 1 year. For prediction, the change in blood flow demonstrated a training AUC of 0.68 (CI 0.47-0.85) and a testing AUC of 0.66 (CI 0.47-0.85). The best features selected were a combination of perfusion and radiomic features, including blood flow and computer-estimated percent volume change, with a training AUC of 0.68 (CI 0.5-0.85) and a testing AUC of 0.69 (CI 0.5-0.85). The laryngoscopic percent change in volume was a poor predictor, with a testing AUC of 0.4 (CI 0.16-0.57). (4) Conclusions: A combination of CT perfusion and radiomic features is a potential predictor of one-year disease-free survival in laryngeal and hypopharyngeal cancer patients.
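The leave-one-out scheme above, an outer loop holding out each case while feature selection happens only on the remaining cases, can be sketched as follows. A nearest-class-mean rule stands in for the LDA classifier, the selection step is reduced to picking one feature, and the data are synthetic:

```python
# Leave-one-out evaluation with inner-loop feature selection: the held-out
# case never influences which feature is chosen. A nearest-class-mean rule
# stands in for LDA; the two-feature synthetic data have one informative
# feature (index 0) and one pure-noise feature (index 1).
import random

def mean(v):
    return sum(v) / len(v)

def loo_accuracy(X, y):
    correct = 0
    for i in range(len(X)):                          # outer loop: hold out case i
        tr = [(x, t) for j, (x, t) in enumerate(zip(X, y)) if j != i]
        # inner step: select the feature whose class means are farthest apart
        best_f = max(range(len(X[0])), key=lambda f: abs(
            mean([x[f] for x, t in tr if t == 1]) -
            mean([x[f] for x, t in tr if t == 0])))
        m0 = mean([x[best_f] for x, t in tr if t == 0])
        m1 = mean([x[best_f] for x, t in tr if t == 1])
        pred = int(abs(X[i][best_f] - m1) < abs(X[i][best_f] - m0))
        correct += pred == y[i]
    return correct / len(X)

random.seed(2)
X = [[random.gauss(t, 0.5), random.gauss(0, 1)] for t in [0]*10 + [1]*10]
y = [0]*10 + [1]*10
print(loo_accuracy(X, y) > 0.5)   # better than chance on the informative feature
```

Keeping selection inside the loop is what makes the resulting accuracy an honest estimate, which connects directly to the feature-leakage study later in this listing.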
Subjects
Hypopharyngeal Neoplasms; Disease-Free Survival; Humans; Hypopharyngeal Neoplasms/diagnostic imaging; Hypopharyngeal Neoplasms/surgery; Neoplasm Recurrence, Local; Perfusion; Pilot Projects; Retrospective Studies; Tomography, X-Ray Computed
ABSTRACT
PURPOSE: Transfer learning is commonly used in deep learning for medical imaging to alleviate the problem of limited available data. In this work, we studied the risk of feature leakage and its dependence on sample size when using a pretrained deep convolutional neural network (DCNN) as a feature extractor for the classification of breast masses in mammography. METHODS: Feature leakage occurs when the training set is used for feature selection and classifier modeling while the cost function is guided by the validation performance or informed by the test performance. The high-dimensional feature space extracted from a pretrained DCNN suffers from the curse of dimensionality; feature subsets that provide excessively optimistic performance can be found for the validation set or test set if the latter is allowed unlimited reuse during algorithm development. We designed a simulation study to examine feature leakage when using a DCNN as a feature extractor for mass classification in mammography. A total of 4577 unique mass lesions were partitioned by patient into three sets: 3222 for training, 508 for validation, and 847 for independent testing. Three pretrained DCNNs, AlexNet, GoogLeNet, and VGG16, were first compared using the training set in fourfold cross-validation, and one was selected as the feature extractor. To assess generalization errors, the independent test set was sequestered as truly unseen cases. Training sets ranging in size from 10% to 75% of the available training set were simulated by random drawing, in addition to using 100% of the training set. Three commonly used feature classifiers, the linear discriminant, the support vector machine, and the random forest, were evaluated. A sequential feature selection method was used to find feature subsets that could achieve high classification performance in terms of the area under the receiver operating characteristic curve (AUC) in the validation set.
The extent of feature leakage and the impact of training set size were analyzed by comparison to the performance in the unseen test set. RESULTS: All three classifiers showed large generalization error between the validation set and the independent sequestered test set at all sample sizes. The generalization error decreased as the sample size increased. At 100% of the sample size, one classifier achieved an AUC as high as 0.91 on the validation set while the corresponding performance on the unseen test set only reached an AUC of 0.72. CONCLUSIONS: Our results demonstrate that large generalization errors can occur in AI tools due to feature leakage. Without evaluation on unseen test cases, optimistically biased performance may be reported inadvertently, and can lead to unrealistic expectations and reduce confidence for clinical implementation.
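The feature-leakage effect described above is easy to reproduce: with enough pure-noise features, selecting the one that maximizes validation AUC yields an optimistic validation score that does not carry over to a sequestered test set. A toy demonstration (all "features" are random noise, so the true AUC of every feature is 0.5):

```python
# Feature-leakage demonstration: among many pure-noise features, the one that
# looks best on a reused validation set appears far better than chance there,
# while its performance on sequestered test cases stays near AUC = 0.5.
import random

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)
n_feat, n_val, n_test = 300, 40, 40
y_val = [i % 2 for i in range(n_val)]
y_test = [i % 2 for i in range(n_test)]
feats_val = [[random.random() for _ in range(n_val)] for _ in range(n_feat)]
feats_test = [[random.random() for _ in range(n_test)] for _ in range(n_feat)]

# "leaky" selection: pick the feature with the best AUC on the validation set
best = max(range(n_feat), key=lambda f: auc(feats_val[f], y_val))
val_auc = auc(feats_val[best], y_val)
test_auc = auc(feats_test[best], y_test)
print(round(val_auc, 2), round(test_auc, 2))
```

The gap between the two printed numbers is the optimistic bias; in the study the same mechanism inflated a validation AUC of 0.91 over a true test AUC of 0.72.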
Subjects
Mammography; Neural Networks, Computer; Algorithms; Breast/diagnostic imaging; Humans; Sample Size
ABSTRACT
We evaluated the intraobserver variability of physicians aided by a computerized decision-support system for treatment response assessment (CDSS-T) in identifying patients who show complete response to neoadjuvant chemotherapy for bladder cancer, and the effects of this variability on physicians' assessment accuracy. A CDSS-T tool was developed that uses a combination of a deep learning neural network and radiomic features from computed tomography (CT) scans to detect bladder cancers that have fully responded to neoadjuvant treatment. Pre- and postchemotherapy CT scans of 157 bladder cancers from 123 patients were collected. In a multireader, multicase observer study, physician-observers estimated the likelihood of pathologic T0 disease by viewing paired pre-/posttreatment CT scans placed side by side on an in-house-developed graphical user interface. Five abdominal radiologists, four diagnostic radiology residents, two oncologists, and one urologist participated as observers. They first provided an estimate without CDSS-T and then with CDSS-T. A subset of cases was evaluated twice to study the intraobserver variability and its effects on observer consistency. The mean areas under the curves for assessment of pathologic T0 disease were 0.85 for CDSS-T alone; 0.76 for physicians without CDSS-T, improving to 0.80 with CDSS-T (P = .001), in the original evaluation; and 0.78 without CDSS-T, improving to 0.81 with CDSS-T (P = .010), in the repeated evaluation. The intraobserver variability was significantly reduced with CDSS-T (P < .0001). The CDSS-T can significantly reduce physicians' variability and improve their accuracy in identifying complete response of muscle-invasive bladder cancer to neoadjuvant chemotherapy.
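As a toy illustration of why anchoring to a fixed machine score can simultaneously raise accuracy and tighten repeat readings, a session score can be modeled as a per-case impression plus session-specific noise, optionally averaged with a CDSS-T score. All distributions, noise levels, and the averaging weight below are assumptions for illustration, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 157                                       # cases, as in the study
truth = rng.integers(0, 2, n)                 # pathologic T0: yes/no
case_signal = truth + rng.normal(0, 0.9, n)   # per-case difficulty (assumed)
cdss_score = truth + rng.normal(0, 0.6, n)    # fixed machine score (assumed)

def auc(scores, labels):
    pos, neg = scores[labels == 1], scores[labels == 0]
    return (pos[:, None] > neg[None, :]).mean()

def read_session(w):
    """One reading session: the reader's own impression, optionally
    averaged with the CDSS-T score (weight w is an assumption)."""
    own = case_signal + rng.normal(0, 0.7, n)  # session-specific reader noise
    return (1 - w) * own + w * cdss_score

results = {}
for w, label in ((0.0, "without CDSS-T"), (0.5, "with CDSS-T")):
    s1, s2 = read_session(w), read_session(w)  # two sessions, same cases
    results[label] = (auc(s1, truth), np.abs(s1 - s2).mean())
    print(label, "AUC %.2f, intraobserver |diff| %.2f" % results[label])
```

Because the machine score is identical across sessions, averaging with it damps the session-to-session noise (lower intraobserver difference) while its extra information pulls the session score toward the truth (higher AUC), the same qualitative pattern the observer study reports.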
Subjects
Decision Support Systems, Clinical; Urinary Bladder Neoplasms; Humans; Observer Variation; Physicians; Tomography, X-Ray Computed; Urinary Bladder Neoplasms/diagnostic imaging; Urinary Bladder Neoplasms/drug therapy
ABSTRACT
PURPOSE: To develop a quantitative image analysis method to characterize the heterogeneous patterns of nodule components for the classification of pathological categories of nodules. MATERIALS AND METHODS: With IRB approval and permission of the National Lung Screening Trial (NLST) project, 103 subjects with low-dose CT (LDCT) were used in this study. We developed a radiomic quantitative CT attenuation distribution descriptor (qADD) to characterize the heterogeneous patterns of nodule components, and a hybrid model (qADD+) that combined qADD with subject demographic data and radiologist-provided nodule descriptors, to differentiate aggressive tumors from indolent tumors or benign nodules with pathological categorization as the reference standard. The classification performances of qADD and qADD+ were evaluated and compared with the Brock and the Mayo Clinic models by analysis of the area under the receiver operating characteristic curve (AUC). RESULTS: The radiomic features were consistently selected into qADDs to differentiate pathologically invasive nodules from (1) preinvasive nodules, (2) benign nodules, and (3) the group of preinvasive and benign nodules, achieving test AUCs of 0.847 ± 0.002, 0.842 ± 0.002, and 0.810 ± 0.001, respectively. The qADD+ obtained test AUCs of 0.867 ± 0.002, 0.888 ± 0.001, and 0.852 ± 0.001, respectively, which were higher than those of both the Brock and the Mayo Clinic models. CONCLUSION: The pathologic invasiveness of lung tumors could be categorized according to the CT attenuation distribution patterns of the nodule components manifested on LDCT images, and the majority of invasive lung cancers could be identified at baseline LDCT scans.
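The core of the qADD idea, summarizing how a nodule's CT attenuation values are distributed across its components, can be sketched as a normalized Hounsfield-unit histogram over the nodule mask. The bin edges and the toy part-solid nodule below are assumptions for illustration; the published descriptor is more elaborate:

```python
import numpy as np

def attenuation_histogram(ct_hu, mask, bins=np.arange(-1000, 401, 100)):
    """Normalized histogram of CT attenuation (HU) inside a nodule mask --
    a minimal stand-in for a qADD-style descriptor (bin edges assumed)."""
    vals = ct_hu[mask.astype(bool)]
    hist, _ = np.histogram(vals, bins=bins)
    return hist / max(hist.sum(), 1)

# toy part-solid nodule: solid core (~50 HU) with ground-glass rim (~-600 HU)
rng = np.random.default_rng(1)
ct = rng.normal(-800, 30, size=(32, 32))         # lung background
ct[8:24, 8:24] = rng.normal(-600, 40, (16, 16))  # ground-glass component
ct[12:20, 12:20] = rng.normal(50, 20, (8, 8))    # solid component
mask = np.zeros((32, 32), bool)
mask[8:24, 8:24] = True

desc = attenuation_histogram(ct, mask)
print(desc.round(3))
```

The descriptor is bimodal for a part-solid nodule: mass piles up near the ground-glass bins (around -600 HU) and the soft-tissue bins (around 50 HU), which is the kind of component-pattern signal a classifier can use to separate invasive from indolent or benign nodules.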
Subjects
Lung Neoplasms/diagnostic imaging; Lung Neoplasms/pathology; Multiple Pulmonary Nodules/diagnostic imaging; Multiple Pulmonary Nodules/pathology; Tomography, X-Ray Computed/methods; Aged; Area Under Curve; Diagnosis, Differential; Female; Humans; Lung/diagnostic imaging; Lung/pathology; Male; Middle Aged; ROC Curve; Radiation Dosage
ABSTRACT
Computer-aided diagnosis (CAD) has been a major field of research for the past few decades. CAD uses machine learning methods to analyze imaging and/or nonimaging patient data and makes an assessment of the patient's condition, which can then be used to assist clinicians in their decision-making process. The recent success of deep learning technology in machine learning has spurred new research and development efforts to improve CAD performance and to develop CAD for many other complex clinical tasks. In this paper, we discuss the potential and challenges of developing CAD tools using deep learning technology, or artificial intelligence (AI) in general; the pitfalls and lessons learned from CAD in screening mammography; and the considerations needed for future implementation of CAD or AI in clinical use. It is hoped that past experience and deep learning technology will lead to successful advancement and lasting growth in this new era of CAD, thereby enabling CAD to deliver intelligent aids that improve health care.
Subjects
Deep Learning; Diagnosis, Computer-Assisted/methods; Humans
ABSTRACT
The deep convolutional neural network (DCNN), now popularly called artificial intelligence (AI), has shown the potential to improve over the computer-assisted tools in medical imaging developed in past decades. A DCNN has millions of free parameters that need to be trained, but the training sample set is limited in size for most medical imaging tasks, so transfer learning is typically used. Automatic data mining may be an efficient way to enlarge the collected data set, but such data can be noisy, containing incorrect labels or even the wrong type of image. In this work, we studied the generalization error of a DCNN with transfer learning in medical imaging for the task of classifying malignant and benign masses on mammograms. With a finite available data set, we simulated a training set containing corrupted data or noisy labels. The balance between learning and memorization in the DCNN was manipulated by varying the proportion of corrupted data in the training set. The generalization error of the DCNN was analyzed via the area under the receiver operating characteristic curve for the training and test sets and the weight changes after transfer learning. The study demonstrates that the transfer learning strategy of a DCNN for such tasks needs to be designed properly, taking into consideration the constraints of an available training set of limited size and quality for the classification task at hand, to minimize memorization and improve generalizability.
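The learning-versus-memorization trade-off manipulated in this study can be demonstrated with any high-capacity learner. In the sketch below, a 1-nearest-neighbor classifier stands in for an over-parameterized DCNN (the data, dimensionality, and noise levels are illustrative assumptions): it memorizes every corrupted label, so its training accuracy stays perfect while test accuracy degrades as label noise grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # two Gaussian classes separated by 1.5 in each of 5 dimensions (assumed)
    y = rng.integers(0, 2, n)
    X = rng.normal(0, 1, (n, 5)) + y[:, None] * 1.5
    return X, y

def nn1_predict(Xtr, ytr, X):
    # 1-nearest-neighbor: predict the label of the closest training sample
    d = ((X[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return ytr[d.argmin(1)]

Xtr, ytr = make_data(300)
Xte, yte = make_data(1000)

results = {}
for frac in (0.0, 0.2, 0.4):
    y_noisy = ytr.copy()
    flip = rng.random(len(ytr)) < frac    # corrupt a fraction of the labels
    y_noisy[flip] ^= 1
    train_acc = (nn1_predict(Xtr, y_noisy, Xtr) == y_noisy).mean()  # memorization
    test_acc = (nn1_predict(Xtr, y_noisy, Xte) == yte).mean()       # generalization
    results[frac] = (train_acc, test_acc)
    print(f"label noise {frac:.0%}: train {train_acc:.2f}, test {test_acc:.2f}")
```

The widening gap between training and test accuracy as the corrupted fraction grows is the generalization error the study tracks; a properly constrained transfer learning strategy aims to keep the model from fitting those corrupted labels in the first place.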
Subjects
Breast Neoplasms/diagnostic imaging; Deep Learning; Image Processing, Computer-Assisted/methods; Female; Humans; Mammography; ROC Curve
ABSTRACT
Computer-aided diagnosis (CAD) has been a popular area of research and development over the past few decades. In CAD, machine learning methods and multidisciplinary knowledge and techniques are used to analyze patient information, and the results can be used to assist clinicians in their decision-making process. CAD may analyze imaging information alone or in combination with other clinical data. It may provide the analyzed information directly to the clinician or correlate the analyzed results with the likelihood of certain diseases based on statistical modeling of past cases in the population. CAD systems can be developed to provide decision support for many applications in the patient care process, such as lesion detection, characterization, cancer staging, treatment planning and response assessment, and recurrence and prognosis prediction. The state-of-the-art machine learning technique known as deep learning (DL) has revolutionized speech and text recognition as well as computer vision. The potential for major breakthroughs by DL in medical image analysis and other CAD applications for patient care has brought about unprecedented excitement about applying CAD, or artificial intelligence (AI), to medicine in general and to radiology in particular. In this paper, we provide an overview of recent developments in CAD using DL in breast imaging and discuss some challenges and practical issues that may impact the advancement of artificial intelligence and its integration into clinical workflow.