ABSTRACT
PURPOSE: To discuss the worldwide applications and potential impact of artificial intelligence (AI) for the diagnosis, management and analysis of treatment outcomes of common retinal diseases. METHODS: We performed an online literature review, using PubMed Central (PMC), of AI applications to evaluate and manage retinal diseases. Search terms included AI for screening, diagnosis, monitoring, management, and treatment outcomes for age-related macular degeneration (AMD), diabetic retinopathy (DR), retinal surgery, retinal vascular disease, retinopathy of prematurity (ROP) and sickle cell retinopathy (SCR). Additional search terms included AI and color fundus photographs, optical coherence tomography (OCT), and OCT angiography (OCTA). We included original research articles and review articles. RESULTS: Research studies have investigated and shown the utility of AI for screening for diseases such as DR, AMD, ROP, and SCR. Research studies using validated and labeled datasets confirmed AI algorithms could predict disease progression and response to treatment. Studies showed AI facilitated rapid and quantitative interpretation of retinal biomarkers seen on OCT and OCTA imaging. Research articles suggest AI may be useful for planning and performing robotic surgery. Studies suggest AI holds the potential to help lessen the impact of socioeconomic disparities on the outcomes of retinal diseases. CONCLUSIONS: AI applications for retinal diseases can assist the clinician, not only in disease screening and monitoring for disease recurrence but also in quantitative analysis of treatment outcomes and prediction of treatment response. The public health impact on the prevention of blindness from DR, AMD, and other retinal vascular diseases remains to be determined.
Subject(s)
Artificial Intelligence; Image Interpretation, Computer-Assisted; Mass Screening; Retinal Diseases; Retinal Diseases/diagnostic imaging; Retinal Diseases/therapy; Mass Screening/methods; Biomarkers/analysis; Disease Progression; Humans; Image Interpretation, Computer-Assisted/methods; Image Interpretation, Computer-Assisted/standards; Retina/diagnostic imaging
ABSTRACT
BACKGROUND: Late gadolinium enhancement (LGE) of the myocardium has significant diagnostic and prognostic implications, with even small areas of enhancement being important. Distinguishing between definitely normal and definitely abnormal LGE images is usually straightforward, but diagnostic uncertainty arises when reporters are not sure whether the observed LGE is genuine or not. This uncertainty might be resolved by repetition (to remove artifact) or further acquisition of intersecting images, but this must take place before the scan finishes. Real-time quality assurance by humans is a complex task requiring training and experience, so the ability to identify, while the scan is ongoing and without an expert present, which images have an intermediate likelihood of LGE is of high value. This decision support could prompt immediate image optimization or acquisition of supplementary images to confirm or refute the presence of genuine LGE. This could reduce ambiguity in reports. METHODS: Short-axis, phase-sensitive inversion recovery late gadolinium images were extracted from our clinical cardiac magnetic resonance (CMR) database and shuffled. Two independent, blinded experts scored each individual slice for "LGE likelihood" on a visual analog scale, from 0 (absolute certainty of no LGE) to 100 (absolute certainty of LGE), with 50 representing clinical equipoise. The scored images were split into two classes: either "high certainty" of whether LGE was present or not, or "low certainty." The dataset was split into training, validation, and test sets (70:15:15). A deep learning binary classifier based on the EfficientNetV2 convolutional neural network architecture was trained to distinguish between these categories. Classifier performance on the test set was evaluated by calculating the accuracy, precision, recall, F1-score, and area under the receiver operating characteristics curve (ROC AUC).
Performance was also evaluated on an external test set of images from a different center. RESULTS: One thousand six hundred and forty-five images (from 272 patients) were labeled and split at the patient level into training (1151 images), validation (247 images), and test (247 images) sets for the deep learning binary classifier. Of these, 1208 images were "high certainty" (255 for LGE, 953 for no LGE), and 437 were "low certainty". An external test comprising 247 images from 41 patients from another center was also employed. After 100 epochs, the performance on the internal test set was accuracy = 0.94, recall = 0.80, precision = 0.97, F1-score = 0.87, and ROC AUC = 0.94. The classifier also performed robustly on the external test set (accuracy = 0.91, recall = 0.73, precision = 0.93, F1-score = 0.82, and ROC AUC = 0.91). These results were benchmarked against a reference inter-expert accuracy of 0.86. CONCLUSION: Deep learning shows potential to automate quality control of late gadolinium imaging in CMR. The ability to identify short-axis images with intermediate LGE likelihood in real-time may serve as a useful decision-support tool. This approach has the potential to guide immediate further imaging while the patient is still in the scanner, thereby reducing the frequency of recalls and inconclusive reports due to diagnostic indecision.
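The slice-level metrics reported above follow directly from a 2×2 confusion matrix. A minimal plain-Python sketch (illustrative only; the study does not specify its implementation):

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary labels (1 = LGE)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

Note that with imbalanced classes (here, 255 LGE vs 953 no-LGE "high certainty" images), accuracy alone can look optimistic, which is why precision, recall, F1, and ROC AUC are reported alongside it.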
Subject(s)
Contrast Media; Deep Learning; Image Interpretation, Computer-Assisted; Predictive Value of Tests; Humans; Contrast Media/administration & dosage; Reproducibility of Results; Image Interpretation, Computer-Assisted/standards; Databases, Factual; Myocardium/pathology; Male; Female; Magnetic Resonance Imaging, Cine/standards; Middle Aged; Heart Diseases/diagnostic imaging; Quality Assurance, Health Care/standards; Observer Variation; Aged; Magnetic Resonance Imaging/standards
ABSTRACT
To develop a fully automatic model capable of reliably quantifying epicardial adipose tissue (EAT) volumes and attenuation in large-scale population studies to investigate their relation to markers of cardiometabolic risk. Non-contrast cardiac CT images from the SCAPIS study were used to train and test a convolutional neural network-based model to quantify EAT by: segmenting the pericardium, suppressing noise-induced artifacts in the heart chambers, and, if image sets were incomplete, imputing missing EAT volumes. The model achieved a mean Dice coefficient of 0.90 when tested against expert manual segmentations on 25 image sets. Tested on 1400 image sets, the model successfully segmented 99.4% of the cases. Automatic imputation of missing EAT volumes had an error of less than 3.1% with up to 20% of the slices in image sets missing. The most important predictors of EAT volume were weight and waist circumference, while EAT attenuation was predicted mainly by EAT volume. A model with excellent performance, capable of fully automatic handling of the most common challenges in large-scale EAT quantification, has been developed. In studies of the importance of EAT in disease development, the strong covariation with anthropometric measures needs to be carefully considered.
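The Dice coefficient used to compare automatic and expert pericardium segmentations measures the overlap of two binary masks; a minimal sketch over flattened masks:

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks of equal length (1 = inside)."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # Convention: two empty masks agree perfectly.
    return 2 * inter / total if total else 1.0
```

A Dice of 0.90, as reported, means the automatic and manual pericardium masks share 90% of their combined extent.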
Subject(s)
Adipose Tissue/diagnostic imaging; Image Interpretation, Computer-Assisted/methods; Machine Learning; Pericardium/diagnostic imaging; Female; Humans; Image Interpretation, Computer-Assisted/standards; Male; Mass Screening/methods; Middle Aged; Software/standards
ABSTRACT
Histological stratification in metastatic non-small cell lung cancer (NSCLC) is essential to properly guide therapy. Morphological evaluation remains the basis for subtyping and is completed by additional immunohistochemistry labelling to confirm the diagnosis, which delays molecular analysis and utilises precious sample. Therefore, we tested the capacity of convolutional neural networks (CNNs) to classify NSCLC based on haematoxylin-eosin-saffron (HES)-stained diagnostic biopsies. The model was estimated with a learning cohort of 132 NSCLC patients and validated on an external validation cohort of 65 NSCLC patients. Based on image patches, a CNN using the InceptionV3 architecture was trained and optimized to classify NSCLC between squamous and non-squamous subtypes. Accuracies of 0.99, 0.87, 0.85, and 0.85 were reached in the training, validation, and test sets and in the external validation cohort, respectively. At the patient level, the CNN model showed a capacity to predict the tumour histology with accuracies of 0.73 and 0.78 in the learning and external validation cohorts, respectively. Selecting the tumour area using a virtual tissue microarray improved prediction, with an accuracy of 0.82 in the external validation cohort. This study underlines the capacity of CNNs to predict NSCLC subtype with good accuracy and to be applied to small pathologic samples without annotation.
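Patch-level CNN predictions must be aggregated into a single patient-level histology call. The abstract does not state the aggregation rule, so the simple majority vote below is an illustrative assumption:

```python
from collections import Counter

def patient_prediction(patch_labels):
    """Aggregate per-patch subtype calls into one patient-level call by majority vote.

    patch_labels: iterable of strings, e.g. "squamous" / "non-squamous".
    """
    return Counter(patch_labels).most_common(1)[0][0]
```

Alternatives such as averaging the patch-level class probabilities before thresholding are equally plausible and often more robust to a few confidently wrong patches.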
Subject(s)
Carcinoma, Non-Small-Cell Lung/pathology; Carcinoma, Squamous Cell/pathology; Image Interpretation, Computer-Assisted/methods; Machine Learning/standards; Carcinoma, Non-Small-Cell Lung/classification; Carcinoma, Squamous Cell/classification; Humans; Image Interpretation, Computer-Assisted/standards; Immunohistochemistry/methods; Sensitivity and Specificity; Software/standards
ABSTRACT
Importance: The Gleason grading system has been the most reliable tool for the prognosis of prostate cancer since its development. However, its clinical application remains limited by interobserver variability in grading and quantification, which has negative consequences for risk assessment and clinical management of prostate cancer. Objective: To examine the impact of an artificial intelligence (AI)-assisted approach to prostate cancer grading and quantification. Design, Setting, and Participants: This diagnostic study was conducted at the University of Wisconsin-Madison from August 2, 2017, to December 30, 2019. The study chronologically selected 589 men with biopsy-confirmed prostate cancer who received care in the University of Wisconsin Health System between January 1, 2005, and February 28, 2017. A total of 1000 biopsy slides (1 or 2 slides per patient) were selected and scanned to create digital whole-slide images, which were used to develop and validate a deep convolutional neural network-based AI-powered platform. The whole-slide images were divided into a training set (n = 838) and validation set (n = 162). Three experienced academic urological pathologists (W.H., K.A.I., and R.H., hereinafter referred to as pathologists 1, 2, and 3, respectively) were involved in the validation. Data were collected between December 29, 2018, and December 20, 2019, and analyzed from January 4, 2020, to March 1, 2021. Main Outcomes and Measures: Accuracy of prostate cancer detection by the AI-powered platform and comparison of prostate cancer grading and quantification performed by the 3 pathologists using manual vs AI-assisted methods. Results: Among 589 men with biopsy slides, the mean (SD) age was 63.8 (8.2) years, the mean (SD) prebiopsy prostate-specific antigen level was 10.2 (16.2) ng/mL, and the mean (SD) total cancer volume was 15.4% (20.1%). 
The AI system was able to distinguish prostate cancer from benign prostatic epithelium and stroma with high accuracy at the patch-pixel level, with an area under the receiver operating characteristic curve of 0.92 (95% CI, 0.88-0.95). The AI system achieved almost perfect agreement with the training pathologist (pathologist 1) in detecting prostate cancer at the patch-pixel level (weighted κ = 0.97; asymptotic 95% CI, 0.96-0.98) and in grading prostate cancer at the slide level (weighted κ = 0.98; asymptotic 95% CI, 0.96-1.00). Use of the AI-assisted method was associated with significant improvements in the concordance of prostate cancer grading and quantification between the 3 pathologists (eg, pathologists 1 and 2: 90.1% agreement using AI-assisted method vs 84.0% agreement using manual method; P < .001) and significantly higher weighted κ values for all pathologists (eg, pathologists 2 and 3: weighted κ = 0.92 [asymptotic 95% CI, 0.90-0.94] for AI-assisted method vs 0.76 [asymptotic 95% CI, 0.71-0.80] for manual method; P < .001) compared with the manual method. Conclusions and Relevance: In this diagnostic study, an AI-powered platform was able to detect, grade, and quantify prostate cancer with high accuracy and efficiency and was associated with significant reductions in interobserver variability. These results suggest that an AI-powered platform could potentially transform histopathological evaluation and improve risk stratification and clinical management of prostate cancer.
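The slide-level agreement figures above are weighted κ statistics. A plain-Python sketch of Cohen's κ with quadratic weights (the specific weighting scheme is an assumption; the study reports only "weighted κ"):

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_cat):
    """Cohen's kappa with quadratic disagreement weights over ordinal categories 0..n_cat-1."""
    # Observed confusion matrix.
    obs = [[0] * n_cat for _ in range(n_cat)]
    for x, y in zip(rater_a, rater_b):
        obs[x][y] += 1
    n = len(rater_a)
    hist_a = [sum(row) for row in obs]
    hist_b = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]
    num = den = 0.0
    for i in range(n_cat):
        for j in range(n_cat):
            w = (i - j) ** 2 / (n_cat - 1) ** 2   # quadratic penalty for distance
            num += w * obs[i][j]                   # observed weighted disagreement
            den += w * hist_a[i] * hist_b[j] / n   # chance-expected weighted disagreement
    return 1 - num / den
```

Quadratic weighting penalizes a two-grade disagreement four times as much as a one-grade disagreement, which matches the ordinal nature of Gleason grading.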
Subject(s)
Image Interpretation, Computer-Assisted/methods; Neoplasm Grading/methods; Prostatic Neoplasms/pathology; Aged; Aged, 80 and over; Algorithms; Artificial Intelligence; Humans; Image Interpretation, Computer-Assisted/standards; Male; Middle Aged; Neural Networks, Computer; Observer Variation; Reproducibility of Results; Wisconsin
ABSTRACT
Although recent scientific studies suggest that artificial intelligence (AI) could provide value in many radiology applications, much of the hard engineering work required to consistently realize this value in practice remains to be done. In this article, we summarize the various ways in which AI can benefit radiology practice, identify key challenges that must be overcome for those benefits to be delivered, and discuss promising avenues by which these challenges can be addressed.
Subject(s)
Artificial Intelligence/standards; Diagnostic Imaging/methods; Image Interpretation, Computer-Assisted/methods; Radiology/methods; Radiology/standards; Diagnostic Imaging/standards; Humans; Image Interpretation, Computer-Assisted/standards; Reproducibility of Results; Software
ABSTRACT
Artificial intelligence technology promises to redefine the practice of radiology. However, it exists in a nascent phase and remains largely untested in the clinical space. This nature is both a cause and consequence of the uncertain legal-regulatory environment it enters. This discussion aims to shed light on these challenges, tracing the various pathways toward approval by the US Food and Drug Administration, the future of government oversight, privacy issues, ethical dilemmas, and practical considerations related to implementation in radiologist practice.
Subject(s)
Artificial Intelligence/legislation & jurisprudence; Diagnostic Imaging/methods; Image Interpretation, Computer-Assisted/methods; Radiology/legislation & jurisprudence; Diagnostic Imaging/standards; Humans; Image Interpretation, Computer-Assisted/standards; United States; United States Food and Drug Administration
ABSTRACT
Psoriasis is a chronic inflammatory skin disease that occurs in various forms throughout the body and is associated with certain conditions such as heart disease, diabetes, and depression. The Psoriasis Area and Severity Index (PASI) score, a tool used to evaluate the severity of psoriasis, is currently used in clinical trials and clinical research. However, the determination of severity is based on the subjective judgment of the clinician, which introduces deviations into disease evaluation. Therefore, we propose optimal algorithms that can effectively segment the lesion area and classify the severity. In addition, a new dataset on psoriasis was built, including patch images of erythema and scaling. We performed psoriasis lesion segmentation and classified the disease severity, evaluated the best-performing segmentation method and classifier, and analyzed features that are highly related to the severity of psoriasis. In conclusion, we presented optimal techniques for evaluating the severity of psoriasis. Our newly constructed dataset improved the generalization performance of psoriasis diagnosis and evaluation, and we proposed a system of specific evaluation indicators for the disease and a quantitative PASI scoring method. The proposed system can help to evaluate the severity of localized psoriasis more accurately.
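The PASI score discussed above combines erythema, induration, and desquamation severities (each scored 0-4) with an area score (0-6) for four body regions; a sketch of the standard formula (the region keys are illustrative names, not from the paper):

```python
# Standard PASI region weights: head 10%, upper limbs 20%, trunk 30%, lower limbs 40%.
REGION_WEIGHTS = {"head": 0.1, "upper": 0.2, "trunk": 0.3, "lower": 0.4}

def pasi(scores):
    """scores: region -> (erythema, induration, desquamation, area_score).

    Severity components are each 0-4; the area score is 0-6. Returns 0-72.
    """
    total = 0.0
    for region, (e, i, d, a) in scores.items():
        total += REGION_WEIGHTS[region] * (e + i + d) * a
    return total
```

An automated pipeline like the one proposed replaces the subjective severity and area inputs with values derived from segmented lesion masks, leaving the weighting formula unchanged.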
Subject(s)
Image Interpretation, Computer-Assisted/methods; Psoriasis/diagnosis; Skin/diagnostic imaging; Skin/pathology; Area Under Curve; Clinical Decision-Making; Disease Management; Erythema/pathology; Humans; Image Interpretation, Computer-Assisted/standards; Image Processing, Computer-Assisted; Psoriasis/etiology; Severity of Illness Index
ABSTRACT
PURPOSE: To facilitate the demonstration of the prognostic value of radiomics, multicenter radiomics studies are needed. Pooling radiomic features of such data in a statistical analysis is, however, challenging, as they are sensitive to variability in scanner models, acquisition protocols, and reconstruction settings, which is often unavoidable in a multicenter retrospective analysis. A statistical harmonization strategy called ComBat has been utilized in radiomics studies to deal with the "center effect". The goal of the present work was to integrate a transfer learning (TL) technique within ComBat, as well as within recently developed alternate versions of ComBat with improved flexibility (M-ComBat) and robustness (B-ComBat), to allow the application of a previously determined harmonization transform to the radiomic feature values of new patients from an already known center. MATERIAL AND METHODS: The proposed TL approach was incorporated into the four versions of ComBat (standard, B-, M-, and B-M-ComBat). The proposed approach was evaluated using a dataset of 189 locally advanced cervical cancer patients from 3 centers, with magnetic resonance imaging (MRI) and positron emission tomography (PET) images, with the clinical endpoint of predicting local failure. The performance impact of the TL approach was evaluated by comparing the harmonization achieved using only part of the data to the reference (harmonization achieved using all the available data). This evaluation was performed through three different machine learning pipelines. RESULTS: The proposed TL technique successfully harmonized features of new patients from a known center in all versions of ComBat, leading to predictive models reaching performance similar to that of models developed using features harmonized with all the data available. CONCLUSION: The proposed TL approach enables applying a previously determined ComBat transform to new, previously unseen data.
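The core transfer-learning idea (estimate a center's harmonization transform once, store it, and reapply it to new patients from that center) can be sketched as a location-scale adjustment. This is a drastic simplification: real ComBat pools information across features with empirical Bayes shrinkage, and the function names here are illustrative.

```python
import statistics

def fit_transform_params(center_values, target_mean, target_std):
    """Estimate a location-scale transform mapping one center's feature
    distribution onto a reference distribution.  Fitted once on training data."""
    m = statistics.mean(center_values)
    s = statistics.pstdev(center_values) or 1.0  # guard against zero variance
    return {"mean": m, "std": s, "target_mean": target_mean, "target_std": target_std}

def apply_transform(value, params):
    """Transfer-learning step: reuse the stored parameters for a new patient
    from the same (already known) center, without refitting."""
    z = (value - params["mean"]) / params["std"]
    return z * params["target_std"] + params["target_mean"]
```

Keeping the fitted parameters frozen is what makes the approach usable prospectively: a new patient's features can be harmonized immediately, without re-running the harmonization over the full pooled cohort.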
Subject(s)
Cervix Uteri/diagnostic imaging; Image Interpretation, Computer-Assisted/standards; Machine Learning/standards; Uterine Cervical Neoplasms/diagnosis; Adult; Aged; Aged, 80 and over; Cervix Uteri/pathology; Chemoradiotherapy/methods; Datasets as Topic; Decision Support Systems, Clinical/standards; Decision Support Systems, Clinical/statistics & numerical data; Female; Follow-Up Studies; Humans; Image Interpretation, Computer-Assisted/methods; Image Interpretation, Computer-Assisted/statistics & numerical data; Machine Learning/statistics & numerical data; Magnetic Resonance Imaging/standards; Magnetic Resonance Imaging/statistics & numerical data; Middle Aged; Positron-Emission Tomography/standards; Positron-Emission Tomography/statistics & numerical data; Retrospective Studies; Tomography, X-Ray Computed/standards; Tomography, X-Ray Computed/statistics & numerical data; Treatment Outcome; Uterine Cervical Neoplasms/therapy; Young Adult
ABSTRACT
A substantial improvement in the efficiency of switching filters intended for the removal of impulsive noise from color images is described. Numerous noisy pixel detection and replacement techniques are evaluated, and the filtering performance for color images is assessed using statistical reasoning. Denoising efficiency for the applied detection and interpolation techniques is assessed both when the locations of corrupted pixels are identified by noisy pixel detection algorithms and when they are already known. The results show that an improvement in objective quality measures can be achieved by using more robust detection techniques combined with novel methods of corrupted pixel restoration. A significant increase in image denoising performance is achieved for both pixel detection and interpolation, surpassing current filtering methods, especially via the application of a convolutional network. The interpolation techniques used in the image inpainting methods also significantly increased the efficiency of impulsive noise removal.
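A switching filter of the kind evaluated here first detects likely impulse pixels and replaces only those, leaving clean pixels untouched. A minimal grayscale sketch using extreme-value detection and median replacement (the paper's detectors and CNN-based restoration are far more sophisticated, and operate on color images):

```python
def switching_median(img, low=0, high=255):
    """Switching median filter for salt-and-pepper impulses on a 2D grayscale grid.

    Detection: a pixel equal to an extreme value (low/high) is flagged as an impulse.
    Replacement: the median of the non-extreme pixels in its 3x3 neighborhood.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] in (low, high):  # detection step
                window = [img[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))
                          if img[j][i] not in (low, high)]
                if window:  # replacement step; keep original if no clean neighbor
                    window.sort()
                    out[y][x] = window[len(window) // 2]
    return out
```

Because clean pixels are passed through unchanged, a switching filter preserves edges and texture far better than applying a median filter everywhere.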
Subject(s)
Algorithms; Image Enhancement/standards; Image Interpretation, Computer-Assisted/standards; Signal Processing, Computer-Assisted/instrumentation; Signal-To-Noise Ratio; Humans
ABSTRACT
Intraepidermal nerve fiber density (IENFD) measurements in skin biopsy are performed manually by 1-3 operators. To improve diagnostic accuracy and applicability in clinical practice, we developed an automated method for fast IENFD determination with low operator dependency. Sixty skin biopsy specimens were stained with the axonal marker PGP9.5 and imaged using a widefield fluorescence microscope. IENFD was first determined manually by 3 independent observers. Subsequently, images were processed in their Z-max projection and the intradermal line was delineated automatically. IENFD was then calculated automatically (fluorescent images automated counting [FIAC]) and compared with manual counting on the same fluorescence images (fluorescent images manual counting [FIMC]), and with classical manual counting (CMC) data. FIMC showed lower variability among observers compared with CMC (intraclass correlation [ICC] = 0.996 vs 0.950). FIMC and FIAC showed high reliability (ICC = 0.999). Moderate-to-high agreement (ICC = 0.705) was observed between CMC and FIAC counting. The algorithm took on average 15 seconds to perform FIAC counting, compared with 10 minutes for FIMC counting. This automated method rapidly and reliably detects small nerve fibers in skin biopsies, with clear advantages over the classical manual technique.
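The Z-max projection step collapses the fluorescence image stack into a single 2D image by taking the per-pixel maximum across slices; a minimal sketch (in practice this is one call on an image array, e.g. a maximum along the stack axis):

```python
def z_max_projection(stack):
    """Maximum-intensity projection along Z.

    stack: list of 2D slices (lists of rows), all with identical shape.
    Returns one 2D image whose pixel (y, x) is the max over all slices.
    """
    depth, height, width = len(stack), len(stack[0]), len(stack[0][0])
    return [[max(stack[z][y][x] for z in range(depth))
             for x in range(width)]
            for y in range(height)]
```

Projecting first means the fiber counting and intradermal-line delineation only ever operate on a single 2D image, which is a large part of why the automated count takes seconds rather than minutes.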
Subject(s)
Axons/pathology; Epidermis/pathology; Image Interpretation, Computer-Assisted/methods; Algorithms; Axons/metabolism; Biopsy/methods; Epidermis/innervation; Humans; Image Interpretation, Computer-Assisted/standards; Microscopy, Fluorescence/methods; Ubiquitin Thiolesterase/metabolism
ABSTRACT
Fast magnetic resonance imaging (MRI) is crucial for clinical applications that can alleviate motion artefacts and increase patient throughput. K-space undersampling is an obvious approach to accelerate MR acquisition. However, undersampling of k-space data can result in blurring and aliasing artefacts for the reconstructed images. Recently, several studies have been proposed to use deep learning-based data-driven models for MRI reconstruction and have obtained promising results. However, the comparison of these methods remains limited because the models have not been trained on the same datasets and the validation strategies may be different. The purpose of this work is to conduct a comparative study to investigate the generative adversarial network (GAN)-based models for MRI reconstruction. We reimplemented and benchmarked four widely used GAN-based architectures including DAGAN, ReconGAN, RefineGAN and KIGAN. These four frameworks were trained and tested on brain, knee and liver MRI images using twofold, fourfold and sixfold accelerations, respectively, with a random undersampling mask. Both quantitative evaluations and qualitative visualization have shown that the RefineGAN method has achieved superior performance in reconstruction with better accuracy and perceptual quality compared to other GAN-based methods. This article is part of the theme issue 'Synergistic tomographic image reconstruction: part 1'.
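Cartesian k-space undersampling of the kind used here is typically encoded as a binary line mask; a sketch of a random mask with a fully sampled low-frequency center (the 8% center fraction and the seeding are illustrative assumptions, not taken from the study):

```python
import random

def undersampling_mask(n_lines, acceleration, center_fraction=0.08, seed=0):
    """Random Cartesian undersampling mask over phase-encode lines.

    Keeps a fully sampled low-frequency block (image contrast lives there),
    then adds random lines until ~1/acceleration of all lines are sampled.
    """
    rng = random.Random(seed)
    n_center = max(1, int(round(n_lines * center_fraction)))
    mask = [False] * n_lines
    start = (n_lines - n_center) // 2
    for i in range(start, start + n_center):
        mask[i] = True
    n_target = max(n_center, n_lines // acceleration)
    remaining = [i for i in range(n_lines) if not mask[i]]
    for i in rng.sample(remaining, max(0, n_target - n_center)):
        mask[i] = True
    return mask
```

Multiplying fully sampled k-space by such a mask and zero-filling before the inverse FFT produces the aliased input images that GAN-based models like those benchmarked here learn to de-alias.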
Subject(s)
Deep Learning; Image Interpretation, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Neural Networks, Computer; Algorithms; Benchmarking; Brain/diagnostic imaging; Data Compression; Databases, Factual/statistics & numerical data; Humans; Image Interpretation, Computer-Assisted/standards; Image Interpretation, Computer-Assisted/statistics & numerical data; Knee/diagnostic imaging; Likelihood Functions; Linear Models; Liver/diagnostic imaging; Magnetic Resonance Imaging/standards; Magnetic Resonance Imaging/statistics & numerical data
ABSTRACT
Importance: Interstitial fibrosis and tubular atrophy (IFTA) is a strong indicator of decline in kidney function and is measured using histopathological assessment of kidney biopsy core. At present, a noninvasive test to assess IFTA is not available. Objective: To develop and validate a deep learning (DL) algorithm to quantify IFTA from kidney ultrasonography images. Design, Setting, and Participants: This was a single-center diagnostic study of consecutive patients who underwent native kidney biopsy at John H. Stroger Jr. Hospital of Cook County, Chicago, Illinois, between January 1, 2014, and December 31, 2018. A DL algorithm was trained, validated, and tested to classify IFTA from kidney ultrasonography images. Of 6135 Crimmins-filtered ultrasonography images, 5523 were used for training (5122 images) and validation (401 images), and 612 were used to test the accuracy of the DL system. Kidney segmentation was performed using the UNet architecture, and classification was performed using a convolutional neural network-based feature extractor and extreme gradient boosting. IFTA scored by a nephropathologist on trichrome-stained kidney biopsy slides was used as the reference standard. IFTA was divided into 4 grades (grade 1, 0%-24%; grade 2, 25%-49%; grade 3, 50%-74%; and grade 4, 75%-100%). Data analysis was performed from December 2019 to May 2020. Main Outcomes and Measures: Prediction of IFTA grade was measured using the metrics precision, recall, accuracy, and F1 score. Results: This study included 352 patients (mean [SD] age 47.43 [14.37] years), of whom 193 (54.82%) were women. There were 159 patients with IFTA grade 1 (2701 ultrasonography images), 74 patients with IFTA grade 2 (1239 ultrasonography images), 41 patients with IFTA grade 3 (701 ultrasonography images), and 78 patients with IFTA grade 4 (1494 ultrasonography images). Kidney ultrasonography images were segmented with 91% accuracy.
In the independent test set, the point estimates for performance matrices showed precision of 0.8927 (95% CI, 0.8682-0.9172), recall of 0.8037 (95% CI, 0.7722-0.8352), accuracy of 0.8675 (95% CI, 0.8406-0.8944), and an F1 score of 0.8389 (95% CI, 0.8098-0.8680) at the image level. Corresponding estimates at the patient level were precision of 0.9003 (95% CI, 0.8644-0.9362), recall of 0.8421 (95% CI, 0.7984-0.8858), accuracy of 0.8955 (95% CI, 0.8589-0.9321), and an F1 score of 0.8639 (95% CI, 0.8228-0.9049). Accuracy at the patient level was highest for IFTA grade 1 and IFTA grade 4. The accuracy (approximately 90%) remained high irrespective of the timing of ultrasonography studies and the biopsy diagnosis. The predictive performance of the DL system did not show significant improvement when combined with baseline clinical characteristics. Conclusions and Relevance: These findings suggest that a DL algorithm can accurately and independently predict IFTA from kidney ultrasonography images.
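The four-grade reference standard described above maps the nephropathologist's IFTA percentage onto discrete grades:

```python
def ifta_grade(ifta_percent):
    """Map IFTA percentage (0-100) to grade: 1 (0-24%), 2 (25-49%), 3 (50-74%), 4 (75-100%)."""
    if ifta_percent < 25:
        return 1
    if ifta_percent < 50:
        return 2
    if ifta_percent < 75:
        return 3
    return 4
```

Discretizing a continuous score this way turns the quantification task into a 4-class classification problem, which is what the precision/recall/F1 metrics above are computed over.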
Subject(s)
Algorithms; Biopsy/standards; Deep Learning; Fibrosis/diagnostic imaging; Image Interpretation, Computer-Assisted/standards; Kidney Diseases/diagnostic imaging; Ultrasonography/standards; Adult; Chicago; Female; Fibrosis/physiopathology; Humans; Kidney Diseases/complications; Kidney Diseases/physiopathology; Male; Middle Aged; Practice Guidelines as Topic/standards
ABSTRACT
Volumetric estimates of subcortical and cortical structures, extracted from T1-weighted MRIs, are widely used in many clinical and research applications. Here, we investigate the impact of the presence of white matter hyperintensities (WMHs) on FreeSurfer gray matter (GM) structure volumes and its possible bias on functional relationships. T1-weighted images from 1,077 participants (4,321 timepoints) from the Alzheimer's Disease Neuroimaging Initiative were processed with FreeSurfer version 6.0.0. WMHs were segmented using a previously validated algorithm on either T2-weighted or fluid-attenuated inversion recovery images. Mixed-effects models were used to assess the relationships between overlapping WMHs and GM structure volumes and overall WMH burden, as well as to investigate whether such overlaps impact associations with age, diagnosis, and cognitive performance. Participants with higher WMH volumes had higher overlaps with GM volumes of bilateral caudate, cerebral cortex, putamen, thalamus, pallidum, and accumbens areas (p < .0001). When not corrected for WMHs, caudate volumes increased with age (p < .0001) and were not different between cognitively healthy individuals and age-matched probable Alzheimer's disease patients. After correcting for WMHs, caudate volumes decreased with age (p < .0001), and Alzheimer's disease patients had lower caudate volumes than cognitively healthy individuals (p < .01). Uncorrected caudate volume was not associated with ADAS13 scores, whereas corrected lower caudate volumes were significantly associated with poorer cognitive performance (p < .0001). Presence of WMHs leads to systematic inaccuracies in GM segmentations, particularly for the caudate, which can also change clinical associations. While specifically measured for the FreeSurfer toolkit, this problem likely affects other algorithms.
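Correcting a gray matter structure's volume for overlapping WMH voxels can be viewed as set arithmetic on the two binary label maps; a simplified sketch (the flattened-mask layout, names, and voxel size are illustrative, not FreeSurfer's actual data structures):

```python
def corrected_volume(structure_mask, wmh_mask, voxel_volume_mm3):
    """Remove WMH-overlapping voxels from a structure's volume estimate.

    structure_mask, wmh_mask: equal-length flattened binary masks (1 = inside).
    Returns raw, overlapping, and corrected volumes in mm^3.
    """
    overlap = sum(s and w for s, w in zip(structure_mask, wmh_mask))
    raw = sum(structure_mask)
    return {
        "raw_mm3": raw * voxel_volume_mm3,
        "overlap_mm3": overlap * voxel_volume_mm3,
        "corrected_mm3": (raw - overlap) * voxel_volume_mm3,
    }
```

The study's finding is precisely that the `overlap` term grows with WMH burden, so uncorrected volumes are inflated in participants with high WMH load, flipping the apparent direction of age and diagnosis effects for the caudate.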
Subject(s)
Alzheimer Disease; Gray Matter; Image Interpretation, Computer-Assisted/standards; Leukoaraiosis; Magnetic Resonance Imaging/standards; Neuroimaging/standards; Aged; Alzheimer Disease/diagnostic imaging; Alzheimer Disease/pathology; Gray Matter/diagnostic imaging; Gray Matter/pathology; Humans; Image Interpretation, Computer-Assisted/methods; Leukoaraiosis/diagnostic imaging; Leukoaraiosis/pathology; Longitudinal Studies; Magnetic Resonance Imaging/methods; Neuroimaging/methods
ABSTRACT
OBJECTIVES: Despite its use in determining nigrostriatal degeneration, the lack of a consistent interpretation of nigrosome 1 susceptibility map-weighted imaging (SMwI) limits its generalized applicability. To implement and evaluate a diagnostic algorithm based on convolutional neural networks for interpreting nigrosome 1 SMwI to determine nigrostriatal degeneration in idiopathic Parkinson's disease (IPD). METHODS: In this retrospective study, we enrolled 267 IPD patients and 160 control subjects (125 patients with drug-induced parkinsonism and 35 healthy subjects) at our institute, and 24 IPD patients and 27 control subjects at three other institutes, on approval of the local institutional review boards. Dopamine transporter imaging served as the reference standard for the presence or absence of abnormalities of nigrosome 1 on SMwI. Diagnostic performance was compared between visual assessment by an experienced neuroradiologist and the developed deep learning-based diagnostic algorithm in both internal and external datasets using a bootstrapping method with 10,000 re-samples by the "pROC" package of R (version 1.16.2). RESULTS: The area under the receiver operating characteristics curve (AUC) (95% confidence interval [CI]) per participant by the bootstrap method was not significantly different between visual assessment and the deep learning-based algorithm (internal validation, 0.9622 [0.8912-1.0000] versus 0.9534 [0.8779-0.9956], P = .1511; external validation, 0.9367 [0.8843-0.9802] versus 0.9208 [0.8634-0.9693], P = .6267), indicative of a performance comparable to visual assessment. CONCLUSIONS: Our deep learning-based algorithm for assessing abnormalities of nigrosome 1 on SMwI was found to have a performance comparable to that of an experienced neuroradiologist.
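The bootstrapped AUC analysis reported above resamples participants with replacement and recomputes the AUC on each resample. A plain-Python sketch of a rank-based AUC with a percentile confidence interval (the study used R's "pROC" package; this version is only illustrative):

```python
import random

def auc(y_true, scores):
    """Probability a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling subjects with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        sc = [scores[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # AUC undefined without both classes in the resample
        stats.append(auc(yt, sc))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi
```

Comparing two readers (here, the neuroradiologist and the algorithm) on the same bootstrap resamples, as pROC does, additionally accounts for the correlation between their AUCs when testing the difference.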
Subject(s)
Deep Learning , Image Interpretation, Computer-Assisted , Magnetic Resonance Imaging , Parkinson Disease, Secondary/diagnostic imaging , Parkinson Disease/diagnostic imaging , Substantia Nigra/diagnostic imaging , Aged , Dopamine Plasma Membrane Transport Proteins/pharmacokinetics , Female , Humans , Image Interpretation, Computer-Assisted/methods , Image Interpretation, Computer-Assisted/standards , Magnetic Resonance Imaging/methods , Magnetic Resonance Imaging/standards , Male , Middle Aged , Parkinson Disease, Secondary/chemically induced , Positron-Emission Tomography , Reproducibility of Results , Retrospective Studies , Tropanes
ABSTRACT
Deep brain stimulation (DBS) surgery has been shown to dramatically improve the quality of life for patients with various motor dysfunctions, such as those afflicted with Parkinson's disease (PD), dystonia, and essential tremor (ET), by relieving motor symptoms associated with such pathologies. The success of DBS procedures is directly related to the proper placement of the electrodes, which requires the ability to accurately detect and identify relevant target structures within the subcortical basal ganglia region. In particular, accurate and reliable segmentation of the globus pallidus (GP) interna is of great interest for DBS surgery for PD and dystonia. In this study, we present a deep learning-based neural network, which we term GP-net, for the automatic segmentation of both the external and internal segments of the globus pallidus. High-resolution 7 Tesla images from 101 subjects were used in this study; GP-net was trained on a cohort of 58 subjects, containing patients with movement disorders as well as healthy control subjects. GP-net performs 3D inference in a patient-specific manner, alleviating the need for atlas-based segmentation. GP-net was extensively validated, both quantitatively and qualitatively, over 43 test subjects, including patients with movement disorders and healthy controls, and is shown to consistently produce improved segmentation results compared with state-of-the-art atlas-based segmentations. We also demonstrate postoperative lead location assessment with respect to a segmented globus pallidus obtained by GP-net.
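The abstract above reports quantitative validation against atlas-based segmentations without naming its metric; the Dice similarity coefficient is the usual overlap measure for such comparisons, so the sketch below is an assumed, illustrative choice rather than the study's documented method.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary segmentation masks,
    given as flattened 0/1 voxel lists of equal length (assumed metric,
    not necessarily the one used by GP-net's authors)."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * inter / total
```

A higher Dice score for GP-net's masks than for atlas-propagated masks, both scored against a manual reference, would correspond to the "improved segmentation results" the abstract reports.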
Subject(s)
Deep Learning , Globus Pallidus/anatomy & histology , Globus Pallidus/diagnostic imaging , Image Interpretation, Computer-Assisted , Magnetic Resonance Imaging , Movement Disorders/diagnostic imaging , Adult , Aged , Aged, 80 and over , Female , Humans , Image Interpretation, Computer-Assisted/methods , Image Interpretation, Computer-Assisted/standards , Magnetic Resonance Imaging/methods , Magnetic Resonance Imaging/standards , Male , Middle Aged , Movement Disorders/pathology , Reproducibility of Results , Young Adult
ABSTRACT
Machine learning has greatly facilitated the analysis of medical data, although its internal operations usually remain opaque. To better comprehend these opaque procedures, a convolutional neural network for optical coherence tomography image segmentation was enhanced with a Traceable Relevance Explainability (T-REX) technique. The proposed application was based on three components: ground truth generation by multiple graders, calculation of Hamming distances among the graders and the machine learning algorithm, and smart data visualization ('neural recording'). An overall average variability of 1.75% between the human graders and the algorithm was found, slightly lower than the 2.02% among the human graders themselves. The ambiguity in the ground truth had a noteworthy impact on the machine learning results, which could be visualized. The convolutional neural network balanced between graders and allowed for modifiable predictions dependent on the compartment. Using the proposed T-REX setup, machine learning processes could be rendered more transparent and understandable, possibly leading to optimized applications.
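The Hamming-distance component described above can be sketched as follows; the segmentations are toy 0/1 label lists and the function names are illustrative, but the two quantities mirror the abstract's inter-grader variability and grader-versus-algorithm variability.

```python
from itertools import combinations

def hamming_pct(a, b):
    """Normalized Hamming distance: fraction of pixel labels that differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_pairwise(segmentations):
    """Average Hamming distance over all grader pairs (inter-grader variability)."""
    pairs = list(combinations(segmentations, 2))
    return sum(hamming_pct(a, b) for a, b in pairs) / len(pairs)

def mean_to_reference(segmentations, reference):
    """Average distance from each grader's mask to the algorithm's output."""
    return sum(hamming_pct(s, reference) for s in segmentations) / len(segmentations)
```

In the study's terms, a `mean_to_reference` of 1.75% against the network's output, versus a `mean_pairwise` of 2.02% among graders, is what placed the algorithm within the human graders' own variability.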
Subject(s)
Deep Learning , Machine Learning , Tomography, Optical Coherence , Adult , Algorithms , Animals , Artificial Intelligence , Clinical Competence , Female , Humans , Image Interpretation, Computer-Assisted/methods , Image Interpretation, Computer-Assisted/standards , Image Interpretation, Computer-Assisted/statistics & numerical data , Macaca fascicularis , Male , Middle Aged , Multimodal Imaging/methods , Multimodal Imaging/trends , Neural Networks, Computer , Observer Variation , Reproducibility of Results , Retina/diagnostic imaging , Retina/pathology , Retinal Diseases/diagnosis , Retinal Diseases/epidemiology , Retrospective Studies , Tomography, Optical Coherence/methods , Tomography, Optical Coherence/statistics & numerical data
ABSTRACT
BACKGROUND: Deep neural networks (DNNs) are widely investigated in medical image classification to achieve automated support for clinical diagnosis. It is necessary to evaluate the robustness of medical DNN tasks against adversarial attacks, as high-stakes decisions will be made based on the diagnosis. Several previous studies have considered simple adversarial attacks. However, the vulnerability of DNNs to more realistic and higher-risk attacks, such as universal adversarial perturbation (UAP), a single perturbation that can induce DNN failure across most classification tasks, has not yet been evaluated. METHODS: We focus on three representative DNN-based medical image classification tasks (i.e., skin cancer, referable diabetic retinopathy, and pneumonia classifications) and investigate the vulnerability of seven model architectures to UAPs. RESULTS: We demonstrate that DNNs are vulnerable both to nontargeted UAPs, which cause a task failure by making an input be assigned an incorrect class, and to targeted UAPs, which cause the DNN to classify an input into a specific class. The almost imperceptible UAPs achieved >80% success rates for nontargeted and targeted attacks. The vulnerability to UAPs depended very little on the model architecture. Moreover, we discovered that adversarial retraining, which is known to be an effective method of adversarial defense, increased DNNs' robustness against UAPs in only very few cases. CONCLUSION: Contrary to previous assumptions, the results indicate that DNN-based clinical diagnosis is easier to deceive via adversarial attacks than expected. Adversaries can cause failed diagnoses at lower costs (e.g., without consideration of the data distribution) and, with targeted UAPs, can steer the diagnosis toward a class of their choosing. The effectiveness of adversarial defenses may be limited. Our findings emphasize that more careful consideration is required when developing DNNs for medical imaging and in their practical applications.
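A UAP, as described above, is a single input-agnostic perturbation added unchanged to every image. A minimal sketch of applying one and scoring nontargeted and targeted success rates follows; the toy classifier and images are hypothetical stand-ins, not medical models, and the perturbation here is given rather than optimized.

```python
def apply_uap(image, uap, eps=0.04, lo=0.0, hi=1.0):
    """Add one fixed perturbation, clipped to +/-eps per pixel so it stays
    nearly imperceptible, then clip the result back to the valid pixel range."""
    clipped = [max(-eps, min(eps, d)) for d in uap]
    return [max(lo, min(hi, x + d)) for x, d in zip(image, clipped)]

def attack_success_rates(images, uap, classify, target=None):
    """Nontargeted success: the prediction changes.
    Targeted success: the prediction becomes `target`."""
    hits = 0
    for img in images:
        before = classify(img)
        after = classify(apply_uap(img, uap))
        hits += (after != before) if target is None else (after == target)
    return hits / len(images)
```

Finding the perturbation itself (the part this sketch omits) is an optimization over a training set, which is why a UAP is a realistic threat: it is computed once and reused against every input.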
Subject(s)
Diagnostic Imaging/classification , Image Interpretation, Computer-Assisted/methods , Image Interpretation, Computer-Assisted/standards , Neural Networks, Computer , Diabetic Retinopathy/classification , Diabetic Retinopathy/diagnostic imaging , Diagnostic Imaging/standards , Humans , Photography/classification , Pneumonia/classification , Pneumonia/diagnostic imaging , Radiography, Thoracic/classification , Skin Neoplasms/classification , Skin Neoplasms/diagnostic imaging , Tomography, Optical Coherence/classification
ABSTRACT
CONTEXT: Whole slide imaging (WSI) is an important component of digital pathology that involves the digitization of glass slides and their storage as digital images. Implementation of WSI for primary surgical pathology diagnosis is evolving, following various studies that have evaluated the feasibility of WSI technology for primary diagnosis. AIMS, SETTINGS AND DESIGN: The present study was a single-center, observational study, with evaluation by three pathologists, aimed at assessing concordance on specialty-specific diagnosis and comparing the time taken for diagnosis on WSI and conventional light microscopy (CLM). MATERIALS AND METHODS: Seventy prostate core biopsy slides (reported between January 2016 and December 2016) were scanned at 20× and 40× using a Pannoramic MIDI II scanner (3DHISTECH, Budapest, Hungary). Sixty slides were used for the validation study, following training with 10 slides. STATISTICAL ANALYSIS USED: Intraobserver concordance for diagnosis between the two platforms of evaluation was analyzed using Cohen's κ statistics and the intraclass correlation coefficient (ICC); observation time for diagnosis was compared by the Wilcoxon signed-rank test. RESULTS: Interpretation on WSI at 20× and 40× was comparable, with no major discordance. A high level of intraobserver agreement was observed between CLM and WSI for all three observers, both for primary diagnosis (κ = 0.9) and Grade Group (κ = 0.7-0.8) in cases of prostatic adenocarcinoma. The major discordance rate between CLM and WSI was 3.3%-8.3%, which reflected the expertise of the observers. The time spent on diagnosis using WSI was variable for the three pathologists. CONCLUSION: WSI is comparable to CLM and can be safely incorporated for the primary histological diagnosis of prostate core biopsies.
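The intraobserver agreement statistic used above, Cohen's κ, corrects raw agreement for the agreement expected by chance. A self-contained Python sketch follows; the diagnosis labels in the example are illustrative, not the study's data.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa between two ratings of the same cases:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

By the usual reading, the study's κ of 0.9 for primary diagnosis indicates almost perfect agreement between CLM and WSI, and 0.7-0.8 for Grade Group indicates substantial agreement.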
Subject(s)
Image Interpretation, Computer-Assisted/methods , Image Interpretation, Computer-Assisted/standards , Pathology, Surgical/methods , Pathology, Surgical/standards , Prostate/pathology , Prostatic Neoplasms/diagnosis , Adenocarcinoma/diagnosis , Biopsy, Large-Core Needle , Humans , Image Interpretation, Computer-Assisted/instrumentation , Male , Microscopy/instrumentation , Microscopy/methods , Microscopy/standards , Observer Variation , Pathologists , Pathology, Clinical/methods , Pathology, Surgical/instrumentation
ABSTRACT
OBJECTIVES: The ongoing global severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic necessitates adaptations in the practice of surgical pathology at scale. Primary diagnosis by whole-slide imaging (WSI) is a key component that would aid departments in providing uninterrupted histopathology diagnoses and protecting revenue streams from disruption. We sought to perform a rapid validation of the use of WSI for primary diagnosis, meeting the recommendations of the College of American Pathologists guidelines. METHODS: Glass slides from clinically reported cases from 5 participating pathologists, with a preset washout period, were digitally scanned and reviewed in settings identical to typical reporting. Cases were classified as concordant or as showing minor or major disagreement with the original diagnosis. Randomized subsampling was performed, and mean concordance rates were calculated. RESULTS: In total, 171 cases were included and distributed equally among participants. For the group as a whole, the mean concordance rate in sampled cases (n = 90) was 83.6% counting all discrepancies and 94.6% counting only major disagreements. The mean pathologist concordance rate in sampled cases (n = 18) ranged from 90.49% to 97%. CONCLUSIONS: We describe a novel double-blinded method for the rapid validation of WSI for primary diagnosis. Our findings highlight the range of diagnostic reproducibility encountered when deploying digital methods.
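The randomized-subsampling concordance estimate described above can be sketched as follows; the case labels and parameter names are illustrative, and the `major_only` option mirrors the abstract's two ways of counting (all discrepancies versus major disagreements only).

```python
import random

def mean_concordance(cases, n_rounds=1000, sample_size=90, major_only=False, seed=0):
    """Mean concordance rate over random subsamples of reviewed cases.

    `cases` is a list of 'concordant', 'minor', or 'major' labels;
    with major_only=True, minor disagreements still count as concordant."""
    ok = {"concordant", "minor"} if major_only else {"concordant"}
    rng = random.Random(seed)
    rates = []
    for _ in range(n_rounds):
        sample = rng.sample(cases, min(sample_size, len(cases)))
        rates.append(sum(c in ok for c in sample) / len(sample))
    return sum(rates) / len(rates)
```

Subsampling without replacement, as here, gives a stability check on the concordance estimate; a bootstrap (sampling with replacement) would be an equally reasonable design under the same counting rules.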