1.
J Electrocardiol ; 87: 153792, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39255653

ABSTRACT

INTRODUCTION: Deep learning (DL) models offer improved performance in electrocardiogram (ECG)-based classification over rule-based methods. However, for widespread adoption by clinicians, explainability methods, like saliency maps, are essential. METHODS: On a subset of 100 ECGs from patients with chest pain, we generated saliency maps using a previously validated convolutional neural network for occlusion myocardial infarction (OMI) classification. Three clinicians reviewed ECG-saliency map dyads, first assessing the likelihood of OMI from standard ECGs and then evaluating clinical relevance and helpfulness of the saliency maps, as well as their confidence in the model's predictions. Questions were answered on a Likert scale ranging from +3 (most useful/relevant) to -3 (least useful/relevant). RESULTS: The adjudicated accuracy of the three clinicians matched the DL model when considering area under the receiver operating characteristic curve (AUC) and F1 score (AUC 0.855 vs. 0.872, F1 score 0.789 vs. 0.747). On average, clinicians found saliency maps slightly clinically relevant (0.96 ± 0.92) and slightly helpful (0.66 ± 0.98) in identifying or ruling out OMI but had higher confidence in the model's predictions (1.71 ± 0.56). Clinicians noted that leads I and aVL were often emphasized, even when obvious ST changes were present in other leads. CONCLUSION: In this clinical usability study, clinicians deemed saliency maps somewhat helpful in enhancing the explainability of DL-based ECG models. The spatial convolutional layers across the 12 leads in these models appear to contribute to the discrepancy between the ECG segments considered most relevant by clinicians and the segments that drove the DL model's predictions.
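Saliency maps of the kind described above can be produced by occlusion: mask part of the input and measure how much the model's score drops. The abstract does not specify the exact algorithm used, so the following is only a generic sketch; the toy scoring function, window size, and synthetic ECG trace are all illustrative assumptions standing in for the validated CNN.

```python
import numpy as np

def occlusion_saliency(score_fn, signal, window=40, stride=20, baseline=0.0):
    """Slide an occluding window over a 1-D signal and record how much the
    model score drops when each region is masked (larger drop = more salient)."""
    base_score = score_fn(signal)
    saliency = np.zeros(len(signal))
    counts = np.zeros(len(signal))
    for start in range(0, len(signal) - window + 1, stride):
        occluded = signal.copy()
        occluded[start:start + window] = baseline  # mask one region
        saliency[start:start + window] += base_score - score_fn(occluded)
        counts[start:start + window] += 1
    return saliency / np.maximum(counts, 1)  # average overlapping windows

# Toy stand-in for a trained classifier: the "model" only reads one segment,
# mimicking a network whose prediction is driven by a localized ECG feature.
rng = np.random.default_rng(0)
ecg = rng.normal(0.0, 0.05, 500)
ecg[200:260] += 1.0  # the segment that actually drives the score
score = lambda x: float(x[200:260].mean())
sal = occlusion_saliency(score, ecg)
```

In this sketch the saliency peaks over the samples the scoring function actually reads, which is exactly the sanity property clinicians look for when inspecting such maps.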

2.
IEEE Access ; 12: 91410-91425, 2024.
Article in English | MEDLINE | ID: mdl-39054996

ABSTRACT

Mental illness has grown into a prevalent global health concern that affects individuals across various demographics. Timely detection and accurate diagnosis of mental disorders are crucial for effective treatment and support, as late diagnosis can result in suicidal or harmful behaviors and, ultimately, death. To this end, the present study introduces a novel pipeline for the analysis of facial expressions, leveraging both the AffectNet and 2013 Facial Emotion Recognition (FER) datasets. This research goes beyond traditional diagnostic methods by contributing a system capable of generating a comprehensive mental disorder dataset and concurrently predicting mental disorders based on facial emotional cues. In particular, we introduce a hybrid architecture for mental disorder detection that leverages the state-of-the-art object detection algorithm YOLOv8 to detect and classify visual cues associated with specific mental disorders. To achieve accurate predictions, an integrated learning architecture based on the fusion of Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) models is developed to form an ensemble classifier that predicts the presence of mental illness (e.g., depression, anxiety, and other mental disorders). The overall accuracy is improved to about 81% using the proposed ensemble technique. To ensure transparency and interpretability, we integrate techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM) and saliency maps to highlight the regions in the input image that contribute most to the model's predictions, providing healthcare professionals with a clear understanding of the features influencing the system's decisions and thereby supporting trust and a more informed diagnostic process.
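The CNN-ViT fusion described above is an ensemble; one common way to realize such a fusion is soft voting, i.e., averaging the per-class probabilities of the member models. The paper's exact fusion rule is not given in the abstract, so this is a hedged sketch with hypothetical softmax outputs:

```python
import numpy as np

def soft_vote(prob_a, prob_b, weights=(0.5, 0.5)):
    """Fuse per-class probabilities from two classifiers (e.g., a CNN and a
    ViT) by weighted averaging, then take the argmax class per sample."""
    fused = weights[0] * np.asarray(prob_a) + weights[1] * np.asarray(prob_b)
    return fused, fused.argmax(axis=-1)

# Hypothetical softmax outputs for 3 images over 3 illustrative classes
# (e.g., depression, anxiety, other).
cnn_probs = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]])
vit_probs = np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.2, 0.1, 0.7]])
fused, labels = soft_vote(cnn_probs, vit_probs)
```

With equal weights, the fused probabilities remain valid distributions, and a class that only one member favors weakly can be overruled by the other, which is the usual motivation for such ensembles.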

3.
Asia Pac J Ophthalmol (Phila) ; 13(4): 100087, 2024.
Article in English | MEDLINE | ID: mdl-39069106

ABSTRACT

PURPOSE: Saliency maps (SMs) allow clinicians to better understand the opaque decision-making process of artificial intelligence (AI) models by visualising the important features responsible for predictions, which ultimately improves interpretability and confidence. In this work, we review the use case for SMs, exploring their impact on clinicians' understanding of and trust in AI models. We use the following ophthalmic conditions as examples: (1) glaucoma, (2) myopia, (3) age-related macular degeneration (AMD), and (4) diabetic retinopathy (DR). METHOD: A multi-field search on MEDLINE, Embase, and Web of Science was conducted using specific keywords. Only studies on the use of SMs in glaucoma, myopia, AMD, or DR were considered for inclusion. RESULTS: Findings reveal that SMs are often used to validate AI models and advocate for their adoption, potentially leading to biased claims. Studies frequently overlooked the technical limitations of SMs and assessed their quality and relevance only superficially. Uncertainties persist regarding the role of saliency maps in building trust in AI. It is crucial to enhance understanding of SMs' technical constraints and improve evaluation of their quality, impact, and suitability for specific tasks. Establishing a standardised framework for selecting and assessing SMs, as well as exploring their relationship with other sources of reliability (e.g. safety and generalisability), is essential for enhancing clinicians' trust in AI. CONCLUSION: We conclude that SMs are not beneficial for interpretability and trust-building purposes in their current forms. Instead, SMs may confer benefits for model debugging, model performance enhancement, and hypothesis testing (e.g. novel biomarkers).


Subject(s)
Artificial Intelligence , Ophthalmologists , Humans , Trust , Glaucoma/physiopathology
4.
Sci Rep ; 14(1): 11893, 2024 05 24.
Article in English | MEDLINE | ID: mdl-38789575

ABSTRACT

Although the value of adding AI as a surrogate second reader in various scenarios has been investigated, it is unknown whether implementing an AI tool within double reading practice would capture additional subtle cancers missed by both radiologists who independently assessed the mammograms. This paper assesses the effectiveness of two state-of-the-art artificial intelligence (AI) models in detecting retrospectively identified missed cancers within a screening program employing double reading practices. The study also explores the agreement between AI and radiologists in locating the lesions, considering various levels of concordance among the radiologists. The Globally-aware Multiple Instance Classifier (GMIC) and Global-Local Activation Maps (GLAM) models were fine-tuned for our dataset. We evaluated the sensitivity of both models on missed cancers retrospectively identified by a panel of three radiologists who reviewed prior examinations of 729 cancer cases detected in a screening program with double reading practice. Two of these experts annotated the lesions, and based on their concordance levels, cases were categorized as 'almost perfect,' 'substantial,' 'moderate,' and 'poor.' We employed Similarity or Histogram Intersection (SIM) and Kullback-Leibler Divergence (KLD) metrics to compare saliency maps of malignant cases from the AI models with annotations from radiologists in each category. In total, 24.82% of cancers were labeled as "missed." The sensitivity of GMIC and GLAM on the missed cancer cases was 82.98% and 79.79%, respectively, while for the true screen-detected cancers, the sensitivities were 89.54% and 87.25%, respectively (p-values for the difference in sensitivity < 0.05). As anticipated, SIM and KLD from saliency maps were best in the 'almost perfect' category, followed by 'substantial,' 'moderate,' and 'poor.' Both GMIC and GLAM (p-values < 0.05) exhibited greater sensitivity at higher concordance levels.
Even in a screening program with independent double reading, adding AI could potentially identify missed cancers. However, lesions that are challenging for radiologists to locate pose a similar challenge for AI.
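The SIM (histogram intersection) and KLD metrics used above to compare saliency maps with radiologists' annotations have simple closed forms once both maps are normalized to sum to 1. A minimal sketch on toy 2x2 maps (the study's actual preprocessing may differ):

```python
import numpy as np

def normalize(m):
    """Scale a non-negative map so it sums to 1 (treat it as a distribution)."""
    m = np.asarray(m, dtype=float)
    return m / m.sum()

def histogram_intersection(p, q):
    """SIM: sum of element-wise minima; 1.0 means the maps are identical."""
    return float(np.minimum(p, q).sum())

def kl_divergence(p, q, eps=1e-12):
    """KLD of map p from map q; 0.0 means the maps are identical."""
    p, q = p + eps, q + eps  # avoid log(0)
    return float(np.sum(p * np.log(p / q)))

saliency = normalize(np.array([[0.0, 1.0], [2.0, 1.0]]))
annotation = normalize(np.array([[0.0, 1.0], [2.0, 1.0]]))  # same hot spots
shifted = normalize(np.array([[2.0, 1.0], [0.0, 1.0]]))     # hot spots moved
sim_same = histogram_intersection(saliency, annotation)
kld_same = kl_divergence(annotation, saliency)
sim_other = histogram_intersection(saliency, shifted)
```

Higher SIM and lower KLD indicate better spatial agreement, which matches the study's finding that both metrics improve with annotation concordance.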


Subject(s)
Artificial Intelligence , Breast Neoplasms , Early Detection of Cancer , Mammography , Humans , Mammography/methods , Female , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/diagnosis , Retrospective Studies , Early Detection of Cancer/methods , Middle Aged , Aged , Radiographic Image Interpretation, Computer-Assisted/methods , Sensitivity and Specificity
5.
Bioengineering (Basel) ; 11(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38790320

ABSTRACT

In recent years, deep convolutional neural networks (DCNNs) have shown promising performance in medical image analysis, including breast lesion classification in 2D ultrasound (US) images. Despite the outstanding performance of DCNN solutions, explaining their decisions remains an open research problem. Yet the explainability of DCNN models has become essential for healthcare systems to accept and trust the models. This paper presents a novel framework for explaining DCNN classification decisions of lesions in ultrasound images, using saliency maps to link the DCNN decisions to known cancer characteristics in the medical domain. The proposed framework consists of three main phases. First, DCNN models for classification in ultrasound images are built. Next, selected visualization methods are applied to obtain saliency maps on the input images of the DCNN models. In the final phase, the visualization outputs are mapped to domain-known cancer characteristics. The paper then demonstrates the use of the framework for breast lesion classification from ultrasound images. We first follow the transfer learning approach and build two DCNN models. We then analyze the visualization outputs of the trained DCNN models using the EGrad-CAM and Ablation-CAM methods. Through the visualization outputs, we map the DCNN model decisions for benign and malignant lesions to characteristics such as echogenicity, calcification, shape, and margin. A retrospective dataset of 1298 US images collected from different hospitals is used to evaluate the effectiveness of the framework. The test results show that these characteristics contribute differently to the benign and malignant lesion decisions. Our study provides a foundation for other researchers to explain DCNN classification decisions for other cancer types.

6.
Artif Intell Med ; 151: 102862, 2024 05.
Article in English | MEDLINE | ID: mdl-38579437

ABSTRACT

We present a novel methodology for integrating high-resolution longitudinal data with the dynamic prediction capabilities of survival models. The aim is twofold: to improve predictive power while maintaining the interpretability of the models. To go beyond the black-box paradigm of artificial neural networks, we propose a parsimonious and robust semi-parametric approach (i.e., a landmarking competing risks model) that combines routinely collected low-resolution data with predictive features extracted from a convolutional neural network trained on high-resolution time-dependent information. We then use saliency maps to analyze and explain the extra predictive power of this model. To illustrate our methodology, we focus on healthcare-associated infections in patients admitted to an intensive care unit.


Subject(s)
Intensive Care Units , Neural Networks, Computer , Humans , Intensive Care Units/organization & administration , Cross Infection
7.
Diagnostics (Basel) ; 14(3)2024 Feb 05.
Article in English | MEDLINE | ID: mdl-38337861

ABSTRACT

Alzheimer's disease (AD) is a progressive neurodegenerative disorder that affects millions of individuals worldwide, causing severe cognitive decline and memory impairment. The early and accurate diagnosis of AD is crucial for effective intervention and disease management. In recent years, deep learning techniques have shown promising results in medical image analysis, including AD diagnosis from neuroimaging data. However, the lack of interpretability in deep learning models hinders their adoption in clinical settings, where explainability is essential for gaining trust and acceptance from healthcare professionals. In this study, we propose an explainable AI (XAI)-based approach for the diagnosis of Alzheimer's disease, leveraging the power of deep transfer learning and ensemble modeling. The proposed framework aims to enhance the interpretability of deep learning models by incorporating XAI techniques, allowing clinicians to understand the decision-making process and providing valuable insights into disease diagnosis. Leveraging popular pre-trained convolutional neural networks (CNNs) such as VGG16, VGG19, DenseNet169, and DenseNet201, we conducted extensive experiments to evaluate their individual performances on a comprehensive dataset. The proposed ensembles, Ensemble-1 (VGG16 and VGG19) and Ensemble-2 (DenseNet169 and DenseNet201), demonstrated superior accuracy, precision, recall, and F1 scores compared to individual models, reaching up to 95%. To further enhance interpretability and transparency in Alzheimer's diagnosis, we introduced a novel model achieving an accuracy of 96%. This model incorporates explainable AI techniques, including saliency maps and Grad-CAM (gradient-weighted class activation mapping). The integration of these techniques not only contributes to the model's accuracy but also provides clinicians and researchers with visual insights into the neural regions influencing the diagnosis.
Our findings showcase the potential of combining deep transfer learning with explainable AI in the realm of Alzheimer's disease diagnosis, paving the way for more interpretable and clinically relevant AI models in healthcare.

8.
Cancers (Basel) ; 16(2)2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38254813

ABSTRACT

This paper investigates the adaptability of four state-of-the-art artificial intelligence (AI) models to the Australian mammographic context through transfer learning, explores the impact of image enhancement on model performance, and analyses the relationship between AI outputs and histopathological features for clinical relevance and accuracy assessment. A total of 1712 screening mammograms (n = 856 cancer cases and n = 856 matched normal cases) were used in this study. The 856 cases with cancer lesions were annotated by two expert radiologists, and the level of concordance between their annotations was used to establish two sets: a 'high-concordance subset' with 99% agreement on cancer location and an 'entire dataset' with all cases included. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the Globally-aware Multiple Instance Classifier (GMIC), Global-Local Activation Maps (GLAM), I&H and End2End AI models, both in the pretrained and transfer learning modes, with and without applying the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm. The four AI models, with and without transfer learning, performed better on the high-concordance subset than on the entire dataset. Applying the CLAHE algorithm to mammograms improved the performance of the AI models. In the high-concordance subset with transfer learning and the CLAHE algorithm applied, the AUC of the GMIC model was highest (0.912), followed by the GLAM model (0.909), I&H (0.893) and End2End (0.875). There were significant differences (p < 0.05) in the performances of the four AI models between the high-concordance subset and the entire dataset. The AI models also demonstrated significant differences in malignancy probability across tumour size categories in mammograms. Overall, the performance of the AI models was affected by several factors, such as concordance classification, image enhancement and transfer learning: strong concordance between radiologists' annotations, image enhancement and transfer learning all enhanced the accuracy of the AI models.
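CLAHE, the enhancement algorithm used above, equalizes local histograms while clipping each histogram and redistributing the clipped mass to limit noise amplification. The sketch below shows only the clip-and-equalize step applied globally; real CLAHE (e.g., OpenCV's `cv2.createCLAHE`) applies it per tile with bilinear blending, and the clip limit and synthetic image here are illustrative assumptions:

```python
import numpy as np

def contrast_limited_equalize(img, clip_limit=0.01, n_bins=256):
    """Global contrast-limited histogram equalization for an 8-bit image:
    clip the normalized histogram, redistribute the excess uniformly,
    then map intensities through the resulting cumulative distribution."""
    hist, _ = np.histogram(img.ravel(), bins=n_bins, range=(0, 256))
    hist = hist.astype(float) / hist.sum()
    excess = np.maximum(hist - clip_limit, 0).sum()
    hist = np.minimum(hist, clip_limit) + excess / n_bins  # redistribute
    cdf = np.cumsum(hist)
    lut = np.round(255 * cdf).astype(np.uint8)  # intensity lookup table
    return lut[img]

# Synthetic low-contrast image: all intensities squeezed into [100, 140).
rng = np.random.default_rng(1)
low_contrast = rng.integers(100, 140, size=(64, 64)).astype(np.uint8)
enhanced = contrast_limited_equalize(low_contrast)
```

The output spreads the narrow intensity range over a wider one, which is the effect that reportedly improved the mammography models' performance.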

9.
Radiol Artif Intell ; 6(1): e220221, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38166328

ABSTRACT

Purpose To determine whether saliency maps in radiology artificial intelligence (AI) are vulnerable to subtle perturbations of the input, which could lead to misleading interpretations, using prediction-saliency correlation (PSC) to evaluate the sensitivity and robustness of saliency methods. Materials and Methods In this retrospective study, locally trained deep learning models and a research prototype provided by a commercial vendor were systematically evaluated on 191,229 chest radiographs from the CheXpert dataset and 7022 MR images from a human brain tumor classification dataset. Two radiologists performed a reader study on 270 chest radiograph pairs. A model-agnostic approach for computing the PSC coefficient was used to evaluate the sensitivity and robustness of seven commonly used saliency methods. Results The saliency methods had low sensitivity (maximum PSC, 0.25; 95% CI: 0.12, 0.38) and weak robustness (maximum PSC, 0.12; 95% CI: 0.0, 0.25) on the CheXpert dataset, as demonstrated by leveraging locally trained model parameters. Further evaluation showed that the saliency maps generated from a commercial prototype could be irrelevant to the model output, without knowledge of the model specifics (area under the receiver operating characteristic curve decreased by 8.6% without affecting the saliency map). The human observer studies confirmed that it is difficult for experts to identify the perturbed images; the experts achieved less than 44.8% correctness. Conclusion Popular saliency methods scored low PSC values on the two datasets of perturbed chest radiographs, indicating weak sensitivity and robustness. The proposed PSC metric provides a valuable quantification tool for validating the trustworthiness of medical AI explainability. Keywords: Saliency Maps, AI Trustworthiness, Dynamic Consistency, Sensitivity, Robustness. Supplemental material is available for this article. © RSNA, 2023 See also the commentary by Yanagawa and Sato in this issue.
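The prediction-saliency correlation idea above can be illustrated in a simplified form: randomly perturb the input and correlate the actual change in the model's output with the first-order change predicted by the saliency map. This is a toy reconstruction, not the paper's exact PSC coefficient; the linear model and gradient saliency below are assumptions for illustration only:

```python
import numpy as np

def psc(predict, saliency, x, n_perturb=200, scale=0.1, seed=0):
    """Correlate actual output changes under random perturbations with the
    first-order changes predicted by the saliency map (saliency @ delta).
    A faithful saliency map yields a correlation near 1."""
    rng = np.random.default_rng(seed)
    actual, predicted = [], []
    for _ in range(n_perturb):
        delta = rng.normal(0, scale, size=x.shape)
        actual.append(predict(x + delta) - predict(x))
        predicted.append(float(saliency @ delta))
    return float(np.corrcoef(actual, predicted)[0, 1])

w = np.array([1.0, -2.0, 0.5, 3.0])
predict = lambda x: float(w @ x)  # toy linear "model"
x0 = np.zeros(4)
faithful = psc(predict, w, x0)    # gradient saliency of the linear model
unfaithful = psc(predict, np.array([0.0, 0.0, 1.0, 0.0]), x0)
```

For the linear model, the gradient saliency reproduces output changes exactly (correlation near 1), while an arbitrary map correlates weakly, mirroring how a low PSC flags an untrustworthy saliency method.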


Subject(s)
Artificial Intelligence , Radiology , Humans , Retrospective Studies , Radiography , Radiologists
10.
Front Artif Intell ; 6: 1278118, 2023.
Article in English | MEDLINE | ID: mdl-38106982

ABSTRACT

The accurate and comprehensive mapping of land cover has become a central task in modern environmental research, with increasing emphasis on machine learning approaches. However, a clear technical definition of a land cover class is a prerequisite for learning and applying a machine learning model. One particularly challenging class is naturalness versus human influence, yet mapping it is important due to its critical role in biodiversity conservation, habitat assessment, and climate change monitoring. We present an interpretable machine learning approach to map patterns related to territorially protected and anthropogenic areas as proxies of naturalness and human influence using satellite imagery. To achieve this, we train a weakly supervised convolutional neural network and subsequently apply attribution methods such as Grad-CAM and occlusion sensitivity mapping. We propose a novel network architecture that consists of an image-to-image network and a shallow, task-specific head. The two sub-networks are connected by an intermediate layer that captures high-level features at full resolution, allowing for detailed analysis with a wide range of attribution methods. We further analyze how intermediate layer activations relate to their attributions across the training dataset to establish a consistent relationship. This makes attributions consistent across different scenes and allows for large-scale analysis of remote sensing data. The results highlight that our approach is a promising way to observe and assess naturalness and territorial protection.

11.
JMIR Dermatol ; 6: e42129, 2023 Aug 24.
Article in English | MEDLINE | ID: mdl-37616039

ABSTRACT

BACKGROUND: Previous research has demonstrated that medical content-based image retrieval can play an important role in assisting dermatologists with skin lesion diagnosis. However, current state-of-the-art approaches have not been adopted in routine consultation, partly because their lack of interpretability limits trust by clinical users. OBJECTIVE: This study developed a new image retrieval architecture for polarized or dermoscopic imaging guided by interpretable saliency maps. This approach provides better feature extraction, leading to better quantitative retrieval performance as well as interpretability for an eventual real-world implementation. METHODS: Content-based image retrieval (CBIR) algorithms rely on comparing image features embedded by a convolutional neural network (CNN) against a labeled data set. Saliency maps are computer-vision interpretability methods that highlight the regions most relevant to the prediction made by a neural network. By introducing a fine-tuning stage that uses saliency maps to guide feature extraction, the accuracy of image retrieval is optimized. We refer to this approach as saliency-enhanced CBIR (SE-CBIR). A reader study was designed at the University Hospital Zurich Dermatology Clinic to evaluate SE-CBIR's retrieval accuracy as well as its impact on the participants' confidence in their diagnoses. RESULTS: SE-CBIR improved retrieval accuracy by 7 percentage points (77% vs 84%) for single-lesion retrieval against traditional CBIR. The reader study showed an overall increase in classification accuracy of 22 percentage points (62% vs 84%) when participants were provided with SE-CBIR-retrieved images. In addition, overall confidence in the lesion's diagnosis increased by 24%. Finally, the use of SE-CBIR as a support tool helped the participants reduce the number of nonmelanoma lesions previously diagnosed as melanoma (overdiagnosis) by 53%.
CONCLUSIONS: SE-CBIR achieves better retrieval accuracy than traditional CNN-based CBIR approaches. Furthermore, we have shown how these support tools can help dermatologists and residents improve diagnostic accuracy and confidence. By introducing interpretable methods, we should expect increased acceptance and use of these tools in routine consultation.
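At retrieval time, a CBIR system like the one above compares the query's embedding against a labeled gallery and returns the nearest matches. A minimal cosine-similarity sketch with hypothetical CNN embeddings and labels (the actual SE-CBIR feature extractor and distance metric may differ):

```python
import numpy as np

def retrieve(query_emb, gallery_embs, gallery_labels, k=3):
    """Return the labels and similarity scores of the k gallery images whose
    embeddings are most cosine-similar to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to every gallery item
    top = np.argsort(-sims)[:k]       # indices of the k best matches
    return [gallery_labels[i] for i in top], sims[top]

# Hypothetical 2-D embeddings for a labeled gallery of lesions.
gallery = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ["melanoma", "melanoma", "nevus", "nevus"]
query = np.array([0.95, 0.05])
hits, scores = retrieve(query, gallery, labels, k=2)
```

The retrieved labels then serve as diagnostic references for the clinician; saliency-guided fine-tuning aims to make these embeddings, and hence the neighbors, more clinically meaningful.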

12.
J King Saud Univ Comput Inf Sci ; 35(7): 101596, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37275558

ABSTRACT

COVID-19 is a contagious disease that affects the human respiratory system. Infected individuals may develop serious illness, and complications may result in death. Using medical images to detect COVID-19 among essentially identical thoracic anomalies is challenging because it is time-consuming, laborious, and prone to human error. This study proposes an end-to-end deep learning framework based on deep feature concatenation and a multi-head self-attention network. Feature concatenation involves fine-tuning the pre-trained backbone models of DenseNet, VGG-16, and InceptionV3, which are trained on the large-scale ImageNet dataset, whereas the multi-head self-attention network is adopted for performance gain. End-to-end training and evaluation procedures are conducted using the COVID-19_Radiography_Dataset for binary and multi-class classification scenarios. The proposed model achieved overall accuracies of 96.33% and 98.67% and F1 scores of 92.68% and 98.67% for the multi-class and binary classification scenarios, respectively. In addition, this study highlights the difference in accuracy (98.0% vs. 96.33%) and F1 score (97.34% vs. 95.10%) between feature concatenation and the best individual model performance. Furthermore, a visual representation of the saliency maps of the employed attention mechanism, focusing on the abnormal regions, is presented using explainable artificial intelligence (XAI) technology. The proposed framework provided better COVID-19 prediction results, outperforming other recent deep learning models on the same dataset.

13.
Mol Autism ; 14(1): 5, 2023 02 09.
Article in English | MEDLINE | ID: mdl-36759875

ABSTRACT

BACKGROUND: Attenuated social attention is a key marker of autism spectrum disorder (ASD). Recent neuroimaging findings also emphasize an altered processing of sensory salience in ASD. The locus coeruleus-norepinephrine system (LC-NE) has been established as a modulator of this sensory salience processing (SSP). We tested the hypothesis that altered LC-NE functioning contributes to different SSP and results in diverging social attention in ASD. METHODS: We analyzed the baseline eye-tracking data of the EU-AIMS Longitudinal European Autism Project (LEAP) for subgroups of autistic participants (n = 166, age = 6-30 years, IQ = 61-138, gender [female/male] = 41/125) or neurotypical development (TD; n = 166, age = 6-30 years, IQ = 63-138, gender [female/male] = 49/117) that were matched for demographic variables and data quality. Participants watched brief movie scenes (k = 85) depicting humans in social situations (human) or without humans (non-human). SSP was estimated by gazes on physical and motion salience and a corresponding pupillary response that indexes phasic activity of the LC-NE. Social attention was estimated by gazes on faces via manually defined areas of interest. SSP was compared between groups and related to social attention by linear mixed models that consider temporal dynamics within scenes. Models were controlled for comorbid psychopathology, gaze behavior, and luminance. RESULTS: We found no group differences in gazes on salience, whereas pupillary responses were associated with altered gazes on physical and motion salience. In ASD compared to TD, we observed pupillary responses that were higher for non-human scenes and lower for human scenes. In ASD, we observed lower gazes on faces across the duration of the scenes. Crucially, this different social attention was influenced by gazes on physical salience and moderated by pupillary responses.
LIMITATIONS: The naturalistic study design precluded experimental manipulations and stimulus control, while effect sizes were small to moderate. Covariate effects of age and IQ indicate that the findings differ between age and developmental subgroups. CONCLUSIONS: Pupillary responses as a proxy of LC-NE phasic activity during visual attention are suggested to modulate sensory salience processing and contribute to attenuated social attention in ASD.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Humans , Male , Female , Case-Control Studies , Sensation , Norepinephrine
14.
Front Artif Intell ; 5: 903875, 2022.
Article in English | MEDLINE | ID: mdl-35910188

ABSTRACT

One of the most prominent methods for explaining the behavior of Deep Reinforcement Learning (DRL) agents is the generation of saliency maps that show how much each pixel contributed to the agents' decision. However, no prior work computationally evaluates and compares the fidelity of different perturbation-based saliency map approaches specifically for DRL agents. It is particularly challenging to computationally evaluate saliency maps for DRL agents since their decisions are part of an overarching policy, which includes long-term decision making. For instance, the output neurons of value-based DRL algorithms encode both the value of the current state and the expected future reward after taking each action in this state. This ambiguity should be considered when evaluating saliency maps for such agents. In this paper, we compare five popular perturbation-based approaches to create saliency maps for DRL agents trained on four different Atari 2600 games. The approaches are compared using two computational metrics: dependence on the learned parameters of the agents' underlying deep Q-network (sanity checks) and fidelity to the agents' reasoning (input degradation). During the sanity checks, we found that a popular noise-based saliency map approach for DRL agents shows little dependence on the parameters of the output layer. We demonstrate that this can be fixed by tweaking the algorithm so that it focuses on specific actions instead of the general entropy within the output values. For fidelity, we identify two main factors that influence which saliency map approach should be chosen in which situation. In particular, for value-based DRL agents, we show that analyzing the agents' choice of action requires different saliency map approaches than analyzing the agents' state value estimation.

15.
Front Syst Neurosci ; 16: 882315, 2022.
Article in English | MEDLINE | ID: mdl-35712044

ABSTRACT

Finding objects is essential for almost any daily-life visual task. Saliency models have been useful for predicting fixation locations in natural images during free exploration. However, it is still challenging to predict the sequence of fixations during visual search. Bayesian observer models are particularly suited for this task because they represent visual search as an active sampling process. Nevertheless, how they adapt to natural images remains largely unexplored. Here, we propose a unified Bayesian model for visual search guided by saliency maps as prior information. We validated our model with a visual search experiment in natural scenes. We showed that, although state-of-the-art saliency models performed well in predicting the first two fixations in a visual search task (about 90% of the performance achieved by humans), their performance degraded to chance afterward. Therefore, saliency maps alone could model bottom-up first impressions, but they were not enough to explain scanpaths when top-down task information was critical. In contrast, our model led to human-like performance and scanpaths as revealed by, first, the agreement between targets found by the model and by humans on a trial-by-trial basis and, second, the scanpath similarity between the model and humans, which makes the behavior of the model indistinguishable from that of humans. Altogether, the combination of deep neural network-based saliency models for image processing and a Bayesian framework for scanpath integration proves to be a powerful and flexible approach to modeling human behavior in natural scenarios.

16.
J Digit Imaging ; 35(5): 1164-1175, 2022 10.
Article in English | MEDLINE | ID: mdl-35484439

ABSTRACT

Occlusion-based saliency maps (OBSMs) are one approach for interpreting the decision-making process of an artificial intelligence (AI) system. This study explores the agreement among text responses from a cohort of radiologists describing diagnostically relevant areas on low-dose CT (LDCT) images. It also explores whether radiologists' descriptions of cases misclassified by the AI provide a rationale for ruling out the AI's output. OBSMs indicating the importance of different pixels for the final decision made by an AI were generated for 10 benign cases (3 misclassified by the AI tool as malignant) and 10 malignant cases (2 misclassified by the AI tool as benign). Thirty-six radiologists were asked to use radiological vocabulary, typical of reporting LDCT scans, to describe the mapped regions of interest (ROIs). The radiologists' annotations were then grouped using a clustering-based technique. Topics were extracted from the annotations, and for each ROI the percentage of annotations containing each topic was found. Radiologists annotated 17 and 24 unique ROIs on benign and malignant cases, respectively. Radiologists agreed on the main label (e.g., "vessel," "nodule") in only 12% of all areas (5/41 ROIs). Topic analyses identified six descriptors commonly associated with a lower malignancy likelihood and eight common topics related to a higher malignancy likelihood. Occlusion-based saliency maps were used to explain an AI decision-making process to radiologists, who in turn provided insight into the level of agreement between the AI's decisions and the radiological lexicon.


Subject(s)
Artificial Intelligence , Lung Neoplasms , Humans , Early Detection of Cancer/methods , Lung Neoplasms/diagnostic imaging , Radiologists , Tomography, X-Ray Computed/methods
17.
Med Image Anal ; 77: 102364, 2022 04.
Article in English | MEDLINE | ID: mdl-35101727

ABSTRACT

Deep neural networks (DNNs) have achieved physician-level accuracy on many imaging-based medical diagnostic tasks, for example classification of retinal images in ophthalmology. However, their decision mechanisms are often considered impenetrable, leading to a lack of trust by clinicians and patients. To alleviate this issue, a range of explanation methods have been proposed to expose the inner workings of DNNs leading to their decisions. For imaging-based tasks, this is often achieved via saliency maps. The quality of these maps is typically evaluated via perturbation analysis without experts involved. To facilitate the adoption and success of such automated systems, however, it is crucial to validate saliency maps against clinicians. In this study, we used three different network architectures and developed ensembles of DNNs to detect diabetic retinopathy and neovascular age-related macular degeneration from retinal fundus images and optical coherence tomography scans, respectively. We used a variety of explanation methods and obtained a comprehensive set of saliency maps for explaining the ensemble-based diagnostic decisions. Then, we systematically validated saliency maps against clinicians through two main analyses: a direct comparison of saliency maps with expert annotations of disease-specific pathologies, and perturbation analyses that also used expert annotations as saliency maps. We found that the choice of DNN architecture and explanation method significantly influences the quality of saliency maps. Guided Backprop showed consistently good performance across disease scenarios and DNN architectures, suggesting that it provides a suitable starting point for explaining the decisions of DNNs on retinal images.
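A common form of the perturbation analysis mentioned above progressively masks the most salient pixels and tracks how quickly the model's score degrades: a good saliency map makes the score fall fast. The sketch below is a generic version of this idea, not the paper's exact protocol; the `score_fn` in the test stands in for a real classifier.

```python
import numpy as np

def perturbation_curve(image, saliency, score_fn, steps=10, fill=0.0):
    """Mask pixels in decreasing order of saliency and record the model score
    after each step. A faithful saliency map yields a rapidly dropping curve
    (small area under the curve)."""
    order = np.argsort(saliency.ravel())[::-1]  # most salient pixels first
    n = image.size
    scores = [score_fn(image)]
    perturbed = image.copy().ravel()
    for k in range(1, steps + 1):
        upto = (k * n) // steps
        perturbed[order[:upto]] = fill
        scores.append(score_fn(perturbed.reshape(image.shape)))
    return scores
```

Using expert annotations in place of `saliency`, as the authors did, turns the same curve into a clinician-grounded reference against which automated explanation methods can be compared.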


Subject(s)
Diabetic Retinopathy , Ophthalmology , Diabetic Retinopathy/diagnostic imaging , Fundus Oculi , Humans , Neural Networks, Computer , Tomography, Optical Coherence/methods
18.
Front Neuroimaging ; 1: 1012639, 2022.
Article in English | MEDLINE | ID: mdl-37555149

ABSTRACT

Contrast and texture modifications applied during training or test time have recently shown promising results for enhancing the generalization performance of deep learning segmentation methods in medical image analysis. However, the reasons for this improvement have not been investigated in depth. In this study, we examined the phenomenon in a controlled experimental setting, with datasets from the Human Connectome Project and a large set of simulated MR protocols, in order to mitigate data confounders and investigate possible explanations for why model performance changes under different levels of contrast- and texture-based modification. Our experiments confirm previous findings regarding the improved performance of models subjected to contrast and texture modifications during training and/or test time, but further show the interplay when these operations are combined, as well as the regimes of model improvement and worsening across scanning parameters. Furthermore, our findings demonstrate a spatial attention shift in trained models, occurring at different levels of model performance and varying with the type of applied image modification.
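Contrast modifications of the kind studied above are often implemented as simple intensity remappings. The linear-contrast and gamma transforms below are two generic examples of such modifications, shown here only to make the operation concrete; the study's actual protocol and parameter ranges may differ.

```python
import numpy as np

def adjust_contrast(image, factor):
    """Linear contrast scaling around the image mean: factor > 1 increases
    contrast, factor < 1 reduces it. Intensities are assumed in [0, 1]."""
    mean = image.mean()
    return np.clip(mean + factor * (image - mean), 0.0, 1.0)

def adjust_gamma(image, gamma):
    """Gamma (power-law) intensity remapping. Intensities assumed in [0, 1]."""
    return np.power(np.clip(image, 0.0, 1.0), gamma)
```

Applied at training time these act as augmentations; applied at test time they probe how sensitive a trained segmentation model is to acquisition-dependent intensity changes.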

19.
Perception ; 51(1): 3-24, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34967251

ABSTRACT

The study of lithic technology can provide information on human cultural evolution. This article aims to analyse visual behaviour associated with the exploration of ancient stone artefacts and how this relates to perceptual mechanisms in humans. In Experiment 1, we used eye tracking to record patterns of eye fixations while participants viewed images of stone tools, including examples of worked pebbles and handaxes. The results showed that the focus of gaze was directed more towards the upper regions of worked pebbles and on the basal areas for handaxes. Knapped surfaces also attracted more fixation than natural cortex for both tool types. Fixation distribution was different to that predicted by models that calculate visual salience. Experiment 2 was an online study using a mouse-click attention tracking technique and included images of unworked pebbles and 'mixed' images combining the handaxe's outline with the pebble's unworked texture. The pattern of clicks corresponded to that revealed using eye tracking and there were differences between tools and other images. Overall, the findings suggest that visual exploration is directed towards functional aspects of tools. Studies of visual attention and exploration can supply useful information to inform understanding of human cognitive evolution and tool use.


Subject(s)
Archaeology , Eye-Tracking Technology , Cognition , Fixation, Ocular , Humans , Technology
20.
Sensors (Basel) ; 21(20)2021 Oct 14.
Article in English | MEDLINE | ID: mdl-34696044

ABSTRACT

Bottom-up saliency models identify the salient regions of an image based on features such as color, intensity, and orientation. These models are typically used as predictors of human visual behavior and for computer vision tasks. In this paper, we conduct a systematic evaluation of the saliency maps computed with four selected bottom-up models on images of urban and highway traffic scenes. Saliency is investigated both over whole images and at the object level, and characterized in terms of the energy and entropy of the saliency maps. We identify significant differences with respect to the number, size, and shape complexity of the salient areas computed by different models. Based on these findings, we analyze the likelihood that object instances fall within the salient areas of an image and investigate the agreement between the segments of traffic participants and the saliency maps of the different models. The overall and object-level analysis provides insights into the distinctive features of salient areas identified by different models, which can be used as selection criteria for prospective applications in autonomous driving such as object detection and tracking.
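Energy and entropy summaries like those used above can be computed directly from a saliency map once it is normalized to a distribution. The definitions below (sum of squared normalized values for energy, Shannon entropy in bits) are one common choice and may differ in detail from the paper's formulation.

```python
import numpy as np

def saliency_energy(saliency):
    """Energy of a saliency map: sum of squared normalized values.
    Concentrated maps score higher than diffuse ones."""
    p = saliency / saliency.sum()
    return float(np.sum(p ** 2))

def saliency_entropy(saliency, eps=1e-12):
    """Shannon entropy (bits) of the saliency map treated as a probability
    distribution. Diffuse maps score higher than concentrated ones."""
    p = saliency / saliency.sum()
    return float(-np.sum(p * np.log2(p + eps)))
```

The two measures move in opposite directions: a map with a few compact salient blobs has high energy and low entropy, while a map spreading saliency over the whole scene shows the reverse, which is what makes the pair useful for contrasting models.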


Subject(s)
Algorithms , Automobile Driving , Humans