Pesquisa | BVS - MINISTÉRIO DA SAÚDE

1.

Visual explanations for polyp detection: How medical doctors assess intrinsic versus extrinsic explanations.

Hicks, Steven; Storås, Andrea; Riegler, Michael A; Midoglu, Cise; Hammou, Malek; de Lange, Thomas; Parasa, Sravanthi; Halvorsen, Pål; Strümke, Inga.

PLoS One ; 19(5): e0304069, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38820304

RESUMO

Deep learning has achieved immense success in computer vision and has the potential to help physicians analyze visual content for disease and other abnormalities. However, the current state of deep learning is very much a black box, making medical professionals skeptical about integrating these methods into clinical practice. Several methods have been proposed to shed some light on these black boxes, but there is no consensus on the opinion of medical doctors that will consume these explanations. This paper presents a study asking medical professionals about their opinion of current state-of-the-art explainable artificial intelligence methods when applied to a gastrointestinal disease detection use case. We compare two different categories of explanation methods, intrinsic and extrinsic, and gauge their opinion of the current value of these explanations. The results indicate that intrinsic explanations are preferred and that physicians see value in the explanations. Based on the feedback collected in our study, future explanations of medical deep neural networks can be tailored to the needs and expectations of doctors. Hopefully, this will contribute to solving the issue of black box medical systems and lead to successful implementation of this powerful technology in the clinic.

Assuntos

Aprendizado Profundo , Médicos , Humanos , Médicos/psicologia , Inteligência Artificial , Redes Neurais de Computação , Pólipos do Colo/diagnóstico , Colonoscopia/métodos

2.

From Movements to Metrics: Evaluating Explainable AI Methods in Skeleton-Based Human Activity Recognition.

Pellano, Kimji N; Strümke, Inga; Ihlen, Espen A F.

Sensors (Basel) ; 24(6)2024 Mar 18.

Artigo em Inglês | MEDLINE | ID: mdl-38544204

RESUMO

The advancement of deep learning in human activity recognition (HAR) using 3D skeleton data is critical for applications in healthcare, security, sports, and human-computer interaction. This paper tackles a well-known gap in the field, which is the lack of testing in the applicability and reliability of XAI evaluation metrics in the skeleton-based HAR domain. We have tested established XAI metrics, namely faithfulness and stability on Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM) to address this problem. This study introduces a perturbation method that produces variations within the error tolerance of motion sensor tracking, ensuring the resultant skeletal data points remain within the plausible output range of human movement as captured by the tracking device. We used the NTU RGB+D 60 dataset and the EfficientGCN architecture for HAR model training and testing. The evaluation involved systematically perturbing the 3D skeleton data by applying controlled displacements at different magnitudes to assess the impact on XAI metric performance across multiple action classes. Our findings reveal that faithfulness may not consistently serve as a reliable metric across all classes for the EfficientGCN model, indicating its limited applicability in certain contexts. In contrast, stability proves to be a more robust metric, showing dependability across different perturbation magnitudes. Additionally, CAM and Grad-CAM yielded almost identical explanations, leading to closely similar metric outcomes. This suggests a need for the exploration of additional metrics and the application of more diverse XAI methods to broaden the understanding and effectiveness of XAI in skeleton-based HAR.

Assuntos

Sistema Musculoesquelético , Humanos , Reprodutibilidade dos Testes , Movimento , Esqueleto , Atividades Humanas

3.

Using machine learning model explanations to identify proteins related to severity of meibomian gland dysfunction.

Storås, Andrea M; Fineide, Fredrik; Magnø, Morten; Thiede, Bernd; Chen, Xiangjun; Strümke, Inga; Halvorsen, Pål; Galtung, Hilde; Jensen, Janicke L; Utheim, Tor P; Riegler, Michael A.

Sci Rep ; 13(1): 22946, 2023 12 22.

Artigo em Inglês | MEDLINE | ID: mdl-38135766

RESUMO

Meibomian gland dysfunction is the most common cause of dry eye disease and leads to significantly reduced quality of life and social burdens. Because meibomian gland dysfunction results in impaired function of the tear film lipid layer, studying the expression of tear proteins might increase the understanding of the etiology of the condition. Machine learning is able to detect patterns in complex data. This study applied machine learning to classify levels of meibomian gland dysfunction from tear proteins. The aim was to investigate proteomic changes between groups with different severity levels of meibomian gland dysfunction, as opposed to only separating patients with and without this condition. An established feature importance method was used to identify the most important proteins for the resulting models. Moreover, a new method that can take the uncertainty of the models into account when creating explanations was proposed. By examining the identified proteins, potential biomarkers for meibomian gland dysfunction were discovered. The overall findings are largely confirmatory, indicating that the presented machine learning approaches are promising for detecting clinically relevant proteins. While this study provides valuable insights into proteomic changes associated with varying severity levels of meibomian gland dysfunction, it should be noted that it was conducted without a healthy control group. Future research could benefit from including such a comparison to further validate and extend the findings presented here.

Assuntos

Síndromes do Olho Seco , Disfunção da Glândula Tarsal , Humanos , Glândulas Tarsais/metabolismo , Proteômica , Qualidade de Vida , Síndromes do Olho Seco/metabolismo , Lágrimas/metabolismo

4.

Usefulness of Heat Map Explanations for Deep-Learning-Based Electrocardiogram Analysis.

Storås, Andrea M; Andersen, Ole Emil; Lockhart, Sam; Thielemann, Roman; Gnesin, Filip; Thambawita, Vajira; Hicks, Steven A; Kanters, Jørgen K; Strümke, Inga; Halvorsen, Pål; Riegler, Michael A.

Diagnostics (Basel) ; 13(14)2023 Jul 11.

Artigo em Inglês | MEDLINE | ID: mdl-37510089

RESUMO

Deep neural networks are complex machine learning models that have shown promising results in analyzing high-dimensional data such as those collected from medical examinations. Such models have the potential to provide fast and accurate medical diagnoses. However, the high complexity makes deep neural networks and their predictions difficult to understand. Providing model explanations can be a way of increasing the understanding of "black box" models and building trust. In this work, we applied transfer learning to develop a deep neural network to predict sex from electrocardiograms. Using the visual explanation method Grad-CAM, heat maps were generated from the model in order to understand how it makes predictions. To evaluate the usefulness of the heat maps and determine if the heat maps identified electrocardiogram features that could be recognized to discriminate sex, medical doctors provided feedback. Based on the feedback, we concluded that, in our setting, this mode of explainable artificial intelligence does not provide meaningful information to medical doctors and is not useful in the clinic. Our results indicate that improved explanation techniques that are tailored to medical data should be developed before deep neural networks can be applied in the clinic for diagnostic purposes.

5.

Inferring feature importance with uncertainties with application to large genotype data.

Johnsen, Pål Vegard; Strümke, Inga; Langaas, Mette; DeWan, Andrew Thomas; Riemer-Sørensen, Signe.

PLoS Comput Biol ; 19(3): e1010963, 2023 03.

Artigo em Inglês | MEDLINE | ID: mdl-36917581

RESUMO

Estimating feature importance, which is the contribution of a prediction or several predictions due to a feature, is an essential aspect of explaining data-based models. Besides explaining the model itself, an equally relevant question is which features are important in the underlying data generating process. We present a Shapley-value-based framework for inferring the importance of individual features, including uncertainty in the estimator. We build upon the recently published model-agnostic feature importance score of SAGE (Shapley additive global importance) and introduce Sub-SAGE. For tree-based models, it has the advantage that it can be estimated without computationally expensive resampling. We argue that for all model types the uncertainties in our Sub-SAGE estimator can be estimated using bootstrapping and demonstrate the approach for tree ensemble methods. The framework is exemplified on synthetic data as well as large genotype data for predicting feature importance with respect to obesity.

Assuntos

Técnicas de Genotipagem , Incerteza

6.

Causal connections between socioeconomic disparities and COVID-19 in the USA.

Banerjee, Tannista; Paul, Ayan; Srikanth, Vishak; Strümke, Inga.

Sci Rep ; 12(1): 15827, 2022 09 22.

Artigo em Inglês | MEDLINE | ID: mdl-36138106

RESUMO

With the increasing use of machine learning models in computational socioeconomics, the development of methods for explaining these models and understanding the causal connections is gradually gaining importance. In this work, we advocate the use of an explanatory framework from cooperative game theory augmented with do calculus, namely causal Shapley values. Using causal Shapley values, we analyze socioeconomic disparities that have a causal link to the spread of COVID-19 in the USA. We study several phases of the disease spread to show how the causal connections change over time. We perform a causal analysis using random effects models and discuss the correspondence between the two methods to verify our results. We show the distinct advantages a non-linear machine learning models have over linear models when performing a multivariate analysis, especially since the machine learning models can map out non-linear correlations in the data. In addition, the causal Shapley values allow for including the causal structure in the variable importance computed for the machine learning model.

Assuntos

COVID-19 , COVID-19/epidemiologia , Causalidade , Humanos , Modelos Lineares , Aprendizado de Máquina , Fatores Socioeconômicos , Estados Unidos/epidemiologia

7.

On evaluation metrics for medical applications of artificial intelligence.

Hicks, Steven A; Strümke, Inga; Thambawita, Vajira; Hammou, Malek; Riegler, Michael A; Halvorsen, Pål; Parasa, Sravanthi.

Sci Rep ; 12(1): 5979, 2022 04 08.

Artigo em Inglês | MEDLINE | ID: mdl-35395867

RESUMO

Clinicians and software developers need to understand how proposed machine learning (ML) models could improve patient care. No single metric captures all the desirable properties of a model, which is why several metrics are typically reported to summarize a model's performance. Unfortunately, these measures are not easily understandable by many clinicians. Moreover, comparison of models across studies in an objective manner is challenging, and no tool exists to compare models using the same performance metrics. This paper looks at previous ML studies done in gastroenterology, provides an explanation of what different metrics mean in the context of binary classification in the presented studies, and gives a thorough explanation of how different metrics should be interpreted. We also release an open source web-based tool that may be used to aid in calculating the most relevant metrics presented in this paper so that other researchers and clinicians may easily incorporate them into their research.

Assuntos

Inteligência Artificial , Benchmarking , Humanos , Aprendizado de Máquina , Software

8.

Artificial intelligence in dry eye disease.

Storås, Andrea M; Strümke, Inga; Riegler, Michael A; Grauslund, Jakob; Hammer, Hugo L; Yazidi, Anis; Halvorsen, Pål; Gundersen, Kjell G; Utheim, Tor P; Jackson, Catherine J.

Ocul Surf ; 23: 74-86, 2022 01.

Artigo em Inglês | MEDLINE | ID: mdl-34843999

RESUMO

Dry eye disease (DED) has a prevalence of between 5 and 50%, depending on the diagnostic criteria used and population under study. However, it remains one of the most underdiagnosed and undertreated conditions in ophthalmology. Many tests used in the diagnosis of DED rely on an experienced observer for image interpretation, which may be considered subjective and result in variation in diagnosis. Since artificial intelligence (AI) systems are capable of advanced problem solving, use of such techniques could lead to more objective diagnosis. Although the term 'AI' is commonly used, recent success in its applications to medicine is mainly due to advancements in the sub-field of machine learning, which has been used to automatically classify images and predict medical outcomes. Powerful machine learning techniques have been harnessed to understand nuances in patient data and medical images, aiming for consistent diagnosis and stratification of disease severity. This is the first literature review on the use of AI in DED. We provide a brief introduction to AI, report its current use in DED research and its potential for application in the clinic. Our review found that AI has been employed in a wide range of DED clinical tests and research applications, primarily for interpretation of interferometry, slit-lamp and meibography images. While initial results are promising, much work is still needed on model development, clinical testing and standardisation.

Assuntos

Síndromes do Olho Seco , Oftalmologia , Inteligência Artificial , Síndromes do Olho Seco/diagnóstico , Humanos , Aprendizado de Máquina

9.

To explain or not to explain?-Artificial intelligence explainability in clinical decision support systems.

Amann, Julia; Vetter, Dennis; Blomberg, Stig Nikolaj; Christensen, Helle Collatz; Coffee, Megan; Gerke, Sara; Gilbert, Thomas K; Hagendorff, Thilo; Holm, Sune; Livne, Michelle; Spezzatti, Andy; Strümke, Inga; Zicari, Roberto V; Madai, Vince Istvan.

PLOS Digit Health ; 1(2): e0000016, 2022 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-36812545

RESUMO

Explainability for artificial intelligence (AI) in medicine is a hotly debated topic. Our paper presents a review of the key arguments in favor and against explainability for AI-powered Clinical Decision Support System (CDSS) applied to a concrete use case, namely an AI-powered CDSS currently used in the emergency call setting to identify patients with life-threatening cardiac arrest. More specifically, we performed a normative analysis using socio-technical scenarios to provide a nuanced account of the role of explainability for CDSSs for the concrete use case, allowing for abstractions to a more general level. Our analysis focused on three layers: technical considerations, human factors, and the designated system role in decision-making. Our findings suggest that whether explainability can provide added value to CDSS depends on several key questions: technical feasibility, the level of validation in case of explainable algorithms, the characteristics of the context in which the system is implemented, the designated role in the decision-making process, and the key user group(s). Thus, each CDSS will require an individualized assessment of explainability needs and we provide an example of how such an assessment could look like in practice.

10.

Impact of Image Resolution on Deep Learning Performance in Endoscopy Image Classification: An Experimental Study Using a Large Dataset of Endoscopic Images.

Thambawita, Vajira; Strümke, Inga; Hicks, Steven A; Halvorsen, Pål; Parasa, Sravanthi; Riegler, Michael A.

Diagnostics (Basel) ; 11(12)2021 Nov 24.

Artigo em Inglês | MEDLINE | ID: mdl-34943421

RESUMO

Recent trials have evaluated the efficacy of deep convolutional neural network (CNN)-based AI systems to improve lesion detection and characterization in endoscopy. Impressive results are achieved, but many medical studies use a very small image resolution to save computing resources at the cost of losing details. Today, no conventions between resolution and performance exist, and monitoring the performance of various CNN architectures as a function of image resolution provides insights into how subtleties of different lesions on endoscopy affect performance. This can help set standards for image or video characteristics for future CNN-based models in gastrointestinal (GI) endoscopy. This study examines the performance of CNNs on the HyperKvasir dataset, consisting of 10,662 images from 23 different findings. We evaluate two CNN models for endoscopic image classification under quality distortions with image resolutions ranging from 32 × 32 to 512 × 512 pixels. The performance is evaluated using two-fold cross-validation and F1-score, maximum Matthews correlation coefficient (MCC), precision, and sensitivity as metrics. Increased performance was observed with higher image resolution for all findings in the dataset. MCC was achieved at image resolutions between 512 × 512 pixels for classification for the entire dataset after including all subclasses. The highest performance was observed with an MCC value of 0.9002 when the models were trained on the highest resolution and tested on the same resolution. Different resolutions and their effect on CNNs are explored. We show that image resolution has a clear influence on the performance which calls for standards in the field in the future.

11.

DeepFake electrocardiograms using generative adversarial networks are the beginning of the end for privacy issues in medicine.

Thambawita, Vajira; Isaksen, Jonas L; Hicks, Steven A; Ghouse, Jonas; Ahlberg, Gustav; Linneberg, Allan; Grarup, Niels; Ellervik, Christina; Olesen, Morten Salling; Hansen, Torben; Graff, Claus; Holstein-Rathlou, Niels-Henrik; Strümke, Inga; Hammer, Hugo L; Maleckar, Mary M; Halvorsen, Pål; Riegler, Michael A; Kanters, Jørgen K.

Sci Rep ; 11(1): 21896, 2021 11 09.

Artigo em Inglês | MEDLINE | ID: mdl-34753975

RESUMO

Recent global developments underscore the prominent role big data have in modern medical science. But privacy issues constitute a prevalent problem for collecting and sharing data between researchers. However, synthetic data generated to represent real data carrying similar information and distribution may alleviate the privacy issue. In this study, we present generative adversarial networks (GANs) capable of generating realistic synthetic DeepFake 10-s 12-lead electrocardiograms (ECGs). We have developed and compared two methods, named WaveGAN* and Pulse2Pulse. We trained the GANs with 7,233 real normal ECGs to produce 121,977 DeepFake normal ECGs. By verifying the ECGs using a commercial ECG interpretation program (MUSE 12SL, GE Healthcare), we demonstrate that the Pulse2Pulse GAN was superior to the WaveGAN* to produce realistic ECGs. ECG intervals and amplitudes were similar between the DeepFake and real ECGs. Although these synthetic ECGs mimic the dataset used for creation, the ECGs are not linked to any individuals and may thus be used freely. The synthetic dataset will be available as open access for researchers at OSF.io and the DeepFake generator available at the Python Package Index (PyPI) for generating synthetic ECGs. In conclusion, we were able to generate realistic synthetic ECGs using generative adversarial neural networks on normal ECGs from two population studies, thereby addressing the relevant privacy issues in medical datasets.

Assuntos

Eletrocardiografia , Redes Neurais de Computação , Simulação por Computador , Conjuntos de Dados como Assunto , Humanos , Privacidade

12.

Model independent feature attributions: Shapley values that uncover non-linear dependencies.

Fryer, Daniel Vidali; Strumke, Inga; Nguyen, Hien.

PeerJ Comput Sci ; 7: e582, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-34151001

RESUMO

Shapley values have become increasingly popular in the machine learning literature, thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of 'fairness'. The flexibility arises from the myriad potential forms of the Shapley value game formulation. Amongst the consequences of this flexibility is that there are now many types of Shapley values being discussed, with such variety being a source of potential misunderstanding. To the best of our knowledge, all existing game formulations in the machine learning and statistics literature fall into a category, which we name the model-dependent category of game formulations. In this work, we consider an alternative and novel formulation which leads to the first instance of what we call model-independent Shapley values. These Shapley values use a measure of non-linear dependence as the characteristic function. The strength of these Shapley values is in their ability to uncover and attribute non-linear dependencies amongst features. We introduce and demonstrate the use of the energy distance correlations, affine-invariant distance correlation, and Hilbert-Schmidt independence criterion as Shapley value characteristic functions. In particular, we demonstrate their potential value for exploratory data analysis and model diagnostics. We conclude with an interesting expository application to a medical survey data set.

13.

Explaining deep neural networks for knowledge discovery in electrocardiogram analysis.

Hicks, Steven A; Isaksen, Jonas L; Thambawita, Vajira; Ghouse, Jonas; Ahlberg, Gustav; Linneberg, Allan; Grarup, Niels; Strümke, Inga; Ellervik, Christina; Olesen, Morten Salling; Hansen, Torben; Graff, Claus; Holstein-Rathlou, Niels-Henrik; Halvorsen, Pål; Maleckar, Mary M; Riegler, Michael A; Kanters, Jørgen K.

Sci Rep ; 11(1): 10949, 2021 05 26.

Artigo em Inglês | MEDLINE | ID: mdl-34040033

RESUMO

Deep learning-based tools may annotate and interpret medical data more quickly, consistently, and accurately than medical doctors. However, as medical doctors are ultimately responsible for clinical decision-making, any deep learning-based prediction should be accompanied by an explanation that a human can understand. We present an approach called electrocardiogram gradient class activation map (ECGradCAM), which is used to generate attention maps and explain the reasoning behind deep learning-based decision-making in ECG analysis. Attention maps may be used in the clinic to aid diagnosis, discover new medical knowledge, and identify novel features and characteristics of medical tests. In this paper, we showcase how ECGradCAM attention maps can unmask how a novel deep learning model measures both amplitudes and intervals in 12-lead electrocardiograms, and we show an example of how attention maps may be used to develop novel ECG features.

Assuntos

Aprendizado Profundo , Eletrocardiografia , Descoberta do Conhecimento , Modelos Cardiovasculares , Adulto , Idoso , Algoritmos , Cardiologistas , Confiabilidade dos Dados , Diagnóstico por Computador , Feminino , Cardiopatias/diagnóstico , Cardiopatias/fisiopatologia , Humanos , Masculino , Pessoa de Meia-Idade , Análise para Determinação do Sexo

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA