Results 1 - 20 of 67
1.
Article in English | MEDLINE | ID: mdl-38607715

ABSTRACT

In this article, we propose a conceptual framework to study ensembles of conformal predictors (CP), which we call Ensemble Predictors (EP). Our approach is inspired by the application of imprecise probabilities in information fusion. Based on the proposed framework, we study, for the first time in the literature, the theoretical properties of CP ensembles in a general setting, focusing on simple and commonly used possibilistic combination rules. We also illustrate the applicability of the proposed methods in the setting of multivariate time-series classification, showing that, on a large set of benchmarks from the UCR time series archive, these methods provide better performance (in terms of robustness, conservativeness, accuracy, and running time) than both standard classification algorithms and other combination rules proposed in the literature.
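As a rough illustration of the ingredients named above, the sketch below computes standard inductive conformal p-values and merges the p-values of several ensemble members with the minimum (conjunctive) and maximum (disjunctive) operators, two textbook possibilistic combination rules. It is a minimal sketch under stated assumptions, not the authors' EP framework: the nonconformity scores, the member p-values, and the choice of these two rules are illustrative. Note that the maximum of valid p-values is still valid (conservative), whereas the minimum is not valid in general, which is one reason combination rules call for theoretical study.

```python
from typing import Dict, List, Sequence


def conformal_p_value(calib_scores: Sequence[float], test_score: float) -> float:
    """Non-smoothed inductive conformal p-value: the fraction of calibration
    nonconformity scores >= the test score, counting the test point itself."""
    count = sum(1 for s in calib_scores if s >= test_score)
    return (count + 1) / (len(calib_scores) + 1)


def combine_min(p_values: Sequence[float]) -> float:
    # Conjunctive rule (assumed here): a label survives only if every member keeps it.
    return min(p_values)


def combine_max(p_values: Sequence[float]) -> float:
    # Disjunctive rule (assumed here): a label survives if at least one member keeps it.
    return max(p_values)


def prediction_set(p_by_label: Dict[str, float], epsilon: float) -> List[str]:
    # Standard CP region: keep every label whose p-value exceeds the significance level.
    return sorted(label for label, p in p_by_label.items() if p > epsilon)


if __name__ == "__main__":
    members = [
        {"a": 0.60, "b": 0.04},  # p-values from member 1 (made-up numbers)
        {"a": 0.35, "b": 0.12},  # p-values from member 2 (made-up numbers)
    ]
    labels = members[0].keys()
    conj = {y: combine_min([m[y] for m in members]) for y in labels}
    disj = {y: combine_max([m[y] for m in members]) for y in labels}
    print(prediction_set(conj, 0.05))  # ['a']
    print(prediction_set(disj, 0.05))  # ['a', 'b']
```

The disjunctive rule yields larger, more conservative prediction sets; the conjunctive rule yields smaller ones at the cost of guaranteed validity.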

2.
Artif Intell Med ; 150: 102819, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38553159

ABSTRACT

This paper examines a kind of explainable AI centered around what we term pro-hoc explanations, that is, a form of support that consists of offering alternative explanations (one for each possible outcome) instead of a single post-hoc explanation accompanying a specific piece of advice. Specifically, our support mechanism uses explanations by example, presenting analogous cases for each category in a binary setting. Pro-hoc explanations are an instance of what we call frictional AI, a general class of decision support aimed at achieving a useful compromise between increasing decision effectiveness and mitigating cognitive risks such as over-reliance, automation bias, and deskilling. To illustrate an instance of frictional AI, we conducted an empirical user study investigating its impact on the radiological detection of vertebral fractures in X-rays. Our study engaged 16 orthopedists in a 'human-first, second-opinion' interaction protocol, in which clinicians first made initial assessments of the X-rays without AI assistance and then provided their final diagnosis after considering the pro-hoc explanations. Our findings indicate that physicians, particularly those with less experience, perceived pro-hoc XAI support as significantly beneficial, even though it did not notably enhance their diagnostic accuracy. However, their increased confidence in their final diagnoses suggests a positive overall impact. Given the promisingly high effect size observed, our results advocate further research into pro-hoc explanations specifically, and into the broader concept of frictional AI.


Subject(s)
Physicians , Radiology , Humans , Clinical Decision-Making , Automation
3.
Comput Biol Med ; 170: 108042, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38308866

ABSTRACT

This paper presents a user study aimed at evaluating the impact of Class Activation Maps (CAMs) as an eXplainable AI (XAI) method in a radiological diagnostic task: the detection of thoracolumbar (TL) fractures from vertebral X-rays. In particular, we focus on two often-neglected features of CAMs, namely granularity and coloring: which features (lower-level vs. higher-level) the maps should highlight, and which coloring scheme they should adopt, to have a better impact on the decision-making process. Impact is assessed both in terms of diagnostic accuracy (i.e., effectiveness) and of user-centered dimensions such as perceived confidence and utility (i.e., satisfaction), depending on case complexity, AI accuracy, and user expertise. Our findings show that CAMs based on lower-level features, which highlight more focused anatomical landmarks, are associated with higher diagnostic accuracy than CAMs based on higher-level features, particularly among experienced physicians. Moreover, despite the intuitive appeal of semantic CAMs, traditionally colored CAMs consistently yielded higher diagnostic accuracy across all groups. Our results challenge some prevalent assumptions in the XAI field and emphasize the importance of adopting an evidence-based and human-centered approach to designing and evaluating AI- and XAI-assisted diagnostic tools. To this end, the paper also proposes a hierarchy-of-evidence framework to help designers and practitioners either choose the XAI solutions that optimize performance and satisfaction on the basis of the strongest evidence available, or focus on the gaps in the literature that must be filled to move from opinion- and eminence-based research to research based more on empirical evidence and on end users' work and preferences.
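For readers unfamiliar with how a CAM is produced, the sketch below follows the original CAM formulation: the map for class c is the weighted sum of the last convolutional feature maps, M_c(x, y) = Σ_k w_k^c f_k(x, y), where w_k^c are the class weights of the layer that follows global average pooling. The map is then min-max scaled to [0, 1], the range a color scheme is applied to. This is a minimal pure-Python sketch with made-up inputs, not the study's pipeline; reading "granularity" as the choice of which layer's feature maps are used is our assumption.

```python
from typing import List

Map = List[List[float]]


def class_activation_map(feature_maps: List[Map], class_weights: List[float]) -> Map:
    """Weighted sum of the K feature maps f_k with the class-specific weights w_k^c."""
    if len(feature_maps) != len(class_weights):
        raise ValueError("need one weight per feature map")
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += wk * fmap[i][j]
    return cam


def normalize(cam: Map) -> Map:
    """Min-max scale to [0, 1]; a flat map becomes all zeros."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / span for v in row] for row in cam]


if __name__ == "__main__":
    # Two hypothetical 2x2 feature maps and their weights for one class.
    f1 = [[1.0, 0.0], [0.0, 0.0]]
    f2 = [[0.0, 0.0], [0.0, 1.0]]
    print(normalize(class_activation_map([f1, f2], [2.0, 1.0])))
```

In practice the normalized map is upsampled to the image size before a color map (traditional or semantic) is overlaid.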


Subject(s)
Mental Processes , Radiology , Humans , Semantics , Spine
4.
Clin Chem Lab Med ; 62(5): 835-843, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38019961

ABSTRACT

BACKGROUND: In the rapidly evolving landscape of artificial intelligence (AI), scientific publishing is undergoing significant transformations. AI tools, while offering unparalleled efficiencies in paper drafting and peer review, also introduce notable ethical concerns. CONTENT: This study delineates AI's dual role in scientific publishing: as a co-creator in the writing and review of scientific papers and as an ethical challenge. We first explore the potential of AI to enhance efficiency, efficacy, and quality in creating scientific papers. A critical assessment follows, evaluating the risks vs. rewards for researchers, especially those early in their careers, and emphasizing the need to balance AI's capabilities with fostering independent reasoning and creativity. Subsequently, we delve into the ethical dilemmas of AI's involvement, particularly concerning originality, plagiarism, and preserving the genuine essence of scientific discourse. The evolving dynamics further highlight an overlooked aspect: the inadequate recognition of human reviewers in the academic community. Given the increasing volume of scientific literature, tangible metrics and incentives for reviewers are proposed as essential to ensure a balanced academic environment. SUMMARY: AI's incorporation into scientific publishing is promising yet comes with significant ethical and operational challenges. The role of human reviewers is accentuated, ensuring authenticity in an AI-influenced environment. OUTLOOK: As the scientific community treads the path of AI integration, a balanced symbiosis between AI's efficiency and human discernment is pivotal. Emphasizing human expertise, while exploiting artificial intelligence responsibly, will determine the trajectory toward an ethically sound and efficient AI-augmented future in scientific publishing.


Subject(s)
Artificial Intelligence , Publishing , Humans , Benchmarking , Research Personnel
5.
Dig Liver Dis ; 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37940501

ABSTRACT

Diagnostic errors affect patient health and healthcare costs. Artificial Intelligence (AI) shows promise in mitigating this burden by supporting medical doctors in decision-making. However, the mere display of excellent or even superhuman performance by AI in specific tasks does not guarantee a positive impact on medical practice. Effective AI assistance should target the primary causes of human errors and foster effective collaborative decision-making with human experts, who remain the ultimate decision-makers. In this narrative review, we apply these principles to the specific scenario of AI assistance during colonoscopy. By unraveling the neurocognitive foundations of the colonoscopy procedure, we identify multiple bottlenecks in perception, attention, and decision-making that contribute to diagnostic errors, shedding light on potential interventions to mitigate them. Furthermore, we explore how existing AI devices fare in clinical practice and whether they achieve optimal integration with the human decision-maker. We argue that to foster optimal human-AI collaboration, future research should expand our knowledge of the factors influencing AI's impact, establish evidence-based cognitive models, and develop training programs based on them. These efforts will enhance human-AI collaboration, ultimately improving diagnostic accuracy and patient outcomes. The principles illuminated in this review hold more general value, extending to a wide array of medical procedures and beyond.

6.
J Med Syst ; 47(1): 64, 2023 May 17.
Article in English | MEDLINE | ID: mdl-37195484

ABSTRACT

In this paper, we present an exploratory study on the potential impact of holographic heart models and mixed reality technology on medical training, in particular on teaching complex Congenital Heart Diseases (CHD) to medical students. Fifty-nine medical students were randomly allocated to three groups. Each participant received a 30-minute lecture on the interpretation and transcatheter treatment of a CHD condition, delivered with a different instructional tool in each group. Participants in the first group attended a lecture in which traditional slides were projected onto a flat screen (group "regular slideware", RS). The second group was shown slides incorporating videos of holographic anatomical models (group "holographic videos", HV). Finally, those in the third group wore immersive head-mounted devices (HMDs) to interact directly with holographic anatomical models (group "mixed reality", MR). At the end of the lecture, the members of each group were asked to complete a multiple-choice questionnaire evaluating their topic proficiency, as a proxy for the effectiveness of the training session (in terms of acquired notions); participants in group MR were also asked to complete a questionnaire on the recommendability and usability of the MS HoloLens HMDs, as a proxy for satisfaction with the user experience (UX). The findings show promising results for usability and user acceptance.


Subject(s)
Heart Defects, Congenital , Students, Medical , Humans , Learning
7.
Radiol Med ; 128(5): 544-555, 2023 May.
Article in English | MEDLINE | ID: mdl-37093337

ABSTRACT

OBJECTIVES: The aim of the present systematic review and meta-analysis is to assess the accuracy of automated landmarking using deep learning, in comparison with manual tracing, for cephalometric analysis of 3D medical images. METHODS: The PubMed/Medline, IEEE Xplore, Scopus, and ArXiv electronic databases were searched. Selection criteria were: ex vivo and in vivo volumetric images suitable for 3D landmarking (Problem), a minimum of five landmarks automatically located by a deep learning method (Intervention), manual landmarking (Comparison), and mean accuracy, in mm, between manual and automated landmarking (Outcome). QUADAS-2 was adapted for quality analysis. Meta-analysis was performed on studies that reported, as outcome, the mean and standard deviation of the difference (error) between manual and automated landmarking. Linear regression plots were used to analyze correlations between mean accuracy and year of publication. RESULTS: The initial electronic screening yielded 252 papers published between 2020 and 2022. A total of 15 studies were included in the qualitative synthesis, and 11 studies were used for the meta-analysis. The overall random-effects model revealed a mean value of 2.44 mm, with high heterogeneity (I² = 98.13%, τ² = 1.018, p-value < 0.001); risk of bias was high because several domains per study presented issues. Meta-regression indicated a significant relationship between mean error and year of publication (p-value = 0.012). CONCLUSION: Deep learning algorithms showed excellent accuracy for automated 3D cephalometric landmarking. In the last two years, promising algorithms have been developed and landmark annotation accuracy has improved.
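The abstract reports a pooled mean error together with τ² and I² from a random-effects model but does not name the estimator. As an illustration only, the sketch below uses the DerSimonian-Laird method, a common choice, which we assume here. Each study contributes a mean error y_i and the variance v_i of that mean (for example sd²/n, another assumption); the fixed-effect estimate gives Cochran's Q, from which τ² = max(0, (Q − df)/C) and I² = max(0, (Q − df)/Q) follow.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RandomEffectsResult:
    pooled_mean: float
    tau2: float  # between-study variance
    i2: float    # heterogeneity, in percent
    q: float     # Cochran's Q


def dersimonian_laird(effects: Sequence[float], variances: Sequence[float]) -> RandomEffectsResult:
    """Random-effects pooling with the DerSimonian-Laird estimate of tau^2."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies, each with a variance")
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add the between-study variance to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return RandomEffectsResult(pooled, tau2, i2, q)


if __name__ == "__main__":
    # Two made-up studies: mean landmark errors of 1 mm and 3 mm, each with variance 1.
    print(dersimonian_laird([1.0, 3.0], [1.0, 1.0]))
```

An I² close to 100%, as reported in the abstract, means that almost all of the observed variation across studies reflects between-study differences rather than sampling error.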


Subject(s)
Deep Learning , Humans , Anatomic Landmarks , Reproducibility of Results , Cephalometry/methods , Imaging, Three-Dimensional/methods , Algorithms
8.
Clin Chem Lab Med ; 61(7): 1158-1166, 2023 06 27.
Article in English | MEDLINE | ID: mdl-37083166

ABSTRACT

OBJECTIVES: ChatGPT, a tool based on natural language processing (NLP), is on everyone's mind, and several potential applications in healthcare have already been proposed. However, since the ability of this tool to interpret laboratory test results had not yet been tested, the EFLM Working Group on Artificial Intelligence (WG-AI) set itself the task of closing this gap with a systematic approach. METHODS: WG-AI members generated 10 simulated laboratory reports of common parameters, which were then passed to ChatGPT for interpretation, according to reference intervals (RI) and units, using an optimized prompt. The results were subsequently evaluated independently by all WG-AI members with respect to relevance, correctness, helpfulness, and safety. RESULTS: ChatGPT recognized all laboratory tests, could detect whether they deviated from the RI, and gave a test-by-test as well as an overall interpretation. The interpretations were rather superficial, not always correct, and only in some cases judged coherently. The magnitude of the deviation from the RI seldom played a role in the interpretation of laboratory tests, and the artificial intelligence (AI) did not make any meaningful suggestion regarding follow-up diagnostics or further procedures in general. CONCLUSIONS: ChatGPT in its current form, not being specifically trained on medical data or laboratory data in particular, may at best be considered a tool capable of interpreting a laboratory report on a test-by-test basis, but not of interpreting an overall diagnostic picture. Future generations of similar AIs trained on medical ground-truth data might well revolutionize current processes in healthcare, although such an implementation is not yet ready.
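To make "test-by-test interpretation against the RI" concrete, the sketch below is a deterministic baseline that flags each result as low, high, or within the RI and also quantifies how far outside the interval it lies, the aspect the abstract notes was seldom considered. It is a hypothetical illustration, not the WG-AI evaluation protocol; the `Result` type, the relative-deviation scaling, and the example values are all our assumptions.

```python
from dataclasses import dataclass


@dataclass
class Result:
    test: str
    value: float
    unit: str
    lower: float  # lower reference limit
    upper: float  # upper reference limit


def flag(r: Result) -> str:
    """Test-by-test classification against the reference interval (RI)."""
    if r.value < r.lower:
        return "low"
    if r.value > r.upper:
        return "high"
    return "within RI"


def relative_deviation(r: Result) -> float:
    """Distance outside the RI as a fraction of the RI width (0 if inside).
    This scaling is an illustrative choice, not a clinical standard."""
    width = r.upper - r.lower
    if r.value < r.lower:
        return (r.lower - r.value) / width
    if r.value > r.upper:
        return (r.value - r.upper) / width
    return 0.0


if __name__ == "__main__":
    # Illustrative values only; real RIs depend on the method, population, and laboratory.
    report = [
        Result("Hemoglobin", 10.0, "g/dL", 12.0, 16.0),
        Result("Sodium", 140.0, "mmol/L", 135.0, 145.0),
    ]
    for r in report:
        print(r.test, r.value, r.unit, flag(r), relative_deviation(r))
```

A rule of this kind covers only the test-by-test level; the overall diagnostic picture that the abstract finds lacking requires combining results across tests and clinical context.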


Subject(s)
Artificial Intelligence , Chemistry, Clinical , Humans , Laboratories
10.
Artif Intell Med ; 138: 102506, 2023 04.
Article in English | MEDLINE | ID: mdl-36990586

ABSTRACT

In this paper, we study human-AI collaboration protocols, a design-oriented construct aimed at establishing and evaluating how humans and AI can collaborate in cognitive tasks. We applied this construct in two user studies involving 12 specialist radiologists (the knee MRI study) and 44 ECG readers of varying expertise (the ECG study), who evaluated 240 and 20 cases, respectively, in different collaboration configurations. We confirm the utility of AI support but find that XAI can be associated with a "white-box paradox", producing a null or detrimental effect. We also find that the order of presentation matters: AI-first protocols are associated with higher diagnostic accuracy than human-first protocols, and with higher accuracy than both humans and AI alone. Our findings identify the best conditions for AI to augment human diagnostic skills, rather than trigger dysfunctional responses and cognitive biases that can undermine decision effectiveness.


Subject(s)
Artificial Intelligence , Humans
11.
Diagnostics (Basel) ; 13(6)2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36980497

ABSTRACT

Total hip (THA) and total knee (TKA) arthroplasty procedures have steadily increased over the past few decades, and their use is expected to grow further, mainly due to an increasing number of elderly patients. Cost-containment strategies that support a rapid recovery with positive functional outcomes, high patient satisfaction, and enhanced patient-reported outcomes are needed. A Fast Track surgical procedure (FT) is a coordinated perioperative approach aimed at expediting early mobilization and recovery following surgery and, accordingly, shortening the length of hospital stay (LOS), convalescence, and costs. In this view, rapid rehabilitation surgery optimizes traditional rehabilitation methods by integrating evidence-based practices into the procedure. The aim of the present study was to compare the effectiveness of Fast Track versus Care-as-Usual surgical procedures and pathways (including rehabilitation) on a mid-term patient-reported outcome (PRO), the SF12 (both Physical and Mental Scores), 3 months after hip or knee replacement surgery, using propensity score matching (PSM) analysis to address the comparability of the groups in a non-randomized study. Because we were interested in evaluating the entire pathways, including the postoperative rehabilitation stage, we used only early home discharge as a surrogate to differentiate between the Fast Track and Care-as-Usual rehabilitation pathways. Our study shows that the entire Fast Track pathway, which includes the postoperative rehabilitation stage, has a significantly positive impact on physical health-related status (SF12 Physical Scores), as perceived by patients 3 months after hip or knee replacement surgery, compared with the standardized program, both in terms of the PRO scores and of the relative improvements observed with respect to the minimum clinically important difference.
This result encourages additional research into the effects of Fast Track rehabilitation on the entire process of care for patients undergoing hip or knee arthroplasty, focusing only on patient-reported outcomes.
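One common way to relate improvements to the minimum clinically important difference is to count, for each pathway, the share of patients whose score change reaches the MCID. The sketch below shows that computation under stated assumptions: higher SF12 scores are better, "reaches" means a change greater than or equal to the MCID, and the MCID value and scores in the example are hypothetical rather than taken from the study.

```python
from typing import Sequence


def mcid_responder_rate(pre: Sequence[float], post: Sequence[float], mcid: float) -> float:
    """Share of patients whose improvement (post - pre) is at least the MCID."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and paired")
    hits = sum(1 for before, after in zip(pre, post) if after - before >= mcid)
    return hits / len(pre)


if __name__ == "__main__":
    # Hypothetical SF12 Physical Scores before surgery and at 3 months.
    pre = [30.0, 35.0, 40.0, 45.0]
    post = [40.0, 36.0, 48.0, 45.0]
    hypothetical_mcid = 5.0
    print(mcid_responder_rate(pre, post, hypothetical_mcid))
```

Comparing this rate between matched Fast Track and Care-as-Usual groups complements a comparison of mean scores, since two groups with similar means can differ in how many patients achieve a clinically meaningful change.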

12.
Recenti Prog Med ; 114(3): 137-138, 2023 03.
Article in Italian | MEDLINE | ID: mdl-36815413

ABSTRACT

Artificial intelligence can read and interpret ECG traces quickly and precisely, increasing diagnostic capacity and offering the possibility of anticipating preventive therapies. However, there is no evidence on the clinical utility and cost-effectiveness of certain practical applications. Indeed, the literature shows prognostic value in favor of prevention, but there is no clear evidence that correcting strokes, embolisms, and heart failure early improves patients' quality of life and life span.


Subject(s)
Heart Failure , Stroke , Humans , Artificial Intelligence , Electrocardiography , Prognosis
13.
Clin Chem Lab Med ; 61(4): 535-543, 2023 Mar 28.
Article in English | MEDLINE | ID: mdl-36327445

ABSTRACT

OBJECTIVES: The field of artificial intelligence (AI) has grown in the past 10 years. Despite the crucial role of laboratory diagnostics in clinical decision-making, we found that the majority of AI studies focus on surgery, radiology, and oncology, and little attention is given to AI integration into laboratory medicine. METHODS: We dedicated a session at the 3rd annual European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) strategic conference in 2022 to the topic of AI in the laboratory of the future. The speakers collaborated on generating a concise summary of the content, which is presented in this paper. RESULTS: The five key messages are: (1) laboratory specialists and technicians will continue to improve the analytical portfolio, diagnostic quality, and laboratory turnaround times; (2) the modularized nature of laboratory processes is amenable to AI solutions; (3) laboratory sub-specialization continues, and from test selection to interpretation, tasks increase in complexity; (4) expertise in AI implementation and partnerships with industry will emerge as a professional competency and will require novel educational strategies for broad implementation; and (5) regulatory frameworks and guidance have to be adapted to new computational paradigms. CONCLUSIONS: In summary, the speakers opine that the ability to realize the value proposition of AI in the laboratory will rely heavily on hands-on expertise and well-designed quality improvement initiatives from within the laboratory, for improved patient care.


Subject(s)
Artificial Intelligence , Radiology , Humans , Laboratories , Clinical Decision-Making
15.
J Pers Med ; 12(11)2022 Nov 01.
Article in English | MEDLINE | ID: mdl-36579522

ABSTRACT

One of the next frontiers in medical research, particularly in orthopaedic surgery, is personalized treatment outcome prediction. In personalized medicine, treatment choices are tailored to the patient based on the distinct features of the individual and of their disease. A high-value, patient-centered health care system requires evaluating results that integrate the patient's viewpoint. Patient-reported outcome measures (PROMs) are widely used to shed light on patients' perceptions of their health status after an intervention by means of validated questionnaires. The aim of this study is to examine whether meteorological or light (night vs. day) conditions affect PROM scores and hence indirectly affect health-related outcomes. We collected PROM scores from questionnaires completed by patients (N = 2326) who had undergone hip and knee interventions between June 2017 and May 2020 at the IRCCS Orthopaedic Institute Galeazzi (IOG), Milan, Italy. Nearest-neighbour propensity score (PS) matching was applied to ensure the similarity of the groups tested under the different weather-related conditions. The exposure PS was derived through logistic regression. The data were analysed using statistical tests (Student's t-test and Mann-Whitney U test). According to Cohen's effect size, weather conditions, through relative humidity and other weather-related factors, may affect PROM scores and, indirectly, health-related outcomes. The findings suggest avoiding PROM collection under certain conditions if the odds of outcome-based underperformance are to be minimized. This would ensure a balance between the costs of PROM collection and data availability.
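The matching step described above can be sketched as greedy 1:1 nearest-neighbour matching on the propensity score, without replacement and with an optional caliper. This is a minimal sketch, not the study's code: it assumes the scores have already been estimated (for example by logistic regression, as the abstract states), and the patient ids, scores, visiting order, and caliper are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple


def nearest_neighbour_match(
    treated: Dict[str, float],
    control: Dict[str, float],
    caliper: Optional[float] = None,
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without
    replacement. `treated` and `control` map a patient id to its score."""
    available = dict(control)
    pairs: List[Tuple[str, str]] = []
    # Visit exposed units in ascending score order so the result is reproducible;
    # other implementations visit them in random or descending order.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_ps))
        if caliper is not None and abs(available[c_id] - t_ps) > caliper:
            continue  # no acceptable control for this exposed unit
        pairs.append((t_id, c_id))
        del available[c_id]  # without replacement: each control is used once
    return pairs


if __name__ == "__main__":
    # Hypothetical scores for patients exposed / not exposed to a weather condition.
    exposed = {"t1": 0.30, "t2": 0.70}
    unexposed = {"c1": 0.28, "c2": 0.50, "c3": 0.72}
    print(nearest_neighbour_match(exposed, unexposed))
```

Greedy matching is simple and fast but order-dependent; optimal matching minimizes the total distance across all pairs at a higher computational cost.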

16.
Sensors (Basel) ; 22(19)2022 Sep 27.
Article in English | MEDLINE | ID: mdl-36236427

ABSTRACT

Human Activity Recognition (HAR) has been studied extensively, yet current approaches are not capable of generalizing across different domains (i.e., subjects, devices, or datasets) with acceptable performance. This lack of generalization hinders the applicability of these models in real-world environments. As deep neural networks are becoming increasingly popular in recent work, there is a need for an explicit comparison between handcrafted and deep representations in Out-of-Distribution (OOD) settings. This paper compares both approaches in multiple domains using homogenized public datasets. First, we compare several metrics to validate three different OOD settings. In our main experiments, we then verify that even though deep learning initially outperforms models with handcrafted features, the situation is reversed as the distance from the training distribution increases. These findings support the hypothesis that handcrafted features may generalize better across specific domains.
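To make the contrast between handcrafted and learned representations concrete, the sketch below computes a few classic time-domain features over one window of a single accelerometer axis: mean, standard deviation, minimum, maximum, root mean square, and zero crossings of the mean-removed signal. The specific feature set is a common textbook choice that we assume for illustration; it is not the feature set used in the paper.

```python
import math
from typing import Dict, Sequence


def handcrafted_features(window: Sequence[float]) -> Dict[str, float]:
    """A few classic time-domain features of one accelerometer-axis window."""
    n = len(window)
    if n < 2:
        raise ValueError("window needs at least two samples")
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n  # population variance
    # Zero crossings of the mean-removed signal (sign changes between samples).
    centred = [x - mean for x in window]
    zc = sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "min": min(window),
        "max": max(window),
        "rms": math.sqrt(sum(x * x for x in window) / n),
        "zero_crossings": float(zc),
    }


if __name__ == "__main__":
    # A toy window; real pipelines slide fixed-length windows over each sensor axis.
    print(handcrafted_features([1.0, -1.0, 1.0, -1.0]))
```

Because such summaries discard sensor- and subject-specific detail, one plausible reading of the paper's finding is that they are less tied to the training distribution than learned deep representations.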


Subject(s)
Human Activities , Neural Networks, Computer , Humans , Recognition, Psychology
17.
J Pers Med ; 12(10)2022 Oct 12.
Article in English | MEDLINE | ID: mdl-36294845

ABSTRACT

The rise of personalized medicine and its remarkable advancements have revealed new requirements for the availability of appropriate medical decision-making models. Computer science plays an essential role in personalized medicine, where one of the goals is to provide algorithms and tools to extract knowledge and improve the decision-support process. The minimum clinically important difference (MCID) is the smallest change in PROM scores that patients perceive as meaningful. Treatment that does not achieve this minimum level of improvement is considered inappropriate, as well as a potential waste of resources. Using the MCID threshold to identify patients who fail to achieve the minimum change in PROM scores that yields a meaningful outcome may aid pre-surgical shared decision-making. The decision tree algorithm can extract valuable information and present it to the domain expert in a form that supports decision-making. In the present study, different machine-learning-based tools were developed. On the one hand, we compared three XGBoost models to predict non-achievement of the MCID in the SF-12 physical score at six months post-operation. The prediction score threshold was set to 0.75 to provide three decision-making areas based on high-confidence (HC) intervals. The minority class was rebalanced in three ways: by weighting the positive class in the loss function (cost-sensitive XGBoost), by oversampling the minority class (XGBoost with SMOTE), and by undersampling the negative class (XGBoost with undersampling). On the other hand, we modeled the data through a decision tree (assessment tree), at different complexity levels, to identify hidden patterns and provide a new way to understand possible relationships between the gathered features and the several outcomes.
The results showed that all the proposed models were effective as binary classifiers, showing moderate predictive performance for both the minority or positive class (i.e., our targeted patients, those who will not benefit from surgery) and the negative class. The decision tree visualization can be exploited during patient assessment to better understand whether a patient will benefit from the medical intervention. Both tools can help increase knowledge about the patient's psychophysical state and create an increasingly specialized assessment of the individual patient.
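The three decision-making areas can be sketched as a mapping from the predicted probability of MCID non-achievement to a label. We assume symmetric cut-offs: scores at or above 0.75 are a high-confidence "not achieved", scores at or below 1 − 0.75 = 0.25 are a high-confidence "achieved", and everything in between is left to clinical judgement. The abstract does not spell out the lower cut-off, so that part is an assumption. For the cost-sensitive variant, XGBoost's parameter documentation suggests considering the ratio of negative to positive instances as a value for `scale_pos_weight`; the helper below only computes that ratio and does not call XGBoost.

```python
from typing import Sequence


def decision_area(score: float, threshold: float = 0.75) -> str:
    """Map a predicted probability of MCID non-achievement to one of three areas.
    Symmetric cut-offs (threshold and 1 - threshold) are an assumption."""
    if not 0.5 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0.5, 1)")
    if score >= threshold:
        return "HC: MCID not achieved"
    if score <= 1.0 - threshold:
        return "HC: MCID achieved"
    return "uncertain"


def positive_class_weight(labels: Sequence[int]) -> float:
    """Negatives/positives ratio, a typical value to consider for XGBoost's
    `scale_pos_weight` when training a cost-sensitive model."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("no positive examples")
    return neg / pos


if __name__ == "__main__":
    for s in (0.9, 0.5, 0.1):  # made-up model outputs
        print(s, decision_area(s))
    print(positive_class_weight([1, 0, 0, 0]))  # 1 positive, 3 negatives
```

Leaving the middle band as "uncertain" trades coverage for reliability: only patients in the two outer bands receive a model-driven suggestion.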

19.
Comput Methods Programs Biomed ; 221: 106930, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35690505

ABSTRACT

Background and Objective: Evaluation of AI-based decision support systems (AI-DSS) is of critical importance in practical applications; nonetheless, common evaluation metrics fail to properly consider relevant and contextual information. In this article, we discuss a novel utility metric, the weighted Utility (wU), for the evaluation of AI-DSS, which is based on the raters' perceptions of their annotation hesitation and of the relevance of the training cases. Methods: We discuss the relationship between the proposed metric and previous proposals, and we describe the application of the proposed metric to both model evaluation and optimization through three realistic case studies. Results: We show that our metric generalizes the well-known Net Benefit, as well as other common error-based and utility-based metrics. Through the empirical studies, we show that our metric can provide a more flexible tool for the evaluation of AI models. We also show that, compared to other optimization metrics, model optimization based on the wU can provide significantly better performance (AUC 0.862 vs 0.895, p-value <0.05), especially on cases judged to be more complex by the human annotators (AUC 0.85 vs 0.92, p-value <0.05). Conclusions: We argue for making utility a primary concern in the evaluation and optimization of machine learning models in critical domains such as medicine, and for the importance of a human-centred approach to assess the potential impact of AI models on human decision making, also on the basis of further information that can be collected during the ground-truthing process.
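Because the wU is said to generalize Net Benefit, it may help to recall the latter. At a threshold probability p_t, Net Benefit is NB = TP/N − (FP/N) · p_t / (1 − p_t), so each false positive is penalized by the odds of the threshold. The sketch below computes only this standard Net Benefit; the wU's own weighting (based on annotation hesitation and case relevance) is not defined in the abstract and is not reproduced here. Treating a case as positive when its score is at least p_t is our convention.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_score: Sequence[float], pt: float) -> float:
    """Net Benefit at threshold probability pt: TP/N - FP/N * pt / (1 - pt).
    A case is treated as positive when its score is >= pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must be in (0, 1)")
    if len(y_true) != len(y_score) or not y_true:
        raise ValueError("y_true and y_score must be non-empty and paired")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= pt and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


if __name__ == "__main__":
    labels = [1, 1, 0, 0]          # made-up ground truth
    scores = [0.9, 0.4, 0.6, 0.1]  # made-up model outputs
    for pt in (0.2, 0.5):
        print(pt, net_benefit(labels, scores, pt))
```

Sweeping p_t over a clinically plausible range yields the familiar decision curve; a utility metric such as the wU can additionally reweight individual cases.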


Subject(s)
Benchmarking , Machine Learning , Humans
20.
Stud Health Technol Inform ; 294: 127-128, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612033

ABSTRACT

We propose a re-calibration method for Machine Learning models, based on computing confidence intervals for the predicted confidence scores. We show the effectiveness of the proposed method on a COVID-19 diagnosis benchmark.
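The abstract gives no details of the method, so the sketch below swaps in a generic, well-known technique rather than the authors' approach: histogram-binning recalibration, in which each score is replaced by the observed positive rate of its score bin, with a Wilson score interval quantifying the uncertainty of that rate. The bin count, the z value of 1.96 (about 95%), and the assumption that scores lie in [0, 1] are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (uninformative if n == 0)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, centre - half), min(1.0, centre + half))


def binned_calibration(
    scores: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, float]]:
    """For each equal-width score bin: (observed positive rate, CI low, CI high).
    Histogram binning recalibrates a score to its bin's observed rate."""
    counts = [0] * n_bins
    positives = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes in the last bin
        counts[b] += 1
        positives[b] += y
    out = []
    for k, p in zip(counts, positives):
        rate = p / k if k else float("nan")  # empty bins have no observed rate
        lo, hi = wilson_interval(p, k)
        out.append((rate, lo, hi))
    return out


if __name__ == "__main__":
    # Made-up scores and labels, two bins.
    print(binned_calibration([0.05, 0.15, 0.95, 1.0], [0, 1, 1, 1], n_bins=2))
```

A bin whose interval does not contain the scores falling into it signals miscalibration; wide intervals signal that the bin holds too few cases to recalibrate reliably.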


Subject(s)
COVID-19 , COVID-19 Testing , Calibration , Confidence Intervals , Humans , Machine Learning