Results 1 - 11 of 11
1.
Nat Commun ; 15(1): 1808, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38418453

ABSTRACT

A clinical artificial intelligence (AI) system is often validated on data withheld during its development. This provides an estimate of its performance upon future deployment on data in the wild: data that are currently unseen but expected to be encountered in a clinical setting. However, estimating performance on data in the wild is complicated by distribution shift between data in the wild and withheld data, and by the absence of ground-truth annotations. Here, we introduce SUDO, a framework for evaluating AI systems on data in the wild. Through experiments on AI systems developed for dermatology images, histopathology patches, and clinical notes, we show that SUDO can identify unreliable predictions, inform model selection, and allow for the previously out-of-reach assessment of algorithmic bias on data in the wild without ground-truth annotations. These capabilities can contribute to the deployment of trustworthy and ethical AI systems in medicine.
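The abstract does not detail SUDO's mechanics. Below is a hedged toy sketch of one pseudo-labelling idea consistent with the description: retrain a small probe with each candidate pseudo-label for an unlabeled "wild" point, and treat the label that helps held-out accuracy more as the more plausible one. The scalar features, `centroid_probe`, and all numbers are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch: judge reliability of a prediction on an unlabeled "wild" point
# by pseudo-labelling it each way, retraining a tiny probe, and checking which
# pseudo-label helps the probe most on held-out labeled data.
from statistics import mean

def centroid_probe(train):
    """Tiny stand-in probe: label a scalar x by the nearer class centroid."""
    c0 = mean(x for x, y in train if y == 0)
    c1 = mean(x for x, y in train if y == 1)
    return lambda x: 0 if abs(x - c0) <= abs(x - c1) else 1

def sudo_scores(labeled, heldout, wild_x):
    """Held-out accuracy of a probe retrained with each candidate pseudo-label."""
    accs = {}
    for pseudo in (0, 1):
        probe = centroid_probe(labeled + [(wild_x, pseudo)])
        accs[pseudo] = mean(probe(x) == y for x, y in heldout)
    return accs

labeled = [(0.1, 0), (0.2, 0), (0.9, 1), (1.0, 1)]  # annotated development data
heldout = [(0.45, 0), (0.55, 1)]                     # annotated held-out data
scores = sudo_scores(labeled, heldout, 0.5)          # one unlabeled wild point
```

Here the pseudo-label that yields the higher probe accuracy (label 1) would be considered the more reliable prediction for the wild point.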


Subject(s)
Artificial Intelligence, Medicine
2.
J Endourol ; 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-37905524

ABSTRACT

Introduction: Automated skills assessment can provide surgical trainees with objective, personalized feedback during training. Here, we measure the efficacy of artificial intelligence (AI)-based feedback on a robotic suturing task. Materials and Methods: Forty-two participants with no robotic surgical experience were randomized to a control or feedback group and video-recorded while completing two rounds (R1 and R2) of suturing tasks on a da Vinci surgical robot. Participants were assessed on needle handling and needle driving, and feedback was provided via a visual interface after R1. Participants in the feedback group were informed of their AI-based skill assessment and presented with specific video clips from R1; participants in the control group were presented with randomly selected video clips from R1 as a placebo. Participants in each group were further labeled as underperformers or innate-performers based on a median split of their technical skill scores from R1. Results: Demographic features were similar between the control (n = 20) and feedback (n = 22) groups (p > 0.05). Comparing improvement from R1 to R2, the feedback group improved significantly more than the control group in needle handling score (0.30 vs -0.02, p = 0.018), whereas the difference in needle driving score improvement was not significant (0.17 vs -0.40, p = 0.074). Innate-performers exhibited similar improvements across rounds regardless of feedback (p > 0.05). In contrast, underperformers in the feedback group improved more than those in the control group in needle handling (p = 0.02). Conclusion: AI-based feedback facilitates surgical trainees' acquisition of robotic technical skills, especially for underperformers. Future research will extend AI-based feedback to additional suturing skills, surgical tasks, and experience groups.
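The median-split cohort labeling described in this abstract can be sketched in a few lines: participants are divided at the median of their round-1 (R1) score, and improvement (R2 - R1) is then summarized per sub-group. The scores below are invented for illustration, not study data.

```python
# Sketch of a median split into underperformers vs. innate-performers,
# followed by a per-group mean improvement from round 1 to round 2.
from statistics import median, mean

def median_split(scores_r1):
    """Return indices of underperformers (below median) and innate-performers."""
    m = median(scores_r1)
    under = [i for i, s in enumerate(scores_r1) if s < m]
    innate = [i for i, s in enumerate(scores_r1) if s >= m]
    return under, innate

def mean_improvement(r1, r2, idx):
    """Average R2 - R1 change over the participants in idx."""
    return mean(r2[i] - r1[i] for i in idx)

r1 = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]  # illustrative R1 skill scores
r2 = [2.6, 3.0, 3.2, 3.6, 4.0, 4.4]  # illustrative R2 skill scores
under, innate = median_split(r1)
```

In this toy data, the underperformers improve more on average than the innate-performers, mirroring the pattern the study reports for the feedback group.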

3.
Commun Med (Lond) ; 3(1): 42, 2023 Mar 30.
Article in English | MEDLINE | ID: mdl-36997578

ABSTRACT

BACKGROUND: Surgeons who receive reliable feedback on their performance quickly master the skills necessary for surgery. Such performance-based feedback can be provided by a recently developed artificial intelligence (AI) system that assesses a surgeon's skills based on a surgical video while simultaneously highlighting the aspects of the video most pertinent to the assessment. However, it remains an open question whether these highlights, or explanations, are equally reliable for all surgeons. METHODS: Here, we systematically quantify the reliability of AI-based explanations on surgical videos from three hospitals across two continents by comparing them to explanations generated by human experts. To improve the reliability of AI-based explanations, we propose a strategy of training with explanations (TWIX), which uses human explanations as supervision to explicitly teach an AI system to highlight important video frames. RESULTS: We show that while AI-based explanations often align with human explanations, they are not equally reliable for different sub-cohorts of surgeons (e.g., novices vs. experts), a phenomenon we refer to as an explanation bias. We also show that TWIX enhances the reliability of AI-based explanations, mitigates the explanation bias, and improves the performance of AI systems across hospitals. These findings extend to a training environment where medical students can be provided with feedback today. CONCLUSIONS: Our study informs the impending implementation of AI-augmented surgical training and surgeon credentialing programs, and contributes to the safe and fair democratization of surgery.
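The training-with-explanations idea described above can be sketched as a combined objective: a skill-prediction loss plus an auxiliary term that penalizes disagreement between the model's per-frame importance and human-annotated importance. The binary cross-entropy skill loss, the mean-squared-error explanation loss, and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a combined loss: skill classification + explanation alignment.
import math

def bce(p, y):
    """Binary cross-entropy for one skill prediction p in (0,1) against label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def explanation_loss(pred_importance, human_importance):
    """Mean squared error between predicted and human per-frame importance."""
    n = len(pred_importance)
    return sum((a - b) ** 2 for a, b in zip(pred_importance, human_importance)) / n

def twix_loss(p_skill, y_skill, pred_imp, human_imp, lam=0.5):
    """Total loss: predict skill correctly AND highlight the frames humans would."""
    return bce(p_skill, y_skill) + lam * explanation_loss(pred_imp, human_imp)

# Same skill prediction, but one model highlights the frames humans marked:
loss_aligned = twix_loss(0.9, 1, [0.8, 0.1], [0.8, 0.1])
loss_misaligned = twix_loss(0.9, 1, [0.1, 0.8], [0.8, 0.1])
```

The aligned model incurs a lower total loss, so gradient descent on this objective pushes the system's explanations toward the human ones.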


Surgeons aim to master the skills necessary for surgery. One such skill is suturing, which involves connecting objects together through a series of stitches. Mastery of these surgical skills can be improved by providing surgeons with feedback on the quality of their performance. However, such feedback is often absent from surgical practice. Although performance-based feedback can, in theory, be provided by recently developed artificial intelligence (AI) systems that use a computational model to assess a surgeon's skill, the reliability of this feedback remains unknown. Here, we compare AI-based feedback to that provided by human experts and demonstrate that the two often overlap. We also show that explicitly teaching an AI system to align with human feedback further improves the reliability of AI-based feedback on new videos of surgery. Our findings outline the potential of AI systems to support the training of surgeons by providing feedback that is reliable and focused on a particular skill, and can guide credentialing programs by complementing skill assessments with explanations that increase the trustworthiness of those assessments.

4.
NPJ Digit Med ; 6(1): 54, 2023 Mar 30.
Article in English | MEDLINE | ID: mdl-36997642

ABSTRACT

Artificial intelligence (AI) systems can now reliably assess surgeon skills through videos of intraoperative surgical activity. With such systems informing future high-stakes decisions such as whether to credential surgeons and grant them the privilege to operate on patients, it is critical that they treat all surgeons fairly. However, it remains an open question whether surgical AI systems exhibit bias against surgeon sub-cohorts, and, if so, whether such bias can be mitigated. Here, we examine and mitigate the bias exhibited by a family of surgical AI systems (SAIS) deployed on videos of robotic surgeries from three geographically diverse hospitals (USA and EU). We show that SAIS exhibits an underskilling bias, erroneously downgrading surgical performance, and an overskilling bias, erroneously upgrading surgical performance, at different rates across surgeon sub-cohorts. To mitigate such bias, we leverage a strategy, TWIX, which teaches an AI system to provide a visual explanation for its skill assessment that would otherwise have been provided by human experts. We show that whereas baseline strategies mitigate algorithmic bias inconsistently, TWIX can effectively mitigate both the underskilling and overskilling bias while simultaneously improving the performance of these AI systems across hospitals. We discovered that these findings carry over to the training environment where we assess medical students' skills today. Our study is a critical prerequisite to the eventual implementation of AI-augmented global surgeon credentialing programs, ensuring that all surgeons are treated fairly.
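The two bias measures named above can be made concrete as per-cohort error rates: the underskilling rate is the fraction of truly high-skill cases the model downgrades, and the overskilling rate the fraction of truly low-skill cases it upgrades. The record layout and data below are illustrative, not from the study.

```python
# Sketch: per-sub-cohort underskilling and overskilling rates.
def bias_rates(records):
    """records: list of (cohort, true_skill, pred_skill), skill in {'low','high'}."""
    out = {}
    for cohort in {r[0] for r in records}:
        rows = [r for r in records if r[0] == cohort]
        high = [r for r in rows if r[1] == "high"]
        low = [r for r in rows if r[1] == "low"]
        under = sum(r[2] == "low" for r in high) / len(high) if high else 0.0
        over = sum(r[2] == "high" for r in low) / len(low) if low else 0.0
        out[cohort] = {"underskilling": under, "overskilling": over}
    return out

records = [
    ("novice", "high", "low"), ("novice", "high", "high"),
    ("novice", "low", "low"),  ("expert", "high", "high"),
    ("expert", "low", "high"), ("expert", "low", "low"),
]
rates = bias_rates(records)
```

A gap between cohorts in either rate (here, novices are underskilled and experts overskilled) is exactly the kind of disparity a mitigation strategy would target.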

5.
Nat Biomed Eng ; 7(6): 780-796, 2023 06.
Article in English | MEDLINE | ID: mdl-36997732

ABSTRACT

The intraoperative activity of a surgeon has a substantial impact on postoperative outcomes. However, for most surgical procedures, the details of intraoperative surgical actions, which can vary widely, are not well understood. Here we report a machine learning system leveraging a vision transformer and supervised contrastive learning for the decoding of elements of intraoperative surgical activity from videos commonly collected during robotic surgeries. The system accurately identified surgical steps, actions performed by the surgeon, the quality of these actions and the relative contribution of individual video frames to the decoding of the actions. Through extensive testing on data from three different hospitals located in two different continents, we show that the system generalizes across videos, surgeons, hospitals and surgical procedures, and that it can provide information on surgical gestures and skills from unannotated videos. Decoding intraoperative activity via accurate machine learning systems could be used to provide surgeons with feedback on their operating skills, and may allow for the identification of optimal surgical behaviour and for the study of relationships between intraoperative factors and postoperative outcomes.
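A minimal sketch of the supervised contrastive objective mentioned above, in the commonly used SupCon form (embeddings sharing a label are pulled together; all other samples in the batch act as negatives). The 2-D toy embeddings and temperature are illustrative, and the paper's actual loss may differ in detail.

```python
# Sketch of a supervised contrastive (SupCon-style) loss on unit embeddings.
import math

def supcon_loss(embs, labels, tau=0.1):
    """Average loss over anchors; positives are same-label samples."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    total, n = 0.0, len(embs)
    for i in range(n):
        pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not pos:
            continue  # anchors with no positive contribute nothing
        denom = sum(math.exp(dot(embs[i], embs[a]) / tau) for a in range(n) if a != i)
        total += -sum(math.log(math.exp(dot(embs[i], embs[p]) / tau) / denom)
                      for p in pos) / len(pos)
    return total / n

embs = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]  # toy unit vectors
loss_clustered = supcon_loss(embs, [0, 0, 1, 1])  # labels match clusters
loss_shuffled = supcon_loss(embs, [0, 1, 0, 1])   # labels ignore clusters
```

When labels agree with the geometric clusters the loss is small; mismatched labels drive it up, which is what pushes an encoder to group same-label video clips.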


Asunto(s)
Procedimientos Quirúrgicos Robotizados , Cirujanos , Humanos , Procedimientos Quirúrgicos Robotizados/métodos
6.
NPJ Digit Med ; 5(1): 187, 2022 Dec 22.
Article in English | MEDLINE | ID: mdl-36550203

ABSTRACT

How well a surgery is performed impacts a patient's outcomes; however, objective quantification of performance remains an unsolved challenge. Deconstructing a procedure into discrete instrument-tissue "gestures" is an emerging way to understand surgery. To establish this paradigm in a procedure where performance is the most important factor for patient outcomes, we identify 34,323 individual gestures performed in 80 nerve-sparing robot-assisted radical prostatectomies from two international medical centers. Gestures are classified into nine distinct dissection gestures (e.g., hot cut) and four supporting gestures (e.g., retraction). Our primary outcome is to identify factors impacting a patient's 1-year erectile function (EF) recovery after radical prostatectomy. We find that less use of hot cut and more use of peel/push are statistically associated with a better chance of 1-year EF recovery. Our results also show interactions between surgeon experience and gesture types: similar gesture selection resulted in different EF recovery rates depending on surgeon experience. To further validate this framework, two teams independently constructed distinct machine learning models using gesture sequences vs. traditional clinical features to predict 1-year EF. In both models, gesture sequences were better able to predict 1-year EF (Team 1: AUC 0.77, 95% CI 0.73-0.81; Team 2: AUC 0.68, 95% CI 0.66-0.70) than traditional clinical features (Team 1: AUC 0.69, 95% CI 0.65-0.73; Team 2: AUC 0.65, 95% CI 0.62-0.68). Our results suggest that gestures provide a granular method to objectively assess surgical performance and outcomes. Applying this methodology to other surgeries may lead to discoveries on how to improve surgery.
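The gesture-frequency analysis described above can be sketched as: derive per-case usage fractions from gesture sequences, then score how well a single fraction separates recovery outcomes with a plain rank-based AUROC. The sequences and outcomes below are invented for illustration, not study data.

```python
# Sketch: gesture-usage features and a rank-based AUROC for outcome separation.
def gesture_fraction(seq, gesture):
    """Fraction of a case's gestures that are the given type."""
    return seq.count(gesture) / len(seq)

def auroc(scores, labels):
    """Probability a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

cases = [
    (["peel/push", "peel/push", "hot cut"], 1),    # 1 = EF recovered at 1 year
    (["peel/push", "retraction", "peel/push"], 1),
    (["hot cut", "hot cut", "peel/push"], 0),
    (["hot cut", "retraction", "hot cut"], 0),
]
peel_frac = [gesture_fraction(seq, "peel/push") for seq, _ in cases]
labels = [y for _, y in cases]
```

In this toy data, higher peel/push usage perfectly ranks the recovered cases above the non-recovered ones, echoing the direction of the reported association.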

7.
IEEE Rev Biomed Eng ; 15: 354-371, 2022.
Article in English | MEDLINE | ID: mdl-32813662

ABSTRACT

Low-resource clinical settings are plagued by low physician-to-patient ratios and a shortage of high-quality medical expertise and infrastructure. Together, these phenomena lead to over-burdened healthcare systems that under-serve the needs of the community. This burden can be alleviated by introducing clinical decision support systems (CDSSs): systems that support stakeholders (ranging from physicians to patients) within the clinical setting in their day-to-day activities. Such systems, which have proven effective in the developed world, remain under-explored in low-resource settings. This review summarizes the research on clinical decision support systems that either target stakeholders within low-resource clinical settings or diseases commonly found in such environments. When categorizing our findings according to disease applications, we find that CDSSs predominantly focus on bacterial infections and maternal care, do not leverage deep learning, and have not been evaluated prospectively. Together, these findings highlight the need for increased research in this domain in order to impact a diverse set of medical conditions and ultimately improve patient outcomes.


Subject(s)
Decision Support Systems, Clinical; Delivery of Health Care; Humans
8.
Nat Commun ; 12(1): 4221, 2021 07 09.
Article in English | MEDLINE | ID: mdl-34244504

ABSTRACT

Deep learning algorithms trained on instances that violate the assumption of being independent and identically distributed (i.i.d.) are known to experience destructive interference, a phenomenon characterized by a degradation in performance. Such a violation, however, is ubiquitous in clinical settings where data are streamed temporally from different clinical sites and from a multitude of physiological sensors. To mitigate this interference, we propose a continual learning strategy, entitled CLOPS, that employs a replay buffer. To guide the storage of instances into the buffer, we propose end-to-end trainable parameters, termed task-instance parameters, that quantify the difficulty with which data points are classified by a deep-learning system. We validate the interpretation of these parameters via clinical domain knowledge. To replay instances from the buffer, we exploit uncertainty-based acquisition functions. In three of the four continual learning scenarios, reflecting transitions across diseases, time, data modalities, and healthcare institutions, we show that CLOPS outperforms the state-of-the-art methods, GEM and MIR. We also conduct extensive ablation studies to demonstrate the necessity of the various components of our proposed strategy. Our framework has the potential to pave the way for diagnostic systems that remain robust over time.
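A hedged sketch of the replay-buffer idea described above: keep the instances a difficulty score marks as hardest, and replay those the current model is least certain about. The plain numeric difficulty and uncertainty values here stand in for the paper's learned task-instance parameters and acquisition functions.

```python
# Sketch: difficulty-guided storage and uncertainty-guided replay.
import heapq

class ReplayBuffer:
    """Keep the most 'difficult' instances seen so far; replay the most uncertain."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._count = 0
        self._heap = []  # min-heap of (difficulty, insertion_order, instance)

    def store(self, instance, difficulty):
        heapq.heappush(self._heap, (difficulty, self._count, instance))
        self._count += 1
        if len(self._heap) > self.capacity:
            heapq.heappop(self._heap)  # evict the easiest instance

    def replay(self, k, uncertainty):
        """Return the k buffered instances the current model is least sure about."""
        items = [inst for _, _, inst in self._heap]
        return sorted(items, key=uncertainty, reverse=True)[:k]

    def contents(self):
        return sorted(inst for _, _, inst in self._heap)

buf = ReplayBuffer(capacity=2)
for inst, diff in [("beat_a", 0.9), ("beat_b", 0.1), ("beat_c", 0.5)]:
    buf.store(inst, diff)
unc = {"beat_a": 0.2, "beat_c": 0.8}          # illustrative model uncertainty
chosen = buf.replay(1, uncertainty=lambda inst: unc[inst])
```

The easy instance is evicted at storage time, and replay surfaces the buffered instance the model is currently least certain about.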


Subject(s)
Arrhythmias, Cardiac/diagnosis; Clinical Decision-Making/methods; Decision Support Systems, Clinical; Deep Learning; Datasets as Topic; Electrocardiography; Humans; Models, Cardiovascular; ROC Curve; Seasons; Time Factors
9.
Lancet Digit Health ; 3(2): e78-e87, 2021 02.
Article in English | MEDLINE | ID: mdl-33509388

ABSTRACT

BACKGROUND: The early clinical course of COVID-19 can be difficult to distinguish from other illnesses driving presentation to hospital. However, viral-specific PCR testing has limited sensitivity and results can take up to 72 h for operational reasons. We aimed to develop and validate two early-detection models for COVID-19, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs). These data are typically available within the first hour of presentation to hospitals in high-income and middle-income countries, within the existing laboratory infrastructure. METHODS: We trained linear and non-linear machine learning classifiers to distinguish patients with COVID-19 from pre-pandemic controls, using electronic health record data for patients presenting to the emergency department and admitted across a group of four teaching hospitals in Oxfordshire, UK (Oxford University Hospitals). Data extracted included presentation blood tests, blood gas testing, vital signs, and results of PCR testing for respiratory viruses. Adult patients (>18 years) presenting to hospital before Dec 1, 2019 (before the first COVID-19 outbreak), were included in the COVID-19-negative cohort; those presenting to hospital between Dec 1, 2019, and April 19, 2020, with PCR-confirmed severe acute respiratory syndrome coronavirus 2 infection were included in the COVID-19-positive cohort. Patients who were subsequently admitted to hospital were included in their respective COVID-19-negative or COVID-19-positive admissions cohorts. Models were calibrated to sensitivities of 70%, 80%, and 90% during training, and performance was initially assessed on a held-out test set generated by an 80:20 split stratified by patients with COVID-19 and balanced equally with pre-pandemic controls. 
To simulate real-world performance at different stages of an epidemic, we generated test sets with varying prevalences of COVID-19 and assessed predictive values for our models. We prospectively validated our 80% sensitivity models for all patients presenting or admitted to the Oxford University Hospitals between April 20 and May 6, 2020, comparing model predictions with PCR test results. FINDINGS: We assessed 155 689 adult patients presenting to hospital between Dec 1, 2017, and April 19, 2020. 114 957 patients were included in the COVID-negative cohort and 437 in the COVID-positive cohort, for a full study population of 115 394 patients, with 72 310 admitted to hospital. Calibrated to 80% sensitivity, our emergency department (ED) model achieved 77·4% sensitivity and 95·7% specificity (area under the receiver operating characteristic curve [AUROC] 0·939) for COVID-19 among all patients attending hospital, and the admissions model achieved 77·4% sensitivity and 94·8% specificity (AUROC 0·940) for the subset of patients admitted to hospital. Both models achieved high negative predictive values (NPV; >98·5%) across a range of prevalences (≤5%). We prospectively validated our models for all patients presenting and admitted to Oxford University Hospitals in a 2-week test period. The ED model (3326 patients) achieved 92·3% accuracy (NPV 97·6%, AUROC 0·881), and the admissions model (1715 patients) achieved 92·5% accuracy (NPV 97·7%, AUROC 0·871) in comparison with PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improved apparent accuracy (ED model 95·1%, admissions model 94·1%) and NPV (ED model 99·0%, admissions model 98·5%). INTERPRETATION: Our models performed effectively as a screening test for COVID-19, excluding the illness with high confidence by use of clinical data routinely available within 1 h of presentation to hospital.
Our approach is rapidly scalable, fitting within the existing laboratory testing infrastructure and standard of care of hospitals in high-income and middle-income countries. FUNDING: Wellcome Trust, University of Oxford, Engineering and Physical Sciences Research Council, National Institute for Health Research Oxford Biomedical Research Centre.
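The sensitivity calibration described above (fixing the operating point at 70%, 80%, or 90% sensitivity during training) amounts to choosing the decision threshold from the positive cases' validation scores. A minimal sketch with illustrative scores, not the study's data or exact procedure:

```python
# Sketch: pick the threshold so a target fraction of positives score above it.
import math

def threshold_for_sensitivity(pos_scores, target):
    """Largest threshold t such that at least `target` of positives score >= t."""
    s = sorted(pos_scores, reverse=True)
    k = math.ceil(len(s) * target)
    return s[k - 1]

def sensitivity(pos_scores, t):
    return sum(p >= t for p in pos_scores) / len(pos_scores)

pos = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]  # validation scores
t80 = threshold_for_sensitivity(pos, 0.8)
achieved = sensitivity(pos, t80)
```

Specificity and predictive values are then whatever the negative cohort yields at that threshold, which is why the paper reports them separately at each calibrated operating point.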


Subject(s)
Artificial Intelligence; COVID-19; Hematologic Tests; Mass Screening; Predictive Value of Tests; Triage; Adult; Emergency Service, Hospital; Hospitalization; Hospitals; Humans; Middle Aged; Prospective Studies
10.
Healthc Technol Lett ; 7(2): 45-50, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32431851

ABSTRACT

Hand, foot, and mouth disease (HFMD) and tetanus are serious infectious diseases in low- and middle-income countries. Tetanus, in particular, has a high mortality rate and its treatment is resource-demanding. Furthermore, HFMD often affects a large number of infants and young children. As a result, its treatment consumes enormous healthcare resources, especially when outbreaks occur. Autonomic nervous system dysfunction (ANSD) is the main cause of death for both HFMD and tetanus patients. However, early detection of ANSD is a difficult and challenging problem. The authors aim to provide a proof of principle for detecting the ANSD level automatically by applying machine learning techniques to physiological patient data, such as electrocardiogram waveforms, which can be collected using low-cost wearable sensors. Efficient features are extracted that encode variations in the waveforms in the time and frequency domains. The proposed approach is validated on multiple datasets of HFMD and tetanus patients in Vietnam, with encouraging results. Moreover, the proposed features are simple and more generalisable, and outperform standard heart rate variability analysis. The proposed approach would facilitate both the diagnosis and treatment of infectious diseases in low- and middle-income countries, and thereby improve patient care.
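The time- and frequency-domain feature families mentioned above can be sketched as simple waveform statistics plus banded spectral power from a discrete Fourier transform. The band edges, sampling rate, and synthetic waveform are illustrative assumptions, not the paper's feature set.

```python
# Sketch: time-domain statistics and banded DFT power from a 1-D waveform.
import math

def time_features(x):
    """Mean, standard deviation, and RMS of a waveform."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    rms = math.sqrt(sum(v * v for v in x) / n)
    return {"mean": mu, "sd": sd, "rms": rms}

def band_power(x, fs, lo, hi):
    """Sum of DFT power over frequency bins in [lo, hi) Hz."""
    n = len(x)
    total = 0.0
    for k in range(1, n // 2):
        f = k * fs / n
        if lo <= f < hi:
            re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            total += (re * re + im * im) / n
    return total

fs = 100                                                  # Hz, illustrative
x = [math.sin(2 * math.pi * 5 * t / fs) for t in range(200)]  # pure 5 Hz tone
feats = time_features(x)
power_in_band = band_power(x, fs, 4, 6)    # band containing the tone
power_out_band = band_power(x, fs, 20, 30) # band away from the tone
```

For the pure 5 Hz tone, essentially all spectral power falls inside the 4-6 Hz band, which is the kind of contrast such features exploit to separate ANSD levels.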

11.
IEEE J Biomed Health Inform ; 24(11): 3226-3235, 2020 11.
Article in English | MEDLINE | ID: mdl-32340967

ABSTRACT

The paucity of physiological time-series data collected from low-resource clinical settings limits the capabilities of modern machine learning algorithms in achieving high performance. Such performance is further hindered by class imbalance: datasets in which one diagnosis is much more common than others. To overcome these two issues at low cost while preserving privacy, data augmentation methods can be employed. In the time domain, the traditional method of time-warping can alter the underlying data distribution with detrimental consequences, which is prominent when dealing with physiological conditions that influence the frequency components of data. In this paper, we propose PlethAugment: three different conditional generative adversarial networks (CGANs) with an adapted diversity term for the generation of pathological photoplethysmogram (PPG) signals in order to boost medical classification performance. To evaluate and compare the GANs, we introduce a novel metric-agnostic method: the synthetic generalization curve. We validate this approach on two proprietary and two public datasets representing a diverse set of medical conditions. Compared to training on non-augmented class-balanced datasets, training on augmented datasets improves the AUROC by up to 29% under cross-validation. This illustrates the potential of the proposed CGANs to significantly improve classification performance.
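A hedged sketch of how a "synthetic generalization curve" could be traced: train with increasing amounts of synthetic minority-class data and record held-out performance at each step. The stub generator and nearest-centroid classifier below stand in for the paper's CGANs and downstream models, and accuracy is used instead of AUROC to keep the sketch short; none of this is the authors' implementation.

```python
# Sketch: performance as a function of how much synthetic data is added.
from statistics import mean

def generator(n, seed=0):
    """Stub for a CGAN: emit n synthetic minority-class samples near 1.0."""
    vals, x = [], seed
    for _ in range(n):
        x = (1103515245 * x + 12345) % (2 ** 31)  # tiny LCG, deterministic
        vals.append(0.9 + 0.2 * (x / 2 ** 31))    # values in [0.9, 1.1)
    return vals

def accuracy(train, test):
    """Nearest-centroid classifier on scalar features, scored on test."""
    c0 = mean(x for x, y in train if y == 0)
    c1 = mean(x for x, y in train if y == 1)
    return mean((0 if abs(x - c0) <= abs(x - c1) else 1) == y for x, y in test)

def generalization_curve(train, test, amounts):
    """Held-out accuracy as synthetic minority-class samples are added."""
    curve = []
    for n in amounts:
        synth = [(v, 1) for v in generator(n)]
        curve.append(accuracy(train + synth, test))
    return curve

train = [(0.0, 0), (0.1, 0), (0.2, 0), (0.45, 1)]  # minority class 1 is scarce
test = [(0.05, 0), (0.3, 0), (0.95, 1), (1.0, 1)]
curve = generalization_curve(train, test, [0, 4, 8])
```

A curve that rises and stays high as synthetic data is added suggests the generator's samples generalize; a curve that falls would flag synthetic samples that distort the class distribution.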


Subject(s)
Algorithms, Machine Learning, Humans