Results 1 - 13 of 13
1.
Mach Learn ; 113(5): 2655-2674, 2024.
Article in English | MEDLINE | ID: mdl-38708086

ABSTRACT

With the rapid growth of memory and computing power, datasets are becoming increasingly complex and imbalanced. This is especially severe in the context of clinical data, where there may be one rare event for many cases in the majority class. We introduce an imbalanced classification framework, based on reinforcement learning, for training extremely imbalanced data sets, and extend it for use in multi-class settings. We combine dueling and double deep Q-learning architectures, and formulate a custom reward function and episode-training procedure, specifically with the capability of handling multi-class imbalanced training. Using real-world clinical case studies, we demonstrate that our proposed framework outperforms current state-of-the-art imbalanced learning methods, achieving more fair and balanced classification, while also significantly improving the prediction of minority classes. Supplementary Information: The online version contains supplementary material available at 10.1007/s10994-023-06481-z.

2.
Clin Exp Dermatol ; 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38549552

ABSTRACT

Artificial Intelligence (AI) solutions for skin cancer diagnosis continue to gain momentum, edging closer towards broad clinical use. These AI models, particularly deep learning architectures, require large digital image datasets for development. This review provides an overview of the datasets used to develop AI algorithms and highlights the importance of dataset transparency for evaluation of algorithm generalisability across varying populations and settings. Current challenges for curation of clinically valuable datasets are detailed, which include dataset shifts arising from demographic variations and differences in data collection methodologies, along with inconsistencies in labelling. These shifts can lead to differential algorithm performance, compromise of clinical utility, and the propagation of discriminatory biases when developed algorithms are implemented in mismatched populations. Limited representation of rare skin cancers and minoritised groups in existing datasets is highlighted, which can further skew algorithm performance. Strategies to address these challenges are presented, which include improving transparency, representation and interoperability. Federated learning and generative methods, which may improve dataset size and diversity without compromising privacy, are also examined. Lastly, we discuss model-level techniques which may address biases entrained through the use of datasets derived from routine clinical care. As the role of AI in skin cancer diagnosis becomes more prominent, ensuring the robustness of underlying datasets is increasingly important.

3.
Lancet Digit Health ; 6(2): e93-e104, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38278619

ABSTRACT

BACKGROUND: Multicentre training could reduce biases in medical artificial intelligence (AI); however, ethical, legal, and technical considerations can constrain the ability of hospitals to share data. Federated learning enables institutions to participate in algorithm development while retaining custody of their data, but uptake in hospitals has been limited, possibly as deployment requires specialist software and technical expertise at each site. We previously developed an artificial intelligence-driven screening test for COVID-19 in emergency departments, known as CURIAL-Lab, which uses vital signs and blood tests that are routinely available within 1 h of a patient's arrival. Here we aimed to federate our COVID-19 screening test by developing an easy-to-use embedded system (which we introduce as full-stack federated learning) to train and evaluate machine learning models across four UK hospital groups without centralising patient data. METHODS: We supplied a Raspberry Pi 4 Model B preloaded with our federated learning software pipeline to four National Health Service (NHS) hospital groups in the UK: Oxford University Hospitals NHS Foundation Trust (OUH; through the locally linked research University, University of Oxford), University Hospitals Birmingham NHS Foundation Trust (UHB), Bedfordshire Hospitals NHS Foundation Trust (BH), and Portsmouth Hospitals University NHS Trust (PUH). OUH, PUH, and UHB participated in federated training, training a deep neural network and logistic regressor over 150 rounds to form and calibrate a global model to predict COVID-19 status, using clinical data from patients admitted before the pandemic (COVID-19-negative) and testing positive for COVID-19 during the first wave of the pandemic. We conducted a federated evaluation of the global model for admissions during the second wave of the pandemic at OUH, PUH, and externally at BH.
For OUH and PUH, we additionally performed local fine-tuning of the global model using the sites' individual training data, forming a site-tuned model, and evaluated the resultant model for admissions during the second wave of the pandemic. This study included data collected between Dec 1, 2018, and March 1, 2021; the exact date ranges used varied by site. The primary outcome was overall model performance, measured as the area under the receiver operating characteristic curve (AUROC). Removable micro secure digital (microSD) storage was destroyed on study completion. FINDINGS: Clinical data from 130 941 patients (1772 COVID-19-positive), routinely collected across three hospital groups (OUH, PUH, and UHB), were included in federated training. The evaluation step included data from 32 986 patients (3549 COVID-19-positive) attending OUH, PUH, or BH during the second wave of the pandemic. Federated training of a global deep neural network classifier improved upon performance of models trained locally in terms of AUROC by a mean of 27·6% (SD 2·2): AUROC increased from 0·574 (95% CI 0·560-0·589) at OUH and 0·622 (0·608-0·637) at PUH using the locally trained models to 0·872 (0·862-0·882) at OUH and 0·876 (0·865-0·886) at PUH using the federated global model. Performance improvement was smaller for a logistic regression model, with a mean increase in AUROC of 13·9% (0·5%). During federated external evaluation at BH, AUROC for the global deep neural network model was 0·917 (0·893-0·942), with 89·7% sensitivity (83·6-93·6) and 76·6% specificity (73·9-79·1). Site-specific tuning of the global model did not significantly improve performance (change in AUROC <0·01). INTERPRETATION: We developed an embedded system for federated learning, using microcomputing to optimise for ease of deployment. We deployed full-stack federated learning across four UK hospital groups to develop a COVID-19 screening test without centralising patient data. 
Federation improved model performance, and the resultant global models were generalisable. Full-stack federated learning could enable hospitals to contribute to AI development at low cost and without specialist technical expertise at each site. FUNDING: The Wellcome Trust, University of Oxford Medical and Life Sciences Translational Fund.
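
The mean AUROC improvement quoted in the FINDINGS above can be checked with a few lines of arithmetic. The sketch below assumes that "27·6% (SD 2·2)" refers to the absolute gain in percentage points across the two sites quoted (OUH and PUH) and that the SD is a population SD; neither assumption is stated in the abstract, although the quoted figures are consistent with both.

```python
from statistics import mean, pstdev

# AUROC values quoted in the abstract: (locally trained, federated global).
auroc = {
    "OUH": (0.574, 0.872),
    "PUH": (0.622, 0.876),
}

# Absolute gain per site, expressed in percentage points (assumed reading).
gains = {site: (fed - local) * 100 for site, (local, fed) in auroc.items()}

mean_gain = mean(list(gains.values()))
sd_gain = pstdev(list(gains.values()))  # population SD is an assumption

print({site: round(g, 1) for site, g in gains.items()})
print(round(mean_gain, 1), round(sd_gain, 1))
```

Under those assumptions the result agrees with the quoted mean and SD to one decimal place, which suggests the percentage is an absolute rather than a relative improvement.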


Subject(s)
COVID-19 , Secondary Care , Humans , Artificial Intelligence , Privacy , State Medicine , COVID-19/diagnosis , Hospitals , United Kingdom
4.
Nat Mach Intell ; 5(8): 884-894, 2023.
Article in English | MEDLINE | ID: mdl-37615031

ABSTRACT

As models based on machine learning continue to be developed for healthcare applications, greater effort is needed to ensure that these technologies do not reflect or exacerbate any unwanted or discriminatory biases that may be present in the data. Here we introduce a reinforcement learning framework capable of mitigating biases that may have been acquired during data collection. In particular, we evaluated our model for the task of rapidly predicting COVID-19 for patients presenting to hospital emergency departments and aimed to mitigate any site (hospital)-specific and ethnicity-based biases present in the data. Using a specialized reward function and training procedure, we show that our method achieves clinically effective screening performances, while significantly improving outcome fairness compared with current benchmarks and state-of-the-art machine learning methods. We performed external validation across three independent hospitals, and additionally tested our method on a patient intensive care unit discharge status task, demonstrating model generalizability.

5.
NPJ Digit Med ; 6(1): 55, 2023 Mar 29.
Article in English | MEDLINE | ID: mdl-36991077

ABSTRACT

Machine learning is becoming increasingly prominent in healthcare. Although its benefits are clear, growing attention is being given to how these tools may exacerbate existing biases and disparities. In this study, we introduce an adversarial training framework that is capable of mitigating biases that may have been acquired through data collection. We demonstrate this proposed framework on the real-world task of rapidly predicting COVID-19, and focus on mitigating site-specific (hospital) and demographic (ethnicity) biases. Using the statistical definition of equalized odds, we show that adversarial training improves outcome fairness, while still achieving clinically-effective screening performances (negative predictive values >0.98). We compare our method to previous benchmarks, and perform prospective and external validation across four independent hospital cohorts. Our method can be generalized to any outcomes, models, and definitions of fairness.
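
Equalized odds, the fairness definition named in the abstract above, asks that true-positive and false-positive rates be equal across groups. The following is a minimal sketch of how the between-group gaps could be measured; the function names and toy data are hypothetical and are not taken from the study.

```python
def rates(y_true, y_pred):
    """Return (true-positive rate, false-positive rate) for binary labels.

    Assumes the inputs contain at least one positive and one negative case.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


def equalized_odds_gaps(y_true, y_pred, group):
    """Largest between-group difference in TPR and in FPR."""
    per_group = {}
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        per_group[g] = rates([y_true[i] for i in idx],
                             [y_pred[i] for i in idx])
    tprs = [r[0] for r in per_group.values()]
    fprs = [r[1] for r in per_group.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Toy data, made up for illustration only.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

tpr_gap, fpr_gap = equalized_odds_gaps(y_true, y_pred, group)
print(tpr_gap, fpr_gap)
```

A classifier satisfies equalized odds exactly when both gaps are zero; in practice a small tolerance would be chosen.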

7.
NPJ Digit Med ; 5(1): 69, 2022 Jun 07.
Article in English | MEDLINE | ID: mdl-35672368

ABSTRACT

As patient health information is highly regulated due to privacy concerns, most machine learning (ML)-based healthcare studies are unable to test on external patient cohorts, resulting in a gap between locally reported model performance and cross-site generalizability. Different approaches have been introduced for developing models across multiple clinical sites; however, less attention has been given to adopting ready-made models in new settings. We introduce three methods to do this: (1) applying a ready-made model "as-is"; (2) readjusting the decision threshold on the model's output using site-specific data; and (3) finetuning the model using site-specific data via transfer learning. Using a case study of COVID-19 diagnosis across four NHS Hospital Trusts, we show that all methods achieve clinically-effective performances (NPV > 0.959), with transfer learning achieving the best results (mean AUROCs between 0.870 and 0.925). Our models demonstrate that site-specific customization improves predictive performance when compared to other ready-made approaches.
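
The second method in the abstract above, readjusting the decision threshold with site-specific data, can be illustrated with a short sketch. It assumes the goal is a target sensitivity on a local calibration set; the abstract does not state the criterion used, and the function names and data here are hypothetical.

```python
import math


def threshold_for_sensitivity(scores, labels, target):
    """Pick a threshold whose sensitivity on the given data is >= target.

    A case is called positive when its score is >= the threshold.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one positive case")
    # Number of positive cases that must score at or above the threshold.
    k = max(math.ceil(target * len(pos)), 1)
    return pos[k - 1]


def sensitivity(scores, labels, threshold):
    """Fraction of positive cases scoring at or above the threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in pos) / len(pos)


# Made-up model outputs for a new site (illustration only).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0]

site_threshold = threshold_for_sensitivity(scores, labels, target=0.8)
print(site_threshold, sensitivity(scores, labels, site_threshold))
```

The model itself is left untouched; only the cut-off applied to its output changes from site to site.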

8.
Lancet Digit Health ; 4(4): e266-e278, 2022 04.
Article in English | MEDLINE | ID: mdl-35279399

ABSTRACT

BACKGROUND: Uncertainty in patients' COVID-19 status contributes to treatment delays, nosocomial transmission, and operational pressures in hospitals. However, the typical turnaround time for laboratory PCR remains 12-24 h and lateral flow devices (LFDs) have limited sensitivity. Previously, we have shown that artificial intelligence-driven triage (CURIAL-1.0) can provide rapid COVID-19 screening using clinical data routinely available within 1 h of arrival to hospital. Here, we aimed to improve the time from arrival to the emergency department to the availability of a result, do external and prospective validation, and deploy a novel laboratory-free screening tool in a UK emergency department. METHODS: We optimised our previous model, removing less informative predictors to improve generalisability and speed, developing the CURIAL-Lab model with vital signs and readily available blood tests (full blood count [FBC]; urea, creatinine, and electrolytes; liver function tests; and C-reactive protein) and the CURIAL-Rapide model with vital signs and FBC alone. Models were validated externally for emergency admissions to University Hospitals Birmingham, Bedfordshire Hospitals, and Portsmouth Hospitals University National Health Service (NHS) trusts, and prospectively at Oxford University Hospitals, by comparison with PCR testing. Next, we compared model performance directly against LFDs and evaluated a combined pathway that triaged patients who had either a positive CURIAL model result or a positive LFD to a COVID-19-suspected clinical area. Lastly, we deployed CURIAL-Rapide alongside an approved point-of-care FBC analyser to provide laboratory-free COVID-19 screening at the John Radcliffe Hospital (Oxford, UK). Our primary improvement outcome was time-to-result, and our performance measures were sensitivity, specificity, positive and negative predictive values, and area under receiver operating characteristic curve (AUROC). 
FINDINGS: 72 223 patients met eligibility criteria across the four validating hospital groups, in a total validation period spanning Dec 1, 2019, to March 31, 2021. CURIAL-Lab and CURIAL-Rapide performed consistently across trusts (AUROC range 0·858-0·881, 95% CI 0·838-0·912, for CURIAL-Lab and 0·836-0·854, 0·814-0·889, for CURIAL-Rapide), achieving highest sensitivity at Portsmouth Hospitals (84·1%, Wilson's 95% CI 82·5-85·7, for CURIAL-Lab and 83·5%, 81·8-85·1, for CURIAL-Rapide) at specificities of 71·3% (70·9-71·8) for CURIAL-Lab and 63·6% (63·1-64·1) for CURIAL-Rapide. When combined with LFDs, model predictions improved triage sensitivity from 56·9% (51·7-62·0) for LFDs alone to 85·6% with CURIAL-Lab (81·6-88·9; AUROC 0·925) and 88·2% with CURIAL-Rapide (84·4-91·1; AUROC 0·919), thereby reducing missed COVID-19 cases by 65% with CURIAL-Lab and 72% with CURIAL-Rapide. For the prospective deployment of CURIAL-Rapide, 520 patients were enrolled for point-of-care FBC analysis between Feb 18 and May 10, 2021, of whom 436 received confirmatory PCR testing and ten (2·3%) tested positive. Median time from arrival to a CURIAL-Rapide result was 45 min (IQR 32-64), 16 min (26·3%) sooner than with LFDs (61 min, 37-99; log-rank p<0·0001), and 6 h 52 min (90·2%) sooner than with PCR (7 h 37 min, 6 h 5 min to 15 h 39 min; p<0·0001). Classification performance was high, with sensitivity of 87·5% (95% CI 52·9-97·8), specificity of 85·4% (81·3-88·7), and negative predictive value of 99·7% (98·2-99·9). CURIAL-Rapide correctly excluded infection for 31 (58·5%) of 53 patients who were triaged by a physician to a COVID-19-suspected area but went on to test negative by PCR. INTERPRETATION: Our findings show the generalisability, performance, and real-world operational benefits of artificial intelligence-driven screening for COVID-19 over standard-of-care in emergency departments. 
CURIAL-Rapide provided rapid, laboratory-free screening when used with near-patient FBC analysis, and was able to reduce the number of patients who tested negative for COVID-19 but were triaged to COVID-19-suspected areas. FUNDING: The Wellcome Trust, University of Oxford Medical and Life Sciences Translational Fund.


Subject(s)
COVID-19 , Triage , Artificial Intelligence , COVID-19/diagnosis , Humans , SARS-CoV-2 , State Medicine
9.
Article in English | MEDLINE | ID: mdl-37015447

ABSTRACT

Early detection of COVID-19 is an ongoing area of research that can help with triage, monitoring and general health assessment of potential patients and may reduce operational strain on hospitals that cope with the coronavirus pandemic. Different machine learning techniques have been used in the literature to detect potential cases of coronavirus using routine clinical data (blood tests, and vital signs measurements). Data breaches and information leakage when using these models can bring reputational damage and cause legal issues for hospitals. In spite of this, protecting healthcare models against leakage of potentially sensitive information is an understudied research area. In this study, two machine learning techniques that aim to predict a patient's COVID-19 status are examined. Using adversarial training, robust deep learning architectures are explored with the aim to protect attributes related to demographic information about the patients. The two models examined in this work are intended to preserve sensitive information against adversarial attacks and information leakage. In a series of experiments using datasets from the Oxford University Hospitals (OUH), Bedfordshire Hospitals NHS Foundation Trust (BH), University Hospitals Birmingham NHS Foundation Trust (UHB), and Portsmouth Hospitals University NHS Trust (PUH), two neural networks are trained and evaluated. These networks predict PCR test results using information from basic laboratory blood tests, and vital signs collected from a patient upon arrival to the hospital. The level of privacy each one of the models can provide is assessed and the efficacy and robustness of the proposed architectures are compared with a relevant baseline. One of the main contributions in this work is the particular focus on the development of effective COVID-19 detection models with built-in mechanisms in order to selectively protect sensitive attributes against adversarial attacks. 
The results on the hold-out test set and external validation confirmed that there was no impact on the generalisability of the model using adversarial learning.

10.
Healthc Technol Lett ; 8(5): 105-117, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34221413

ABSTRACT

COVID-19 is a major, urgent, and ongoing threat to global health. Globally more than 24 million have been infected and the disease has claimed more than a million lives as of November 2020. Predicting which patients will need respiratory support is important to guiding individual patient treatment and also to ensuring sufficient resources are available. The ability of six common Early Warning Scores (EWS) to identify respiratory deterioration defined as the need for advanced respiratory support (high-flow nasal oxygen, continuous positive airways pressure, non-invasive ventilation, intubation) within a prediction window of 24 h is evaluated. It is shown that these scores perform sub-optimally at this specific task. Therefore, an alternative EWS based on the Gradient Boosting Trees (GBT) algorithm is developed that is able to predict deterioration within the next 24 h with a high AUROC of 94% and an accuracy, sensitivity, and specificity of 70%, 96%, and 70%, respectively. The GBT model outperformed the best EWS (LDTEWS:NEWS), increasing the AUROC by 14%. Our GBT model makes the prediction based on the current and baseline measures of routinely available vital signs and blood tests.

11.
Lancet Digit Health ; 3(2): e78-e87, 2021 02.
Article in English | MEDLINE | ID: mdl-33509388

ABSTRACT

BACKGROUND: The early clinical course of COVID-19 can be difficult to distinguish from other illnesses driving presentation to hospital. However, viral-specific PCR testing has limited sensitivity and results can take up to 72 h for operational reasons. We aimed to develop and validate two early-detection models for COVID-19, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs). These data are typically available within the first hour of presentation to hospitals in high-income and middle-income countries, within the existing laboratory infrastructure. METHODS: We trained linear and non-linear machine learning classifiers to distinguish patients with COVID-19 from pre-pandemic controls, using electronic health record data for patients presenting to the emergency department and admitted across a group of four teaching hospitals in Oxfordshire, UK (Oxford University Hospitals). Data extracted included presentation blood tests, blood gas testing, vital signs, and results of PCR testing for respiratory viruses. Adult patients (>18 years) presenting to hospital before Dec 1, 2019 (before the first COVID-19 outbreak), were included in the COVID-19-negative cohort; those presenting to hospital between Dec 1, 2019, and April 19, 2020, with PCR-confirmed severe acute respiratory syndrome coronavirus 2 infection were included in the COVID-19-positive cohort. Patients who were subsequently admitted to hospital were included in their respective COVID-19-negative or COVID-19-positive admissions cohorts. Models were calibrated to sensitivities of 70%, 80%, and 90% during training, and performance was initially assessed on a held-out test set generated by an 80:20 split stratified by patients with COVID-19 and balanced equally with pre-pandemic controls. 
To simulate real-world performance at different stages of an epidemic, we generated test sets with varying prevalences of COVID-19 and assessed predictive values for our models. We prospectively validated our 80% sensitivity models for all patients presenting or admitted to the Oxford University Hospitals between April 20 and May 6, 2020, comparing model predictions with PCR test results. FINDINGS: We assessed 155 689 adult patients presenting to hospital between Dec 1, 2017, and April 19, 2020. 114 957 patients were included in the COVID-negative cohort and 437 in the COVID-positive cohort, for a full study population of 115 394 patients, with 72 310 admitted to hospital. With a sensitive configuration of 80%, our emergency department (ED) model achieved 77·4% sensitivity and 95·7% specificity (area under the receiver operating characteristic curve [AUROC] 0·939) for COVID-19 among all patients attending hospital, and the admissions model achieved 77·4% sensitivity and 94·8% specificity (AUROC 0·940) for the subset of patients admitted to hospital. Both models achieved high negative predictive values (NPV; >98·5%) across a range of prevalences (≤5%). We prospectively validated our models for all patients presenting and admitted to Oxford University Hospitals in a 2-week test period. The ED model (3326 patients) achieved 92·3% accuracy (NPV 97·6%, AUROC 0·881), and the admissions model (1715 patients) achieved 92·5% accuracy (97·7%, 0·871) in comparison with PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improved apparent accuracy (ED model 95·1%, admissions model 94·1%) and NPV (ED model 99·0%, admissions model 98·5%). INTERPRETATION: Our models performed effectively as a screening test for COVID-19, excluding the illness with high-confidence by use of clinical data routinely available within 1 h of presentation to hospital. 
Our approach is rapidly scalable, fitting within the existing laboratory testing infrastructure and standard of care of hospitals in high-income and middle-income countries. FUNDING: Wellcome Trust, University of Oxford, Engineering and Physical Sciences Research Council, National Institute for Health Research Oxford Biomedical Research Centre.
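
The dependence of predictive values on prevalence described in the abstract above follows from Bayes' rule. The sketch below applies the standard formula to the ED model's quoted sensitivity and specificity; it is not the authors' evaluation procedure, which used generated test sets, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' rule for a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity of the ED model as quoted in the abstract.
SENS, SPEC = 0.774, 0.957

for prevalence in (0.01, 0.02, 0.05):
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With these inputs the formula gives an NPV above 0·985 at every prevalence up to 5%, in line with the range reported in the FINDINGS, while the PPV falls as prevalence decreases.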


Subject(s)
Artificial Intelligence , COVID-19 , Hematologic Tests , Mass Screening , Predictive Value of Tests , Triage , Adult , Emergency Service, Hospital , Hospitalization , Hospitals , Humans , Middle Aged , Prospective Studies
12.
Vision Res ; 145: 1-10, 2018 04.
Article in English | MEDLINE | ID: mdl-29608936

ABSTRACT

In human visual processing, information from the visual field passes through numerous transformations before perceptual attributes such as colour are derived. The sequence of transforms involved in constructing perceptions of colour can be approximated by colour appearance models such as the CIE (2002) colour appearance model, abbreviated as CIECAM02. In this study, we test the plausibility of CIECAM02 as a model of colour processing by looking for evidence of its cortical entrainment. The CIECAM02 model predicts that colour is split into two opposing chromatic components, red-green and cyan-yellow (termed CIECAM02-a and CIECAM02-b respectively), and an achromatic component (termed CIECAM02-A). Entrainment of cortical activity to the outputs of these components was estimated using measurements of electro- and magnetoencephalographic (EMEG) activity, recorded while healthy subjects watched videos of dots changing colour. We find entrainment to chromatic component CIECAM02-a at approximately 35 ms latency bilaterally in occipital lobe regions, and entrainment to achromatic component CIECAM02-A at approximately 75 ms latency, also bilaterally in occipital regions. For comparison, transforms from a less physiologically plausible model (CIELAB) were also tested, with no significant entrainment found.


Subject(s)
Color Perception/physiology , Visual Cortex/physiology , Visual Pathways/physiology , Adolescent , Adult , Evoked Potentials, Visual , Female , Humans , Magnetoencephalography , Male , Models, Theoretical , Young Adult
13.
J Comput Neurosci ; 43(1): 1-4, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28643213

ABSTRACT

Describing the human brain in mathematical terms is an important ambition of neuroscience research, yet the challenges remain considerable. It was Alan Turing, writing in 1950, who first sought to demonstrate how time-consuming such an undertaking would be. Through analogy to the computer program, Turing argued that arriving at a complete mathematical description of the mind would take well over a thousand years. In this opinion piece, we argue that - despite seventy years of progress in the field - his arguments remain both prescient and persuasive.


Subject(s)
Brain/physiology , Mathematics , Models, Neurological , Humans