ABSTRACT
INTRODUCTION: Treatment in the intensive care unit (ICU) generates complex data where machine learning (ML) modelling could be beneficial. Using routine hospital data, we evaluated the ability of multiple ML models to predict inpatient mortality in a paediatric population in a low/middle-income country. METHOD: We retrospectively analysed hospital record data from children aged 0-59 months admitted to the ICU of the Dhaka Hospital of the International Centre for Diarrhoeal Disease Research, Bangladesh. Five commonly used ML models (logistic regression, least absolute shrinkage and selection operator, elastic net, gradient boosting trees (GBT) and random forest (RF)) were evaluated using the area under the receiver operating characteristic curve (AUROC). Top predictors were selected using RF mean decrease Gini scores as the feature importance values. RESULTS: Data from 5669 children were used, reduced to 3505 patients (10% died, 90% survived) after removal of records with missing data. The mean patient age was 10.8 months (SD=10.5). The top performing models, based on mean 10-fold cross-validation AUROC on the training data set, were RF and GBT. Hyperparameters were selected using cross-validation and the resulting models were then tested in an unseen test set. The models used demographic, anthropometric, clinical, biochemistry and haematological data for mortality prediction. RF consistently outperformed GBT and predicted mortality with an AUROC of ≥0.87 in the test set when three or more laboratory measurements were included. However, including a fourth laboratory measurement yielded only a very minor predictive gain (AUROC 0.87 vs 0.88). The best predictors were the biochemistry and haematological measurements, with the top predictors being total CO2, potassium, creatinine and total calcium.
CONCLUSIONS: Mortality in children admitted to ICU can be predicted with high accuracy using RF ML models in a real-life data set using multiple laboratory measurements with the most important features primarily coming from patient biochemistry and haematology.
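The abstract above ranks models by AUROC. As a minimal sketch of what that metric measures, the function below computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The labels and scores are hypothetical toy values, not data from the study, and the pairwise loop is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died, 0 = survived; scores are predicted risks.
labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.05, 0.10, 0.80, 0.30, 0.25, 0.15, 0.40, 0.90, 0.20, 0.35]
print(auroc(labels, scores))
```

In a 10-fold cross-validation setting such as the one described, a function like this would be applied to the held-out fold of each split and the ten values averaged.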
Subjects
Machine Learning , Humans , Bangladesh/epidemiology , Infant , Retrospective Studies , Female , Male , Child, Preschool , Infant, Newborn , ROC Curve , Hospital Mortality , Intensive Care Units/statistics & numerical data
ABSTRACT
Research in machine learning (ML) algorithms using natural behavior (i.e., text, audio, and video data) suggests that these techniques could contribute to personalization in psychology and psychiatry. However, a systematic review of the current state of the art is missing. Moreover, individual studies often target ML experts who may overlook potential clinical implications of their findings. In a narrative accessible to mental health professionals, we present a systematic review conducted in 5 psychology and 2 computer science databases. We included 128 studies that assessed the predictive power of ML algorithms using text, audio, and/or video data in the prediction of anxiety and posttraumatic stress disorder. Most studies (n = 87) were aimed at predicting anxiety, while the remainder (n = 41) focused on posttraumatic stress disorder. They were mostly published since 2019 in computer science journals and tested algorithms using text (n = 72) as opposed to audio or video. Studies focused mainly on general populations (n = 92) and less on laboratory experiments (n = 23) or clinical populations (n = 13). Methodological quality varied, as did reported metrics of the predictive power, hampering comparison across studies. Two-thirds of studies, which focused on both disorders, reported acceptable to very good predictive power (including high-quality studies only). The results of 33 studies were uninterpretable, mainly due to missing information. Research into ML algorithms using natural behavior is in its infancy but shows potential to contribute to diagnostics of mental disorders, such as anxiety and posttraumatic stress disorder, in the future if standardization of methods, reporting of results, and research in clinical populations are improved.
Subjects
Machine Learning , Stress Disorders, Post-Traumatic , Humans , Stress Disorders, Post-Traumatic/diagnosis , Stress Disorders, Post-Traumatic/psychology , Anxiety/diagnosis , Anxiety/psychology , Algorithms
ABSTRACT
BACKGROUND: Reinforcement learning (RL) holds great promise for intensive care medicine given the abundant availability of data and frequent sequential decision-making. But despite the emergence of promising algorithms, RL-driven bedside clinical decision support is still far from reality. Major challenges include trust and safety. To help address these issues, we introduce cross off-policy evaluation and policy restriction and show how detailed policy analysis may increase clinical interpretability. As an example, we apply these in the setting of RL to optimise ventilator settings in intubated COVID-19 patients. METHODS: With data from the Dutch ICU Data Warehouse and using an exhaustive hyperparameter grid search, we identified an optimal set of Dueling Double-Deep Q Network RL models. The state space comprised ventilator, medication, and clinical data. The action space focused on positive end-expiratory pressure (PEEP) and fraction of inspired oxygen (FiO2) concentration. We used gas exchange indices as interim rewards, and mortality and state duration as final rewards. We designed a novel evaluation method called cross off-policy evaluation (cross-OPE) to assess the efficacy of models under varying weightings between the interim and terminal reward components. In addition, we implemented policy restriction to prevent potentially hazardous model actions. We introduce delta-Q to compare physician versus policy action quality and in-depth policy inspection using visualisations. RESULTS: We created trajectories for 1118 intensive care unit (ICU) admissions and trained 69,120 models using 8 model architectures with 128 hyperparameter combinations. For each model, policy restrictions were applied. In the first evaluation step, 17,182/138,240 policies had good performance, but cross-OPE revealed suboptimal performance for 44% of those by varying the reward function used for evaluation.
Clinical policy inspection facilitated assessment of action decisions for individual patients, including identification of action space regions that may benefit most from optimisation. CONCLUSION: Cross-OPE can serve as a robust evaluation framework for safe RL model implementation by identifying policies with good generalisability. Policy restriction helps prevent potentially unsafe model recommendations. Finally, the novel delta-Q metric can be used to operationalise RL models in clinical practice. Our findings offer a promising pathway towards application of RL in intensive care medicine and beyond.
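The abstract above does not spell out how policy restriction or delta-Q is computed, so the following is only a sketch under stated assumptions. It assumes policy restriction means masking disallowed actions before the greedy choice, and it assumes delta-Q is the Q-value of the policy's action minus the Q-value of the physician's action in the same state. The function names, Q-values, action indices and safety mask are all hypothetical.

```python
def restricted_greedy_action(q_values, allowed):
    """Pick the highest-Q action among those flagged as allowed.

    ASSUMPTION: policy restriction is modelled as a boolean mask over actions.
    """
    candidates = [a for a, ok in enumerate(allowed) if ok]
    if not candidates:
        raise ValueError("no allowed action in this state")
    return max(candidates, key=lambda a: q_values[a])


def delta_q(q_values, policy_action, physician_action):
    """ASSUMED definition: Q(state, policy action) - Q(state, physician action)."""
    return q_values[policy_action] - q_values[physician_action]


# Hypothetical Q-values for four discretised (PEEP, FiO2) actions in one state.
q_values = [0.20, 0.55, 0.90, 0.40]
allowed = [True, True, False, True]  # action 2 blocked by a hypothetical safety rule
physician_action = 0                 # hypothetical action the physician actually took

policy_action = restricted_greedy_action(q_values, allowed)
print(policy_action, delta_q(q_values, policy_action, physician_action))
```

Under these assumptions, a positive delta-Q would mean the model values its own recommendation above the physician's choice; the unrestricted optimum (action 2 here) is never recommended because the mask removes it first.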
ABSTRACT
INTRODUCTION: Temporal data poses numerous challenges for deep learning, such as irregularity of sampling. New algorithms are being developed that can handle these temporal challenges better. However, it is unclear how performance varies from classical non-temporal models to newly developed algorithms. Therefore, this study compares different non-temporal and temporal algorithms for a relevant use case, the prediction of atrial fibrillation (AF) using general practitioner (GP) data. METHODS: Three datasets with a 365-day observation window and prediction windows of 14, 180 and 360 days were used. Data consisted of medication, lab, symptom, and chronic disease codings registered by the GP. The benchmark discarded temporality and used logistic regression, XGBoost models and neural networks on the presence of codings over the whole year. The pattern approach extracted common patterns of GP codings, which were tested using the same algorithms. LSTM and CKConv models were trained as models incorporating temporality. RESULTS: Algorithms which incorporated temporality (LSTM and CKConv; max AUC 0.734 at the 360-day prediction window) outperformed both benchmark and pattern algorithms (max AUC 0.723), with a significant improvement using the 360-day prediction window (p = 0.04). The difference between the benchmark and the LSTM or CKConv algorithm decreased with smaller prediction windows, indicating temporal importance for longer prediction windows. The CKConv and LSTM algorithms performed similarly, possibly due to limited sequence length. CONCLUSION: Temporal models outperformed non-temporal models for the prediction of AF. Among temporal models, CKConv is a promising algorithm for GP data because it can handle irregularly sampled data.
Subjects
Atrial Fibrillation , General Practitioners , Humans , Atrial Fibrillation/diagnosis , Neural Networks, Computer , Algorithms , Logistic Models
ABSTRACT
Reinforcement Learning (RL) has recently found many applications in the healthcare domain thanks to its natural fit to clinical decision-making and ability to learn optimal decisions from observational data. A key challenge in adopting RL-based solutions in clinical practice, however, is the inclusion of existing knowledge in learning a suitable solution. Existing knowledge from, for example, medical guidelines may improve the safety of solutions, produce a better balance between short- and long-term outcomes for patients and increase trust and adoption by clinicians. We present a framework for including knowledge available from medical guidelines in RL. The framework includes components for enforcing safety constraints and an approach that alters the learning signal to better balance short- and long-term outcomes based on these guidelines. We evaluate the framework by extending an existing RL-based mechanical ventilation (MV) approach with clinically established ventilation guidelines. Results from off-policy policy evaluation indicate that our approach has the potential to decrease 90-day mortality while ensuring lung protective ventilation. This framework provides an important stepping stone towards implementations of RL in clinical practice and opens up several avenues for further research.
Subjects
Learning , Respiration, Artificial , Humans , Reinforcement, Psychology , Critical Care , Clinical Decision-Making
ABSTRACT
OBJECTIVE: Reinforcement learning (RL) is a machine learning technique uniquely effective at sequential decision-making, which makes it potentially relevant to ICU treatment challenges. We set out to systematically review, assess level-of-readiness and meta-analyze the effect of RL on outcomes for critically ill patients. DATA SOURCES: A systematic search was performed in PubMed, Embase.com, Clarivate Analytics/Web of Science Core Collection, Elsevier/SCOPUS and the Institute of Electrical and Electronics Engineers Xplore Digital Library from inception to March 25, 2022, with subsequent citation tracking. DATA EXTRACTION: Journal articles that used an RL technique in an ICU population and reported on patient health-related outcomes were included for full analysis. Conference papers were included for level-of-readiness assessment only. Descriptive statistics, characteristics of the models, outcome compared with clinician's policy and level-of-readiness were collected. RL-health risk of bias and applicability assessment was performed. DATA SYNTHESIS: A total of 1,033 articles were screened, of which 18 journal articles and 18 conference papers were included. Thirty of those were prototyping or modeling articles and six were validation articles. All articles reported RL algorithms to outperform clinical decision-making by ICU professionals, but only in retrospective data. The modeling techniques for the state-space, action-space, reward function, RL model training, and evaluation varied widely. The risk of bias was high in all articles, mainly due to the evaluation procedure. CONCLUSION: In this first systematic review on the application of RL in intensive care medicine, we found no studies that demonstrated improved patient outcomes from RL-based technologies. All studies reported that RL-agent policies outperformed clinician policies, but such assessments were all based on retrospective off-policy evaluation.
Subjects
Critical Care , Critical Illness , Humans , Critical Illness/therapy , Retrospective Studies
ABSTRACT
PURPOSE: Non-pharmacological interventions (NPIs) play an important role in the management of older people receiving homecare. However, little is known about how often specific NPIs are being used and to what extent usage varies between countries. The aim of the current study was to investigate the prevalence of NPIs in older homecare recipients in six European countries. METHODS: This is a cross-sectional study of older homecare recipients (65+) using baseline data from the longitudinal cohort study 'Identifying best practices for care-dependent elderly by Benchmarking Costs and outcomes of community care' (IBenC). The analyzed NPIs are based on the interRAI Home Care instrument, a comprehensive geriatric assessment instrument. The prevalence of 24 NPIs was analyzed in Belgium, Germany, Finland, Iceland, Italy and the Netherlands. NPIs from seven groups were considered: psychosocial interventions, physical activity, regular care interventions, special therapies, preventive measures, special aids and environmental interventions. RESULTS: A total of 2884 homecare recipients were included. The mean age at baseline was 82.9 years and of all participants, 66.9% were female. The intervention with the highest prevalence in the study sample was 'emergency assistance available' (74%). Two other highly prevalent interventions were 'physical activity' (69%) and 'home nurse' (62%). Large differences between countries in the use of NPIs were observed and included, for example, 'going outside' (range 7-82%), 'home health aids' (range 12-93%), and 'physician visit' (range 24-94%). CONCLUSIONS: The use of NPIs varied considerably between homecare users in different European countries. It is important to better understand the barriers and facilitators of use of these potentially beneficial interventions in order to design successful uptake strategies.
Subjects
Longitudinal Studies , Humans , Female , Aged , Male , Prevalence , Cross-Sectional Studies , Europe/epidemiology , Cohort Studies
ABSTRACT
Objective: Congenital hypothyroidism (CH) is an inborn thyroid hormone (TH) deficiency mostly caused by thyroidal (primary CH) or hypothalamic/pituitary (central CH) disturbances. Most CH newborn screening (NBS) programs are thyroid-stimulating-hormone (TSH) based, thereby only detecting primary CH. The Dutch NBS is based on measuring total thyroxine (T4) from dried blood spots, aiming to detect primary and central CH at the cost of more false-positive referrals (FPRs) (positive predictive value (PPV) of 21% in 2007-2017). Using a machine learning-based model on an adjusted dataset based on the Dutch CH NBS yielded an artificial PPV of 26%. Recently, amino acids (AAs) and acylcarnitines (ACs) have been shown to be associated with TH concentration. We therefore aimed to investigate whether AAs and ACs measured during NBS can contribute to better performance of the CH screening in the Netherlands by using a revised machine learning-based model. Methods: Dutch NBS data between 2007 and 2017 (CH screening results, AAs and ACs) from 1079 FPRs, 515 newborns with primary (431) and central CH (84) and data from 1842 healthy controls were used. A random forest model including these data was developed. Results: The random forest model with an artificial sensitivity of 100% yielded a PPV of 48% and AUROC of 0.99. Besides T4 and TSH, tyrosine and succinylacetone were the main parameters contributing to the model's performance. Conclusions: The PPV improved significantly (from 26% to 48%) by adding several AAs and ACs to our machine learning-based model, suggesting that adding these parameters benefits the current algorithm.
Subjects
Congenital Hypothyroidism , Infant, Newborn , Humans , Congenital Hypothyroidism/diagnosis , Neonatal Screening/methods , Amino Acids , Thyrotropin
ABSTRACT
OBJECTIVE: This work explores the perceptions of obstetrical clinicians about artificial intelligence (AI) in order to bridge the gap in uptake of AI between research and medical practice. Identifying potential areas where AI can contribute to clinical practice enables AI research to align with the needs of clinicians and ultimately patients. DESIGN: Qualitative interview study. SETTING: A national study conducted in the Netherlands between November 2022 and February 2023. PARTICIPANTS: Dutch clinicians working in obstetrics with varying relevant work experience, gender and age. ANALYSIS: Thematic analysis of qualitative interview transcripts. RESULTS: Thirteen gynaecologists were interviewed about hypothetical scenarios of an implemented AI model. Thematic analysis identified two major themes: perceived usefulness and trust. Usefulness involved AI extending human brain capacity in complex pattern recognition and information processing, reducing contextual influence and saving time. Trust required validation, explainability and successful personal experience. This result shows two paradoxes. First, AI is expected to provide added value by surpassing human capabilities, yet participants also expressed a need to understand the parameters and their influence on predictions before they would trust and adopt it. Second, participants recognised the value of incorporating numerous parameters into a model, but they also believed that certain contextual factors should only be considered by humans, as it would be undesirable for AI models to use that information. CONCLUSIONS: Obstetricians' opinions on the potential value of AI highlight the need for clinician-AI researcher collaboration. Trust can be built through conventional means like randomised controlled trials and guidelines. Holistic impact metrics, such as changes in workflow, not just clinical outcomes, should guide AI model development. Further research is needed for evaluating evolving AI systems beyond traditional validation methods.
Subjects
Artificial Intelligence , Obstetrics , Female , Pregnancy , Humans , Health Personnel , Obstetricians , Delivery of Health Care
ABSTRACT
INTRODUCTION: With the advent of artificial intelligence, the secondary use of routinely collected medical data from electronic healthcare records (EHR) has become increasingly popular. However, different EHR systems typically use different names for the same medical concepts. This obviously hampers scalable model development and subsequent clinical implementation for decision support. Therefore, converting original parameter names to a so-called ontology, a standardized set of predefined concepts, is necessary but time-consuming and labor-intensive. We therefore propose an augmented intelligence approach to facilitate ontology alignment by predicting correct concepts based on parameter names from raw electronic health record data exports. METHODS: We used the manually mapped parameter names from the multicenter "Dutch ICU data warehouse against COVID-19" sourced from three types of EHR systems to train machine learning models for concept mapping. Data from 29 intensive care units on 38,824 parameters mapped to 1,679 relevant and unique concepts and 38,069 parameters labeled as irrelevant were used for model development and validation. We used the Natural Language Toolkit (NLTK) to preprocess the parameter names based on WordNet cognitive synonyms transformed by term-frequency inverse document frequency (TF-IDF), yielding numeric features. We then trained linear classifiers using stochastic gradient descent for multi-class prediction. Finally, we fine-tuned these predictions using information on distributions of the data associated with each parameter name through similarity score and skewness comparisons. RESULTS: The initial model, trained using data from one hospital organization for each of three EHR systems, scored an overall top 1 precision of 0.744, recall of 0.771, and F1-score of 0.737 on a total of 58,804 parameters. 
Leave-one-hospital-out analysis returned an average top 1 recall of 0.680 for relevant parameters, which increased to 0.905 for the top 5 predictions. When reducing the training dataset to only include relevant parameters, top 1 recall was 0.811 and top 5 recall was 0.914 for relevant parameters. Performance improvement based on similarity score or skewness comparisons affected at most 5.23% of numeric parameters. CONCLUSION: Augmented intelligence is a promising method to improve concept mapping of parameter names from raw electronic health record data exports. We propose a robust method for mapping data across various domains, facilitating the integration of diverse data sources. However, recall is not perfect, and therefore manual validation of mapping remains essential.
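The abstract above reports top 1 and top 5 recall for concept mapping. As a minimal sketch of how such a figure can be computed, the function below counts a parameter as correctly mapped when its true concept appears among the first k ranked predictions. This is an assumed, simple hit-rate reading of "top k recall"; the study may have averaged recall per concept instead. All parameter and concept names are hypothetical.

```python
def top_k_recall(true_concepts, ranked_predictions, k):
    """Fraction of parameters whose true concept is among the first k predictions.

    ASSUMPTION: recall is computed as an overall hit rate, not averaged per concept.
    """
    if len(true_concepts) != len(ranked_predictions):
        raise ValueError("one ranked prediction list is needed per parameter")
    if not true_concepts:
        raise ValueError("no parameters to evaluate")
    hits = sum(
        1
        for truth, ranked in zip(true_concepts, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_concepts)


# Hypothetical manually mapped concepts and model rankings (best guess first).
true_concepts = ["heart_rate", "peep", "fio2", "creatinine"]
ranked_predictions = [
    ["heart_rate", "pulse", "resp_rate"],
    ["pressure_support", "peep", "fio2"],
    ["spo2", "pao2", "fio2"],
    ["urea", "sodium", "potassium"],
]
print(top_k_recall(true_concepts, ranked_predictions, 1))
print(top_k_recall(true_concepts, ranked_predictions, 3))
```

A large gap between top 1 and top k recall, as reported in the abstract, suggests that presenting a short ranked list to a human reviewer captures most correct mappings even when the first guess is wrong.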
ABSTRACT
INTRODUCTION: In ageing societies, the number of older adults with complex chronic conditions (CCCs) is rapidly increasing. Care for older persons with CCCs is challenging, due to interactions between multiple conditions and their treatments. In home care and nursing homes, where most older persons with CCCs receive care, professionals often lack appropriate decision support suitable and sufficient to address the medical and functional complexity of persons with CCCs. This EU-funded project aims to develop decision support systems using high-quality, internationally standardised, routine care data to support better prognostication of health trajectories and treatment impact among older persons with CCCs. METHODS AND ANALYSIS: Real-world data from older persons aged ≥60 years in home care and nursing homes, based on routinely performed comprehensive geriatric assessments using interRAI systems collected in the past 20 years, will be linked with administrative repositories on mortality and care use. These include potentially up to 51 million care recipients from eight countries: Italy, the Netherlands, Finland, Belgium, Canada, USA, Hong Kong and New Zealand. Prognostic algorithms will be developed and validated to better predict various health outcomes. In addition, the modifying impact of pharmacological and non-pharmacological interventions will be examined. A variety of analytical methods will be used, including techniques from the field of artificial intelligence such as machine learning. Based on the results, decision support tools will be developed and pilot tested among health professionals working in home care and nursing homes. ETHICS AND DISSEMINATION: The study was approved by authorised medical ethical committees in each of the participating countries, and will comply with both local and EU legislation. 
Study findings will be shared with relevant stakeholders, including publications in peer-reviewed journals and presentations at national and international meetings.
Subjects
Artificial Intelligence , Home Care Services , Humans , Aged , Aged, 80 and over , Aging , Algorithms , Chronic Disease , Observational Studies as Topic
ABSTRACT
OBJECTIVE: The Dutch Congenital hypothyroidism (CH) Newborn Screening (NBS) algorithm for thyroidal and central congenital hypothyroidism (CH-T and CH-C, respectively) is primarily based on determination of thyroxine (T4) concentrations in dried blood spots, followed by thyroid-stimulating hormone (TSH) and thyroxine-binding globulin (TBG) measurements enabling detection of both CH-T and CH-C, with a positive predictive value (PPV) of 21%. A calculated T4/TBG ratio serves as an indirect measure for free T4. The aim of this study is to investigate whether machine learning techniques can help to improve the PPV of the algorithm without missing the positive cases that should have been detected with the current algorithm. DESIGN & METHODS: NBS data and parameters of CH patients and false-positive referrals in the period 2007-2017 and of a healthy reference population were included in the study. A random forest model was trained and tested using a stratified split and improved using synthetic minority oversampling technique (SMOTE). NBS data of 4668 newborns were included, containing 458 CH-T and 82 CH-C patients, 2332 false-positive referrals and 1670 healthy newborns. RESULTS: Variables determining identification of CH were (in order of importance) TSH, T4/TBG ratio, gestational age, TBG, T4 and age at NBS sampling. In a Receiver-Operating Characteristic (ROC) analysis on the test set, current sensitivity could be maintained, while increasing the PPV to 26%. CONCLUSIONS: Machine learning techniques have the potential to improve the PPV of the Dutch CH NBS. However, improved detection of currently missed cases is only possible with new, better predictors of especially CH-C and a better registration and inclusion of these cases in future models.
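The abstract above describes raising the PPV while maintaining current sensitivity. A minimal sketch of that trade-off, assuming "maintained sensitivity" means every labelled case must remain flagged: take the lowest score among true cases as the referral threshold and compute the PPV of everything at or above it. The labels and scores are hypothetical and unrelated to the study data.

```python
def ppv_at_full_sensitivity(labels, scores):
    """Return (threshold, PPV) for the highest threshold that still flags every true case.

    ASSUMPTION: a newborn is referred when its model score is >= threshold.
    """
    positive_scores = [s for y, s in zip(labels, scores) if y == 1]
    if not positive_scores:
        raise ValueError("need at least one true case")
    threshold = min(positive_scores)
    flagged = [y for y, s in zip(labels, scores) if s >= threshold]
    true_positives = sum(flagged)
    return threshold, true_positives / len(flagged)


# Hypothetical toy data: 1 = confirmed CH, 0 = no CH; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2, 0.1]
threshold, ppv = ppv_at_full_sensitivity(labels, scores)
print(threshold, ppv)
```

In this toy example the threshold is set by the weakest-scoring true case, so the PPV is limited by how many non-cases score above that case. This is one way to see why better predictors for hard-to-detect cases matter.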
Subjects
Congenital Hypothyroidism , Machine Learning , Neonatal Screening , Random Forest , Humans , Congenital Hypothyroidism/diagnosis , Thyroxine/analysis , Glycoprotein Hormones, alpha Subunit/analysis , Thyroxine-Binding Globulin/analysis , False Positive Reactions , Algorithms , Gestational Age , Infant, Newborn
ABSTRACT
BACKGROUND: Identification of clinical phenotypes in critically ill COVID-19 patients could improve understanding of the disease heterogeneity and enable prognostic and predictive enrichment. However, previous attempts did not take into account temporal dynamics with high granularity. By including the dimension of time, we aim to gain further insights into the heterogeneity of COVID-19. METHODS: We used granular data from 3202 adult COVID-19 patients in the Dutch Data Warehouse who were admitted to one of 25 Dutch ICUs between February 2020 and March 2021. Parameters including demographics, clinical observations, medications, laboratory values, vital signs, and data from life support devices were selected. Twenty-one datasets were created that each covered 24 h of ICU data for each day of ICU treatment. Clinical phenotypes in each dataset were identified by performing cluster analyses. Both evolution of the clinical phenotypes over time and patient allocation to these clusters over time were tracked. RESULTS: The final patient cohort consisted of 2438 COVID-19 patients with an ICU mortality outcome. Forty-one parameters were chosen for cluster analysis. On admission, both a mild and a severe clinical phenotype were found. After day 4, the severe phenotype split into an intermediate and a severe phenotype for 11 consecutive days. Heterogeneity between phenotypes appears to be driven by inflammation and dead space ventilation. During the 21-day period, only 8.2% and 4.6% of patients in the initial mild and severe clusters, respectively, remained assigned to the same phenotype. The clinical phenotype half-life was between 5 and 6 days for the mild and severe phenotypes, and about 3 days for the intermediate phenotype. CONCLUSIONS: Patients typically do not remain in the same cluster throughout intensive care treatment. This may have important implications for prognostic or predictive enrichment.
Prominent dissimilarities between clinical phenotypes are predominantly driven by inflammation and dead space ventilation.
Subjects
COVID-19 , Humans , COVID-19/therapy , SARS-CoV-2 , Unsupervised Machine Learning , Critical Care , Intensive Care Units , Inflammation , Phenotype , Critical Illness/therapy
ABSTRACT
BACKGROUND: For mechanically ventilated critically ill COVID-19 patients, prone positioning has quickly become an important treatment strategy. However, prone positioning is labor intensive and comes with potential adverse effects. Therefore, identifying which critically ill intubated COVID-19 patients will benefit may help allocate labor resources. METHODS: From the multi-center Dutch Data Warehouse of COVID-19 ICU patients from 25 hospitals, we selected all 3619 episodes of prone positioning in 1142 invasively mechanically ventilated patients. We excluded episodes longer than 24 h. Berlin ARDS criteria were not formally documented. We used the supervised machine learning algorithms Logistic Regression, Random Forest, Naive Bayes, K-Nearest Neighbors, Support Vector Machine and Extreme Gradient Boosting on readily available and clinically relevant features to predict success of prone positioning after 4 h (window of 1 to 7 h) based on various possible outcomes. These outcomes were defined as improvements of at least 10% in PaO2/FiO2 ratio, ventilatory ratio, respiratory system compliance, or mechanical power. Separate models were created for each of these outcomes. Re-supination within 4 h after pronation was labeled as failure. We also developed models using a 20 mmHg improvement cut-off for PaO2/FiO2 ratio and using a combined outcome parameter. For all models, we evaluated feature importance expressed as contribution to predictive performance based on their relative ranking. RESULTS: The median duration of prone episodes was 17 h (IQR 11-20, N = 2632). Despite extensive modeling using a plethora of machine learning techniques and a large number of potentially clinically relevant features, discrimination between responders and non-responders remained poor, with an area under the receiver operating characteristic curve of 0.62 for PaO2/FiO2 ratio using Logistic Regression, Random Forest and XGBoost.
Feature importance was inconsistent between models for different outcomes. Notably, not even being a previous responder to prone positioning, or PEEP-levels before prone positioning, provided any meaningful contribution to predicting a successful next proning episode. CONCLUSIONS: In mechanically ventilated COVID-19 patients, predicting the success of prone positioning using clinically relevant and readily available parameters from electronic health records is currently not feasible. Given the current evidence base, a liberal approach to proning in all patients with severe COVID-19 ARDS is therefore justified and in particular regardless of previous results of proning.
ABSTRACT
OBJECTIVES: Heart failure (HF) is a commonly occurring health problem with high mortality and morbidity. If potential cases could be detected earlier, it may be possible to intervene earlier, which may slow progression in some patients. Preferably, already measured data, such as general practitioner (GP) data, would be reused to screen all persons in an age group. Furthermore, it is essential to evaluate the number of people needed to screen to find one patient using true incidence rates, as this indicates the generalisability in the true population. Therefore, we aim to create a machine learning model for the prediction of HF using GP data and evaluate the number needed to screen with true incidence rates. DESIGN, SETTINGS AND PARTICIPANTS: GP data from 8543 patients (window from 2 years to 1 year before diagnosis) and controls aged 70+ years were obtained retrospectively from 01 January 2012 to 31 December 2019 from the Nivel Primary Care Database. Codes about chronic illness, complaints, diagnostics and medication were obtained. Data were split into a train and a test set. Datasets describing demographics, the presence of codes (non-sequential) and codes following one another (sequential) were created. Logistic regression, random forest and XGBoost models were trained. The predicted outcome was the presence of HF after 1 year. The case:control ratio in the test set matched true incidence rates (1:45). RESULTS: Demographics alone yielded moderate performance (area under the curve (AUC) 0.692, CI 0.677 to 0.706). Adding non-sequential information combined with a logistic regression model performed best and significantly improved performance (AUC 0.772, CI 0.759 to 0.785, p<0.001). Further adding sequential information did not alter performance significantly (AUC 0.767, CI 0.754 to 0.780, p=0.07). The number needed to screen dropped from 14.11 to 5.99 false positives per true positive.
CONCLUSION: This study created a model able to identify patients with pending HF a year before diagnosis.
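The false-positives-per-true-positive figure reported above depends on the case:control ratio as well as on the classifier itself. The following is a minimal Python sketch of that arithmetic, not the study's own code. The function name and the sensitivity and specificity values are hypothetical and chosen only for illustration; only the 1:45 ratio comes from the abstract.

```python
def false_positives_per_true_positive(sensitivity, specificity, controls_per_case):
    """Expected false positives for each true positive at one decision threshold.

    Assumes one case for every `controls_per_case` controls in the screened group.
    """
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must be in [0, 1]")
    if controls_per_case < 0:
        raise ValueError("controls_per_case must be non-negative")
    # Per single case: the expected fraction of that case that is flagged.
    true_positives = sensitivity
    # Per single case: the expected number of wrongly flagged controls.
    false_positives = (1.0 - specificity) * controls_per_case
    return false_positives / true_positives


# Hypothetical operating point (NOT from the study) at the abstract's 1:45 ratio.
example = false_positives_per_true_positive(0.50, 0.90, 45)
print(round(example, 2))
```

With these illustrative inputs, a threshold that flags half of the cases and 10% of the controls yields about nine false positives per true positive, which shows why the ratio in the test set has to reflect true incidence for the figure to be meaningful.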
Subjects
General Practitioners , Heart Failure , Algorithms , Case-Control Studies , Heart Failure/diagnosis , Heart Failure/epidemiology , Humans , Machine Learning , Retrospective Studies
ABSTRACT
BACKGROUND AND OBJECTIVES: Outcome prediction of preterm birth is important for neonatal care, yet prediction performance using conventional statistical models remains insufficient. Machine learning has a high potential for complex outcome prediction. In this scoping review, we provide an overview of the current applications of machine learning models in the prediction of neurodevelopmental outcomes in preterm infants, assess the quality of the developed models, and provide guidance for future application of machine learning models to predict neurodevelopmental outcomes of preterm infants. METHODS: A systematic search was performed using PubMed. Studies were included if they reported on neurodevelopmental outcome prediction in preterm infants using predictors from the neonatal period and applying machine learning techniques. Data extraction and quality assessment were independently performed by 2 reviewers. RESULTS: Fourteen studies were included, focusing mainly on very or extreme preterm infants, predicting neurodevelopmental outcome before age 3 years, and mostly assessing outcomes using the Bayley Scales of Infant Development. Predictors were most often based on MRI. The most prevalent machine learning techniques included linear regression and neural networks. None of the studies met all newly developed quality assessment criteria. Studies least prone to inflated performance showed promising results, with areas under the curve up to 0.86 for classification and R² values up to 91% in continuous prediction. A limitation was that only 1 data source was used for the literature search. CONCLUSIONS: Studies least prone to inflated prediction results are the most promising. The provided evaluation framework may contribute to improved quality of future machine learning models.
Subjects
Infant, Premature , Premature Birth , Child , Child, Preschool , Female , Humans , Infant , Infant, Newborn , Machine Learning , Magnetic Resonance Imaging
ABSTRACT
In ever more pressured health-care systems, technological solutions offering scalability of care and better resource targeting are appealing. Research on machine learning as a technique for identifying individuals at risk of suicidal ideation, suicide attempts, and death has grown rapidly. This research often places great emphasis on the promise of machine learning for preventing suicide, but overlooks the practical, clinical implementation issues that might preclude delivering on such a promise. In this Review, we synthesise the broad empirical and review literature on electronic health record-based machine learning in suicide research, and focus on matters of crucial importance for implementation of machine learning in clinical practice. The challenge of preventing statistically rare outcomes is well known; progress requires tackling data quality, transparency, and ethical issues. In the future, machine learning models might be explored as methods to enable targeting of interventions to specific individuals depending upon their level of need (ie, for precision medicine). Primarily, however, the promise of machine learning for suicide prevention is limited by the scarcity of high-quality scalable interventions available to individuals identified by machine learning as being at risk of suicide.
Subjects
Machine Learning , Suicide, Attempted/prevention & control , Decision Support Techniques , Humans , Research Design , Suicidal Ideation
ABSTRACT
In population pharmacokinetic (PK) models, interindividual variability is explained by implementation of covariates in the model. The widely used forward stepwise selection method is sensitive to bias, which may lead to an incorrect inclusion of covariates. Alternatives, such as the full fixed effects model, reduce this bias but are dependent on the chosen implementation of each covariate. As the correct functional forms are unknown, this may still lead to an inaccurate selection of covariates. Machine learning (ML) techniques can potentially be used to learn the optimal functional forms for implementing covariates directly from data. A recent study suggested that using ML resulted in an improved selection of influential covariates. However, how do we select the appropriate functional form for including these covariates? In this work, we use SHapley Additive exPlanations (SHAP) to infer the relationship between covariates and PK parameters from ML models. As a case study, we use data from 119 patients with hemophilia A receiving clotting factor VIII concentrate peri-operatively. We fit both a random forest and an XGBoost model to predict empirical Bayes estimated clearance and central volume from a base nonlinear mixed effects model. Next, we show that SHAP reveals covariate relationships which match previous findings. In addition, we can reveal subtle effects arising from combinations of covariates that are difficult to obtain using other methods of covariate analysis. We conclude that the proposed method can be used to extend ML-based covariate selection, and holds potential as a complete full model alternative to classical covariate analyses.
Subjects
Factor VIII , Hemophilia A , Humans , Bayes Theorem , Hemophilia A/drug therapy , Kinetics , Machine Learning
ABSTRACT
Unexpected ICU readmission is associated with longer length of stay and increased mortality. To prevent ICU readmission and death after ICU discharge, our team of intensivists and data scientists aimed to use AmsterdamUMCdb to develop an explainable machine learning-based real-time bedside decision support tool. DERIVATION COHORT: Data from patients admitted to a mixed surgical-medical academic medical center ICU from 2004 to 2016. VALIDATION COHORT: Data from 2016 to 2019 from the same center. PREDICTION MODEL: Patient characteristics, clinical observations, physiologic measurements, laboratory studies, and treatment data were considered as model features. Different supervised learning algorithms were trained to predict ICU readmission and/or death, both within 7 days from ICU discharge, using 10-fold cross-validation. Feature importance was determined using SHapley Additive exPlanations, and readmission probability-time curves were constructed to identify subgroups. Explainability was established by presenting individualized risk trends and feature importance. RESULTS: Our final derivation dataset included 14,105 admissions. The combined readmission/mortality rate within 7 days of ICU discharge was 5.3%. Using Gradient Boosting, the model achieved an area under the receiver operating characteristic curve of 0.78 (95% CI, 0.75-0.81) and an area under the precision-recall curve of 0.19 on the validation cohort (n = 3,929). The most predictive features included common physiologic parameters but also less apparent variables like nutritional support. At a 6% risk threshold, the model showed a sensitivity (recall) of 0.72, specificity of 0.70, and a positive predictive value (precision) of 0.15. Impact analysis using probability-time curves and the 6% risk threshold identified specific patient groups at risk and the potential of a change in discharge management to reduce relative risk by 14%. 
CONCLUSIONS: We developed an explainable machine learning model that may aid in identifying patients at high risk for readmission and mortality after ICU discharge using the first freely available European critical care database, AmsterdamUMCdb. Impact analysis showed that a relative risk reduction of 14% could be achievable, which might have significant impact on patients and society. ICU data sharing facilitates collaboration between intensivists and data scientists to accelerate model development.
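The precision reported at the 6% risk threshold is tied to how rare the outcome is. As a hedged illustration, the Python sketch below applies Bayes' rule to the sensitivity (0.72) and specificity (0.70) quoted in the abstract. The function name and the event rates in the loop are assumptions made for illustration, and the sketch does not attempt to reproduce the observed precision of 0.15, which was measured on the validation cohort.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Precision implied by sensitivity, specificity and event rate (Bayes' rule)."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    flagged = true_positive_rate + false_positive_rate
    if flagged == 0.0:
        raise ValueError("no patients would be flagged at this operating point")
    return true_positive_rate / flagged


# Sensitivity and specificity are taken from the abstract; the event rates
# below are illustrative assumptions, not cohort statistics.
for assumed_event_rate in (0.02, 0.05, 0.10):
    ppv = positive_predictive_value(0.72, 0.70, assumed_event_rate)
    print(f"assumed event rate {assumed_event_rate:.2f} -> PPV {ppv:.3f}")
```

The loop shows that, at a fixed sensitivity and specificity, precision rises with the event rate, so a low precision for a rare outcome such as readmission or death within 7 days does not by itself indicate a poor model.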
ABSTRACT
INTRODUCTION: Early recognition of individuals with increased risk of sudden cardiac arrest (SCA) remains challenging. SCA research so far has used data from cardiologist care, but has missed most SCA victims, since they were only in general practitioner (GP) care prior to SCA. Studying individuals with type 2 diabetes (T2D) in GP care may help solve this problem, as they have an increased risk of SCA and rich clinical datasets, since they regularly visit their GP for check-up measurements. This information can be further enriched with extensive genetic and metabolic information. AIM: To describe the study protocol of the REcognition of Sudden Cardiac arrest vUlnErability in Diabetes (RESCUED) project, which aims to identify clinical, genetic and metabolic factors contributing to SCA risk in individuals with T2D, and to develop a prognostic model for the risk of SCA. METHODS: The RESCUED project combines data from dedicated SCA and T2D cohorts, and GP data, from the same region in the Netherlands. Clinical data, genetic data (common and rare variant analysis) and metabolic data (metabolomics) will be analysed (using classical analysis techniques and machine learning methods) and combined into a prognostic model for risk of SCA. CONCLUSION: The RESCUED project is designed to improve our ability to recognise elevated SCA risk early through an innovative strategy of focusing on GP data and a multidimensional methodology including clinical, genetic and metabolic analyses.