Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 8 de 8
Filtrar
1.
BMC Med Inform Decis Mak ; 24(1): 62, 2024 Mar 04.
Artículo en Inglés | MEDLINE | ID: mdl-38438861

RESUMEN

BACKGROUND: Variation in laboratory healthcare data due to seasonal changes is a widely accepted phenomenon. Seasonal variation is generally not systematically accounted for in healthcare settings. This study applies a newly developed adjustment method for seasonal variation to analyze the effect seasonality has on machine learning model classification of diagnoses. METHODS: Machine learning methods were trained and tested on ~ 22 million unique records from ~ 575,000 unique patients admitted to Danish hospitals. Four machine learning models (adaBoost, decision tree, neural net, and random forest) classifying 35 diseases of the circulatory system (ICD-10 diagnosis codes, chapter IX) were run before and after seasonal adjustment of 23 laboratory reference intervals (RIs). The effect of the adjustment was benchmarked via its contribution to machine learning models trained using hyperparameter optimization and assessed quantitatively using performance metrics (AUROC and AUPRC). RESULTS: Seasonally adjusted RIs significantly improved cardiovascular disease classification in 24 of the 35 tested cases when using neural net models. Features with the highest average feature importance (via SHAP explainability) across all disease models were sex, C- reactive protein, and estimated glomerular filtration. Classification of diseases of the vessels, such as thrombotic diseases and other atherosclerotic diseases consistently improved after seasonal adjustment. CONCLUSIONS: As data volumes increase and data-driven methods are becoming more advanced, it is essential to improve data quality at the pre-processing level. This study presents a method that makes it feasible to introduce seasonally adjusted RIs into the clinical research space in any disease domain. Seasonally adjusted RIs generally improve diagnoses classification and thus, ought to be considered and adjusted for in clinical decision support methods.


Asunto(s)
Enfermedades Cardiovasculares , Humanos , Enfermedades Cardiovasculares/diagnóstico , Laboratorios , Instituciones de Salud , Exactitud de los Datos , Aprendizaje Automático
2.
Lancet Digit Health ; 6(6): e396-e406, 2024 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-38789140

RESUMEN

BACKGROUND: Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Although a range of cancer risk models exists, the utility on a population level for risk stratification across cancer types has not been fully explored. We aimed to close this gap by evaluating pan-cancer risk models built on electronic health records across the Danish population with validation in the UK Biobank. METHODS: In this retrospective modelling and validation study, data for model development and internal validation were derived from the following Danish health registries: the Central Person Registry, the Danish National Patient Registry, the death registry, the cancer registry, and full-text medical records from secondary care records in the capital region. The development data included adults aged 16-86 years without previous malignant cancers in the time period from Jan 1, 1995, to Dec 31, 2014. The internal validation period was from Jan 1, 2015, to April 10, 2018, and the data included all adults without a previous indication of cancer aged 16-75 years on Dec 31, 2014. The external validation cohort from the UK Biobank included all adults without a previous indication of cancer aged 50-75 years. We used time-dependent Bayesian Cox hazard models built on the combined medical history of Danish individuals. A set of 1392 covariates from available clinical disease trajectories, text-mined basic health factors, and family histories were used to train predictive models of 20 major cancer types. The models were validated on cancer incidence between 2015 and 2018 across Denmark and on individuals in the UK Biobank. The primary outcomes were discrimination and calibration performance. FINDINGS: From the Danish registries, we included 6 732 553 individuals covering 60 million hospital visits, 90 million diagnoses, and a total of 193 million life-years between Jan 1, 1978, and April 10, 2018. Danish registry data covering the period from Jan 1, 2015, to April 10, 2018, were used to internally validate risk models, containing a total of 4 248 491 individuals who remained at risk of a primary malignant cancer diagnosis and 67 401 cancer cases recorded. For the external validation, we evaluated the same time period in the UK Biobank covering 377 004 individuals with 11 486 cancer cases. The predictive performance of the models on Danish data showed good discrimination (concordance index 0·81 [SD 0·08], ranging from 0·66 [95% CI 0·65-0·67] for cervix uteri cancer to 0·91 [0·90-0·92] for liver cancer). Performance was similar on the UK Biobank in a direct transfer when controlling for shifts in the age distribution (concordance index 0·66 [SD 0·08], ranging from 0·55 [95% CI 0·44-0·66] for cervix uteri cancer to 0·78 [0·77-0·79] for lung cancer). Cancer risks were associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system (oesophageal, stomach, colorectal, liver, and pancreatic) but also thyroid, kidney, and uterine cancers. INTERPRETATION: Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions in most cancer types. Model predictions generalise between the Danish and UK health-care systems. With the emergence of multi-cancer early detection tests, electronic health record-based risk models could supplement screening efforts. FUNDING: Novo Nordisk Foundation and the Danish Innovation Foundation.


Asunto(s)
Registros Electrónicos de Salud , Neoplasias , Humanos , Persona de Mediana Edad , Anciano , Adulto , Dinamarca/epidemiología , Femenino , Estudios Retrospectivos , Masculino , Neoplasias/epidemiología , Adolescente , Medición de Riesgo/métodos , Adulto Joven , Anciano de 80 o más Años , Reino Unido/epidemiología , Sistema de Registros , Teorema de Bayes , Modelos de Riesgos Proporcionales , Factores de Riesgo
3.
PLOS Digit Health ; 2(6): e0000116, 2023 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-37294826

RESUMEN

Frequent assessment of the severity of illness for hospitalized patients is essential in clinical settings to prevent outcomes such as in-hospital mortality and unplanned admission to the intensive care unit (ICU). Classical severity scores have been developed typically using relatively few patient features. Recently, deep learning-based models demonstrated better individualized risk assessments compared to classic risk scores, thanks to the use of aggregated and more heterogeneous data sources for dynamic risk prediction. We investigated to what extent deep learning methods can capture patterns of longitudinal change in health status using time-stamped data from electronic health records. We developed a deep learning model based on embedded text from multiple data sources and recurrent neural networks to predict the risk of the composite outcome of unplanned ICU transfer and in-hospital death. The risk was assessed at regular intervals during the admission for different prediction windows. Input data included medical history, biochemical measurements, and clinical notes from a total of 852,620 patients admitted to non-intensive care units in 12 hospitals in Denmark's Capital Region and Region Zealand during 2011-2016 (with a total of 2,241,849 admissions). We subsequently explained the model using the Shapley algorithm, which provides the contribution of each feature to the model outcome. The best model used all data modalities with an assessment rate of 6 hours, a prediction window of 14 days and an area under the receiver operating characteristic curve of 0.898. The discrimination and calibration obtained with this model make it a viable clinical support tool to detect patients at higher risk of clinical deterioration, providing clinicians insights into both actionable and non-actionable patient features.

4.
Nat Med ; 29(5): 1113-1122, 2023 05.
Artículo en Inglés | MEDLINE | ID: mdl-37156936

RESUMEN

Pancreatic cancer is an aggressive disease that typically presents late with poor outcomes, indicating a pronounced need for early detection. In this study, we applied artificial intelligence methods to clinical data from 6 million patients (24,000 pancreatic cancer cases) in Denmark (Danish National Patient Registry (DNPR)) and from 3 million patients (3,900 cases) in the United States (US Veterans Affairs (US-VA)). We trained machine learning models on the sequence of disease codes in clinical histories and tested prediction of cancer occurrence within incremental time windows (CancerRiskNet). For cancer occurrence within 36 months, the performance of the best DNPR model has area under the receiver operating characteristic (AUROC) curve = 0.88 and decreases to AUROC (3m) = 0.83 when disease events within 3 months before cancer diagnosis are excluded from training, with an estimated relative risk of 59 for 1,000 highest-risk patients older than age 50 years. Cross-application of the Danish model to US-VA data had lower performance (AUROC = 0.71), and retraining was needed to improve performance (AUROC = 0.78, AUROC (3m) = 0.76). These results improve the ability to design realistic surveillance programs for patients at elevated risk, potentially benefiting lifespan and quality of life by early detection of this aggressive cancer.


Asunto(s)
Aprendizaje Profundo , Neoplasias Pancreáticas , Humanos , Persona de Mediana Edad , Inteligencia Artificial , Calidad de Vida , Neoplasias Pancreáticas/diagnóstico , Neoplasias Pancreáticas/epidemiología , Algoritmos , Neoplasias Pancreáticas
5.
Clin Epidemiol ; 14: 213-223, 2022.
Artículo en Inglés | MEDLINE | ID: mdl-35228820

RESUMEN

PURPOSE: Dosing of renally cleared drugs in patients with kidney failure often deviates from clinical guidelines, so we sought to elicit predictors of receiving inappropriate doses of renal risk drugs. PATIENTS AND METHODS: We combined data from the Danish National Patient Register and in-hospital data on drug administrations and estimated glomerular filtration rates for admissions between 1 October 2009 and 1 June 2016, from a pool of about 2.6 million persons. We trained artificial neural network and linear logistic ridge regression models to predict the risk of five outcomes (>0, ≥1, ≥2, ≥3 and ≥5 inappropriate doses daily) with index set 24 hours after admission. We used time-series validation for evaluating discrimination, calibration, clinical utility and explanations. RESULTS: Of 52,451 admissions included, 42,250 (81%) were used for model development. The median age was 77 years; 50% of admissions were of women. ≥5 drugs were used between admission start and index in 23,124 admissions (44%); the most common drug classes were analgesics, systemic antibacterials, diuretics, antithrombotics, and antacids. The neural network models had better discriminative power (all AUROCs between 0.77 and 0.81) and were better calibrated than their linear counterparts. The main prediction drivers were use of anti-inflammatory, antidiabetic and anti-Parkinson's drugs as well as having a diagnosis of chronic kidney failure. Sex and age affected predictions but slightly. CONCLUSION: Our models can flag patients at high risk of receiving at least one inappropriate dose daily in a controlled in-silico setting. A prospective clinical study may confirm that this holds in real-life settings and translates into benefits in hard endpoints.

6.
Basic Clin Pharmacol Toxicol ; 131(4): 282-293, 2022 Oct.
Artículo en Inglés | MEDLINE | ID: mdl-35834334

RESUMEN

We sought to craft a drug safety signalling pipeline associating latent information in clinical free text with exposures to single drugs and drug pairs. Data arose from 12 secondary and tertiary public hospitals in two Danish regions, comprising approximately half the Danish population. Notes were operationalised with a fastText embedding, based on which we trained 10 270 neural-network models (one for each distinct single-drug/drug-pair exposure) predicting the risk of exposure given an embedding vector. We included 2 905 251 admissions between May 2008 and June 2016, with 13 740 564 distinct drug prescriptions; the median number of prescriptions was 5 (IQR: 3-9) and in 1 184 340 (41%) admissions patients used ≥5 drugs concomitantly. A total of 10 788 259 clinical notes were included, with 179 441 739 tokens retained after pruning. Of 345 single-drug signals reviewed, 28 (8.1%) represented possibly undescribed relationships; 186 (54%) signals were clinically meaningful. Sixteen (14%) of the 115 drug-pair signals were possible interactions, and two (1.7%) were known. In conclusion, we built a language-agnostic pipeline for mining associations between free-text information and medication exposure without manual curation, predicting not the likely outcome of a range of exposures but also the likely exposures for outcomes of interest. Our approach may help overcome limitations of text mining methods relying on curated data in English and can help leverage non-English free text for pharmacovigilance.


Asunto(s)
Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos , Procesamiento de Lenguaje Natural , Minería de Datos/métodos , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos/epidemiología , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos/etiología , Registros Electrónicos de Salud , Hospitales , Humanos , Lenguaje
7.
NPJ Digit Med ; 5(1): 142, 2022 Sep 14.
Artículo en Inglés | MEDLINE | ID: mdl-36104486

RESUMEN

Prediction of survival for patients in intensive care units (ICUs) has been subject to intense research. However, no models exist that embrace the multiverse of data in ICUs. It is an open question whether deep learning methods using automated data integration with minimal pre-processing of mixed data domains such as free text, medical history and high-frequency data can provide discrete-time survival estimates for individual ICU patients. We trained a deep learning model on data from patients admitted to ten ICUs in the Capital Region of Denmark and the Region of Southern Denmark between 2011 and 2018. Inspired by natural language processing we mapped the electronic patient record data to an embedded representation and fed the data to a recurrent neural network with a multi-label output layer representing the chance of survival at different follow-up times. We evaluated the performance using the time-dependent concordance index. In addition, we quantified and visualized the drivers of survival predictions using the SHAP methodology. We included 37,355 admissions of 29,417 patients in our study. Our deep learning models outperformed traditional Cox proportional-hazard models with concordance index in the ranges 0.72-0.73, 0.71-0.72, 0.71, and 0.69-0.70, for models applied at baseline 0, 24, 48, and 72 h, respectively. Deep learning models based on a combination of entity embeddings and survival modelling is a feasible approach to obtain individualized survival estimates in data-rich settings such as the ICU. The interpretable nature of the models enables us to understand the impact of the different data domains.

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA