1.
Comput Biol Med ; 175: 108548, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38718666

ABSTRACT

The aim of this work is to develop and evaluate a deep classifier that can effectively prioritize Emergency Medical Call Incidents (EMCI) according to their life-threatening level in the presence of dataset shifts. We used a dataset of 1,982,746 independent EMCI instances obtained from the Health Services Department of the Region of Valencia (Spain), spanning 2009 to 2019 (excluding 2013). The dataset includes free-text dispatcher observations recorded during the call, as well as a binary variable indicating whether the event was life-threatening. To evaluate the presence of dataset shifts, we examined prior probability shifts, covariate shifts, and concept shifts. Subsequently, we designed and implemented four deep Continual Learning (CL) strategies (cumulative learning, continual fine-tuning, experience replay, and synaptic intelligence) alongside three deep CL baselines (joint training, static approach, and single fine-tuning), all based on DistilBERT models. Our results demonstrated evidence of prior probability shifts, covariate shifts, and concept shifts in the data. Applying CL techniques had a statistically significant (α=0.05) positive impact on both backward and forward knowledge transfer, as measured by the F1-score, compared to non-continual approaches. We argue that using CL techniques in the context of EMCI is effective in adapting deep learning classifiers to changes in data distributions, thereby maintaining the stability of model performance over time. To our knowledge, this study represents the first exploration of a CL approach using real EMCI data.


Subjects
Deep Learning , Humans , Databases, Factual , Spain , Emergency Medical Services
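
Of the strategies named in the abstract above, experience replay is the easiest to illustrate in isolation. The following is a minimal sketch, not the authors' implementation: it assumes a fixed-size memory filled by reservoir sampling, and the class name `ReplayBuffer`, its methods, and the example tuples are all hypothetical.

```python
import random


class ReplayBuffer:
    """Fixed-size memory of past examples, filled by reservoir sampling.

    Hypothetical helper for illustration; the abstract does not say how
    the replay memory was populated.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        """Offer one example; every example seen so far is kept with equal probability."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = example

    def mixed_batch(self, new_examples, k):
        """Return the new examples followed by up to k replayed old ones."""
        k = min(k, len(self.items))
        return list(new_examples) + self.rng.sample(self.items, k)
```

In a continual fine-tuning loop, each batch from the newest time period would be passed through `mixed_batch` before the gradient step, so that the model keeps seeing a sample of older calls.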
2.
J Clin Med ; 13(2)2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38276097

ABSTRACT

(1) Background: Our aim was to determine changes in the prevalence of physical activity (PA) in adults with asthma between 2014 and 2020 in Spain, investigate sex differences and the effect of other variables on adherence to PA, and compare the prevalence of PA between individuals with and without asthma. (2) Methods: This study was a cross-sectional, population-based, matched, case-control study using European Health Interview Surveys for Spain (EHISS) for 2014 and 2020. (3) Results: We identified 1262 and 1103 patients with asthma in the 2014 and 2020 EHISS, respectively. The prevalence of PA remained stable (57.2% vs. 55.7%, respectively), while the percentage of persons who reported walking continuously on at least 2 days a week increased from 73.9% to 82.2% (p < 0.001). Male sex, younger age, better self-rated health, and lower body mass index (BMI) were significantly associated with greater PA. From 2014 to 2020, the odds of walking on ≥2 days per week increased by 64% (OR 1.64; 95% CI 1.34-2.00). Asthma was associated with less PA (OR 0.87; 95% CI 0.47-0.72) and a lower number of walking days ≥2 (OR 0.84; 95% CI 0.72-0.97). (4) Conclusions: Walking frequency improved over time among people with asthma. Differences in PA were detected by age, sex, self-rated health status, and BMI. Asthma was associated with less leisure-time PA (LTPA) and a lower number of walking days ≥2.

3.
Respir Med ; 220: 107458, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37951312

ABSTRACT

OBJECTIVES: To evaluate trends in the prevalence of physical activity (PA) from 2014 to 2020; to identify sex differences and sociodemographic and health-related factors associated with PA in individuals with chronic obstructive pulmonary disease (COPD); and to compare PA between individuals with and without COPD. METHODS: Cross-sectional and case-control study. SOURCE: European Health Interview Surveys for Spain (EHISS) conducted in 2014 and 2020. We included sociodemographic and health-related covariates. We compared individuals with and without COPD after matching for age and sex. RESULTS: The number of adults with COPD was 1086 and 910 in EHISS2014 and EHISS2020, respectively. In this population, self-reported "Medium or high frequency of PA" remained stable (42.9% in 2014 and 43.5% in 2020; p = 0.779). However, the percentage who walked on two or more days per week rose significantly over time (63.4%-69.9%; p = 0.004). Men with COPD reported more PA than women with COPD in both surveys. After matching, significantly lower levels of PA were recorded in COPD patients than in adults without COPD. Multivariable logistic regression confirmed this trend in COPD patients and showed that male sex, younger age, higher educational level, very good/good self-perceived health, and absence of comorbidities, obesity, and smoking were associated with more frequent PA. CONCLUSIONS: The temporal trend in PA among Spanish adults with COPD is favorable, although there is much room for improvement. Insufficient PA is more prevalent in these patients than in the general population. Sex differences were found, with significantly more frequent PA among males with COPD.


Subjects
Pulmonary Disease, Chronic Obstructive , Sex Characteristics , Adult , Humans , Male , Female , Case-Control Studies , Spain/epidemiology , Cross-Sectional Studies , Pulmonary Disease, Chronic Obstructive/epidemiology , Pulmonary Disease, Chronic Obstructive/complications , Exercise
4.
EClinicalMedicine ; 64: 102212, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37745025

ABSTRACT

Background: Multisystem inflammatory syndrome in children (MIS-C) is a severe complication of SARS-CoV-2 infection. It remains unclear how MIS-C phenotypes vary across SARS-CoV-2 variants. We aimed to investigate clinical characteristics and outcomes of MIS-C across SARS-CoV-2 eras. Methods: We performed a multicentre observational retrospective study including seven paediatric hospitals in four countries (France, Spain, U.K., and U.S.). All consecutive confirmed patients with MIS-C hospitalised between February 1st, 2020, and May 31st, 2022, were included. Electronic Health Records (EHR) data were used to calculate pooled risk differences (RD) and effect sizes (ES) at site level, using Alpha as reference. Meta-analysis was used to pool data across sites. Findings: Of 598 patients with MIS-C (61% male, 39% female; mean age 9.7 years [SD 4.5]), 383 (64%) were admitted in the Alpha era, 111 (19%) in the Delta era, and 104 (17%) in the Omicron era. Compared with patients admitted in the Alpha era, those admitted in the Delta era were younger (ES -1.18 years [95% CI -2.05, -0.32]), had fewer respiratory symptoms (RD -0.15 [95% CI -0.33, -0.04]), less frequent non-cardiogenic shock or systemic inflammatory response syndrome (SIRS) (RD -0.35 [95% CI -0.64, -0.07]), lower lymphocyte count (ES -0.16 × 10⁹/uL [95% CI -0.30, -0.01]), lower C-reactive protein (ES -28.5 mg/L [95% CI -46.3, -10.7]), and lower troponin (ES -0.14 ng/mL [95% CI -0.26, -0.03]). Patients admitted in the Omicron versus Alpha eras were younger (ES -1.6 years [95% CI -2.5, -0.8]), had less frequent SIRS (RD -0.18 [95% CI -0.30, -0.05]), lower lymphocyte count (ES -0.39 × 10⁹/uL [95% CI -0.52, -0.25]), lower troponin (ES -0.16 ng/mL [95% CI -0.30, -0.01]) and less frequently received anticoagulation therapy (RD -0.19 [95% CI -0.37, -0.04]). Length of hospitalization was shorter in the Delta versus Alpha eras (-1.3 days [95% CI -2.3, -0.4]).
Interpretation: Our study suggested that MIS-C clinical phenotypes varied across SARS-CoV-2 eras, with patients in Delta and Omicron eras being younger and less sick. EHR data can be effectively leveraged to identify rare complications of pandemic diseases and their variation over time. Funding: None.

5.
Comput Methods Programs Biomed ; 242: 107803, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37703700

ABSTRACT

BACKGROUND AND OBJECTIVE: Reusing Electronic Health Records (EHRs) for Machine Learning (ML) often leads to extremely incomplete and sparse tabular datasets, which can hinder model development and limit performance and generalization. In this study, we aimed to characterize the most effective data imputation techniques and ML models for dealing with highly missing numerical data in EHRs, in the case where only a very limited number of data are complete, as opposed to the usual case of having a reduced number of missing values. METHODS: We used a case study including full blood count laboratory data, demographic and survival data in the context of COVID-19 hospital admissions and evaluated 30 processing pipelines combining imputation methods with ML classifiers. The imputation methods included missing mask, translation and encoding, mean imputation, k-nearest neighbors imputation, Bayesian ridge regression imputation and generative adversarial imputation networks. The classifiers included k-nearest neighbors, logistic regression, random forest, gradient boosting and deep multilayer perceptron. RESULTS: Our results suggest that in the presence of highly missing data, combining translation and encoding imputation (which considers informative missingness) with tree ensemble classifiers (random forest and gradient boosting) is a sensible choice when aiming to maximize performance, in terms of area under the curve. CONCLUSIONS: Based on our findings, we recommend considering this imputer-classifier configuration when constructing models in the presence of extremely incomplete numerical data in EHRs.


Subjects
Algorithms , COVID-19 , Humans , Electronic Health Records , Bayes Theorem , Machine Learning
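
Two of the simpler imputers listed in the abstract above, mean imputation and the missing mask, can be combined in a few lines. This is a minimal sketch under stated assumptions: missing values are marked with `None`, the function name `impute_mean_with_mask` is hypothetical, and it does not reproduce the translation and encoding imputer that the study recommends.

```python
def impute_mean_with_mask(rows):
    """Fill missing values with the column mean and append a 0/1 missing mask.

    rows: list of equal-length lists, with None marking a missing value.
    Returns one list per row: imputed values followed by mask indicators.
    """
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)

    out = []
    for r in rows:
        values = [means[c] if r[c] is None else r[c] for c in range(n_cols)]
        mask = [1 if r[c] is None else 0 for c in range(n_cols)]
        out.append(values + mask)
    return out
```

Keeping the mask columns lets a downstream classifier use the fact that a value was missing, which matters when missingness is itself informative.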
6.
J Clin Med ; 12(6)2023 Mar 22.
Article in English | MEDLINE | ID: mdl-36983443

ABSTRACT

(1) Background: We aimed to assess the time trend from 2014 to 2020 in the prevalence of physical activity (PA), identify gender differences and sociodemographic and health-related factors associated with PA among people with diabetes, and compare PA between people with and without diabetes. (2) Methods: We conducted a cross-sectional and a case-control study using as data source the European Health Interview Surveys for Spain (EHISS) conducted in 2014 and 2020. The presence of diabetes and PA were self-reported. Covariates included socio-demographic characteristics, health-related variables, and lifestyles. To compare people with and without diabetes, we matched individuals by age and sex. (3) Results: The number of participants aged ≥18 years with self-reported diabetes was 1852 and 1889 in the EHISS2014 and EHISS2020, respectively. The proportion of people with diabetes who had a medium or high frequency of PA improved from 48.3% in 2014 to 52.6% in 2020 (p = 0.009), with 68.5% in 2014 and 77.7% in 2020 being engaged in two or more days of PA (p < 0.001). Males with diabetes reported more PA than females with diabetes in both surveys. After matching by age and gender, participants with diabetes showed significantly lower engagement in PA than those without diabetes. Among adults with diabetes, multivariable logistic regression confirmed that PA improved significantly from 2014 to 2020 and that male sex, higher educational level, and better self-rated health were associated with more PA. However, self-reported comorbidities, smoking, or BMI > 30 were associated with less PA. (4) Conclusions: The time trend of PA among Spanish adults with diabetes is favorable but insufficient. The prevalence of PA in this diabetes population is low and does not reach the levels of the general population. Gender differences were found, with significantly more PA among males with diabetes.
Our results could help to improve the design and implementation of public health strategies to improve PA among people with diabetes.

7.
JAMA Netw Open ; 5(12): e2246548, 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36512353

ABSTRACT

Importance: The COVID-19 pandemic has been associated with an increase in mental health diagnoses among adolescents, though the extent of the increase, particularly for severe cases requiring hospitalization, has not been well characterized. Large-scale federated informatics approaches provide the ability to efficiently and securely query health care data sets to assess and monitor hospitalization patterns for mental health conditions among adolescents. Objective: To estimate changes in the proportion of hospitalizations associated with mental health conditions among adolescents following onset of the COVID-19 pandemic. Design, Setting, and Participants: This retrospective, multisite cohort study of adolescents 11 to 17 years of age who were hospitalized with at least 1 mental health condition diagnosis between February 1, 2019, and April 30, 2021, used patient-level data from electronic health records of 8 children's hospitals in the US and France. Main Outcomes and Measures: Change in the monthly proportion of mental health condition-associated hospitalizations between the prepandemic (February 1, 2019, to March 31, 2020) and pandemic (April 1, 2020, to April 30, 2021) periods using interrupted time series analysis. Results: There were 9696 adolescents hospitalized with a mental health condition during the prepandemic period (5966 [61.5%] female) and 11 101 during the pandemic period (7603 [68.5%] female). The mean (SD) age in the prepandemic cohort was 14.6 (1.9) years and in the pandemic cohort, 14.7 (1.8) years. The most prevalent diagnoses during the pandemic were anxiety (6066 [57.4%]), depression (5065 [48.0%]), and suicidality or self-injury (4673 [44.2%]). There was an increase in the proportions of monthly hospitalizations during the pandemic for anxiety (0.55%; 95% CI, 0.26%-0.84%), depression (0.50%; 95% CI, 0.19%-0.79%), and suicidality or self-injury (0.38%; 95% CI, 0.08%-0.68%). 
There was an estimated 0.60% increase (95% CI, 0.31%-0.89%) overall in the monthly proportion of mental health-associated hospitalizations following onset of the pandemic compared with the prepandemic period. Conclusions and Relevance: In this cohort study, onset of the COVID-19 pandemic was associated with increased hospitalizations with mental health diagnoses among adolescents. These findings support the need for greater resources within children's hospitals to care for adolescents with mental health conditions during the pandemic and beyond.


Subjects
COVID-19 , Pandemics , Child , Adolescent , Female , Humans , Male , COVID-19/epidemiology , Mental Health , SARS-CoV-2 , Cohort Studies , Retrospective Studies , Hospitalization
8.
J Biomed Inform ; 136: 104242, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36372346

ABSTRACT

BACKGROUND: Unexpected variability across healthcare datasets may indicate data quality issues and thereby affect the credibility of these data for reuse. No gold-standard reference dataset or methods for variability assessment are usually available for these datasets. In this study, we aim to describe the process of discovering data quality implications by applying a set of methods for assessing variability between sources and over time in a large hospital database. METHODS: We described and applied a set of multisource and temporal variability assessment methods in a large Portuguese hospitalization database, in which variation in condition-specific hospitalization ratios derived from clinically coded data was assessed between hospitals (sources) and over time. We identified condition-specific admissions using the Clinical Classifications Software (CCS), developed by the Agency for Healthcare Research and Quality. A Statistical Process Control (SPC) approach based on funnel plots of condition-specific standardized hospitalization ratios (SHR) was used to assess multisource variability, whereas temporal heat maps and Information-Geometric Temporal (IGT) plots were used to assess temporal variability by displaying abrupt temporal changes in data distributions. Results were presented for the 15 most common inpatient conditions (CCS) in Portugal. MAIN FINDINGS: Funnel plot assessment allowed the detection of several outlying hospitals whose SHRs were much lower or higher than expected. Adjusting SHR for hospital characteristics, beyond age and sex, considerably affected the degree of multisource variability for most diseases. Overall, probability distributions changed over time for most diseases, although heterogeneously.
Abrupt temporal changes in data distributions for acute myocardial infarction and congestive heart failure coincided with the periods comprising the transition to the International Classification of Diseases, 10th revision, Clinical Modification, whereas changes in the Diagnosis-Related Groups software seem to have driven changes in data distributions for both acute myocardial infarction and liveborn admissions. The analysis of heat maps also allowed the detection of several discontinuities at hospital level over time, in some cases also coinciding with the aforementioned factors. CONCLUSIONS: This paper described the successful application of a set of reproducible, generalizable and systematic methods for variability assessment, including visualization tools that can be useful for detecting abnormal patterns in healthcare data, also addressing some limitations of common approaches. The presented method for multisource variability assessment is based on SPC, which is an advantage considering the lack of a gold standard for such a process. Properly controlling for hospital characteristics and differences in case-mix when estimating SHR is critical for isolating data quality-related variability among data sources. The use of IGT plots provides an advantage over common methods for temporal variability assessment due to its suitability for multitype and multimodal data, which are common characteristics of healthcare data. The novelty of this work is the use of a set of methods to discover new data quality insights in healthcare data.


Subjects
Data Accuracy , Myocardial Infarction , Humans , Portugal , Hospitals , Hospitalization
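
The funnel-plot idea in the abstract above can be sketched numerically. This is an illustration only: it assumes that observed admissions follow a Poisson distribution around the expected count, so that the SHR has standard deviation 1/sqrt(expected), and it uses a normal approximation for the control limits. The abstract does not state how the limits were actually computed, and the function names and hospital tuples are hypothetical.

```python
import math


def shr(observed, expected):
    """Standardized hospitalization ratio: observed over expected admissions."""
    return observed / expected


def funnel_limits(expected, z=1.96):
    """Approximate control limits around SHR = 1 for a given expected count.

    Assumes observed ~ Poisson(expected), so sd(SHR) = 1 / sqrt(expected).
    """
    half_width = z / math.sqrt(expected)
    return max(0.0, 1.0 - half_width), 1.0 + half_width


def flag_outliers(hospitals, z=3.09):
    """Return (name, SHR) for hospitals outside the limits.

    hospitals: iterable of (name, observed, expected) tuples.
    z=3.09 corresponds to roughly 99.8% two-sided limits.
    """
    flagged = []
    for name, observed, expected in hospitals:
        low, high = funnel_limits(expected, z)
        ratio = shr(observed, expected)
        if ratio < low or ratio > high:
            flagged.append((name, round(ratio, 3)))
    return flagged
```

Because the limits narrow as the expected count grows, a small hospital needs a much more extreme SHR than a large one before it is flagged.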
9.
Stud Health Technol Inform ; 294: 755-759, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612198

ABSTRACT

The pharmaceutical industry is a data-intensive and heavily regulated sector, where exhaustive audits and inspections are performed to ensure the safety of drugs. In this context, processing and evaluating the data generated on manufacturing lines is a relevant challenge, since it requires compliance with pharmaceutical regulations. This work combines data integrity metrics and blockchain technology to evaluate the degree of compliance with ALCOA+ principles across different levels of drug manufacturing data. We propose DIALCOA, a software tool that assesses the degree of compliance with each ALCOA+ principle, based on data from manufacturing batch reports and their different levels of information.


Subjects
Blockchain , Drug Industry , Commerce , Technology
10.
Stud Health Technol Inform ; 294: 859-863, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612226

ABSTRACT

The objective of this work was to discover key topics latent in free-text dispatcher observations registered during emergency medical calls. We used a total of 1,374,931 independent retrospective cases from the Valencian emergency medical dispatch service in Spain, from 2014 to 2019. Text fields were preprocessed to reduce vocabulary size and filter noise by removing accents and punctuation marks, along with uninformative and infrequent words. Key topics were inferred from the multinomial probabilities over words conditioned on each topic from a Latent Dirichlet Allocation model, trained following an online mini-batch variational approach. The optimal number of topics was set by analyzing the values of a topic coherence measure, based on the normalized pointwise mutual information, across multiple validation folds. Our results support the presence of 15 key topics latent in free-text dispatcher observations, related to: ambulance request; chest pain and heart attack; respiratory distress; head falls and blows; fever, chills, vomiting and diarrhea; heart failure; syncope; limb injuries; public service body request; thoracic and abdominal pain; stroke and blood pressure abnormalities; pill intake; diabetes; bleeding; consciousness. The discovery of these topics implies the automatic characterization of a huge volume of complex unstructured data containing relevant information linked to emergency medical call incidents. Hence, results from this work could lead to the update of structured emergency triage algorithms to directly include this latent information in the triage process, resulting in a positive impact on patient wellbeing and health services sustainability.


Subjects
Emergency Medical Dispatch , Emergency Medical Services , Ambulances , Emergency Medical Service Communication Systems , Humans , Retrospective Studies , Triage
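
The coherence measure mentioned in the abstract above is based on normalized pointwise mutual information (NPMI), which for two words is log(p(a,b) / (p(a) p(b))) divided by -log p(a,b). The sketch below estimates the probabilities from document-level co-occurrence and averages NPMI over the top words of one topic. The function names, the treatment of the two edge cases, and the use of whole documents as the co-occurrence window are assumptions; the abstract does not specify these details.

```python
import math
from itertools import combinations


def npmi(word_a, word_b, documents):
    """NPMI of two words, estimated from document co-occurrence.

    documents: list of sets of words. Returns a value in [-1, 1].
    """
    n = len(documents)
    p_a = sum(word_a in d for d in documents) / n
    p_b = sum(word_b in d for d in documents) / n
    p_ab = sum(word_a in d and word_b in d for d in documents) / n
    if p_ab == 0:
        return -1.0  # the words never co-occur
    if p_ab == 1:
        return 1.0  # convention: both words appear in every document
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def topic_coherence(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words."""
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, documents) for a, b in pairs) / len(pairs)
```

Averaging this score over all topics for each candidate number of topics, and then over validation folds, gives a curve from which the number of topics can be chosen.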
11.
JMIR Public Health Surveill ; 8(3): e30032, 2022 Mar 30.
Article in English | MEDLINE | ID: mdl-35144239

ABSTRACT

BACKGROUND: The COVID-19 pandemic has led to an unprecedented global health care challenge for both medical institutions and researchers. Recognizing different COVID-19 subphenotypes (the division of populations of patients into more meaningful subgroups driven by clinical features) and characterizing their severity may assist clinicians during the clinical course, the vaccination process, research efforts, the surveillance system, and the allocation of limited resources. OBJECTIVE: We aimed to discover age-sex unbiased COVID-19 patient subphenotypes based on phenotypical data that are easily available before admission, such as pre-existing comorbidities, lifestyle habits, and demographic features, and to study the potential early severity stratification capabilities of the discovered subgroups by characterizing their severity patterns, including prognostic, intensive care unit (ICU), and morbimortality outcomes. METHODS: We used the Mexican Government COVID-19 open data, including 778,692 population-based, patient-level SARS-CoV-2 records as of September 2020. We applied a meta-clustering technique that consists of a 2-stage clustering approach combining dimensionality reduction (ie, principal components analysis and multiple correspondence analysis) and hierarchical clustering using the Ward minimum variance method with Euclidean squared distance. RESULTS: In the independent age-sex clustering analyses, 56 clusters supported 11 clinically distinguishable meta-clusters (MCs). MCs 1-3 showed high recovery rates (90.27%-95.22%), including healthy patients of all ages, children with comorbidities and priority in receiving medical resources (ie, higher rates of hospitalization, intubation, and ICU admission) compared with other adult subgroups that have similar conditions, and young obese smokers. MCs 4-5 showed moderate recovery rates (81.30%-82.81%), including patients with hypertension or diabetes of all ages and obese patients with pneumonia, hypertension, and diabetes.
MCs 6-11 showed low recovery rates (53.96%-66.94%), including immunosuppressed patients with high comorbidity rates, patients with chronic kidney disease with a poor survival length and probability of recovery, older smokers with chronic obstructive pulmonary disease, older adults with severe diabetes and hypertension, and the oldest obese smokers with chronic obstructive pulmonary disease and mild cardiovascular disease. Group outcomes conformed to the recent literature on dedicated age-sex groups. Mexican states and several types of clinical institutions showed relevant heterogeneity regarding severity, potentially linked to socioeconomic or health inequalities. CONCLUSIONS: The proposed 2-stage cluster analysis methodology produced a discriminative characterization of the sample and explainability over age and sex. These results can potentially help in understanding the clinical patient and their stratification for automated early triage before further tests and laboratory results are available and even in locations where additional tests are not available or to help decide resource allocation among vulnerable subgroups such as to prioritize vaccination or treatments.


Subjects
COVID-19 , Aged , COVID-19/epidemiology , Child , Cluster Analysis , Humans , Intensive Care Units , Pandemics , SARS-CoV-2
12.
J Am Med Inform Assoc ; 29(2): 230-238, 2022 Jan 12.
Article in English | MEDLINE | ID: mdl-34405856

ABSTRACT

OBJECTIVE: To identify sex-related differences and define female-enriched autism spectrum disorder (ASD) comorbidities through a comprehensive multi-PheWAS intersection approach on big, real-world data. Although sex difference is a consistent and recognized feature of ASD, additional clinical correlates could help to identify potential disease subgroups, based on sex and age. MATERIALS AND METHODS: We performed a systematic comorbidity analysis on 1860 groups of comorbidities spanning the full spectrum of known diseases, in 59 140 individuals (11 440 females) with ASD from 4 age groups. We explored ASD sex differences in 2 independent real-world datasets, across all potential comorbidities by comparing (1) females with ASD vs males with ASD and (2) females with ASD vs females without ASD. RESULTS: We identified 27 different comorbidities that appeared significantly more frequently in females with ASD. The comorbidities were mostly neurological (eg, epilepsy, odds ratio [OR] > 1.8, 3-18 years of age), congenital (eg, chromosomal anomalies, OR > 2, 3-18 years of age), and mental disorders (eg, intellectual disability, OR > 1.7, 6-18 years of age). Novel comorbidities included endocrine metabolic diseases (eg, failure to thrive, OR = 2.5, ages 0-2), digestive disorders (gastroesophageal reflux disease: OR = 1.7, 6-11 years of age; and constipation: OR > 1.6, 3-11 years of age), and sense organs (strabismus: OR > 1.8, 3-18 years of age). DISCUSSION: A multi-PheWAS intersection approach on real-world data as presented in this study uniquely contributes to the growing body of research regarding sex-based comorbidity analysis in the ASD population. CONCLUSIONS: Our findings provide insights into female-enriched ASD comorbidities that are potentially important in diagnosis, as well as the identification of distinct comorbidity patterns influencing anticipatory treatment or referrals. The code is publicly available (https://github.com/hms-dbmi/sexDifferenceInASD).


Subjects
Autism Spectrum Disorder , Sex Characteristics , Autism Spectrum Disorder/epidemiology , Child , Child, Preschool , Comorbidity , Female , Humans , Infant , Infant, Newborn , Male , Odds Ratio , Prevalence
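
The odds ratios reported in the abstract above compare how often a comorbidity appears in one group versus another. As a reminder of the arithmetic, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2x2 table. It is a generic textbook calculation, not the code in the linked repository; the function name and the cell labels `a`, `b`, `c`, `d` are hypothetical, and all four cells are assumed to be non-zero.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: group 1 with the comorbidity      b: group 1 without it
    c: group 2 with the comorbidity      d: group 2 without it
    Assumes every cell is greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high
```

For example, with 20 of 100 individuals affected in the first group and 10 of 100 in the second, `odds_ratio_ci(20, 80, 10, 90)` gives an odds ratio of 2.25 with an interval that spans it.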
13.
JMIR Med Inform ; 9(8): e27842, 2021 Aug 04.
Article in English | MEDLINE | ID: mdl-34346902

ABSTRACT

BACKGROUND: There is increasing recognition that health care providers need to focus attention, and be judged against, the impact they have on the health outcomes experienced by patients. The measurement of health outcomes as a routine part of clinical documentation is probably the only scalable way of collecting outcomes evidence, since secondary data collection is expensive and error-prone. However, there is uncertainty about whether routinely collected clinical data within electronic health record (EHR) systems includes the data most relevant to measuring and comparing outcomes and if those items are collected to a good enough data quality to be relied upon for outcomes assessment, since several studies have pointed out significant issues regarding EHR data availability and quality. OBJECTIVE: In this paper, we first describe a practical approach to data quality assessment of health outcomes, based on a literature review of existing frameworks for quality assessment of health data and multistakeholder consultation. Adopting this approach, we performed a pilot study on a subset of 21 International Consortium for Health Outcomes Measurement (ICHOM) outcomes data items from patients with congestive heart failure. METHODS: All available registries compatible with the diagnosis of heart failure within an EHR data repository of a general hospital (142,345 visits and 12,503 patients) were extracted and mapped to the ICHOM format. We focused our pilot assessment on 5 commonly used data quality dimensions: completeness, correctness, consistency, uniqueness, and temporal stability. RESULTS: We found high scores (>95%) for the consistency, completeness, and uniqueness dimensions. Temporal stability analyses showed some changes over time in the reported use of medication to treat heart failure, as well as in the recording of past medical conditions. 
Finally, the investigation of data correctness suggested several issues concerning the characterization of missing data values. Many of these issues appear to be introduced while mapping the IMASIS-2 relational database contents to the ICHOM format, as the latter requires a level of detail that is not explicitly available in the coded data of an EHR. CONCLUSIONS: Overall, results of this pilot study revealed good data quality for the subset of heart failure outcomes collected at the Hospital del Mar. Nevertheless, some important data errors were identified that were caused by fundamentally different data collection practices in routine clinical care versus research, for which the ICHOM standard set was originally developed. To truly examine to what extent hospitals today are able to routinely collect the evidence of their success in achieving good health outcomes, future research would benefit from performing more extensive data quality assessments, including all data items from the ICHOM standards set and across multiple hospitals.

14.
Artif Intell Med ; 117: 102088, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34127234

ABSTRACT

The objective of this work was to develop a predictive model to help non-clinical dispatchers classify emergency medical call incidents by their life-threatening level (yes/no), admissible response delay (undelayable, minutes, hours, days) and emergency system jurisdiction (emergency system/primary care) in real time. We used a total of 1 244 624 independent incidents from the Valencian emergency medical dispatch service in Spain, compiled retrospectively from 2009 to 2012, including clinical features, demographics, circumstantial factors and free-text dispatcher observations. Based on them, we designed and developed DeepEMC2, a deep ensemble multitask model integrating four subnetworks: three specialized in context, clinical and text data, respectively, and another that ensembles the former. The four subnetworks are in turn composed of multi-layer perceptron modules, bidirectional long short-term memory units and a Bidirectional Encoder Representations from Transformers (BERT) module. DeepEMC2 showed a macro F1-score of 0.759 in life-threatening classification, 0.576 in admissible response delay and 0.757 in emergency system jurisdiction. These results show a substantial performance increase of 12.5 %, 17.5 % and 5.1 %, respectively, with respect to the current in-house triage protocol of the Valencian emergency medical dispatch service. In addition, DeepEMC2 significantly outperformed a set of baseline machine learning models, including naive Bayes, logistic regression, random forest and gradient boosting (α = 0.05). Hence, DeepEMC2 is able to: 1) capture information present in emergency medical calls not considered by the existing triage protocol, and 2) model complex data dependencies that the tested baseline models could not capture. Likewise, our results suggest that most of this unconsidered information is present in the free-text dispatcher observations.
To our knowledge, this study describes the first deep learning model for emergency medical call incident classification. Its adoption in medical dispatch centers would potentially improve emergency dispatch processes, resulting in a positive impact on patient wellbeing and health services sustainability.


Subjects
Emergency Medical Dispatch , Bayes Theorem , Emergency Medical Service Communication Systems , Emergency Service, Hospital , Humans , Retrospective Studies
15.
J Biomed Inform ; 120: 103837, 2021 08.
Article in English | MEDLINE | ID: mdl-34119690

ABSTRACT

Patient Trajectories (PTs) are a method of representing the temporal evolution of patients. They can include information from different sources and be used in socio-medical or clinical domains. PTs have generally been used to generate and study the most common trajectories in, for instance, the development of a disease. On the other hand, healthcare predictive models generally rely on static snapshots of patient information. Only a few works on prediction in healthcare use PTs and therefore benefit from their temporal dimension. All of them, however, have used PTs created from single-source information. Therefore, the use of longitudinal multi-scale data to build PTs and to obtain predictions about health conditions from them is yet to be explored. Our hypothesis is that local similarities on small chunks of PTs can identify patients who are similar with respect to their future morbidities. The objectives of this work are (1) to develop a methodology to identify local similarities between PTs before the occurrence of morbidities in order to predict these in new query individuals; and (2) to validate this methodology on risk prediction of cardiovascular disease (CVD) occurrence in patients with diabetes. We propose a novel formal definition of PTs based on sequences of longitudinal multi-scale data, together with a dynamic programming methodology to identify local alignments on PTs for predicting future morbidities. Both the proposed methodology for PT definition and the alignment algorithm are generic enough to be applied to any clinical domain. We validated this solution for predicting CVD in patients with diabetes and achieved a precision of 0.33, a recall of 0.72 and a specificity of 0.38. Therefore, the proposed solution can be of great utility for secondary screening in the diabetes use case.
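
The abstract does not give the scoring scheme or recurrence used by its dynamic programming alignment. As a rough illustration of the general idea, the sketch below uses the classic Smith-Waterman local alignment recurrence as a stand-in, applied to trajectories encoded as sequences of event codes. The function name, the match/mismatch/gap values and the event codes are all assumptions made for this example, not the authors' method.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b.

    Smith-Waterman recurrence, used here as a stand-in: scores are
    floored at 0 so that an alignment may start anywhere in either sequence.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical event codes, invented for illustration only.
query = ["HbA1c_high", "hypertension", "statin", "obesity"]
reference = ["checkup", "HbA1c_high", "hypertension", "statin", "CVD"]
similarity = local_alignment_score(query, reference)
```

In this toy case the two trajectories share a run of three consecutive events, so the best local score is three matches. A predictor in the spirit of the abstract could then rank reference patients by such scores and inspect what happened to the closest ones after the aligned chunk.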


Subjects
Algorithms , Cardiovascular Diseases , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/epidemiology , Humans , Morbidity
16.
Comput Methods Programs Biomed ; 207: 106147, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34020376

ABSTRACT

BACKGROUND AND OBJECTIVE: The Baby-Friendly Hospital Initiative (BFHI) is an international strategy aimed at improving breastfeeding practices in health care services. Regular monitoring of indicators is key for BFHI implementation and maintenance. Currently, routine data collected from electronic health records (EHR) are an excellent source for infant feeding monitoring; however, data quality (DQ) assessment should be undertaken. The aim of this research is to enable robust estimations of infant feeding indicators through DQ assessment of routine EHR data. MATERIALS AND METHODS: We used the longitudinal series of healthcare contacts belonging to 6427 children born from 2009 to 2018 in Health Area V of Murcia (Spain). Longitudinal data came from EHR at hospital discharge and community infant health reviews up to 18 months. The data of each healthcare contact contained a 24-h recall of infant feeding. We performed a DQ process in three phases: (1) an assessment of each single contact and the definition of its infant feeding status; (2) a longitudinal DQ assessment of the completeness and consistency of the series of contacts to obtain meta-information that guides the calculation, for each case, of the duration of the different types of breastfeeding: exclusive breastfeeding (EBF), full breastfeeding (FBF) and any breastfeeding (ABF); and finally (3) a robust estimation of indicators and a description of the DQ of each indicator. RESULTS: We found DQ deficiencies in 30.42% of the single contacts used to establish the infant feeding status for EBF, 19.02% for FBF and 22.50% for ABF. However, after longitudinal DQ assessment, we obtained rates of valid and reliable data of nearly 90% for most indicators, such as "median duration of breastfeeding", for both FBF and ABF, but not for EBF.
CONCLUSIONS: Despite the DQ deficiencies found in raw data, the indicator-based DQ assurance approach proposed in this work allowed us to obtain a robust estimation of indicators with a significant percentage of subjects with valid information for ABF and FBF monitoring. The estimations were consistent with previously published results. The methodology provided in this study allows continuous and reliable population monitoring of BFHI infant feeding indicators from routine EHR data.


Subjects
Data Accuracy , Electronic Health Records , Breast Feeding , Child , Female , Health Promotion , Hospitals , Humans , Infant , Spain
17.
J Am Med Inform Assoc ; 28(2): 360-364, 2021 02 15.
Article in English | MEDLINE | ID: mdl-33027509

ABSTRACT

OBJECTIVE: The lack of representative coronavirus disease 2019 (COVID-19) data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, in which source variability plays an important role. We showcase and discuss potential biases from data source variability for COVID-19 machine learning. MATERIALS AND METHODS: We used the publicly available nCov2019 dataset, which includes patient-level data from several countries. We aimed to discover and classify severity subgroups using symptoms and comorbidities. RESULTS: Cases from the 2 countries with the highest prevalence were divided into separate subgroups with distinct severity manifestations. This variability can reduce the representativeness of training data with respect to the model's target populations and increase model complexity, with a risk of overfitting. CONCLUSIONS: Data source variability is a potential contributor to bias in distributed research networks. We call for systematic assessment and reporting of data source variability and data quality in COVID-19 data sharing, as key information for reliable and generalizable machine learning.


Subjects
COVID-19 , Data Accuracy , Datasets as Topic , Information Dissemination , Machine Learning , Adult , Aged , COVID-19/classification , Computer Communication Networks , Datasets as Topic/standards , Female , Humans , Male , Middle Aged , Patient Acuity
19.
Gigascience ; 9(8)2020 08 01.
Article in English | MEDLINE | ID: mdl-32729900

ABSTRACT

BACKGROUND: Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal data-set shifts can present as trends, as well as abrupt or seasonal changes in the statistical distributions of data over time. The latter are particularly complicated to address in multimodal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large sets of historical data from EHRs, there is a need for specific software methods to help delineate temporal data-set shifts to ensure reliable data reuse. RESULTS: EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal data-set shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time; projects their temporal evolution through non-parametric information geometric temporal plots; and enables the exploration of changes in variables through data temporal heat maps. We demonstrate the capability of EHRtemporalVariability to delineate data-set shifts in three impact case studies, one of which is available for reproducibility. CONCLUSIONS: EHRtemporalVariability enables the exploration and identification of data-set shifts, contributing to the broad examination and repurposing of large, longitudinal data sets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. 
EHRtemporalVariability is designed for technical users who use the R package programmatically, as well as for users unfamiliar with programming, who can use the Shiny user interface.
Availability: https://github.com/hms-dbmi/EHRtemporalVariability/
Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html
Online demo: http://ehrtemporalvariability.upv.es/
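
To make the idea of a temporal dataset shift more concrete, the sketch below estimates the distribution of a coded variable per monthly batch and measures how far consecutive months are from each other. It is a Python illustration of the general technique, not the API of the R package. The Jensen-Shannon distance is assumed here as one common choice of distance between distributions, and the function names, month keys and codes are invented for the example.

```python
import math
from collections import Counter


def distribution(codes, support):
    """Relative frequency of each code in `support` (codes must be non-empty)."""
    counts = Counter(codes)
    return [counts[c] / len(codes) for c in support]


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the base-2 divergence), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


# Toy monthly batches of a coded variable, invented for illustration only.
batches = {
    "2019-01": ["A", "A", "B", "C"],
    "2019-02": ["A", "A", "B", "C"],
    "2019-03": ["D", "D", "B", "C"],  # code A replaced by D: an abrupt shift
}
support = sorted({c for codes in batches.values() for c in codes})
months = sorted(batches)
dists = [
    js_distance(distribution(batches[a], support), distribution(batches[b], support))
    for a, b in zip(months, months[1:])
]
```

The first distance is zero because January and February have identical distributions, while the second is clearly positive and flags March as a candidate abrupt change. Computing the distance for every pair of months, not only consecutive ones, gives a matrix that can be projected to two dimensions for plots of the kind the abstract describes.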


Subjects
Biomedical Research , Electronic Health Records , Data Accuracy , Reproducibility of Results , Software
20.
BMJ Open ; 10(2): e034396, 2020 02 13.
Article in English | MEDLINE | ID: mdl-32060159

ABSTRACT

OBJECTIVES: To demonstrate how data-driven variability methods can be used to identify changes in disease recording in two English electronic health records databases between 2001 and 2015. DESIGN: Repeated cross-sectional analysis that applied data-driven temporal variability methods to assess month-by-month changes in routinely collected medical data. A measure of difference between months was calculated based on joint distributions of age, gender, socioeconomic status and recorded cardiovascular diseases. Distances between months were used to identify temporal trends in data recording. SETTING: 400 English primary care practices from the Clinical Practice Research Datalink (CPRD GOLD) and 451 hospital providers from the Hospital Episode Statistics (HES). MAIN OUTCOMES: The proportion of patients (CPRD GOLD) and hospital admissions (HES) with a recorded cardiovascular disease (CPRD GOLD: coronary heart disease, heart failure, peripheral arterial disease, stroke; HES: International Classification of Diseases codes I20-I69/G45). RESULTS: Both databases showed gradual changes in cardiovascular disease recording between 2001 and 2008. The recorded prevalence of included cardiovascular diseases in CPRD GOLD increased by 47%-62%, which partially reversed after 2008. For hospital records in HES, there was a relative decrease in angina pectoris (-34.4%) and unspecified stroke (-42.3%) over the same time period, with a concomitant increase in chronic coronary heart disease (+14.3%). Multiple abrupt changes in the use of myocardial infarction codes in hospitals were found in March/April 2010, 2012 and 2014, possibly linked to updates of clinical coding guidelines. CONCLUSIONS: The identified temporal variability could be related to potentially non-medical causes such as updated coding guidelines. These artificial changes may introduce temporal correlation among diagnoses inferred from routine data, violating the assumptions of frequently used statistical methods.
Temporal variability measures provide an objective and robust technique to identify, and subsequently account for, those changes in electronic health records studies without any prior knowledge of the data collection process.


Subjects
Cardiovascular Diseases , Clinical Coding/trends , Databases, Factual , Electronic Health Records , Cardiovascular Diseases/epidemiology , Cross-Sectional Studies , Humans