ABSTRACT
Artificial intelligence (AI)-based clinical decision support systems are gaining momentum by relying on a greater volume and variety of secondary use data. However, the uncertainty, variability, and biases in real-world data environments still pose significant challenges to the development of health AI, its routine clinical use, and its regulatory frameworks. Health AI should be resilient against real-world environments throughout its lifecycle, including the training and prediction phases and maintenance during production, and health AI regulations should evolve accordingly. Data quality issues, variability over time or across sites, information uncertainty, human-computer interaction, and fundamental rights assurance are among the most relevant challenges. If health AI is not designed resiliently with regard to these real-world data effects, potentially biased data-driven medical decisions can risk the safety and fundamental rights of millions of people. In this viewpoint, we review the challenges, requirements, and methods for resilient AI in health and provide a research framework to improve the trustworthiness of next-generation AI-based clinical decision support.
Subject(s)
Artificial Intelligence , Decision Support Systems, Clinical , Humans
ABSTRACT
BACKGROUND: Unexpected variability across healthcare datasets may indicate data quality issues and thereby affect the credibility of these data for reutilization. No gold-standard reference dataset or methods for variability assessment are usually available for these datasets. In this study, we aim to describe the process of discovering data quality implications by applying a set of methods for assessing variability between sources and over time in a large hospital database. METHODS: We described and applied a set of multisource and temporal variability assessment methods in a large Portuguese hospitalization database, in which variation in condition-specific hospitalization ratios derived from clinically coded data was assessed between hospitals (sources) and over time. We identified condition-specific admissions using the Clinical Classifications Software (CCS), developed by the Agency for Healthcare Research and Quality. A Statistical Process Control (SPC) approach based on funnel plots of condition-specific standardized hospitalization ratios (SHR) was used to assess multisource variability, whereas temporal heat maps and Information-Geometric Temporal (IGT) plots were used to assess temporal variability by displaying abrupt temporal changes in data distributions. Results were presented for the 15 most common inpatient conditions (CCS) in Portugal. MAIN FINDINGS: Funnel plot assessment allowed the detection of several outlying hospitals whose SHRs were much lower or higher than expected. Adjusting SHR for hospital characteristics, beyond age and sex, considerably affected the degree of multisource variability for most diseases. Overall, probability distributions changed over time for most diseases, although heterogeneously.
Abrupt temporal changes in data distributions for acute myocardial infarction and congestive heart failure coincided with the periods comprising the transition to the International Classification of Diseases, 10th revision, Clinical Modification, whereas changes in the Diagnosis-Related Groups software seem to have driven changes in data distributions for both acute myocardial infarction and liveborn admissions. The analysis of heat maps also allowed the detection of several discontinuities at hospital level over time, in some cases also coinciding with the aforementioned factors. CONCLUSIONS: This paper described the successful application of a set of reproducible, generalizable and systematic methods for variability assessment, including visualization tools that can be useful for detecting abnormal patterns in healthcare data, also addressing some limitations of common approaches. The presented method for multisource variability assessment is based on SPC, which is an advantage considering the lack of a gold standard for such a process. Properly controlling for hospital characteristics and differences in case-mix when estimating SHR is critical for isolating data quality-related variability among data sources. The use of IGT plots provides an advantage over common methods for temporal variability assessment due to its suitability for multitype and multimodal data, which are common characteristics of healthcare data. The novelty of this work is the use of a set of methods to discover new data quality insights in healthcare data.
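The funnel-plot idea described in this abstract can be illustrated with a minimal sketch. It assumes that observed admissions follow a Poisson distribution around the expected count, so that the SHR (observed/expected) has a standard deviation of about 1/sqrt(expected), and it uses a normal approximation for the control limits. The abstract does not state how the limits were computed, so the function names, the 1.96 threshold, and the example counts are all hypothetical.

```python
import math


def funnel_limits(expected, z=1.96):
    """Approximate control limits for an SHR whose target value is 1.

    Assumes Poisson-distributed observed counts, so sd(SHR) ~ 1/sqrt(expected).
    """
    half_width = z / math.sqrt(expected)
    return max(0.0, 1.0 - half_width), 1.0 + half_width


def flag_hospitals(counts, z=1.96):
    """counts maps a hospital id to (observed, expected) admissions."""
    flags = {}
    for hospital, (observed, expected) in counts.items():
        shr = observed / expected
        lower, upper = funnel_limits(expected, z)
        if shr < lower:
            flags[hospital] = "low"
        elif shr > upper:
            flags[hospital] = "high"
        else:
            flags[hospital] = "in control"
    return flags


# Hypothetical counts, for illustration only.
example = {"H1": (130, 100.0), "H2": (95, 100.0), "H3": (30, 50.0)}
print(flag_hospitals(example))
```

Because the limits narrow as the expected count grows, a small hospital needs a larger relative deviation than a large one before it is flagged, which is the property that gives the plot its funnel shape.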
Subject(s)
Data Accuracy , Myocardial Infarction , Humans , Portugal , Hospitals , Hospitalization
ABSTRACT
Patient Trajectories (PTs) are a method of representing the temporal evolution of patients. They can include information from different sources and be used in socio-medical or clinical domains. PTs have generally been used to generate and study the most common trajectories in, for instance, the development of a disease. On the other hand, healthcare predictive models generally rely on static snapshots of patient information. Only a few works on prediction in healthcare have been found that use PTs and therefore benefit from their temporal dimension. All of them, however, have used PTs created from single-source information. Therefore, the use of longitudinal multi-scale data to build PTs and use them to obtain predictions about health conditions is yet to be explored. Our hypothesis is that local similarities on small chunks of PTs can identify similar patients concerning their future morbidities. The objectives of this work are (1) to develop a methodology to identify local similarities between PTs before the occurrence of morbidities to predict these on new query individuals; and (2) to validate this methodology on risk prediction of cardiovascular disease (CVD) occurrence in patients with diabetes. We have proposed a novel formal definition of PTs based on sequences of longitudinal multi-scale data. Moreover, a dynamic programming methodology to identify local alignments on PTs for predicting future morbidities is proposed. Both the proposed methodology for PT definition and the alignment algorithm are generic enough to be applied to any clinical domain. We validated this solution for predicting CVD in patients with diabetes, achieving a precision of 0.33, a recall of 0.72, and a specificity of 0.38. Therefore, in the diabetes use case, the proposed solution can be of utmost utility for secondary screening.
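As an illustration of what a dynamic programming local alignment looks like, the following sketch applies the classic Smith-Waterman recurrence to two sequences of event codes. This is a stand-in technique, not the authors' algorithm: the abstract does not specify their scoring scheme or how multi-scale data are encoded, so the scoring values and the event labels below are hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over two sequences of hashable events.

    Returns the best local score and the (i, j) cell where it ends,
    using 1-based positions into a and b.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,                    # start a new local alignment
                h[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                h[i - 1][j] + gap,    # gap in b
                h[i][j - 1] + gap,    # gap in a
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    return best, best_pos


# Hypothetical trajectories encoded as ordered event labels.
query = ["HbA1c_high", "hypertension", "statin", "chest_pain"]
reference = ["obesity", "HbA1c_high", "hypertension", "statin", "CVD"]
print(local_alignment_score(query, reference))
```

Resetting negative scores to zero is what makes the alignment local: a short, well-matched chunk of two trajectories can score highly even when the rest of the sequences differ.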
Subject(s)
Algorithms , Cardiovascular Diseases , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/epidemiology , Humans , Morbidity
ABSTRACT
BACKGROUND: Glioblastoma (GBM) is the most aggressive primary brain tumor, characterized by a heterogeneous and abnormal vascularity. Subtypes of vascular habitats within the tumor and edema can be distinguished: high angiogenic tumor (HAT), low angiogenic tumor (LAT), infiltrated peripheral edema (IPE), and vasogenic peripheral edema (VPE). PURPOSE: To validate the association between hemodynamic markers from vascular habitats and overall survival (OS) in glioblastoma patients, considering the intercenter variability of acquisition protocols. STUDY TYPE: Multicenter retrospective study. POPULATION: In all, 184 glioblastoma patients from seven European centers participating in the NCT03439332 clinical study. FIELD STRENGTH/SEQUENCE: 1.5T (for 54 patients) or 3.0T (for 130 patients). Pregadolinium and postgadolinium-based contrast agent-enhanced T1-weighted MRI, T2- and FLAIR T2-weighted, and dynamic susceptibility contrast (DSC) T2* perfusion. ASSESSMENT: We analyzed preoperative MRIs to establish the association between the maximum relative cerebral blood volume (rCBVmax) at each habitat with OS. Moreover, the stratification capabilities of the markers to divide patients into "vascular" groups were tested. The variability in the markers between individual centers was also assessed. STATISTICAL TESTS: Uniparametric Cox regression; Kaplan-Meier test; Mann-Whitney test. RESULTS: The rCBVmax derived from the HAT, LAT, and IPE habitats were significantly associated with patient OS (P < 0.05; hazard ratio [HR]: 1.05, 1.11, 1.28, respectively). Moreover, these markers can stratify patients into "moderate-" and "high-vascular" groups (P < 0.05). The Mann-Whitney test did not find significant differences among most of the centers in markers (HAT: P = 0.02-0.685; LAT: P = 0.010-0.769; IPE: P = 0.093-0.939; VPE: P = 0.016-1.000).
DATA CONCLUSION: The rCBVmax calculated in HAT, LAT, and IPE habitats have been validated as clinically relevant prognostic biomarkers for glioblastoma patients in the pretreatment stage. This study demonstrates the robustness of the hemodynamic tissue signature (HTS) habitats to assess the GBM vascular heterogeneity and their association with patient prognosis independently of intercenter variability. LEVEL OF EVIDENCE: 3 Technical Efficacy Stage: 2 J. Magn. Reson. Imaging 2020;51:1478-1486.
Subject(s)
Brain Neoplasms , Glioblastoma , Brain Neoplasms/diagnostic imaging , Contrast Media , Glioblastoma/diagnostic imaging , Humans , Magnetic Resonance Imaging , Prognosis , Retrospective Studies
ABSTRACT
BACKGROUND: Migraine is a heterogeneous condition with multiple clinical manifestations. Machine learning algorithms permit the identification of population groups, providing analytical advantages over other modeling techniques. OBJECTIVE: The aim of this study was to analyze critical features that permit the differentiation of subgroups of patients with migraine according to the intensity and frequency of attacks by using machine learning algorithms. METHODS: Sixty-seven women with migraine participated. Clinical features of migraine, related disability (Migraine Disability Assessment Scale), anxiety/depressive levels (Hospital Anxiety and Depression Scale), anxiety state/trait levels (State-Trait Anxiety Inventory), and pressure pain thresholds (PPTs) over the temporalis, neck, second metacarpal, and tibialis anterior were collected. Physical examination included the flexion-rotation test, cervical range of motion, forward head position while sitting and standing, passive accessory intervertebral movements (PAIVMs) with headache reproduction, and joint positioning sense error. Subgrouping was based on machine learning algorithms by using the nearest neighbors algorithm, multisource variability assessment, and random forest model. RESULTS: For migraine intensity, group 2 (women with a regular migraine headache intensity score of 7 on an 11-point Numeric Pain Rating Scale [where 0 = no pain and 10 = maximum pain]) were younger and had lower joint positioning sense error in cervical rotation, greater cervical mobility in rotation and flexion, lower flexion-rotation test scores, positive PAIVMs reproducing migraine, normal PPTs over the tibialis anterior, shorter migraine history, and lower cranio-vertebral angles while standing than the remaining migraine intensity subgroups. The most discriminative variable was the flexion-rotation test score of the symptomatic side.
For migraine frequency, no model was able to identify differences between groups (ie, patients with episodic or chronic migraine). CONCLUSIONS: A subgroup of women with migraine who had common migraine intensity was identified with machine learning algorithms.
Subject(s)
Machine Learning , Migraine Disorders/classification , Physical Examination/methods , Adult , Disability Evaluation , Female , Humans , Middle Aged , Migraine Disorders/physiopathology
ABSTRACT
The aim of this work was to develop a new unsupervised exploratory method of characterizing feature extraction and detecting similarity of movement during sleep through actigraphy signals. We here propose some algorithms, based on signal bispectrum and bispectral entropy, to determine the unique features of independent actigraphy signals. Experiments were carried out on 20 randomly chosen actigraphy samples of the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) database, with no information other than their aperiodicity. The Pearson correlation coefficient matrix and the histogram correlation matrix were computed to study the similarity of movements during sleep. The results obtained allowed us to explore the connections between certain sleep actigraphy patterns and certain pathologies.
Subject(s)
Actigraphy/methods , Algorithms , Entropy , Hispanic or Latino , Movement , Sleep/physiology , Adolescent , Adult , Humans , Middle Aged , Signal Processing, Computer-Assisted , Young Adult
ABSTRACT
Background: The increasing prevalence of electronic health records (EHRs) in healthcare systems globally has underscored the importance of data quality for clinical decision-making and research, particularly in obstetrics. High-quality data is vital for an accurate representation of patient populations and to avoid erroneous healthcare decisions. However, existing studies have highlighted significant challenges in EHR data quality, necessitating innovative tools and methodologies for effective data quality assessment and improvement. Objective: This article addresses the critical need for data quality evaluation in obstetrics by developing a novel tool. The tool utilizes Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) standards in conjunction with Bayesian networks and expert rules, offering a novel approach to assessing data quality in real-world obstetrics data. Methods: A harmonized framework focusing on completeness, plausibility, and conformance underpins our methodology. We employed Bayesian networks for advanced probabilistic modeling, integrated outlier detection methods, and a rule-based system grounded in domain-specific knowledge. The development and validation of the tool were based on obstetrics data from 9 Portuguese hospitals, spanning the years 2019-2020. Results: The developed tool demonstrated strong potential for identifying data quality issues in obstetrics EHRs. Bayesian networks used in the tool showed high performance for various features, with area under the receiver operating characteristic curve (AUROC) between 75% and 97%. The tool's infrastructure and interoperable format as a FHIR Application Programming Interface (API) enable a possible deployment of real-time data quality assessment in obstetrics settings. Our initial assessments show promise: even when compared with physicians' assessment of real records, the tool can reach an AUROC of 88%, depending on the threshold defined.
Discussion: Our results also show that obstetrics clinical records are difficult to assess in terms of quality and assessments like ours could benefit from more categorical approaches of ranking between bad and good quality. Conclusion: This study contributes significantly to the field of EHR data quality assessment, with a specific focus on obstetrics. The combination of HL7-FHIR interoperability, machine learning techniques, and expert knowledge presents a robust, adaptable solution to the challenges of healthcare data quality. Future research should explore tailored data quality evaluations for different healthcare contexts, as well as further validation of the tool capabilities, enhancing the tool's utility across diverse medical domains.
ABSTRACT
The aim of this work is to develop and evaluate a deep classifier that can effectively prioritize Emergency Medical Call Incidents (EMCI) according to their life-threatening level under the presence of dataset shifts. We utilized a dataset consisting of 1982746 independent EMCI instances obtained from the Health Services Department of the Region of Valencia (Spain), with a time span from 2009 to 2019 (excluding 2013). The dataset includes free text dispatcher observations recorded during the call, as well as a binary variable indicating whether the event was life-threatening. To evaluate the presence of dataset shifts, we examined prior probability shifts, covariate shifts, and concept shifts. Subsequently, we designed and implemented four deep Continual Learning (CL) strategies-cumulative learning, continual fine-tuning, experience replay, and synaptic intelligence-alongside three deep CL baselines-joint training, static approach, and single fine-tuning-based on DistilBERT models. Our results demonstrated evidence of prior probability shifts, covariate shifts, and concept shifts in the data. Applying CL techniques had a statistically significant (α=0.05) positive impact on both backward and forward knowledge transfer, as measured by the F1-score, compared to non-continual approaches. We can argue that the utilization of CL techniques in the context of EMCI is effective in adapting deep learning classifiers to changes in data distributions, thereby maintaining the stability of model performance over time. To our knowledge, this study represents the first exploration of a CL approach using real EMCI data.
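One simple way to look for the prior probability shift mentioned in this abstract is to compare the label distribution of each period against a reference period. The sketch below does this with the Jensen-Shannon divergence (base 2, bounded in [0, 1]). The abstract does not say which measure the authors used, so this choice, the function names, and the toy labels are assumptions made for illustration.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def label_distribution(labels):
    """Distribution [P(y=0), P(y=1)] of a list of binary labels."""
    positive = sum(labels) / len(labels)
    return [1 - positive, positive]


def prior_shift_by_period(labels_by_period, reference):
    """Divergence of each period's label distribution from the reference period."""
    ref = label_distribution(labels_by_period[reference])
    return {
        period: js_divergence(ref, label_distribution(labels))
        for period, labels in labels_by_period.items()
    }


# Hypothetical life-threatening flags per year, for illustration only.
toy = {"2009": [0, 0, 0, 1], "2019": [0, 1, 1, 1]}
print(prior_shift_by_period(toy, reference="2009"))
```

A value near 0 suggests the class balance is stable relative to the reference period, while larger values indicate that the proportion of life-threatening incidents has moved.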
Subject(s)
Deep Learning , Humans , Databases, Factual , Spain , Emergency Medical Services
ABSTRACT
(1) Background: Our aim was to determine changes in the prevalence of physical activity (PA) in adults with asthma between 2014 and 2020 in Spain, investigate sex differences and the effect of other variables on adherence to PA, and compare the prevalence of PA between individuals with and without asthma. (2) Methods: This study was a cross-sectional, population-based, matched, case-control study using European Health Interview Surveys for Spain (EHISS) for 2014 and 2020. (3) Results: We identified 1262 and 1103 patients with asthma in the 2014 and 2020 EHISS, respectively. The prevalence of PA remained stable (57.2% vs. 55.7%, respectively), while the percentage of persons who reported walking continuously for at least 2 days a week increased from 73.9% to 82.2% (p < 0.001). Male sex, younger age, better self-rated health, and lower body mass index (BMI) were significantly associated with greater PA. From 2014 to 2020, the number of walking days ≥2 increased by 64% (OR 1.64; 95% CI 1.34-2.00). Asthma was associated with less PA (OR 0.87; 95% CI 0.47-0.72) and a lower number of walking days ≥2 (OR 0.84; 95% CI 0.72-0.97). (4) Conclusions: Walking frequency improved over time among people with asthma. Differences in PA were detected by age, sex, self-rated health status, and BMI. Asthma was associated with less LTPA and a lower number of walking days ≥2.
ABSTRACT
BACKGROUND AND OBJECTIVE: Reusing Electronic Health Records (EHRs) for Machine Learning (ML) leads on many occasions to extremely incomplete and sparse tabular datasets, which can hinder the model development processes and limit their performance and generalization. In this study, we aimed to characterize the most effective data imputation techniques and ML models for dealing with highly missing numerical data in EHRs, in the case where only a very limited number of data are complete, as opposed to the usual case of having a reduced number of missing values. METHODS: We used a case study including full blood count laboratory data, demographic and survival data in the context of COVID-19 hospital admissions and evaluated 30 processing pipelines combining imputation methods with ML classifiers. The imputation methods included missing mask, translation and encoding, mean imputation, k-nearest neighbors' imputation, Bayesian ridge regression imputation and generative adversarial imputation networks. The classifiers included k-nearest neighbors, logistic regression, random forest, gradient boosting and deep multilayer perceptron. RESULTS: Our results suggest that in the presence of highly missing data, combining translation and encoding imputation-which considers informative missingness-with tree ensemble classifiers-random forest and gradient boosting-is a sensible choice when aiming to maximize performance, in terms of area under curve. CONCLUSIONS: Based on our findings, we recommend the consideration of this imputer-classifier configuration when constructing models in the presence of extremely incomplete numerical data in EHR.
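To make the idea of informative missingness concrete, the sketch below combines two of the imputation methods listed in this abstract, mean imputation and a missing mask, so that a downstream classifier sees both a filled-in value and a flag saying the value was absent. It is not the "translation and encoding" imputer, whose details the abstract does not give. The function name and the toy values are hypothetical.

```python
def mean_impute_with_mask(rows):
    """Impute None entries with the column mean and append 0/1 missingness flags.

    rows: list of equal-length lists of numbers, with None for missing values.
    Returns one list per row: imputed values followed by one indicator per column.
    """
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        # Fall back to 0.0 when a column has no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)

    out = []
    for r in rows:
        values = [means[c] if r[c] is None else r[c] for c in range(n_cols)]
        mask = [1 if r[c] is None else 0 for c in range(n_cols)]
        out.append(values + mask)
    return out


# Hypothetical laboratory values (two features), for illustration only.
rows = [[5.0, None], [7.0, 200.0], [None, 300.0]]
print(mean_impute_with_mask(rows))
```

Tree ensembles can split directly on the indicator columns, which is one plausible reason why imputers that preserve missingness information pair well with random forests and gradient boosting.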
Subject(s)
Algorithms , COVID-19 , Humans , Electronic Health Records , Bayes Theorem , Machine Learning
ABSTRACT
OBJECTIVES: To evaluate trends in the prevalence of physical activity (PA) from 2014 to 2020; to identify sex differences and sociodemographic and health-related factors associated with PA in individuals with chronic obstructive pulmonary disease (COPD); and to compare PA between individuals with and without COPD. METHODS: Cross-sectional and case-control study. SOURCE: European Health Interview Surveys for Spain (EHISS) conducted in 2014 and 2020. We included sociodemographic and health-related covariates. We compared individuals with and without COPD after matching for age and sex. RESULTS: The number of adults with COPD was 1086 and 910 in EHISS2014 and EHISS2020, respectively. In this population, self-reported "Medium or high frequency of PA" remained stable (42.9% in 2014 and 43.5% in 2020; p = 0.779). However, the percentage who walked on two or more days per week rose significantly over time (63.4%-69.9%; p = 0.004). Men with COPD reported more PA than women with COPD in both surveys. After matching, significantly lower levels of PA were recorded in COPD patients than in adults without COPD. Multivariable logistic regression confirmed this trend in COPD patients and showed that male sex, younger age, higher educational level, very good/good self-perceived health, and absence of comorbidities, obesity, and smoking were associated with more frequent PA. CONCLUSIONS: The temporal trend in PA among Spanish adults with COPD is favorable, although there is much room for improvement. Insufficient PA is more prevalent in these patients than in the general population. Sex differences were found, with significantly more frequent PA among males with COPD.
Subject(s)
Pulmonary Disease, Chronic Obstructive , Sex Characteristics , Adult , Humans , Male , Female , Case-Control Studies , Spain/epidemiology , Cross-Sectional Studies , Pulmonary Disease, Chronic Obstructive/epidemiology , Pulmonary Disease, Chronic Obstructive/complications , Exercise
ABSTRACT
(1) Background: We aim to assess the time trend from 2014 to 2020 in the prevalence of physical activity (PA), identify gender differences and sociodemographic and health-related factors associated with PA among people with diabetes, and compare PA between people with and without diabetes. (2) Methods: We conducted a cross-sectional and a case-control study using as data source the European Health Interview Surveys for Spain (EHISS) conducted in 2014 and 2020. The presence of diabetes and PA were self-reported. Covariates included socio-demographic characteristics, health-related variables, and lifestyles. To compare people with and without diabetes, we matched individuals by age and sex. (3) Results: The number of participants aged ≥18 years with self-reported diabetes was 1852 and 1889 in the EHISS2014 and EHISS2020, respectively. The proportion of people with diabetes who had a medium or high frequency of PA improved from 48.3% in 2014 to 52.6% in 2020 (p = 0.009), with 68.5% in 2014 and 77.7% in 2020 being engaged in two or more days of PA (p < 0.001). Males with diabetes reported more PA than females with diabetes in both surveys. After matching by age and gender, participants with diabetes showed significantly lower engagement in PA than those without diabetes. Among adults with diabetes, multivariable logistic regression confirmed that PA improved significantly from 2014 to 2020 and that male sex, higher educational level, and better self-rated health were associated with more PA. However, self-reported comorbidities, smoking, or BMI > 30 were associated with less PA. (4) Conclusions: The time trend of PA among Spanish adults with diabetes is favorable but insufficient. The prevalence of PA in this diabetes population is low and does not reach the levels of the general population. Gender differences were found, with significantly more PA among males with diabetes.
Our results could help to improve the design and implementation of public health strategies to increase PA among people with diabetes.
ABSTRACT
Background: Multisystem inflammatory syndrome in children (MIS-C) is a severe complication of SARS-CoV-2 infection. It remains unclear how MIS-C phenotypes vary across SARS-CoV-2 variants. We aimed to investigate clinical characteristics and outcomes of MIS-C across SARS-CoV-2 eras. Methods: We performed a multicentre observational retrospective study including seven paediatric hospitals in four countries (France, Spain, U.K., and U.S.). All consecutive confirmed patients with MIS-C hospitalised between February 1st, 2020, and May 31st, 2022, were included. Electronic Health Records (EHR) data were used to calculate pooled risk differences (RD) and effect sizes (ES) at site level, using Alpha as reference. Meta-analysis was used to pool data across sites. Findings: Of 598 patients with MIS-C (61% male, 39% female; mean age 9.7 years [SD 4.5]), 383 (64%) were admitted in the Alpha era, 111 (19%) in the Delta era, and 104 (17%) in the Omicron era. Compared with patients admitted in the Alpha era, those admitted in the Delta era were younger (ES -1.18 years [95% CI -2.05, -0.32]), had fewer respiratory symptoms (RD -0.15 [95% CI -0.33, -0.04]), less frequent non-cardiogenic shock or systemic inflammatory response syndrome (SIRS) (RD -0.35 [95% CI -0.64, -0.07]), lower lymphocyte count (ES -0.16 × 10^9/uL [95% CI -0.30, -0.01]), lower C-reactive protein (ES -28.5 mg/L [95% CI -46.3, -10.7]), and lower troponin (ES -0.14 ng/mL [95% CI -0.26, -0.03]). Patients admitted in the Omicron versus Alpha eras were younger (ES -1.6 years [95% CI -2.5, -0.8]), had less frequent SIRS (RD -0.18 [95% CI -0.30, -0.05]), lower lymphocyte count (ES -0.39 × 10^9/uL [95% CI -0.52, -0.25]), lower troponin (ES -0.16 ng/mL [95% CI -0.30, -0.01]) and less frequently received anticoagulation therapy (RD -0.19 [95% CI -0.37, -0.04]). Length of hospitalization was shorter in the Delta versus Alpha eras (-1.3 days [95% CI -2.3, -0.4]).
Interpretation: Our study suggested that MIS-C clinical phenotypes varied across SARS-CoV-2 eras, with patients in Delta and Omicron eras being younger and less sick. EHR data can be effectively leveraged to identify rare complications of pandemic diseases and their variation over time. Funding: None.
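The pooling of site-level risk differences described in this abstract can be sketched with a fixed-effect, inverse-variance meta-analysis. Whether the authors used a fixed-effect or a random-effects model is not stated, so the model choice here is an assumption, and the function names and counts are hypothetical.

```python
import math


def risk_difference(events_a, n_a, events_b, n_b):
    """Risk difference (group a minus group b) and its large-sample variance."""
    p_a, p_b = events_a / n_a, events_b / n_b
    rd = p_a - p_b
    var = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    return rd, var


def pool_fixed_effect(estimates, z=1.96):
    """Inverse-variance pooling of (effect, variance) pairs from several sites.

    Returns the pooled effect and its approximate 95% confidence interval.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return pooled, (pooled - z * se, pooled + z * se)


# Hypothetical site-level counts: (events, n) in a later era vs. the reference era.
sites = [risk_difference(10, 100, 20, 100), risk_difference(5, 40, 15, 60)]
print(pool_fixed_effect(sites))
```

Each site contributes in proportion to the precision of its own estimate, so only summary statistics, not patient-level records, need to be combined across hospitals.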
ABSTRACT
Low biomedical Data Quality (DQ) leads to poor decisions, which may affect the care process or the results of evidence-based studies. Most current approaches to DQ overlook the shifting behaviour of the concepts underlying the data and its relation to DQ. There is also no agreement on a common set of DQ dimensions or on how they interact and relate to these shifts. In this paper we propose an organization of biomedical DQ assessment based on these concepts, identifying characteristics and requirements which will facilitate future research. As a result, we define the Data Quality Vector, compiling a unified set of DQ dimensions (completeness, consistency, duplicity, correctness, timeliness, spatial stability, contextualization, predictive value and reliability), as the foundation for the further development of DQ assessment algorithms and platforms.
Subject(s)
Databases, Factual , Forms and Records Control/standards , Health Information Systems/standards , Information Storage and Retrieval/standards , Quality Assurance, Health Care/methods , Quality Assurance, Health Care/standards , Research Design/standards , Spain
ABSTRACT
The objective of this work was to discover key topics latent in free text dispatcher observations registered during emergency medical calls. We used a total of 1374931 independent retrospective cases from the Valencian emergency medical dispatch service in Spain, from 2014 to 2019. Text fields were preprocessed to reduce vocabulary size and filter noise, removing accent and punctuation marks, along with uninformative and infrequent words. Key topics were inferred from the multinomial probabilities over words conditioned on each topic from a Latent Dirichlet Allocation model, trained following an online mini-batch variational approach. The optimal number of topics was set by analyzing the values of a topic coherence measure, based on the normalized pointwise mutual information, across multiple validation K-folds. Our results support the presence of 15 key topics latent in free text dispatcher observations, related to: ambulance request; chest pain and heart attack; respiratory distress; head falls and blows; fever, chills, vomiting and diarrhea; heart failure; syncope; limb injuries; public service body request; thoracic and abdominal pain; stroke and blood pressure abnormalities; pill intake; diabetes; bleeding; consciousness. The discovery of these topics implies the automatic characterization of a huge volume of complex unstructured data containing relevant information linked to emergency medical call incidents. Hence, results from this work could lead to the update of structured emergency triage algorithms to directly include this latent information in the triage process, resulting in a positive impact on patient wellbeing and health services sustainability.
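The coherence measure mentioned in this abstract is based on normalized pointwise mutual information (NPMI). A minimal sketch of one common formulation follows: for each pair of a topic's top words, NPMI = log(p(w1, w2) / (p(w1) p(w2))) / (-log p(w1, w2)), averaged over pairs, with probabilities estimated from document-level co-occurrence. The exact variant the authors used is not given, so the estimation scheme, the edge-case conventions, and the toy documents are assumptions.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of top words.

    documents: list of sets of tokens; probabilities are document frequencies.
    """
    n = len(documents)

    def prob(*words):
        return sum(all(w in doc for w in words) for doc in documents) / n

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: NPMI lower bound
        elif p12 == 1:
            scores.append(0.0)   # convention: words in every document carry no information
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical preprocessed call observations, for illustration only.
docs = [
    {"chest", "pain", "heart"},
    {"chest", "pain"},
    {"fever", "vomiting"},
    {"fever", "chills"},
]
print(npmi_coherence(["chest", "pain"], docs))
print(npmi_coherence(["chest", "fever"], docs))
```

Scores range from -1 (words never appear together) to 1 (words always appear together), so averaging them per topic and then across topics gives a single value that can be compared between candidate numbers of topics.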
Subject(s)
Emergency Medical Dispatch , Emergency Medical Services , Ambulances , Emergency Medical Service Communication Systems , Humans , Retrospective Studies , Triage
ABSTRACT
The pharmaceutical industry is a data-intensive environment and a heavily-regulated sector, where exhaustive audits and inspections are performed to ensure the safety of drugs. In this context, processing and evaluating the data generated in the manufacturing lines is a relevant challenge since it requires compliance with pharma regulations. This work combines data integrity metrics and blockchain technology to evaluate the compliance-degree of ALCOA+ principles among different levels of drug manufacturing data. We propose the DIALCOA tool, a software to assess the compliance-degree for each ALCOA+ principle, based on the assessment of data from manufacturing batch reports and its different levels of information.
Subject(s)
Blockchain , Drug Industry , Commerce , Technology
ABSTRACT
BACKGROUND: The COVID-19 pandemic has led to an unprecedented global health care challenge for both medical institutions and researchers. Recognizing different COVID-19 subphenotypes (the division of patient populations into more meaningful subgroups driven by clinical features) and characterizing their severity may assist clinicians during the clinical course, the vaccination process, research efforts, the surveillance system, and the allocation of limited resources. OBJECTIVE: We aimed to discover age-sex unbiased COVID-19 patient subphenotypes based on phenotypical data that are easily available before admission, such as pre-existing comorbidities, lifestyle habits, and demographic features, and to study the potential early severity stratification capabilities of the discovered subgroups by characterizing their severity patterns, including prognostic, intensive care unit (ICU), and morbimortality outcomes. METHODS: We used the Mexican Government COVID-19 open data, comprising 778,692 population-based, patient-level SARS-CoV-2 records as of September 2020. We applied a meta-clustering technique consisting of a 2-stage clustering approach that combines dimensionality reduction (ie, principal components analysis and multiple correspondence analysis) with hierarchical clustering using the Ward minimum variance method with squared Euclidean distance. RESULTS: In the independent age-sex clustering analyses, 56 clusters supported 11 clinically distinguishable meta-clusters (MCs). MCs 1-3 showed high recovery rates (90.27%-95.22%) and included healthy patients of all ages; children with comorbidities who had priority in receiving medical resources (ie, higher rates of hospitalization, intubation, and ICU admission) compared with adult subgroups with similar conditions; and young obese smokers. MCs 4-5 showed moderate recovery rates (81.30%-82.81%) and included patients of all ages with hypertension or diabetes and obese patients with pneumonia, hypertension, and diabetes. MCs 6-11 showed low recovery rates (53.96%-66.94%) and included immunosuppressed patients with high comorbidity rates; patients with chronic kidney disease with a short survival length and low probability of recovery; older smokers with chronic obstructive pulmonary disease; older adults with severe diabetes and hypertension; and the oldest obese smokers with chronic obstructive pulmonary disease and mild cardiovascular disease. Group outcomes conformed to the recent literature on dedicated age-sex groups. Mexican states and several types of clinical institutions showed relevant heterogeneity regarding severity, potentially linked to socioeconomic or health inequalities. CONCLUSIONS: The proposed 2-stage cluster analysis methodology produced a discriminative characterization of the sample and explainability over age and sex. These results can potentially help in understanding patients' clinical profiles and in stratifying patients for automated early triage before further tests and laboratory results are available, even in locations where additional tests are not available, or in deciding resource allocation among vulnerable subgroups, such as prioritizing vaccination or treatments.
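The hierarchical clustering step named in METHODS (Ward minimum variance with squared Euclidean distance) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the points have already been reduced to numeric components, it omits the dimensionality reduction stage, and the function name ward_clusters and the toy points are hypothetical.

```python
from itertools import combinations


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    points: list of equal-length numeric tuples (hypothetical input, e.g.
            component scores from a prior dimensionality reduction step).
    k:      number of clusters at which merging stops.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    # Every point starts as its own cluster: (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b:
        # n_a * n_b / (n_a + n_b) * squared Euclidean distance of centroids.
        (ia, ca), (ib, cb) = a, b
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(ia) * len(ib) / (len(ia) + len(ib)) * d2

    while len(clusters) > k:
        # Naive search over all pairs; fine for a sketch, slow for large data.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        (ia, ca), (ib, cb) = clusters[i], clusters[j]
        na, nb = len(ia), len(ib)
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        # i < j, so delete j first to keep index i valid.
        del clusters[j]
        del clusters[i]
        clusters.append((ia + ib, centroid))

    return sorted(sorted(idx) for idx, _ in clusters)


# Toy data (made up): two tight pairs and one distant point.
toy = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
groups = ward_clusters(toy, 3)
```

For data of the size reported in the abstract, a library routine would be the practical choice; the naive pairwise search above is only meant to make the merge criterion explicit.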
Subject(s)
COVID-19 , Aged , COVID-19/epidemiology , Child , Cluster Analysis , Humans , Intensive Care Units , Pandemics , SARS-CoV-2
ABSTRACT
OBJECTIVE: To identify sex-related differences and define female-enriched autism spectrum disorder (ASD) comorbidities through a comprehensive multi-PheWAS intersection approach on big, real-world data. Although sex difference is a consistent and recognized feature of ASD, additional clinical correlates could help to identify potential disease subgroups based on sex and age. MATERIALS AND METHODS: We performed a systematic comorbidity analysis on 1860 groups of comorbidities, spanning the full spectrum of known diseases, in 59 140 individuals (11 440 females) with ASD from 4 age groups. We explored ASD sex differences in 2 independent real-world datasets, across all potential comorbidities, by comparing (1) females with ASD vs males with ASD and (2) females with ASD vs females without ASD. RESULTS: We identified 27 different comorbidities that appeared significantly more frequently in females with ASD. The comorbidities were mostly neurological (eg, epilepsy, odds ratio [OR] > 1.8, 3-18 years of age), congenital (eg, chromosomal anomalies, OR > 2, 3-18 years of age), and mental disorders (eg, intellectual disability, OR > 1.7, 6-18 years of age). Novel comorbidities included endocrine metabolic diseases (eg, failure to thrive, OR = 2.5, ages 0-2), digestive disorders (gastroesophageal reflux disease: OR = 1.7, 6-11 years of age; and constipation: OR > 1.6, 3-11 years of age), and sense organ disorders (strabismus: OR > 1.8, 3-18 years of age). DISCUSSION: A multi-PheWAS intersection approach on real-world data, as presented in this study, uniquely contributes to the growing body of research on sex-based comorbidity analysis in the ASD population. CONCLUSIONS: Our findings provide insights into female-enriched ASD comorbidities that are potentially important in diagnosis, as well as in the identification of distinct comorbidity patterns influencing anticipatory treatment or referrals. The code is publicly available (https://github.com/hms-dbmi/sexDifferenceInASD).
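The odds ratios reported above compare how often a comorbidity appears in one group versus another. The abstract does not state how the ORs were estimated, so the following is only a minimal sketch of the textbook calculation from a 2x2 table with a Wald (Woolf) confidence interval; the function name odds_ratio and the counts are hypothetical and are not taken from the study or its repository.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: group 1 with the comorbidity      b: group 1 without it
    c: group 2 with the comorbidity      d: group 2 without it
    z: normal quantile for the interval (1.96 gives roughly 95%).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Made-up counts: 20 of 100 in group 1 vs 10 of 100 in group 2.
estimate, lower, upper = odds_ratio(20, 80, 10, 90)
```

An OR above 1 with an interval that excludes 1 is what "appeared significantly more frequently" usually means in this setting, although a PheWAS would normally also adjust for multiple comparisons.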
Subject(s)
Autism Spectrum Disorder , Sex Characteristics , Autism Spectrum Disorder/epidemiology , Child , Child, Preschool , Comorbidity , Female , Humans , Infant , Infant, Newborn , Male , Odds Ratio , Prevalence
ABSTRACT
Importance: The COVID-19 pandemic has been associated with an increase in mental health diagnoses among adolescents, though the extent of the increase, particularly for severe cases requiring hospitalization, has not been well characterized. Large-scale federated informatics approaches provide the ability to efficiently and securely query health care data sets to assess and monitor hospitalization patterns for mental health conditions among adolescents. Objective: To estimate changes in the proportion of hospitalizations associated with mental health conditions among adolescents following onset of the COVID-19 pandemic. Design, Setting, and Participants: This retrospective, multisite cohort study of adolescents 11 to 17 years of age who were hospitalized with at least 1 mental health condition diagnosis between February 1, 2019, and April 30, 2021, used patient-level data from electronic health records of 8 children's hospitals in the US and France. Main Outcomes and Measures: Change in the monthly proportion of mental health condition-associated hospitalizations between the prepandemic (February 1, 2019, to March 31, 2020) and pandemic (April 1, 2020, to April 30, 2021) periods using interrupted time series analysis. Results: There were 9696 adolescents hospitalized with a mental health condition during the prepandemic period (5966 [61.5%] female) and 11 101 during the pandemic period (7603 [68.5%] female). The mean (SD) age in the prepandemic cohort was 14.6 (1.9) years and in the pandemic cohort, 14.7 (1.8) years. The most prevalent diagnoses during the pandemic were anxiety (6066 [57.4%]), depression (5065 [48.0%]), and suicidality or self-injury (4673 [44.2%]). There was an increase in the proportions of monthly hospitalizations during the pandemic for anxiety (0.55%; 95% CI, 0.26%-0.84%), depression (0.50%; 95% CI, 0.19%-0.79%), and suicidality or self-injury (0.38%; 95% CI, 0.08%-0.68%). 
There was an estimated 0.60% increase (95% CI, 0.31%-0.89%) overall in the monthly proportion of mental health-associated hospitalizations following onset of the pandemic compared with the prepandemic period. Conclusions and Relevance: In this cohort study, onset of the COVID-19 pandemic was associated with increased hospitalizations with mental health diagnoses among adolescents. These findings support the need for greater resources within children's hospitals to care for adolescents with mental health conditions during the pandemic and beyond.
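Interrupted time series analysis, named in Main Outcomes and Measures, is often implemented as a segmented regression with a level change and a slope change at the interruption. The abstract does not specify the model used, so the sketch below is an assumption: an ordinary least squares fit of y_t = b0 + b1*t + b2*post_t + b3*(t - start)*post_t on monthly values. The function names and the synthetic series are hypothetical; only the 14 prepandemic and 13 pandemic months follow from the study periods given above.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_its(y, start):
    """Fit a segmented regression to a monthly series.

    y:     list of monthly values (e.g. a proportion in percent).
    start: index of the first post-interruption month.
    Returns [b0, b1, b2, b3]: baseline level, baseline monthly trend,
    level change at the interruption, and change in monthly trend.
    """
    X = [
        [1.0, float(t), float(t >= start), float(max(0, t - start))]
        for t in range(len(y))
    ]
    p = 4
    # Normal equations (X'X) beta = X'y.
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Synthetic, noise-free series: 14 prepandemic months, then 13 pandemic months.
series = [30 + 0.1 * t + (2 + 0.5 * (t - 14) if t >= 14 else 0) for t in range(27)]
coefficients = fit_its(series, start=14)
```

A real analysis would also need standard errors, and usually adjustment for autocorrelation and seasonality, none of which this sketch attempts.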
Subject(s)
COVID-19 , Pandemics , Child , Adolescent , Female , Humans , Male , COVID-19/epidemiology , Mental Health , SARS-CoV-2 , Cohort Studies , Retrospective Studies , Hospitalization
ABSTRACT
OBJECT: This study demonstrates that 3T SV-MRS data can be used with the currently available automatic brain tumour diagnostic classifiers, which were trained on databases of 1.5T spectra. This will allow the existing large databases of 1.5T MRS data to be used for diagnostic classification of 3T spectra, and perhaps also the combination of 1.5T and 3T databases. MATERIALS AND METHODS: Brain tumour classifiers trained with 154 1.5T spectra to discriminate among high grade malignant tumours and common grade II glial tumours were evaluated with a subsequently acquired set of 155 1.5T and 37 3T spectra. A similarity study between spectra and main brain tumour metabolite ratios for both field strengths (1.5T and 3T) was also performed. RESULTS: Our results showed that classifiers trained with 1.5T samples had similar accuracy for both test datasets (0.87 ± 0.03 for 1.5T and 0.88 ± 0.03 for 3T). Moreover, no significant differences were observed for most metabolite ratios and spectral patterns. CONCLUSION: These results encourage the use of existing classifiers based on 1.5T datasets for diagnosis with 3T 1H SV-MRS. The large 1.5T databases compiled over many years and the prediction models based on 1.5T acquisitions can therefore continue to be used with data from the new 3T instruments.