Results 1 - 20 of 41
1.
Lancet Respir Med ; 12(3): 225-236, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38219763

ABSTRACT

BACKGROUND: Although vaccines have proved effective in preventing severe COVID-19, their effect on preventing long-term symptoms is not yet fully understood. We aimed to evaluate the overall effect of vaccination to prevent long COVID symptoms and assess comparative effectiveness of the most used vaccines (ChAdOx1 and BNT162b2). METHODS: We conducted a staggered cohort study using primary care records from the UK (Clinical Practice Research Datalink [CPRD] GOLD and AURUM), Catalonia, Spain (Information System for Research in Primary Care [SIDIAP]), and national health insurance claims from Estonia (CORIVA database). All adults who were registered for at least 180 days as of Jan 4, 2021 (the UK), Feb 20, 2021 (Spain), and Jan 28, 2021 (Estonia) comprised the source population. Vaccination status was used as a time-varying exposure, staggered by vaccine rollout period. Vaccinated people were further classified by vaccine brand according to their first dose received. The primary outcome, long COVID, was defined as having at least one of 25 WHO-listed symptoms between 90 and 365 days after the date of a PCR-positive test or clinical diagnosis of COVID-19, with no history of that symptom 180 days before SARS-CoV-2 infection. Propensity score overlap weighting was applied separately for each cohort to minimise confounding. Sub-distribution hazard ratios (sHRs) were calculated to estimate vaccine effectiveness against long COVID, and empirically calibrated using negative control outcomes. Random effects meta-analyses across staggered cohorts were conducted to pool overall effect estimates. FINDINGS: A total of 1 618 395 (CPRD GOLD), 5 729 800 (CPRD AURUM), 2 744 821 (SIDIAP), and 77 603 (CORIVA) vaccinated people and 1 640 371 (CPRD GOLD), 5 860 564 (CPRD AURUM), 2 588 518 (SIDIAP), and 302 267 (CORIVA) unvaccinated people were included.
Compared with unvaccinated people, overall HRs for long COVID symptoms in people vaccinated with a first dose of any COVID-19 vaccine were 0·54 (95% CI 0·44-0·67) in CPRD GOLD, 0·48 (0·34-0·68) in CPRD AURUM, 0·71 (0·55-0·91) in SIDIAP, and 0·59 (0·40-0·87) in CORIVA. A slightly stronger preventative effect was seen for the first dose of BNT162b2 than for ChAdOx1 (sHR 0·85 [0·60-1·20] in CPRD GOLD and 0·84 [0·74-0·94] in CPRD AURUM). INTERPRETATION: Vaccination against COVID-19 consistently reduced the risk of long COVID symptoms, which highlights the importance of vaccination to prevent persistent COVID-19 symptoms, particularly in adults. FUNDING: National Institute for Health and Care Research.
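
The propensity score overlap weighting named in the METHODS of this abstract can be illustrated with a minimal sketch. It is not the study's code: the function names and toy inputs are hypothetical, and the propensity scores are assumed to have been estimated already. It shows only the standard definition of overlap weights, in which a vaccinated person receives weight 1 - e and an unvaccinated person receives weight e, where e is the estimated probability of vaccination.

```python
def overlap_weights(treated, propensity):
    """Return one overlap weight per person.

    treated:    list of 0/1 flags (1 = vaccinated); hypothetical input.
    propensity: estimated probability of vaccination, strictly in (0, 1).
    """
    if len(treated) != len(propensity):
        raise ValueError("treated and propensity must have the same length")
    weights = []
    for flag, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Vaccinated people are weighted by the chance of NOT being vaccinated,
        # unvaccinated people by the chance of being vaccinated.
        weights.append(1.0 - e if flag == 1 else e)
    return weights


def weighted_mean(values, weights):
    """Weighted average of a covariate, used to inspect balance between groups."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

People whose propensity score is close to 0 or 1 receive small weights, so the weighted comparison concentrates on the part of the population where both exposure groups are well represented.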


Subject(s)
COVID-19 Vaccines , COVID-19 , Adult , Humans , BNT162 Vaccine , Cohort Studies , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines/therapeutic use , Estonia , Post-Acute COVID-19 Syndrome , SARS-CoV-2 , Spain , United Kingdom/epidemiology
2.
Stat Med ; 43(2): 395-418, 2024 01 30.
Article in English | MEDLINE | ID: mdl-38010062

ABSTRACT

Postmarket safety surveillance is an integral part of mass vaccination programs. Typically relying on sequential analysis of real-world health data as they accrue, safety surveillance is challenged by sequential multiple testing and by biases induced by residual confounding in observational data. The current standard approach based on the maximized sequential probability ratio test (MaxSPRT) fails to satisfactorily address these practical challenges and it remains a rigid framework that requires prespecification of the surveillance schedule. We develop an alternative Bayesian surveillance procedure that addresses both aforementioned challenges using a more flexible framework. To mitigate bias, we jointly analyze a large set of negative control outcomes that are adverse events with no known association with the vaccines in order to inform an empirical bias distribution, which we then incorporate into estimating the effect of vaccine exposure on the adverse event of interest through a Bayesian hierarchical model. To address multiple testing and improve on flexibility, at each analysis timepoint, we update a posterior probability in favor of the alternative hypothesis that vaccination induces higher risks of adverse events, and then use it for sequential detection of safety signals. Through an empirical evaluation using six US observational healthcare databases covering more than 360 million patients, we benchmark the proposed procedure against MaxSPRT on testing errors and estimation accuracy, under two epidemiological designs, the historical comparator and the self-controlled case series. We demonstrate that our procedure substantially reduces Type 1 error rates, maintains high statistical power and fast signal detection, and provides considerably more accurate estimation than MaxSPRT. Given the extensiveness of the empirical study which yields more than 7 million sets of results, we present all results in a public R ShinyApp. 
As an effort to promote open science, we provide full implementation of our method in the open-source R package EvidenceSynthesis.


Subject(s)
Adverse Drug Reaction Reporting Systems , Product Surveillance, Postmarketing , Vaccines , Humans , Bayes Theorem , Bias , Probability , Vaccines/adverse effects
3.
Nat Commun ; 14(1): 7449, 2023 11 17.
Article in English | MEDLINE | ID: mdl-37978296

ABSTRACT

Persistent symptoms following the acute phase of COVID-19 present a major burden to both the affected and the wider community. We conducted a cohort study including over 856,840 first COVID-19 cases, 72,422 reinfections and more than 3.1 million first negative-test controls from primary care electronic health records from Spain and the UK (Sept 2020 to Jan 2022 (UK)/March 2022 (Spain)). We characterised post-acute COVID-19 symptoms and identified key symptoms associated with persistent disease. We estimated incidence rates of persisting symptoms in the general population and among COVID-19 patients over time. Subsequently, we investigated which WHO-listed symptoms were particularly differential by comparing their frequency in COVID-19 cases vs. matched test-negative controls. Lastly, we compared persistent symptoms after first infections vs. reinfections. Our study shows that the proportion of COVID-19 cases affected by persistent post-acute COVID-19 symptoms declined over the study period. Risk for altered smell/taste was consistently higher in patients with COVID-19 vs. test-negative controls. Persistent symptoms were more common after reinfection than following a first infection. More research is needed into the definition of long COVID, and the effect of interventions to minimise the risk and impact of persistent symptoms.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , Humans , Cohort Studies , COVID-19/epidemiology , Electronic Health Records , Reinfection
4.
BMC Med ; 21(1): 58, 2023 02 16.
Article in English | MEDLINE | ID: mdl-36793086

ABSTRACT

BACKGROUND: Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of long COVID are still in flux, and the deployment of an ICD-10-CM code for long COVID in the USA took nearly 2 years after patients had begun to describe their condition. Here, we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified." METHODS: We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code (n = 33,782), including assessing person-level demographics and a number of area-level social determinants of health; diagnoses commonly co-occurring with U09.9, clustered using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan. RESULTS: We established the diagnoses most commonly co-occurring with U09.9 and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty and low unemployment. Our results also include a characterization of common procedures and medications associated with U09.9-coded patients. 
CONCLUSIONS: This work offers insight into potential subtypes and current practice patterns around long COVID and speaks to the existence of disparities in the diagnosis of patients with long COVID. This latter finding in particular requires further research and urgent remediation.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , Humans , Female , International Classification of Diseases , Pandemics , COVID-19/diagnosis , COVID-19/epidemiology , SARS-CoV-2
5.
EBioMedicine ; 87: 104413, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36563487

ABSTRACT

BACKGROUND: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. METHODS: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. FINDINGS: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. INTERPRETATION: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. FUNDING: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , Humans , Disease Progression , SARS-CoV-2
6.
J Asthma ; 60(1): 76-86, 2023 01.
Article in English | MEDLINE | ID: mdl-35012410

ABSTRACT

Objective: Large international comparisons describing the clinical characteristics of patients with COVID-19 are limited. The aim of the study was to perform a large-scale descriptive characterization of COVID-19 patients with asthma. Methods: We included nine databases contributing data from January to June 2020 from the US, South Korea (KR), Spain, UK and the Netherlands. We defined two cohorts of COVID-19 patients ('diagnosed' and 'hospitalized') based on COVID-19 disease codes. We followed patients from COVID-19 index date to 30 days or death. We performed descriptive analysis and reported the frequency of characteristics and outcomes in people with asthma defined by codes and prescriptions. Results: The diagnosed and hospitalized cohorts contained 666,933 and 159,552 COVID-19 patients respectively. Exacerbation in people with asthma was recorded in 1.6-8.6% of patients at presentation. Asthma prevalence ranged from 6.2% (95% CI 5.7-6.8) to 18.5% (95% CI 18.2-18.8) in the diagnosed cohort and 5.2% (95% CI 4.0-6.8) to 20.5% (95% CI 18.6-22.6) in the hospitalized cohort. Asthma patients with COVID-19 had high prevalence of comorbidity including hypertension, heart disease, diabetes and obesity. Mortality ranged from 2.1% (95% CI 1.8-2.4) to 16.9% (95% CI 13.8-20.5) and was similar or lower compared with COVID-19 patients without asthma. Acute respiratory distress syndrome occurred in 15-30% of hospitalized COVID-19 asthma patients. Conclusion: The prevalence of asthma among COVID-19 patients varies internationally. Asthma patients with COVID-19 have high comorbidity. The prevalence of asthma exacerbation at presentation was low. Whilst mortality was similar among COVID-19 patients with and without asthma, this could be confounded by differences in clinical characteristics. Further research could help identify high-risk asthma patients. Supplemental data for this article is available online at https://doi.org/10.1080/02770903.2021.2025392 .


Subject(s)
Asthma , COVID-19 , Diabetes Mellitus , Humans , United States/epidemiology , COVID-19/epidemiology , Asthma/epidemiology , SARS-CoV-2 , Comorbidity , Diabetes Mellitus/epidemiology , Hospitalization
7.
AMIA Annu Symp Proc ; 2023: 634-640, 2023.
Article in English | MEDLINE | ID: mdl-38222379

ABSTRACT

Obtaining reliable data on patient mortality is a critical challenge facing observational researchers seeking to conduct studies using real-world data. As these analyses are conducted more broadly using newly-available sources of real-world evidence, missing data can serve as a rate-limiting factor. We conducted a comparison of mortality data sources from different stakeholder perspectives - academic medical center (AMC) informatics service providers, AMC research coordinators, industry analytics professionals, and academics - to understand the strengths and limitations of differing mortality data sources: locally generated data from sites conducting research, data provided by governmental sources, and commercially available data sets. Researchers seeking to conduct observational studies using extant data should consider these factors in sourcing outcomes data for their populations of interest.


Subject(s)
Academic Medical Centers , Information Sources , Humans
8.
Front Pharmacol ; 13: 945592, 2022.
Article in English | MEDLINE | ID: mdl-36188566

ABSTRACT

Purpose: Alpha-1 blockers, often used to treat benign prostatic hyperplasia (BPH), have been hypothesized to prevent COVID-19 complications by minimising cytokine storm release. The proposed treatment based on this hypothesis currently lacks support from reliable real-world evidence, however. We leverage an international network of large-scale healthcare databases to generate comprehensive evidence in a transparent and reproducible manner. Methods: In this international cohort study, we deployed electronic health records from Spain (SIDIAP) and the United States (Department of Veterans Affairs, Columbia University Irving Medical Center, IQVIA OpenClaims, Optum DOD, Optum EHR). We assessed association between alpha-1 blocker use and risks of three COVID-19 outcomes (diagnosis, hospitalization, and hospitalization requiring intensive services) using a prevalent-user active-comparator design. We estimated hazard ratios using state-of-the-art techniques to minimize potential confounding, including large-scale propensity score matching/stratification and negative control calibration. We pooled database-specific estimates through random effects meta-analysis. Results: Our study overall included 2.6 and 0.46 million users of alpha-1 blockers and of alternative BPH medications, respectively. We observed no significant difference in their risks for any of the COVID-19 outcomes, with our meta-analytic HR estimates being 1.02 (95% CI: 0.92-1.13) for diagnosis, 1.00 (95% CI: 0.89-1.13) for hospitalization, and 1.15 (95% CI: 0.71-1.88) for hospitalization requiring intensive services. Conclusion: We found no evidence of the hypothesized reduction in risks of the COVID-19 outcomes from prevalent use of alpha-1 blockers; further research is needed to identify effective therapies for this novel disease.
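
The pooling step described in this abstract (database-specific hazard ratios combined through random effects meta-analysis) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the estimator, so the DerSimonian-Laird method is assumed here as one common choice, standard errors are recovered from 95% confidence intervals, and the function names are hypothetical. The database-specific estimates are not reported in the abstract, so any inputs a reader supplies are their own.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Standard error of log(HR) recovered from a 95% CI on the HR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def dersimonian_laird(hrs, cis):
    """Pool hazard ratios with a DerSimonian-Laird random effects model.

    hrs: list of hazard ratios, one per database.
    cis: list of (lower, upper) 95% CI tuples, one per database.
    Returns (pooled_hr, ci_lower, ci_upper, tau_squared).
    """
    y = [math.log(h) for h in hrs]
    v = [se_from_ci(lo, hi) ** 2 for lo, hi in cis]
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)

    # Fixed-effect estimate, needed for Cochran's Q.
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Between-database variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random effects weights add tau2 to each within-database variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return (
        math.exp(y_re),
        math.exp(y_re - Z_95 * se_re),
        math.exp(y_re + Z_95 * se_re),
        tau2,
    )
```

When the database estimates agree closely, the between-database variance is estimated as zero and the result coincides with a fixed-effect pooled estimate.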

9.
medRxiv ; 2022 Sep 02.
Article in English | MEDLINE | ID: mdl-36093345

ABSTRACT

Background: Naming a newly discovered disease is a difficult process; in the context of the COVID-19 pandemic and the existence of post-acute sequelae of SARS-CoV-2 infection (PASC), which includes Long COVID, it has proven especially challenging. Disease definitions and assignment of a diagnosis code are often asynchronous and iterative. The clinical definition and our understanding of the underlying mechanisms of Long COVID are still in flux, and the deployment of an ICD-10-CM code for Long COVID in the US took nearly two years after patients had begun to describe their condition. Here we leverage the largest publicly available HIPAA-limited dataset about patients with COVID-19 in the US to examine the heterogeneity of adoption and use of U09.9, the ICD-10-CM code for "Post COVID-19 condition, unspecified." Methods: We undertook a number of analyses to characterize the N3C population with a U09.9 diagnosis code (n = 21,072), including assessing person-level demographics and a number of area-level social determinants of health; diagnoses commonly co-occurring with U09.9, clustered using the Louvain algorithm; and quantifying medications and procedures recorded within 60 days of U09.9 diagnosis. We stratified all analyses by age group in order to discern differing patterns of care across the lifespan. Results: We established the diagnoses most commonly co-occurring with U09.9, and algorithmically clustered them into four major categories: cardiopulmonary, neurological, gastrointestinal, and comorbid conditions. Importantly, we discovered that the population of patients diagnosed with U09.9 is demographically skewed toward female, White, non-Hispanic individuals, as well as individuals living in areas with low poverty, high education, and high access to medical care. Our results also include a characterization of common procedures and medications associated with U09.9-coded patients.
Conclusions: This work offers insight into potential subtypes and current practice patterns around Long COVID, and speaks to the existence of disparities in the diagnosis of patients with Long COVID. This latter finding in particular requires further research and urgent remediation.

10.
medRxiv ; 2022 Jul 20.
Article in English | MEDLINE | ID: mdl-35665012

ABSTRACT

Accurate stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. We present a method for computationally modeling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning procedures. Using k-means clustering of this similarity matrix, we found six distinct clusters of PASC patients, each with distinct profiles of phenotypic abnormalities. There was a significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. Two of the clusters were associated with severe manifestations and displayed increased mortality. We assigned new patients from other healthcare centers to one of the six clusters on the basis of maximum semantic similarity to the original patients. We show that the identified clusters were generalizable across different hospital systems and that the increased mortality rate was consistently observed in two of the clusters. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.

11.
Lancet Infect Dis ; 22(8): 1142-1152, 2022 08.
Article in English | MEDLINE | ID: mdl-35576963

ABSTRACT

BACKGROUND: There are few data on the incidence of thrombosis among COVID-19 cases, with most research concentrated on hospitalised patients. We aimed to estimate the incidence of venous thromboembolism, arterial thromboembolism, and death among COVID-19 cases and to assess the impact of these events on the risks of hospitalisation and death. METHODS: We conducted a distributed network cohort study using primary care records from the Netherlands, Italy, Spain, and the UK, and outpatient specialist records from Germany. The Spanish database was linked to hospital admissions. Participants were followed up from the date of a diagnosis of COVID-19 or positive RT-PCR test for SARS-CoV-2 (index date) for 90 days. The primary study outcomes were venous thromboembolic events, arterial thromboembolic events, and death, all over the 90 days from the index date. We estimated cumulative incidences for the study outcomes. Multistate models were used to calculate adjusted hazard ratios (HRs) for the association between venous thromboembolism or arterial thromboembolism occurrence and risks of hospitalisation or COVID-19 fatality. FINDINGS: Overall, 909 473 COVID-19 cases and 32 329 patients hospitalised with COVID-19 on or after Sept 1, 2020, were studied. The latest index dates across the databases ranged from Jan 30, 2021, to July 31, 2021. Cumulative 90-day incidence of venous thromboembolism ranged from 0·2% to 0·8% among COVID-19 cases, and up to 4·5% for those hospitalised. For arterial thromboembolism, estimates ranged from 0·1% to 0·8% among COVID-19 cases, increasing to 3·1% among those hospitalised. Case fatality ranged from 1·1% to 2·0% among patients with COVID-19, rising to 14·6% for hospitalised patients. 
The occurrence of venous thromboembolism in patients with COVID-19 was associated with an increased risk of death (adjusted HRs 4·42 [3·07-6·36] for those not hospitalised and 1·63 [1·39-1·90] for those hospitalised), as was the occurrence of arterial thromboembolism (3·16 [2·65-3·75] and 1·93 [1·57-2·37]). INTERPRETATION: Risks of venous thromboembolism and arterial thromboembolism were up to 1% among COVID-19 cases, and increased with age, among males, and in those who were hospitalised. Their occurrence was associated with excess mortality, underlining the importance of developing effective treatment strategies that reduce their frequency. FUNDING: European Medicines Agency.


Subject(s)
COVID-19 , Venous Thromboembolism , Venous Thrombosis , COVID-19/epidemiology , Cohort Studies , Humans , Male , SARS-CoV-2 , Venous Thromboembolism/complications , Venous Thromboembolism/epidemiology , Venous Thrombosis/complications
12.
Lancet Digit Health ; 4(7): e532-e541, 2022 07.
Article in English | MEDLINE | ID: mdl-35589549

ABSTRACT

BACKGROUND: Post-acute sequelae of SARS-CoV-2 infection, known as long COVID, have severely affected recovery from the COVID-19 pandemic for patients and society alike. Long COVID is characterised by evolving, heterogeneous symptoms, making it challenging to derive an unambiguous definition. Studies of electronic health records are a crucial element of the US National Institutes of Health's RECOVER Initiative, which is addressing the urgent need to understand long COVID, identify treatments, and accurately identify who has it; the latter is the aim of this study. METHODS: Using the National COVID Cohort Collaborative's (N3C) electronic health record repository, we developed XGBoost machine learning models to identify potential patients with long COVID. We defined our base population (n=1 793 604) as any non-deceased adult patient (age ≥18 years) with either an International Classification of Diseases-10-Clinical Modification COVID-19 diagnosis code (U07.1) from an inpatient or emergency visit, or a positive SARS-CoV-2 PCR or antigen test, and for whom at least 90 days have passed since COVID-19 index date. We examined demographics, health-care utilisation, diagnoses, and medications for 97 995 adults with COVID-19. We used data on these features and 597 patients from a long COVID clinic to train three machine learning models to identify potential long COVID among all patients with COVID-19, patients hospitalised with COVID-19, and patients who had COVID-19 but were not hospitalised. Feature importance was determined via Shapley values. We further validated the models on data from a fourth site. FINDINGS: Our models identified, with high accuracy, patients who potentially have long COVID, achieving areas under the receiver operating characteristic curve of 0·92 (all patients), 0·90 (hospitalised), and 0·85 (non-hospitalised).
Important features, as defined by Shapley values, include rate of health-care utilisation, patient age, dyspnoea, and other diagnosis and medication information available within the electronic health record. INTERPRETATION: Patients identified by our models as potentially having long COVID can be interpreted as patients warranting care at a specialty clinic for long COVID, which is an essential proxy for long COVID diagnosis as its definition continues to evolve. We also achieve the urgent goal of identifying potential long COVID in patients for clinical trials. As more data sources are identified, our models can be retrained and tuned based on the needs of individual studies. FUNDING: US National Institutes of Health and National Center for Advancing Translational Sciences through the RECOVER Initiative.


Subject(s)
COVID-19 , Adolescent , Adult , COVID-19/complications , COVID-19/diagnosis , COVID-19/epidemiology , COVID-19 Testing , Humans , Machine Learning , Pandemics , SARS-CoV-2 , United States/epidemiology , Post-Acute COVID-19 Syndrome
13.
Clin Epidemiol ; 14: 369-384, 2022.
Article in English | MEDLINE | ID: mdl-35345821

ABSTRACT

Purpose: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD. Patients and Methods: We conducted a descriptive retrospective database study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub. We identified three non-mutually exclusive cohorts of 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services. Results: We aggregated over 22,000 unique characteristics describing patients with COVID-19. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts and are readily available online. Globally, we observed similarities in the USA and Europe: more women diagnosed than men but more men hospitalized than women, most diagnosed cases between 25 and 60 years of age versus most hospitalized cases between 60 and 80 years of age. South Korea differed with more women than men hospitalized. Common comorbidities included type 2 diabetes, hypertension, chronic kidney disease and heart disease. Common presenting symptoms were dyspnea, cough and fever. Symptom data availability was more common in hospitalized cohorts than diagnosed. Conclusion: We constructed a global, multi-centre view to describe trends in COVID-19 progression, management and evolution over time. 
By characterising baseline variability in patients and geography, our work provides critical context that may otherwise be misconstrued as data quality issues. This is important as we perform studies on adverse events of special interest in COVID-19 vaccine surveillance.

14.
J Clin Transl Sci ; 6(1): e10, 2022.
Article in English | MEDLINE | ID: mdl-35211336

ABSTRACT

Recent findings have shown that the continued expansion of the scope and scale of data collected in electronic health records is making the protection of personally identifiable information (PII) more challenging and may inadvertently put our institutions and patients at risk if not addressed. As clinical terminologies expand to include new terms that may capture PII (e.g., Patient First Name, Patient Phone Number), institutions may start using them in clinical data capture (and in some cases, they already have). Once in use, PII-containing values associated with these terms may find their way into laboratory or observation data tables via extract-transform-load jobs intended to process structured data, putting institutions at risk of unintended disclosure. Here we aim to inform the informatics community of these findings, as well as put out a call to action for remediation by the community.

16.
BMC Med Res Methodol ; 22(1): 35, 2022 01 30.
Article in English | MEDLINE | ID: mdl-35094685

ABSTRACT

BACKGROUND: We investigated whether we could use influenza data to develop prediction models for COVID-19 to increase the speed at which prediction models can reliably be developed and validated early in a pandemic. We developed COVID-19 Estimated Risk (COVER) scores that quantify a patient's risk of hospital admission with pneumonia (COVER-H), hospitalization with pneumonia requiring intensive services or death (COVER-I), or fatality (COVER-F) in the 30 days following COVID-19 diagnosis using historical data from patients with influenza or flu-like symptoms and tested this in COVID-19 patients. METHODS: We analyzed a federated network of electronic medical records and administrative claims data from 14 data sources and 6 countries containing data collected on or before 4/27/2020. We used a 2-step process to develop 3 scores using historical data from patients with influenza or flu-like symptoms any time prior to 2020. The first step was to create a data-driven model using LASSO regularized logistic regression, the covariates of which were used to develop aggregate covariates for the second step where the COVER scores were developed using a smaller set of features. These 3 COVER scores were then externally validated on patients with 1) influenza or flu-like symptoms and 2) confirmed or suspected COVID-19 diagnosis across 5 databases from South Korea, Spain, and the United States. Outcomes included i) hospitalization with pneumonia, ii) hospitalization with pneumonia requiring intensive services or death, and iii) death in the 30 days after index date. RESULTS: Overall, 44,507 COVID-19 patients were included for model validation. We identified 7 predictors (history of cancer, chronic obstructive pulmonary disease, diabetes, heart disease, hypertension, hyperlipidemia, kidney disease) which combined with age and sex discriminated which patients would experience any of our three outcomes. The models achieved good performance in influenza and COVID-19 cohorts.
For COVID-19, the AUC ranges were COVER-H: 0.69-0.81, COVER-I: 0.73-0.91, and COVER-F: 0.72-0.90. Calibration varied across the validations with some of the COVID-19 validations being less well calibrated than the influenza validations. CONCLUSIONS: This research demonstrated the utility of using a proxy disease to develop a prediction model. The 3 COVER models with 9 predictors that were developed using influenza data perform well for COVID-19 patients for predicting hospitalization, intensive services, and fatality. The scores showed good discriminatory performance which transferred well to the COVID-19 population. There was some miscalibration in the COVID-19 validations, which is potentially due to the difference in symptom severity between the two diseases. A possible solution for this is to recalibrate the models in each location before use.


Subject(s)
COVID-19 , Influenza, Human , Pneumonia , COVID-19 Testing , Humans , Influenza, Human/epidemiology , SARS-CoV-2 , United States
17.
J Am Med Inform Assoc ; 29(4): 609-618, 2022 03 15.
Article in English | MEDLINE | ID: mdl-34590684

ABSTRACT

OBJECTIVE: In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations. MATERIALS AND METHODS: We developed a pipeline for ingesting, harmonizing, and centralizing data from 56 contributing data partners using 4 federated Common Data Models. N3C data quality (DQ) review involves both automated and manual procedures. In the process, several DQ heuristics were discovered in our centralized context, both within the pipeline and during downstream project-based analysis. Feedback to the sites led to many local and centralized DQ improvements. RESULTS: Beyond well-recognized DQ findings, we discovered 15 heuristics relating to source Common Data Model conformance, demographics, COVID tests, conditions, encounters, measurements, observations, coding completeness, and fitness for use. Of 56 sites, 37 sites (66%) demonstrated issues through these heuristics. These 37 sites demonstrated improvement after receiving feedback. DISCUSSION: We encountered site-to-site differences in DQ which would have been challenging to discover using federated checks alone. We have demonstrated that centralized DQ benchmarking reveals unique opportunities for DQ improvement that will support improved research analytics locally and in aggregate. CONCLUSION: By combining rapid, continual assessment of DQ with a large volume of multisite data, it is possible to support more nuanced scientific questions with the scale and rigor that they require.


Subject(s)
COVID-19 , Cohort Studies , Data Accuracy , Health Insurance Portability and Accountability Act , Humans , United States
18.
Wellcome Open Res ; 7: 22, 2022.
Article in English | MEDLINE | ID: mdl-36845321

ABSTRACT

Background: Characterization studies of COVID-19 patients with chronic obstructive pulmonary disease (COPD) are limited in size and scope. The aim of the study is to provide a large-scale characterization of COVID-19 patients with COPD. Methods: We included thirteen databases contributing data from January-June 2020 from North America (US), Europe and Asia. We defined two cohorts of patients with COVID-19, namely a 'diagnosed' and a 'hospitalized' cohort. We followed patients from COVID-19 index date to 30 days or death. We performed descriptive analysis and reported the frequency of characteristics and outcomes among COPD patients with COVID-19. Results: The study included 934,778 patients in the diagnosed COVID-19 cohort and 177,201 in the hospitalized COVID-19 cohort. Observed COPD prevalence in the diagnosed cohort ranged from 3.8% (95% CI 3.5-4.1) in French data to 22.7% (95% CI 22.4-23.0) in US data, and in the hospitalized cohort from 1.9% (95% CI 1.6-2.2) in South Korean data to 44.0% (95% CI 43.1-45.0) in US data. COPD patients in the hospitalized cohort had greater comorbidity than those in the diagnosed cohort, including hypertension, heart disease, diabetes and obesity. Mortality was higher in COPD patients in the hospitalized cohort and ranged from 7.6% (95% CI 6.9-8.4) to 32.2% (95% CI 28.0-36.7) across databases. ARDS, acute renal failure, cardiac arrhythmia and sepsis were the most common outcomes among hospitalized COPD patients. Conclusion: COPD patients with COVID-19 have high levels of COVID-19-associated comorbidities and poor COVID-19 outcomes. Further research is required to identify patients with COPD at high risk of worse outcomes.

19.
BMJ Open ; 11(12): e057632, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34937726

ABSTRACT

OBJECTIVE: To characterise patients with and without prevalent hypertension and COVID-19 and to assess adverse outcomes in both inpatients and outpatients. DESIGN AND SETTING: This is a retrospective cohort study using 15 healthcare databases (primary and secondary electronic healthcare records, insurance and national claims data) from the USA, Europe and South Korea, standardised to the Observational Medical Outcomes Partnership common data model. Data were gathered from 1 March to 31 October 2020. PARTICIPANTS: Two non-mutually exclusive cohorts were defined: (1) individuals diagnosed with COVID-19 (diagnosed cohort) and (2) individuals hospitalised with COVID-19 (hospitalised cohort), and stratified by hypertension status. Follow-up was from COVID-19 diagnosis/hospitalisation to death, end of the study period or 30 days. OUTCOMES: Demographics, comorbidities and 30-day outcomes (hospitalisation and death for the 'diagnosed' cohort and adverse events and death for the 'hospitalised' cohort) were reported. RESULTS: We identified 2 851 035 diagnosed and 563 708 hospitalised patients with COVID-19. Hypertension was more prevalent in the latter (ranging across databases from 17.4% (95% CI 17.2 to 17.6) to 61.4% (95% CI 61.0 to 61.8) and from 25.6% (95% CI 24.6 to 26.6) to 85.9% (95% CI 85.2 to 86.6)). Patients in both cohorts with hypertension were predominantly >50 years old and female. Patients with hypertension were frequently diagnosed with obesity, heart disease, dyslipidaemia and diabetes. Compared with patients without hypertension, patients with hypertension in the COVID-19 diagnosed cohort had more hospitalisations (ranging from 1.3% (95% CI 0.4 to 2.2) to 41.1% (95% CI 39.5 to 42.7) vs from 1.4% (95% CI 0.9 to 1.9) to 15.9% (95% CI 14.9 to 16.9)) and increased mortality (ranging from 0.3% (95% CI 0.1 to 0.5) to 18.5% (95% CI 15.7 to 21.3) vs from 0.2% (95% CI 0.2 to 0.2) to 11.8% (95% CI 10.8 to 12.8)). 
Patients in the COVID-19 hospitalised cohort with hypertension were more likely to have acute respiratory distress syndrome (ranging from 0.1% (95% CI 0.0 to 0.2) to 65.6% (95% CI 62.5 to 68.7) vs from 0.1% (95% CI 0.0 to 0.2) to 54.7% (95% CI 50.5 to 58.9)), arrhythmia (ranging from 0.5% (95% CI 0.3 to 0.7) to 45.8% (95% CI 42.6 to 49.0) vs from 0.4% (95% CI 0.3 to 0.5) to 36.8% (95% CI 32.7 to 40.9)) and increased mortality (ranging from 1.8% (95% CI 0.4 to 3.2) to 25.1% (95% CI 23.0 to 27.2) vs from 0.7% (95% CI 0.5 to 0.9) to 10.9% (95% CI 10.4 to 11.4)) than patients without hypertension. CONCLUSIONS: COVID-19 patients with hypertension were more likely to suffer severe outcomes, hospitalisations and deaths compared with those without hypertension.


Subject(s)
COVID-19 , Hypertension , COVID-19 Testing , Cohort Studies , Comorbidity , Female , Hospitalization , Humans , Hypertension/epidemiology , Middle Aged , Retrospective Studies , SARS-CoV-2
20.
JAMA Netw Open ; 4(7): e2116901, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34255046

ABSTRACT

Importance: The National COVID Cohort Collaborative (N3C) is a centralized, harmonized, high-granularity electronic health record repository that is the largest, most representative COVID-19 cohort to date. This multicenter data set can support robust evidence-based development of predictive and diagnostic tools and inform clinical care and policy. Objectives: To evaluate COVID-19 severity and risk factors over time and assess the use of machine learning to predict clinical severity. Design, Setting, and Participants: In a retrospective cohort study of 1 926 526 US adults with SARS-CoV-2 infection (polymerase chain reaction >99% or antigen <1%) and adult patients without SARS-CoV-2 infection who served as controls from 34 medical centers nationwide between January 1, 2020, and December 7, 2020, patients were stratified using a World Health Organization COVID-19 severity scale and demographic characteristics. Differences between groups over time were evaluated using multivariable logistic regression. Random forest and XGBoost models were used to predict severe clinical course (death, discharge to hospice, invasive ventilatory support, or extracorporeal membrane oxygenation). Main Outcomes and Measures: Patient demographic characteristics and COVID-19 severity using the World Health Organization COVID-19 severity scale and differences between groups over time using multivariable logistic regression. Results: The cohort included 174 568 adults who tested positive for SARS-CoV-2 (mean [SD] age, 44.4 [18.6] years; 53.2% female) and 1 133 848 adult controls who tested negative for SARS-CoV-2 (mean [SD] age, 49.5 [19.2] years; 57.1% female). Of the 174 568 adults with SARS-CoV-2, 32 472 (18.6%) were hospitalized, and 6565 (20.2%) of those had a severe clinical course (invasive ventilatory support, extracorporeal membrane oxygenation, death, or discharge to hospice). 
Of the hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March to April 2020 to 8.6% in September to October 2020 (P = .002 for monthly trend). Using 64 inputs available on the first hospital day, this study predicted a severe clinical course using random forest and XGBoost models (area under the receiver operating curve = 0.87 for both) that were stable over time. The factor most strongly associated with clinical severity was pH; this result was consistent across machine learning methods. In a separate multivariable logistic regression model built for inference, age (odds ratio [OR], 1.03 per year; 95% CI, 1.03-1.04), male sex (OR, 1.60; 95% CI, 1.51-1.69), liver disease (OR, 1.20; 95% CI, 1.08-1.34), dementia (OR, 1.26; 95% CI, 1.13-1.41), African American (OR, 1.12; 95% CI, 1.05-1.20) and Asian (OR, 1.33; 95% CI, 1.12-1.57) race, and obesity (OR, 1.36; 95% CI, 1.27-1.46) were independently associated with higher clinical severity. Conclusions and Relevance: This cohort study found that COVID-19 mortality decreased over time during 2020 and that patient demographic characteristics and comorbidities were associated with higher clinical severity. The machine learning models accurately predicted ultimate clinical severity using commonly collected clinical data from the first 24 hours of a hospital admission.


Subject(s)
COVID-19 , Databases, Factual , Forecasting , Hospitalization , Models, Biological , Severity of Illness Index , Adult , Aged , Aged, 80 and over , COVID-19/ethnology , COVID-19/mortality , Comorbidity , Ethnicity , Extracorporeal Membrane Oxygenation , Female , Humans , Hydrogen-Ion Concentration , Male , Middle Aged , Pandemics , Respiration, Artificial , Retrospective Studies , Risk Factors , SARS-CoV-2 , United States , Young Adult