Results 1 - 20 of 59
1.
J Biomed Inform ; 139: 104295, 2023 03.
Article in English | MEDLINE | ID: mdl-36716983

ABSTRACT

Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful for assessing associations between patients' predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases, whose removal may introduce severe bias. Several multiple imputation algorithms have been proposed to attempt to recover the missing information under an assumed missingness mechanism. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm's parameters and data-related modeling choices are also both crucial and challenging. In this paper we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. Extensive experiments show that our approach can effectively highlight the most promising and performant missing-data handling strategy for our case study. Moreover, our methodology allowed a better understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and on datasets containing heterogeneous types.


Subject(s)
COVID-19 , Humans , Algorithms , Research Design , Bias , Probability
2.
Glia ; 69(7): 1767-1781, 2021 07.
Article in English | MEDLINE | ID: mdl-33704822

ABSTRACT

The characterization of the tumor microenvironment (TME) in high grade gliomas (HGG) has generated significant interest in an effort to understand how neoplastic lesions in the central nervous system (CNS) are supported and to devise novel therapeutic targets. The TME of the CNS contains unique and specialized cells, including the resident myeloid cells, microglia. Myeloid involvement in HGG, such as glioblastoma, is associated with poor outcomes. Glioma-associated microglia and infiltrating monocytes/macrophages (GAM) accumulate within the neoplastic lesion where they facilitate tumor growth and drive immunosuppression. However, it has been difficult to differentiate whether microglia and macrophages have similar or distinct roles in pathology, and if the spatial organization of these cells informs outcomes. Here, we characterize the tumor-stroma border and identify peritumoral GAM (PGAM) as a unique subpopulation of GAM. Using data mining and analyses of samples derived from both murine and human sources we show that PGAM exhibit a pro-inflammatory and chemotactic phenotype that is associated with peripheral monocyte recruitment, and decreased overall survival. PGAM act as a unique subset of GAM at the tumor-stroma interface. We define a novel gene signature to identify these cells and suggest that PGAM constitute a cellular target of the TME.


Subject(s)
Brain Neoplasms , Glioblastoma , Glioma , Animals , Brain Neoplasms/pathology , Glioblastoma/pathology , Glioma/pathology , Macrophages/pathology , Mice , Microglia/pathology , Tumor Microenvironment
3.
Am J Pathol ; 190(7): 1491-1504, 2020 07.
Article in English | MEDLINE | ID: mdl-32277893

ABSTRACT

Quantitative assessment of spatial relations between tumor and tumor-infiltrating lymphocytes (TIL) is increasingly important in both basic science and clinical aspects of breast cancer research. We have developed and evaluated convolutional neural network analysis pipelines to generate combined maps of cancer regions and TILs in routine diagnostic breast cancer whole slide tissue images. The combined maps provide insight into the structural patterns and spatial distribution of lymphocytic infiltrates and facilitate improved quantification of TILs. Both tumor and TIL analyses were evaluated by using three convolutional neural networks (34-layer ResNet, 16-layer VGG, and Inception v4); the results compared favorably with those obtained by using the best published methods. We have produced open-source tools and a public data set consisting of tumor/TIL maps for 1090 invasive breast cancer images from The Cancer Genome Atlas. The maps can be downloaded for further downstream analyses.


Subject(s)
Breast Neoplasms/pathology , Deep Learning , Lymphocytes, Tumor-Infiltrating/pathology , Breast Neoplasms/immunology , Female , Humans , Lymphocytes, Tumor-Infiltrating/immunology , SEER Program
4.
Br J Cancer ; 123(3): 495, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32393850

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

5.
Br J Cancer ; 120(1): 88-96, 2019 01.
Article in English | MEDLINE | ID: mdl-30377341

ABSTRACT

BACKGROUND: Pancreatic cancer (PC) hijacks innate cellular processes to promote cancer growth. We hypothesized that PC exploits PD-1/PD-L1 not only to avoid immune responses, but to directly enhance growth. We also hypothesized that immune checkpoint inhibitors (ICIs) have direct cytotoxicity in PC. We sought to elucidate therapeutic targeting of PD-1/PD-L1. METHODS: PD-1 was assessed in PC cells, patient-derived organoids (PDOs), and clinical tissues. Then, PC cells were exposed to PD-L1 to evaluate proliferation. To test PD-1/PD-L1 signaling, cells were exposed to PD-L1 and MAPK was examined. Radio-immunoconjugates with anti-PD-1 drugs were developed to test uptake in patient-derived tumor xenografts (PDTXs). Next, PD-1 function was assessed by xenografting PD-1-knockdown cells. Finally, PC models were exposed to ICIs. RESULTS: PD-1 expression was demonstrated in PCs. PD-L1 exposure increased proliferation and activated MAPK. Imaging PDTXs revealed uptake of radio-immunoconjugates. PD-1 knockdown in vivo revealed 67% smaller volumes than controls. Finally, ICI treatment of both PDOs/PDTXs demonstrated cytotoxicity and anti-MEK1/2 combined with anti-PD-1 drugs produced highest cytotoxicity in PDOs/PDTXs. CONCLUSIONS: Our data reveal PCs innately express PD-1 and activate druggable oncogenic pathways supporting PDAC growth. Strategies directly targeting PC with novel ICI regimens may work with adaptive immune responses for optimal cytotoxicity.


Subject(s)
B7-H1 Antigen/immunology , Immunotherapy , Pancreatic Neoplasms/drug therapy , Programmed Cell Death 1 Receptor/immunology , Animals , B7-H1 Antigen/antagonists & inhibitors , Cell Proliferation/drug effects , Female , Humans , Male , Mice , Organoids/drug effects , Organoids/immunology , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/immunology , Pancreatic Neoplasms/pathology , Primary Cell Culture , Programmed Cell Death 1 Receptor/antagonists & inhibitors , Signal Transduction/drug effects , Xenograft Model Antitumor Assays
6.
J Immunother Cancer ; 12(1)2024 01 08.
Article in English | MEDLINE | ID: mdl-38191243

ABSTRACT

BACKGROUND: Pancreatic ductal adenocarcinoma (PDAC) is an aggressive tumor. Prognosis is poor and survival is low in patients diagnosed with this disease, with a survival rate of ~12% at 5 years. Immunotherapy, including adoptive T cell transfer therapy, has not impacted the outcomes in patients with PDAC, due in part to the hostile tumor microenvironment (TME) which limits T cell trafficking and persistence. We posit that murine models serve as useful tools to study the fate of T cell therapy. Currently, genetically engineered mouse models (GEMMs) for PDAC are considered a "gold-standard" as they recapitulate many aspects of human disease. However, these models have limitations, including marked tumor variability across individual mice and the cost of colony maintenance. METHODS: Using flow cytometry and immunohistochemistry, we characterized the immunological features and trafficking patterns of adoptively transferred T cells in orthotopic PDAC (C57BL/6) models using two mouse cell lines, KPC-Luc and MT-5, isolated from C57BL/6 KPC-GEMM (KrasLSL-G12D/+p53-/- and KrasLSL-G12D/+p53LSL-R172H/+, respectively). RESULTS: The MT-5 orthotopic model best recapitulates the cellular and stromal features of the TME in the PDAC GEMM. In contrast, far more host immune cells infiltrate the KPC-Luc tumors, which have less stroma, although CD4+ and CD8+ T cells were similarly detected in the MT-5 tumors compared with KPC-GEMM in mice. Interestingly, we found that chimeric antigen receptor (CAR) T cells redirected to recognize mesothelin on these tumors that signal via CD3ζ and 41BB (Meso-41BBζ-CAR T cells) infiltrated the tumors of mice bearing stroma-devoid KPC-Luc orthotopic tumors, but not MT-5 tumors. CONCLUSIONS: Our data establish for the first time a reproducible and realistic clinical system useful for modeling stroma-rich and stroma-devoid PDAC tumors. These models will support in-depth study of how to overcome barriers that limit the antitumor activity of adoptively transferred T cells.


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Animals , Mice , Mice, Inbred C57BL , Proto-Oncogene Proteins p21(ras) , CD8-Positive T-Lymphocytes , Tumor Suppressor Protein p53 , Pancreatic Neoplasms/therapy , Carcinoma, Pancreatic Ductal/therapy , Tumor Microenvironment
7.
medRxiv ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38947087

ABSTRACT

Post-Acute Sequelae of SARS-CoV-2 infection (PASC), also known as Long-COVID, encompasses a variety of complex and varied outcomes following COVID-19 infection that are still poorly understood. We clustered over 600 million condition diagnoses from 14 million patients available through the National COVID Cohort Collaborative (N3C), generating hundreds of highly detailed clinical phenotypes. Assessing patient clinical trajectories using these clusters allowed us to identify individual conditions and phenotypes strongly increased after acute infection. We found many conditions increased in COVID-19 patients compared to controls, and using a novel method to associate patients with clusters over time, we additionally found phenotypes specific to patient sex, age, wave of infection, and PASC diagnosis status. While many of these results reflect known PASC symptoms, the resolution provided by this unprecedented data scale suggests avenues for improved diagnostics and mechanistic understanding of this multifaceted disease.

8.
Article in English | MEDLINE | ID: mdl-37467096

ABSTRACT

Gene expression analysis of samples with mixed cell types provides only limited insight into the characteristics of specific tissues. In silico deconvolution can be applied to extract cell type specific expression, thus avoiding prohibitively expensive techniques such as cell sorting or single-cell sequencing. Non-negative matrix factorization (NMF) is a deconvolution method shown to be useful for gene expression data, in part due to its constraint of non-negativity. Unlike other methods, NMF provides the capability to deconvolve without prior knowledge of the components of the model. However, NMF is not guaranteed to provide a globally unique solution. In this work, we present FaStaNMF, a method that balances achieving global stability of the NMF results, which is essential for inter-experiment and inter-lab reproducibility, with accuracy and speed. Results: FaStaNMF was applied to four datasets with known ground truth, created based on publicly available data or by using our simulation infrastructure, RNAGinesis. We assessed FaStaNMF on three criteria (speed, accuracy, and stability), and it compared favorably to the standard approach of achieving reproducible results with NMF. We expect that FaStaNMF can be applied successfully to a wide array of biological data, such as different tumor/immune and other disease microenvironments.
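To make the factorization concrete: NMF approximates a non-negative matrix V (for example, genes by samples) as a product W H of two non-negative matrices. The sketch below uses the classic multiplicative-update rule for the squared-error objective in plain Python. It is not FaStaNMF, whose algorithm the abstract does not describe; every function name and the toy matrix are illustrative assumptions.

```python
import random

# Illustrative sketch of plain NMF with multiplicative updates.
# This is NOT FaStaNMF; all names and the toy data are assumptions.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Toy "mixed sample" matrix built from two non-negative components.
V = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [1.0, 3.0, 3.0]]

W, H = nmf(V, rank=2)
print("reconstruction error:", frob_error(V, W, H))
```

Running `nmf` with different `seed` values may return different W and H that fit V about equally well, which is the non-uniqueness the abstract refers to.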

9.
Commun Biol ; 6(1): 163, 2023 02 10.
Article in English | MEDLINE | ID: mdl-36765128

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) is an aggressive disease for which potent therapies have limited efficacy. Several studies have described the transcriptomic landscape of PDAC tumors to provide insight into potentially actionable gene expression signatures to improve patient outcomes. Despite centralization efforts from multiple organizations and increased transparency requirements from funding agencies and publishers, analysis of public PDAC data remains difficult. Bioinformatic pitfalls litter public transcriptomic data, such as subtle inclusion of low-purity and non-adenocarcinoma cases. These pitfalls can introduce non-specificity to gene signatures without appropriate data curation, which can negatively impact findings. To reduce barriers to analysis, we have created pdacR (http://pdacR.bmi.stonybrook.edu, github.com/rmoffitt/pdacR), an open-source software package and web-tool with annotated datasets from landmark studies and an interface for user-friendly analysis in clustering, differential expression, survival, and dimensionality reduction. Using this tool, we present a multi-dataset analysis of PDAC transcriptomics that confirms the basal-like/classical model over alternatives.


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Prognosis , Pancreatic Neoplasms/pathology , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/pathology , Gene Expression Profiling , Pancreatic Neoplasms
10.
Nat Commun ; 14(1): 5226, 2023 08 26.
Article in English | MEDLINE | ID: mdl-37633924

ABSTRACT

Bulk analyses of pancreatic ductal adenocarcinoma (PDAC) samples are complicated by the tumor microenvironment (TME), i.e. signals from fibroblasts, endocrine, exocrine, and immune cells. Despite this, we and others have established tumor and stroma subtypes with prognostic significance. However, understanding of underlying signals driving distinct immune and stromal landscapes is still incomplete. Here we integrate 92 single cell RNA-seq samples from seven independent studies to build a reproducible PDAC atlas with a focus on tumor-TME interdependence. Patients with activated stroma are synonymous with higher myofibroblastic and immunogenic fibroblasts, and furthermore show increased M2-like macrophages and regulatory T-cells. Contrastingly, patients with 'normal' stroma show M1-like recruitment, elevated effector and exhausted T-cells. To aid interoperability of future studies, we provide a pretrained cell type classifier and an atlas of subtype-based signaling factors that we also validate in mouse data. Ultimately, this work leverages the heterogeneity among single-cell studies to create a comprehensive view of the orchestra of signaling interactions governing PDAC.


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Animals , Mice , Tumor Microenvironment , Pancreatic Neoplasms/genetics , Carcinoma, Pancreatic Ductal/genetics , Fibroblasts
11.
J Am Med Inform Assoc ; 30(7): 1305-1312, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37218289

ABSTRACT

Machine learning (ML)-driven computable phenotypes are among the most challenging to share and reproduce. Despite this difficulty, the urgent public health considerations around Long COVID make it especially important to ensure the rigor and reproducibility of Long COVID phenotyping algorithms such that they can be made available to a broad audience of researchers. As part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, researchers with the National COVID Cohort Collaborative (N3C) devised and trained an ML-based phenotype to identify patients highly probable to have Long COVID. Supported by RECOVER, N3C and NIH's All of Us study partnered to reproduce the output of N3C's trained model in the All of Us data enclave, demonstrating model extensibility in multiple environments. This case study in ML-based phenotype reuse illustrates how open-source software best practices and cross-site collaboration can de-black-box phenotyping algorithms, prevent unnecessary rework, and promote open science in informatics.


Subject(s)
Boxing , COVID-19 , Population Health , Humans , Electronic Health Records , Post-Acute COVID-19 Syndrome , Reproducibility of Results , Machine Learning , Phenotype
12.
Nat Commun ; 14(1): 2914, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217471

ABSTRACT

Long COVID, or complications arising from COVID-19 weeks after infection, has become a central concern for public health experts. The United States National Institutes of Health founded the RECOVER initiative to better understand long COVID. We used electronic health records available through the National COVID Cohort Collaborative to characterize the association between SARS-CoV-2 vaccination and long COVID diagnosis. Among patients with a COVID-19 infection between August 1, 2021 and January 31, 2022, we defined two cohorts using distinct definitions of long COVID, namely a clinical diagnosis (n = 47,404) or a previously described computational phenotype (n = 198,514), to compare unvaccinated individuals to those with a complete vaccine series prior to infection. Evidence of long COVID was monitored through June or July of 2022, depending on patients' data availability. We found that vaccination was consistently associated with lower odds and rates of long COVID clinical diagnosis and high-confidence computationally derived diagnosis after adjusting for sex, demographics, and medical history.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , United States/epidemiology , Humans , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines , Cohort Studies , SARS-CoV-2 , Vaccination
13.
Clin J Am Soc Nephrol ; 18(8): 1006-1018, 2023 08 01.
Article in English | MEDLINE | ID: mdl-37131278

ABSTRACT

BACKGROUND: AKI is associated with mortality in patients hospitalized with coronavirus disease 2019 (COVID-19); however, its incidence, geographic distribution, and temporal trends since the start of the pandemic are understudied. METHODS: Electronic health record data were obtained from 53 health systems in the United States in the National COVID Cohort Collaborative. We selected hospitalized adults diagnosed with COVID-19 between March 6, 2020, and January 6, 2022. AKI was determined with serum creatinine and diagnosis codes. Time was divided into 16-week periods (P1-6) and geographical regions into Northeast, Midwest, South, and West. Multivariable models were used to analyze the risk factors for AKI or mortality. RESULTS: Of a total cohort of 336,473, 129,176 (38%) patients had AKI. A total of 56,322 patients (17%) lacked a diagnosis code but had AKI based on the change in serum creatinine. Similar to patients coded for AKI, these patients had higher mortality compared with those without AKI. The incidence of AKI was highest in P1 (47%; 23,097/48,947), lower in P2 (37%; 12,102/32,513), and relatively stable thereafter. Compared with the Midwest, the Northeast, South, and West had higher adjusted odds of AKI in P1. Subsequently, the South and West regions continued to have the highest relative AKI odds. In multivariable models, AKI defined by either serum creatinine or diagnostic code and the severity of AKI were associated with mortality. CONCLUSIONS: The incidence and distribution of COVID-19-associated AKI changed since the first wave of the pandemic in the United States. PODCAST: This article contains a podcast at https://dts.podtrac.com/redirect.mp3/www.asn-online.org/media/podcast/CJASN/2023_08_08_CJN0000000000000192.mp3.
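The rounded percentages in this abstract can be re-derived from the counts it reports. The short check below is only an illustrative sketch; the helper name `percent` and the variable names are assumptions made here, while the counts are copied from the abstract.

```python
# Re-derive the rounded percentages quoted in the abstract from its raw counts.
# The helper name `percent` is hypothetical; the counts come from the abstract.

def percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)

TOTAL_COHORT = 336_473

aki_overall = percent(129_176, TOTAL_COHORT)  # all patients with AKI
aki_uncoded = percent(56_322, TOTAL_COHORT)   # AKI by serum creatinine, no diagnosis code
aki_p1 = percent(23_097, 48_947)              # incidence in period P1
aki_p2 = percent(12_102, 32_513)              # incidence in period P2

print(aki_overall, aki_uncoded, aki_p1, aki_p2)  # prints: 38 17 47 37
```

The check also shows that the 17% figure for uncoded AKI is a share of the whole cohort (56,322 of 336,473), not of the 129,176 patients with AKI.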


Subject(s)
Acute Kidney Injury , COVID-19 , Adult , Humans , COVID-19/complications , COVID-19/epidemiology , Retrospective Studies , Creatinine , Risk Factors , Acute Kidney Injury/diagnosis , Hospital Mortality
14.
J Clin Transl Sci ; 7(1): e175, 2023.
Article in English | MEDLINE | ID: mdl-37745933

ABSTRACT

Introduction: With persistent incidence, incomplete vaccination rates, confounding respiratory illnesses, and few therapeutic interventions available, COVID-19 continues to be a burden on the pediatric population. During a surge, it is difficult for hospitals to direct limited healthcare resources effectively. While the overwhelming majority of pediatric infections are mild, there have been life-threatening exceptions that illuminated the need to proactively identify pediatric patients at risk of severe COVID-19 and other respiratory infectious diseases. However, a nationwide capability for developing validated computational tools to identify pediatric patients at risk using real-world data does not exist. Methods: HHS ASPR BARDA sought, through the power of competition in a challenge, to create computational models to address two clinically important questions using the National COVID Cohort Collaborative: (1) Of pediatric patients who test positive for COVID-19 in an outpatient setting, who are at risk for hospitalization? (2) Of pediatric patients who test positive for COVID-19 and are hospitalized, who are at risk for needing mechanical ventilation or cardiovascular interventions? Results: This challenge was the first, multi-agency, coordinated computational challenge carried out by the federal government as a response to a public health emergency. Fifty-five computational models were evaluated across both tasks and two winners and three honorable mentions were selected. Conclusion: This challenge serves as a framework for how the government, research communities, and large data repositories can be brought together to source solutions when resources are strapped during a pandemic.

15.
Sleep ; 46(9)2023 09 08.
Article in English | MEDLINE | ID: mdl-37166330

ABSTRACT

STUDY OBJECTIVES: Obstructive sleep apnea (OSA) has been associated with more severe acute coronavirus disease-2019 (COVID-19) outcomes. We assessed OSA as a potential risk factor for Post-Acute Sequelae of SARS-CoV-2 (PASC). METHODS: We assessed the impact of preexisting OSA on the risk for probable PASC in adults and children using electronic health record data from multiple research networks. Three research networks within the REsearching COVID to Enhance Recovery initiative (PCORnet Adult, PCORnet Pediatric, and the National COVID Cohort Collaborative [N3C]) employed a harmonized analytic approach to examine the risk of probable PASC in COVID-19-positive patients with and without a diagnosis of OSA prior to pandemic onset. Unadjusted odds ratios (ORs) were calculated as well as ORs adjusted for age group, sex, race/ethnicity, hospitalization status, obesity, and preexisting comorbidities. RESULTS: Across networks, the unadjusted OR for probable PASC associated with a preexisting OSA diagnosis in adults and children ranged from 1.41 to 3.93. Adjusted analyses found an attenuated association that remained significant among adults only. Multiple sensitivity analyses with expanded inclusion criteria and covariates yielded results consistent with the primary analysis. CONCLUSIONS: Adults with preexisting OSA were found to have significantly elevated odds of probable PASC. This finding was consistent across data sources, approaches for identifying COVID-19-positive patients, and definitions of PASC. Patients with OSA may be at elevated risk for PASC after SARS-CoV-2 infection and should be monitored for post-acute sequelae.


Subject(s)
COVID-19 , Sleep Apnea, Obstructive , Adult , Humans , Child , COVID-19/complications , COVID-19/diagnosis , COVID-19/epidemiology , Electronic Health Records , Post-Acute COVID-19 Syndrome , SARS-CoV-2 , Disease Progression , Risk Factors , Sleep Apnea, Obstructive/complications , Sleep Apnea, Obstructive/diagnosis , Sleep Apnea, Obstructive/epidemiology
16.
Cancers (Basel) ; 14(9)2022 Apr 26.
Article in English | MEDLINE | ID: mdl-35565277

ABSTRACT

Tumor-infiltrating lymphocytes (TILs) have been established as a robust prognostic biomarker in breast cancer, with emerging utility in predicting treatment response in the adjuvant and neoadjuvant settings. In this study, the role of TILs in predicting overall survival and progression-free interval was evaluated in two independent cohorts of breast cancer from the Cancer Genome Atlas (TCGA BRCA) and the Carolina Breast Cancer Study (UNC CBCS). We utilized machine learning and computer vision algorithms to characterize TIL infiltrates in digital whole-slide images (WSIs) of breast cancer stained with hematoxylin and eosin (H&E). Multiple parameters were used to characterize the global abundance and spatial features of TIL infiltrates. Univariate and multivariate analyses show that large aggregates of peritumoral and intratumoral TILs (forests) were associated with longer survival, whereas the absence of intratumoral TILs (deserts) is associated with increased risk of recurrence. Patients with two or more high-risk spatial features were associated with significantly shorter progression-free interval (PFI). This study demonstrates the practical utility of Pathomics in evaluating the clinical significance of the abundance and spatial patterns of distribution of TIL infiltrates as important biomarkers in breast cancer.

17.
medRxiv ; 2022 Oct 07.
Article in English | MEDLINE | ID: mdl-36238713

ABSTRACT

Importance: Characterizing the effect of vaccination on long COVID allows for better healthcare recommendations. Objective: To determine if, and to what degree, vaccination prior to COVID-19 is associated with eventual long COVID onset, among those with a documented COVID-19 infection. Design, Settings, and Participants: Retrospective cohort study of adults with evidence of COVID-19 between August 1, 2021 and January 31, 2022 based on electronic health records from eleven healthcare institutions taking part in the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, a project of the National COVID Cohort Collaborative (N3C). Exposures: Pre-COVID-19 receipt of a complete vaccine series versus no pre-COVID-19 vaccination. Main Outcomes and Measures: Two approaches to the identification of long COVID were used. In the clinical diagnosis cohort (n=47,752), ICD-10 diagnosis codes or evidence of a healthcare encounter at a long COVID clinic were used. In the model-based cohort (n=199,498), a computable phenotype was used. The association between pre-COVID vaccination and long COVID was estimated using IPTW-adjusted logistic regression and Cox proportional hazards. Results: In both cohorts, when adjusting for demographics and medical history, pre-COVID vaccination was associated with a reduced risk of long COVID (clinic-based cohort: HR, 0.66; 95% CI, 0.55-0.80; OR, 0.69; 95% CI, 0.59-0.82; model-based cohort: HR, 0.62; 95% CI, 0.56-0.69; OR, 0.70; 95% CI, 0.65-0.75). Conclusions and Relevance: Long COVID has become a central concern for public health experts. Prior studies have considered the effect of vaccination on the prevalence of future long COVID symptoms, but ours is the first to thoroughly characterize the association between vaccination and clinically diagnosed or computationally derived long COVID. Our results bolster the growing consensus that vaccines retain protective effects against long COVID even in breakthrough infections.
Key Points: Question: Does vaccination prior to COVID-19 onset change the risk of long COVID diagnosis? Findings: Four observational analyses of EHRs showed a statistically significant reduction in long COVID risk associated with pre-COVID vaccination (first cohort: HR, 0.66; 95% CI, 0.55-0.80; OR, 0.69; 95% CI, 0.59-0.82; second cohort: HR, 0.62; 95% CI, 0.56-0.69; OR, 0.70; 95% CI, 0.65-0.75). Meaning: Vaccination prior to COVID onset has a protective association with long COVID even in the case of breakthrough infections.

18.
J Am Med Inform Assoc ; 29(7): 1172-1182, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35435957

ABSTRACT

OBJECTIVE: The goals of this study were to harmonize data from electronic health records (EHRs) into common units, and impute units that were missing. MATERIALS AND METHODS: We used the National COVID Cohort Collaborative (N3C) table of laboratory measurement data, comprising over 3.1 billion patient records and over 19 000 unique measurement concepts in the Observational Medical Outcomes Partnership (OMOP) common-data-model format from 55 data partners. We grouped ontologically similar OMOP concepts together for 52 variables relevant to COVID-19 research, and developed a unit-harmonization pipeline consisting of (1) selecting a canonical unit for each measurement variable, (2) arriving at a formula for conversion, (3) obtaining clinical review of each formula, (4) applying the formula to convert data values in each unit into the target canonical unit, and (5) removing any harmonized value that fell outside of accepted value ranges for the variable. For data with missing units for all the results within a lab test for a data partner, we compared values with pooled values of all data partners, using the Kolmogorov-Smirnov test. RESULTS: Of the concepts without missing values, we harmonized 88.1% of the values, and imputed units for 78.2% of records where units were absent (41% of contributors' records lacked units). DISCUSSION: The harmonization and inference methods developed herein can serve as a resource for initiatives aiming to extract insight from heterogeneous EHR collections. Unique properties of centralized data are harnessed to enable unit inference. CONCLUSION: The pipeline we developed for the pooled N3C data enables use of measurements that would otherwise be unavailable for analysis.
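The five harmonization steps listed in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the N3C pipeline: the variable name, the accepted range, the table names, and the `harmonize` function are all hypothetical, and step 3 (clinical review of each formula) is a human process that code cannot show. Only the unit arithmetic (1 g/dL = 10 g/L = 1000 mg/dL) is standard.

```python
from typing import Callable, Dict, Optional, Tuple

# Step 1: one canonical unit per measurement variable (illustrative choice).
CANONICAL_UNIT: Dict[str, str] = {"hemoglobin": "g/dL"}

# Step 2: conversion formulas from each source unit into the canonical unit.
# Step 3 (clinical review of each formula) would happen outside the code.
CONVERSIONS: Dict[Tuple[str, str], Callable[[float], float]] = {
    ("hemoglobin", "g/dL"): lambda v: v,
    ("hemoglobin", "g/L"): lambda v: v / 10.0,
    ("hemoglobin", "mg/dL"): lambda v: v / 1000.0,
}

# Step 5: accepted range in the canonical unit (made-up bounds for illustration).
ACCEPTED_RANGE: Dict[str, Tuple[float, float]] = {"hemoglobin": (1.0, 30.0)}

def harmonize(variable: str, value: float, unit: str) -> Optional[float]:
    """Convert a value to the canonical unit, or return None if it cannot be used."""
    convert = CONVERSIONS.get((variable, unit))
    if convert is None:
        return None  # no known formula for this unit
    converted = convert(value)  # step 4: apply the conversion formula
    low, high = ACCEPTED_RANGE[variable]
    if not (low <= converted <= high):
        return None  # step 5: drop implausible harmonized values
    return converted

print(harmonize("hemoglobin", 140.0, "g/L"))   # 14.0
print(harmonize("hemoglobin", 140.0, "g/dL"))  # None (outside the assumed range)
```

Returning `None` for both unknown units and out-of-range values keeps the sketch short; a real pipeline would likely record which of the two reasons applied.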


Subject(s)
COVID-19 , Electronic Health Records , Cohort Studies , Data Collection , Humans
19.
medRxiv ; 2022 Sep 02.
Article in English | MEDLINE | ID: mdl-36093355

ABSTRACT

Background: Acute kidney injury (AKI) is associated with mortality in patients hospitalized with COVID-19; however, its incidence, geographic distribution, and temporal trends since the start of the pandemic are understudied. Methods: Electronic health record data were obtained from 53 health systems in the United States (US) in the National COVID Cohort Collaborative (N3C). We selected hospitalized adults diagnosed with COVID-19 between March 6th, 2020, and January 6th, 2022. AKI was determined with serum creatinine (SCr) and diagnosis codes. Time was divided into 16-week periods (P1-6) and geographical regions into Northeast, Midwest, South, and West. Multivariable models were used to analyze the risk factors for AKI or mortality. Results: Out of a total cohort of 306,061, 126,478 (41.0%) patients had AKI. Among these, 17.9% lacked a diagnosis code but had AKI based on the change in SCr. Similar to patients coded for AKI, these patients had higher mortality compared to those without AKI. The incidence of AKI was highest in P1 (49.3%), reduced in P2 (40.6%), and relatively stable thereafter. Compared to the Midwest, the Northeast, South, and West had higher adjusted AKI incidence in P1; subsequently, the South and West regions continued to have the highest relative incidence. In multivariable models, AKI defined by either SCr or diagnostic code and the severity of AKI were associated with mortality. Conclusions: Uncoded cases of COVID-19-associated AKI are common and associated with mortality. The incidence and distribution of COVID-19-associated AKI have changed since the first wave of the pandemic in the US.

20.
J Am Med Inform Assoc ; 29(4): 609-618, 2022 03 15.
Article in English | MEDLINE | ID: mdl-34590684

ABSTRACT

OBJECTIVE: In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations. MATERIALS AND METHODS: We developed a pipeline for ingesting, harmonizing, and centralizing data from 56 contributing data partners using 4 federated Common Data Models. N3C data quality (DQ) review involves both automated and manual procedures. In the process, several DQ heuristics were discovered in our centralized context, both within the pipeline and during downstream project-based analysis. Feedback to the sites led to many local and centralized DQ improvements. RESULTS: Beyond well-recognized DQ findings, we discovered 15 heuristics relating to source Common Data Model conformance, demographics, COVID tests, conditions, encounters, measurements, observations, coding completeness, and fitness for use. Of 56 sites, 37 sites (66%) demonstrated issues through these heuristics. These 37 sites demonstrated improvement after receiving feedback. DISCUSSION: We encountered site-to-site differences in DQ which would have been challenging to discover using federated checks alone. We have demonstrated that centralized DQ benchmarking reveals unique opportunities for DQ improvement that will support improved research analytics locally and in aggregate. CONCLUSION: By combining rapid, continual assessment of DQ with a large volume of multisite data, it is possible to support more nuanced scientific questions with the scale and rigor that they require.


Subject(s)
COVID-19 , Cohort Studies , Data Accuracy , Health Insurance Portability and Accountability Act , Humans , United States