Results 1 - 20 of 93
1.
Online J Public Health Inform ; 16: e53445, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38700929

ABSTRACT

BACKGROUND: Post-COVID-19 condition (colloquially known as "long COVID-19"), characterized as postacute sequelae of SARS-CoV-2, has no universal clinical case definition. Recent efforts have focused on understanding long COVID-19 symptoms, and electronic health record (EHR) data provide a unique resource for understanding this condition. The introduction of the International Classification of Diseases, Tenth Revision (ICD-10) code U09.9 for "Post COVID-19 condition, unspecified" to identify patients with long COVID-19 has provided a method of evaluating this condition in EHRs; however, the accuracy of this code is unclear. OBJECTIVE: This study aimed to characterize the utility and accuracy of the U09.9 code across 3 health care systems (the Veterans Health Administration, the Beth Israel Deaconess Medical Center, and the University of Pittsburgh Medical Center) against patients identified with long COVID-19 via a chart review that operationalized the World Health Organization (WHO) and Centers for Disease Control and Prevention (CDC) definitions. METHODS: Patients who were COVID-19 positive, with either a U07.1 ICD-10 code or a positive polymerase chain reaction test, within these health care systems were identified for chart review. Among this cohort, we sampled patients using two approaches: (1) patients with a U09.9 code and (2) patients without a U09.9 code but with a new-onset long COVID-19-related ICD-10 code, which allowed us to assess the sensitivity of the U09.9 code. To operationalize the long COVID-19 definition based on health agency guidelines, symptoms were grouped into a "core" cluster of 11 commonly reported symptoms among patients with long COVID-19 and an extended cluster that captured all other symptoms by disease domain. Patients having ≥2 symptoms that were new onset after their COVID-19 infection and persisted for ≥60 days, with ≥1 symptom in the core cluster, were labeled as having long COVID-19 per chart review.
The code's performance was compared across 3 health care systems and across different time periods of the pandemic. RESULTS: Overall, 900 patient charts were reviewed across 3 health care systems. The prevalence of long COVID-19 among the cohort with the U09.9 ICD-10 code based on the operationalized WHO definition was between 23.2% and 62.4% across these health care systems. We also evaluated a less stringent version of the WHO definition and the CDC definition and observed an increase in the prevalence of long COVID-19 at all 3 health care systems. CONCLUSIONS: This is one of the first studies to evaluate the U09.9 code against a clinical case definition for long COVID-19, as well as the first to apply this definition to EHR data using a chart review approach on a nationwide cohort across multiple health care systems. This chart review approach can be implemented at other EHR systems to further evaluate the utility and performance of the U09.9 code.
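The chart-review case definition above can be expressed as a simple rule. The Python sketch below is illustrative only, not the study's abstraction tool: the abstract does not list the 11 core symptoms, so `CORE_CLUSTER` holds hypothetical placeholder names, and the sketch assumes that persistence is measured from each symptom's first to last documented date.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical placeholder: the 11 core symptoms are not listed in the abstract.
CORE_CLUSTER = {"fatigue", "shortness_of_breath", "cognitive_dysfunction"}


@dataclass
class Symptom:
    name: str
    onset: date        # first documented date (assumption: used as persistence start)
    last_seen: date    # last documented date (assumption: used as persistence end)
    preexisting: bool  # True if documented before the COVID-19 infection


def meets_long_covid_definition(symptoms, infection_date, min_days=60, min_symptoms=2):
    """Return True if >= min_symptoms new-onset symptoms persisted >= min_days
    and at least one of them belongs to the core cluster."""
    qualifying = [
        s for s in symptoms
        if not s.preexisting
        and s.onset >= infection_date
        and (s.last_seen - s.onset).days >= min_days
    ]
    has_core = any(s.name in CORE_CLUSTER for s in qualifying)
    return len(qualifying) >= min_symptoms and has_core
```

For example, a patient infected on 2022-01-01 with new fatigue (2022-01-10 to 2022-04-01) and new insomnia (2022-01-15 to 2022-05-01) would be labeled as having long COVID-19 under these assumptions, because both symptoms persist for at least 60 days and fatigue is in the placeholder core cluster.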

2.
PLOS Digit Health ; 3(4): e0000484, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38620037

ABSTRACT

Few studies examining the patient outcomes of concurrent neurological manifestations during acute COVID-19 have leveraged multinational cohorts of adults and children or distinguished between central and peripheral nervous system (CNS vs. PNS) involvement. Using a federated multinational network in which local clinicians and informatics experts curated the electronic health records data, we evaluated the risk of prolonged hospitalization and mortality in hospitalized COVID-19 patients from 21 healthcare systems across 7 countries. For adults, we used a federated learning approach whereby we ran Cox proportional hazard models locally at each healthcare system and performed a meta-analysis on the aggregated results to estimate the overall risk of adverse outcomes across our geographically diverse populations. For children, we reported descriptive statistics separately due to their low frequency of neurological involvement and poor outcomes. Among the 106,229 hospitalized COVID-19 patients (104,031 patients ≥18 years; 2,198 patients <18 years; January 2020-October 2021), 15,101 (14%) had at least one CNS diagnosis, while 2,788 (3%) had at least one PNS diagnosis. After controlling for demographics and pre-existing conditions, adults with CNS involvement had a longer hospital stay (11 versus 6 days), a greater risk of death (Hazard Ratio = 1.78), and a shorter time to death (12 versus 24 days) than patients with no neurological condition (NNC) during acute COVID-19 hospitalization. Adults with PNS involvement also had a longer hospital stay but a lower risk of mortality than the NNC group. Although children had a low frequency of neurological involvement during COVID-19 hospitalization, a substantially higher proportion of children with CNS involvement died compared with those with NNC (6% vs 1%). Overall, patients with concurrent CNS manifestations during acute COVID-19 hospitalization faced greater risks of adverse clinical outcomes than patients without any neurological diagnosis.
Our global informatics framework using a federated approach (versus a centralized data collection approach) has utility for clinical discovery beyond COVID-19.
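The federated design described above shares only site-level model outputs, which are then pooled. As a minimal sketch, assuming each site reports a log hazard ratio and its standard error, a fixed-effect inverse-variance meta-analysis could look like the following; the abstract does not state which pooling model the authors used (a random-effects model is a common alternative), and the function name is hypothetical.

```python
import math


def pooled_hazard_ratio(site_results, z=1.96):
    """Fixed-effect inverse-variance pooling of site-level Cox model results.

    site_results: list of (log_hr, standard_error) tuples, one per healthcare system.
    Returns (pooled HR, lower CI bound, upper CI bound).
    """
    weights = [1.0 / se ** 2 for _, se in site_results]
    pooled_log_hr = sum(w * b for w, (b, _) in zip(weights, site_results)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (
        math.exp(pooled_log_hr),
        math.exp(pooled_log_hr - z * pooled_se),
        math.exp(pooled_log_hr + z * pooled_se),
    )


# Hypothetical inputs from two sites; no patient-level data leave either site.
print(pooled_hazard_ratio([(math.log(1.9), 0.15), (math.log(1.6), 0.20)]))
```

The design choice that matters here is that each site transmits only a coefficient and its standard error, which is what allows the analysis to run without a centralized data collection.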

3.
Sci Rep ; 14(1): 8021, 2024 04 05.
Article in English | MEDLINE | ID: mdl-38580710

ABSTRACT

The Phenome-Wide Association Study (PheWAS) is increasingly used to broadly screen for potential treatment effects, e.g., IL6R variant as a proxy for IL6R antagonists. This approach offers an opportunity to address the limited power in clinical trials to study differential treatment effects across patient subgroups. However, limited methods exist to efficiently test for differences across subgroups in the thousands of multiple comparisons generated as part of a PheWAS. In this study, we developed an approach that maximizes the power to test for heterogeneous genotype-phenotype associations and applied this approach to an IL6R PheWAS among individuals of African (AFR) and European (EUR) ancestries. We identified 29 traits with differences in IL6R variant-phenotype associations, including a lower risk of type 2 diabetes in AFR (OR 0.96) vs EUR (OR 1.0, p-value for heterogeneity = 8.5 × 10^-3), and higher white blood cell count (p-value for heterogeneity = 8.5 × 10^-131). These data suggest a more salutary effect of IL6R blockade for T2D among individuals of AFR vs EUR ancestry and provide data to inform ongoing clinical trials targeting IL6 for an expanding number of conditions. Moreover, the method to test for heterogeneity of associations can be applied broadly to other large-scale genotype-phenotype screens in diverse populations.
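To make "p-value for heterogeneity" concrete, the sketch below uses the textbook two-sample z-test for a difference between two independent log odds ratios. This is a standard baseline, not the power-maximizing method the authors developed, and the inputs in the usage line are hypothetical.

```python
import math


def heterogeneity_p_value(log_or_a, se_a, log_or_b, se_b):
    """Two-sided p-value for a difference between two independent log odds ratios,
    e.g., the same variant-phenotype association estimated in AFR and EUR samples."""
    z = (log_or_a - log_or_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical estimates: log OR and standard error in each ancestry group.
print(heterogeneity_p_value(math.log(0.96), 0.015, math.log(1.0), 0.008))
```

Applied across thousands of PheWAS traits, these per-trait p-values would still need a multiple-comparison correction.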


Subject(s)
Diabetes Mellitus, Type 2 , Humans , Diabetes Mellitus, Type 2/drug therapy , Diabetes Mellitus, Type 2/genetics , Genetic Association Studies , Phenotype , Polymorphism, Single Nucleotide , Receptors, Interleukin-6/genetics
4.
J Am Med Inform Assoc ; 31(5): 1126-1134, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38481028

ABSTRACT

OBJECTIVE: Development of clinical phenotypes from electronic health records (EHRs) can be resource intensive. Several phenotype libraries have been created to facilitate reuse of definitions. However, these platforms vary in target audience and utility. We describe the development of the Centralized Interactive Phenomics Resource (CIPHER) knowledgebase, a comprehensive public-facing phenotype library, which aims to facilitate clinical and health services research. MATERIALS AND METHODS: The platform was designed to collect and catalog EHR-based computable phenotype algorithms from any healthcare system, scale metadata management, facilitate phenotype discovery, and allow for integration of tools and user workflows. Phenomics experts were engaged in the development and testing of the site. RESULTS: The knowledgebase stores phenotype metadata using the CIPHER standard, and definitions are accessible through complex searching. Phenotypes are contributed to the knowledgebase via webform, allowing metadata validation. Data visualization tools linking to the knowledgebase enhance user interaction with content and accelerate phenotype development. DISCUSSION: The CIPHER knowledgebase was developed in the largest healthcare system in the United States and piloted with external partners. The design of the CIPHER website supports a variety of front-end tools and features to facilitate phenotype development and reuse. Health data users are encouraged to contribute their algorithms to the knowledgebase for wider dissemination to the research community, and to use the platform as a springboard for phenotyping. CONCLUSION: CIPHER is a public resource for all health data users available at https://phenomics.va.ornl.gov/ which facilitates phenotype reuse, development, and dissemination of phenotyping knowledge.


Subject(s)
Electronic Health Records , Phenomics , Phenotype , Knowledge Bases , Algorithms
5.
JAMA Netw Open ; 7(3): e243062, 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38512255

ABSTRACT

Importance: Body mass index (BMI; calculated as weight in kilograms divided by height in meters squared) is a commonly used estimate of obesity, which is a complex trait affected by genetic and lifestyle factors. Marked weight gain and loss could be associated with adverse biological processes. Objective: To evaluate the association between BMI variability and incident cardiovascular disease (CVD) events in 2 distinct cohorts. Design, Setting, and Participants: This cohort study used data from the Million Veteran Program (MVP) between 2011 and 2018 and participants in the UK Biobank (UKB) enrolled between 2006 and 2010. Participants were followed up for a median of 3.8 (5th-95th percentile, 3.5) years. Participants with baseline CVD or cancer were excluded. Data were analyzed from September 2022 to September 2023. Exposure: BMI variability was calculated by the retrospective SD and coefficient of variation (CV) using multiple clinical BMI measurements up to the baseline. Main Outcomes and Measures: The main outcome was incident composite CVD events (incident nonfatal myocardial infarction, acute ischemic stroke, and cardiovascular death), assessed using Cox proportional hazards modeling after adjustment for CVD risk factors, including age, sex, mean BMI, systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, smoking status, diabetes status, and statin use. Secondary analysis assessed whether associations were dependent on the polygenic score of BMI. Results: Among 92 363 US veterans in the MVP cohort (81 675 [88%] male; mean [SD] age, 56.7 [14.1] years), there were 9695 Hispanic participants, 22 488 non-Hispanic Black participants, and 60 180 non-Hispanic White participants. A total of 4811 composite CVD events were observed from 2011 to 2018. The CV of BMI was associated with a 16% higher risk for composite CVD across all groups (hazard ratio [HR], 1.16; 95% CI, 1.13-1.19).
These associations were unchanged among subgroups and after adjustment for the polygenic score of BMI. The UKB cohort included 65 047 individuals (mean [SD] age, 57.30 [7.77] years; 38 065 [59%] female) and had 6934 composite CVD events. Each 1-SD increase in BMI variability in the UKB cohort was associated with an 8% increased risk of cardiovascular death (HR, 1.08; 95% CI, 1.04-1.11). Conclusions and Relevance: This cohort study found that among US veterans, higher BMI variability was a significant risk marker associated with adverse cardiovascular events independent of mean BMI across major racial and ethnic groups. Results were consistent in the UKB for the cardiovascular death end point. Further studies should investigate the phenotype of high BMI variability.
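The exposure above is built from two simple quantities: BMI itself and the spread of repeated BMI measurements. A minimal sketch, assuming the sample (n - 1) standard deviation and a CV expressed as a fraction rather than a percentage (the abstract specifies neither); function names are hypothetical.

```python
import statistics


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_variability(bmi_values):
    """Return (SD, CV) of repeated BMI measurements taken up to baseline.

    Assumption: sample standard deviation; CV = SD / mean BMI (as a fraction).
    """
    if len(bmi_values) < 2:
        raise ValueError("at least two BMI measurements are needed")
    sd = statistics.stdev(bmi_values)
    return sd, sd / statistics.mean(bmi_values)


print(bmi(80.0, 2.0))                      # 20.0
print(bmi_variability([20.0, 22.0, 24.0]))  # SD 2.0, CV 2/22
```

Dividing by the mean is what lets the CV compare variability across people with different average BMI, which is consistent with the analysis also adjusting for mean BMI.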


Subject(s)
Ischemic Stroke , Myocardial Infarction , Female , Male , Humans , Middle Aged , Body Mass Index , Cohort Studies , Retrospective Studies , Myocardial Infarction/epidemiology , Cholesterol, HDL
6.
J Nutr ; 154(3): 886-895, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38163586

ABSTRACT

BACKGROUND: Red meat consumption was associated with an increased risk of cardiovascular disease (CVD) in prospective cohort studies and a profile of biomarkers favoring high CVD risk in short-term controlled trials. However, several recent systematic reviews and meta-analyses found no or only weak evidence to support limiting red meat intake. OBJECTIVES: To prospectively examine the associations between red meat intake and incident CVD in an ongoing cohort study with diverse socioeconomic and racial or ethnic backgrounds. METHODS: Our study included 148,506 participants [17,804 female (12.0%)] who were free of cancer, diabetes, and CVD at baseline from the Million Veteran Program. A food frequency questionnaire measured red meat intakes at baseline. Nonfatal myocardial infarction and acute ischemic stroke were identified through a high-throughput phenotyping algorithm, and fatal CVD events were identified by searching the National Death Index. RESULTS: Comparing the extreme categories of intake, the multivariate-adjusted relative risks of CVD were 1.18 (95% CI: 1.01, 1.38; P-trend < 0.0001) for total red meat, 1.14 (95% CI: 0.96, 1.36; P-trend = 0.01) for unprocessed red meat, and 1.29 (95% CI: 1.04, 1.60; P-trend = 0.003) for processed red meat. We observed a more pronounced positive association between red meat intake and CVD in African American participants than in White participants (P-interaction = 0.01). Replacing 0.5 servings/d of red meat with 0.5 servings/d of nuts, whole grains, and skimmed milk was associated with 14% (RR: 0.86; 95% CI: 0.83, 0.90), 7% (RR: 0.93; 95% CI: 0.89, 0.96), and 4% (RR: 0.96; 95% CI: 0.94, 0.99) lower risks of CVD, respectively. CONCLUSIONS: Red meat consumption is associated with an increased risk of CVD. Our findings support lowering red meat intake and replacing red meat with plant-based protein sources or low-fat dairy foods as a key dietary recommendation for the prevention of CVD.


Subject(s)
Cardiovascular Diseases , Ischemic Stroke , Red Meat , Veterans , Humans , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/etiology , Prospective Studies , Cohort Studies , Ischemic Stroke/complications , Risk Factors , Diet , Meat/adverse effects , Red Meat/adverse effects
7.
Sci Rep ; 14(1): 1793, 2024 01 20.
Article in English | MEDLINE | ID: mdl-38245528

ABSTRACT

We present an ensemble transfer learning method to predict suicide from Veterans Affairs (VA) electronic medical records (EMR). A diverse set of base models was trained to predict a binary outcome constructed from reported suicide, suicide attempt, and overdose diagnoses with varying choices of study design and prediction methodology. Each model used twenty cross-sectional and 190 longitudinal variables observed in eight time intervals covering 7.5 years prior to the time of prediction. Ensembles of seven base models were created and fine-tuned with ten variables expected to change with study design and outcome definition in order to predict suicide and the combined outcome in a prospective cohort. The ensemble models achieved c-statistics of 0.73 on 2-year suicide risk and 0.83 on the combined outcome when predicting on a prospective cohort of approximately 4.2 M veterans. The ensembles rely on nonlinear base models trained using a matched retrospective nested case-control (Rcc) study cohort and show good calibration across a diversity of subgroups, including risk strata, age, sex, race, and level of healthcare utilization. In addition, a linear Rcc base model provided a rich set of biological predictors, including indicators of suicide, substance use disorder, mental health diagnoses and treatments, hypoxia and vascular damage, and demographics.
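The c-statistic reported above is the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. The sketch below pairs that definition with a plain unweighted average of base-model scores; the averaging step is a simplifying assumption and does not reproduce the authors' fine-tuned transfer-learning ensemble. The pairwise loop is quadratic, so a cohort of millions would need a rank-based implementation instead.

```python
def ensemble_predict(model_scores):
    """Average per-patient risk scores across base models.

    model_scores: one list of scores per base model, all in the same patient order.
    """
    n_models = len(model_scores)
    return [sum(scores) / n_models for scores in zip(*model_scores)]


def c_statistic(scores, outcomes):
    """Concordance statistic for a binary outcome (equivalent to ROC AUC).

    Counts case/non-case pairs where the case scores higher; ties count as 1/2.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


print(c_statistic([0.9, 0.1, 0.5, 0.5], [1, 0, 1, 0]))  # 0.875
```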


Subject(s)
Carcinoma, Renal Cell , Kidney Neoplasms , Veterans , Humans , Veterans/psychology , Retrospective Studies , Cross-Sectional Studies , Prospective Studies , Suicide, Attempted , Machine Learning
8.
Sci Rep ; 14(1): 952, 2024 01 10.
Article in English | MEDLINE | ID: mdl-38200186

ABSTRACT

Most prior studies on the prognostic significance of newly-diagnosed atrial fibrillation (AF) in COVID-19 did not differentiate newly-diagnosed AF from pre-existing AF. We aimed to determine the association between newly-diagnosed AF and in-hospital and 30-day mortality among regular users of the Veterans Health Administration, using data linked to Medicare. We identified Veterans aged ≥ 65 years who were hospitalized for ≥ 24 h with COVID-19 from June 1, 2020, to January 31, 2022, and had ≥ 2 primary care visits within 24 months prior to the index hospitalization. We performed multivariable logistic regression analyses to estimate adjusted risks, risk differences (RD), and odds ratios (OR) for the association between newly-diagnosed AF and the mortality outcomes, adjusting for patient demographics, baseline comorbidities, and the presence of acute organ dysfunction on admission. Of 23,299 patients in the study cohort, 5.3% had newly-diagnosed AF, and 29.2% had pre-existing AF. Among patients with newly-diagnosed AF, adjusted in-hospital and 30-day mortality were 16.5% and 22.7%, respectively. Newly-diagnosed AF was associated with increased mortality compared with pre-existing AF (in-hospital: OR 2.02, 95% confidence interval [CI] 1.72-2.37; RD 7.58%, 95% CI 5.54-9.62) (30-day: OR 1.86, 95% CI 1.60-2.16; RD 9.04%, 95% CI 6.61-11.5) or no AF (in-hospital: OR 2.24, 95% CI 1.93-2.60; RD 8.40%, 95% CI 6.44-10.4) (30-day: OR 2.07, 95% CI 1.80-2.37; RD 10.2%, 95% CI 7.89-12.6). The association between pre-existing AF and the mortality outcomes was smaller. Newly-diagnosed AF is an important prognostic marker for patients hospitalized with COVID-19. Whether prevention or treatment of AF improves clinical outcomes in these patients remains unknown.
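The abstract reports two effect measures side by side: the risk difference (an absolute scale) and the odds ratio (a relative scale). The sketch below shows how each is computed from a pair of risks, using hypothetical inputs. One caveat: an OR computed from marginal adjusted risks is generally not identical to the conditional OR taken from a logistic regression coefficient (odds ratios are non-collapsible), so this is illustrative arithmetic rather than the study's estimation procedure.

```python
def risk_difference(risk_exposed, risk_unexposed):
    """Absolute risk difference between two groups (risks given as proportions)."""
    return risk_exposed - risk_unexposed


def odds_ratio(risk_exposed, risk_unexposed):
    """Odds ratio from two risks, where odds = p / (1 - p)."""
    odds_exposed = risk_exposed / (1.0 - risk_exposed)
    odds_unexposed = risk_unexposed / (1.0 - risk_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks of 20% vs 10%: RD = 0.10 (10 percentage points), OR = 2.25.
print(risk_difference(0.20, 0.10), odds_ratio(0.20, 0.10))
```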


Subject(s)
Atrial Fibrillation , COVID-19 , Veterans , Aged , United States/epidemiology , Humans , Atrial Fibrillation/diagnosis , Atrial Fibrillation/epidemiology , Prognosis , Incidence , COVID-19/epidemiology , Medicare
9.
Patterns (N Y) ; 5(1): 100906, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38264714

ABSTRACT

Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.
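The visit-attention step above, which compresses feature embeddings into a visit embedding, can be illustrated with generic dot-product attention pooling. This is a minimal sketch under stated assumptions, not LATTE's published architecture: the `query` vector stands in for a hypothetical learned parameter, and the embeddings are toy values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(feature_embeddings, query):
    """Compress the embeddings of features observed at one visit into a single
    visit embedding, weighting each feature by softmax(query . embedding)."""
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in feature_embeddings]
    weights = softmax(scores)
    dim = len(feature_embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, feature_embeddings)) for d in range(dim)]


# Toy example: the query favors the first feature, so the visit embedding leans toward it.
print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0]))
```

Because the weights are explicit, they can be inspected per visit, which suggests one way such a model can offer the interpretability the abstract mentions.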

10.
J Am Geriatr Soc ; 72(2): 410-422, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38055194

ABSTRACT

BACKGROUND: Statins are part of long-term medical regimens for many older adults. Whether frailty modifies the protective relationship between statins, mortality, and major adverse cardiovascular events (MACE) is unknown. METHODS: This was a retrospective study of US Veterans ≥65, without CVD or prior statin use seen in 2002-2012, followed through 2017. A 31-item frailty index was used. The co-primary endpoint was all-cause mortality or MACE (MI, stroke/TIA, revascularization, or cardiovascular death). Cox proportional hazards models were developed to evaluate the association of statin use with outcomes; propensity score overlap weighting accounted for confounding by indication. RESULTS: We identified 710,313 Veterans (mean age (SD) 75.3(6.5), 98% male, 89% white); 86,327 (12.1%) were frail. Over mean follow-up of 8 (5) years, there were 48.6 and 72.6 deaths per 1000 person-years (PY) among non-frail statin-users vs nonusers (weighted Incidence Rate Difference (wIRD)/1000 person years (PY), -24.0[95% CI, -24.5 to -23.6]), and 90.4 and 130.4 deaths per 1000PY among frail statin-users vs nonusers (wIRD/1000PY, -40.0[95% CI, -41.8 to -38.2]). There were 51.7 and 60.8 MACE per 1000PY among non-frail statin-users vs nonusers (wIRD/1000PY, -9.1[95% CI, -9.7 to -8.5]), and 88.2 and 102.0 MACE per 1000PY among frail statin-users vs nonusers (wIRD/1000PY, -13.8[95% CI, -16.2 to -11.4]). There were no significant interactions by frailty for statin users vs non-users by either mortality or MACE outcomes, p-interaction 0.770 and 0.319, respectively. Statin use was associated with lower risk of all-cause mortality (HR, 0.61 (0.60-0.61)) and MACE (HR 0.86 (0.85-0.87)). CONCLUSIONS: New statin use is associated with a lower risk of mortality and MACE, independent of frailty. These findings should be confirmed in a randomized clinical trial.
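The weighted rates and rate differences above follow simple arithmetic: for example, 48.6 - 72.6 = -24.0 deaths per 1000 PY among non-frail statin users vs nonusers, and 90.4 - 130.4 = -40.0 among frail patients, matching the reported wIRDs. The sketch below shows overlap weighting (treated patients weighted by 1 - e(x), untreated by e(x), where e(x) is the propensity score) and a weighted incidence rate. Function names and the per-patient record layout are hypothetical, and the study's actual estimation and confidence intervals are not reproduced.

```python
def overlap_weight(propensity, treated):
    """Overlap weight: 1 - e(x) for treated (statin users), e(x) for untreated."""
    return 1.0 - propensity if treated else propensity


def rate_per_1000_py(events, person_years):
    """Crude incidence rate per 1000 person-years."""
    return 1000.0 * events / person_years


def weighted_rate_per_1000_py(records):
    """Weighted incidence rate; records are (weight, events, person_years) per patient."""
    weighted_events = sum(w * e for w, e, _ in records)
    weighted_py = sum(w * py for w, _, py in records)
    return 1000.0 * weighted_events / weighted_py


def incidence_rate_difference(rate_users, rate_nonusers):
    """Rate difference (users minus nonusers), per 1000 person-years."""
    return rate_users - rate_nonusers


print(round(incidence_rate_difference(48.6, 72.6), 1))  # -24.0
```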


Subject(s)
Cardiovascular Diseases , Frailty , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Stroke , Veterans , Aged , Female , Humans , Male , Cardiovascular Diseases/epidemiology , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Retrospective Studies , Stroke/epidemiology
11.
JAMA Netw Open ; 6(12): e2346783, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38064215

ABSTRACT

Importance: A significant proportion of SARS-CoV-2 infected individuals experience post-COVID-19 condition months after initial infection. Objective: To determine the rates, clinical setting, risk factors, and symptoms associated with the documentation of International Statistical Classification of Diseases, Tenth Revision (ICD-10) code U09.9 for post-COVID-19 condition after acute infection. Design, Setting, and Participants: This retrospective cohort study was performed within the US Department of Veterans Affairs (VA) health care system. Veterans with a positive SARS-CoV-2 test result between October 1, 2021, the date ICD-10 code U09.9 was introduced, and January 31, 2023 (n = 388 980), and a randomly selected subsample of patients with the U09.9 code (n = 350) whose symptom prevalence was assessed by systematic medical record review, were included in the analysis. Exposure: Positive SARS-CoV-2 test result. Main Outcomes and Measures: Rates, clinical setting, risk factors, and symptoms associated with ICD-10 code U09.9 in the medical record. Results: Among the 388 980 persons with a positive SARS-CoV-2 test, the mean (SD) age was 61.4 (16.1) years; 87.3% were men. In terms of race and ethnicity, 0.8% were American Indian or Alaska Native, 1.4% were Asian, 20.7% were Black, 9.3% were Hispanic or Latino, 1.0% were Native Hawaiian or Other Pacific Islander, and 67.8% were White. Cumulative incidence of U09.9 documentation was 4.79% (95% CI, 4.73%-4.87%) at 6 months and 5.28% (95% CI, 5.21%-5.36%) at 12 months after infection. Factors independently associated with U09.9 documentation included older age, female sex, Hispanic or Latino ethnicity, comorbidity burden, and severe acute infection manifested by symptoms, hospitalization, or ventilation. Primary vaccination (adjusted hazard ratio [AHR], 0.80 [95% CI, 0.78-0.83]) and booster vaccination (AHR, 0.66 [95% CI, 0.64-0.69]) were associated with a lower likelihood of U09.9 documentation.
Marked differences by geographic region and facility in U09.9 code documentation may reflect local screening and care practices. Among the 350 patients undergoing systematic medical record review, the most common symptoms documented in the medical records among patients with the U09.9 code were shortness of breath (130 [37.1%]), fatigue or exhaustion (78 [22.3%]), cough (63 [18.0%]), reduced cognitive function or brain fog (22 [6.3%]), and change in smell and/or taste (20 [5.7%]). Conclusions and Relevance: In this cohort study of 388 980 veterans, documentation of ICD-10 code U09.9 had marked regional and facility-level variability. Strong risk factors for U09.9 documentation were identified, while vaccination appeared to be protective. Accurate and consistent documentation of U09.9 is needed to maximize its utility in tracking patients for clinical care and research. Future studies should examine the long-term trajectory of individuals with U09.9 documentation.


Subject(s)
COVID-19 , SARS-CoV-2 , Male , Humans , Female , Middle Aged , COVID-19/epidemiology , Cohort Studies , Retrospective Studies , International Classification of Diseases , Post-Acute COVID-19 Syndrome , Chronic Disease
12.
J Am Heart Assoc ; 12(21): e030496, 2023 11 07.
Article in English | MEDLINE | ID: mdl-37889207

ABSTRACT

Background The lipid hypothesis postulates that lower blood cholesterol is associated with reduced coronary heart disease (CHD) risk; it has been challenged by recent reports of a U-shaped relation between cholesterol and death. We sought to examine whether the U-shaped relationship is true and to assess the impact of age on this association. Method and Results We conducted a prospective cohort study of 4 467 942 veterans aged >18 years, with baseline outpatient visits from 2002 to 2007 and follow-up to December 30, 2018, in the Veterans Health Administration electronic health record system. We observed a J-shaped relation between total cholesterol (TC) and CHD mortality after a comprehensive adjustment for confounding factors: the curve was flat for TC <180 mg/dL, and risk increased at higher cholesterol levels. Compared with veterans with TC between 180 and 199 mg/dL, the multiadjusted hazard ratios (HRs) for CHD death were 1.03 (95% CI, 1.02-1.04), 1.07 (95% CI, 1.06-1.09), 1.15 (95% CI, 1.13-1.18), 1.25 (95% CI, 1.22-1.28), and 1.45 (95% CI, 1.42-1.49) among veterans with TC (mg/dL) of 200 to 219, 220 to 239, 240 to 259, 260 to 279, and ≥280, respectively. Similar J-shaped TC-CHD mortality patterns were observed among veterans with and without statin use at or before baseline. Conclusions The cholesterol paradox, that is, higher CHD death in patients with a low cholesterol level, was a reflection of reverse causality, especially among older participants. Our results support the lipid hypothesis that lower blood cholesterol is associated with reduced CHD risk. Furthermore, the hypothesis remained true when TC was low due to use of statins or other lipid-lowering medication.


Subject(s)
Coronary Disease , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Veterans , Humans , Prospective Studies , Risk Factors , Cholesterol , Cholesterol, HDL
13.
medRxiv ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-37873131

ABSTRACT

Though electronic health record (EHR) systems are a rich repository of clinical information with large potential, the use of EHR-based phenotyping algorithms is often hindered by inaccurate diagnostic records, the presence of many irrelevant features, and the requirement for a human-labeled training set. In this paper, we describe a knowledge-driven online multimodal automated phenotyping (KOMAP) system that i) generates a list of informative features by an online narrative and codified feature search engine (ONCE) and ii) enables the training of a multimodal phenotyping algorithm based on summary data. Powered by composite knowledge from multiple EHR sources, online article corpora, and a large language model, features selected by ONCE show high concordance with the state-of-the-art AI models (GPT4 and ChatGPT) and encourage large-scale phenotyping by providing a smaller but highly relevant feature set. Validation of the KOMAP system across four healthcare centers suggests that it can generate efficient phenotyping algorithms with robust performance. Compared to other methods requiring patient-level inputs and gold-standard labels, the fully online KOMAP provides a significant opportunity to enable multi-center collaboration.

14.
medRxiv ; 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37662265

ABSTRACT

Obesity is a major public health crisis associated with high mortality rates. Previous genome-wide association studies (GWAS) investigating body mass index (BMI) have largely relied on imputed data from European individuals. This study leveraged whole-genome sequencing (WGS) data from 88,873 participants from the Trans-Omics for Precision Medicine (TOPMed) Program, of which 51% were of non-European population groups. We discovered 18 BMI-associated signals (P < 5 × 10^-9). Notably, we identified and replicated a novel low-frequency single nucleotide polymorphism (SNP) in MTMR3 that was common in individuals of African descent. Using a diverse study population, we further identified two novel secondary signals in known BMI loci and pinpointed two likely causal variants in the POC5 and DMD loci. Our work demonstrates the benefits of combining WGS and diverse cohorts in expanding the current catalog of variants and genes that confer risk for obesity, bringing us one step closer to personalized medicine.

15.
medRxiv ; 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37425708

ABSTRACT

Genome-wide association studies (GWAS) have underrepresented individuals from non-European populations, impeding progress in characterizing the genetic architecture and consequences of health and disease traits. To address this, we present a population-stratified phenome-wide GWAS followed by a multi-population meta-analysis for 2,068 traits derived from electronic health records of 635,969 participants in the Million Veteran Program (MVP), a longitudinal cohort study of diverse U.S. Veterans genetically similar to the respective African (121,177), Admixed American (59,048), East Asian (6,702), and European (449,042) superpopulations defined by the 1000 Genomes Project. We identified 38,270 independent variants associated with one or more traits at experiment-wide significance (P < 4.6 × 10^-11) and fine-mapped 6,318 signals identified from 613 traits to single-variant resolution. Among these, a third (2,069) of the associations were found only among participants genetically similar to non-European reference populations, demonstrating the importance of expanding diversity in genetic studies. Our work provides a comprehensive atlas of phenome-wide genetic associations for future studies dissecting the architecture of complex traits in diverse populations.

16.
medRxiv ; 2023 May 21.
Article in English | MEDLINE | ID: mdl-37293026

ABSTRACT

Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes, covering hundreds of thousands of clinical concepts available for research and clinical care. The complex, massive, heterogeneous, and noisy nature of EHR data imposes significant challenges for feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features. Methods: The ARCH algorithm first derives embedding vectors from a co-occurrence matrix of all EHR concepts and then generates cosine similarities along with associated p-values to measure the strength of relatedness between clinical features with statistical certainty quantification. In the final step, ARCH performs a sparse embedding regression to remove indirect linkage between entity pairs. We validated the clinical utility of the ARCH knowledge graph, generated from 12.5 million patients in the Veterans Affairs (VA) healthcare system, through downstream tasks including detecting known relationships between entity pairs, predicting drug side effects, disease phenotyping, as well as sub-typing Alzheimer's disease patients. Results: ARCH produces high-quality clinical embeddings and KG for over 60,000 EHR concepts, as visualized in the R-shiny powered web-API (https://celehs.hms.harvard.edu/ARCH/). The ARCH embeddings attained an average area under the ROC curve (AUC) of 0.926 and 0.861 for detecting pairs of similar EHR concepts when the concepts are mapped to codified data and to NLP data, respectively; and 0.810 (codified) and 0.843 (NLP) for detecting related pairs. Based on the p-values computed by ARCH, the sensitivities of detecting similar and related entity pairs were 0.906 and 0.888 under false discovery rate (FDR) control of 5%.
For detecting drug side effects, the cosine similarity based on the ARCH semantic representations achieved an AUC of 0.723, which improved to 0.826 after few-shot training that minimized a loss function on the training data set. Incorporating NLP data substantially improved the ability to detect side effects in the EHR. For example, based on unsupervised ARCH embeddings, the power to detect drug-side effect pairs was 0.15 when using codified data only, much lower than the power of 0.51 when using both codified and NLP concepts. Compared with existing large-scale representation learning methods, including PubMedBERT, BioBERT, and SapBERT, ARCH attained the most robust performance and substantially higher accuracy in detecting these relationships. Incorporating ARCH-selected features into weakly supervised phenotyping algorithms can improve the robustness of algorithm performance, especially for diseases that benefit from NLP features as supporting evidence. For example, the phenotyping algorithm for depression attained an AUC of 0.927 when using ARCH-selected features but only 0.857 when using codified features selected via the KESER network[1]. In addition, embeddings and knowledge graphs generated from the ARCH network clustered AD patients into two subgroups, of which the fast-progression subgroup had a much higher mortality rate. Conclusions: The proposed ARCH algorithm generates large-scale, high-quality semantic representations and a knowledge graph for both codified and NLP EHR features, useful for a wide range of predictive modeling tasks.
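The ARCH steps named in this abstract (embeddings from a concept co-occurrence matrix, cosine similarities, and p-values screened under FDR control) can be illustrated with a small sketch. This is not the authors' implementation. It assumes the embeddings come from a shifted positive pointwise mutual information (SPPMI) matrix factorized by truncated SVD, a common way to embed co-occurrence data, and it takes p-values as given because the abstract does not say how ARCH computes them. FDR control is shown with the Benjamini-Hochberg step-up procedure, which is also an assumption. The sparse embedding regression step is omitted, and all function names and the toy count matrix are hypothetical.

```python
import numpy as np


def sppmi(counts: np.ndarray, shift: float = 1.0) -> np.ndarray:
    """Shifted positive PMI of a symmetric concept co-occurrence count matrix.

    Assumes every concept co-occurs with at least one concept (no all-zero rows).
    """
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log(counts * total / (row * col))  # zero counts give -inf
    # Subtract log(shift) and clip negatives (including -inf) to zero.
    return np.maximum(pmi - np.log(shift), 0.0)


def embed(counts: np.ndarray, dim: int) -> np.ndarray:
    """Rank-`dim` concept embeddings from a truncated SVD of the SPPMI matrix."""
    u, s, _ = np.linalg.svd(sppmi(counts))
    return u[:, :dim] * np.sqrt(s[:dim])


def cosine_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between embedding rows."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def benjamini_hochberg(pvals, alpha: float = 0.05) -> np.ndarray:
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank i with p_(i) <= alpha * i / m
        reject[order[: k + 1]] = True
    return reject


# Hypothetical toy data: concepts 0-1 co-occur often, as do concepts 2-3.
counts = np.array([[20, 10, 1, 1],
                   [10, 20, 1, 1],
                   [1, 1, 20, 10],
                   [1, 1, 10, 20]], dtype=float)
sim = cosine_matrix(embed(counts, dim=2))
print(sim[0, 1], sim[0, 2])
print(benjamini_hochberg([0.001, 0.02, 0.04, 0.5]).tolist())
```

In the toy matrix, concepts 0 and 1 end up with a cosine similarity close to 1 and concepts 0 and 2 close to 0. The step-up rule rejects every hypothesis ranked at or below the largest rank that passes its threshold, so a smaller p-value that misses its own cutoff can still be rejected.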

17.
JAMA Cardiol ; 8(6): 564-574, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37133828

ABSTRACT

Importance: Primary prevention of atherosclerotic cardiovascular disease (ASCVD) relies on risk stratification. Genome-wide polygenic risk scores (PRSs) are proposed to improve ASCVD risk estimation. Objective: To determine whether genome-wide PRSs for coronary artery disease (CAD) and acute ischemic stroke improve ASCVD risk estimation with traditional clinical risk factors in an ancestrally diverse midlife population. Design, Setting, and Participants: This was a prognostic analysis of incident events in a retrospectively defined longitudinal cohort conducted from January 1, 2011, to December 31, 2018. Included in the study were adults free of ASCVD and statin naive at baseline from the Million Veteran Program (MVP), a mega biobank with genetic, survey, and electronic health record data from a large US health care system. Data were analyzed from March 15, 2021, to January 5, 2023. Exposures: PRSs for CAD and ischemic stroke, derived from cohorts of largely European descent, and traditional risk factors, including age, sex, systolic blood pressure, total cholesterol, high-density lipoprotein (HDL) cholesterol, smoking, and diabetes status. Main Outcomes and Measures: Incident nonfatal myocardial infarction (MI), ischemic stroke, ASCVD death, and composite ASCVD events. Results: A total of 79 151 participants (mean [SD] age, 57.8 [13.7] years; 68 503 male [86.5%]) were included in the study. The cohort included participants from the following harmonized genetic ancestry and race and ethnicity categories: 18 505 non-Hispanic Black (23.4%), 6785 Hispanic (8.6%), and 53 861 non-Hispanic White (68.0%) with a median (5th-95th percentile) follow-up of 4.3 (0.7-6.9) years. From 2011 to 2018, 3186 MIs (4.0%), 1933 ischemic strokes (2.4%), 867 ASCVD deaths (1.1%), and 5485 composite ASCVD events (6.9%) were observed.
CAD PRS was associated with incident MI in non-Hispanic Black (hazard ratio [HR], 1.10; 95% CI, 1.02-1.19), Hispanic (HR, 1.26; 95% CI, 1.09-1.46), and non-Hispanic White (HR, 1.23; 95% CI, 1.18-1.29) participants. Stroke PRS was associated with incident stroke in non-Hispanic White participants (HR, 1.15; 95% CI, 1.08-1.21). A combined CAD plus stroke PRS was associated with ASCVD deaths among non-Hispanic Black (HR, 1.19; 95% CI, 1.03-1.17) and non-Hispanic (HR, 1.11; 95% CI, 1.03-1.21) participants. The combined PRS was also associated with composite ASCVD across all ancestry groups, but the association was stronger among non-Hispanic White (HR, 1.20; 95% CI, 1.16-1.24) than among non-Hispanic Black (HR, 1.11; 95% CI, 1.05-1.17) and Hispanic (HR, 1.12; 95% CI, 1.00-1.25) participants. Net reclassification improvement from adding PRS to a traditional risk model was modest for the intermediate-risk group for composite CVD among men (5-year risk >3.75%, 0.38%; 95% CI, 0.07%-0.68%), among women (6.79%; 95% CI, 3.01%-10.58%), for age older than 55 years (0.25%; 95% CI, 0.03%-0.47%), and for ages 40 to 55 years (1.61%; 95% CI, -0.07% to 3.30%). Conclusions and Relevance: Study results suggest that PRSs derived predominantly in European samples were statistically significantly associated with ASCVD in the multiancestry midlife and older-age MVP cohort. Overall, a modest improvement in discrimination metrics was observed with the addition of PRSs to traditional risk factors, with greater magnitude in women and younger age groups.
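Net reclassification improvement (NRI) measures how adding PRS moves participants between risk categories relative to the traditional model. In its common categorical form, NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]: events should move to higher-risk categories and non-events to lower ones. The abstract does not state which NRI variant or censoring adjustment was used, so the sketch below assumes the simple categorical version with complete follow-up. The function name and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def categorical_nri(old_cat: Sequence[int], new_cat: Sequence[int],
                    event: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (event NRI, non-event NRI, total NRI) for ordered risk categories.

    Assumes every participant's outcome is observed (no censoring) and that
    both events and non-events are present.
    """
    if not (len(old_cat) == len(new_cat) == len(event)):
        raise ValueError("inputs must have equal length")
    up_e = down_e = up_n = down_n = n_e = n_n = 0
    for old, new, had_event in zip(old_cat, new_cat, event):
        moved_up, moved_down = new > old, new < old
        if had_event:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_n += 1
            up_n += moved_up
            down_n += moved_down
    nri_events = (up_e - down_e) / n_e      # events should move up
    nri_nonevents = (down_n - up_n) / n_n   # non-events should move down
    return nri_events, nri_nonevents, nri_events + nri_nonevents


# Hypothetical categories: 0 = low, 1 = intermediate, 2 = high risk.
old = [1, 1, 1, 1, 1, 1, 1, 1]
new = [2, 2, 2, 1, 0, 0, 1, 2]
had_event = [True, True, True, True, False, False, False, False]
print(categorical_nri(old, new, had_event))  # (0.75, 0.25, 1.0)
```

The toy participants all start in the intermediate category, loosely analogous to the intermediate-risk NRI reported in the abstract; a survival-based NRI would additionally weight by estimated event probabilities to handle censoring.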


Subject(s)
Atherosclerosis , Cardiovascular Diseases , Coronary Artery Disease , Ischemic Stroke , Myocardial Infarction , Stroke , Veterans , Adult , Humans , Male , Female , Middle Aged , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/genetics , Retrospective Studies , Risk Assessment/methods , Risk Factors , Coronary Artery Disease/epidemiology , Coronary Artery Disease/genetics , Atherosclerosis/epidemiology , Myocardial Infarction/epidemiology , Stroke/epidemiology , Cholesterol
19.
J Am Med Inform Assoc ; 30(5): 958-964, 2023 04 19.
Article in English | MEDLINE | ID: mdl-36882092

ABSTRACT

The development of phenotypes using electronic health records is a resource-intensive process. Therefore, the cataloging of phenotype algorithm metadata for reuse is critical to accelerate clinical research. The Department of Veterans Affairs (VA) has developed a standard for phenotype metadata collection which is currently used in the VA phenomics knowledgebase library, CIPHER (Centralized Interactive Phenomics Resource), to capture over 5000 phenotypes. The CIPHER standard improves upon existing phenotype library metadata collection by capturing the context of algorithm development, phenotyping method used, and approach to validation. While the standard was iteratively developed with VA phenomics experts, it is applicable to the capture of phenotypes across healthcare systems. We describe the framework of the CIPHER standard for phenotype metadata collection, the rationale for its development, and its current application to the largest healthcare system in the United States.


Subject(s)
Electronic Health Records , Phenomics , United States , Phenotype , Algorithms , Metadata
20.
EBioMedicine ; 90: 104536, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36989840

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) for obstructive sleep apnoea (OSA) are limited by the underdiagnosis of OSA, which leads to misclassification and consequently reduces statistical power. We performed a GWAS of OSA in the Million Veteran Program (MVP) of the U.S. Department of Veterans Affairs (VA) healthcare system, where OSA prevalence is close to its true population prevalence. METHODS: We performed a GWAS of 568,576 MVP participants, stratified by biological sex and by harmonized race/ethnicity and genetic ancestry (HARE) groups of White, Black, Hispanic, and Asian individuals. We considered both BMI-adjusted (BMI-adj) and unadjusted (BMI-unadj) models. We replicated associations in independent datasets and analysed the heterogeneity of OSA genetic associations across HARE and sex groups. Finally, we performed a larger GWAS meta-analysis of MVP, FinnGen, and the MGB Biobank, totalling 916,696 individuals. FINDINGS: MVP participants are 91% male. OSA prevalence is 21%. In MVP there were 18 and 6 genome-wide significant loci in BMI-unadj and BMI-adj analyses, respectively, corresponding to 21 association regions. Of these, 17 were not previously reported in association with OSA, and 13 replicated in FinnGen (False Discovery Rate p-value < 0.05). There were widespread significant differences in genetic effects between men and women, but fewer across HARE groups. Meta-analysis of MVP, FinnGen, and the MGB Biobank revealed 17 additional, previously unreported, genome-wide significant regions. INTERPRETATION: Sex differences in genetic associations with OSA are widespread, likely associated with multiple OSA risk factors. OSA shares genetic underpinnings with several sleep phenotypes, suggesting shared aetiology and causal pathways. FUNDING: Described in acknowledgements.
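The abstract does not state how the MVP, FinnGen, and MGB Biobank results were combined or how sex differences in genetic effects were tested. A common approach, assumed in this sketch, is fixed-effect inverse-variance-weighted (IVW) meta-analysis of per-cohort effect estimates, plus a z-test comparing independent male and female estimates for the same variant. The 5e-8 genome-wide significance threshold is the conventional value, not one taken from the abstract, and all function names and numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple

GENOME_WIDE_ALPHA = 5e-8  # conventional GWAS threshold (assumed)


def ivw_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    return pooled, math.sqrt(1.0 / total_w)


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sex_difference_z(beta_m: float, se_m: float, beta_f: float, se_f: float) -> float:
    """z statistic for the difference between two independent effect estimates."""
    return (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)


# Hypothetical per-cohort log-odds ratios and standard errors for one variant.
beta, se = ivw_meta([0.08, 0.06, 0.07], [0.010, 0.015, 0.020])
p = two_sided_p(beta / se)
print(f"pooled beta={beta:.4f}, se={se:.4f}, p={p:.2e}, "
      f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
print(two_sided_p(sex_difference_z(0.20, 0.03, 0.10, 0.04)))
```

The difference test treats male and female estimates as independent, which holds when the strata contain disjoint participants; a random-effects model would be the usual alternative if between-cohort heterogeneity were large.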


Subject(s)
Hares , Sleep Apnea, Obstructive , Veterans , Humans , Animals , Male , Female , Genome-Wide Association Study , Genetic Heterogeneity , Hares/genetics , Sleep Apnea, Obstructive/epidemiology , Sleep Apnea, Obstructive/genetics