ABSTRACT
In a recent study by Zhao et al., rare protein-truncating variants (PTVs) in the BSN and APBA1 genes showed effects on obesity that exceeded those of well-known genes such as MC4R in a UK cohort. In this study, we leveraged the All of Us Research Program to investigate the association of predicted loss-of-function (pLoF) PTVs in BSN and APBA1 with body mass index (BMI) across a population of diverse ancestry. Our analysis revealed that the impact of pLoF variants in BSN and APBA1 on BMI was notably greater in this cohort, especially among individuals of European ancestry. Additionally, a phenome-wide association study (PheWAS) using the extensive phenotypic data available in the All of Us Research Program uncovered novel associations of BSN and APBA1 heterozygous pLoF carriers with various phenotypes. Specifically, BSN pLoF variants were associated with pulmonary hypertension, atrial fibrillation, and anticoagulant use, while APBA1 pLoF variants were linked to disorders of the temporomandibular joint. These findings underscore the potential of large-scale biobanks in advancing genetic discovery.
ABSTRACT
Background: Genetic variation in APOE is associated with altered lipid metabolism, as well as cardiovascular and neurodegenerative disease risk. However, prior studies are largely limited to European ancestry populations and differential risk by sex and ancestry has not been widely evaluated. We utilized a phenome-wide association study (PheWAS) approach to explore APOE-associated phenotypes in the All of Us Research Program. Methods: We determined APOE alleles for 181,880 All of Us participants with whole genome sequencing and electronic health record (EHR) data, representing seven gnomAD ancestry groups. We tested association of APOE variants, ordered based on Alzheimer's disease risk hierarchy (ε2/ε2 < ε2/ε3 < ε3/ε3 < ε2/ε4 < ε3/ε4 < ε4/ε4), with 2,318 EHR-derived phenotypes. Bonferroni-adjusted analyses were performed overall, by ancestry, by sex, and with adjustment for social determinants of health (SDOH). Findings: In the overall cohort, PheWAS identified 17 significant associations, including an increased odds of hyperlipidemia (OR 1.15 [1.14-1.16] per APOE genotype group; P=1.8×10⁻¹²⁹), dementia, and Alzheimer's disease (OR 1.55 [1.40-1.70]; P=5×10⁻¹⁹), and a reduced odds of fatty liver disease (OR 0.93 [0.90-0.95]; P=1.6×10⁻⁹) and chronic liver disease. ORs were similar after SDOH adjustment and by sex, except for an increased number of cardiovascular associations in males, and decreased odds of noninflammatory disorders of vulva and perineum in females (OR 0.89 [0.84-0.94]; P=1.1×10⁻⁵). Significant heterogeneity was observed for hyperlipidemia and mild cognitive impairment across ancestry. Unique associations by ancestry included transient retinal arterial occlusion in the European ancestry group, and first-degree atrioventricular block in the American Admixed/Latino ancestry group. Interpretation: We replicate extensive phenotypic associations with APOE alleles in a large, diverse cohort, despite limitations in accuracy for EHR-derived phenotypes.
We provide a comprehensive catalog of APOE-associated phenotypes and present evidence of unique phenotypic associations by sex and ancestry, as well as heterogeneity in effect size across ancestry.
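The ordinal genotype coding and the Bonferroni adjustment described above can be illustrated with a minimal sketch. It assumes a family-wise alpha of 0.05 and an integer coding of the six genotype groups; neither is stated in the abstract, and the function names are hypothetical. The compounded odds ratio further assumes a log-linear trend across groups, which is what a "per genotype group" odds ratio implies under logistic regression with a numeric predictor.

```python
# Ordinal coding of APOE genotypes, lowest to highest Alzheimer's disease risk,
# following the hierarchy quoted in the abstract ("e" stands for epsilon).
# The integer coding itself is an assumption made for illustration.
APOE_ORDER = ["e2/e2", "e2/e3", "e3/e3", "e2/e4", "e3/e4", "e4/e4"]
APOE_SCORE = {genotype: rank for rank, genotype in enumerate(APOE_ORDER)}


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold; alpha = 0.05 is an assumption."""
    return alpha / n_tests


def compounded_odds_ratio(or_per_group: float, n_steps: int) -> float:
    """Odds ratio implied across n_steps ordinal steps, assuming a
    log-linear (multiplicative) trend across genotype groups."""
    return or_per_group ** n_steps


threshold = bonferroni_threshold(2318)  # 2,318 EHR-derived phenotypes
steps = APOE_SCORE["e4/e4"] - APOE_SCORE["e3/e3"]  # groups are 3 steps apart
hyperlipidemia_or = compounded_odds_ratio(1.15, steps)
print(f"Bonferroni threshold: {threshold:.2e}")
print(f"Implied OR, e4/e4 vs e3/e3: {hyperlipidemia_or:.2f}")
```

Under these assumptions the per-test threshold is about 2.16×10⁻⁵, and a per-group odds ratio of 1.15 compounds to roughly 1.52 between ε3/ε3 and ε4/ε4.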
ABSTRACT
Combining information from multiple GWASs for a disease and its risk factors has proven a powerful approach for development of polygenic risk scores (PRSs). This may be particularly useful for type 2 diabetes (T2D), a highly polygenic and heterogeneous disease where the additional predictive value of a PRS is unclear. Here, we use a meta-scoring approach to develop a metaPRS for T2D that incorporated genome-wide associations from both European and non-European genetic ancestries and T2D risk factors. We evaluated the performance of this metaPRS and benchmarked it against existing genome-wide PRS in 620,059 participants and 50,572 T2D cases amongst six diverse genetic ancestries from UK Biobank, INTERVAL, the All of Us Research Program, and the Singapore Multi-Ethnic Cohort. We show that our metaPRS was the most powerful PRS for predicting T2D in European population-based cohorts and had comparable performance to the top ancestry-specific PRS, highlighting its transferability. In UK Biobank, we show the metaPRS had stronger predictive power for 10-year risk than all individual risk factors apart from BMI and biomarkers of dysglycemia. The metaPRS modestly improved T2D risk stratification of QDiabetes risk scores for 10-year risk prediction, particularly when prioritising individuals for blood tests of dysglycemia. Overall, we present a highly predictive and transferable PRS for T2D and demonstrate the potential for PRS to incrementally improve T2D risk prediction when incorporated into UK guideline-recommended screening and risk prediction with a clinical risk score.
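A meta-scoring approach of this kind can be sketched as a weighted sum of standardized component scores. The abstract does not say how the component weights were estimated, so the weights, component names, and function names below are hypothetical placeholders rather than the authors' method.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Z-score a list of per-individual scores (population SD).
    Assumes the scores are not all identical."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]


def meta_prs(component_scores, weights):
    """Combine component PRSs into one score per individual.

    component_scores: dict mapping component name -> list of raw scores,
                      all lists in the same individual order.
    weights:          dict mapping component name -> weight (hypothetical;
                      in practice these would be learned in training data).
    """
    z = {name: standardize(vals) for name, vals in component_scores.items()}
    n_individuals = len(next(iter(z.values())))
    return [
        sum(weights[name] * z[name][i] for name in z)
        for i in range(n_individuals)
    ]


# Toy example with two made-up components for three individuals.
components = {"t2d_gwas": [1.0, 2.0, 3.0], "bmi_gwas": [3.0, 2.0, 1.0]}
toy_weights = {"t2d_gwas": 1.0, "bmi_gwas": 0.5}
combined = meta_prs(components, toy_weights)
print(combined)
```

Standardizing each component first keeps the weights comparable across scores that were built on different scales.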
ABSTRACT
Pharmacogenomics promises improved outcomes through individualized prescribing. However, the lack of diversity in studies impedes clinical translation and equitable application of precision medicine. We evaluated the frequencies of pharmacogenomic (PGx) variants, predicted phenotypes, and medication exposures using whole genome sequencing and EHR data from nearly 100,000 diverse All of Us Research Program participants. We report that 100% of participants carried at least one pharmacogenomic variant and nearly all (99.13%) had a predicted phenotype with prescribing recommendations. Clinical impact was high, with over 20% having both an actionable phenotype and a prior exposure to an impacted medication with pharmacogenomic prescribing guidance. Importantly, we also report hundreds of alleles and predicted phenotypes that deviate from known frequencies and/or were previously unreported, including within admixed American and African ancestry groups.
ABSTRACT
Variability in drug effectiveness and provider prescribing patterns has been reported in different racial and ethnic populations. We sought to evaluate antihypertensive drug effectiveness and prescribing patterns among self-identified Hispanic/Latino (Hispanic), Non-Hispanic Black (Black), and Non-Hispanic White (White) populations that enrolled in the NIH All of Us Research Program, a US longitudinal cohort. We employed a self-controlled case study method using electronic health record and survey data from 17,718 White, Hispanic, and Black participants who were diagnosed with essential hypertension and prescribed at least one of 19 commonly used antihypertensive medications. Effectiveness was determined by calculating the reduction in systolic blood pressure (SBP) measurements after 28 or more days of drug exposure. Starting SBP and effectiveness for each medication were compared for self-reported Black, Hispanic, and White participants using adjusted linear regressions. Black and Hispanic participants were started on antihypertensive medications at significantly higher SBP than White participants in 13 and 7 out of 19 medications, respectively. More Black participants were prescribed multiple antihypertensive medications (58.46%) than White (52.35%) or Hispanic (49.9%) participants. First-line hypertension medications differed by race and ethnicity. Following the 2017 American College of Cardiology and the American Heart Association High Blood Pressure Guideline release, around 64% of Black participants were prescribed a recommended first-line antihypertensive drug compared with 76% of White and 82% of Hispanic participants. Effect sizes suggested that most antihypertensive drugs were less effective in Hispanic and Black, compared with White, participants, and statistical significance was reached in 6 out of 19 drugs.
These results indicate that Black and Hispanic populations may benefit from earlier intervention and screening and highlight the potential benefits of personalizing first-line medications.
ABSTRACT
Phenome-wide association studies (PheWAS) have become widely used for efficient, high-throughput evaluation of the relationship between a genetic factor and a large number of disease phenotypes, typically extracted from a DNA biobank linked with electronic medical records (EMR). Phecodes, billing code-derived disease case-control statuses, are usually used as outcome variables in PheWAS, and logistic regression has been the standard choice of analysis method. Because clinical diagnoses in EMR are often inaccurate, with errors that can bias the odds ratio estimates, much effort has been put into accurately defining the cases and controls. Specifically, to correctly classify controls in the population, a list of exclusion criteria for each Phecode was manually compiled to obtain unbiased odds ratios. However, the accuracy of the list cannot be guaranteed without an extensive data curation process. This costly curation process limits the efficiency of large-scale analyses that take full advantage of all structured phenotypic information available in EMR. Here, we proposed to estimate relative risks (RR) instead. We first demonstrated, via a theoretical formula, the desirable property of RR that overcomes the inaccuracy in the controls. With simulation and a real-data application, we further confirmed that RR is unbiased without compiling exclusion criteria lists. With RR as the estimate, we are able to efficiently extend PheWAS to a larger-scale, phenome construction-agnostic analysis of phenotypes using ICD-9/10 codes, which preserve much more disease-related clinical information than Phecodes.
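The contrast between odds ratios and relative risks can be illustrated with a toy 2x2 table. This is a simplified sketch with made-up counts, not the theoretical formula or simulation from the study: it only shows that the odds ratio changes when non-cases are excluded from the control group, while a relative risk computed over everyone in each exposure group does not depend on how controls are defined.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio from a 2x2 table; depends on who counts as a control."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


def relative_risk(exposed_cases, n_exposed, unexposed_cases, n_unexposed):
    """Relative risk using all individuals in each exposure group as the
    denominator, so no control definition is needed."""
    return (exposed_cases / n_exposed) / (unexposed_cases / n_unexposed)


# Made-up counts: 1,000 carriers (100 cases) and 9,000 non-carriers (450 cases).
n_carriers, carrier_cases = 1000, 100
n_noncarriers, noncarrier_cases = 9000, 450

# Odds ratio when every non-case is treated as a control.
or_all_controls = odds_ratio(
    carrier_cases, n_carriers - carrier_cases,
    noncarrier_cases, n_noncarriers - noncarrier_cases,
)

# Odds ratio after a hypothetical exclusion list removes some non-cases
# (100 carriers and 450 non-carriers) from the control group.
or_after_exclusions = odds_ratio(
    carrier_cases, n_carriers - carrier_cases - 100,
    noncarrier_cases, n_noncarriers - noncarrier_cases - 450,
)

rr = relative_risk(carrier_cases, n_carriers, noncarrier_cases, n_noncarriers)
print(or_all_controls, or_after_exclusions, rr)
```

In this toy table the odds ratio moves from about 2.11 to 2.25 depending on which non-cases are kept as controls, whereas the relative risk stays at 2.0.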
ABSTRACT
Clostridioides difficile infection causes pathology that ranges in severity from diarrhea to pseudomembranous colitis. Toxin A and Toxin B are the two primary virulence factors secreted by C. difficile that drive disease severity. The toxins damage intestinal epithelial cells, leading to a loss of barrier integrity and induction of a proinflammatory host response. Monoclonal antibodies (mAbs) that neutralize Toxin A and Toxin B, actoxumab and bezlotoxumab, respectively, significantly reduce disease severity in a murine model of C. difficile infection. However, the impact of toxin neutralization on the induction and quality of the innate immune response following infection is unknown. The goal of this study was to define the quality of the host innate immune response in the context of anti-toxin mAb therapy. At day 2 post-infection, C. difficile-infected, mAb-treated mice had significantly less disease compared to isotype-treated mice despite remaining colonized with C. difficile. C. difficile-infected, mAb-treated mice still exhibited marked neutrophil infiltration and induction of a subset of proinflammatory cytokines within the intestinal lamina propria following infection, comparable to those in isotype-treated mice. Furthermore, both mAb-treated and isotype-treated mice had an increase in IL-22-producing ILCs in the intestine following infection. MAb-treated mice exhibited increased infiltration of eosinophils in the intestinal lamina propria, which has been previously reported to promote a protective host response following C. difficile infection. These findings show that activation of host protective mechanisms remains intact in the context of monoclonal antibody-mediated toxin neutralization.
Subjects
Monoclonal Antibodies, Bacterial Toxins, Clostridioides difficile, Clostridium Infections, Innate Immunity, Animals, Bacterial Toxins/immunology, Monoclonal Antibodies/immunology, Monoclonal Antibodies/pharmacology, Clostridioides difficile/immunology, Clostridium Infections/immunology, Clostridium Infections/prevention & control, Clostridium Infections/microbiology, Mice, Enterotoxins/immunology, Animal Disease Models, Neutralizing Antibodies/immunology, Bacterial Proteins/immunology, Female, Cytokines/metabolism, Broadly Neutralizing Antibodies/immunology, Inbred C57BL Mice
ABSTRACT
Summary: With the rapid growth of genetic data linked to electronic health record data in huge cohorts, large-scale phenome-wide association studies (PheWAS) have become powerful discovery tools in biomedical research. PheWAS is an analysis method to study phenotype associations utilizing longitudinal electronic health record (EHR) data. Previous PheWAS packages were developed mostly in the days of smaller biobanks and with earlier PheWAS approaches. PheTK was designed to simplify analysis and efficiently handle biobank-scale data. PheTK uses multithreading and supports a full PheWAS workflow including extraction of data from OMOP databases and Hail matrix tables as well as PheWAS analysis for both phecode version 1.2 and phecodeX. Benchmarking results showed PheTK took 64% less time than the R PheWAS package to complete the same workflow. PheTK can be run locally or on cloud platforms such as the All of Us Researcher Workbench (All of Us) or the UK Biobank (UKB) Research Analysis Platform (RAP). Availability and implementation: The PheTK package is freely available on the Python Package Index (PyPI) and on GitHub under the GNU General Public License (GPL-3) at https://github.com/nhgritctran/PheTK. It is implemented in Python and is platform independent. The demonstration workspace for All of Us will be made available in the future as a featured workspace. Contact: PheTK@mail.nih.gov.
ABSTRACT
Genome-wide association studies (GWAS) have been instrumental in identifying genetic associations for various diseases and traits. However, uncovering genetic underpinnings among traits beyond univariate phenotype associations remains a challenge. Multi-phenotype associations (MPA), or genetic pleiotropy, offer important insights into shared genes and pathways among traits, enhancing our understanding of genetic architectures of complex diseases. GWAS of biobank-linked electronic health record (EHR) data are increasingly being utilized to identify MPA among various traits and diseases. However, methodologies that can efficiently take advantage of distributed EHR to detect MPA are still lacking. Here, we introduce mixWAS, a novel algorithm that efficiently and losslessly integrates multiple EHRs via summary statistics, allowing the detection of MPA among mixed phenotypes while accounting for heterogeneities across EHRs. Simulations demonstrate that mixWAS outperforms the widely used MPA detection method, phenome-wide association study (PheWAS), across diverse scenarios. Applying mixWAS to data from seven EHRs in the US, we identified 4,534 MPA among blood lipids, BMI, and circulatory diseases. Validation in independent EHR data from the UK confirmed 97.7% of the associations. mixWAS fundamentally improves the detection of MPA and is available as free, open-source software.
ABSTRACT
The National Institutes of Health's All of Us Research Program is an accessible platform that hosts genomic and phenotypic data to be collected from 1 million participants in the United States. Its mission is to accelerate medical research and clinical breakthroughs with a special emphasis on diversity.
Subjects
Biomedical Research, Population Health, Humans, United States, Data Science, National Institutes of Health (U.S.)
ABSTRACT
Malaria is caused by Plasmodium species and remains a significant cause of morbidity and mortality globally. Gut bacteria can influence the severity of malaria, but the contribution of specific bacteria to the risk of severe malaria is unknown. Here, multiomics approaches demonstrate that specific species of Bacteroides are causally linked to the risk of severe malaria. Plasmodium yoelii hyperparasitemia-resistant mice gavaged with murine-isolated Bacteroides fragilis develop P. yoelii hyperparasitemia. Moreover, Bacteroides are significantly more abundant in Ugandan children with severe malarial anemia than in those with asymptomatic P. falciparum infection. Human isolates of Bacteroides caccae, Bacteroides uniformis, and Bacteroides ovatus were able to cause susceptibility to severe malaria in mice. While monocolonization of germ-free mice with Bacteroides alone is insufficient to cause susceptibility to hyperparasitemia, meta-analysis across multiple studies supports a main role for Bacteroides in susceptibility to severe malaria. Approaches that target gut Bacteroides present an opportunity to prevent severe malaria and associated deaths.
Subjects
Anemia, Malaria, Plasmodium yoelii, Child, Humans, Animals, Mice, Microbial Consortia, Bacteroides/genetics, Bacteroides fragilis, Anemia/etiology
ABSTRACT
Clostridioides difficile (C. diff.) infection (CDI) is a leading cause of hospital-acquired diarrhea in North America and Europe and a major cause of morbidity and mortality. Known risk factors do not fully explain CDI susceptibility, and genetic susceptibility is suggested by the fact that some patients whose colons are colonized with C. diff. do not develop any infection while others develop severe or recurrent infections. To identify common genetic variants associated with CDI, we performed a genome-wide association analysis in 19,861 participants (1349 cases; 18,512 controls) from the Electronic Medical Records and Genomics (eMERGE) Network. Using logistic regression, we found strong evidence for genetic variation in the DRB locus of the MHC (HLA) II region that predisposes individuals to CDI (P < 1.0 × 10⁻¹⁴; OR 1.56). Altered transcriptional regulation in the HLA region may play a role in conferring susceptibility to this opportunistic enteric pathogen.
Subjects
Clostridium Infections, Genome-Wide Association Study, Humans, Clostridium Infections/genetics, Diarrhea, Histocompatibility Antigens, HLA Antigens/genetics, Histocompatibility Antigens Class II, Genetic Variation
ABSTRACT
OBJECTIVE: The All of Us Research Program (All of Us) aims to recruit over a million participants to further precision medicine. Essential to the verification of biobanks is a replication of known associations to establish validity. Here, we evaluated how well All of Us data replicated known cigarette smoking associations. MATERIALS AND METHODS: We defined smoking exposure as follows: (1) an EHR Smoking exposure that used International Classification of Disease codes; (2) participant-provided information (PPI) Ever Smoking; and (3) PPI Current Smoking, the latter two from the lifestyle survey. We performed a phenome-wide association study (PheWAS) for each smoking exposure measurement type. For each, we compared the effect sizes derived from the PheWAS to published meta-analyses from PubMed that studied cigarette smoking. We defined two levels of replication of meta-analyses: (1) nominally replicated, which required agreement of direction of effect size, and (2) fully replicated, which required overlap of confidence intervals. RESULTS: PheWASes with EHR Smoking, PPI Ever Smoking, and PPI Current Smoking revealed 736, 492, and 639 phenome-wide significant associations, respectively. We identified 165 meta-analyses representing 99 distinct phenotypes that could be matched to EHR phenotypes. At P < .05, 74 were nominally replicated and 55 were fully replicated. At P < 2.68 × 10⁻⁵ (Bonferroni threshold), 58 were nominally replicated and 40 were fully replicated. DISCUSSION: Most phenotypes found in published meta-analyses associated with smoking were nominally replicated in All of Us. Both survey and EHR definitions for smoking produced similar results. CONCLUSION: This study demonstrated the feasibility of studying common exposures using All of Us data.
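The two replication levels defined in the methods can be written as simple checks. This is a minimal sketch with hypothetical function names and made-up numbers; it assumes effects are expressed as odds ratios and that full replication requires direction agreement in addition to overlapping confidence intervals, which the abstract does not state explicitly.

```python
import math


def nominally_replicated(phewas_or: float, meta_or: float) -> bool:
    """True when both odds ratios point in the same direction
    (both above 1 or both below 1)."""
    return math.log(phewas_or) * math.log(meta_or) > 0


def intervals_overlap(ci_a, ci_b) -> bool:
    """True when two (lower, upper) confidence intervals overlap."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def fully_replicated(phewas_or, phewas_ci, meta_or, meta_ci) -> bool:
    """Direction agreement plus overlapping confidence intervals.
    Requiring both conditions is an assumption of this sketch."""
    return nominally_replicated(phewas_or, meta_or) and intervals_overlap(
        phewas_ci, meta_ci
    )


# Made-up example values, not results from the study.
print(nominally_replicated(1.3, 1.8))
print(fully_replicated(1.3, (1.1, 1.5), 1.8, (1.4, 2.3)))
print(fully_replicated(1.3, (1.1, 1.35), 1.8, (1.4, 2.3)))
```

The last call shows a phenotype that agrees in direction but would not count as fully replicated because the intervals do not overlap.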
Subjects
Genome-Wide Association Study, Population Health, Humans, Genome-Wide Association Study/methods, Phenotype, Single Nucleotide Polymorphism, Smoking
ABSTRACT
Recently, large-scale genomic projects such as All of Us and the UK Biobank have introduced a new research paradigm where data are stored centrally in cloud-based Trusted Research Environments (TREs). To characterize the advantages and drawbacks of different TRE attributes in facilitating cross-cohort analysis, we conduct a Genome-Wide Association Study of standard lipid measures using two approaches: meta-analysis and pooled analysis. Comparison of full summary data from both approaches with an external study shows strong correlation of known loci with lipid levels (R² ≈ 83-97%). Importantly, 90 variants meet the significance threshold only in the meta-analysis and 64 variants are significant only in pooled analysis, with approximately 20% of variants in each of those groups being most prevalent in non-European, non-Asian ancestry individuals. These findings have important implications, as technical and policy choices lead to cross-cohort analyses generating similar, but not identical results, particularly for non-European ancestral populations.
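For the meta-analysis arm, one common way to combine per-cohort estimates for a single variant is an inverse-variance-weighted fixed-effect model, sketched below. The abstract does not say which meta-analysis method was used, so this is an illustrative assumption, and the effect sizes are made up. A pooled analysis would instead fit one model to the combined individual-level data, which is why the two approaches can disagree near the significance threshold.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas:            per-cohort effect estimates
    standard_errors:  per-cohort standard errors (same order as betas)
    Returns (combined beta, combined standard error, z statistic).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se


# Made-up per-cohort estimates for a single variant in two cohorts.
beta, se, z = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
print(round(beta, 3), round(se, 4), round(z, 2))
```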
Subjects
Genome-Wide Association Study, Population Health, Humans, Genomics, Policy, Lipids
ABSTRACT
Alpha-1 antitrypsin deficiency (AATD), a relatively common autosomal recessive genetic disorder, is underdiagnosed in symptomatic individuals. We sought to compare the risk of liver transplantation associated with hepatitis C infection with the risk in AATD heterozygotes and homozygotes and to determine if SERPINA1 sequencing would identify undiagnosed AATD. We performed a retrospective cohort study in a deidentified Electronic Health Record (EHR)-linked DNA biobank with 72,027 individuals genotyped for the M, Z, and S alleles in SERPINA1. We investigated liver transplantation frequency by genotype group and compared it with that for hepatitis C infection. We performed SERPINA1 sequencing in carriers of pathogenic AATD alleles who underwent liver transplantation. Liver transplantation was associated with the Z allele (ZZ: odds ratio [OR] = 1.31, p<2e-16; MZ: OR = 1.02, p = 1.2e-13) and with hepatitis C (OR = 1.20, p<2e-16). For liver transplantation, there was a significant interaction between genotype and hepatitis C (ZZ: interaction OR = 1.23, p = 4.7e-4; MZ: interaction OR = 1.11, p = 6.9e-13). Sequencing uncovered a second, rare, pathogenic SERPINA1 variant in six of 133 individuals with liver transplants and without hepatitis C. Liver transplantation was more common in individuals with AATD risk alleles (including heterozygotes), and AATD and hepatitis C demonstrated evidence of a gene-environment interaction in relation to liver transplantation. The current AATD screening strategy may miss diagnoses, whereas SERPINA1 sequencing may increase diagnostic yield for AATD, stratify risk for liver disease, and inform clinical management for individuals with AATD risk alleles and liver disease risk factors.
Subjects
Hepatitis C, alpha 1-Antitrypsin Deficiency, Humans, Alleles, Gene-Environment Interaction, Retrospective Studies, alpha 1-Antitrypsin Deficiency/diagnosis, alpha 1-Antitrypsin Deficiency/genetics, Hepatitis C/genetics, Hepacivirus/genetics, Population Genetics, alpha 1-Antitrypsin/genetics
ABSTRACT
The All of Us Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in All of Us, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers. In this review, we first describe how the DRC is organized to meet the needs of this broad group of stakeholders. We then outline guiding principles, common challenges, and innovative approaches used to build the All of Us data ecosystem. Finally, we share lessons learned to help others navigate important decisions and trade-offs in building a modern biomedical data platform.
Subjects
Biomedical Research, Population Health, Humans, Ecosystem, Precision Medicine
ABSTRACT
PURPOSE: Automated use of electronic health records may aid in decreasing the diagnostic delay for rare diseases. The phenotype risk score (PheRS) is a weighted aggregate of syndromically related phenotypes that measures the similarity between an individual's conditions and features of a disease. For some diseases, there are individuals without a diagnosis of that disease who have scores similar to diagnosed patients. These individuals may have that disease but not yet be diagnosed. METHODS: We calculated the PheRS for cystic fibrosis (CF) for 965,626 subjects in the Vanderbilt University Medical Center electronic health record. RESULTS: Of the 400 subjects with the highest PheRS for CF, 248 (62%) had been diagnosed with CF. Twenty-six of the remaining participants, those who were alive and had DNA available in the linked DNA biobank, underwent clinical review and sequencing analysis of CFTR and SERPINA1. This uncovered a potential diagnosis for 2 subjects, 1 with CF and 1 with alpha-1-antitrypsin deficiency. An additional 7 subjects had pathogenic or likely pathogenic variants, 2 in CFTR and 5 in SERPINA1. CONCLUSION: These findings may be clinically actionable for the providers caring for these patients. Importantly, this study highlights feasibility and challenges for future implications of this approach.
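The idea of a weighted aggregate of disease-related phenotypes can be sketched as follows. The abstract does not give the weighting scheme, so the weights here (the negative log of each phenotype's population prevalence, so that rarer features count more) are an assumption, and the phenotype names and prevalences are made up.

```python
import math


def phenotype_risk_score(patient_phenotypes, disease_phenotypes, prevalence):
    """Sum the weights of the disease-related phenotypes a patient has.

    patient_phenotypes: set of phenotype labels observed in the patient
    disease_phenotypes: iterable of phenotype labels that define the disease
    prevalence:         dict mapping phenotype label -> population prevalence
                        (assumed weighting: weight = -log(prevalence))
    """
    return sum(
        -math.log(prevalence[phenotype])
        for phenotype in disease_phenotypes
        if phenotype in patient_phenotypes
    )


# Hypothetical phenotype labels and prevalences, for illustration only.
disease_features = ["bronchiectasis", "pancreatitis", "chronic_sinusitis"]
toy_prevalence = {
    "bronchiectasis": 0.01,
    "pancreatitis": 0.1,
    "chronic_sinusitis": 0.5,
}
patient = {"bronchiectasis", "chronic_sinusitis", "hypertension"}

score = phenotype_risk_score(patient, disease_features, toy_prevalence)
print(round(score, 3))
```

Phenotypes outside the disease's feature list, such as the hypertension entry in the toy patient, do not contribute to the score.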