Results 1 - 20 of 50
1.
Nat Commun ; 15(1): 4417, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38789417

ABSTRACT

Genome-wide association studies (GWAS) have become well-powered to detect loci associated with telomere length. However, no prior work has validated genes nominated by GWAS to examine their role in telomere length regulation. We conducted a multi-ancestry meta-analysis of 211,369 individuals and identified five novel association signals. Enrichment analyses of chromatin state and cell-type heritability suggested that blood/immune cells are the most relevant cell type to examine telomere length association signals. We validated specific GWAS associations by overexpressing KBTBD6 or POP5 and demonstrated that both lengthened telomeres. CRISPR/Cas9 deletion of the predicted causal regions in K562 blood cells reduced expression of these genes, demonstrating that these loci are related to transcriptional regulation of KBTBD6 and POP5. Our results demonstrate the utility of telomere length GWAS in the identification of telomere length regulation mechanisms and validate KBTBD6 and POP5 as genes affecting telomere length regulation.


Subject(s)
Genome-Wide Association Study , Telomere Homeostasis , Telomere , Humans , Telomere/genetics , Telomere/metabolism , K562 Cells , Telomere Homeostasis/genetics , Polymorphism, Single Nucleotide , Gene Expression Regulation , CRISPR-Cas Systems
2.
Alzheimers Dement (N Y) ; 10(1): e12462, 2024.
Article in English | MEDLINE | ID: mdl-38500778

ABSTRACT

INTRODUCTION: Alzheimer's disease (AD) is a complex disease influenced by genetics and environment. More than 75 susceptibility loci have been linked to late-onset AD, but most of these loci were discovered in genome-wide association studies (GWAS) exclusive to non-Hispanic White individuals. There are wide disparities in AD risk across racially stratified groups, and while these disparities are not due to genetic differences, underrepresentation in genetic research can exacerbate them and contribute to their persistence. We investigated the racial/ethnic representation of participants in United States (US)-based AD genetics and the statistical implications of current representation. METHODS: We compared racial/ethnic data of participants from array and sequencing studies in US AD genetics databases, including the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) and the NIAGADS Data Sharing Service (dssNIAGADS), to AD and related dementia (ADRD) prevalence and mortality. We then simulated the statistical power of these datasets to identify risk variants from non-White populations. RESULTS: There is insufficient statistical power (probability < 80%) to detect single nucleotide polymorphisms (SNPs) with low to moderate effect sizes (odds ratio [OR] < 1.5) using array data from Black and Hispanic participants; studies of Asian participants are not powered to detect variants with OR ≤ 2. Using available and projected sequencing data from Black and Hispanic participants, risk variants with OR = 1.2 are detectable at high allele frequencies. Sample sizes remain insufficient to detect these variants in Asian populations. DISCUSSION: AD genetics datasets are largely representative of US ADRD burden. However, there is a wide discrepancy between proportional representation and statistically meaningful representation. Most variants identified in GWAS of non-Hispanic White individuals have low to moderate effects. 
Comparable risk variants in non-White populations are not detectable given current sample sizes, which could lead to disparities in future studies and drug development. We urge AD genetics researchers and institutions to continue investing in recruiting diverse participants and use community-based participatory research practices.
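The study above estimated power by simulation. As a rough, non-authoritative illustration of how effect size, allele frequency, and sample size interact, the sketch below swaps in a closed-form normal approximation for a two-sided allelic case-control test. The function names, the genome-wide threshold of 5 × 10⁻⁸, the hypothetical sample sizes, and the assumptions of an additive model with two independently counted alleles per person (Hardy-Weinberg equilibrium) are illustrative choices, not the authors' method.

```python
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and an allelic OR."""
    control_odds = control_freq / (1.0 - control_freq)
    case_odds = odds_ratio * control_odds
    return case_odds / (1.0 + case_odds)


def allelic_test_power(n_cases: int, n_controls: int, control_freq: float,
                       odds_ratio: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test (normal approximation).

    Assumption: additive model, 2 alleles per person (HWE). The unpooled
    standard error is used for simplicity, and the negligible opposite-tail
    probability is ignored.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    case_alleles, control_alleles = 2 * n_cases, 2 * n_controls
    se = (p1 * (1 - p1) / case_alleles + p0 * (1 - p0) / control_alleles) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)


# Hypothetical sample sizes: the same OR = 1.2 variant in a smaller and a larger study.
for n in (2_000, 20_000):
    print(n, round(allelic_test_power(n, n, control_freq=0.3, odds_ratio=1.2), 3))
```

The printed values only show the direction of the effect (power rising steeply with sample size at a fixed, modest OR); they are not estimates for the NIAGADS datasets.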

3.
Nat Genet ; 55(11): 1912-1919, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37904051

ABSTRACT

Megabase-scale mosaic chromosomal alterations (mCAs) in blood are prognostic markers for a host of human diseases. Here, to gain a better understanding of mCA rates in genetically diverse populations, we analyzed whole-genome sequencing data from 67,390 individuals from the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine program. We observed higher sensitivity with whole-genome sequencing data, compared with array-based data, in uncovering mCAs at low mutant cell fractions and found that individuals of European ancestry have the highest rates of autosomal mCAs and the lowest rates of chromosome X mCAs, compared with individuals of African or Hispanic ancestry. Although further studies in diverse populations will be needed to replicate our findings, we report three loci associated with loss of chromosome X, associations between autosomal mCAs and rare variants in DCPS, ADAM17, PPP1R16B and TET2, and associations of ancestry-specific variants in ATM and MPL with mCAs in cis.


Subject(s)
Genome, Human , Genome-Wide Association Study , Mosaicism , Humans , Black People/genetics , Hispanic or Latino/genetics , Precision Medicine
4.
medRxiv ; 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37662265

ABSTRACT

Obesity is a major public health crisis associated with high mortality rates. Previous genome-wide association studies (GWAS) investigating body mass index (BMI) have largely relied on imputed data from European individuals. This study leveraged whole-genome sequencing (WGS) data from 88,873 participants from the Trans-Omics for Precision Medicine (TOPMed) Program, 51% of whom were from non-European population groups. We discovered 18 BMI-associated signals (P < 5 × 10⁻⁹). Notably, we identified and replicated a novel low-frequency single nucleotide polymorphism (SNP) in MTMR3 that was common in individuals of African descent. Using a diverse study population, we further identified two novel secondary signals in known BMI loci and pinpointed two likely causal variants in the POC5 and DMD loci. Our work demonstrates the benefits of combining WGS and diverse cohorts in expanding the current catalog of variants and genes that confer risk for obesity, bringing us one step closer to personalized medicine.

5.
Ann Am Thorac Soc ; 20(8): 1124-1135, 2023 08.
Article in English | MEDLINE | ID: mdl-37351609

ABSTRACT

Rationale: Chronic obstructive pulmonary disease (COPD) is a complex disease characterized by airway obstruction and accelerated lung function decline. Our understanding of systemic protein biomarkers associated with COPD remains incomplete. Objectives: To determine what proteins and pathways are associated with impaired pulmonary function in a diverse population. Methods: We studied 6,722 participants across six cohort studies with both aptamer-based proteomic and spirometry data (4,566 predominantly White participants in a discovery analysis and 2,156 African American cohort participants in a validation). In linear regression models, we examined protein associations with baseline forced expiratory volume in 1 second (FEV1) and FEV1/forced vital capacity (FVC). In linear mixed effects models, we investigated the associations of baseline protein levels with rate of FEV1 decline (ml/yr) in 2,777 participants with up to 7 years of follow-up spirometry. Results: We identified 254 proteins associated with FEV1 in our discovery analyses, with 80 proteins validated in the Jackson Heart Study. Novel validated protein associations include kallistatin serine protease inhibitor, growth differentiation factor 2, and tumor necrosis factor-like weak inducer of apoptosis (discovery β = 0.0561, Q = 4.05 × 10⁻¹⁰; β = 0.0421, Q = 1.12 × 10⁻³; and β = 0.0358, Q = 1.67 × 10⁻³, respectively). In longitudinal analyses within cohorts with follow-up spirometry, we identified 15 proteins associated with FEV1 decline (Q < 0.05), including elafin leukocyte elastase inhibitor and mucin-associated TFF2 (trefoil factor 2; β = -4.3 ml/yr, Q = 0.049; β = -6.1 ml/yr, Q = 0.032, respectively). Pathways and processes highlighted by our study include aberrant extracellular matrix remodeling, enhanced innate immune response, dysregulation of angiogenesis, and coagulation. 
Conclusions: In this study, we identify and validate novel biomarkers and pathways associated with lung function traits in a racially diverse population. In addition, we identify novel protein markers associated with FEV1 decline. Several protein findings are supported by previously reported genetic signals, highlighting the plausibility of certain biologic pathways. These novel proteins might represent markers for risk stratification, as well as novel molecular targets for treatment of COPD.


Subject(s)
Lung , Pulmonary Disease, Chronic Obstructive , Humans , Forced Expiratory Volume/physiology , Proteomics , Vital Capacity/physiology , Spirometry , Biomarkers
6.
J Am Med Inform Assoc ; 30(7): 1293-1300, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37192819

ABSTRACT

Research increasingly relies on interrogating large-scale data resources. The NIH National Heart, Lung, and Blood Institute developed the NHLBI BioData CatalystⓇ (BDC), a community-driven ecosystem where researchers, including bench and clinical scientists, statisticians, and algorithm developers, find, access, share, store, and compute on large-scale datasets. This ecosystem provides secure, cloud-based workspaces, user authentication and authorization, search, tools and workflows, applications, and new innovative features to address community needs, including exploratory data analysis, genomic and imaging tools, tools for reproducibility, and improved interoperability with other NIH data science platforms. BDC offers straightforward access to large-scale datasets and computational resources that support precision medicine for heart, lung, blood, and sleep conditions, leveraging separately developed and managed platforms to maximize flexibility based on researcher needs, expertise, and backgrounds. Through the NHLBI BioData Catalyst Fellows Program, BDC facilitates scientific discoveries and technological advances. BDC also facilitated accelerated research on the coronavirus disease-2019 (COVID-19) pandemic.


Subject(s)
COVID-19 , Cloud Computing , Humans , Ecosystem , Reproducibility of Results , Lung , Software
7.
Circ Genom Precis Med ; 16(2): e003532, 2023 04.
Article in English | MEDLINE | ID: mdl-36960714

ABSTRACT

BACKGROUND: Risk for venous thromboembolism has a strong genetic component. Whole genome sequencing from the TOPMed program (Trans-Omics for Precision Medicine) allowed us to look for new associations, particularly rare variants missed by standard genome-wide association studies. METHODS: The 3793 cases and 7834 controls (11.6% of cases were individuals of African, Hispanic/Latino, or Asian ancestry) were analyzed using a single variant approach and an aggregate gene-based approach using our primary filter (included only loss-of-function and missense variants predicted to be deleterious) and our secondary filter (included all missense variants). RESULTS: Single variant analyses identified associations at 5 known loci. Aggregate gene-based analyses identified only PROC (odds ratio, 6.2 for carriers of rare variants; P=7.4×10⁻¹⁴) when using our primary filter. Employing our secondary variant filter led to a smaller effect size at PROC (odds ratio, 3.8; P=1.6×10⁻¹⁴), while excluding variants found only in rare isoforms led to a larger one (odds ratio, 7.5). Different filtering strategies improved the signal for 2 other known genes: PROS1 became significant (minimum P=1.8×10⁻⁶ with the secondary filter), while SERPINC1 did not (minimum P=4.4×10⁻⁵ with minor allele frequency <0.0005). Results were largely the same when restricting the analyses to include only unprovoked cases; however, one novel gene, MS4A1, became significant (P=4.4×10⁻⁷ using all missense variants with minor allele frequency <0.0005). CONCLUSIONS: Here, we have demonstrated the importance of using multiple variant filtering strategies, as we detected additional genes when filtering variants based on their predicted deleteriousness, frequency, and presence on the most expressed isoforms. Our primary analyses did not identify new candidate loci; thus larger follow-up studies are needed to replicate the novel MS4A1 locus and to identify additional rare variation associated with venous thromboembolism.


Subject(s)
Genome-Wide Association Study , Venous Thromboembolism , Humans , Venous Thromboembolism/genetics , Precision Medicine , Genetic Predisposition to Disease , Gene Frequency
8.
Circ Genom Precis Med ; 16(1): e003858, 2023 02.
Article in English | MEDLINE | ID: mdl-36598822

ABSTRACT

BACKGROUND: Whether genetics contribute to the rising prevalence of obesity or its cardiovascular consequences in today's obesogenic environment remains unclear. We sought to determine whether the effects of a higher aggregate genetic burden of obesity risk on body mass index (BMI) or cardiovascular disease (CVD) differed by birth year. METHODS: We split the FHS (Framingham Heart Study) into 4 equally sized birth cohorts (birth year before 1932, 1932 to 1946, 1947 to 1959, and after 1960). We modeled a genetic predisposition to obesity using an additive genetic risk score (GRS) of 941 BMI-associated variants and tested for GRS-birth year interaction on log-BMI (outcome) when participants were around 50 years old (N=7693). We repeated the analysis using a GRS of 109 BMI-associated variants that increased CVD risk factors (type 2 diabetes, blood pressure, total cholesterol, and high-density lipoprotein) in addition to BMI. We then evaluated whether the effects of the BMI GRSs on CVD risk differed by birth cohort when participants were around 60 years old (N=5493). RESULTS: Compared with participants born before 1932 (mean age, 50.8 years [2.4]), those born after 1960 (mean age, 43.3 years [4.5]) had higher BMI (median [interquartile range], 25.4 [23.3-28.0] kg/m² versus 26.9 [23.7-30.6] kg/m²). The effect of the 941-variant BMI GRS on BMI and CVD risk was stronger in people who were born in later years (GRS-birth year interaction: P=0.0007 and P=0.04, respectively). CONCLUSIONS: The significant GRS-birth year interactions indicate that common genetic variants have larger effects on middle-age BMI and CVD risk in people born more recently. These findings suggest that the increasingly obesogenic environment may amplify the impact of genetics on the risk of obesity and possibly its cardiovascular consequences.
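The additive GRS and GRS-birth year interaction described in this abstract can be sketched as follows. This is a minimal illustration, not the study's code: it assumes dosages coded as 0-2 effect alleles, accepts optional per-variant weights (the abstract does not say whether the 941-variant score was weighted), treats birth year as a continuous covariate rather than the four birth cohorts, and returns only point estimates. The function names are hypothetical; a real analysis would add covariates, standard errors, and P values.

```python
from typing import Optional

import numpy as np


def additive_grs(dosages: np.ndarray, weights: Optional[np.ndarray] = None) -> np.ndarray:
    """Additive genetic risk score: (optionally weighted) sum of effect-allele dosages.

    dosages has shape (n_individuals, n_variants) with values in [0, 2].
    """
    if weights is None:
        weights = np.ones(dosages.shape[1])  # unweighted count of risk alleles
    return dosages @ weights


def fit_grs_by_birth_year(log_bmi: np.ndarray, grs: np.ndarray,
                          birth_year: np.ndarray) -> np.ndarray:
    """OLS fit of log_bmi ~ 1 + grs + birth_year + grs:birth_year.

    Returns [intercept, grs, birth_year, interaction]. A positive interaction
    coefficient means the GRS effect on log-BMI is larger in later-born people.
    """
    design = np.column_stack([np.ones_like(grs), grs, birth_year, grs * birth_year])
    coef, *_ = np.linalg.lstsq(design, log_bmi, rcond=None)
    return coef
```

Centering birth year (for example, subtracting a reference year) keeps the GRS main-effect coefficient interpretable as the effect at that reference year.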


Subject(s)
Cardiovascular Diseases , Diabetes Mellitus, Type 2 , Middle Aged , Humans , Adult , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/genetics , Body Mass Index , Obesity/epidemiology , Obesity/genetics , Risk Factors
9.
Nat Cardiovasc Res ; 2(12): 1159-1172, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38817323

ABSTRACT

Coronary artery calcification (CAC) is a measure of atherosclerosis and a well-established predictor of coronary artery disease (CAD) events. Here we describe a genome-wide association study (GWAS) of CAC in 22,400 participants from multiple ancestral groups. We confirmed associations with four known loci and identified two additional loci associated with CAC (ARSE and MMP16), with evidence of significant associations in replication analyses for both novel loci. Functional assays of ARSE and MMP16 in human vascular smooth muscle cells (VSMCs) demonstrate that ARSE is a promoter of VSMC calcification and VSMC phenotype switching from a contractile to a calcifying or osteogenic phenotype. Furthermore, we show that the association of variants near ARSE with reduced CAC is likely explained by reduced ARSE expression with the G allele of enhancer variant rs5982944. Our study highlights ARSE as an important contributor to atherosclerotic vascular calcification, and a potential drug target for vascular calcific disease.

10.
Nat Methods ; 19(12): 1599-1611, 2022 12.
Article in English | MEDLINE | ID: mdl-36303018

ABSTRACT

Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability in analyzing the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations. We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits.


Subject(s)
Genome-Wide Association Study , Genome , Humans , Genome-Wide Association Study/methods , Whole Genome Sequencing/methods , Phenotype , Genetic Variation
11.
Cell Genom ; 2(8)2022 Aug 10.
Article in English | MEDLINE | ID: mdl-36119389

ABSTRACT

How race, ethnicity, and ancestry are used in genomic research has wide-ranging implications for how research is translated into clinical care and incorporated into public understanding. Correlation between race and genetic ancestry contributes to unresolved complexity for the scientific community, as illustrated by heterogeneous definitions and applications of these variables. Here, we offer commentary and recommendations on the use of race, ethnicity, and ancestry across the arc of genetic research, including data harmonization, analysis, and reporting. While informed by our experiences as researchers affiliated with the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, these recommendations are applicable to basic and translational genomic research in diverse populations with genome-wide data. Moving forward, considerable collaborative effort will be required to ensure that race, ethnicity, and ancestry are described and used appropriately to generate scientific knowledge that yields broad and equitable benefit.

12.
Med ; 3(6): 388-405.e6, 2022 06 10.
Article in English | MEDLINE | ID: mdl-35690059

ABSTRACT

BACKGROUND: Statins remain one of the most prescribed medications worldwide. While effective in decreasing atherosclerotic cardiovascular disease risk, statin use is associated with adverse effects for a subset of patients, including disrupted metabolic control and increased risk of type 2 diabetes. METHODS: We investigated the potential role of the gut microbiome in modifying patient responses to statin therapy across two independent cohorts (discovery n = 1,848, validation n = 991). Microbiome composition was assessed in these cohorts using stool 16S rRNA amplicon and shotgun metagenomic sequencing, respectively. Microbiome associations with markers of statin on-target and adverse effects were tested via a covariate-adjusted interaction analysis framework, utilizing blood metabolomics, clinical laboratory tests, genomics, and demographics data. FINDINGS: The hydrolyzed substrate for 3-hydroxy-3-methylglutarate-coenzyme-A (HMG-CoA) reductase, HMG, emerged as a promising marker for statin on-target effects in cross-sectional cohorts. Plasma HMG levels reflected both statin therapy intensity and known genetic markers for variable statin responses. Through exploring gut microbiome associations between blood-derived measures of statin effectiveness and adverse metabolic effects of statins, we find that heterogeneity in statin responses was consistently associated with variation in the gut microbiome across two independent cohorts. A Bacteroides-enriched and diversity-depleted gut microbiome was associated with more intense statin responses, both in terms of on-target and adverse effects. CONCLUSIONS: With further study and refinement, gut microbiome monitoring may help inform precision statin treatment. FUNDING: This research was supported by the M.J. Murdock Charitable Trust, WRF, NAM Catalyst Award, and NIH grant U19AG023122 awarded by the NIA.


Subject(s)
Diabetes Mellitus, Type 2 , Gastrointestinal Microbiome , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Microbiota , Cross-Sectional Studies , Diabetes Mellitus, Type 2/drug therapy , Gastrointestinal Microbiome/genetics , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , RNA, Ribosomal, 16S/genetics
13.
Cell Genom ; 2(1)2022 Jan 12.
Article in English | MEDLINE | ID: mdl-35530816

ABSTRACT

Genetic studies on telomere length are important for understanding age-related diseases. Prior GWAS of leukocyte telomere length (TL) have been limited to European and Asian populations. Here, we report the first sequencing-based association study for TL across ancestrally diverse individuals (European, African, Asian and Hispanic/Latino) from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. We used whole genome sequencing (WGS) of whole blood for variant genotype calling and the bioinformatic estimation of telomere length in n=109,122 individuals. We identified 59 sentinel variants (p-value < 5×10⁻⁹) in 36 loci associated with telomere length, including 20 newly associated loci (13 were replicated in external datasets). There was little evidence of effect size heterogeneity across populations. Fine-mapping at OBFC1 indicated that the independent signals colocalized with cell-type-specific eQTLs for OBFC1 (STN1). Using a multi-variant gene-based approach, we identified two genes newly implicated in telomere length, DCLRE1B (SNM1B) and PARN. In PheWAS, we demonstrated that our TL polygenic trait scores (PTS) were associated with increased risk of cancer-related phenotypes.

14.
Am J Hum Genet ; 109(6): 1175-1181, 2022 06 02.
Article in English | MEDLINE | ID: mdl-35504290

ABSTRACT

Current publicly available tools that allow rapid exploration of linkage disequilibrium (LD) between markers (e.g., HaploReg and LDlink) are based on whole-genome sequence (WGS) data from 2,504 individuals in the 1000 Genomes Project. Here, we present TOP-LD, an online tool to explore LD inferred with high-coverage (∼30×) WGS data from 15,578 individuals in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. TOP-LD provides a significant upgrade compared to current LD tools, as the TOPMed WGS data provide a more comprehensive representation of genetic variation than the 1000 Genomes data, particularly for rare variants and in the specific populations that we analyzed. For example, TOP-LD encompasses LD information for 150.3, 62.2, and 36.7 million variants for European, African, and East Asian ancestral samples, respectively, offering a 2.6- to 9.1-fold increase in variant coverage compared to HaploReg 4.0 or LDlink. In addition, TOP-LD includes tens of thousands of structural variants (SVs). We demonstrate the value of TOP-LD in fine-mapping at the GGT1 locus associated with gamma-glutamyltransferase in African ancestry participants in UK Biobank. Beyond fine-mapping, TOP-LD can facilitate a wide range of applications that are based on summary statistics and estimates of LD. TOP-LD is freely available online.
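LD between two biallelic markers is commonly summarized as r². The minimal sketch below computes r² as the squared Pearson correlation of genotype dosages (0, 1, or 2 copies of an allele), which does not require phased haplotypes. This estimator is an assumption chosen for illustration; the abstract does not describe how TOP-LD computes its LD estimates, and the function name is hypothetical.

```python
from typing import Sequence


def ld_r2(dosages_a: Sequence[float], dosages_b: Sequence[float]) -> float:
    """r^2 between two markers as the squared correlation of genotype dosages."""
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equal-length dosage vectors with n >= 2")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    return cov * cov / (var_a * var_b)
```

Because r² is squared, perfectly correlated and perfectly anti-correlated markers both give 1.0; for example, `ld_r2([0, 1, 2], [2, 1, 0])` returns 1.0.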


Subject(s)
Genome-Wide Association Study , Precision Medicine , Asian People , Humans , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Whole Genome Sequencing
15.
Commun Biol ; 5(1): 362, 2022 05 02.
Article in English | MEDLINE | ID: mdl-35501457

ABSTRACT

Deficiency of the immune checkpoint lymphocyte activation gene-3 (LAG3) protein is significantly associated with both elevated HDL-cholesterol (HDL-C) and myocardial infarction risk. We determined the association of genetic variants within ±500 kb of LAG3 with plasma LAG3 and defined LAG3-associated plasma proteins with HDL-C and clinical outcomes. Whole genome sequencing and plasma proteomics were obtained from the Multi-Ethnic Study of Atherosclerosis (MESA) and the Framingham Heart Study (FHS) cohorts as part of the Trans-Omics for Precision Medicine program. In situ Hi-C chromatin capture was performed in EBV-transformed cell lines isolated from four MESA participants. Genetic association analyses were performed in MESA using multivariate regression models, with validation in FHS. A LAG3-associated protein network was tested for association with HDL-C, coronary heart disease, and all-cause mortality. We identify an association between the LAG3 rs3782735 variant and plasma LAG3 protein. Proteomics analysis reveals 183 proteins significantly associated with LAG3, four of which are associated with HDL-C. Four proteins found to be associated with all-cause mortality in FHS show nominal associations in MESA. Chromatin capture analysis reveals significant cis interactions between LAG3 and C1S, LRIG3, and TNFRSF1A, and trans interactions between LAG3 and B2M. A LAG3-associated protein network has significant associations with HDL-C and mortality.


Subject(s)
Atherosclerosis , Precision Medicine , Cholesterol, HDL , Chromatin , Humans , Lymphocyte Activation , Membrane Proteins
16.
PLoS One ; 17(2): e0264341, 2022.
Article in English | MEDLINE | ID: mdl-35202437

ABSTRACT

Genetically regulated gene expression has helped elucidate the biological mechanisms underlying complex traits. Improved high-throughput technology allows similar interrogation of the genetically regulated proteome for understanding complex trait mechanisms. Here, we used the Trans-omics for Precision Medicine (TOPMed) Multi-omics pilot study, which comprises data from the Multi-Ethnic Study of Atherosclerosis (MESA), to optimize genetic predictors of the plasma proteome for genetically regulated proteome-wide association studies (PWAS) in diverse populations. We built predictive models for protein abundances using data collected in TOPMed MESA, for which we have measured 1,305 proteins by a SOMAscan assay. We compared predictive models built via elastic net regression to models integrating posterior inclusion probabilities estimated by fine-mapping SNPs prior to elastic net. In order to investigate the transferability of predictive models across ancestries, we built protein prediction models in all four of the TOPMed MESA populations, African American (n = 183), Chinese (n = 71), European (n = 416), and Hispanic/Latino (n = 301), as well as in all populations combined. As expected, fine-mapping produced more significant protein prediction models, especially in African-ancestry populations, potentially increasing opportunity for discovery. When we tested our TOPMed MESA models in the independent European INTERVAL study, fine-mapping improved cross-ancestry prediction for some proteins. Using GWAS summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ∼50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we applied S-PrediXcan to perform PWAS for 28 complex traits. The most protein-trait associations were discovered, colocalized, and replicated in large independent GWAS using proteome prediction model training populations with ancestries similar to PAGE. 
At current training population sample sizes, performance between baseline and fine-mapped protein prediction models in PWAS was similar, highlighting the utility of elastic net. Our predictive models in diverse populations are publicly available for use in proteome mapping methods at https://doi.org/10.5281/zenodo.4837327.


Subject(s)
Atherosclerosis/genetics , Genetic Association Studies , Models, Genetic , Proteins/genetics , Proteome/genetics , Atherosclerosis/ethnology , Female , Gene Frequency , Humans , Male , Pilot Projects , Polymorphism, Single Nucleotide , Quantitative Trait Loci
17.
Am J Hum Genet ; 108(10): 1836-1851, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34582791

ABSTRACT

Many common and rare variants associated with hematologic traits have been discovered through imputation on large-scale reference panels. However, the majority of genome-wide association studies (GWASs) have been conducted in Europeans, and determining causal variants has proved challenging. We performed a GWAS of total leukocyte, neutrophil, lymphocyte, monocyte, eosinophil, and basophil counts generated from 109,563,748 variants in the autosomes and the X chromosome in the Trans-Omics for Precision Medicine (TOPMed) program, which included data from 61,802 individuals of diverse ancestry. We discovered and replicated 7 leukocyte trait associations, including (1) the association between a noncoding variant in the chromosome X pseudo-autosomal region (PAR), located between cytokine receptor genes (CSF2RA and CRLF2), and lower eosinophil count; and (2) associations between single variants found predominantly among African Americans at the S1PR3 (9q22.1) and HBB (11p15.4) loci and monocyte and lymphocyte counts, respectively. We further provide evidence indicating that the newly discovered eosinophil-lowering chromosome X PAR variant might be associated with reduced susceptibility to common allergic diseases such as atopic dermatitis and asthma. Additionally, we found a burden of very rare FLT3 (13q12.2) variants associated with monocyte counts. Together, these results emphasize the utility of whole-genome sequencing in diverse samples in identifying associations missed by European-ancestry-driven GWASs.


Subject(s)
Asthma/epidemiology , Biomarkers/metabolism , Dermatitis, Atopic/epidemiology , Leukocytes/pathology , Polymorphism, Single Nucleotide , Pulmonary Disease, Chronic Obstructive/epidemiology , Quantitative Trait Loci , Asthma/genetics , Asthma/metabolism , Asthma/pathology , Dermatitis, Atopic/genetics , Dermatitis, Atopic/metabolism , Dermatitis, Atopic/pathology , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Humans , National Heart, Lung, and Blood Institute (U.S.) , Phenotype , Prognosis , Proteome/analysis , Proteome/metabolism , Pulmonary Disease, Chronic Obstructive/genetics , Pulmonary Disease, Chronic Obstructive/metabolism , Pulmonary Disease, Chronic Obstructive/pathology , United Kingdom/epidemiology , United States/epidemiology , Whole Genome Sequencing
18.
HGG Adv ; 2(3)2021 Jul 08.
Article in English | MEDLINE | ID: mdl-34337551

ABSTRACT

Whole-genome sequencing (WGS) and whole-exome sequencing studies have become increasingly available and are being used to identify rare genetic variants associated with health and disease outcomes. Investigators routinely use mixed models to account for genetic relatedness or other clustering variables (e.g., family or household) when testing genetic associations. However, no existing test of the association of a rare variant with a binary outcome in the presence of correlated data controls the type 1 error when there are (1) few individuals harboring the rare allele, (2) a small proportion of cases relative to controls, and (3) covariates to adjust for. Here, we address all three issues in developing a framework for testing rare variant association with a binary trait in individuals harboring at least one risk allele. In this framework, we estimate outcome probabilities under the null hypothesis and then use them, within the individuals with at least one risk allele, to test variant associations. We extend the BinomiRare test, which was previously proposed for independent observations, develop the Conway-Maxwell-Poisson (CMP) test, and study their properties in simulations. We show that the BinomiRare test always controls the type 1 error, while the CMP test sometimes does not. We then use the BinomiRare test to assess the association of rare genetic variants in target genes with small-vessel disease (SVD) stroke, short sleep, and venous thromboembolism (VTE), in whole-genome sequence data from the Trans-Omics for Precision Medicine (TOPMed) program.
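The carrier-only testing idea in this abstract can be sketched in a few lines. If each carrier i has a probability p_i of being a case under the null hypothesis (estimated beforehand, e.g., from a null model with covariates but without the variant), the number of cases among carriers follows a Poisson-binomial distribution, and the observed count is compared against it. This is a minimal sketch assuming the null probabilities are already available; it computes an exact one-sided (risk-increasing) tail probability, and the function names are hypothetical rather than the authors' software.

```python
from typing import List, Sequence


def poisson_binomial_pmf(probs: Sequence[float]) -> List[float]:
    """P(D = k) for D = sum of independent Bernoulli(p_i), by dynamic programming."""
    pmf = [1.0]  # with zero carriers, D = 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this carrier is a control
            nxt[k + 1] += mass * p      # this carrier is a case
        pmf = nxt
    return pmf


def carrier_case_count_pvalue(null_case_probs: Sequence[float], observed_cases: int) -> float:
    """One-sided P(D >= observed_cases) among carriers under the null hypothesis."""
    if not 0 <= observed_cases <= len(null_case_probs):
        raise ValueError("observed_cases must be between 0 and the number of carriers")
    pmf = poisson_binomial_pmf(null_case_probs)
    return min(1.0, sum(pmf[observed_cases:]))
```

For example, three carriers who each have a null case probability of 0.5 and are all cases give a one-sided P of 0.125.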

19.
Hum Mol Genet ; 30(23): 2362-2369, 2021 11 16.
Article in English | MEDLINE | ID: mdl-34270706

ABSTRACT

Numerous genome-wide association studies (GWASs) have been conducted for the identification of genetic variants involved in human height. The vast majority of these studies, however, have been conducted in populations of European ancestry. Here, we report the first GWAS of adult height in the Taiwan Biobank using a discovery sample of 14 571 individuals and an independent replication sample of 20 506 individuals. From our analysis, we generalize to the Taiwanese population the genome-wide significant associations between height and 18 genes previously identified in European and non-Taiwanese East Asian populations. We also identify and replicate, at the genome-wide significance level, associated variants for height in four novel genes at two loci that have not previously been reported: RASA2 on chromosome 3 and NABP2, RNF41 and SLC39A5 at 12q13.3 on chromosome 12. RASA2 and RNF41 are strong candidates for a role in height: copy number and loss-of-function variants in RASA2 have previously been found to be associated with short stature disorders, and decreased expression of the RNF41 gene has been reported to result in insulin resistance in skeletal muscle. The results from our analysis of the Taiwan Biobank underscore the potential for novel genetic discoveries in underrepresented populations worldwide, even for traits, such as height, that have been extensively investigated in large-scale studies of European ancestry populations.


Subject(s)
Biological Specimen Banks , Body Height/genetics , Cation Transport Proteins/genetics , Genome-Wide Association Study , Ubiquitin-Protein Ligases/genetics , ras GTPase-Activating Proteins/genetics , Adult , Alleles , Female , Genetic Association Studies , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Taiwan
20.
Nat Commun ; 12(1): 3506, 2021 06 09.
Article in English | MEDLINE | ID: mdl-34108454

ABSTRACT

In modern Whole Genome Sequencing (WGS) epidemiological studies, participant-level data from multiple studies are often pooled and results are obtained from a single analysis. We consider the impact of differential phenotype variances by study, which we term 'variance stratification'. If unaccounted for, variance stratification can lead to both decreased statistical power and increased false positive rates, depending on how allele frequencies, sample sizes, and phenotypic variances vary across the studies that are pooled. We develop a procedure to compute variant-specific inflation factors, and show how it can be used for diagnosis of genetic association analyses on pooled individual-level data from multiple studies. We describe a WGS-appropriate analysis approach, implemented in freely available software, which allows study-specific variances and thereby improves performance in practice. We illustrate the variance stratification problem, its solutions, and the proposed diagnostic procedure, in simulations and in data from the Trans-Omics for Precision Medicine Whole Genome Sequencing Program (TOPMed), used in association tests for hemoglobin concentrations and BMI.
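To make "allowing study-specific variances" concrete, the sketch below computes a score test for a single variant under a null model with a separate intercept and a separate residual variance for each pooled study, so each study's contribution is weighted by the inverse of its own variance. It is a simplified illustration under stated assumptions (unrelated individuals, no covariates, variances estimated from null-model residuals), the function name is hypothetical, and it is not the freely available software the abstract refers to.

```python
from collections import defaultdict
from math import sqrt
from typing import Hashable, Sequence


def stratified_variance_score_z(y: Sequence[float], g: Sequence[float],
                                study: Sequence[Hashable]) -> float:
    """Score-test Z for one variant with study-specific intercepts and residual variances.

    Null model per study s: y_i = alpha_s + e_i, e_i ~ N(0, sigma2_s).
    U = sum_i (g_i - mean_g_s) * r_i / sigma2_s
    V = sum_i (g_i - mean_g_s)^2 / sigma2_s
    Z = U / sqrt(V). A monomorphic variant or a constant phenotype within
    a study makes V or sigma2_s zero and raises ZeroDivisionError.
    """
    members = defaultdict(list)
    for i, s in enumerate(study):
        members[s].append(i)
    u = v = 0.0
    for idx in members.values():
        if len(idx) < 2:
            raise ValueError("each study needs at least two participants")
        ys = [y[i] for i in idx]
        gs = [g[i] for i in idx]
        mean_y = sum(ys) / len(ys)
        mean_g = sum(gs) / len(gs)
        resid = [yi - mean_y for yi in ys]                  # null-model residuals
        sigma2 = sum(r * r for r in resid) / (len(ys) - 1)  # study-specific variance
        u += sum((gi - mean_g) * r for gi, r in zip(gs, resid)) / sigma2
        v += sum((gi - mean_g) ** 2 for gi in gs) / sigma2
    return u / sqrt(v)
```

With a single study, Z reduces to r·sqrt(n - 1), where r is the sample correlation between genotype and phenotype; with several studies, a study with a large residual variance contributes less to both U and V.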


Subject(s)
Genetic Variation , Genome-Wide Association Study/methods , Algorithms , Computer Simulation , Gene Frequency , Genome-Wide Association Study/standards , Genome-Wide Association Study/statistics & numerical data , Humans , Phenotype , Sample Size