ABSTRACT
Schizophrenia spectrum disorders (SSDs) are characterized by substantial clinical and genetic heterogeneity. Multiple recurrent copy number variants (CNVs) increase risk for SSDs; however, how known risk CNVs and broader genome-wide CNVs influence clinical variability is unclear. The current study examined associations between borderline intellectual functioning or childhood-onset psychosis, known risk CNVs, and burden of deletions affecting genes in 18 previously validated neurodevelopmental gene-sets in 618 SSD individuals. CNV associations were assessed for replication in 235 SSD relatives and 583 controls, and 9,930 youth from the Adolescent Brain Cognitive Development (ABCD) Study. Known SSD- and neurodevelopmental disorder (NDD)-risk CNVs were associated with borderline intellectual functioning in SSD cases (odds ratios (OR) = 7.09 and 4.57, respectively); NDD-risk deletions were nominally associated with childhood-onset psychosis (OR = 4.34). Furthermore, deletion of genes involved in regulating gene expression during fetal brain development was associated with borderline intellectual functioning across SSD cases and non-cases (OR = 2.58), with partial replication in the ABCD cohort. Exploratory analyses of cortical morphology showed associations between fetal gene regulatory gene deletions and altered gray matter volume and cortical thickness across cohorts. Results highlight contributions of known risk CNVs to phenotypic variability in SSD and the utility of a neurodevelopmental framework for identifying mechanisms that influence phenotypic variability in SSDs, as well as the broader population, with implications for personalized medicine approaches to care.
ABSTRACT
Genome-wide association studies (GWAS) have become well-powered to detect loci associated with telomere length. However, no prior work has validated genes nominated by GWAS to examine their role in telomere length regulation. We conducted a multi-ancestry meta-analysis of 211,369 individuals and identified five novel association signals. Enrichment analyses of chromatin state and cell-type heritability suggested that blood/immune cells are the most relevant cell type to examine telomere length association signals. We validated specific GWAS associations by overexpressing KBTBD6 or POP5 and demonstrated that both lengthened telomeres. CRISPR/Cas9 deletion of the predicted causal regions in K562 blood cells reduced expression of these genes, demonstrating that these loci are related to transcriptional regulation of KBTBD6 and POP5. Our results demonstrate the utility of telomere length GWAS in the identification of telomere length regulation mechanisms and validate KBTBD6 and POP5 as genes affecting telomere length regulation.
Subjects
Genome-Wide Association Study; Telomere Homeostasis; Telomere; Humans; Telomere/genetics; Telomere/metabolism; K562 Cells; Telomere Homeostasis/genetics; Single Nucleotide Polymorphism; Gene Expression Regulation; CRISPR-Cas Systems
ABSTRACT
INTRODUCTION: Alzheimer's disease (AD) is a complex disease influenced by genetics and environment. More than 75 susceptibility loci have been linked to late-onset AD, but most of these loci were discovered in genome-wide association studies (GWAS) exclusive to non-Hispanic White individuals. There are wide disparities in AD risk across racially stratified groups, and while these disparities are not due to genetic differences, underrepresentation in genetic research can further exacerbate and contribute to their persistence. We investigated the racial/ethnic representation of participants in United States (US)-based AD genetics and the statistical implications of current representation. METHODS: We compared racial/ethnic data of participants from array and sequencing studies in US AD genetics databases, including National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) and NIAGADS Data Sharing Service (dssNIAGADS), to AD and related dementia (ADRD) prevalence and mortality. We then simulated the statistical power of these datasets to identify risk variants from non-White populations. RESULTS: There is insufficient statistical power (probability < 80%) to detect single nucleotide polymorphisms (SNPs) with low to moderate effect sizes (odds ratio [OR] < 1.5) using array data from Black and Hispanic participants; studies of Asian participants are not powered to detect variants with OR ≤ 2. Using available and projected sequencing data from Black and Hispanic participants, risk variants with OR = 1.2 are detectable at high allele frequencies. Sample sizes remain insufficiently powered to detect these variants in Asian populations. DISCUSSION: AD genetics datasets are largely representative of US ADRD burden. However, there is a wide discrepancy between proportional representation and statistically meaningful representation. Most variants identified in GWAS of non-Hispanic White individuals have low to moderate effects. Comparable risk variants in non-White populations are not detectable given current sample sizes, which could lead to disparities in future studies and drug development. We urge AD genetics researchers and institutions to continue investing in recruiting diverse participants and to use community-based participatory research practices.
ABSTRACT
Megabase-scale mosaic chromosomal alterations (mCAs) in blood are prognostic markers for a host of human diseases. Here, to gain a better understanding of mCA rates in genetically diverse populations, we analyzed whole-genome sequencing data from 67,390 individuals from the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine program. We observed higher sensitivity with whole-genome sequencing data, compared with array-based data, in uncovering mCAs at low mutant cell fractions and found that individuals of European ancestry have the highest rates of autosomal mCAs and the lowest rates of chromosome X mCAs, compared with individuals of African or Hispanic ancestry. Although further studies in diverse populations will be needed to replicate our findings, we report three loci associated with loss of chromosome X, associations between autosomal mCAs and rare variants in DCPS, ADM17, PPP1R16B and TET2 and ancestry-specific variants in ATM and MPL with mCAs in cis.
Subjects
Human Genome; Genome-Wide Association Study; Mosaicism; Humans; Black People/genetics; Hispanic or Latino/genetics; Precision Medicine
ABSTRACT
Obesity is a major public health crisis associated with high mortality rates. Previous genome-wide association studies (GWAS) investigating body mass index (BMI) have largely relied on imputed data from European individuals. This study leveraged whole-genome sequencing (WGS) data from 88,873 participants from the Trans-Omics for Precision Medicine (TOPMed) Program, of which 51% were of non-European population groups. We discovered 18 BMI-associated signals (P < 5 × 10⁻⁹). Notably, we identified and replicated a novel low-frequency single nucleotide polymorphism (SNP) in MTMR3 that was common in individuals of African descent. Using a diverse study population, we further identified two novel secondary signals in known BMI loci and pinpointed two likely causal variants in the POC5 and DMD loci. Our work demonstrates the benefits of combining WGS and diverse cohorts in expanding the current catalog of variants and genes that confer risk for obesity, bringing us one step closer to personalized medicine.
ABSTRACT
Rationale: Chronic obstructive pulmonary disease (COPD) is a complex disease characterized by airway obstruction and accelerated lung function decline. Our understanding of systemic protein biomarkers associated with COPD remains incomplete. Objectives: To determine what proteins and pathways are associated with impaired pulmonary function in a diverse population. Methods: We studied 6,722 participants across six cohort studies with both aptamer-based proteomic and spirometry data (4,566 predominantly White participants in a discovery analysis and 2,156 African American cohort participants in a validation). In linear regression models, we examined protein associations with baseline forced expiratory volume in 1 second (FEV1) and FEV1/forced vital capacity (FVC). In linear mixed effects models, we investigated the associations of baseline protein levels with rate of FEV1 decline (ml/yr) in 2,777 participants with up to 7 years of follow-up spirometry. Results: We identified 254 proteins associated with FEV1 in our discovery analyses, with 80 proteins validated in the Jackson Heart Study. Novel validated protein associations include kallistatin serine protease inhibitor, growth differentiation factor 2, and tumor necrosis factor-like weak inducer of apoptosis (discovery β = 0.0561, Q = 4.05 × 10⁻¹⁰; β = 0.0421, Q = 1.12 × 10⁻³; and β = 0.0358, Q = 1.67 × 10⁻³, respectively). In longitudinal analyses within cohorts with follow-up spirometry, we identified 15 proteins associated with FEV1 decline (Q < 0.05), including elafin leukocyte elastase inhibitor and mucin-associated TFF2 (trefoil factor 2; β = -4.3 ml/yr, Q = 0.049; β = -6.1 ml/yr, Q = 0.032, respectively). Pathways and processes highlighted by our study include aberrant extracellular matrix remodeling, enhanced innate immune response, dysregulation of angiogenesis, and coagulation. Conclusions: In this study, we identify and validate novel biomarkers and pathways associated with lung function traits in a racially diverse population. In addition, we identify novel protein markers associated with FEV1 decline. Several protein findings are supported by previously reported genetic signals, highlighting the plausibility of certain biologic pathways. These novel proteins might represent markers for risk stratification, as well as novel molecular targets for treatment of COPD.
Subjects
Lung; Chronic Obstructive Pulmonary Disease; Humans; Forced Expiratory Volume/physiology; Proteomics; Vital Capacity/physiology; Spirometry; Biomarkers
ABSTRACT
BACKGROUND: Risk for venous thromboembolism has a strong genetic component. Whole genome sequencing from the TOPMed program (Trans-Omics for Precision Medicine) allowed us to look for new associations, particularly rare variants missed by standard genome-wide association studies. METHODS: The 3793 cases and 7834 controls (11.6% of cases were individuals of African, Hispanic/Latino, or Asian ancestry) were analyzed using a single variant approach and an aggregate gene-based approach using our primary filter (included only loss-of-function and missense variants predicted to be deleterious) and our secondary filter (included all missense variants). RESULTS: Single variant analyses identified associations at 5 known loci. Aggregate gene-based analyses identified only PROC (odds ratio, 6.2 for carriers of rare variants; P=7.4×10⁻¹⁴) when using our primary filter. Employing our secondary variant filter led to a smaller effect size at PROC (odds ratio, 3.8; P=1.6×10⁻¹⁴), while excluding variants found only in rare isoforms led to a larger one (odds ratio, 7.5). Different filtering strategies improved the signal for 2 other known genes: PROS1 became significant (minimum P=1.8×10⁻⁶ with the secondary filter), while SERPINC1 did not (minimum P=4.4×10⁻⁵ with minor allele frequency <0.0005). Results were largely the same when restricting the analyses to include only unprovoked cases; however, one novel gene, MS4A1, became significant (P=4.4×10⁻⁷ using all missense variants with minor allele frequency <0.0005). CONCLUSIONS: Here, we have demonstrated the importance of using multiple variant filtering strategies, as we detected additional genes when filtering variants based on their predicted deleteriousness, frequency, and presence on the most expressed isoforms. Our primary analyses did not identify new candidate loci; thus larger follow-up studies are needed to replicate the novel MS4A1 locus and to identify additional rare variation associated with venous thromboembolism.
Subjects
Genome-Wide Association Study; Venous Thromboembolism; Humans; Venous Thromboembolism/genetics; Precision Medicine; Genetic Predisposition to Disease; Gene Frequency
ABSTRACT
BACKGROUND: Whether genetics contribute to the rising prevalence of obesity or its cardiovascular consequences in today's obesogenic environment remains unclear. We sought to determine whether the effects of a higher aggregate genetic burden of obesity risk on body mass index (BMI) or cardiovascular disease (CVD) differed by birth year. METHODS: We split the FHS (Framingham Heart Study) into 4 equally sized birth cohorts (birth year before 1932, 1932 to 1946, 1947 to 1959, and after 1960). We modeled a genetic predisposition to obesity using an additive genetic risk score (GRS) of 941 BMI-associated variants and tested for GRS-birth year interaction on log-BMI (outcome) when participants were around 50 years old (N=7693). We repeated the analysis using a GRS of 109 BMI-associated variants that increased CVD risk factors (type 2 diabetes, blood pressure, total cholesterol, and high-density lipoprotein) in addition to BMI. We then evaluated whether the effects of the BMI GRSs on CVD risk differed by birth cohort when participants were around 60 years old (N=5493). RESULTS: Compared with participants born before 1932 (mean age, 50.8 years [2.4]), those born after 1960 (mean age, 43.3 years [4.5]) had higher BMI (median [interquartile range], 26.9 [23.7-30.6] kg/m² versus 25.4 [23.3-28.0] kg/m²). The effect of the 941-variant BMI GRS on BMI and CVD risk was stronger in people who were born in later years (GRS-birth year interaction: P=0.0007 and P=0.04, respectively). CONCLUSIONS: The significant GRS-birth year interactions indicate that common genetic variants have larger effects on middle-age BMI and CVD risk in people born more recently. These findings suggest that the increasingly obesogenic environment may amplify the impact of genetics on the risk of obesity and possibly its cardiovascular consequences.
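As a rough illustration of the additive genetic risk score mentioned in this abstract, the sketch below sums allele dosages weighted by per-variant effect sizes. The function name, the weights, and the dosages are hypothetical; the abstract does not state whether the score was weighted or how variants were coded, so this is a generic construction and not the authors' method.

```python
def additive_grs(dosages, weights):
    """Weighted additive score: sum over variants of weight * risk-allele dosage.

    dosages: risk-allele counts per variant (0, 1, or 2; imputed values may be fractional).
    weights: per-variant effect sizes (hypothetical here); use 1.0 for an unweighted score.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for d, w in zip(dosages, weights))


# Hypothetical participant with three variants.
score = additive_grs([0, 1, 2], [0.1, 0.2, 0.3])
```

With all weights set to 1.0, the score reduces to a simple count of risk alleles.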
Subjects
Cardiovascular Diseases; Type 2 Diabetes Mellitus; Middle Aged; Humans; Adult; Cardiovascular Diseases/epidemiology; Cardiovascular Diseases/genetics; Body Mass Index; Obesity/epidemiology; Obesity/genetics; Risk Factors
ABSTRACT
Coronary artery calcification (CAC) is a measure of atherosclerosis and a well-established predictor of coronary artery disease (CAD) events. Here we describe a genome-wide association study (GWAS) of CAC in 22,400 participants from multiple ancestral groups. We confirmed associations with four known loci and identified two additional loci associated with CAC (ARSE and MMP16), with evidence of significant associations in replication analyses for both novel loci. Functional assays of ARSE and MMP16 in human vascular smooth muscle cells (VSMCs) demonstrate that ARSE is a promoter of VSMC calcification and VSMC phenotype switching from a contractile to a calcifying or osteogenic phenotype. Furthermore, we show that the association of variants near ARSE with reduced CAC is likely explained by reduced ARSE expression with the G allele of enhancer variant rs5982944. Our study highlights ARSE as an important contributor to atherosclerotic vascular calcification, and a potential drug target for vascular calcific disease.
ABSTRACT
Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability in analyzing the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations. We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits.
Subjects
Genome-Wide Association Study; Genome; Humans; Genome-Wide Association Study/methods; Whole Genome Sequencing/methods; Phenotype; Genetic Variation
ABSTRACT
How race, ethnicity, and ancestry are used in genomic research has wide-ranging implications for how research is translated into clinical care and incorporated into public understanding. Correlation between race and genetic ancestry contributes to unresolved complexity for the scientific community, as illustrated by heterogeneous definitions and applications of these variables. Here, we offer commentary and recommendations on the use of race, ethnicity, and ancestry across the arc of genetic research, including data harmonization, analysis, and reporting. While informed by our experiences as researchers affiliated with the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, these recommendations are applicable to basic and translational genomic research in diverse populations with genome-wide data. Moving forward, considerable collaborative effort will be required to ensure that race, ethnicity, and ancestry are described and used appropriately to generate scientific knowledge that yields broad and equitable benefit.
ABSTRACT
BACKGROUND: Statins remain one of the most prescribed medications worldwide. While effective in decreasing atherosclerotic cardiovascular disease risk, statin use is associated with adverse effects for a subset of patients, including disrupted metabolic control and increased risk of type 2 diabetes. METHODS: We investigated the potential role of the gut microbiome in modifying patient responses to statin therapy across two independent cohorts (discovery n = 1,848, validation n = 991). Microbiome composition was assessed in these cohorts using stool 16S rRNA amplicon and shotgun metagenomic sequencing, respectively. Microbiome associations with markers of statin on-target and adverse effects were tested via a covariate-adjusted interaction analysis framework, utilizing blood metabolomics, clinical laboratory tests, genomics, and demographics data. FINDINGS: The hydrolyzed substrate for 3-hydroxy-3-methylglutarate-coenzyme-A (HMG-CoA) reductase, HMG, emerged as a promising marker for statin on-target effects in cross-sectional cohorts. Plasma HMG levels reflected both statin therapy intensity and known genetic markers for variable statin responses. Through exploring gut microbiome associations between blood-derived measures of statin effectiveness and adverse metabolic effects of statins, we find that heterogeneity in statin responses was consistently associated with variation in the gut microbiome across two independent cohorts. A Bacteroides-enriched and diversity-depleted gut microbiome was associated with more intense statin responses, both in terms of on-target and adverse effects. CONCLUSIONS: With further study and refinement, gut microbiome monitoring may help inform precision statin treatment. FUNDING: This research was supported by the M.J. Murdock Charitable Trust, WRF, NAM Catalyst Award, and NIH grant U19AG023122 awarded by the NIA.
Subjects
Type 2 Diabetes Mellitus; Gastrointestinal Microbiome; Hydroxymethylglutaryl-CoA Reductase Inhibitors; Microbiota; Cross-Sectional Studies; Type 2 Diabetes Mellitus/drug therapy; Gastrointestinal Microbiome/genetics; Humans; Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects; 16S Ribosomal RNA/genetics
ABSTRACT
Current publicly available tools that allow rapid exploration of linkage disequilibrium (LD) between markers (e.g., HaploReg and LDlink) are based on whole-genome sequence (WGS) data from 2,504 individuals in the 1000 Genomes Project. Here, we present TOP-LD, an online tool to explore LD inferred with high-coverage (~30×) WGS data from 15,578 individuals in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. TOP-LD provides a significant upgrade compared to current LD tools, as the TOPMed WGS data provide a more comprehensive representation of genetic variation than the 1000 Genomes data, particularly for rare variants and in the specific populations that we analyzed. For example, TOP-LD encompasses LD information for 150.3, 62.2, and 36.7 million variants for European, African, and East Asian ancestral samples, respectively, offering a 2.6- to 9.1-fold increase in variant coverage compared to HaploReg 4.0 or LDlink. In addition, TOP-LD includes tens of thousands of structural variants (SVs). We demonstrate the value of TOP-LD in fine-mapping at the GGT1 locus associated with gamma glutamyltransferase in the African ancestry participants in UK Biobank. Beyond fine-mapping, TOP-LD can facilitate a wide range of applications that are based on summary statistics and estimates of LD. TOP-LD is freely available online.
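For readers unfamiliar with the quantity that LD tools report, the following minimal sketch computes the standard r² statistic between two biallelic variants from phased haplotypes. It assumes haplotypes are given as (allele at variant A, allele at variant B) pairs coded 0/1; the function name and the toy data are hypothetical, and this is not a description of how TOP-LD itself computes LD.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic variants from phased haplotypes.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1.
    D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either variant is monomorphic")
    return d * d / denom


# Toy data: alleles always travel together, so the variants are in perfect LD.
perfect = [(1, 1), (1, 1), (0, 0), (0, 0)]
# Toy data: every allele combination is equally frequent, so there is no LD.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
```

Rare variants have small p_A or p_B, which makes r² estimates noisy in small panels; that is one reason a larger reference sample matters for rare-variant LD.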
Subjects
Genome-Wide Association Study; Precision Medicine; Asian People; Humans; Linkage Disequilibrium/genetics; Single Nucleotide Polymorphism/genetics; Whole Genome Sequencing
ABSTRACT
Genetic studies on telomere length (TL) are important for understanding age-related diseases. Prior GWAS for leukocyte TL have been limited to European and Asian populations. Here, we report the first sequencing-based association study for TL across ancestrally-diverse individuals (European, African, Asian and Hispanic/Latino) from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. We used whole genome sequencing (WGS) of whole blood for variant genotype calling and the bioinformatic estimation of telomere length in n=109,122 individuals. We identified 59 sentinel variants (p-value <5×10⁻⁹) in 36 loci associated with telomere length, including 20 newly associated loci (13 were replicated in external datasets). There was little evidence of effect size heterogeneity across populations. Fine-mapping at OBFC1 indicated the independent signals colocalized with cell-type specific eQTLs for OBFC1 (STN1). Using a multi-variant gene-based approach, we identified two genes newly implicated in telomere length, DCLRE1B (SNM1B) and PARN. In PheWAS, we demonstrated our TL polygenic trait scores (PTS) were associated with increased risk of cancer-related phenotypes.
ABSTRACT
Deficiency of the immune checkpoint lymphocyte activation gene-3 (LAG3) protein is significantly associated with both elevated HDL-cholesterol (HDL-C) and myocardial infarction risk. We determined the association of genetic variants within ±500 kb of LAG3 with plasma LAG3 and defined LAG3-associated plasma proteins with HDL-C and clinical outcomes. Whole genome sequencing and plasma proteomics were obtained from the Multi-Ethnic Study of Atherosclerosis (MESA) and the Framingham Heart Study (FHS) cohorts as part of the Trans-Omics for Precision Medicine program. In situ Hi-C chromatin capture was performed in EBV-transformed cell lines isolated from four MESA participants. Genetic association analyses were performed in MESA using multivariate regression models, with validation in FHS. A LAG3-associated protein network was tested for association with HDL-C, coronary heart disease, and all-cause mortality. We identify an association between the LAG3 rs3782735 variant and plasma LAG3 protein. Proteomics analysis reveals 183 proteins significantly associated with LAG3, with four proteins associated with HDL-C. Four proteins discovered for association with all-cause mortality in FHS show nominal associations in MESA. Chromatin capture analysis reveals significant cis interactions between LAG3 and C1S, LRIG3, TNFRSF1A, and trans interactions between LAG3 and B2M. A LAG3-associated protein network has significant associations with HDL-C and mortality.
Subjects
Atherosclerosis; Precision Medicine; HDL Cholesterol; Chromatin; Humans; Lymphocyte Activation; Membrane Proteins
ABSTRACT
Many common and rare variants associated with hematologic traits have been discovered through imputation on large-scale reference panels. However, the majority of genome-wide association studies (GWASs) have been conducted in Europeans, and determining causal variants has proved challenging. We performed a GWAS of total leukocyte, neutrophil, lymphocyte, monocyte, eosinophil, and basophil counts generated from 109,563,748 variants in the autosomes and the X chromosome in the Trans-Omics for Precision Medicine (TOPMed) program, which included data from 61,802 individuals of diverse ancestry. We discovered and replicated 7 leukocyte trait associations, including (1) the association between a chromosome X, pseudo-autosomal region (PAR), noncoding variant located between cytokine receptor genes (CSF2RA and CRLF2) and lower eosinophil count; and (2) associations between single variants found predominantly among African Americans at the S1PR3 (9q22.1) and HBB (11p15.4) loci and monocyte and lymphocyte counts, respectively. We further provide evidence indicating that the newly discovered eosinophil-lowering chromosome X PAR variant might be associated with reduced susceptibility to common allergic diseases such as atopic dermatitis and asthma. Additionally, we found a burden of very rare FLT3 (13q12.2) variants associated with monocyte counts. Together, these results emphasize the utility of whole-genome sequencing in diverse samples in identifying associations missed by European-ancestry-driven GWASs.
Subjects
Asthma/epidemiology; Biomarkers/metabolism; Atopic Dermatitis/epidemiology; Leukocytes/pathology; Single Nucleotide Polymorphism; Chronic Obstructive Pulmonary Disease/epidemiology; Quantitative Trait Loci; Asthma/genetics; Asthma/metabolism; Asthma/pathology; Atopic Dermatitis/genetics; Atopic Dermatitis/metabolism; Atopic Dermatitis/pathology; Genetic Predisposition to Disease; Human Genome; Genome-Wide Association Study; Humans; National Heart, Lung, and Blood Institute (U.S.); Phenotype; Prognosis; Proteome/analysis; Proteome/metabolism; Chronic Obstructive Pulmonary Disease/genetics; Chronic Obstructive Pulmonary Disease/metabolism; Chronic Obstructive Pulmonary Disease/pathology; United Kingdom/epidemiology; United States/epidemiology; Whole Genome Sequencing
ABSTRACT
Whole-genome sequencing (WGS) and whole-exome sequencing studies have become increasingly available and are being used to identify rare genetic variants associated with health and disease outcomes. Investigators routinely use mixed models to account for genetic relatedness or other clustering variables (e.g., family or household) when testing genetic associations. However, no existing tests of the association of a rare variant with a binary outcome in the presence of correlated data control the type 1 error where there are (1) few individuals harboring the rare allele, (2) a small proportion of cases relative to controls, and (3) covariates to adjust for. Here, we address all three issues in developing a framework for testing rare variant association with a binary trait in individuals harboring at least one risk allele. In this framework, we estimate outcome probabilities under the null hypothesis and then use them, within the individuals with at least one risk allele, to test variant associations. We extend the BinomiRare test, which was previously proposed for independent observations, and develop the Conway-Maxwell-Poisson (CMP) test and study their properties in simulations. We show that the BinomiRare test always controls the type 1 error, while the CMP test sometimes does not. We then use the BinomiRare test to test the association of rare genetic variants in target genes with small-vessel disease (SVD) stroke, short sleep, and venous thromboembolism (VTE), in whole-genome sequence data from the Trans-Omics for Precision Medicine (TOPMed) program.
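The idea of testing within carriers using null outcome probabilities can be sketched as follows. Under the null, each carrier is a case with their own estimated probability, so the number of cases among carriers follows a Poisson-binomial distribution, and a tail probability can be computed exactly by dynamic programming. This is an illustrative sketch only: the function names and the probabilities are hypothetical, and it should not be read as the published BinomiRare or CMP implementation.

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli trials
    with possibly different success probabilities (dynamic programming)."""
    pmf = [1.0]  # with zero trials, P(0 successes) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this carrier is a control
            nxt[k + 1] += mass * p     # this carrier is a case
        pmf = nxt
    return pmf


def upper_tail_p(probs, observed_cases):
    """P(number of cases among carriers >= observed_cases) under the null."""
    return sum(poisson_binomial_pmf(probs)[observed_cases:])


# Hypothetical example: four carriers with null case probabilities
# estimated from covariates, three of whom are observed to be cases.
null_probs = [0.05, 0.10, 0.02, 0.08]
p_value = upper_tail_p(null_probs, 3)
```

Because the calculation is exact and conditions only on the carriers, it does not rely on large-sample approximations, which is the setting the abstract describes (few carriers, few cases).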
ABSTRACT
Numerous genome-wide association studies (GWASs) have been conducted for the identification of genetic variants involved with human height. The vast majority of these studies, however, have been conducted in populations of European ancestry. Here, we report the first GWAS of adult height in the Taiwan Biobank using a discovery sample of 14 571 individuals and an independent replication sample of 20 506 individuals. From our analysis, we generalize to the Taiwanese population genome-wide significant associations with height and 18 previously identified genes in European and non-Taiwanese East Asian populations. We also identify and replicate, at the genome-wide significance level, associated variants for height in four novel genes at two loci that have not previously been reported: RASA2 on chromosome 3 and NABP2, RNF41 and SLC39A5 at 12q13.3 on chromosome 12. RASA2 and RNF41 are strong candidates for having a role in height with copy number and loss of function variants in RASA2 previously found to be associated with short stature disorders, and decreased expression of the RNF41 gene resulting in insulin resistance in skeletal muscle. The results from our analysis of the Taiwan Biobank underscore the potential for the identification of novel genetic discoveries in underrepresented worldwide populations, even for traits, such as height, that have been extensively investigated in large-scale studies of European ancestry populations.
Subjects
Biological Specimen Banks; Body Height/genetics; Cation Transport Proteins/genetics; Genome-Wide Association Study; Ubiquitin-Protein Ligases/genetics; ras GTPase-Activating Proteins/genetics; Adult; Alleles; Female; Genetic Association Studies; Humans; Male; Middle Aged; Single Nucleotide Polymorphism; Quantitative Trait Loci; Taiwan
ABSTRACT
In modern Whole Genome Sequencing (WGS) epidemiological studies, participant-level data from multiple studies are often pooled and results are obtained from a single analysis. We consider the impact of differential phenotype variances by study, which we term 'variance stratification'. Unaccounted for, variance stratification can lead to both decreased statistical power and increased false-positive rates, depending on how allele frequencies, sample sizes, and phenotypic variances vary across the studies that are pooled. We develop a procedure to compute variant-specific inflation factors, and show how it can be used for diagnosis of genetic association analyses on pooled individual-level data from multiple studies. We describe a WGS-appropriate analysis approach, implemented in freely available software, which allows study-specific variances and thereby improves performance in practice. We illustrate the variance stratification problem, its solutions, and the proposed diagnostic procedure, in simulations and in data from the Trans-Omics for Precision Medicine Whole Genome Sequencing Program (TOPMed), used in association tests for hemoglobin concentrations and BMI.
Subjects
Genetic Variation; Genome-Wide Association Study/methods; Algorithms; Computer Simulation; Gene Frequency; Genome-Wide Association Study/standards; Genome-Wide Association Study/statistics & numerical data; Humans; Phenotype; Sample Size
ABSTRACT
Traditional Hardy-Weinberg equilibrium (HWE) tests (the χ² test and the exact test) have long been used as a metric for evaluating genotype quality, as technical artifacts leading to incorrect genotype calls often can be identified as deviations from HWE. However, in data sets composed of individuals from diverse ancestries, HWE can be violated even without genotyping error, complicating the use of HWE testing to assess genotype data quality. In this manuscript, we present the Robust Unified Test for HWE (RUTH) to test for HWE while accounting for population structure and genotype uncertainty, and to evaluate the impact of population heterogeneity and genotype uncertainty on the standard HWE tests and alternative methods using simulated and real sequence data sets. Our results demonstrate that ignoring population structure or genotype uncertainty in HWE tests can inflate false-positive rates by many orders of magnitude. Our evaluations demonstrate different tradeoffs between false positives and statistical power across the methods, with RUTH consistently among the best across all evaluations. RUTH is implemented as a practical and scalable software tool to rapidly perform HWE tests across millions of markers and hundreds of thousands of individuals while supporting standard VCF/BCF formats. RUTH is publicly available at https://www.github.com/statgen/ruth.
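To make concrete why pooling ancestries can violate HWE without any genotyping error, the sketch below implements the textbook χ² HWE statistic (not RUTH itself) and applies it to two hypothetical populations that are each exactly in HWE but have different allele frequencies. The function name and the genotype counts are invented for illustration.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Textbook chi-square statistic for deviation from Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts (AA, AB, BB). Expected counts are
    n*p^2, 2*n*p*q, n*q^2, with p estimated from the observed allele counts.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical population 1 (allele frequency 0.9) and population 2 (0.1),
# each with genotype counts exactly at HWE proportions.
pop1 = (81, 18, 1)
pop2 = (1, 18, 81)
pooled = tuple(a + b for a, b in zip(pop1, pop2))  # (82, 36, 82)

stat_pop1 = hwe_chi_square(*pop1)      # essentially zero
stat_pooled = hwe_chi_square(*pooled)  # large: heterozygote deficit from pooling
```

The pooled sample shows a strong heterozygote deficit even though neither population deviates on its own, which is the kind of spurious signal that structure-aware tests are designed to avoid.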