ABSTRACT
We investigate saddlepoint approximations of tail probabilities of the score test statistic in logistic regression for genome-wide association studies. The inaccuracy in the normal approximation of the score test statistic increases with increasing imbalance in the response and with decreasing minor allele counts. Applying saddlepoint approximation methods greatly improves the accuracy, even far out in the tails of the distribution. By using exact results for a simple logistic regression model, as well as simulations for models with nuisance parameters, we compare double saddlepoint methods for computing two-sided P-values and mid-P-values. These methods are also compared to a recent single saddlepoint procedure. We investigate the methods further on data from UK Biobank with skin and soft tissue infections as phenotype, using both common and rare variants.
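As a rough illustration of why a saddlepoint approximation can help far out in the tails, the Python sketch below applies a single (unconditional) Lugannani-Rice approximation to the score statistic U = Σ g_i (y_i − μ) under the null. It assumes the null case probability μ is known, so it is a simplification and not the double saddlepoint method compared in the abstract, which conditions on nuisance parameter estimates. No continuity correction is applied, and the function names and toy data are hypothetical.

```python
import math
import numpy as np

def _tilted(t, g, mu):
    """Exponentially tilted case probabilities p_i(t) for genotypes g."""
    e = np.exp(t * g)
    return mu * e / (1.0 - mu + mu * e)

def cgf(t, g, mu):
    """K(t): cumulant generating function of U = sum_i g_i (Y_i - mu), Y_i ~ Bernoulli(mu)."""
    return float(np.sum(np.log(1.0 - mu + mu * np.exp(t * g)) - t * g * mu))

def cgf_d1(t, g, mu):
    """K'(t): mean of U under the tilted distribution."""
    return float(np.sum(g * (_tilted(t, g, mu) - mu)))

def cgf_d2(t, g, mu):
    """K''(t): variance of U under the tilted distribution."""
    p = _tilted(t, g, mu)
    return float(np.sum(g ** 2 * p * (1.0 - p)))

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def normal_tail(u, g, mu):
    """Normal approximation to P(U >= u) using the null variance K''(0)."""
    return normal_upper_tail(u / math.sqrt(cgf_d2(0.0, g, mu)))

def saddlepoint_tail(u, g, mu, bracket=30.0):
    """Lugannani-Rice approximation to P(U >= u), without continuity correction."""
    lo, hi = -bracket, bracket
    if not (cgf_d1(lo, g, mu) < u < cgf_d1(hi, g, mu)):
        raise ValueError("saddlepoint equation K'(t) = u has no root in the bracket")
    for _ in range(200):  # bisection works because K'(t) is increasing in t
        mid = 0.5 * (lo + hi)
        if cgf_d1(mid, g, mu) < u:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    if abs(t) < 1e-4:
        # Close to the mean the formula is numerically unstable; fall back to the normal tail.
        return normal_tail(u, g, mu)
    w = math.copysign(math.sqrt(2.0 * (t * u - cgf(t, g, mu))), t)
    v = t * math.sqrt(cgf_d2(t, g, mu))
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return normal_upper_tail(w) + phi * (1.0 / v - 1.0 / w)

# Toy data (hypothetical): 2000 individuals, 40 carriers of one minor allele,
# null case probability 1%, and 4 cases observed among the carriers.
g = np.array([1.0] * 40 + [0.0] * 1960)
mu = 0.01
u_obs = 4 - 40 * mu
print("normal approximation:     ", normal_tail(u_obs, g, mu))
print("saddlepoint approximation:", saddlepoint_tail(u_obs, g, mu))
```

In this toy setting the number of cases among the 40 carriers is binomial, so the exact tail probability can be computed directly and compared with both approximations.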
Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Logistic Models , Genome-Wide Association Study/methods , Phenotype , Probability
ABSTRACT
We consider cross-sectional genetic association studies (common and rare variants) where non-genetic information is available or feasible to obtain for N individuals, but where it is infeasible to genotype all N individuals. We consider continuously measurable Gaussian traits (phenotypes). Genotyping n < N extreme phenotype individuals can yield better power to detect phenotype-genotype associations, as compared to randomly selecting n individuals. We define a person as having an extreme phenotype if the observed phenotype is above a specified upper threshold or below a specified lower threshold. We consider a model where these thresholds can be tailored to each individual. The classical extreme sampling design is to set equal thresholds for all individuals. We introduce a design (z-extreme sampling) where personalized thresholds are defined based on the residuals of a regression model including only non-genetic (fully available) information. We derive score tests for the situation where only n extremes are analyzed (complete case analysis) and for the situation where the non-genetic information on N - n non-extremes is included in the analysis (all case analysis). For the classical design, all case analysis is generally more powerful than complete case analysis. For the z-extreme sample, we show that all case and complete case tests are equally powerful. Simulations and data analysis also show that z-extreme sampling is at least as powerful as the classical extreme sampling design, and that the classical design is at times less powerful than random sampling. The method of dichotomizing extreme phenotypes is also discussed.
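The difference between the two designs can be sketched in a few lines of Python. This is a minimal illustration, assuming the thresholds are empirical quantiles chosen so that exactly n individuals are selected (half from each tail) and that the non-genetic regression is ordinary least squares. The function names and the simulated data are hypothetical and not taken from the paper.

```python
import numpy as np

def classical_extreme_sample(y, n):
    """Indices of n individuals with the lowest and highest values of y.

    Equal thresholds for everyone: the lowest n // 2 and the highest
    n - n // 2 values are selected.
    """
    y = np.asarray(y, dtype=float)
    order = np.argsort(y)
    k = n // 2
    return np.concatenate([order[:k], order[len(y) - (n - k):]])

def z_extreme_sample(y, X, n):
    """Indices of n individuals with extreme residuals, plus the residuals.

    The residuals come from a least-squares fit of y on the non-genetic
    covariates X (with intercept), so the implied phenotype thresholds
    differ between individuals with different covariate values.
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return classical_extreme_sample(resid, n), resid

# Simulated example (hypothetical numbers): one strong non-genetic covariate.
rng = np.random.default_rng(0)
N, n = 1000, 200
covariate = rng.normal(size=N)
phenotype = 0.8 * covariate + rng.normal(size=N)

classical_idx = classical_extreme_sample(phenotype, n)
z_idx, residuals = z_extreme_sample(phenotype, covariate, n)
overlap = len(set(classical_idx.tolist()) & set(z_idx.tolist()))
print(f"selected by both designs: {overlap} of {n}")
```

Only the selected indices would be sent for genotyping; the covariates of the remaining N - n individuals stay available for an all case analysis.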
Subjects
Genetic Association Studies , Phenotype , Sampling Studies , Cross-Sectional Studies , Genetic Association Studies/methods , Genetic Variation , Humans , Linear Models
ABSTRACT
The human proteome is a crucial intermediate between complex diseases and their genetic and environmental components, and an important source of drug development targets and biomarkers. Here, we comprehensively assess the genetic architecture of 257 circulating protein biomarkers of cardiometabolic relevance through high-depth (22.5×) whole-genome sequencing (WGS) in 1328 individuals. We discover 131 independent sequence variant associations (P < 7.45 × 10⁻¹¹) across the allele frequency spectrum, all of which replicate in an independent cohort (n = 1605, 18.4× WGS). We identify for the first time replicating evidence for rare-variant cis-acting protein quantitative trait loci for five genes, involving both coding and noncoding variation. We construct and validate polygenic scores that explain up to 45% of protein level variation. We find causal links between protein levels and disease risk, identifying high-value biomarkers and drug development targets.
Subjects
Myocardium/metabolism , Proteome/genetics , Whole Genome Sequencing , Gene Expression Regulation , Gene Regulatory Networks , Genetic Predisposition to Disease , Humans , Multifactorial Inheritance/genetics , Proteome/metabolism , Quantitative Trait Loci/genetics , Risk Factors
ABSTRACT
The role of rare variants in complex traits remains uncharted. Here, we conduct deep whole genome sequencing of 1457 individuals from an isolated population, and test for rare variant burdens across six cardiometabolic traits. We identify a role for rare regulatory variation, which has hitherto been missed. We find evidence of rare variant burdens that are independent of established common variant signals (ADIPOQ and adiponectin, P = 4.2 × 10⁻⁸; APOC3 and triglyceride levels, P = 1.5 × 10⁻²⁶), and identify replicating evidence for a burden associated with triglyceride levels in FAM189B (P = 2.2 × 10⁻⁸), indicating a role for this gene in lipid metabolism.
Subjects
Alleles , Heritable Quantitative Trait , Whole Genome Sequencing , Cohort Studies , Gene Frequency/genetics , Genetic Variation , Humans
ABSTRACT
The original version of this Article contained an error in Fig. 2. In panel a, the two legend items "rare" and "common" were inadvertently swapped. This has been corrected in both the PDF and HTML versions of the Article.
ABSTRACT
BACKGROUND: Our aim was to assess the influence of age, gender and lifestyle factors on the effect of the obesity-promoting alleles of FTO and MC4R. METHODS: The HUNT study comprises health information on the population of Nord-Trøndelag county, Norway. Extreme phenotype participants (gender-wise lower and upper quartiles of waist-hip-ratio and BMI ≥ 35 kg/m²) in the third survey, HUNT3 (2006-08), were genotyped for the single-nucleotide polymorphisms rs9939609 (FTO) and rs17782313 (MC4R); 25686 participants were successfully genotyped. Extreme sampling was chosen to increase power to detect genetic and gene-environment effects on waist-hip-ratio and BMI. Statistical inference was based on linear regression models and a missing-covariate likelihood approach for the extreme phenotype sampling design. Environmental factors were physical activity, diet (artificially sweetened beverages) and smoking. Longitudinal analysis was performed using material from HUNT2 (1995-97). RESULTS: Cross-sectional and longitudinal genetic effects indicated stronger genetic associations with obesity in younger than in older participants, as well as differences between women and men. We observed larger genetic effects among physically inactive compared to active individuals. This interaction was age-dependent and seen mainly in 20- to 40-year-olds. We observed a greater FTO effect among men with a regular intake of artificially sweetened beverages, compared to non-drinkers. Interaction analysis of smoking was mainly inconclusive. CONCLUSIONS: In a large all-adult and area-based population survey, the effects of obesity-promoting minor alleles of FTO and MC4R, and their interactions with lifestyle factors, are age- and gender-related. These findings appear relevant when designing individualized treatment for and prophylaxis against obesity.