ABSTRACT
Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency < 1%) predicted damaging coding variation by using sequence data from >170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels; some of these genes have not been previously associated with lipid levels when using rare coding variation from population-based samples. We prioritize 32 genes in array-based genome-wide association study (GWAS) loci based on aggregations of rare coding variants; three (EVI5, SH2B3, and PLIN1) had no prior association of rare coding variants with lipid levels. Most of our associated genes showed evidence of association among multiple ancestries. Finally, we observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes and for genes closest to GWAS index single-nucleotide polymorphisms (SNPs). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.
Subject(s)
Exome; Genetic Variation; Genome-Wide Association Study; Lipids/blood; Open Reading Frames; Alleles; Blood Glucose/genetics; Case-Control Studies; Computational Biology/methods; Databases, Genetic; Diabetes Mellitus, Type 2/genetics; Diabetes Mellitus, Type 2/metabolism; Genetic Predisposition to Disease; Genetics, Population; Genome-Wide Association Study/methods; Humans; Lipid Metabolism/genetics; Liver/metabolism; Liver/pathology; Molecular Sequence Annotation; Multifactorial Inheritance; Phenotype; Polymorphism, Single Nucleotide
ABSTRACT
While polygenic risk scores (PRSs) enable early identification of genetic risk for chronic obstructive pulmonary disease (COPD), predictive performance is limited when the discovery and target populations are not well matched. Hypothesizing that the biological mechanisms of disease are shared across ancestry groups, we introduce a PrediXcan-derived polygenic transcriptome risk score (PTRS) to improve cross-ethnic portability of risk prediction. We constructed the PTRS using summary statistics from application of PrediXcan on large-scale GWASs of lung function (forced expiratory volume in 1 s [FEV1] and its ratio to forced vital capacity [FEV1/FVC]) in the UK Biobank. We examined prediction performance and cross-ethnic portability of PTRS through smoking-stratified analyses both on 29,381 multi-ethnic participants from TOPMed population/family-based cohorts and on 11,771 multi-ethnic participants from TOPMed COPD-enriched studies. Analyses were carried out for two dichotomous COPD traits (moderate-to-severe and severe COPD) and two quantitative lung function traits (FEV1 and FEV1/FVC). While the proposed PTRS showed weaker associations with disease than PRS for European ancestry, the PTRS showed stronger association with COPD than PRS for African Americans (e.g., odds ratio [OR] = 1.24 [95% confidence interval [CI]: 1.08-1.43] for PTRS versus 1.10 [0.96-1.26] for PRS among heavy smokers with ≥ 40 pack-years of smoking) for moderate-to-severe COPD. Cross-ethnic portability of the PTRS was significantly higher than the PRS (paired t test p < 2.2 × 10⁻¹⁶ with portability gains ranging from 5% to 28%) for both dichotomous COPD traits and across all smoking strata. Our study demonstrates the value of PTRS for improved cross-ethnic portability compared to PRS in predicting COPD risk.
Subject(s)
Pulmonary Disease, Chronic Obstructive; Transcriptome; Humans; Lung; National Heart, Lung, and Blood Institute (U.S.); Pulmonary Disease, Chronic Obstructive/genetics; Risk Factors; United States/epidemiology
ABSTRACT
Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability in analyzing the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations. We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits.
Subject(s)
Genome-Wide Association Study; Genome; Humans; Genome-Wide Association Study/methods; Whole Genome Sequencing/methods; Phenotype; Genetic Variation
ABSTRACT
Chronic obstructive pulmonary disease (COPD) is associated with age and smoking, but other determinants of the disease are incompletely understood. Clonal hematopoiesis of indeterminate potential (CHIP) is a common, age-related state in which somatic mutations in clonal blood populations induce aberrant inflammatory responses. Patients with CHIP have an elevated risk for cardiovascular disease, but the association of CHIP with COPD remains unclear. We analyzed whole-genome sequencing and whole-exome sequencing data to detect CHIP in 48 835 patients, of whom 8444 had moderate to very severe COPD, from four separate cohorts with COPD phenotyping and smoking history. We measured emphysema in murine models in which Tet2 was deleted in hematopoietic cells. In the COPDGene cohort, individuals with CHIP had risks of moderate-to-severe, severe, or very severe COPD that were 1.6 (adjusted 95% confidence interval [CI], 1.1-2.2) and 2.2 (adjusted 95% CI, 1.5-3.2) times greater than those for noncarriers. These findings were consistently observed in three additional cohorts and meta-analyses of all patients. CHIP was also associated with decreased FEV1% predicted in the COPDGene cohort (mean between-group differences, -5.7%; adjusted 95% CI, -8.8% to -2.6%), a finding replicated in additional cohorts. Smoke exposure was associated with a small but significant increased risk of having CHIP (odds ratio, 1.03 per 10 pack-years; 95% CI, 1.01-1.05 per 10 pack-years) in the meta-analysis of all patients. Inactivation of Tet2 in mouse hematopoietic cells exacerbated the development of emphysema and inflammation in models of cigarette smoke exposure. Somatic mutations in blood cells are associated with the development and severity of COPD, independent of age and cumulative smoke exposure.
Subject(s)
Clonal Hematopoiesis; Pulmonary Disease, Chronic Obstructive/genetics; Animals; Female; Humans; Male; Mice; Middle Aged; Odds Ratio; Pulmonary Disease, Chronic Obstructive/etiology; Risk Factors; Smoking/adverse effects; Exome Sequencing
ABSTRACT
Genome-wide association studies (GWASs) have successfully identified loci of the human genome implicated in numerous complex traits. However, the limitations of this study design make it difficult to identify specific causal variants or biological mechanisms of association. We propose a novel method, AnnoRE, which uses GWAS summary statistics, local correlation structure among genotypes and functional annotation from external databases to prioritize the most plausible causal single-nucleotide polymorphisms (SNPs) in each trait-associated locus. Our proposed method improves upon previous fine-mapping approaches by estimating the effects of functional annotation from genome-wide summary statistics, allowing for the inclusion of many annotation categories. By implementing a multiple regression model with differential shrinkage via random effects, we avoid reductive assumptions on the number of causal SNPs per locus. Application of this method to a large GWAS meta-analysis of body mass index identified six loci with significant evidence in favor of one or more variants. In an additional 24 loci, one or two variants were strongly prioritized over others in the region. The use of functional annotation in genetic fine-mapping studies helps to distinguish between variants in high LD and to identify promising targets for follow-up studies.
Subject(s)
Genome-Wide Association Study; Quantitative Trait Loci; Chromosome Mapping/methods; Genome-Wide Association Study/methods; Humans; Multifactorial Inheritance; Polymorphism, Single Nucleotide/genetics
ABSTRACT
INTRODUCTION: Observational studies have shown that body mass index (BMI) and waist-to-hip ratio (WHR) are both inversely associated with lung function, as assessed by forced vital capacity (FVC) and forced expiratory volume in 1 s (FEV1). However, observational data are susceptible to confounding and reverse causation. METHODS: We selected genetic instruments from the relevant large-scale genome-wide association studies. Summary statistics of lung function and asthma came from the UK Biobank and SpiroMeta Consortium meta-analysis (n = 400,102). After examining pleiotropy and removing outliers, we applied inverse-variance weighting to estimate the causal association of BMI and BMI-adjusted WHR (WHRadjBMI) with FVC, FEV1, FEV1/FVC, and asthma. Sensitivity analyses were performed using weighted median, MR-Egger, and MRlap methods. RESULTS: We found that BMI was inversely associated with FVC (effect estimate, -0.167; 95% confidence interval (CI), -0.203 to -0.130) and FEV1 (effect estimate, -0.111; 95% CI, -0.149 to -0.074). Higher BMI was associated with higher FEV1/FVC (effect estimate, 0.079; 95% CI, 0.049 to 0.110) but was not significantly associated with asthma. WHRadjBMI was inversely associated with FVC (effect estimate, -0.132; 95% CI, -0.180 to -0.084) but had no significant association with FEV1. Higher WHR was associated with higher FEV1/FVC (effect estimate, 0.181; 95% CI, 0.130 to 0.232) and with increased risk of asthma (effect estimate, 0.027; 95% CI, 0.001 to 0.053). CONCLUSION: We found significant evidence suggesting that increased BMI is causally related to decreased FVC and FEV1, and that increased BMI-adjusted WHR could lead to lower FVC and a higher risk of asthma. Higher BMI and BMI-adjusted WHR also appeared to be causally associated with higher FEV1/FVC.
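The inverse-variance weighting step named in the METHODS can be illustrated with a minimal sketch. It assumes the standard fixed-effect IVW estimator applied to per-variant summary statistics; the function name `ivw_estimate` and its arguments are hypothetical, and the abstract does not state which software or IVW variant the authors used.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    beta_exposure: per-variant effects on the exposure (e.g. BMI)
    beta_outcome:  per-variant effects on the outcome (e.g. FVC)
    se_outcome:    standard errors of the outcome effects
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    # Weighted regression of outcome effects on exposure effects through
    # the origin, with weights 1 / se_outcome^2.
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    estimate = num / den
    std_err = math.sqrt(1.0 / den)
    ci = (estimate - 1.96 * std_err, estimate + 1.96 * std_err)
    return estimate, std_err, ci
```

A call such as `ivw_estimate([0.1, 0.2, 0.4], [-0.05, -0.1, -0.2], [0.01, 0.02, 0.01])` returns the pooled estimate, its standard error, and an approximate 95% interval. The sketch ignores uncertainty in the exposure effects, which is the usual first-order simplification.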
Subject(s)
Asthma; Lung; Humans; Asthma/genetics; Body Mass Index; Forced Expiratory Volume; Genome-Wide Association Study; Mendelian Randomization Analysis; Obesity/genetics
ABSTRACT
Cardiovascular disease (CVD) is responsible for 31% of all deaths worldwide. Among CVD risk factors are age, race, increased systolic blood pressure (BP), and dyslipidemia. Both BP and blood lipid levels change with age, with a dose-dependent relationship between the cumulative exposure to hyperlipidemia and the risk of CVD. We performed an exome sequence association study using longitudinal data with up to 7805 European Americans (EAs) and 3171 African Americans (AAs) from the Atherosclerosis Risk in Communities (ARIC) study. We assessed associations of common (minor allele frequency > 5%) nonsynonymous and splice-site variants and gene-based sets of rare variants with levels and with longitudinal change of seven CVD risk factor phenotypes (BP traits: systolic BP, diastolic BP, pulse pressure; lipid traits: triglycerides, total cholesterol, high-density lipoprotein cholesterol [HDL-C], low-density lipoprotein cholesterol [LDL-C]). Furthermore, we investigated the relationship of the identified variants and genes with select CVD endpoints. We identified two novel genes: DCLK3 associated with the change of HDL-C levels in AAs and RAB7L1 associated with the change of LDL-C levels in EAs. RAB7L1 is further associated with an increased risk of heart failure in ARIC EAs. Investigation of the contribution of genetic factors to the longitudinal change of CVD risk factor phenotypes promotes our understanding of the etiology of CVD outcomes, stressing the importance of incorporating the longitudinal structure of the cohort data in future analyses.
Subject(s)
Atherosclerosis; Cardiovascular Diseases; Black or African American/genetics; Atherosclerosis/genetics; Cardiovascular Diseases/epidemiology; Cardiovascular Diseases/genetics; Exome; Heart Disease Risk Factors; Humans; Phenotype; Risk Factors
ABSTRACT
BACKGROUND: While large genome-wide association studies have identified nearly one thousand loci associated with variation in blood pressure, rare variant identification is still a challenge. In family-based cohorts, genome-wide linkage scans have been successful in identifying rare genetic variants for blood pressure. This study aims to identify low frequency and rare genetic variants within previously reported linkage regions on chromosomes 1 and 19 in African American families from the Trans-Omics for Precision Medicine (TOPMed) program. Genetic association analyses weighted by linkage evidence were completed with whole genome sequencing data within and across TOPMed ancestral groups consisting of 60,388 individuals of European, African, East Asian, Hispanic, and Samoan ancestries. RESULTS: Associations of low frequency and rare variants in RCN3 and multiple other genes were observed for blood pressure traits in TOPMed samples. The association of low frequency and rare coding variants in RCN3 was further replicated in UK Biobank samples (N = 403,522), and reached genome-wide significance for diastolic blood pressure (p = 2.01 × 10⁻⁷). CONCLUSIONS: Low frequency and rare variants in RCN3 contribute to blood pressure variation. This study demonstrates that focusing association analyses in linkage regions greatly reduces multiple-testing burden and improves power to identify novel rare variants associated with blood pressure traits.
Subject(s)
Genome-Wide Association Study; Precision Medicine; Blood Pressure/genetics; Genetic Linkage; Genetic Predisposition to Disease; Humans; Polymorphism, Single Nucleotide; Whole Genome Sequencing
ABSTRACT
Hemoglobin A1c (HbA1c) is widely used to diagnose diabetes and assess glycemic control in individuals with diabetes. However, nonglycemic determinants, including genetic variation, may influence how accurately HbA1c reflects underlying glycemia. Analyzing the NHLBI Trans-Omics for Precision Medicine (TOPMed) sequence data in 10,338 individuals from five studies and four ancestries (6,158 Europeans, 3,123 African-Americans, 650 Hispanics, and 407 East Asians), we confirmed five regions associated with HbA1c (GCK in Europeans and African-Americans, HK1 in Europeans and Hispanics, FN3K and/or FN3KRP in Europeans, and G6PD in African-Americans and Hispanics) and we identified an African-ancestry-specific low-frequency variant (rs1039215 in HBG2 and HBE1, minor allele frequency (MAF) = 0.03). The most associated G6PD variant (rs1050828-T, p.Val98Met, MAF = 12% in African-Americans, MAF = 2% in Hispanics) lowered HbA1c (-0.88% in hemizygous males, -0.34% in heterozygous females) and explained 23% of HbA1c variance in African-Americans and 4% in Hispanics. Additionally, we identified a rare distinct G6PD coding variant (rs76723693, p.Leu353Pro, MAF = 0.5%; -0.98% in hemizygous males, -0.46% in heterozygous females) and detected significant association with HbA1c when aggregating rare missense variants in G6PD. We observed similar magnitude and direction of effects for rs1039215 (HBG2) and rs76723693 (G6PD) in the two largest TOPMed African American cohorts, and we replicated the rs76723693 association in the UK Biobank African-ancestry participants. These variants in G6PD and HBG2 were monomorphic in the European and Asian samples. African or Hispanic ancestry individuals carrying G6PD variants may be underdiagnosed for diabetes when screened with HbA1c. Thus, assessment of these variants should be considered for incorporation into precision medicine approaches for diabetes diagnosis.
Subject(s)
Diabetes Mellitus/diagnosis; Diabetes Mellitus/genetics; Genetic Variation; Glycated Hemoglobin/genetics; Population Groups/genetics; Precision Medicine; Cohort Studies; Female; Humans; Male; Polymorphism, Single Nucleotide
ABSTRACT
With advances in whole-genome sequencing (WGS) technology, more advanced statistical methods for testing genetic association with rare variants are being developed. Methods in which variants are grouped for analysis are also known as variant-set, gene-based, and aggregate unit tests. The burden test and sequence kernel association test (SKAT) are two widely used variant-set tests, which were originally developed for samples of unrelated individuals and later have been extended to family data with known pedigree structures. However, computationally efficient and powerful variant-set tests are needed to make analyses tractable in large-scale WGS studies with complex study samples. In this paper, we propose the variant-set mixed model association tests (SMMAT) for continuous and binary traits using the generalized linear mixed model framework. These tests can be applied to large-scale WGS studies involving samples with population structure and relatedness, such as in the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program. SMMATs share the same null model for different variant sets, and a virtue of this null model, which includes covariates only, is that it needs to be fit only once for all tests in each genome-wide analysis. Simulation studies show that all the proposed SMMATs correctly control type I error rates for both continuous and binary traits in the presence of population structure and relatedness. We also illustrate our tests in a real data example of analysis of plasma fibrinogen levels in the TOPMed program (n = 23,763), using the Analysis Commons, a cloud-based computing platform.
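The burden test mentioned above, in its original form for unrelated individuals, can be illustrated with a minimal sketch. This is not SMMAT: it ignores relatedness, population structure, covariates, and the shared mixed-model null fit that the abstract describes. The function names, the default of equal weights, and the use of a simple regression slope as the summary are all assumptions made for illustration.

```python
def burden_scores(genotypes, weights=None):
    """Collapse each individual's allele counts over a variant set into one score.

    genotypes: one row per individual, one column per variant (0/1/2 counts)
    weights:   optional per-variant weights; equal weights are assumed if omitted
    """
    n_variants = len(genotypes[0])
    if weights is None:
        weights = [1.0] * n_variants
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def burden_slope(scores, trait):
    """Slope from a simple linear regression of the trait on the burden score."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(trait) / n
    sxx = sum((s - mean_s) ** 2 for s in scores)
    sxy = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, trait))
    return sxy / sxx
```

Collapsing the set into a single score is what makes the burden test powerful when most variants act in the same direction; kernel tests such as SKAT relax that assumption.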
Subject(s)
Genetic Association Studies; Models, Genetic; Whole Genome Sequencing; Chromosomes, Human, Pair 4/genetics; Cloud Computing; Female; Fibrinogen/analysis; Fibrinogen/genetics; Genetics, Population; Humans; Male; National Heart, Lung, and Blood Institute (U.S.); Precision Medicine; Research Design; Time Factors; United States
ABSTRACT
Most genome-wide association and fine-mapping studies to date have been conducted in individuals of European descent, and genetic studies of populations of Hispanic/Latino and African ancestry are limited. In addition, these populations have more complex linkage disequilibrium structure. In order to better define the genetic architecture of these understudied populations, we leveraged >100,000 phased sequences available from deep-coverage whole genome sequencing through the multi-ethnic NHLBI Trans-Omics for Precision Medicine (TOPMed) program to impute genotypes into admixed African and Hispanic/Latino samples with genome-wide genotyping array data. We demonstrated that using TOPMed sequencing data as the imputation reference panel improves genotype imputation quality in these populations, which subsequently enhanced gene-mapping power for complex traits. For rare variants with minor allele frequency (MAF) < 0.5%, we observed a 2.3- to 6.1-fold increase in the number of well-imputed variants, with 11-34% improvement in average imputation quality, compared to the state-of-the-art 1000 Genomes Project Phase 3 and Haplotype Reference Consortium reference panels. Impressively, even for extremely rare variants with minor allele count <10 (including singletons) in the imputation target samples, average information content rescued was >86%. Subsequent association analyses of TOPMed reference panel-imputed genotype data with hematological traits (hemoglobin (HGB), hematocrit (HCT), and white blood cell count (WBC)) in ~21,600 African-ancestry and ~21,700 Hispanic/Latino individuals identified associations with two rare variants in the HBB gene (rs33930165 with higher WBC [p = 8.8 × 10⁻¹⁵] in African populations, rs11549407 with lower HGB [p = 1.5 × 10⁻¹²] and HCT [p = 8.8 × 10⁻¹⁰] in Hispanics/Latinos). By comparison, neither variant would have been genome-wide significant if either 1000 Genomes Project Phase 3 or Haplotype Reference Consortium reference panels had been used for imputation. Our findings highlight the utility of the TOPMed imputation reference panel for identification of novel rare variant associations not previously detected in similarly sized genome-wide studies of under-represented African and Hispanic/Latino populations.
Subject(s)
Black or African American/genetics; Hispanic or Latino/genetics; Precision Medicine/methods; Whole Genome Sequencing/methods; beta-Globins/genetics; Adult; Aged; Aged, 80 and over; Computational Biology/methods; Databases, Genetic; Female; Gene Frequency; Genetic Predisposition to Disease; Genetics, Population; Genome-Wide Association Study; Genotyping Techniques; Humans; Linkage Disequilibrium; Male; Middle Aged; United States
ABSTRACT
Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948-2012) from up to 17 studies per phenotype. Here we discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.
Subject(s)
Genetic Association Studies/methods; Phenomics/methods; Precision Medicine/methods; Data Aggregation; Humans; Information Dissemination; National Heart, Lung, and Blood Institute (U.S.); Phenotype; Program Evaluation; United States
ABSTRACT
A correction to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
The Alzheimer's Disease Sequencing Project (ADSP) undertook whole exome sequencing in 5,740 late-onset Alzheimer disease (AD) cases and 5,096 cognitively normal controls primarily of European ancestry (EA), among whom 218 cases and 177 controls were Caribbean Hispanic (CH). An age-, sex- and APOE based risk score and family history were used to select cases most likely to harbor novel AD risk variants and controls least likely to develop AD by age 85 years. We tested ~1.5 million single nucleotide variants (SNVs) and 50,000 insertion-deletion polymorphisms (indels) for association to AD, using multiple models considering individual variants as well as gene-based tests aggregating rare, predicted functional, and loss of function variants. Sixteen single variants and 19 genes that met criteria for significant or suggestive associations after multiple-testing correction were evaluated for replication in four independent samples; three with whole exome sequencing (2,778 cases, 7,262 controls) and one with genome-wide genotyping imputed to the Haplotype Reference Consortium panel (9,343 cases, 11,527 controls). The top findings in the discovery sample were also followed-up in the ADSP whole-genome sequenced family-based dataset (197 members of 42 EA families and 501 members of 157 CH families). We identified novel and predicted functional genetic variants in genes previously associated with AD. We also detected associations in three novel genes: IGHG3 (p = 9.8 × 10⁻⁷), an immunoglobulin gene whose antibodies interact with β-amyloid, a long non-coding RNA AC099552.4 (p = 1.2 × 10⁻⁷), and a zinc-finger protein ZNF655 (gene-based p = 5.0 × 10⁻⁶). The latter two suggest an important role for transcriptional regulation in AD pathogenesis.
Subject(s)
Alzheimer Disease/genetics; Alzheimer Disease/immunology; Exome Sequencing; Gene Expression Regulation/genetics; Immunity/genetics; Transcription, Genetic/genetics; Aged; Aged, 80 and over; Alzheimer Disease/pathology; Amyloid beta-Peptides/immunology; Apolipoproteins E/genetics; Female; Haplotypes/genetics; Humans; Immunoglobulin G; Kruppel-Like Transcription Factors/genetics; Male; Polymorphism, Genetic/genetics; RNA, Long Noncoding/genetics
ABSTRACT
BACKGROUND: Statistical methods for modeling longitudinal and time-to-event data have received much attention in medical research and are becoming increasingly useful. In clinical studies, such as those of cancer and AIDS, longitudinal biomarkers are used to monitor disease progression and to predict survival. These longitudinal measures are often missing at failure times and may be prone to measurement errors. More importantly, time-dependent survival models that include the raw longitudinal measurements may lead to biased results. In previous studies these two types of data are frequently analyzed separately, where a mixed effects model is used for the longitudinal data and a survival model is applied to the event outcome. METHODS: In this paper we compare joint maximum likelihood methods, a two-step approach and a time dependent covariate method that link longitudinal data to survival data with emphasis on using longitudinal measures to predict survival. We apply a Bayesian semi-parametric joint method and maximum likelihood joint method that maximizes the joint likelihood of the time-to-event and longitudinal measures. We also implement the Two-Step approach, which estimates random effects separately, and a classic Time Dependent Covariate Model. We use simulation studies to assess bias, accuracy, and coverage probabilities for the estimates of the link parameter that connects the longitudinal measures to survival times. RESULTS: Simulation results demonstrate that the Two-Step approach performed best at estimating the link parameter when variability in the longitudinal measure is low but is somewhat biased downwards when the variability is high. Bayesian semi-parametric and maximum likelihood joint methods yield higher link parameter estimates with low and high variability in the longitudinal measure. The Time Dependent Covariate method resulted in consistent underestimation of the link parameter. We illustrate these methods using data from the Framingham Heart Study in which lipid measurements and Myocardial Infarction data were collected over a period of 26 years. CONCLUSIONS: Traditional methods for modeling longitudinal and survival data, such as the time dependent covariate method, that use the observed longitudinal data, tend to provide downwardly biased estimates. The two-step approach and joint models provide better estimates, although a comparison of these methods may depend on the underlying residual variance.
Subject(s)
Models, Statistical; Bayes Theorem; Bias; Computer Simulation; Humans; Longitudinal Studies; Survival Analysis
ABSTRACT
Human GWAS of obesity have been successful in identifying loci associated with adiposity, but for the most part, these are non-coding SNPs whose function, or even whose gene of action, is unknown. To help identify the genes on which these human BMI loci may be operating, we conducted a high throughput screen in Drosophila melanogaster. Starting with 78 BMI loci from two recently published GWAS meta-analyses, we identified fly orthologs of all nearby genes (± 250 kb). We crossed RNAi knockdown lines of each gene with flies containing tissue-specific drivers to knock down (KD) the expression of the genes only in the brain and the fat body. We then raised the flies on a control diet and compared the amount of fat/triglyceride in the tissue-specific KD group with that in the driver-only control flies. 16 of the 78 BMI GWAS loci could not be screened with this approach, as no gene in the 500-kb region had a fly ortholog. Of the remaining 62 GWAS loci testable in the fly, we found a significant fat phenotype in the KD flies for at least one gene for 26 loci (42%) even after correcting for multiple comparisons. By contrast, the rate of significant fat phenotypes in RNAi KD found in a recent genome-wide Drosophila screen (Pospisilik et al., 2010) is ~5%. More interestingly, for 10 of the 26 positive regions, we found that the nearest gene was not the one that showed a significant phenotype in the fly. Specifically, our screen suggests that for the 10 human BMI SNPs rs11057405, rs205262, rs9925964, rs9914578, rs2287019, rs11688816, rs13107325, rs7164727, rs17724992, and rs299412, the functional genes may NOT be the nearest ones (CLIP1, C6orf106, KAT8, SMG6, QPCTL, EHBP1, SLC39A8, ADPGK/ADPGK-AS1, PGPEP1, KCTD15, respectively), but instead, the specific nearby cis genes are the functional target (namely: ZCCHC8, VPS33A, RSRC2; SPDEF, NUDT3; PAGR1; SETD1, VKORC1; SGSM2, SRR; VASP, SIX5; OTX1; BANK1; ARIH1; ELL; CHST8, respectively). The study also suggests further functional experiments to elucidate mechanism of action for genes evolutionarily conserved for fat storage.
Subject(s)
Body Mass Index; Crosses, Genetic; Drosophila melanogaster/genetics; Genome-Wide Association Study; Obesity/genetics; RNA Interference; Adipose Tissue; Animals; Humans; Mice; Polymorphism, Single Nucleotide; Quantitative Trait Loci
ABSTRACT
When testing genotype-phenotype associations using linear regression, departure of the trait distribution from normality can impact both Type I error rate control and statistical power, with worse consequences for rarer variants. Because genotypes are expected to have small effects (if any), investigators now routinely use a two-stage method, in which they first regress the trait on covariates, obtain residuals, rank-normalize them, and then use the rank-normalized residuals in association analysis with the genotypes. Potential confounding signals are assumed to be removed at the first stage, so in practice, no further adjustment is done in the second stage. Here, we show that this widely used approach can lead to tests with undesirable statistical properties, due to the combination of a mis-specified mean-variance relationship and remaining covariate associations between the rank-normalized residuals and genotypes. We demonstrate these properties theoretically, and also in applications to genome-wide and whole-genome sequencing association studies. We further propose and evaluate an alternative fully adjusted two-stage approach that adjusts for covariates both when residuals are obtained and in the subsequent association test. This method can reduce excess Type I errors and improve statistical power.
Subject(s)
Genetic Association Studies , Models, Genetic , Computer Simulation , Genome-Wide Association Study , Genotype , Hemoglobins/metabolism , Hispanic or Latino , Humans , Linear Models , Phenotype
ABSTRACT
It remains unclear whether the increased risk of new-onset type 2 diabetes (T2D) seen in statin users is due to low LDL-C concentrations, or due to the statin-induced proportional change in LDL-C. In addition, genetic instruments have not been proposed before to examine whether liability to T2D might cause greater proportional statin-induced LDL-C lowering. Using summary-level statistics from the Genomic Investigation of Statin Therapy (GIST, nmax = 40,914) and DIAGRAM (nmax = 159,208) consortia, we found a positive genetic correlation between LDL-C statin response and T2D using LD score regression (rgenetic = 0.36, s.e. = 0.13). However, Mendelian randomization analyses did not provide support for statin response having a causal effect on T2D risk (OR 1.00 (95% CI: 0.97, 1.03) per 10% increase in statin response), nor for liability to T2D having a causal effect on statin-induced LDL-C response (0.20% increase in response (95% CI: -0.40, 0.80) per doubling of odds of liability to T2D). Although we found no evidence to suggest that proportional statin response influences T2D risk, a definitive assessment should be made in populations composed exclusively of statin users, as the presence of nonstatin users in the DIAGRAM dataset may have substantially diluted our effect estimate.
Subject(s)
Cholesterol, LDL/genetics , Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Mendelian Randomization Analysis/methods , Cholesterol, LDL/blood , Cholesterol, LDL/drug effects , Diabetes Mellitus, Type 2/blood , Diabetes Mellitus, Type 2/chemically induced , Female , Genome-Wide Association Study/methods , Humans , Male
ABSTRACT
Phenotypic variance heterogeneity across genotypes at a single nucleotide polymorphism (SNP) may reflect underlying gene-environment (G×E) or gene-gene interactions. We modeled variance heterogeneity for blood lipids and BMI in up to 44,211 participants and investigated relationships between variance effects (Pv), G×E interaction effects (with smoking and physical activity), and marginal genetic effects (Pm). Correlations between Pv and Pm were stronger for SNPs with established marginal effects (Spearman's ρ = 0.401 for triglycerides, and ρ = 0.236 for BMI) compared to all SNPs. When Pv and Pm were compared for all pruned SNPs, only the correlation for BMI was statistically significant (Spearman's ρ = 0.010). Overall, SNPs with established marginal effects were overrepresented in the nominally significant part of the Pv distribution (Pbinomial < 0.05). SNPs from the top 1% of the Pm distribution for BMI had more significant Pv values (PMann-Whitney = 1.46×10⁻⁵), and the odds ratio of SNPs with nominally significant (<0.05) Pm and Pv was 1.33 (95% CI: 1.12, 1.57) for BMI. Moreover, BMI SNPs with nominally significant G×E interaction P-values (Pint < 0.05) were enriched with nominally significant Pv values (Pbinomial = 8.63×10⁻⁹ and 8.52×10⁻⁷ for SNP × smoking and SNP × physical activity, respectively). We conclude that some loci with strong marginal effects may be good candidates for G×E, and variance-based prioritization can be used to identify them.
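To make the idea of variance heterogeneity across genotypes concrete, the sketch below computes a Brown-Forsythe-type statistic over the three genotype groups at one SNP. The abstract does not name the variance test that produced Pv, so this choice is an assumption made purely for illustration; the function name and the toy phenotype values are hypothetical as well, and no P-value is derived because the F distribution is not in the Python standard library.

```python
from statistics import mean, median

def brown_forsythe_statistic(groups):
    """F-type statistic on absolute deviations from each group's median.

    groups: one list of phenotype values per genotype class (e.g. 0, 1, 2 copies).
    Larger values indicate more evidence that phenotypic variance differs by genotype.
    """
    # Absolute deviation of every observation from its own group's median
    z = []
    for g in groups:
        m = median(g)
        z.append([abs(x - m) for x in g])
    n = sum(len(g) for g in z)
    k = len(z)
    group_means = [mean(g) for g in z]
    grand_mean = sum(sum(g) for g in z) / n
    # One-way ANOVA on the deviations: between-group vs within-group mean squares
    between = sum(len(g) * (gm - grand_mean) ** 2
                  for g, gm in zip(z, group_means)) / (k - 1)
    within = sum((x - gm) ** 2
                 for g, gm in zip(z, group_means) for x in g) / (n - k)
    return between / within

# Toy phenotype values (hypothetical): same spread in every genotype group
same_spread = brown_forsythe_statistic([[1, 2, 3], [11, 12, 13], [21, 22, 23]])
# Toy phenotype values (hypothetical): spread increases with allele count
growing_spread = brown_forsythe_statistic([[1, 2, 3], [0, 10, 20], [-50, 0, 50]])
```

Because the statistic works on deviations from each group's own median, a shift in the mean between genotype groups (a marginal effect) does not by itself inflate it; only differences in spread do.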
Subject(s)
Cholesterol, HDL/genetics , Cholesterol, LDL/genetics , Gene-Environment Interaction , Obesity/genetics , Body Mass Index , Cholesterol, HDL/blood , Cholesterol, LDL/blood , Female , Genetic Heterogeneity , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Male , Obesity/blood , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics , Risk Factors , Smoking/genetics , White People/genetics
ABSTRACT
The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant, while 10.3% and 28.3% were exclusive to Atlas or GATK, respectively, and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
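The per-genotype logic implied by the concordance figures above can be sketched as follows. This is a simplified illustration, not the ADSP protocol itself: it assumes that genotypes called by only one pipeline are retained (which the abstract implies but does not state outright), represents a missing or QC-failed call as `None`, and uses hypothetical function names and genotype strings.

```python
from collections import Counter

def consensus_genotype(gatk, atlas):
    """Combine one sample's genotype at one variant from two pipelines.

    gatk, atlas: genotype strings such as "0/1", or None when that pipeline
    has no call passing QC. Returns (consensus genotype or None, category).
    """
    if gatk is None and atlas is None:
        return None, "missing"
    if gatk is None:
        return atlas, "atlas_only"   # exclusive to Atlas: kept (assumption)
    if atlas is None:
        return gatk, "gatk_only"     # exclusive to GATK: kept (assumption)
    if gatk == atlas:
        return gatk, "concordant"
    return None, "discordant"        # pipelines disagree: excluded

def summarize(pairs):
    """Count how many (gatk, atlas) genotype pairs fall in each category."""
    return Counter(consensus_genotype(g, a)[1] for g, a in pairs)

# Hypothetical calls for four samples at a single variant
calls = [("0/1", "0/1"), ("0/1", "1/1"), (None, "0/0"), ("1/1", None)]
counts = summarize(calls)
```

Dividing each count in `counts` by the total number of genotype pairs gives category proportions of the kind reported in the abstract (concordant, exclusive to one pipeline, discordant).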