ABSTRACT
BACKGROUND: Dyslipoproteinemia often involves simultaneous derangements of multiple lipid traits. We aimed to evaluate the phenotypic and genetic characteristics of combined lipid disturbances in a general population-based cohort. METHODS: Among UK Biobank participants without prevalent coronary artery disease, we used blood lipid and apolipoprotein B concentrations to ascribe individuals into 1 of 6 reproducible and mutually exclusive dyslipoproteinemia subtypes. Incident coronary artery disease risk was estimated for each subtype using Cox proportional hazards models. Phenome-wide analyses and genome-wide association studies were performed for each subtype, followed by in silico causal gene prioritization and heritability analyses. Additionally, the prevalence of disruptive variants in causal genes for Mendelian lipid disorders was assessed using whole-exome sequence data. RESULTS: Among 450 636 UK Biobank participants: 63 (0.01%) had chylomicronemia; 40 005 (8.9%) had hypercholesterolemia; 94 785 (21.0%) had combined hyperlipidemia; 13 998 (3.1%) had remnant hypercholesterolemia; 110 389 (24.5%) had hypertriglyceridemia; and 49 (0.01%) had mixed hypertriglyceridemia and hypercholesterolemia. Over a median (interquartile range) follow-up of 11.1 (10.4-11.8) years, incident coronary artery disease risk varied across subtypes, with combined hyperlipidemia exhibiting the largest hazard (hazard ratio, 1.92 [95% CI, 1.84-2.01]; P=2×10(-16)), even when accounting for non-HDL-C (hazard ratio, 1.45 [95% CI, 1.30-1.60]; P=2.6×10(-12)). Genome-wide association studies revealed 250 loci significantly associated with dyslipoproteinemia subtypes, of which 72 (28.8%) were not detected in prior single lipid trait genome-wide association studies. Mendelian lipid variant carriers were rare (2.0%) among individuals with dyslipoproteinemia, but polygenic heritability was high, ranging from 23% for remnant hypercholesterolemia to 54% for combined hyperlipidemia.
CONCLUSIONS: Simultaneous assessment of multiple lipid derangements revealed nuanced differences in coronary artery disease risk and genetic architectures across dyslipoproteinemia subtypes. These findings highlight the importance of looking beyond single lipid traits to better understand combined lipid and lipoprotein phenotypes and implications for disease risk.
Subject(s)
Coronary Artery Disease , Dyslipidemias , Genome-Wide Association Study , Humans , Female , Male , Middle Aged , Coronary Artery Disease/genetics , Coronary Artery Disease/blood , Coronary Artery Disease/epidemiology , Dyslipidemias/genetics , Dyslipidemias/blood , Dyslipidemias/epidemiology , Dyslipidemias/diagnosis , Aged , Lipids/blood , Adult , United Kingdom/epidemiology , Apolipoprotein B-100/genetics , Apolipoprotein B-100/blood , Phenotype , Genetic Predisposition to Disease
ABSTRACT
A challenge in standard genetic studies is maintaining good power to detect associations, especially for low-prevalence diseases and rare variants. The traditional methods are most powerful when evaluating the association between variants in balanced study designs. Without accounting for family correlation and unbalanced case-control ratios, these analyses could result in inflated type I error. One cost-effective solution to increase statistical power is to exploit available family history (FH), which contains valuable information about disease heritability. Here, we develop methods to address the aforementioned type I error issues while providing optimal power to analyze aggregates of rare variants by incorporating additional information from FH. With the enhanced power of these methods, which exploit FH and account for relatedness and unbalanced designs, we successfully detect genes with suggestive associations with Alzheimer disease, dementia, and type 2 diabetes by using the exome chip data from the Framingham Heart Study.
Subject(s)
Diabetes Mellitus, Type 2 , Case-Control Studies , Diabetes Mellitus, Type 2/genetics , Exome , Genetic Variation/genetics , Humans , Longitudinal Studies , Models, Genetic , Exome Sequencing
ABSTRACT
BACKGROUND: Arterial and venous cardiovascular conditions, such as coronary artery disease (CAD), peripheral artery disease (PAD), and venous thromboembolism (VTE), are genetically correlated. Interrogating underlying mechanisms may shed light on disease mechanisms. In this study, we aimed to identify (1) epidemiological and (2) causal, genetic relationships between metabolites and CAD, PAD, and VTE. METHODS: We used metabolomic data from 95 402 individuals in the UK Biobank, excluding individuals with prevalent cardiovascular disease. Cox proportional-hazards models estimated the associations of 249 metabolites with incident disease. Bidirectional 2-sample Mendelian randomization (MR) estimated the causal effects between metabolites and outcomes using genome-wide association summary statistics for metabolites (n=118 466 from the UK Biobank), CAD (n=184 305 from CARDIoGRAMplusC4D 2015), PAD (n=243 060 from the Million Veteran Program), and VTE (n=650 119 from the Million Veteran Program). Multivariable MR was performed in subsequent analyses. RESULTS: We found that 196, 115, and 74 metabolites were associated (P<0.001) with CAD, PAD, and VTE, respectively. Further interrogation of these metabolites with MR revealed 94, 34, and 9 metabolites with potentially causal effects on CAD, PAD, and VTE, respectively. There were 21 metabolites common to CAD and PAD and 4 common to PAD and VTE. Many putatively causal metabolites included lipoprotein traits with heterogeneity across different sizes and lipid subfractions. Small VLDL (very-low-density lipoprotein) particles increased the risk for CAD while large VLDL particles decreased the risk for VTE. We identified opposing directions of CAD and PAD effects for cholesterol and triglyceride concentrations within HDLs (high-density lipoproteins). Subsequent sensitivity analyses including multivariable MR revealed several metabolites with robust, potentially causal effects of VLDL particles on CAD.
CONCLUSIONS: While common vascular conditions are associated with overlapping metabolomic profiles, MR prioritized the role of specific lipoprotein species for potential pharmacological targets to maximize benefits in both arterial and venous beds.
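As an illustration of how a 2-sample MR estimate can be formed from summary statistics, the sketch below implements the fixed-effect inverse-variance weighted (IVW) estimator. The abstract does not state which MR estimator was used, so IVW is an assumption here, and the function name `ivw_estimate` and its inputs are hypothetical.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-variant summary statistics.

    beta_exposure: variant effects on the exposure (e.g. a metabolite)
    beta_outcome:  variant effects on the outcome (e.g. CAD)
    se_outcome:    standard errors of the outcome effects
    """
    # First-order weights: inverse variance of each per-variant ratio.
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    # Per-variant Wald ratios (outcome effect per unit of exposure effect).
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = math.sqrt(1.0 / total)
    return estimate, std_error
```

The input values would come from the exposure and outcome GWAS; sensitivity analyses such as multivariable MR are outside the scope of this sketch.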
Subject(s)
Coronary Artery Disease , Mendelian Randomization Analysis , Metabolomics , Peripheral Arterial Disease , Venous Thromboembolism , Humans , Coronary Artery Disease/epidemiology , Coronary Artery Disease/blood , Coronary Artery Disease/genetics , Coronary Artery Disease/diagnosis , Peripheral Arterial Disease/epidemiology , Peripheral Arterial Disease/diagnosis , Peripheral Arterial Disease/blood , Peripheral Arterial Disease/genetics , Venous Thromboembolism/blood , Venous Thromboembolism/epidemiology , Venous Thromboembolism/diagnosis , Venous Thromboembolism/genetics , Male , Female , Middle Aged , Risk Factors , Aged , Risk Assessment , Genome-Wide Association Study , United Kingdom/epidemiology
ABSTRACT
BACKGROUND: Calcific aortic stenosis (CAS) is the most common valvular heart disease in older adults and has no effective preventive therapies. Genome-wide association studies (GWAS) can identify genes influencing disease and may help prioritize therapeutic targets for CAS. METHODS: We performed a GWAS and gene association study of 14 451 patients with CAS and 398 544 controls in the Million Veteran Program. Replication was performed in the Million Veteran Program, Penn Medicine Biobank, Mass General Brigham Biobank, BioVU, and BioMe, totaling 12 889 cases and 348 094 controls. Causal genes were prioritized from genome-wide significant variants using polygenic priority score gene localization, expression quantitative trait locus colocalization, and nearest gene methods. CAS genetic architecture was compared with that of atherosclerotic cardiovascular disease. Causal inference for cardiometabolic biomarkers in CAS was performed using Mendelian randomization and genome-wide significant loci were characterized further through phenome-wide association study. RESULTS: We identified 23 genome-wide significant lead variants in our GWAS representing 17 unique genomic regions. Of the 23 lead variants, 14 were significant in replication, representing 11 unique genomic regions. Five replicated genomic regions were previously known risk loci for CAS (PALMD, TEX41, IL6, LPA, FADS) and 6 were novel (CEP85L, FTO, SLMAP, CELSR2, MECOM, CDAN1). Two novel lead variants were associated in non-White individuals (P<0.05): rs12740374 (CELSR2) in Black and Hispanic individuals and rs1522387 (SLMAP) in Black individuals. Of the 14 replicated lead variants, only 2 (rs10455872 [LPA], rs12740374 [CELSR2]) were also significant in atherosclerotic cardiovascular disease GWAS. 
In Mendelian randomization, lipoprotein(a) and low-density lipoprotein cholesterol were both associated with CAS, but the association between low-density lipoprotein cholesterol and CAS was attenuated when adjusting for lipoprotein(a). Phenome-wide association study highlighted varying degrees of pleiotropy, including between CAS and obesity at the FTO locus. However, the FTO locus remained associated with CAS after adjusting for body mass index and maintained a significant independent effect on CAS in mediation analysis. CONCLUSIONS: We performed a multiancestry GWAS in CAS and identified 6 novel genomic regions in the disease. Secondary analyses highlighted the roles of lipid metabolism, inflammation, cellular senescence, and adiposity in the pathobiology of CAS and clarified the shared and differential genetic architectures of CAS with atherosclerotic cardiovascular diseases.
Subject(s)
Aortic Valve Stenosis , Veterans , Humans , Aged , Genome-Wide Association Study/methods , Genetic Predisposition to Disease , Aortic Valve Stenosis/genetics , Obesity/genetics , Transcription Factors/genetics , Lipoprotein(a)/genetics , Lipoproteins, LDL , Cholesterol , Polymorphism, Single Nucleotide , Glycoproteins/genetics , Nuclear Proteins/genetics
ABSTRACT
BACKGROUND: Individuals who have experienced a stroke, or transient ischemic attack, face a heightened risk of future cardiovascular events. Identification of genetic and molecular risk factors for subsequent cardiovascular outcomes may identify effective therapeutic targets to improve prognosis after an incident stroke. METHODS: We performed genome-wide association studies for subsequent major adverse cardiovascular events (MACE; n cases=51 929; n controls=39 980) and subsequent arterial ischemic stroke (AIS; n cases=45 120; n controls=46 789) after the first incident stroke within the Million Veteran Program and UK Biobank. We then used genetic variants associated with proteins (protein quantitative trait loci) to determine the effect of 1463 plasma protein abundances on subsequent MACE using Mendelian randomization. RESULTS: Two variants were significantly associated with subsequent cardiovascular events: rs76472767 near gene RNF220 (odds ratio, 0.75 [95% CI, 0.64-0.85]; P=3.69×10(-8)) with subsequent AIS and rs13294166 near gene LINC01492 (odds ratio, 1.52 [95% CI, 1.37-1.67]; P=3.77×10(-8)) with subsequent MACE. Using Mendelian randomization, we identified 2 proteins with an effect on subsequent MACE after a stroke: CCL27 ([C-C motif chemokine 27], effect odds ratio, 0.77 [95% CI, 0.66-0.88]; adjusted P=0.05) and TNFRSF14 ([tumor necrosis factor receptor superfamily member 14], effect odds ratio, 1.42 [95% CI, 1.24-1.60]; adjusted P=0.006). These proteins are not associated with incident AIS and are implicated in inflammation. CONCLUSIONS: We found evidence that 2 proteins with little effect on incident stroke appear to influence subsequent MACE after incident AIS. These associations suggest that inflammation is a contributing factor to subsequent MACE outcomes after incident AIS and highlight potential novel targets.
Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Mendelian Randomization Analysis , Stroke , Veterans , Humans , Male , Stroke/genetics , Stroke/epidemiology , Female , United Kingdom/epidemiology , Middle Aged , Aged , Disease Progression , Polymorphism, Single Nucleotide/genetics , Ischemic Stroke/genetics , Ischemic Stroke/epidemiology , Risk Factors , Quantitative Trait Loci , UK Biobank
ABSTRACT
INTRODUCTION: Genome-wide association studies (GWAS) have identified loci associated with Alzheimer's disease (AD) but did not identify specific causal genes or variants within those loci. Analysis of whole genome sequence (WGS) data, which interrogates the entire genome and captures rare variations, may identify causal variants within GWAS loci. METHODS: We performed single common variant association analysis and rare variant aggregate analyses in the pooled population (N cases = 2184, N controls = 2383) and targeted analyses in subpopulations using WGS data from the Alzheimer's Disease Sequencing Project (ADSP). The analyses were restricted to variants within 100 kb of 83 previously identified GWAS lead variants. RESULTS: Seventeen variants were significantly associated with AD within five genomic regions implicating the genes OARD1/NFYA/TREML1, JAZF1, FERMT2, and SLC24A4. KAT8 was implicated by both single variant and rare variant aggregate analyses. DISCUSSION: This study demonstrates the utility of leveraging WGS to gain insights into AD loci identified via GWAS.
Subject(s)
Alzheimer Disease , Genome-Wide Association Study , Whole Genome Sequencing , Humans , Alzheimer Disease/genetics , Female , Male , Genetic Predisposition to Disease/genetics , Aged , Polymorphism, Single Nucleotide/genetics , Genetic Variation/genetics
ABSTRACT
INTRODUCTION: Alzheimer's disease (AD) is a common disorder of the elderly that is both highly heritable and genetically heterogeneous. METHODS: We investigated the association of AD with both common variants and aggregates of rare coding and non-coding variants in 13,371 individuals of diverse ancestry with whole genome sequencing (WGS) data. RESULTS: Pooled-population analyses of all individuals identified genetic variants at apolipoprotein E (APOE) and BIN1 associated with AD (p < 5 × 10(-8)). Subgroup-specific analyses identified a haplotype on chromosome 14 including PSEN1 associated with AD in Hispanics, further supported by aggregate testing of rare coding and non-coding variants in the region. Common variants in LINC00320 were associated with AD in Black individuals (p = 1.9 × 10(-9)). Finally, we observed rare non-coding variants in the promoter of TOMM40, distinct from APOE, in pooled-population analyses (p = 7.2 × 10(-8)). DISCUSSION: We observed that complementary pooled-population and subgroup-specific analyses offered unique insights into the genetic architecture of AD. HIGHLIGHTS: We determine the association of genetic variants with Alzheimer's disease (AD) using 13,371 individuals of diverse ancestry with whole genome sequencing (WGS) data. We identified genetic variants at apolipoprotein E (APOE), BIN1, PSEN1, and LINC00320 associated with AD. We observed rare non-coding variants in the promoter of TOMM40 distinct from APOE.
ABSTRACT
BACKGROUND: Prospective cohort studies have found a relation between sugar-sweetened beverage (SSB) consumption (sodas and fruit drinks) and dyslipidemia. There is limited evidence linking SSB consumption to emerging features of dyslipidemia, which can be characterized by variation in lipoprotein particle size, remnant-like particle (RLP), and apolipoprotein concentrations. OBJECTIVES: To examine the association between SSB consumption and plasma lipoprotein cholesterol, apolipoprotein, and lipoprotein particle size concentrations among US adults. METHODS: We examined participants from the Framingham Offspring Study (FOS; 1987-1995, n = 3047) and the Women's Health Study (1992, n = 26,218). Concentrations of plasma LDL cholesterol, apolipoprotein B (apoB), HDL cholesterol, apolipoprotein A1 (apoA1), triglyceride (TG), and non-HDL cholesterol, as well as total cholesterol:HDL cholesterol ratio and apoB:apoA1 ratio, were quantified in both cohorts; concentrations of apolipoprotein E, apolipoprotein C3, RLP-TG, and RLP cholesterol (RLP-C) were measured in the FOS only. Lipoprotein particle sizes were calculated from nuclear magnetic resonance signals for lipoprotein particle subclass concentrations (TG-rich lipoprotein particles [TRL-Ps]: very large, large, medium, small, and very small; LDL particles [LDL-Ps]: large, medium, and small; HDL particles [HDL-Ps]: large, medium, and small). SSB consumption was estimated from food frequency questionnaire data. We examined the associations between SSB consumption and all lipoprotein and apoprotein measures in linear regression models, adjusting for confounding factors such as lifestyle, diet, and traditional lipoprotein risk factors. 
RESULTS: SSB consumption was positively associated with LDL cholesterol, apoB, TG, RLP-TG, RLP-C, and non-HDL cholesterol concentrations and total cholesterol:HDL cholesterol and apoB:apoA1 ratios; and negatively associated with HDL cholesterol and apoA1 concentrations (P-trend range: <0.0001 to 0.008). After adjustment for traditional lipoprotein risk factors, SSB consumers had smaller LDL-P and HDL-P sizes; lower concentrations of large LDL-Ps and medium HDL-Ps; and higher concentrations of small LDL-Ps, small HDL-Ps, and large TRL-Ps (P-trend range: <0.0001 to 0.001). CONCLUSIONS: Higher SSB consumption was associated with multiple emerging features of dyslipidemia that have been linked to higher cardiometabolic risk in US adults.
Subject(s)
Dyslipidemias , Sugar-Sweetened Beverages , Adult , Female , Humans , Apolipoproteins , Apolipoproteins B , Cholesterol , Cholesterol, HDL , Cholesterol, LDL , Lipoproteins , Particle Size , Prospective Studies , Triglycerides , Male
ABSTRACT
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Subject(s)
Exome/genetics , Genetic Variation/genetics , DNA Mutational Analysis , Datasets as Topic , Humans , Phenotype , Proteome/genetics , Rare Diseases/genetics , Sample Size
ABSTRACT
Population stratification may cause an inflated type-I error and spurious association when assessing the association between genetic variations and an outcome. Many genetic association studies now use exonic variants, which capture only 1% of the genome; however, population stratification adjustments have not been evaluated in the context of exonic variants. We compare the performance of two established approaches, principal components analysis (PCA) and mixed-effects models, and assess the utility of genome-wide (GW) and exonic variants, by simulation and using a data set from the Framingham Heart Study. Our results illustrate that although the PCs and genetic relationship matrices computed by GW and exonic markers are different, the type-I error rate of association tests for common variants with additive effect appears to be properly controlled in the presence of population stratification. In addition, by considering single nucleotide variants (SNVs) that have different levels of confounding by population stratification, we also compare the power across multiple association approaches to account for population stratification, such as PC-based corrections and mixed-effects models. We find that while these two methods achieve a similar power for SNVs that have a low or medium level of confounding by population stratification, mixed-effects models can reach a higher power for SNVs highly confounded by population stratification.
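A minimal sketch of the PC-based correction mentioned above, assuming a samples-by-variants genotype matrix coded 0/1/2 and a quantitative phenotype. The function names are hypothetical, NumPy is assumed to be available, and this is not the authors' pipeline; it only shows the general idea of computing PCs from standardized genotypes and including them as covariates.

```python
import numpy as np

def top_principal_components(genotypes, n_components=2):
    """Return sample scores on the leading PCs of a standardized genotype matrix."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0  # leave monomorphic variants unscaled
    standardized = centered / scale
    # SVD of the standardized matrix; U * S gives the PC scores per sample.
    u, s, _ = np.linalg.svd(standardized, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def snp_effect_adjusted_for_pcs(phenotype, snp, pcs):
    """Least-squares additive SNP effect with intercept and PCs as covariates."""
    y = np.asarray(phenotype, dtype=float)
    design = np.column_stack([np.ones(len(y)), snp, pcs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient on the SNP column
```

The same functions could be run once with genome-wide markers and once with exonic markers only, to compare how the resulting PCs change the adjusted estimate.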
Subject(s)
Genetic Association Studies/methods , Genetics, Population/methods , Genome-Wide Association Study/methods , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Computer Simulation , Genotype , Humans , Principal Component Analysis
ABSTRACT
Genotype-phenotype association studies often combine phenotype data from multiple studies to increase statistical power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data-set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data-sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other -omics data for more than 80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants (recruited in 1948-2012) from up to 17 studies per phenotype. Here we discuss challenges in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include 1) the software code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify, or extend these harmonizations to additional studies, and 2) the results of labeling thousands of phenotype variables with controlled vocabulary terms.
Subject(s)
Genetic Association Studies/methods , Phenomics/methods , Precision Medicine/methods , Data Aggregation , Humans , Information Dissemination , National Heart, Lung, and Blood Institute (U.S.) , Phenotype , Program Evaluation , United States
ABSTRACT
Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component to risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl(-1). At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.
Subject(s)
Alleles , Apolipoproteins A/genetics , Exome/genetics , Genetic Predisposition to Disease/genetics , Myocardial Infarction/genetics , Receptors, LDL/genetics , Age Factors , Age of Onset , Apolipoprotein A-V , Case-Control Studies , Cholesterol, LDL/blood , Coronary Artery Disease/genetics , Female , Genetics, Population , Heterozygote , Humans , Male , Middle Aged , Mutation/genetics , Myocardial Infarction/blood , National Heart, Lung, and Blood Institute (U.S.) , Triglycerides/blood , United States
ABSTRACT
Genetic similarity of spouses can reflect factors influencing mate choice, such as physical/behavioral characteristics, and patterns of social endogamy. Spouse correlations for both genetic ancestry and measured traits may impact genotype distributions (Hardy Weinberg and linkage equilibrium), and therefore genetic association studies. Here we evaluate white spouse-pairs from the Framingham Heart Study (FHS) original and offspring cohorts (N = 124 and 755, respectively) to explore spousal genetic similarity and its consequences. Two principal components (PCs) of the genome-wide association (GWA) data were identified, with the first (PC1) delineating clines of Northern/Western to Southern European ancestry and the second (PC2) delineating clines of Ashkenazi Jewish ancestry. In the original (older) cohort, there was a striking positive correlation between the spouses in PC1 (r = 0.73, P = 3x10(-22)) and also for PC2 (r = 0.80, P = 7x10(-29)). In the offspring cohort, the spouse correlations were lower but still highly significant for PC1 (r = 0.38, P = 7x10(-28)) and for PC2 (r = 0.45, P = 2x10(-39)). We observed significant Hardy-Weinberg disequilibrium for single nucleotide polymorphisms (SNPs) loading heavily on PC1 and PC2 across 3 generations, and also significant linkage disequilibrium between unlinked SNPs; both decreased with time, consistent with reduced ancestral endogamy over generations and congruent with theoretical calculations. Ignoring ancestry, estimates of spouse kinship have a mean significantly greater than 0, and more so in the earlier generations. Adjusting kinship estimates for genetic ancestry through the use of PCs led to a mean spouse kinship not different from 0, demonstrating that spouse genetic similarity could be fully attributed to ancestral assortative mating. 
These findings also have significance for studies of heritability that are based on distantly related individuals (kinship less than 0.05), as we also demonstrate the poor correlation of kinship estimates in that range when ancestry is or is not taken into account.
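For readers unfamiliar with how Hardy-Weinberg disequilibrium is quantified at a single SNP, the sketch below applies the standard 1-degree-of-freedom chi-square test to genotype counts. The abstract does not say which test was used, so this textbook choice is an assumption, and the function name is hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A deficit of heterozygotes relative to 2pq, as expected under ancestral assortative mating, inflates the statistic; counts in exact Hardy-Weinberg proportions give a statistic of zero.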
Subject(s)
Genome-Wide Association Study , Spouses , White People/genetics , Female , Genome, Human , Genotype , Humans , Jews/genetics , Linkage Disequilibrium , Male , Phenotype , Polymorphism, Single Nucleotide
ABSTRACT
The sequence kernel association test (SKAT) is widely used to test for associations between a phenotype and a set of genetic variants that are usually rare. Evaluating tail probabilities or quantiles of the null distribution for SKAT requires computing the eigenvalues of a matrix related to the genotype covariance between markers. Extracting the full set of eigenvalues of this matrix (an n×n matrix, for n subjects) has computational complexity proportional to n^3. As SKAT is often used when n > 10^4, this step becomes a major bottleneck in its use in practice. We therefore propose fastSKAT, a new computationally inexpensive but accurate approximation to the tail probabilities, in which the k largest eigenvalues of a weighted genotype covariance matrix or the largest singular values of a weighted genotype matrix are extracted, and a single term based on the Satterthwaite approximation is used for the remaining eigenvalues. While the method is not particularly sensitive to the choice of k, we also describe how to choose its value, and show how fastSKAT can automatically alert users to the rare cases where the choice may affect results. As well as providing a faster implementation of SKAT, the new method also enables entirely new applications of SKAT that were not possible before; we give examples grouping variants by topologically associating domains, and comparing chromosome-wide association by class of histone marker.
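The idea of keeping k leading eigenvalues and summarizing the rest with one Satterthwaite term can be sketched as follows. This is an illustrative reading of the abstract, not the fastSKAT package's API: the function name is hypothetical, NumPy is assumed, and a full eigendecomposition is used only to keep the example short, whereas a real implementation would use a partial solver and estimate the traces without forming the full spectrum.

```python
import numpy as np

def topk_plus_satterthwaite(kernel, k):
    """Summarize a symmetric PSD matrix's spectrum as k eigenvalues plus one term.

    Returns (top, scale, dof): the null distribution sum_i lambda_i * chi2_1 is
    approximated by sum over `top` of lambda_i * chi2_1 plus scale * chi2_dof.
    """
    m = np.asarray(kernel, dtype=float)
    # Illustration only: this computes all eigenvalues, sorted descending.
    eigenvalues = np.linalg.eigvalsh(m)[::-1]
    top = eigenvalues[:k]
    # Sum and sum of squares of the remaining eigenvalues, obtained from
    # trace(M) and trace(M^2) (the latter equals sum(M * M) for symmetric M).
    rest_sum = np.trace(m) - top.sum()
    rest_sum_sq = np.sum(m * m) - np.sum(top ** 2)
    if rest_sum <= 1e-12 or rest_sum_sq <= 1e-12:
        return top, 0.0, 0.0  # nothing left to approximate
    # Satterthwaite: match the mean and variance of the remainder.
    scale = rest_sum_sq / rest_sum
    dof = rest_sum ** 2 / rest_sum_sq
    return top, scale, dof
```

By construction the approximation preserves the first two moments of the full mixture: the top eigenvalues plus `scale * dof` sum to the trace, and their squares plus `scale**2 * dof` sum to the trace of the squared matrix.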
Subject(s)
Algorithms , Genetic Association Studies , Sequence Analysis, DNA , Chromosomes, Human/metabolism , Genetic Markers , Histones/metabolism , Humans , Statistics as Topic , Time Factors
Subject(s)
Coronary Disease , Hypercholesterolemia , Hyperlipoproteinemia Type II , Humans , Hypercholesterolemia/epidemiology , Hypercholesterolemia/genetics , Hyperlipoproteinemia Type II/complications , Hyperlipoproteinemia Type II/diagnosis , Hyperlipoproteinemia Type II/epidemiology , Genotype , Coronary Disease/epidemiology , Coronary Disease/genetics
ABSTRACT
Platelet production, maintenance, and clearance are tightly controlled processes indicative of platelets' important roles in hemostasis and thrombosis. Platelets are common targets for primary and secondary prevention of several conditions. They are monitored clinically by complete blood counts, specifically with measurements of platelet count (PLT) and mean platelet volume (MPV). Identifying genetic effects on PLT and MPV can provide mechanistic insights into platelet biology and their role in disease. Therefore, we formed the Blood Cell Consortium (BCX) to perform a large-scale meta-analysis of Exomechip association results for PLT and MPV in 157,293 and 57,617 individuals, respectively. Using the low-frequency/rare coding variant-enriched Exomechip genotyping array, we sought to identify genetic variants associated with PLT and MPV. In addition to confirming 47 known PLT and 20 known MPV associations, we identified 32 PLT and 18 MPV associations not previously observed in the literature across the allele frequency spectrum, including rare large effect (FCER1A), low-frequency (IQGAP2, MAP1A, LY75), and common (ZMIZ2, SMG6, PEAR1, ARFGAP3/PACSIN2) variants. Several variants associated with PLT/MPV (PEAR1, MRVI1, PTGES3) were also associated with platelet reactivity. In concurrent BCX analyses, there was overlap of platelet-associated variants with red (MAP1A, TMPRSS6, ZMIZ2) and white (PEAR1, ZMIZ2, LY75) blood cell traits, suggesting common regulatory pathways with shared genetic architecture among these hematopoietic lineages. Our large-scale Exomechip analyses identified previously undocumented associations with platelet traits and further indicate that several complex quantitative hematological, lipid, and cardiovascular traits share genetic factors.
Subject(s)
Blood Platelets/metabolism , Exome/genetics , Genetic Variation/genetics , Female , Genome-Wide Association Study , Humans , Male , Mean Platelet Volume , Platelet Count
ABSTRACT
Circulating blood cell counts and indices are important indicators of hematopoietic function and a number of clinical parameters, such as blood oxygen-carrying capacity, inflammation, and hemostasis. By performing whole-exome sequence association analyses of hematologic quantitative traits in 15,459 community-dwelling individuals, followed by in silico replication in up to 52,024 independent samples, we identified two previously undescribed coding variants associated with lower platelet count: a common missense variant in CPS1 (rs1047891, MAF = 0.33, discovery + replication p = 6.38 × 10⁻¹⁰) and a rare synonymous variant in GFI1B (rs150813342, MAF = 0.009, discovery + replication p = 1.79 × 10⁻²⁷). By performing CRISPR/Cas9 genome editing in hematopoietic cell lines and follow-up targeted knockdown experiments in primary human hematopoietic stem and progenitor cells, we demonstrate an alternative splicing mechanism by which the GFI1B rs150813342 variant suppresses formation of a GFI1B isoform that preferentially promotes megakaryocyte differentiation and platelet production. These results demonstrate how unbiased studies of natural variation in blood cell traits can provide insight into the regulation of human hematopoiesis.
Subject(s)
Alternative Splicing/genetics , DNA Mutational Analysis , Exome/genetics , Genetic Loci/genetics , Hematopoiesis/genetics , Proto-Oncogene Proteins/genetics , Repressor Proteins/genetics , Blood Platelets/cytology , CRISPR-Cas Systems , Gene Editing , Hematopoietic Stem Cells/cytology , Humans , Megakaryocytes/cytology , Platelet Count
ABSTRACT
BACKGROUND: The discovery of low-frequency coding variants affecting the risk of coronary artery disease has facilitated the identification of therapeutic targets. METHODS: Through DNA genotyping, we tested 54,003 coding-sequence variants covering 13,715 human genes in up to 72,868 patients with coronary artery disease and 120,770 controls who did not have coronary artery disease. Through DNA sequencing, we studied the effects of loss-of-function mutations in selected genes. RESULTS: We confirmed previously observed significant associations between coronary artery disease and low-frequency missense variants in the genes LPA and PCSK9. We also found significant associations between coronary artery disease and low-frequency missense variants in the genes SVEP1 (p.D2702G; minor-allele frequency, 3.60%; odds ratio for disease, 1.14; P=4.2×10⁻¹⁰) and ANGPTL4 (p.E40K; minor-allele frequency, 2.01%; odds ratio, 0.86; P=4.0×10⁻⁸), which encodes angiopoietin-like 4. Through sequencing of ANGPTL4, we identified 9 carriers of loss-of-function mutations among 6924 patients with myocardial infarction, as compared with 19 carriers among 6834 controls (odds ratio, 0.47; P=0.04); carriers of ANGPTL4 loss-of-function alleles had triglyceride levels that were 35% lower than the levels among persons who did not carry a loss-of-function allele (P=0.003). ANGPTL4 inhibits lipoprotein lipase; we therefore searched for mutations in LPL and identified a loss-of-function variant that was associated with an increased risk of coronary artery disease (p.D36N; minor-allele frequency, 1.9%; odds ratio, 1.13; P=2.0×10⁻⁴) and a gain-of-function variant that was associated with protection from coronary artery disease (p.S447*; minor-allele frequency, 9.9%; odds ratio, 0.94; P=2.5×10⁻⁷).
CONCLUSIONS: We found that carriers of loss-of-function mutations in ANGPTL4 had triglyceride levels that were lower than those among noncarriers; these mutations were also associated with protection from coronary artery disease. (Funded by the National Institutes of Health and others.)
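As a rough check on the ANGPTL4 sequencing result reported above, the carrier counts can be turned into a crude odds ratio. The sketch below is illustrative only: the function name `crude_odds_ratio` is hypothetical, and it assumes a simple unadjusted 2x2 table, which may differ from the model the authors actually fitted.

```python
def crude_odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio from a 2x2 carrier-by-status table (illustrative helper)."""
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    # Odds of carrying among cases divided by odds of carrying among controls.
    return (case_carriers * control_noncarriers) / (control_carriers * case_noncarriers)


# Counts taken from the abstract: 9 carriers among 6924 patients with
# myocardial infarction, 19 carriers among 6834 controls.
angptl4_or = crude_odds_ratio(9, 6924, 19, 6834)
print(round(angptl4_or, 2))  # 0.47
```

The crude value matches the reported odds ratio of 0.47 to two decimal places. The sketch does not attempt to reproduce the reported P value.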
Subject(s)
Angiopoietins/genetics , Cell Adhesion Molecules/genetics , Coronary Artery Disease/genetics , Lipoprotein Lipase/genetics , Mutation , Triglycerides/blood , Aged , Angiopoietin-Like Protein 4 , Female , Genotyping Techniques , Humans , Lipoprotein Lipase/antagonists & inhibitors , Lipoprotein Lipase/metabolism , Male , Middle Aged , Mutation, Missense , Risk Factors , Sequence Analysis, DNA , Triglycerides/genetics
ABSTRACT
RATIONALE: Therapies that inhibit CETP (cholesteryl ester transfer protein) have failed to demonstrate a reduction in risk for coronary heart disease (CHD). Human DNA sequence variants that truncate the CETP gene may provide insight into the efficacy of CETP inhibition. OBJECTIVE: To test whether protein-truncating variants (PTVs) at the CETP gene were associated with plasma lipid levels and CHD. METHODS AND RESULTS: We sequenced the exons of the CETP gene in 58 469 participants from 12 case-control studies (18 817 CHD cases, 39 652 CHD-free controls). We defined PTV as those that lead to a premature stop, disrupt canonical splice sites, or lead to insertions/deletions that shift frame. We also genotyped 1 Japanese-specific PTV in 27 561 participants from 3 case-control studies (14 286 CHD cases, 13 275 CHD-free controls). We tested association of CETP PTV carrier status with both plasma lipids and CHD. Among 58 469 participants with CETP gene-sequencing data available, average age was 51.5 years and 43% were women; 1 in 975 participants carried a PTV at the CETP gene. Compared with noncarriers, carriers of PTV at CETP had higher high-density lipoprotein cholesterol (effect size, 22.6 mg/dL; 95% confidence interval, 18-27; P<1.0×10⁻⁴), lower low-density lipoprotein cholesterol (-12.2 mg/dL; 95% confidence interval, -23 to -0.98; P=0.033), and lower triglycerides (-6.3%; 95% confidence interval, -12 to -0.22; P=0.043). CETP PTV carrier status was associated with reduced risk for CHD (summary odds ratio, 0.70; 95% confidence interval, 0.54-0.90; P=5.1×10⁻³). CONCLUSIONS: Compared with noncarriers, carriers of PTV at CETP displayed higher high-density lipoprotein cholesterol, lower low-density lipoprotein cholesterol, lower triglycerides, and lower risk for CHD.
Subject(s)
Cholesterol Ester Transfer Proteins/genetics , Coronary Disease/diagnosis , Coronary Disease/genetics , Genetic Variation/genetics , Adult , Aged , Case-Control Studies , Cholesterol Ester Transfer Proteins/blood , Coronary Disease/blood , Female , Humans , Male , Middle Aged , Risk Factors
ABSTRACT
A fundamental challenge to contemporary genetics is to distinguish rare missense alleles that disrupt protein functions from the majority of alleles neutral on protein activities. High-throughput experimental tools to securely discriminate between disruptive and non-disruptive missense alleles are currently missing. Here we establish a scalable cell-based strategy to profile the biological effects and likely disease relevance of rare missense variants in vitro. We apply this strategy to systematically characterize missense alleles in the low-density lipoprotein receptor (LDLR) gene identified through exome sequencing of 3,235 individuals and exome-chip profiling of 39,186 individuals. Our strategy reliably identifies disruptive missense alleles, and disruptive-allele carriers have higher plasma LDL-cholesterol (LDL-C). Importantly, considering experimental data refined the risk of rare LDLR allele carriers from 4.5- to 25.3-fold for high LDL-C, and from 2.1- to 20-fold for early-onset myocardial infarction. Our study generates proof-of-concept that systematic functional variant profiling may empower rare variant-association studies by orders of magnitude.