ABSTRACT
Severe obesity is a rapidly growing global health threat. Although often attributed to unhealthy lifestyle choices or environmental factors, obesity is known to be heritable and highly polygenic; the majority of inherited susceptibility is related to the cumulative effect of many common DNA variants. Here we derive and validate a new polygenic predictor composed of 2.1 million common variants to quantify this susceptibility and test this predictor in more than 300,000 individuals ranging from middle age to birth. Among middle-aged adults, we observe a 13-kg gradient in weight and a 25-fold gradient in risk of severe obesity across polygenic score deciles. In a longitudinal birth cohort, we note minimal differences in birthweight across score deciles, but a significant gradient emerges in early childhood and reaches 12 kg by 18 years of age. This new approach to quantify inherited susceptibility to obesity affords new opportunities for clinical prevention and mechanistic assessment.
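A polygenic score of this kind is, at its core, a weighted sum of effect-allele dosages across many variants, after which individuals are ranked into score deciles and outcomes are compared across deciles. The Python sketch below illustrates only that arithmetic on synthetic data. The per-variant weights, allele frequencies, the 500-variant panel and the simulated body-weight model are all made-up assumptions; they are not the 2.1-million-variant predictor or the procedure used to derive its weights.

```python
import random
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    # Weighted sum of effect-allele dosages (0, 1 or 2 copies) across all variants.
    return sum(d * w for d, w in zip(dosages, weights))


def assign_deciles(scores):
    # Rank individuals by score; decile 1 = lowest 10%, decile 10 = highest 10%.
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


def mean_by_decile(values, deciles):
    # Average an outcome (e.g. body weight in kg) within each score decile.
    return {d: mean(v for v, dd in zip(values, deciles) if dd == d) for d in range(1, 11)}


# Synthetic demo only: 500 variants and 2,000 people with made-up frequencies and weights.
rng = random.Random(42)
freqs = [rng.uniform(0.05, 0.5) for _ in range(500)]
weights = [rng.gauss(0.0, 0.02) for _ in range(500)]
genotypes = [[(rng.random() < f) + (rng.random() < f) for f in freqs] for _ in range(2000)]
scores = [polygenic_score(g, weights) for g in genotypes]

# Simulated body weight: 80 kg baseline, +4 kg per SD of score, plus noise (assumed model).
mu, sd = mean(scores), pstdev(scores)
body_weight = [80 + 4 * (s - mu) / sd + rng.gauss(0, 10) for s in scores]

deciles = assign_deciles(scores)
by_decile = mean_by_decile(body_weight, deciles)
gradient = by_decile[10] - by_decile[1]
print(f"top-vs-bottom decile gradient: {gradient:.1f} kg")
```

With real data the weights would come from a published derivation procedure rather than random draws; the scoring and decile logic stay the same.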
Subject(s)
Body Weight , Multifactorial Inheritance/genetics , Obesity/pathology , Adolescent , Body Mass Index , Child , Databases, Factual , Female , Genome-Wide Association Study , Humans , Infant, Newborn , Longitudinal Studies , Male , Middle Aged , Obesity/genetics , Risk Factors , Severity of Illness Index
ABSTRACT
Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding1,2 and computer vision3 by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
Subject(s)
Biology , Machine Learning , Neural Networks, Computer , Humans , Biology/methods , Single-Cell Gene Expression Analysis , Datasets as Topic , Chromatin/genetics , Chromatin/metabolism , Cardiomyopathies/drug therapy , Cardiomyopathies/genetics , Cardiomyopathies/metabolism
ABSTRACT
Heart failure encompasses a heterogeneous set of clinical features that converge on impaired cardiac contractile function1,2 and presents a growing public health concern. Previous work has highlighted changes in both transcription and protein expression in failing hearts3,4, but may overlook molecular changes in less prevalent cell types. Here we identify extensive molecular alterations in failing hearts at single-cell resolution by performing single-nucleus RNA sequencing of nearly 600,000 nuclei in left ventricle samples from 11 hearts with dilated cardiomyopathy and 15 hearts with hypertrophic cardiomyopathy as well as 16 non-failing hearts. The transcriptional profiles of dilated or hypertrophic cardiomyopathy hearts broadly converged at the tissue and cell-type level. Further, a subset of hearts from patients with cardiomyopathy harbour a unique population of activated fibroblasts that is almost entirely absent from non-failing samples. We performed a CRISPR-knockout screen in primary human cardiac fibroblasts to evaluate this fibrotic cell state transition; for a subset of genes associated with fibroblast transition, knockout reduced the myofibroblast cell-state transition upon TGFβ1 stimulation. Our results provide insights into the transcriptional diversity of the human heart in health and disease as well as new potential therapeutic targets and biomarkers for heart failure.
Subject(s)
Cardiomyopathy, Dilated , Cardiomyopathy, Hypertrophic , Cell Nucleus , Gene Expression Profiling , Heart Failure , Single-Cell Analysis , CRISPR-Cas Systems , Cardiomyopathy, Dilated/genetics , Cardiomyopathy, Dilated/pathology , Cardiomyopathy, Hypertrophic/genetics , Cardiomyopathy, Hypertrophic/pathology , Case-Control Studies , Cell Nucleus/genetics , Cells, Cultured , Gene Knockout Techniques , Heart Failure/genetics , Heart Failure/pathology , Heart Ventricles/metabolism , Heart Ventricles/pathology , Humans , Myocardium/metabolism , Myocardium/pathology , Myofibroblasts/metabolism , Myofibroblasts/pathology , RNA-Seq , Transcription, Genetic , Transforming Growth Factor beta1
ABSTRACT
Droplet-based single-cell assays, including single-cell RNA sequencing (scRNA-seq), single-nucleus RNA sequencing (snRNA-seq) and cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), generate considerable background noise counts, the hallmark of which is nonzero counts in cell-free droplets and off-target gene expression in unexpected cell types. Such systematic background noise can lead to batch effects and spurious differential gene expression results. Here we develop a deep generative model based on the phenomenology of noise generation in droplet-based assays. The proposed model accurately distinguishes cell-containing droplets from cell-free droplets, learns the background noise profile and provides noise-free quantification in an end-to-end fashion. We implement this approach in the scalable and robust open-source software package CellBender. Analysis of simulated data demonstrates that CellBender operates near the theoretically optimal denoising limit. Extensive evaluations using real datasets and experimental benchmarks highlight enhanced concordance between droplet-based single-cell data and established gene expression patterns, while the learned background noise profile provides evidence of degraded or uncaptured cell types.
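To make the idea of a background noise profile concrete, here is a deliberately naive ambient-subtraction sketch in Python. It is not CellBender's deep generative model, which distinguishes cell-containing from cell-free droplets, learns the background profile and produces denoised counts end to end. Instead, it assumes that droplets below a hypothetical UMI cutoff (`max_umi`) are cell-free, pools them into an ambient profile, and subtracts a fixed, assumed contamination fraction from one cell-containing droplet. All counts are toy values.

```python
def ambient_profile(droplets, max_umi=100):
    # Pool counts from low-UMI droplets (assumed cell-free) and normalize to proportions.
    pooled = [0] * len(droplets[0])
    for counts in droplets:
        if sum(counts) <= max_umi:
            for gene, c in enumerate(counts):
                pooled[gene] += c
    total = sum(pooled)
    return [p / total for p in pooled]


def subtract_ambient(cell_counts, profile, contamination=0.1):
    # Remove the expected background: an assumed fraction of this droplet's UMIs,
    # distributed across genes according to the ambient profile; floor at zero.
    expected_bg = contamination * sum(cell_counts)
    return [max(0.0, c - expected_bg * p) for c, p in zip(cell_counts, profile)]


# Toy data: rows are droplets, columns are genes (all values hypothetical).
droplets = [[5, 5, 0], [3, 7, 0], [200, 50, 750]]
profile = ambient_profile(droplets)  # built from the two low-UMI droplets
denoised = subtract_ambient(droplets[2], profile)
print(profile, denoised)
```

A fixed contamination fraction and a hard UMI cutoff are exactly the simplifications a probabilistic model avoids; the sketch only shows what "background profile" and "noise-free quantification" refer to.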
Subject(s)
RNA, Small Nuclear , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods
ABSTRACT
Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency < 1%) predicted damaging coding variation by using sequence data from >170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels, some of which had not previously been associated with lipid levels through rare coding variation in population-based samples. Based on aggregations of rare coding variants, we prioritized 32 genes in array-based genome-wide association study (GWAS) loci; for three of these (EVI5, SH2B3, and PLIN1), rare coding variants had not previously been associated with lipid levels. Most of the associated genes showed evidence of association across multiple ancestries. Finally, we observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes and for genes closest to GWAS index single-nucleotide polymorphisms (SNPs). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.
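A gene-based test of this kind starts by collapsing each gene's qualifying variants into a single per-person indicator. The Python sketch below shows a minimal burden (collapsing) comparison under stated assumptions: a variant qualifies if its MAF is below 1% and it is flagged as predicted damaging, carriers are compared with non-carriers by the difference in mean lipid level with a Welch t statistic, and all genotypes and lipid values are hypothetical. It does not reproduce the study's specific tests or covariate adjustment.

```python
from math import sqrt
from statistics import mean, variance


def carrier_status(genotypes, mafs, is_damaging, maf_cutoff=0.01):
    # Collapse one gene: 1 if a person carries >= 1 allele of any rare, predicted-damaging variant.
    qualifying = [j for j, (m, dmg) in enumerate(zip(mafs, is_damaging)) if m < maf_cutoff and dmg]
    return [int(any(g[j] > 0 for j in qualifying)) for g in genotypes]


def burden_effect(trait, carriers):
    # Mean trait difference (carriers minus non-carriers) and a Welch t statistic.
    x1 = [t for t, c in zip(trait, carriers) if c]
    x0 = [t for t, c in zip(trait, carriers) if not c]
    diff = mean(x1) - mean(x0)
    se = sqrt(variance(x1) / len(x1) + variance(x0) / len(x0))
    return diff, diff / se


# Hypothetical gene with 3 variants; only variant 0 is both rare and predicted damaging.
mafs = [0.002, 0.30, 0.008]
is_damaging = [True, True, False]
genotypes = [[1, 0, 0], [0, 2, 1], [1, 1, 0], [0, 0, 0], [0, 1, 1], [0, 0, 0]]
ldl = [150, 120, 160, 118, 125, 122]  # hypothetical LDL cholesterol, mg/dL

carriers = carrier_status(genotypes, mafs, is_damaging)
diff, t_stat = burden_effect(ldl, carriers)
print(carriers, round(diff, 2), round(t_stat, 2))
```

Collapsing trades per-variant resolution for power: many individually too-rare variants contribute to one test per gene.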
Subject(s)
Exome , Genetic Variation , Genome-Wide Association Study , Lipids/blood , Open Reading Frames , Alleles , Blood Glucose/genetics , Case-Control Studies , Computational Biology/methods , Databases, Genetic , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Genetic Predisposition to Disease , Genetics, Population , Genome-Wide Association Study/methods , Humans , Lipid Metabolism/genetics , Liver/metabolism , Liver/pathology , Molecular Sequence Annotation , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide
ABSTRACT
For Alzheimer's disease, a leading cause of dementia and global morbidity, improved identification of presymptomatic high-risk individuals and identification of new circulating biomarkers are key public health needs. Here, we tested the hypothesis that a polygenic predictor of risk for Alzheimer's disease would identify a subset of the population with increased risk of clinically diagnosed dementia, subclinical neurocognitive dysfunction, and a differing circulating proteomic profile. Using summary association statistics from a recent genome-wide association study, we first developed a polygenic predictor of Alzheimer's disease composed of 7.1 million common DNA variants. We noted a 7.3-fold (95% CI 4.8 to 11.0; p < 0.001) gradient in risk across deciles of the score among 288,289 middle-aged participants of the UK Biobank study. In cross-sectional analyses stratified by age, minimal differences in risk of Alzheimer's disease and performance on a digit recall test were present according to polygenic score decile at age 50 years, but significant gradients emerged by age 65. Similarly, among 30,541 participants of the Mass General Brigham Biobank, we again noted no significant differences in Alzheimer's disease diagnosis at younger ages across deciles of the score, but for those over 65 years we noted an odds ratio of 2.0 (95% CI 1.3 to 3.2; p = 0.002) in the top versus bottom decile of the polygenic score. To understand the proteomic signature of inherited risk, we performed aptamer-based profiling in 636 blood donors (mean age 43 years) with very high or low polygenic scores. In addition to the well-known apolipoprotein E biomarker, this analysis identified 27 additional proteins, several of which have known roles related to disease pathogenesis. Differences in protein concentrations were consistent even among the youngest subset of blood donors (mean age 33 years). 
Of these 28 proteins, 8 had concentrations available in participants of the Multi-Ethnic Study of Atherosclerosis, and 7 of these 8 were similarly associated with the polygenic score. These data highlight the potential for a DNA-based score to identify high-risk individuals during the prolonged presymptomatic phase of Alzheimer's disease and to enable biomarker discovery based on profiling of young individuals in the extremes of the score distribution.
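The decile comparisons above are reported as odds ratios with 95% confidence intervals. As a minimal, unadjusted illustration (the published estimates may come from covariate-adjusted models), the Python sketch below computes an odds ratio and a Woolf log-scale 95% CI from a 2×2 table of hypothetical top- and bottom-decile counts.

```python
from math import exp, log, sqrt


def odds_ratio_ci(cases_exp, controls_exp, cases_ref, controls_ref, z=1.96):
    # Unadjusted odds ratio for exposed (e.g. top decile) vs reference (e.g. bottom decile),
    # with a Woolf confidence interval computed on the log-odds scale.
    or_ = (cases_exp * controls_ref) / (controls_exp * cases_ref)
    se_log = sqrt(1 / cases_exp + 1 / controls_exp + 1 / cases_ref + 1 / controls_ref)
    return or_, exp(log(or_) - z * se_log), exp(log(or_) + z * se_log)


# Hypothetical counts: 60 cases / 940 controls in the top decile, 30 / 970 in the bottom decile.
or_, lo, hi = odds_ratio_ci(60, 940, 30, 970)
print(f"OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

The interval is symmetric on the log scale, which is why reported CIs around odds ratios look asymmetric on the natural scale.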
Subject(s)
Alzheimer Disease , Adult , Aged , Alzheimer Disease/pathology , Biomarkers , Cross-Sectional Studies , Genome-Wide Association Study , Humans , Middle Aged , Proteomics
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pgen.1008629.].
ABSTRACT
BACKGROUND: Mural cells in ascending aortic aneurysms undergo phenotypic changes that promote extracellular matrix destruction and structural weakening. To explore this biology, we analyzed the transcriptional features of thoracic aortic tissue. METHODS: Single-nuclear RNA sequencing was performed on 13 samples from human donors: 6 with thoracic aortic aneurysm and 7 without aneurysm. Individual transcriptomes were then clustered based on transcriptional profiles. Clusters were used for between-disease differential gene expression analyses and subcluster analysis, and were intersected with genetic aortic trait data. RESULTS: We sequenced 71 689 nuclei from human thoracic aortas and identified 14 clusters, aligning with 11 cell types, predominantly vascular smooth muscle cells (VSMCs), consistent with aortic histology. Using unbiased methodology, we found 7 vascular smooth muscle cell and 6 fibroblast subclusters. Differential gene expression analysis revealed a vascular smooth muscle cell group accounting for the majority of differential gene expression. Fibroblast populations in aneurysm exhibit distinct behavior, with almost complete disappearance of quiescent fibroblasts. Differentially expressed genes were used to prioritize genes at aortic diameter and distensibility genome-wide association study loci, highlighting the genes JUN, LTBP4 (latent transforming growth factor beta-binding protein 4), and IL34 (interleukin 34) in fibroblasts; ENTPD1, PDLIM5 (PDZ and LIM domain 5), ACTN4 (alpha-actinin-4), and GLRX in vascular smooth muscle cells; and LRP1 in macrophage populations. CONCLUSIONS: Using nuclear RNA sequencing, we describe the cellular diversity of healthy and aneurysmal human ascending aorta. Sporadic aortic aneurysm is characterized by differential gene expression within known cellular classes rather than by the appearance of novel cellular forms. 
Single-nuclear RNA sequencing of aortic tissue can be used to prioritize genes at aortic trait loci.
Subject(s)
Aortic Aneurysm, Thoracic , Aortic Aneurysm , Humans , Genome-Wide Association Study , Muscle, Smooth, Vascular/metabolism , Actinin/genetics , RNA, Nuclear/metabolism , Aorta/pathology , Myocytes, Smooth Muscle/metabolism , Aortic Aneurysm, Thoracic/pathology , Aortic Aneurysm/metabolism , Sequence Analysis, RNA , Transforming Growth Factor beta/metabolism
ABSTRACT
Analyzing 12,361 all-cause cirrhosis cases and 790,095 controls from eight cohorts, we identify a common missense variant in the Mitochondrial Amidoxime Reducing Component 1 gene (MARC1 p.A165T) that associates with protection from all-cause cirrhosis (OR 0.91, p = 2.3×10⁻¹¹). This same variant also associates with lower levels of hepatic fat on computed tomographic imaging and lower odds of physician-diagnosed fatty liver, as well as lower blood levels of alanine transaminase (-0.025 SD, p = 3.7×10⁻⁴³), alkaline phosphatase (-0.025 SD, p = 1.2×10⁻³⁷), total cholesterol (-0.030 SD, p = 1.9×10⁻³⁶) and LDL cholesterol (-0.027 SD, p = 5.1×10⁻³⁰). We identified additional MARC1 alleles (low-frequency missense p.M187K and rare protein-truncating p.R200Ter) that were also associated with lower cholesterol levels, lower liver enzyme levels and reduced risk of cirrhosis (0 cirrhosis cases among 238 R200Ter carriers versus 17,046 cases of cirrhosis among 759,027 non-carriers, p = 0.04), suggesting that deficiency of the MARC1 enzyme may lower blood cholesterol levels and protect against cirrhosis.
Subject(s)
Fatty Liver/genetics , Fatty Liver/prevention & control , Genetic Predisposition to Disease , Liver Cirrhosis/genetics , Liver Cirrhosis/prevention & control , Mitochondrial Proteins/genetics , Mutation, Missense/genetics , Oxidoreductases/genetics , Alleles , Cholesterol, LDL/blood , Coronary Artery Disease/genetics , Datasets as Topic , Fatty Liver/blood , Fatty Liver/enzymology , Female , Homozygote , Humans , Liver/enzymology , Liver Cirrhosis/blood , Liver Cirrhosis/enzymology , Liver Cirrhosis, Alcoholic/blood , Liver Cirrhosis, Alcoholic/enzymology , Liver Cirrhosis, Alcoholic/genetics , Liver Cirrhosis, Alcoholic/prevention & control , Loss of Function Mutation/genetics , Male , Middle Aged
ABSTRACT
RATIONALE: Genome-wide association studies have identified a large number of common variants (single-nucleotide polymorphisms) associated with atrial fibrillation (AF). These variants are located mainly in noncoding regions of the genome and likely include variants that modulate the function of transcriptional regulatory elements (REs) such as enhancers. However, the actual REs modulated by variants and the target genes of such REs remain to be identified. Consequently, the biological mechanisms by which genetic variation promotes AF have remained largely unexplored. OBJECTIVE: To identify REs in genome-wide association study loci that are influenced by AF-associated variants. METHODS AND RESULTS: We screened 2.45 Mbp of human genomic DNA containing 12 strongly AF-associated loci for RE activity using self-transcribing active regulatory region sequencing and a recently generated monoclonal line of conditionally immortalized rat atrial myocytes. We identified 444 potential REs, 55 of which contain AF-associated variants (P<10⁻⁸). Subsequently, using an adaptation of the self-transcribing active regulatory region sequencing approach, we identified 24 variant REs with allele-specific regulatory activity. By mining available chromatin conformation data, we mapped the possible target genes of these REs. To define the physiological function and target genes of such REs, we deleted the orthologue of an RE containing noncoding variants in the Hcn4 (potassium/sodium hyperpolarization-activated cyclic nucleotide-gated channel 4) locus of the mouse genome. Mice heterozygous for the RE deletion showed bradycardia, sinus node dysfunction, and selective loss of Hcn4 expression. CONCLUSIONS: We have identified REs at multiple genetic loci for AF and found that loss of an RE at the HCN4 locus results in sinus node dysfunction and reduced gene expression. 
Our approach can be broadly applied to facilitate the identification of human disease-relevant REs and target genes at cardiovascular genome-wide association studies loci.
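Self-transcribing active regulatory region sequencing (STARR-seq) reads out regulatory activity as RNA produced per unit of input plasmid DNA, so allele-specific activity is a ratio of two such normalized ratios. The Python sketch below shows that normalization for the two alleles of one candidate element. The counts and library totals are hypothetical, and the statistical test used in the adapted screen is not reproduced; only the effect-size arithmetic is shown.

```python
from math import log2


def starr_activity(rna_counts, dna_counts, rna_total, dna_total):
    # Library-size-normalized activity: RNA output per unit of plasmid DNA input.
    return (rna_counts / rna_total) / (dna_counts / dna_total)


def allelic_log2_ratio(ref, alt, rna_total, dna_total):
    # ref and alt are (rna_counts, dna_counts) pairs for the two alleles of one element;
    # negative values mean the alternative allele is less active than the reference.
    act_ref = starr_activity(*ref, rna_total, dna_total)
    act_alt = starr_activity(*alt, rna_total, dna_total)
    return log2(act_alt / act_ref)


# Hypothetical counts for one candidate regulatory element.
ratio = allelic_log2_ratio(ref=(200, 100), alt=(50, 100), rna_total=1_000_000, dna_total=1_000_000)
print(ratio)
```

Normalizing by plasmid DNA for each allele separately matters because the two alleles are rarely represented equally in the input library.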
Subject(s)
Atrial Fibrillation/genetics , Enhancer Elements, Genetic , Animals , Atrial Fibrillation/metabolism , Genetic Loci , Genome, Human , Humans , Hyperpolarization-Activated Cyclic Nucleotide-Gated Channels/genetics , Hyperpolarization-Activated Cyclic Nucleotide-Gated Channels/metabolism , Mice , Mice, Inbred C57BL , Muscle Proteins/genetics , Muscle Proteins/metabolism , Potassium Channels/genetics , Potassium Channels/metabolism
ABSTRACT
RATIONALE: Genome-wide association studies have identified over 100 genetic loci for atrial fibrillation (AF); recent work described an association between loss-of-function (LOF) variants in TTN and early-onset AF. OBJECTIVE: We sought to determine the contribution of rare and common genetic variation to AF risk in the general population. METHODS: The UK Biobank is a population-based study of 500 000 individuals including a subset with genome-wide genotyping and exome sequencing. In this case-control study, we included AF cases and controls of genetically determined white-European ancestry; analyses were performed using a logistic mixed-effects model adjusting for age, sex, the first 4 principal components of ancestry, empirical relationships, and case-control imbalance. An exome-wide, gene-based burden analysis was performed to examine the relationship between AF and rare, high-confidence LOF variants in genes with ≥10 LOF carriers. A polygenic risk score for AF was estimated using the LDpred algorithm. We then compared the contribution of AF polygenic risk score and LOF variants to AF risk. RESULTS: The study included 1546 AF cases and 41 593 controls. In an analysis of 9099 genes with sufficient LOF variant carriers, a significant association between AF and rare LOF variants was observed in a single gene, TTN (odds ratio, 2.71; P=2.50×10⁻⁸). The association with AF was more significant (odds ratio, 6.15; P=3.26×10⁻¹⁴) when restricting to LOF variants located in exons highly expressed in cardiac tissue (TTNLOF). Overall, 0.44% of individuals carried TTNLOF variants, of whom 14% had AF. Among individuals in the highest 0.44% of the AF polygenic risk score, only 9.3% had AF. In contrast, the AF polygenic risk score explained 4.7% of the variance in AF susceptibility, while TTNLOF variants only accounted for 0.2%. CONCLUSIONS: Both monogenic and polygenic factors contribute to AF risk in the general population. 
While rare TTNLOF variants confer a substantial AF penetrance, the additive effect of many common variants explains a larger proportion of genetic susceptibility to AF.
Subject(s)
Atrial Fibrillation/genetics , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Aged , Connectin/genetics , Databases, Genetic , Exome , Female , Humans , Loss of Function Mutation , Male , Middle Aged , Penetrance
ABSTRACT
BACKGROUND: The human heart requires a complex ensemble of specialized cell types to perform its essential function. A greater knowledge of the intricate cellular milieu of the heart is critical to increase our understanding of cardiac homeostasis and pathology. As recent advances in low-input RNA sequencing have allowed definitions of cellular transcriptomes at single-cell resolution at scale, we have applied these approaches to assess the cellular and transcriptional diversity of the nonfailing human heart. METHODS: Microfluidic encapsulation and barcoding were used to perform single nuclear RNA sequencing with samples from 7 human donors, selected for their absence of overt cardiac disease. Individual nuclear transcriptomes were then clustered based on transcriptional profiles of highly variable genes. These clusters were used as the basis for between-chamber and between-sex differential gene expression analyses and intersection with genetic and pharmacologic data. RESULTS: We sequenced the transcriptomes of 287 269 single cardiac nuclei, revealing 9 major cell types and 20 subclusters of cell types within the human heart. Cellular subclasses include 2 distinct groups of resident macrophages, 4 endothelial subtypes, and 2 fibroblast subsets. Comparisons of cellular transcriptomes by cardiac chamber or sex reveal diversity not only in cardiomyocyte transcriptional programs but also in subtypes involved in extracellular matrix remodeling and vascularization. Using genetic association data, we identified strong enrichment for the role of cell subtypes in cardiac traits and diseases. Intersection of our data set with genes on cardiac clinical testing panels and the druggable genome reveals striking patterns of cellular specificity. CONCLUSIONS: Using large-scale single nuclei RNA sequencing, we defined the transcriptional and cellular diversity in the normal human heart. 
Our identification of discrete cell subtypes and differentially expressed genes within the heart will ultimately facilitate the development of new therapeutics for cardiovascular diseases.
Subject(s)
Myocardium/cytology , Transcription, Genetic , Adipocytes/metabolism , Adult , Aged , Cardiovascular Agents/pharmacology , Cardiovascular Agents/therapeutic use , Endothelial Cells/classification , Endothelial Cells/metabolism , Fibroblasts/classification , Fibroblasts/metabolism , Gene Ontology , Heart/innervation , Heart Atria/cytology , Heart Diseases/drug therapy , Heart Ventricles/cytology , Homeostasis , Humans , Lymphocyte Subsets/metabolism , Macrophages/classification , Macrophages/metabolism , Microfluidic Analytical Techniques , Middle Aged , Myocardium/metabolism , Myocytes, Cardiac/metabolism , Myocytes, Smooth Muscle/metabolism , Pericytes/metabolism , RNA-Seq , Sex Characteristics , Single-Cell Analysis , Transcriptome
ABSTRACT
OBJECTIVE: To determine the relationship of a genome-wide polygenic score for coronary artery disease (GPSCAD) with lifetime trajectories of CAD risk, directly compare its predictive capacity to traditional risk factors, and assess its interplay with the Pooled Cohort Equations (PCE) clinical risk estimator. APPROACH AND RESULTS: We studied GPSCAD in 28 556 middle-aged participants of the Malmö Diet and Cancer Study, of whom 4122 (14.4%) developed CAD over a median follow-up of 21.3 years. A pronounced gradient in lifetime risk of CAD was observed, ranging from 16% for those in the lowest GPSCAD decile to 48% for those in the highest. We evaluated the discriminative capacity of the GPSCAD, assessed as the change in the C-statistic from a baseline model including age and sex, among 5685 individuals with PCE risk estimates available. The increment for the GPSCAD (+0.045, P<0.001) was higher than for any of 11 traditional risk factors (range +0.007 to +0.032). Minimal correlation was observed between GPSCAD and 10-year risk defined by the PCE (r=0.03), and addition of GPSCAD improved the C-statistic of the PCE model by 0.026. A significant gradient in lifetime risk was observed for the GPSCAD, even among individuals within a given PCE clinical risk stratum. We replicated key findings, with strikingly consistent results, in 325 003 participants of the UK Biobank. CONCLUSIONS: GPSCAD, a risk estimator available from birth, stratifies individuals into varying trajectories of clinical risk for CAD. Implementation of GPSCAD may enable identification of high-risk individuals early in life, decades in advance of manifest risk factors or disease.
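The discrimination results above are expressed as increments in the C-statistic. For a binary outcome, the C-statistic is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The Python sketch below computes it by brute-force pairwise comparison and reports the increment between a baseline model and one with an added score. The predicted risks are hypothetical, and an analysis over two decades of follow-up would normally use a censoring-aware version such as Harrell's C rather than this binary-outcome form.

```python
def c_statistic(risks, outcomes):
    # Probability that a randomly chosen case has a higher predicted risk than a randomly
    # chosen non-case; tied pairs count as one half.
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = 0.0
    for rc in cases:
        for rn in non_cases:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical predicted risks from two models for the same 8 people (1 = developed disease).
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
baseline = [0.30, 0.20, 0.10, 0.25, 0.15, 0.10, 0.05, 0.20]  # e.g. age and sex only
extended = [0.35, 0.30, 0.20, 0.25, 0.12, 0.08, 0.05, 0.18]  # baseline plus a polygenic score
delta_c = c_statistic(extended, outcomes) - c_statistic(baseline, outcomes)
print(round(delta_c, 3))
```

The pairwise loop is quadratic in sample size; it is shown for clarity, and rank-based formulas give the same value far faster on cohort-sized data.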
Subject(s)
Coronary Artery Disease/genetics , Multifactorial Inheritance , Adult , Aged , Coronary Artery Disease/diagnostic imaging , Coronary Artery Disease/epidemiology , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Heart Disease Risk Factors , Heredity , Humans , Incidence , Male , Middle Aged , Phenotype , Prognosis , Risk Assessment , Sweden/epidemiology , Time Factors , United Kingdom/epidemiology
ABSTRACT
BACKGROUND: Heart failure (HF) is a morbid and heritable disorder for which the biological mechanisms are incompletely understood. We therefore examined genetic associations with HF in a large national biobank, and assessed whether refined phenotypic classification would facilitate genetic discovery. METHODS: We defined all-cause HF among 488 010 participants from the UK Biobank and performed a genome-wide association analysis. We refined the HF phenotype by classifying individuals with left ventricular dysfunction and without coronary artery disease as having nonischemic cardiomyopathy (NICM), and repeated a genetic association analysis. We then pursued replication of lead HF and NICM variants in independent cohorts, and performed adjusted association analyses to assess whether identified genetic associations were mediated through clinical HF risk factors. In addition, we tested rare, loss-of-function mutations in 24 known dilated cardiomyopathy genes for association with HF and NICM. Finally, we examined associations between lead variants and left ventricular structure and function among individuals without HF using cardiac magnetic resonance imaging (n=4158) and echocardiographic data (n=30 201). RESULTS: We identified 7382 participants with all-cause HF in the UK Biobank. Genome-wide association analysis of all-cause HF identified several suggestive loci (P<1×10⁻⁶), the majority linked to upstream HF risk factors, i.e., coronary artery disease (CDKN2B-AS1 and MAP3K7CL) and atrial fibrillation (PITX2). Refining the HF phenotype yielded a subset of 2038 NICM cases. In contrast to all-cause HF, genetic analysis of NICM revealed suggestive loci that have been implicated in dilated cardiomyopathy (BAG3, CLCNKA-ZBTB17). Dilated cardiomyopathy signals arising from our NICM analysis replicated in independent cohorts, persisted after HF risk factor adjustment, and were associated with indices of left ventricular dysfunction in individuals without clinical HF. 
In addition, analyses of loss-of-function variants implicated BAG3 as a disease susceptibility gene for NICM (loss-of-function variant carrier frequency=0.01%; odds ratio, 12.03; P=3.62×10⁻⁵). CONCLUSIONS: We found several distinct genetic mechanisms of all-cause HF in a national biobank that reflect well-known HF risk factors. Phenotypic refinement to a NICM subtype appeared to facilitate the discovery of genetic signals that act independently of clinical HF risk factors and that are associated with subclinical left ventricular dysfunction.
ABSTRACT
BACKGROUND: The relative prevalence and clinical importance of monogenic mutations related to familial hypercholesterolemia and of high polygenic score (cumulative impact of many common variants) pathways for early-onset myocardial infarction remain uncertain. Whole-genome sequencing enables simultaneous ascertainment of both monogenic mutations and polygenic score for each individual. METHODS: We performed deep-coverage whole-genome sequencing of 2081 patients from 4 racial subgroups hospitalized in the United States with early-onset myocardial infarction (age ≤55 years) recruited with a 2:1 female-to-male enrollment design. We compared these genomes with those of 3761 population-based control subjects. We first identified individuals with a rare, monogenic mutation related to familial hypercholesterolemia. Second, we calculated a recently developed polygenic score of 6.6 million common DNA variants to quantify the cumulative susceptibility conferred by common variants. We defined high polygenic score as the top 5% of the control distribution because this cutoff has previously been shown to confer similar risk to that of familial hypercholesterolemia mutations. RESULTS: The mean age of the 2081 patients presenting with early-onset myocardial infarction was 48 years, and 66% were female. A familial hypercholesterolemia mutation was present in 36 of these patients (1.7%) and was associated with a 3.8-fold (95% CI, 2.1-6.8; P<0.001) increased odds of myocardial infarction. Of the patients with early-onset myocardial infarction, 359 (17.3%) carried a high polygenic score, associated with a 3.7-fold (95% CI, 3.1-4.6; P<0.001) increased odds. Mean estimated untreated low-density lipoprotein cholesterol was 206 mg/dL in those with a familial hypercholesterolemia mutation, 132 mg/dL in those with high polygenic score, and 122 mg/dL in those in the remainder of the population. 
Although associated with increased risk in all racial groups, high polygenic score demonstrated the strongest association in white participants (P for heterogeneity=0.008). CONCLUSIONS: Both familial hypercholesterolemia mutations and high polygenic score are associated with a >3-fold increased odds of early-onset myocardial infarction. However, high polygenic score has a 10-fold higher prevalence among patients presenting with early-onset myocardial infarction. CLINICAL TRIAL REGISTRATION: URL: https://www.clinicaltrials.gov. Unique identifier: NCT00597922.
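The high-polygenic-score definition above is a percentile cutoff taken from the control distribution and then applied to cases. The Python sketch below shows that two-step logic with hypothetical scores: it finds the top-5% threshold among controls, then reports the prevalence of high scores among cases and an unadjusted odds ratio from the resulting 2×2 table. It assumes no tied scores at the cutoff; the published odds ratios may be covariate-adjusted.

```python
def top_fraction_threshold(control_scores, top_fraction=0.05):
    # Cutoff such that the top `top_fraction` of controls score at or above it (no ties assumed).
    ranked = sorted(control_scores)
    n_top = max(1, round(len(ranked) * top_fraction))
    return ranked[len(ranked) - n_top]


def high_score_odds_ratio(case_scores, control_scores, threshold):
    # 2x2 table: high score (>= threshold) vs the rest, among cases and controls.
    a = sum(s >= threshold for s in case_scores)     # high-score cases
    b = len(case_scores) - a                         # other cases
    c = sum(s >= threshold for s in control_scores)  # high-score controls
    d = len(control_scores) - c                      # other controls
    return a / len(case_scores), (a * d) / (b * c)


controls = list(range(100))                       # hypothetical control scores 0..99
cases = [97, 99, 50, 96, 10, 98, 20, 30, 40, 60]  # hypothetical case scores
threshold = top_fraction_threshold(controls)
prevalence, odds_ratio = high_score_odds_ratio(cases, controls, threshold)
print(threshold, prevalence, round(odds_ratio, 2))
```

Anchoring the cutoff in controls keeps "high score" defined relative to the general population, so its prevalence among cases can be compared directly with the 5% expected by construction.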
Subject(s)
Genetic Predisposition to Disease , Genome, Human , Hyperlipoproteinemia Type II/genetics , Multifactorial Inheritance , Myocardial Infarction/genetics , Aged , Cholesterol, LDL/genetics , Female , Humans , Hyperlipoproteinemia Type II/blood , Male , Middle Aged , Myocardial Infarction/blood , Whole Genome Sequencing
ABSTRACT
High-throughput metabolomics using liquid chromatography and mass spectrometry (LC/MS) provides a useful method to identify biomarkers of disease and explore biological systems. However, the majority of metabolic features detected in untargeted metabolomics experiments have unknown ion signatures, so thorough quality control is critical to avoid analyzing false signals. Here, we present a postalignment method that relies on intermittent pooled study samples to separate genuine metabolic features from potential measurement artifacts. We apply the method to lipid metabolite data from the PREDIMED (PREvención con DIeta MEDiterránea) study to demonstrate clear removal of measurement artifacts. The method is publicly available as the R package MetProc on CRAN under the GPL-v2 license.
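Pooled study samples injected at intervals through a run give a direct handle on measurement reliability. The Python sketch below is not MetProc's algorithm; it shows a common, simpler pooled-QC heuristic aimed at the same goal, dropping features that are mostly undetected across the pooled injections or whose coefficient of variation across those injections exceeds a cutoff. The thresholds (`max_missing`, `max_cv`) and the feature data are hypothetical.

```python
from statistics import mean, stdev


def pooled_qc_filter(pooled_intensities, max_missing=0.5, max_cv=0.3):
    # pooled_intensities maps feature id -> intensities across pooled QC injections
    # (None where the feature was not detected). Returns feature id -> keep?
    keep = {}
    for feature, values in pooled_intensities.items():
        observed = [v for v in values if v is not None]
        missing_rate = 1 - len(observed) / len(values)
        if missing_rate > max_missing or len(observed) < 2:
            keep[feature] = False  # mostly absent from pooled samples: likely an artifact
            continue
        cv = stdev(observed) / mean(observed)
        keep[feature] = cv <= max_cv  # keep only features reproducible across pooled injections
    return keep


features = {
    "lipid_a": [100, 105, 98, 102],
    "artifact_b": [None, None, None, 50],
    "noisy_c": [10, 100, 5, 80],
}
result = pooled_qc_filter(features)
print(result)
```

Because every pooled injection contains the same material, a genuine feature should be detected consistently and at similar intensity in all of them; that expectation is what both this heuristic and pooled-sample methods in general exploit.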
Subject(s)
Biomarkers/metabolism, Lipids/isolation & purification, Metabolomics/methods, Artifacts, Chromatography, Liquid, Lipids/chemistry, Metabolome/genetics, Tandem Mass Spectrometry
ABSTRACT
Background and Purpose- Coagulation factor XI (FXI) is a novel target for antithrombotic therapy, addressed by various therapeutic modalities currently in clinical development. The expected magnitude of thrombotic event reduction achievable by targeting FXI is unclear. Methods- We analyzed the association of 2 common genetic variants that alter FXI levels with a range of human phenotypes. We combined the variants into a genetic score standardized to a 30% increase in relative activated partial thromboplastin time, equivalent to what can be achieved with pharmacological FXI reduction. Using data from 371 695 participants in the United Kingdom Biobank and 2 large-scale genome-wide association studies, we examined the effect of this FXI score on thrombotic and bleeding end points. Results- Genetic predisposition to lower FXI levels was associated with reduced risks of venous thrombosis (odds ratio=0.1; 95% CI, 0.07-0.14; P=3×10⁻⁴³) and ischemic stroke (odds ratio=0.47; 95% CI, 0.36-0.61; P=2×10⁻⁸) but not with major bleeding (odds ratio=0.7; 95% CI, 0.45-1.04; P=0.0739). The observed relative risk reductions were consistent across a range of subgroups at high risk for thrombosis. Accordingly, genetically lower FXI levels conferred larger absolute risk reductions in high-risk subgroups, such as patients with atrial fibrillation. Conclusions- Human genetic data suggest that pharmacological inhibition of FXI may achieve considerable reductions in ischemic stroke risk without clear evidence for an associated risk of major bleeding. The quantitative framework developed can be used to support the estimation of achievable risk reductions with pharmacological modulation of FXI.
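One common way to combine several variants and express the result per fixed change in a biomarker is an inverse-variance-weighted (IVW) Mendelian randomization estimate. Each variant's outcome log odds ratio is divided by its effect on the biomarker, the per-variant ratios are pooled, and the pooled estimate is rescaled to the target change, here 30% higher relative aPTT. This Python sketch assumes that approach; the study's exact score construction may differ. The function name and all input numbers are hypothetical.

```python
import math


def ivw_scaled_odds_ratio(beta_exposure, beta_outcome, se_outcome, scale=0.30, z=1.96):
    """IVW Mendelian randomization estimate, rescaled to a fixed exposure change.

    beta_exposure: per-allele effect of each variant on relative aPTT (0.05 = 5%)
    beta_outcome:  per-allele log odds ratio for the outcome
    se_outcome:    standard error of each log odds ratio
    scale:         exposure change to express the result in (0.30 = 30% aPTT increase)
    Returns (odds ratio, lower CI bound, upper CI bound) for that change.
    """
    num = den = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        ratio = by / bx          # Wald ratio: log OR per unit of exposure
        ratio_se = se / abs(bx)  # first-order standard error of the ratio
        w = 1.0 / ratio_se ** 2  # inverse-variance weight
        num += w * ratio
        den += w
    beta = num / den
    se = math.sqrt(1.0 / den)
    return (math.exp(scale * beta),
            math.exp(scale * (beta - z * se)),
            math.exp(scale * (beta + z * se)))


# Hypothetical inputs for two variants that raise aPTT and lower thrombosis odds.
or_, lo, hi = ivw_scaled_odds_ratio([0.04, 0.06], [-0.25, -0.40], [0.03, 0.04])
print(f"OR per 30% higher aPTT: {or_:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Rescaling happens on the log-odds scale, so the result describes a linear extrapolation of the genetic effect to a drug-sized change. That is an assumption, not a guarantee of pharmacological equivalence.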
Subject(s)
Biological Specimen Banks, Factor XI, Genetic Variation, Hemorrhage, Stroke, Venous Thrombosis, Adult, Cross-Sectional Studies, Factor XI/genetics, Factor XI/metabolism, Female, Genome-Wide Association Study, Hemorrhage/blood, Hemorrhage/genetics, Human Genetics, Humans, Male, Middle Aged, Partial Thromboplastin Time, Risk Factors, Stroke/blood, Stroke/genetics, United Kingdom, Venous Thrombosis/blood, Venous Thrombosis/genetics
ABSTRACT
Importance: Atrial fibrillation (AF) is the most common arrhythmia, affecting 1% of the population. Young individuals with AF have a strong genetic association with the disease, but the mechanisms remain incompletely understood. Objective: To perform large-scale whole-genome sequencing to identify genetic variants related to AF. Design, Setting, and Participants: The National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine Program includes longitudinal and cohort studies that underwent high-depth whole-genome sequencing between 2014 and 2017 in 18 526 individuals from the United States, Mexico, Puerto Rico, Costa Rica, Barbados, and Samoa. This case-control study included 2781 patients with early-onset AF from 9 studies and identified 4959 controls of European ancestry from the remaining participants. Results were replicated in the UK Biobank (346 546 participants) and the MyCode Study (42 782 participants). Exposures: Loss-of-function (LOF) variants in genes at AF loci and common genetic variation across the whole genome. Main Outcomes and Measures: Early-onset AF (defined as AF onset in persons <66 years of age). To account for multiple testing, the significance threshold for the rare variant analysis was P = 4.55 × 10⁻³. Results: Among 2781 participants with early-onset AF (the case group), 72.1% were men, and the mean (SD) age of AF onset was 48.7 (10.2) years. Participants underwent whole-genome sequencing at a mean depth of 37.8-fold and mean genome coverage of 99.1%. At least 1 LOF variant in TTN, the gene encoding the sarcomeric protein titin, was present in 2.1% of case participants compared with 1.1% of control participants (odds ratio [OR], 1.76 [95% CI, 1.04-2.97]).
The proportion of individuals with early-onset AF who carried an LOF variant in TTN increased with earlier age of AF onset (P value for trend, 4.92 × 10⁻⁴), and 6.5% of individuals with AF onset before age 30 carried a TTN LOF variant (OR, 5.94 [95% CI, 2.64-13.35]; P = 1.65 × 10⁻⁵). The association between TTN LOF variants and AF was replicated in an independent study of 1582 patients with early-onset AF (cases) and 41 200 control participants (OR, 2.16 [95% CI, 1.19-3.92]; P = .01). Conclusions and Relevance: In this case-control study, there was a statistically significant association between an LOF variant in the TTN gene and early-onset AF, with the variant present in a small percentage of participants with early-onset AF (the case group). Further research is needed to determine whether this relationship is causal.
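The abstract reports a P value for trend in carrier frequency across age-of-onset groups but does not name the test. The Cochran-Armitage test for trend in proportions is a standard choice for this kind of question. This Python sketch assumes that test, and the age bins and counts in the usage example are hypothetical, not study data.

```python
import math


def cochran_armitage_trend(carriers, totals, scores):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    carriers: number of variant carriers in each ordered group
    totals:   size of each group
    scores:   numeric score per group (e.g. 0, 1, 2, ... for ordered onset-age bins)
    Returns (z statistic, two-sided p value from the normal approximation).
    """
    n = sum(totals)
    p_bar = sum(carriers) / n  # pooled carrier proportion
    # Score-weighted excess of observed over expected carriers
    t = sum(s * (r - m * p_bar) for r, m, s in zip(carriers, totals, scores))
    s_sum = sum(m * s for m, s in zip(totals, scores))
    s2_sum = sum(m * s * s for m, s in zip(totals, scores))
    var = p_bar * (1 - p_bar) * (s2_sum - s_sum ** 2 / n)
    z = t / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical onset-age bins (<30, 30-39, 40-49, 50-65) scored 0..3.
z, p = cochran_armitage_trend(carriers=[6, 10, 15, 20],
                              totals=[100, 300, 900, 1500],
                              scores=[0, 1, 2, 3])
print(f"z = {z:.2f}, p = {p:.2g}")  # negative z: carrier rate falls as onset age rises
```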
Subject(s)
Atrial Fibrillation/genetics, Connectin/genetics, Loss of Function Mutation, Adult, Age of Onset, Case-Control Studies, Female, Genetic Predisposition to Disease, Genome-Wide Association Study, Heterozygote, Humans, Male, Middle Aged, Quality Control
ABSTRACT
The number of HIV cases attributed to heterosexual contact and the proportion of women among HIV-positive individuals have increased worldwide. Russia has the highest rate of newly diagnosed HIV infections in the region, and the infection is spreading beyond traditional risk groups. Although young women are disproportionately affected, knowledge of HIV risk behaviors among women in the general population remains limited. The objectives of this study were to identify patterns of behavior that place women of childbearing age at high risk for HIV transmission and to determine whether sociodemographic characteristics and alcohol use predict the risk pattern. A total of 708 nonpregnant women aged 18 to 44 years who were at risk for an alcohol-exposed pregnancy were enrolled in two regions of Russia. Participants completed a structured interview focused on HIV risk behaviors, including risky sexual behavior and alcohol and drug use. Latent class analysis was used to identify patterns of risk among women and to examine associations between HIV risk and demographic and alcohol use characteristics. Three classes were identified. The first high-risk class (class I; 34.93% of participants) combined their own risk behaviors, e.g., having multiple sexual partners, with high partner risk associated with the partner's drug use. Despite reporting self-perceived risk for HIV/STI, participants in this class were unlikely to use adequate protection (i.e., condoms). The second high-risk class (class II; 13.19% of participants) combined risky sexual behaviors, i.e., multiple sexual partners and having STDs, with partner risk that included the partner's imprisonment and the partner's sex with other women. Participants in this class were likely to use protection/condoms. Finally, 51.88% of participants were at lower risk, which was associated primarily with their partners' risk, and these participants used protection (class III).
The odds of being in class I compared with class III were 3.3 (95% CI [1.06, 10.38]) times higher for those women who had Alcohol Use Disorders Identification Test scores ≥ 8 than those who had lower scores, and were 3.9 (95% CI [1.69, 8.97]) times higher for those who used alcohol before sex than those who did not. In addition, women who drank more days per week were 1.36 times more likely to be in class II than in class III. The study informs prevention by identifying specific population groups and targets for interventions. Alcohol use is a significant predictor and an overarching factor of HIV risk in women. Since at-risk drinking is common among young Russian women, alcohol risk reduction should be an essential component of HIV prevention efforts.
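Latent class analysis groups respondents into unobserved classes from their pattern of yes/no answers. It is typically fitted by expectation-maximization (EM): the E-step computes each respondent's posterior class membership, and the M-step re-estimates class sizes and per-class item probabilities. This is a minimal pure-Python sketch of unconditional LCA with binary indicators, assuming that standard formulation. It omits the covariate (regression) part that the study used to relate class membership to alcohol use. The function name, the two-class setup, and the simulated data are hypothetical.

```python
import math
import random


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model with binary indicators by EM.

    data: list of 0/1 response vectors (one per participant)
    Returns (class weights, item-response probabilities, log-likelihood
    evaluated in the final E-step).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # probs[k][j] = P(item j endorsed | class k); random start breaks symmetry
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)] for _ in range(n_classes)]
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent
        resp, loglik = [], 0.0
        for x in data:
            joint = []
            for k in range(n_classes):
                p = weights[k]
                for j, xj in enumerate(x):
                    p *= probs[k][j] if xj else 1.0 - probs[k][j]
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            resp.append([pk / total for pk in joint])
        # M-step: update class sizes and item-response probabilities
        for k in range(n_classes):
            nk = max(sum(r[k] for r in resp), 1e-12)  # guard against an empty class
            weights[k] = nk / len(data)
            for j in range(n_items):
                endorsed = sum(r[k] * x[j] for r, x in zip(resp, data))
                # clamp away from 0 and 1 to keep the likelihood finite
                probs[k][j] = min(max(endorsed / nk, 1e-6), 1 - 1e-6)
    return weights, probs, loglik


# Simulated responses to 5 hypothetical risk items from two latent groups.
rng = random.Random(1)
data = []
for _ in range(700):
    p = 0.8 if rng.random() < 0.35 else 0.1  # higher- vs lower-risk group
    data.append([1 if rng.random() < p else 0 for _ in range(5)])

weights, probs, loglik = fit_lca(data, n_classes=2)
print([round(w, 2) for w in weights], round(loglik, 1))
```

In practice, the number of classes is chosen by comparing fit criteria (such as BIC) across models, and EM is restarted from several seeds because it can stop at a local optimum.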