ABSTRACT
Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.
Subject(s)
Epigenomics , Immune System Diseases/genetics , Monocytes/metabolism , Neutrophils/metabolism , T-Lymphocytes/metabolism , Transcription, Genetic , Adult , Aged , Alternative Splicing , Female , Genetic Predisposition to Disease , Hematopoietic Stem Cells/metabolism , Histone Code , Humans , Male , Middle Aged , Quantitative Trait Loci , Young Adult
ABSTRACT
Accurate predictive models of future disease onset are crucial for effective preventive healthcare, yet longitudinal data sets linking early risk factors to subsequent health outcomes are limited. To overcome this challenge, we introduce a novel framework, Predictive Risk modeling using Mendelian Randomization (PRiMeR), which utilizes genetic effects as supervisory signals to learn disease risk predictors without relying on longitudinal data. To do so, PRiMeR leverages risk factors and genetic data from a healthy cohort, along with results from genome-wide association studies of diseases of interest. After training, the learned predictor can be used to assess risk for new patients solely based on risk factors. We validate PRiMeR through comprehensive simulations and in future type 2 diabetes predictions in UK Biobank participants without diabetes, using follow-up onset labels for validation. Moreover, we apply PRiMeR to predict future Alzheimer's disease onset from brain imaging biomarkers and future Parkinson's disease onset from accelerometer-derived traits. Overall, with PRiMeR we offer a new perspective in predictive modeling, showing it is possible to learn risk predictors leveraging genetics rather than longitudinal data.
Subject(s)
Alzheimer Disease , Diabetes Mellitus, Type 2 , Genome-Wide Association Study , Mendelian Randomization Analysis , Humans , Mendelian Randomization Analysis/methods , Diabetes Mellitus, Type 2/genetics , Genome-Wide Association Study/methods , Alzheimer Disease/genetics , Risk Factors , Genetic Predisposition to Disease , Parkinson Disease/genetics , Risk Assessment/methods , Polymorphism, Single Nucleotide
ABSTRACT
Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.
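COAST itself integrates several burden- and SKAT-type models; the sketch below illustrates only the simplest ingredient of the allelic-series idea: a per-subject burden score whose weights grow with assumed mutational severity, regressed on a quantitative trait. The three variant classes, the weights 1/2/3, and the function names are illustrative assumptions, covariates are omitted, and this is not the COAST implementation.

```python
import math
from typing import Sequence

# Hypothetical severity weights, ordered by assumed deleteriousness:
# benign missense < damaging missense < protein-truncating.
SEVERITY_WEIGHTS = (1.0, 2.0, 3.0)


def burden_scores(counts: Sequence[Sequence[int]],
                  weights: Sequence[float] = SEVERITY_WEIGHTS) -> list[float]:
    """Per-subject score: severity-weighted sum of rare-allele counts per variant class."""
    return [sum(w * c for w, c in zip(weights, row)) for row in counts]


def burden_test(scores: Sequence[float], trait: Sequence[float]) -> tuple[float, float]:
    """Regress the trait on the burden score by simple OLS (no covariates).

    Returns (slope, two-sided p-value). The p-value uses a normal
    approximation to the t statistic, so it is only sensible for large n.
    """
    n = len(scores)
    if n != len(trait) or n < 3:
        raise ValueError("need matching inputs with at least 3 subjects")
    mean_x = sum(scores) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("burden score has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(scores, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return slope, 0.0  # perfect fit: report as maximally significant
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return slope, p


if __name__ == "__main__":
    # Toy data: columns are counts of (benign missense, damaging missense, PTV).
    counts = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
              [2, 0, 0], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
    trait = [0.1, 0.4, 1.2, 1.4, 0.9, -0.2, 1.6, 0.0]
    print(burden_test(burden_scores(counts), trait))
```

Fixing the weights in advance encodes the monotone "more severe, larger effect" hypothesis directly; a gene without an allelic series would tend to fit such a score worse than an unweighted burden.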
Subject(s)
Genetic Variation , Lipids , Computer Simulation , Genetic Association Studies , Phenotype , Genome-Wide Association Study
ABSTRACT
This corrects the article DOI: 10.1038/nature22403.
ABSTRACT
Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5-46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. Finally, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.
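The abstract does not say how the between-individual share of variance was estimated. A simple textbook way to estimate such a fraction when several lines come from each donor is the one-way random-effects ANOVA (intraclass correlation); the sketch below uses that swapped-in estimator under an assumed balanced design (same number of lines per donor) and is not necessarily the study's method, which may use mixed models with additional covariates.

```python
from statistics import fmean
from typing import Sequence


def between_donor_fraction(groups: Sequence[Sequence[float]]) -> float:
    """Fraction of phenotype variance attributable to donor, ICC(1) from one-way ANOVA.

    `groups` holds one list of measurements per donor (e.g. one value per
    iPS cell line). Assumes a balanced design: every donor has k >= 2 lines.
    """
    a = len(groups)
    k = len(groups[0]) if groups else 0
    if a < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 donors, each with the same number (>= 2) of lines")
    grand = fmean(v for g in groups for v in g)
    means = [fmean(g) for g in groups]
    # Mean squares between donors and within donors (between lines of one donor).
    ms_between = k * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (a * (k - 1))
    # Method-of-moments donor variance, clipped at zero when it comes out negative.
    sigma2_donor = max(0.0, (ms_between - ms_within) / k)
    total = sigma2_donor + ms_within
    return sigma2_donor / total if total > 0 else 0.0
```

For example, three donors with two lines each whose values cluster tightly by donor give a fraction near 1, while donors with identical line distributions give 0.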
Subject(s)
Genetic Variation/genetics , Induced Pluripotent Stem Cells/metabolism , Cells, Cultured , Cellular Reprogramming/genetics , DNA Copy Number Variations/genetics , Gene Expression Regulation/genetics , Genotype , Humans , Organ Specificity , Phenotype , Quality Control , Quantitative Trait Loci/genetics , Transcriptome/genetics
ABSTRACT
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Sequence Deletion/genetics
ABSTRACT
Joint genetic models for multiple traits have helped to enhance association analyses. Most existing multi-trait models have been designed to increase power for detecting associations, whereas the analysis of interactions has received considerably less attention. Here, we propose iSet, a method based on linear mixed models to test for interactions between sets of variants and environmental states or other contexts. Our model generalizes previous interaction tests and in particular provides a test for local differences in the genetic architecture between contexts. We first use simulations to validate iSet before applying the model to the analysis of genotype-environment interactions in an eQTL study. Our model retrieves a larger number of interactions than alternative methods and reveals that up to 20% of cases show context-specific configurations of causal variants. Finally, we apply iSet to test for subgroup-specific genetic effects in human lipid levels in a large human cohort, where we identify a gene-sex interaction for C-reactive protein that is missed by alternative methods.
Subject(s)
Epistasis, Genetic , Gene-Environment Interaction , Genome-Wide Association Study , Quantitative Trait Loci/genetics , C-Reactive Protein/genetics , Genotype , Humans , Models, Genetic , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide
ABSTRACT
Assessing the impact of the social environment on health and disease is challenging. As social effects are in part determined by the genetic makeup of social partners, they can be studied from associations between genotypes of one individual and phenotype of another (social genetic effects, SGE, also called indirect genetic effects). For the first time we quantified the contribution of SGE to more than 100 organismal phenotypes and genome-wide gene expression measured in laboratory mice. We find that genetic variation in cage mates (i.e. SGE) contributes to variation in organismal and molecular measures related to anxiety, wound healing, immune function, and body weight. Social genetic effects explained up to 29% of phenotypic variance, and for several traits their contribution exceeded that of direct genetic effects (effects of an individual's genotypes on its own phenotype). Importantly, we show that ignoring SGE can severely bias estimates of direct genetic effects (heritability). Thus SGE may be an important source of "missing heritability" in studies of complex traits in human populations. In summary, our study uncovers an important contribution of the social environment to phenotypic variation, sets the basis for using SGE to dissect social effects, and identifies an opportunity to improve studies of direct genetic effects.
Subject(s)
Gene-Environment Interaction , Genetic Predisposition to Disease/genetics , Genetic Variation , Social Environment , Animals , Body Weight/genetics , Genotype , Immunity/genetics , Mice , Mice, Inbred C57BL , Mice, Inbred DBA , Quantitative Trait, Heritable , Wound Healing/genetics
ABSTRACT
Set tests are a powerful approach for genome-wide association testing between groups of genetic variants and quantitative traits. We describe mtSet (http://github.com/PMBio/limix), a mixed-model approach that enables joint analysis across multiple correlated traits while accounting for population structure and relatedness. mtSet effectively combines the benefits of set tests with multi-trait modeling and is computationally efficient, enabling genetic analysis of large cohorts (up to 500,000 individuals) and multiple traits.
Subject(s)
Computational Biology/methods , Algorithms , Alleles , Animals , Calibration , Computer Simulation , Data Interpretation, Statistical , Gene Frequency , Genetic Variation , Genome-Wide Association Study , Humans , Internet , Leukocytes/cytology , Models, Statistical , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Rats , Regression Analysis , Reproducibility of Results , Software
ABSTRACT
BACKGROUND: The phenotype of an individual can be affected not only by the individual's own genotypes, known as direct genetic effects (DGE), but also by genotypes of interacting partners, indirect genetic effects (IGE). IGE have been detected using polygenic models in multiple species, including laboratory mice and humans. However, the underlying mechanisms remain largely unknown. Genome-wide association studies of IGE (igeGWAS) can point to IGE genes, but have not yet been applied to non-familial IGE arising from "peers" and affecting biomedical phenotypes. In addition, the extent to which igeGWAS will identify loci not identified by dgeGWAS remains an open question. Finally, findings from igeGWAS have not been confirmed by experimental manipulation. RESULTS: We leverage a dataset of 170 behavioral, physiological, and morphological phenotypes measured in 1812 genetically heterogeneous laboratory mice to study IGE arising between same-sex, adult, unrelated mice housed in the same cage. We develop and apply methods for igeGWAS in this context and identify 24 significant IGE loci for 17 phenotypes (FDR < 10%). We observe no overlap between IGE loci and DGE loci for the same phenotype, which is consistent with the moderate genetic correlations between DGE and IGE for the same phenotype estimated using polygenic models. Finally, we fine-map seven significant IGE loci to individual genes and find supportive evidence in an experiment with a knockout model that Epha4 gives rise to IGE on stress-coping strategy and wound healing. CONCLUSIONS: Our results demonstrate the potential for igeGWAS to identify IGE genes and shed light on the mechanisms of peer influence.
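The core idea of an igeGWAS is to relate each animal's phenotype to its cage mates' genotypes rather than its own. A minimal sketch of that regressor, assuming one SNP coded as allele dosage (0/1/2), cage labels, and the hypothetical choice of the cage mates' mean dosage as the IGE predictor; it omits the direct genetic effect, polygenic background, and cage covariates that a real igeGWAS would need to model, and is not the authors' method.

```python
from collections import defaultdict
from typing import Sequence


def cage_mate_mean_genotype(genotypes: Sequence[float], cages: Sequence[str]) -> list[float]:
    """For each animal, mean allele dosage of the OTHER animals in its cage."""
    totals: dict[str, float] = defaultdict(float)
    sizes: dict[str, int] = defaultdict(int)
    for g, c in zip(genotypes, cages):
        totals[c] += g
        sizes[c] += 1
    out = []
    for g, c in zip(genotypes, cages):
        if sizes[c] < 2:
            raise ValueError(f"cage {c!r} has no cage mates")
        # Remove the focal animal's own dosage before averaging.
        out.append((totals[c] - g) / (sizes[c] - 1))
    return out


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of y on x by ordinary least squares (single predictor plus intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx
```

For two cages of three mice with dosages `[0, 2, 2]` and `[1, 1, 0]`, the cage-mate means are `[2.0, 1.0, 1.0, 0.5, 0.5, 1.0]`; the slope of phenotype on these values is a crude per-SNP IGE estimate.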
Subject(s)
Gene-Environment Interaction , Genotype , Multifactorial Inheritance , Phenotype , Receptor, EphA4/genetics , Stress, Physiological/genetics , Animals , Datasets as Topic , Female , Gene Expression , Genetic Heterogeneity , Genome-Wide Association Study , Humans , Male , Mice , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Receptor, EphA4/metabolism , Wound Healing/genetics
ABSTRACT
Different exposures, including diet, physical activity, or external conditions can contribute to genotype-environment interactions (G×E). Although high-dimensional environmental data are increasingly available and multiple exposures have been implicated in G×E at the same loci, multi-environment tests for G×E are not established. Here, we propose the structured linear mixed model (StructLMM), a computationally efficient method to identify and characterize loci that interact with one or more environments. After validating our model using simulations, we applied StructLMM to body mass index in the UK Biobank, where our model yields previously known and novel G×E signals. Finally, in an application to a large blood eQTL dataset, we demonstrate that StructLMM can be used to study interactions with hundreds of environmental variables.
Subject(s)
Gene-Environment Interaction , Algorithms , Computer Simulation , Environment , Genotype , Humans , Linear Models , Models, Genetic , Quantitative Trait Loci/genetics
ABSTRACT
Patients with seemingly the same tumour can respond very differently to treatment. There are strong, well-established effects of somatic mutations on drug efficacy, but there is at most anecdotal evidence of a germline component to drug response. Here, we report a systematic survey of how inherited germline variants affect drug susceptibility in cancer cell lines. We develop a joint analysis approach that leverages both germline and somatic variants, before applying it to screening data from 993 cell lines and 265 drugs. Surprisingly, we find that the germline contribution to variation in drug susceptibility can be as large as, or larger than, effects due to somatic mutations. Several of the associations identified have a direct relationship to the drug target. Finally, using 17-AAG response as an example, we show how germline effects in combination with transcriptomic data can be leveraged for improved patient stratification and to identify new markers for drug sensitivity.
Subject(s)
Drug Screening Assays, Antitumor , Germ Cells/metabolism , Neoplasms/genetics , Benzoquinones/metabolism , Cell Line, Tumor , Germ-Line Mutation/genetics , Humans , Lactams, Macrocyclic/metabolism , Quantitative Trait Loci/genetics
ABSTRACT
From whole organisms to individual cells, responses to environmental conditions are influenced by genetic makeup, where the effect of genetic variation on a trait depends on the environmental context. RNA sequencing quantifies gene expression as a molecular trait and can capture both genetic and environmental effects. In this study, we explore opportunities for using allele-specific expression (ASE) to discover cis-acting genotype-environment interactions (GxE): genetic effects on gene expression that depend on an environmental condition. Treating 17 common clinical traits as approximations of the cellular environment of 267 skeletal muscle biopsies, we identify 10 candidate environmental response expression quantitative trait loci (reQTLs) across 6 traits (12 unique gene-environment trait pairs; 10% FDR per trait), including sex, systolic blood pressure, and low-density lipoprotein cholesterol. Although using ASE is in principle a promising approach to detect GxE effects, replication of such signals can be challenging, as validation requires harmonization of environmental traits across cohorts and sufficient sampling of heterozygotes for a transcribed SNP. Comprehensive discovery and replication will require large human transcriptome datasets, or the integration of multiple transcribed SNPs, coupled with standardized clinical phenotyping.
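ASE analyses start from reference and alternative read counts at a heterozygous transcribed SNP. The sketch below shows two basic building blocks under simplifying assumptions: an exact two-sided binomial test for allelic imbalance in one sample (expected ratio 0.5), and a Pearson correlation between per-sample allelic ratios and a clinical trait across heterozygotes as a crude GxE screen. It ignores reference-mapping bias and overdispersion (real pipelines often use beta-binomial models), assumes allelic ratios are oriented consistently across samples (e.g. relative to a phased regulatory variant), and is not the study's method.

```python
import math
from typing import Sequence


def binom_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    probs = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the reference allele at a heterozygous SNP."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads at this SNP")
    return ref_reads / total


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between allelic ratio and an environmental trait."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in input")
    return sxy / math.sqrt(sxx * syy)
```

With 10 of 10 reads on the reference allele, `binom_two_sided(10, 10)` gives 2/1024; a balanced 5 of 10 gives 1.0.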
Subject(s)
Cellular Microenvironment , Gene Expression Regulation , Gene-Environment Interaction , Genetic Variation , Muscle Fibers, Skeletal/metabolism , Muscle, Skeletal/metabolism , Energy Metabolism , Genetic Association Studies , Genotype , Humans , Muscle, Skeletal/cytology , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci
ABSTRACT
BACKGROUND: A healthy immune system requires immune cells that adapt rapidly to environmental challenges. This phenotypic plasticity can be mediated by transcriptional and epigenetic variability. RESULTS: We apply a novel analytical approach to measure and compare transcriptional and epigenetic variability genome-wide across CD14+CD16- monocytes, CD66b+CD16+ neutrophils, and CD4+CD45RA+ naïve T cells from the same 125 healthy individuals. We discover substantially increased variability in neutrophils compared to monocytes and T cells. In neutrophils, genes with hypervariable expression are found to be implicated in key immune pathways and are associated with cellular properties and environmental exposure. We also observe increased sex-specific gene expression differences in neutrophils. Neutrophil-specific DNA methylation hypervariable sites are enriched at dynamic chromatin regions and active enhancers. CONCLUSIONS: Our data highlight the importance of transcriptional and epigenetic variability for the key role of neutrophils as the first responders to inflammatory stimuli. We provide a resource to enable further functional studies into the plasticity of immune cells, which can be accessed from: http://blueprint-dev.bioinfo.cnio.es/WP10/hypervariability .
Subject(s)
Epigenesis, Genetic , Gene Expression Regulation , Genome-Wide Association Study , Immune System/cytology , Immune System/metabolism , Transcription, Genetic , Cluster Analysis , CpG Islands , DNA Methylation , Female , Gene Expression Profiling , Gene Regulatory Networks , Genetic Variation , Humans , Immune System/immunology , Male , Neutrophils/metabolism , Organ Specificity/genetics , Sex Factors
ABSTRACT
Estrogen-responsive breast cancer cell lines have been extensively studied to characterize transcriptional patterns in hormone-responsive tumors. Nevertheless, due to current technological limitations, genome-wide studies have typically been limited to population-averaged data. Here we obtain, for the first time, a characterization at the single-cell level of the states and expression signatures of a hormone-starved MCF-7 cell system responding to estrogen. To do so, we employ a recently proposed model that allows for dissecting single-cell states from time-course microarray data. We show that within 32 hours following stimulation, MCF-7 cells traverse, most likely, six states, with a faster early response followed by a progressive deceleration. We also derive the genome-wide transcriptional profiles of such single-cell states and their functional characterization. Our results support a scenario where estrogen promotes cell cycle progression by controlling multiple, sequential regulatory steps, whose single-cell events are here identified.