ABSTRACT
The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.
Subject(s)
Diabetes Mellitus, Type 2/genetics; Genetic Predisposition to Disease/genetics; Genetic Variation/genetics; Alleles; DNA Mutational Analysis; Europe/ethnology; Exome; Genome-Wide Association Study; Genotyping Techniques; Humans; Sample Size
ABSTRACT
We integrate comeasured gene expression and DNA methylation (DNAme) in 265 human skeletal muscle biopsies from the FUSION study with >7 million genetic variants and eight physiological traits: height, waist, weight, waist-hip ratio, body mass index, fasting serum insulin, fasting plasma glucose, and type 2 diabetes. We find hundreds of genes and DNAme sites associated with fasting insulin, waist, and body mass index, as well as thousands of DNAme sites associated with gene expression (eQTM). We find that controlling for heterogeneity in tissue/muscle fiber type reduces the number of physiological trait associations, and that long-range eQTMs (>1 Mb) are reduced when controlling for tissue/muscle fiber type or latent factors. We map genetic regulators (quantitative trait loci; QTLs) of expression (eQTLs) and DNAme (mQTLs). Using Mendelian randomization (MR) and mediation techniques, we leverage these genetic maps to predict 213 causal relationships between expression and DNAme, approximately two-thirds of which predict methylation to causally influence expression. We use MR to integrate FUSION mQTLs, FUSION eQTLs, and GTEx eQTLs for 48 tissues with genetic associations for 534 diseases and quantitative traits. We identify hundreds of genes and thousands of DNAme sites that may drive the reported disease/quantitative trait genetic associations. We identify 300 gene expression MR associations that are present in both FUSION and GTEx skeletal muscle and that show stronger evidence of MR association in skeletal muscle than other tissues, which may partially reflect differences in power across tissues. As one example, we find that increased RXRA muscle expression may decrease lean tissue mass.
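As a rough illustration of the Mendelian randomization step, the sketch below computes a single-instrument Wald ratio: the variant's effect on the outcome divided by its effect on the exposure, with a first-order delta-method standard error. The abstract does not say which MR estimator was used, so the choice of the Wald ratio, the function name `wald_ratio`, and all numbers are assumptions for illustration only.

```python
import math


def wald_ratio(beta_exposure, se_exposure, beta_outcome, se_outcome):
    """Return (estimate, standard error) for a single-variant Wald ratio.

    beta_exposure: variant effect on the exposure (e.g. gene expression).
    beta_outcome:  variant effect on the outcome (e.g. a trait).
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    # Delta-method variance, ignoring any covariance between the two
    # association estimates (a simplifying assumption).
    variance = (se_outcome ** 2) / beta_exposure ** 2 + (
        (beta_outcome ** 2) * (se_exposure ** 2) / beta_exposure ** 4
    )
    return estimate, math.sqrt(variance)


# Hypothetical summary statistics, not taken from the study.
estimate, se = wald_ratio(
    beta_exposure=0.5, se_exposure=0.05, beta_outcome=-0.1, se_outcome=0.02
)
print(f"causal estimate = {estimate:.3f} (SE {se:.3f})")
```

A negative estimate here would be read as higher exposure lowering the outcome, in the same direction as the RXRA example in the abstract, but only under the usual instrumental-variable assumptions.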
Subject(s)
DNA Methylation/genetics; Gene Expression/genetics; Muscle, Skeletal; Blood Glucose/analysis; Body Weights and Measures; Diabetes Mellitus, Type 2; Genome-Wide Association Study/methods; Genomics/methods; Humans; Insulin/analysis; Muscle, Skeletal/chemistry; Muscle, Skeletal/physiology; Quantitative Trait Loci/genetics
ABSTRACT
Patients with classic hydroa vacciniforme-like lymphoproliferative disorder (HVLPD) typically have high levels of Epstein-Barr virus (EBV) DNA in T cells and/or natural killer (NK) cells in blood and skin lesions induced by sun exposure that are infiltrated with EBV-infected lymphocytes. HVLPD is very rare in the United States and Europe but more common in Asia and South America. The disease can progress to a systemic form that may result in fatal lymphoma. We report our 11-year experience with 16 HVLPD patients from the United States and England and found that whites were less likely to develop systemic EBV disease (1/10) than nonwhites (5/6). All (10/10) of the white patients were generally in good health at last follow-up, while two-thirds (4/6) of the nonwhite patients required hematopoietic stem cell transplantation. Nonwhite patients had later age of onset of HVLPD than white patients (median age, 8 vs 5 years) and higher levels of EBV DNA (median, 1 515 000 vs 250 000 copies/ml) and more often had low numbers of NK cells (83% vs 50% of patients) and T-cell clones in the blood (83% vs 30% of patients). RNA-sequencing analysis of an HVLPD skin lesion in a white patient compared with his normal skin showed increased expression of interferon-γ and chemokines that attract T cells and NK cells. Thus, white patients with HVLPD were less likely to have systemic disease with EBV and had a much better prognosis than nonwhite patients. This trial was registered at www.clinicaltrials.gov as #NCT00369421 and #NCT00032513.
Subject(s)
Epstein-Barr Virus Infections/pathology; Hydroa Vacciniforme/virology; Lymphoproliferative Disorders/pathology; Lymphoproliferative Disorders/virology; Child; Child, Preschool; Epstein-Barr Virus Infections/ethnology; Epstein-Barr Virus Infections/immunology; Female; Humans; Lymphoproliferative Disorders/ethnology; Male; White People
ABSTRACT
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Subject(s)
Genetic Variation/genetics; Genome, Human/genetics; Physical Chromosome Mapping; Amino Acid Sequence; Genetic Predisposition to Disease; Genetics, Medical; Genetics, Population; Genome-Wide Association Study; Genomics; Genotype; Haplotypes/genetics; Homozygote; Humans; Molecular Sequence Data; Mutation Rate; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; Sequence Analysis, DNA; Sequence Deletion/genetics
ABSTRACT
A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant cis-expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.
Subject(s)
Diabetes Mellitus, Type 2/genetics; Genetic Predisposition to Disease/genetics; Genetic Variation; Mexican Americans/genetics; Diabetes Mellitus, Type 2/ethnology; Diabetes Mellitus, Type 2/pathology; Family Health; Female; Gene Frequency; Genetic Predisposition to Disease/ethnology; Genome-Wide Association Study/methods; Genotype; Humans; Male; Pedigree; Phenotype; Quantitative Trait Loci/genetics; Whole Genome Sequencing/methods
ABSTRACT
Comprehensive metabolite profiling captures many highly heritable traits, including amino acid levels, which are potentially sensitive biomarkers for disease pathogenesis. To better understand the contribution of genetic variation to amino acid levels, we performed single variant and gene-based tests of association between nine serum amino acids (alanine, glutamine, glycine, histidine, isoleucine, leucine, phenylalanine, tyrosine, and valine) and 16.6 million genotyped and imputed variants in 8545 non-diabetic Finnish men from the METabolic Syndrome In Men (METSIM) study with replication in Northern Finland Birth Cohort (NFBC1966). We identified five novel loci associated with amino acid levels (P < 5×10⁻⁸): LOC157273/PPP1R3B with glycine (rs9987289, P = 2.3×10⁻²⁶); ZFHX3 (chr16:73326579, minor allele frequency (MAF) = 0.42%, P = 3.6×10⁻⁹), LIPC (rs10468017, P = 1.5×10⁻⁸), and WWOX (rs9937914, P = 3.8×10⁻⁸) with alanine; and TRIB1 with tyrosine (rs28601761, P = 8×10⁻⁹). Gene-based tests identified two novel genes harboring missense variants of MAF < 1% that show aggregate association with amino acid levels: PYCR1 with glycine (Pgene = 1.5×10⁻⁶) and BCAT2 with valine (Pgene = 7.4×10⁻⁷); neither gene was implicated by single variant association tests. These findings are among the first applications of gene-based tests to identify new loci for amino acid levels. In addition to the seven novel gene associations, we identified five independent signals at established amino acid loci, including two rare variant signals at GLDC (rs138640017, MAF = 0.95%, Pconditional = 5.8×10⁻⁴⁰) with glycine levels and HAL (rs141635447, MAF = 0.46%, Pconditional = 9.4×10⁻¹¹) with histidine levels. Examination of all single variant association results in our data revealed a strong inverse relationship between effect size and MAF (Ptrend < 0.001). These novel signals provide further insight into the molecular mechanisms of amino acid metabolism and potentially, their perturbations in disease.
Subject(s)
Amino Acids/metabolism; Genome-Wide Association Study/methods; Finland; Gene Frequency/genetics; Genotype; Humans; Male; Middle Aged
ABSTRACT
Subcutaneous adipose tissue stores excess lipids and maintains energy balance. We performed expression quantitative trait locus (eQTL) analyses by using abdominal subcutaneous adipose tissue of 770 extensively phenotyped participants of the METSIM study. We identified cis-eQTLs for 12,400 genes at a 1% false-discovery rate. Among approximately 680 known genome-wide association study (GWAS) loci for cardio-metabolic traits, we identified 140 coincident cis-eQTLs at 109 GWAS loci, including 93 eQTLs not previously described. At 49 of these 140 eQTLs, gene expression was nominally associated (p < 0.05) with levels of the GWAS trait. The size of our dataset enabled identification of five loci associated (p < 5 × 10⁻⁸) with at least five genes located >5 Mb away. These trans-eQTL signals confirmed and extended the previously reported KLF14-mediated network to 55 target genes, validated the CIITA regulation of class II MHC genes, and identified ZNF800 as a candidate master regulator. Finally, we observed similar expression-clinical trait correlations of genes associated with GWAS loci in both humans and a panel of genetically diverse mice. These results provide candidate genes for further investigation of their potential roles in adipose biology and in regulating cardio-metabolic traits.
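The distinction between local and distal signals can be sketched as a simple distance rule. The 5 Mb cutoff and the five-gene minimum come from the abstract; treating a variant and gene on different chromosomes as distal, the tuple layout, and the function names are assumptions, not the authors' pipeline.

```python
from collections import defaultdict

TRANS_MIN_DISTANCE_BP = 5_000_000  # ">5 Mb away", as stated in the abstract


def is_trans_pair(variant_chrom, variant_pos, gene_chrom, gene_tss):
    """True if the gene lies on another chromosome or more than 5 Mb away."""
    if variant_chrom != gene_chrom:
        return True  # assumption: inter-chromosomal pairs count as trans
    return abs(variant_pos - gene_tss) > TRANS_MIN_DISTANCE_BP


def trans_hub_variants(associations, min_genes=5):
    """Return variants associated in trans with at least `min_genes` genes.

    associations: iterable of
        (variant_id, variant_chrom, variant_pos, gene, gene_chrom, gene_tss)
    assumed to be already filtered for significance.
    """
    genes_by_variant = defaultdict(set)
    for variant_id, v_chrom, v_pos, gene, g_chrom, g_tss in associations:
        if is_trans_pair(v_chrom, v_pos, g_chrom, g_tss):
            genes_by_variant[variant_id].add(gene)
    return {
        variant: sorted(genes)
        for variant, genes in genes_by_variant.items()
        if len(genes) >= min_genes
    }


# Toy associations with hypothetical identifiers and coordinates.
toy = [("v1", "chr7", 130_000_000, f"g{i}", f"chr{i}", 1_000_000) for i in range(1, 6)]
toy.append(("v1", "chr7", 130_000_000, "nearby", "chr7", 130_100_000))  # local pair
toy.append(("v2", "chr1", 1_000_000, "gA", "chr1", 20_000_000))
toy.append(("v2", "chr1", 1_000_000, "gB", "chr3", 5_000_000))
hubs = trans_hub_variants(toy)
print(hubs)
```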
Subject(s)
Cardiovascular Diseases/genetics; Gene Expression Regulation; Metabolic Syndrome/genetics; Quantitative Trait Loci; Subcutaneous Fat/metabolism; Aged; Animals; Databases, Genetic; Gene Expression Profiling; Genome-Wide Association Study; Genotyping Techniques; Humans; Male; Mice; Middle Aged; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; Phenotype; Reproducibility of Results; Trans-Activators/genetics; Trans-Activators/metabolism
ABSTRACT
Genome-wide association studies (GWAS) have identified >100 independent SNPs that modulate the risk of type 2 diabetes (T2D) and related traits. However, the pathogenic mechanisms of most of these SNPs remain elusive. Here, we examined genomic, epigenomic, and transcriptomic profiles in human pancreatic islets to understand the links between genetic variation, chromatin landscape, and gene expression in the context of T2D. We first integrated genome and transcriptome variation across 112 islet samples to produce dense cis-expression quantitative trait loci (cis-eQTL) maps. Additional integration with chromatin-state maps for islets and other diverse tissue types revealed that cis-eQTLs for islet-specific genes are specifically and significantly enriched in islet stretch enhancers. High-resolution chromatin accessibility profiling using assay for transposase-accessible chromatin sequencing (ATAC-seq) in two islet samples enabled us to identify specific transcription factor (TF) footprints embedded in active regulatory elements, which are highly enriched for islet cis-eQTL. Aggregate allelic bias signatures in TF footprints enabled us to reconstruct TF binding affinities genetically de novo, which supports the high quality of the TF footprint predictions. Interestingly, we found that T2D GWAS loci were strikingly and specifically enriched in islet Regulatory Factor X (RFX) footprints. Remarkably, within and across independent loci, T2D risk alleles that overlap with RFX footprints uniformly disrupt the RFX motifs at high-information content positions. Together, these results suggest that common regulatory variations have shaped islet TF footprints and the transcriptome and that a confluent RFX regulatory grammar plays a significant role in the genetic component of T2D predisposition.
Subject(s)
Diabetes Mellitus, Type 2/genetics; Genetic Predisposition to Disease; Genome, Human; Islets of Langerhans/metabolism; Quantitative Trait Loci; Transcriptome; Alleles; Base Sequence; Binding Sites; Chromatin/chemistry; Chromatin/metabolism; Diabetes Mellitus, Type 2/metabolism; Diabetes Mellitus, Type 2/pathology; Epigenesis, Genetic; Gene Expression Profiling; Genetic Variation; Genome-Wide Association Study; Genomic Imprinting; Humans; Islets of Langerhans/pathology; Polymorphism, Single Nucleotide; Protein Binding; Protein Isoforms/genetics; Protein Isoforms/metabolism; Regulatory Factor X Transcription Factors/genetics; Regulatory Factor X Transcription Factors/metabolism
ABSTRACT
Lipid and lipoprotein subclasses are associated with metabolic and cardiovascular diseases, yet the genetic contributions to variability in subclass traits are not fully understood. We conducted single-variant and gene-based association tests between 15.1M variants from genome-wide and exome array and imputed genotypes and 72 lipid and lipoprotein traits in 8,372 Finns. After accounting for 885 variants at 157 previously identified lipid loci, we identified five novel signals near established loci at HIF3A, ADAMTS3, PLTP, LCAT, and LIPG. Four of the signals were identified with a low-frequency (0.005 […]
Subject(s)
Gene Frequency/genetics; Lipid Metabolism/genetics; Lipids/genetics; Lipoproteins/genetics; Polymorphism, Single Nucleotide/genetics; Triglycerides/genetics; White People/genetics; Cholesterol, HDL/genetics; Exome/genetics; Finland; Genome-Wide Association Study/methods; Genotype; Humans; Male; Middle Aged; Principal Component Analysis/methods
ABSTRACT
BACKGROUND: Bisulfite sequencing is widely employed to study the role of DNA methylation in disease; however, the data suffer from biases due to coverage depth variability. Imputation of methylation values at low-coverage sites may mitigate these biases while also identifying important genomic features associated with predictive power. RESULTS: Here we describe BoostMe, a method for imputing low-quality DNA methylation estimates within whole-genome bisulfite sequencing (WGBS) data. BoostMe uses a gradient boosting algorithm, XGBoost, and leverages information from multiple samples for prediction. We find that BoostMe outperforms existing algorithms in speed and accuracy when applied to WGBS of human tissues. Furthermore, we show that imputation improves concordance between WGBS and the MethylationEPIC array at low WGBS depth, suggesting improved WGBS accuracy after imputation. CONCLUSIONS: Our findings support the use of BoostMe as a preprocessing step for WGBS analysis.
Subject(s)
Computational Biology/methods; DNA Methylation/drug effects; Sulfites/pharmacology; Whole Genome Sequencing; Algorithms; High-Throughput Nucleotide Sequencing; Humans
ABSTRACT
BACKGROUND: Hutchinson-Gilford progeria syndrome (HGPS) is a fatal sporadic autosomal dominant premature ageing disease caused by single base mutations that optimise a cryptic splice site within exon 11 of the LMNA gene. The resultant disease-causing protein, progerin, acts as a dominant negative. Disease severity relies partly on progerin levels. METHODS AND RESULTS: We report a novel form of somatic mosaicism, where a child possessed two cell populations with different HGPS disease-producing mutations of the same nucleotide, one producing severe HGPS and one mild HGPS. The proband possessed an intermediate phenotype. The mosaicism was initially discovered when Sanger sequencing showed a c.1968+2T>A mutation in blood DNA and a c.1968+2T>C mutation in DNA from cultured fibroblasts. Deep sequencing of DNA from the proband's blood revealed 4.7% c.1968+2T>C mutation and 41.3% c.1968+2T>A mutation. CONCLUSIONS: We hypothesise that the germline mutation was c.1968+2T>A, but a rescue event occurred during early development, where the somatic mutation from A to C at 1968+2 provided a selective advantage. This type of mosaicism, where a partial phenotypic rescue event results from a second but milder disease-causing mutation in the same nucleotide, has not been previously characterised for any disease.
Subject(s)
Cell Nucleus/genetics; Lamin Type A/genetics; Progeria/genetics; Adolescent; Cell Nucleus/pathology; Cells, Cultured; Child; Child, Preschool; Exons/genetics; Female; Fibroblasts/pathology; Genetic Predisposition to Disease; Germ-Line Mutation; High-Throughput Nucleotide Sequencing; Humans; Infant; Male; Mosaicism; Progeria/pathology
ABSTRACT
Although prostate cancer typically runs an indolent course, a subset of men develop aggressive, fatal forms of this disease. We hypothesize that germline variation modulates susceptibility to aggressive prostate cancer. The goal of this work is to identify susceptibility genes using the C57BL/6-Tg(TRAMP)8247Ng/J (TRAMP) mouse model of neuroendocrine prostate cancer. Quantitative trait locus (QTL) mapping was performed in transgene-positive (TRAMPxNOD/ShiLtJ) F2 intercross males (n = 228), which facilitated identification of 11 loci associated with aggressive disease development. Microarray data derived from 126 (TRAMPxNOD/ShiLtJ) F2 primary tumors were used to prioritize candidate genes within QTLs, with candidate genes deemed as being high priority when possessing both high levels of expression-trait correlation and a proximal expression QTL. This process enabled the identification of 35 aggressive prostate tumorigenesis candidate genes. The role of these genes in aggressive forms of human prostate cancer was investigated using two concurrent approaches. First, logistic regression analysis in two human prostate gene expression datasets revealed that expression levels of five genes (CXCL14, ITGAX, LPCAT2, RNASEH2A, and ZNF322) were positively correlated with aggressive prostate cancer and two genes (CCL19 and HIST1H1A) were protective for aggressive prostate cancer. Higher than average levels of expression of the five genes that were positively correlated with aggressive disease were consistently associated with patient outcome in both human prostate cancer tumor gene expression datasets. Second, three of these five genes (CXCL14, ITGAX, and LPCAT2) harbored polymorphisms associated with aggressive disease development in a human GWAS cohort consisting of 1,172 prostate cancer patients. This study is the first example of using a systems genetics approach to successfully identify novel susceptibility genes for aggressive prostate cancer. Such approaches will facilitate the identification of novel germline factors driving aggressive disease susceptibility and allow for new insights into these deadly forms of prostate cancer.
Subject(s)
1-Acylglycerophosphocholine O-Acyltransferase/genetics; CD11c Antigen/genetics; Chemokines, CXC/genetics; Prostatic Neoplasms/genetics; Animals; Cell Transformation, Neoplastic/genetics; Disease Models, Animal; Gene Expression Regulation, Neoplastic; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Male; Mice; Prostatic Neoplasms/pathology; Quantitative Trait Loci/genetics; Ribonuclease H/genetics
ABSTRACT
Genome-wide association studies have identified genomic loci whose single-nucleotide polymorphisms (SNPs) predispose to prostate cancer (PCa). However, the mechanisms of most of these variants are largely unknown. We integrated chromatin-immunoprecipitation-coupled sequencing and microarray expression profiling in TMPRSS2-ERG gene rearrangement positive DUCaP cells with the GWAS PCa risk SNPs catalog to identify disease susceptibility SNPs localized within functional androgen receptor-binding sites (ARBSs). Among the 48 GWAS index risk SNPs and 3,917 linked SNPs, 80 were found located in ARBSs. Of these, rs11891426:T>G in an intron of the melanophilin gene (MLPH) was within a novel putative auxiliary AR-binding motif, which is enriched in the neighborhood of canonical androgen-responsive elements. T→G exchange attenuated the transcriptional activity of the ARBS in an AR reporter gene assay. The expression of MLPH in primary prostate tumors was significantly lower in those with the G compared with the T allele and correlated significantly with AR protein. Higher melanophilin level in prostate tissue of patients with a favorable PCa risk profile points to a tumor-suppressive effect. These results unravel a hidden link between AR and a functional putative PCa risk SNP, whose allele alteration affects androgen regulation of its host gene MLPH.
Subject(s)
Adaptor Proteins, Signal Transducing/genetics; Binding Sites; Polymorphism, Single Nucleotide; Prostatic Neoplasms/genetics; Prostatic Neoplasms/metabolism; Receptors, Androgen/metabolism; Response Elements; Adult; Aged; Alleles; Base Sequence; Cell Line, Tumor; Chromatin Immunoprecipitation; Gene Expression Regulation, Neoplastic; Genotype; High-Throughput Nucleotide Sequencing; Humans; Male; Middle Aged; Neoplasm Grading; Neoplasm Staging; Nucleotide Motifs; Position-Specific Scoring Matrices; Prostatic Neoplasms/pathology; Protein Binding; Tumor Burden
ABSTRACT
Hutchinson-Gilford progeria syndrome (HGPS) is a premature aging disease that is frequently caused by a de novo point mutation at position 1824 in LMNA. This mutation activates a cryptic splice donor site in exon 11, and leads to an in-frame deletion within the prelamin A mRNA and the production of a dominant-negative lamin A protein, known as progerin. Here we show that primary HGPS skin fibroblasts experience genome-wide correlated alterations in patterns of H3K27me3 deposition, DNA-lamin A/C associations, and, at late passages, genome-wide loss of spatial compartmentalization of active and inactive chromatin domains. We further demonstrate that the H3K27me3 changes associate with gene expression alterations in HGPS cells. Our results support a model that the accumulation of progerin in the nuclear lamina leads to altered H3K27me3 marks in heterochromatin, possibly through the down-regulation of EZH2, and disrupts heterochromatin-lamina interactions. These changes may result in transcriptional misregulation and eventually trigger the global loss of spatial chromatin compartmentalization in late passage HGPS fibroblasts.
Subject(s)
Genome, Human; Histones/metabolism; Lamins/metabolism; Progeria/genetics; Progeria/metabolism; Cell Line; Chromatin Immunoprecipitation; Fibroblasts/metabolism; Gene Expression Regulation; Heterochromatin/metabolism; Humans; Methylation; Protein Binding; Sequence Analysis, DNA
ABSTRACT
Chromatin-based functional genomic analyses and genomewide association studies (GWASs) together implicate enhancers as critical elements influencing gene expression and risk for common diseases. Here, we performed systematic chromatin and transcriptome profiling in human pancreatic islets. Integrated analysis of islet data with those from nine cell types identified specific and significant enrichment of type 2 diabetes and related quantitative trait GWAS variants in islet enhancers. Our integrated chromatin maps reveal that most enhancers are short (median = 0.8 kb). Each cell type also contains a substantial number of more extended (≥ 3 kb) enhancers. Interestingly, these stretch enhancers are often tissue-specific and overlap locus control regions, suggesting that they are important chromatin regulatory beacons. Indeed, we show that (i) tissue specificity of enhancers and nearby gene expression increase with enhancer length; (ii) neighborhoods containing stretch enhancers are enriched for important cell type-specific genes; and (iii) GWAS variants associated with traits relevant to a particular cell type are more enriched in stretch enhancers compared with short enhancers. Reporter constructs containing stretch enhancer sequences exhibited tissue-specific activity in cell culture experiments and in transgenic mice. These results suggest that stretch enhancers are critical chromatin elements for coordinating cell type-specific regulatory programs and that sequence variation in stretch enhancers affects risk of major common human diseases.
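The length-based definition above can be sketched as follows. The 3 kb threshold comes from the abstract; the BED-like (chrom, start, end) interval layout, the label "typical" for shorter enhancers, and the toy coordinates are assumptions.

```python
from statistics import median

STRETCH_MIN_LENGTH_BP = 3_000  # ">= 3 kb" threshold from the abstract


def classify_enhancers(intervals):
    """Label each (chrom, start, end) interval as 'stretch' or 'typical' by length."""
    labelled = []
    for chrom, start, end in intervals:
        length = end - start  # half-open, BED-style coordinates (assumption)
        label = "stretch" if length >= STRETCH_MIN_LENGTH_BP else "typical"
        labelled.append((chrom, start, end, length, label))
    return labelled


# Toy enhancer calls with hypothetical coordinates.
enhancers = [
    ("chr1", 10_000, 10_800),
    ("chr1", 50_000, 53_500),
    ("chr2", 7_000, 7_600),
]
labelled = classify_enhancers(enhancers)
median_length = median(row[3] for row in labelled)
for row in labelled:
    print(row)
print("median length (bp):", median_length)
```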
Subject(s)
Cell Differentiation/physiology; Chromatin/physiology; Diabetes Mellitus, Type 2/physiopathology; Enhancer Elements, Genetic/genetics; Epigenomics/methods; Gene Expression Regulation/physiology; Insulin-Secreting Cells/metabolism; Animals; Chromatin Immunoprecipitation; Diabetes Mellitus, Type 2/genetics; Enhancer Elements, Genetic/physiology; Gene Expression Profiling; Gene Expression Regulation/genetics; Genome-Wide Association Study; High-Throughput Nucleotide Sequencing; Humans; Insulin-Secreting Cells/physiology; Luciferases; Mice; Mice, Transgenic
ABSTRACT
Transgenic animals are extensively used to model human disease. Typically, the transgene copy number is estimated, but the exact integration site and configuration of the foreign DNA remains uncharacterized. When transgenes have been closely examined, some unexpected configurations have been found. Here, we describe a method to recover transgene insertion sites and assess structural rearrangements of host and transgene DNA using microarray hybridization and targeted sequence capture. We used information about the transgene insertion site to develop a polymerase chain reaction genotyping assay to distinguish heterozygous from homozygous transgenic animals. Although we worked with a bacterial artificial chromosome transgenic mouse line, this method can be used to analyse the integration site and configuration of any foreign DNA in a sequenced genome.
Subject(s)
Genotyping Techniques; High-Throughput Nucleotide Sequencing; Oligonucleotide Array Sequence Analysis/methods; Sequence Analysis, DNA; Transgenes; Animals; Chromosomes, Artificial, Bacterial; Mice; Mice, Transgenic; Polymerase Chain Reaction
ABSTRACT
Genome-wide association studies have identified hundreds of loci for type 2 diabetes, coronary artery disease and myocardial infarction, as well as for related traits such as body mass index, glucose and insulin levels, lipid levels, and blood pressure. These studies also have pointed to thousands of loci with promising but not yet compelling association evidence. To establish association at additional loci and to characterize the genome-wide significant loci by fine-mapping, we designed the "Metabochip," a custom genotyping array that assays nearly 200,000 SNP markers. Here, we describe the Metabochip and its component SNP sets, evaluate its performance in capturing variation across the allele-frequency spectrum, describe solutions to methodological challenges commonly encountered in its analysis, and evaluate its performance as a platform for genotype imputation. The Metabochip achieves dramatic cost efficiencies compared to designing single-trait follow-up reagents, and provides the opportunity to compare results across a range of related traits. The Metabochip and similar custom genotyping arrays offer a powerful and cost-effective approach to follow up large-scale genotyping and sequencing studies and advance our understanding of the genetic basis of complex human diseases and traits.
Subject(s)
Anthropometry/instrumentation; Metabolomics/instrumentation; Oligonucleotide Array Sequence Analysis/instrumentation; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Alleles; Anthropometry/methods; Cardiovascular Diseases/diagnosis; Cardiovascular Diseases/genetics; Cardiovascular Diseases/metabolism; Diabetes Mellitus, Type 2/diagnosis; Diabetes Mellitus, Type 2/genetics; Diabetes Mellitus, Type 2/metabolism; Gene Frequency; Genome, Human; Genome-Wide Association Study; Genotype; Genotyping Techniques; Humans; Metabolomics/methods; Oligonucleotide Array Sequence Analysis/methods; Phenotype
ABSTRACT
Airway allergen exposure induces inflammation among individuals with atopy that is characterized by altered airway gene expression, elevated levels of T helper type 2 cytokines, mucus hypersecretion, and airflow obstruction. To identify the genetic determinants of the airway allergen response, we employed a systems genetics approach. We applied a house dust mite mouse model of allergic airway disease to 151 incipient lines of the Collaborative Cross, a new mouse genetic reference population, and measured serum IgE, airway eosinophilia, and gene expression in the lung. Allergen-induced serum IgE and airway eosinophilia were not correlated. We detected quantitative trait loci (QTL) for airway eosinophilia on chromosome (Chr) 11 (71.802-87.098 megabases [Mb]) and allergen-induced IgE on Chr 4 (13.950-31.660 Mb). More than 4,500 genes expressed in the lung had gene expression QTL (eQTL), the majority of which were located near the gene itself. However, we also detected approximately 1,700 trans-eQTL, and many of these trans-eQTL clustered into two regions on Chr 2. We show that one of these loci (at 147.6 Mb) is associated with the expression of more than 100 genes, and, using bioinformatics resources, fine-map this locus to a 53 kb-long interval. We also use the gene expression and eQTL data to identify a candidate gene, Tlcd2, for the eosinophil QTL. Our results demonstrate that hallmark allergic airway disease phenotypes are associated with distinct genetic loci on Chrs 4 and 11, and that gene expression in the allergically inflamed lung is controlled by both cis and trans regulatory factors.
Subject(s)
Bronchial Hyperreactivity/immunology , Hypersensitivity/metabolism , Lung/immunology , Animals , Dermatophagoides Antigens/immunology , Dermatophagoides pteronyssinus/metabolism , Animal Disease Models , Gene Expression Regulation , Genetics , Hypersensitivity/immunology , Immunoglobulin E/blood , Inflammation , Lung/metabolism , Male , Mice , Phenotype , Quantitative Trait Loci , Respiratory Hypersensitivity/immunology
ABSTRACT
Massively parallel DNA sequencing technologies have greatly increased our ability to generate large amounts of sequencing data at a rapid pace. Several methods have been developed to enrich for genomic regions of interest for targeted sequencing. We have compared three of these methods: Molecular Inversion Probes (MIP), Solution Hybrid Selection (SHS), and Microarray-based Genomic Selection (MGS). Using HapMap DNA samples, we compared each of these methods with respect to their ability to capture an identical set of exons and evolutionarily conserved regions associated with 528 genes (2.61 Mb). For sequence analysis, we developed and used a novel Bayesian genotype-assigning algorithm, Most Probable Genotype (MPG). All three capture methods were effective, but sensitivities (percentage of targeted bases associated with high-quality genotypes) varied for an equivalent amount of pass-filtered sequence: for example, 70% (MIP), 84% (SHS), and 91% (MGS) for 400 Mb. In contrast, all methods yielded similar accuracies of >99.84% when compared to Infinium 1M SNP BeadChip-derived genotypes and >99.998% when compared to 30-fold coverage whole-genome shotgun sequencing data. We also observed a low false-positive rate with all three methods; of the heterozygous positions identified by each of the capture methods, >99.57% agreed with 1M SNP BeadChip, and >98.840% agreed with the whole-genome shotgun data. In addition, we successfully piloted the genomic enrichment of a set of 12 pooled samples via the MGS method using molecular bar codes. We find that these three genomic enrichment methods are highly accurate and practical, with sensitivities comparable to that of 30-fold coverage whole-genome shotgun data.
Subject(s)
Type 2 Diabetes Mellitus/genetics , Human Genome , Oligonucleotide Array Sequence Analysis/methods , DNA Sequence Analysis/methods , Algorithms , Bayes Theorem , DNA/genetics , DNA Probes/genetics , Exons , Genotype , Humans , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
Hereditary congenital facial paresis type 1 (HCFP1) is an autosomal dominant disorder of absent or limited facial movement that maps to chromosome 3q21-q22 and is hypothesized to result from facial branchial motor neuron (FBMN) maldevelopment. In the present study, we report that HCFP1 results from heterozygous duplications within a neuron-specific GATA2 regulatory region that includes two enhancers and one silencer, and from noncoding single-nucleotide variants (SNVs) within the silencer. Some SNVs impair binding of NR2F1 to the silencer in vitro and in vivo and attenuate in vivo enhancer reporter expression in FBMNs. Gata2 and its effector Gata3 are essential for inner-ear efferent neuron (IEE) but not FBMN development. A humanized HCFP1 mouse model extends Gata2 expression, favors the formation of IEEs over FBMNs and is rescued by conditional loss of Gata3. These findings highlight the importance of temporal gene regulation in development and of noncoding variation in rare mendelian disease.