ABSTRACT
Tissue-specific regulatory regions harbor substantial genetic risk for disease. Because brain development is a critical epoch for neuropsychiatric disease susceptibility, we characterized the genetic control of the transcriptome in 201 mid-gestational human brains, identifying 7,962 expression quantitative trait loci (eQTL) and 4,635 spliceQTL (sQTL), including several thousand prenatal-specific regulatory regions. We show that significant genetic liability for neuropsychiatric disease lies within prenatal eQTL and sQTL. Integration of eQTL and sQTL with genome-wide association studies (GWAS) via transcriptome-wide association identified dozens of novel candidate risk genes, highlighting shared and stage-specific mechanisms in schizophrenia (SCZ). Gene network analysis revealed that SCZ and autism spectrum disorder (ASD) affect distinct developmental gene co-expression modules. Yet, in each disorder, common and rare genetic variation converges within modules, which in ASD implicates superficial cortical neurons. More broadly, these data, available as a web browser, and our analyses demonstrate the genetic mechanisms by which developmental events have a widespread influence on adult anatomical and behavioral phenotypes.
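As a rough illustration of what a single cis-eQTL test involves, the sketch below regresses a gene's expression on genotype dosage (0, 1 or 2 copies of the alternative allele) for one SNP-gene pair and reports the slope and its t-statistic. This is a minimal sketch under stated assumptions, not the pipeline used in the study: the function name `eqtl_test` is hypothetical, and real eQTL mapping typically adds covariates (for example, ancestry components and hidden expression factors) and multiple-testing correction across all tested pairs, none of which is shown here.

```python
import math
from statistics import mean

def eqtl_test(dosages, expression):
    """Ordinary least-squares fit of expression ~ dosage for one SNP-gene pair.

    Hypothetical helper: returns (beta, intercept, t_stat), where beta is the
    change in expression per additional copy of the alternative allele.
    No covariates are included.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors with at least 3 samples")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    intercept = my - beta * mx
    # Residual variance uses n - 2 degrees of freedom (slope + intercept).
    rss = sum((y - (intercept + beta * x)) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        t_stat = 0.0 if beta == 0 else math.copysign(math.inf, beta)
    else:
        t_stat = beta / se
    return beta, intercept, t_stat

# Toy data: expression rises by about 1 unit per alternative allele.
beta, intercept, t_stat = eqtl_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(f"beta={beta:.2f} intercept={intercept:.2f} t={t_stat:.1f}")
```

Running one such test per SNP-gene pair within a cis window, and then controlling the false discovery rate over all of them, is the general shape of an eQTL scan; sQTL mapping follows the same pattern with a splicing measure in place of expression.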
Subjects
Autism Spectrum Disorder/genetics , Quantitative Trait Loci/genetics , Schizophrenia/genetics , Transcriptome/genetics , Autism Spectrum Disorder/metabolism , Autism Spectrum Disorder/pathology , Brain/growth & development , Brain/metabolism , Female , Fetus/metabolism , Gene Expression Regulation, Developmental , Genetic Predisposition to Disease , Genome-Wide Association Study , Gestational Age , Humans , Male , Neurons/metabolism , Polymorphism, Single Nucleotide/genetics , RNA Splicing/genetics , Schizophrenia/metabolism , Schizophrenia/pathology
ABSTRACT
Neuropsychiatric disorders classically lack defining brain pathologies, but recent work has demonstrated dysregulation at the molecular level, characterized by transcriptomic and epigenetic alterations [1-3]. In autism spectrum disorder (ASD), this molecular pathology involves the upregulation of microglial, astrocyte and neural-immune genes, the downregulation of synaptic genes, and attenuation of gene-expression gradients in cortex [1,2,4-6]. However, whether these changes are limited to cortical association regions or are more widespread remains unknown. To address this issue, we performed RNA-sequencing analysis of 725 brain samples spanning 11 cortical areas from 112 post-mortem brains of individuals with ASD and neurotypical controls. We find widespread transcriptomic changes across the cortex in ASD, exhibiting an anterior-to-posterior gradient, with the greatest differences in primary visual cortex, coincident with an attenuation of the typical transcriptomic differences between cortical regions. Single-nucleus RNA-sequencing and methylation profiling demonstrate that this robust molecular signature reflects changes in cell-type-specific gene expression, particularly affecting excitatory neurons and glia. Both rare and common ASD-associated genetic variation converge within a downregulated co-expression module involving synaptic signalling, and common variation alone is enriched within a module of upregulated protein chaperone genes. These results highlight widespread molecular changes across the cerebral cortex in ASD, extending beyond association cortex to broadly involve primary sensory regions.
Subjects
Autism Spectrum Disorder , Cerebral Cortex , Genetic Variation , Transcriptome , Humans , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/metabolism , Autism Spectrum Disorder/pathology , Cerebral Cortex/metabolism , Cerebral Cortex/pathology , Neurons/metabolism , RNA/analysis , RNA/genetics , Transcriptome/genetics , Autopsy , Sequence Analysis, RNA , Primary Visual Cortex/metabolism , Neuroglia/metabolism
ABSTRACT
Change history: In this Letter, the labels for splicing events A3SS and A5SS were swapped in column D of Supplementary Table 3a and b. This has been corrected online.
ABSTRACT
Autism spectrum disorder (ASD) involves substantial genetic contributions. These contributions are profoundly heterogeneous but may converge on common pathways that are not yet well understood. Here, through post-mortem genome-wide transcriptome analysis of the largest cohort of samples analysed so far, to our knowledge, we interrogate the noncoding transcriptome, alternative splicing, and upstream molecular regulators to broaden our understanding of molecular convergence in ASD. Our analysis reveals ASD-associated dysregulation of primate-specific long noncoding RNAs (lncRNAs), downregulation of the alternative splicing of activity-dependent neuron-specific exons, and attenuation of normal differences in gene expression between the frontal and temporal lobes. Our data suggest that SOX5, a transcription factor involved in neuron fate specification, contributes to this reduction in regional differences. We further demonstrate that a genetically defined subtype of ASD, chromosome 15q11.2-13.1 duplication syndrome (dup15q), shares the core transcriptomic signature observed in idiopathic ASD. Co-expression network analysis reveals that individuals with ASD show age-related changes in the trajectory of microglial and synaptic function over the first two decades, and suggests that genetic risk for ASD may influence changes in regional cortical gene expression. Our findings illustrate how diverse genetic perturbations can lead to phenotypic convergence at multiple biological levels in a complex neuropsychiatric disorder.
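Splicing changes of the kind described here are commonly summarized as percent spliced in (PSI), the fraction of transcripts that include a given exon, with a case-control difference reported as ΔPSI. The abstract does not say which splicing quantification was used, so the sketch below is only an illustration under assumptions: the helper names `psi` and `delta_psi` are hypothetical, and halving the inclusion count (because an included cassette exon is supported by two splice junctions and a skipped one by a single junction) is one common normalization among several.

```python
from statistics import mean

def psi(inclusion_junction_reads, exclusion_junction_reads):
    """Percent spliced in for a cassette exon from junction read counts.

    Assumption: the inclusion isoform is counted across two junctions and
    the exclusion isoform across one, so inclusion reads are halved.
    Returns None when there is no informative read.
    """
    inc = inclusion_junction_reads / 2
    exc = exclusion_junction_reads
    total = inc + exc
    if total == 0:
        return None
    return inc / total

def delta_psi(case_psis, control_psis):
    """Mean PSI in cases minus mean PSI in controls (negative = less inclusion in cases)."""
    cases = [p for p in case_psis if p is not None]
    controls = [p for p in control_psis if p is not None]
    if not cases or not controls:
        raise ValueError("need at least one informative sample per group")
    return mean(cases) - mean(controls)

# Hypothetical read counts for one neuron-specific exon.
cases = [psi(40, 20), psi(30, 25)]
controls = [psi(80, 10), psi(90, 5)]
print(round(delta_psi(cases, controls), 3))
```

A negative ΔPSI in this toy example corresponds to reduced exon inclusion in cases, which is the direction the abstract reports for activity-dependent neuron-specific exons.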
Subjects
Alternative Splicing/genetics , Autism Spectrum Disorder/genetics , Gene Expression Profiling , Gene Expression Regulation , Genome, Human/genetics , RNA, Long Noncoding/genetics , Animals , Autopsy , Case-Control Studies , Chromosome Aberrations , Chromosomes, Human, Pair 15/genetics , Exons/genetics , Frontal Lobe/metabolism , Humans , Intellectual Disability/genetics , Neurons/metabolism , Primates/genetics , SOXD Transcription Factors/metabolism , Species Specificity , Temporal Lobe/metabolism , Transcriptome/genetics
ABSTRACT
The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.
Subjects
Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Alleles , DNA Mutational Analysis , Europe/ethnology , Exome , Genome-Wide Association Study , Genotyping Techniques , Humans , Sample Size
ABSTRACT
MOTIVATION: Computational methods are essential to extract actionable information from raw sequencing data, and to thus fulfill the promise of next-generation sequencing technology. Unfortunately, computational tools developed to call variants from human sequencing data disagree on many of their predictions, and current methods to evaluate accuracy and computational performance are ad hoc and incomplete. Agreement on benchmarking variant calling methods would stimulate development of genomic processing tools and facilitate communication among researchers. RESULTS: We propose SMaSH, a benchmarking methodology for evaluating germline variant calling algorithms. We generate synthetic datasets, organize and interpret a wide range of existing benchmarking data for real genomes and propose a set of accuracy and computational performance metrics for evaluating variant calling methods on these benchmarking data. Moreover, we illustrate the utility of SMaSH to evaluate the performance of some leading single-nucleotide polymorphism, indel and structural variant calling algorithms. AVAILABILITY AND IMPLEMENTATION: We provide free and open access online to the SMaSH tool kit, along with detailed documentation, at smash.cs.berkeley.edu
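A core ingredient of any such benchmark is comparing a call set with a truth set and summarizing the overlap as precision, recall and F-score. The sketch below does this for exact matches on (chromosome, position, reference, alternative). It is an illustration under assumptions, not SMaSH's implementation: the `Variant` type and `benchmark` function are hypothetical, and exact matching will miscount indels and structural variants that can be written in more than one equivalent way, which is why real benchmarking tools normalize or tolerate differences in variant representation.

```python
from typing import Iterable, NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position of the first REF base, as in VCF
    ref: str
    alt: str

def benchmark(calls: Iterable[Variant], truth: Iterable[Variant]) -> dict:
    """Exact-match precision, recall and F1 of a call set against a truth set."""
    called, true = set(calls), set(truth)
    tp = len(called & true)   # calls that match the truth set
    fp = len(called - true)   # calls absent from the truth set
    fn = len(true - called)   # true variants that were missed
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if true else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

truth = [Variant("1", 100, "A", "G"), Variant("1", 200, "C", "T"), Variant("2", 50, "AT", "A")]
calls = [Variant("1", 100, "A", "G"), Variant("2", 50, "AT", "A"), Variant("3", 10, "G", "C")]
print(benchmark(calls, truth))
```

Computational performance metrics (runtime, memory) would be recorded alongside these accuracy figures; they are outside the scope of this sketch.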
Subjects
Computational Biology/methods , Genome, Human , Genomics/methods , INDEL Mutation , Algorithms , Data Interpretation, Statistical , Databases, Genetic , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide , Software
ABSTRACT
We have developed an inducible Huntington's disease (HD) mouse model that allows temporal control of whole-body allele-specific mutant huntingtin (mHtt) expression. We asked whether moderate global lowering of mHtt (~50%) was sufficient for long-term amelioration of HD-related deficits and, if so, whether early mHtt lowering (before measurable deficits) was required. Both early and late mHtt lowering delayed behavioral dysfunction and mHTT protein aggregation, as measured biochemically. However, long-term follow-up revealed that the benefits, in all mHtt-lowering groups, attenuated by 12 months of age. While early mHtt lowering attenuated cortical and striatal transcriptional dysregulation evaluated at 6 months of age, the benefits diminished by 12 months of age, and late mHtt lowering did not ameliorate striatal transcriptional dysregulation at 12 months of age. Only early mHtt lowering delayed the elevation in cerebrospinal fluid neurofilament light chain that we observed in our model starting at 9 months of age. As small-molecule HTT-lowering therapeutics progress to the clinic, our findings suggest that moderate mHtt lowering allows disease progression to continue, albeit at a slower rate, and could be relevant to the degree of mHTT lowering required to sustain long-term benefits in humans.
Subjects
Huntington Disease , Mice , Humans , Animals , Infant , Huntington Disease/drug therapy , Huntington Disease/genetics , Protein Aggregates , Huntingtin Protein/genetics , Huntingtin Protein/cerebrospinal fluid , Disease Models, Animal , Corpus Striatum/metabolism , Disease Progression
ABSTRACT
BACKGROUND: Mouse models have allowed for the direct interrogation of genetic effects on molecular, physiological, and behavioral brain phenotypes. However, it is unknown to what extent neurological or psychiatric traits may be human- or primate-specific and therefore which components can be faithfully recapitulated in mouse models. RESULTS: We compare conservation of co-expression in 116 independent data sets derived from human, mouse, and non-human primate, representing more than 15,000 total samples. We observe greater changes occurring on the human lineage than on the mouse lineage, and substantial regional variation that highlights cerebral cortex as the most diverged region. Glia, notably microglia, astrocytes, and oligodendrocytes, are the most divergent cell types, diverging on average three times more than neurons. We show that cis-regulatory sequence divergence explains a significant fraction of co-expression divergence. Moreover, protein-coding sequence constraint parallels co-expression conservation, such that genes with loss-of-function intolerance are enriched in neuronal, rather than glial, modules. We identify dozens of human neuropsychiatric and neurodegenerative disease risk genes, such as COMT, PSEN-1, LRRK2, SHANK3, and SNCA, with highly divergent co-expression between mouse and human, and show that 3D human brain organoids recapitulate in vivo co-expression modules representing several human cell types. CONCLUSIONS: We identify robust co-expression modules reflecting whole-brain and regional patterns of gene expression. Compared with modules that represent basic metabolic processes, cell-type-specific modules, most prominently glial modules, are the most divergent between species. These data and analyses serve as a foundational resource to guide human disease modeling and its interpretation.
Subjects
Brain/metabolism , Evolution, Molecular , Transcriptome , Animals , Astrocytes/metabolism , Catechol O-Methyltransferase/metabolism , Cerebral Cortex/metabolism , Gene Expression Profiling , Humans , Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/metabolism , Mice , Microfilament Proteins , Nerve Tissue Proteins/metabolism , Neurodegenerative Diseases/genetics , Neurodegenerative Diseases/metabolism , Neurons/metabolism , Presenilin-1/metabolism , Primates , alpha-Synuclein/metabolism
ABSTRACT
Gene networks have yielded numerous neurobiological insights, yet an integrated view across brain regions is lacking. We leverage RNA sequencing in 864 samples representing 12 brain regions to robustly identify 12 brain-wide, 50 cross-regional and 114 region-specific coexpression modules. Nearly 40% of genes fall into brain-wide modules, while 25% fall into region-specific modules reflecting regional biology, such as oxytocin signaling in the hypothalamus or addiction pathways in the nucleus accumbens. Schizophrenia and autism genetic risk are enriched in brain-wide and multiregional modules, indicative of broad impact; these modules implicate neuronal proliferation and activity-dependent processes, including endocytosis and splicing, in disease pathophysiology. We find that cell-type-specific long noncoding RNAs and gene isoforms contribute substantially to regional synaptic diversity and that constrained, mutation-intolerant genes are primarily enriched in neurons. We leverage these data using an omnigenic-inspired network framework to characterize how coexpression and gene regulatory networks reflect neuropsychiatric disease risk, supporting polygenic models.
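Coexpression modules of this kind are typically built from pairwise gene-gene correlations across samples. The abstract does not specify the network construction, so the sketch below assumes a WGCNA-style unsigned network, in which the adjacency between two genes is the absolute Pearson correlation raised to a soft-thresholding power; the power of 6, the gene names and the helper names are illustrative assumptions. Module detection itself (for example, hierarchical clustering of a network similarity measure) is not shown.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with >= 2 samples")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant gene has undefined correlation; treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, power=6):
    """Unsigned adjacency a_ij = |cor(i, j)| ** power.

    expr maps gene name -> expression values in the same sample order.
    """
    genes = sorted(expr)
    adj = {g: {h: (1.0 if g == h else 0.0) for h in genes} for g in genes}
    for g, h in combinations(genes, 2):
        a = abs(pearson(expr[g], expr[h])) ** power
        adj[g][h] = adj[h][g] = a
    return adj

# Hypothetical expression of four genes across four samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with GENE_A
    "GENE_C": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with GENE_A
    "GENE_D": [1.0, 3.0, 2.0, 4.0],   # r = 0.8 with GENE_A
}
adj = soft_threshold_adjacency(expr)
print(round(adj["GENE_A"]["GENE_D"], 4))
```

Raising correlations to a power shrinks weak links much more than strong ones (0.8 becomes about 0.26, while 1.0 stays 1.0), which emphasizes tightly coexpressed gene groups when modules are later extracted.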
Subjects
Brain/physiopathology , Gene Expression Profiling/methods , Gene Regulatory Networks/physiology , Genetic Predisposition to Disease/genetics , Mental Disorders/genetics , Humans , Mental Disorders/physiopathology , Transcriptome
ABSTRACT
This corrects the article DOI: 10.1038/sdata.2017.179.
ABSTRACT
To identify novel coding association signals and facilitate characterization of mechanisms influencing glycemic traits and type 2 diabetes risk, we analyzed 109,215 variants derived from exome array genotyping together with an additional 390,225 variants from exome sequencing in up to 39,339 normoglycemic individuals from five ancestry groups. We identified a novel association between fasting plasma insulin (FI) and the coding variant p.Pro50Thr in AKT2, a gene in which rare, fully penetrant mutations cause monogenic glycemic disorders. The low-frequency allele is associated with a 12% increase in FI levels. This variant is present at 1.1% frequency in Finns but virtually absent in individuals from other ancestries. Carriers of the FI-increasing allele had increased 2-h insulin values, decreased insulin sensitivity, and increased risk of type 2 diabetes (odds ratio 1.05). In cellular studies, the AKT2-Thr50 protein exhibited a partial loss of function. We extend the allelic spectrum for coding variants in AKT2 associated with disorders of glucose homeostasis and demonstrate bidirectional effects of variants within the pleckstrin homology domain of AKT2.
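For readers less familiar with the effect measure, an odds ratio compares the odds of disease in carriers with the odds in non-carriers. The sketch below computes a crude odds ratio and a Woolf (log-scale) 95% confidence interval from a 2×2 table. The counts are hypothetical, chosen only so that the point estimate equals 1.05; they are not the study's data, and the published estimate may come from adjusted regression models rather than a crude table. The helper name `odds_ratio` is likewise hypothetical.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    2x2 layout:       cases  controls
      carriers          a       b
      non-carriers      c       d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for this simple estimator")
    or_point = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_point) - z * se_log)
    upper = math.exp(math.log(or_point) + z * se_log)
    return or_point, lower, upper

# Hypothetical counts chosen so that the point estimate is 1.05.
print(odds_ratio(21, 20, 100, 100))
```

With only a few hundred individuals, the interval around 1.05 easily spans 1, which illustrates why detecting an effect this small requires tens of thousands of samples.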
Subjects
Diabetes Mellitus, Type 2/genetics , Fasting/metabolism , Insulin Resistance/genetics , Insulin/metabolism , Proto-Oncogene Proteins c-akt/genetics , White People/genetics , Black or African American/genetics , Alleles , Asian People/genetics , Case-Control Studies , Diabetes Mellitus, Type 2/metabolism , Finland , Gene Frequency , Genetic Predisposition to Disease , Genotype , Hispanic or Latino/genetics , Humans , Odds Ratio
ABSTRACT
To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.
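The frequency bins used above can be made concrete: the minor allele frequency is the frequency of the less common allele, computed here from biallelic diploid genotype counts, and a variant is low-frequency when its MAF lies between 0.1% and 5%. In the sketch below the helper names are hypothetical, the treatment of the exact 0.1% and 5% boundaries is an assumption, and the labels for the other bins ("rare" below 0.1%, "common" above 5%, "monomorphic" at zero) are assumed labels rather than definitions taken from the text.

```python
def minor_allele_frequency(hom_ref, het, hom_alt):
    """MAF of a biallelic variant from diploid genotype counts."""
    n_alleles = 2 * (hom_ref + het + hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotyped individuals")
    alt_freq = (het + 2 * hom_alt) / n_alleles
    return min(alt_freq, 1.0 - alt_freq)

def frequency_class(maf):
    """Bin a MAF; only the 0.1-5% 'low-frequency' range comes from the text."""
    if maf == 0:
        return "monomorphic"
    if maf < 0.001:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"

# Hypothetical genotype counts for 1,000 individuals.
for counts in [(980, 20, 0), (999, 1, 0), (500, 400, 100)]:
    maf = minor_allele_frequency(*counts)
    print(counts, round(maf, 4), frequency_class(maf))
```

Taking the minimum of the two allele frequencies makes the classification independent of which allele happens to be labeled as reference.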
Subjects
Diabetes Mellitus, Type 2/genetics , Genetic Variation , Humans , White People
ABSTRACT
This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.
Subjects
Genetic Variation , Genome, Human , Software , Calibration , Databases, Genetic , Haploidy , Haplotypes/genetics , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Sequence Alignment
ABSTRACT
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (~4×) 1000 Genomes Project datasets.
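To give a sense of what low-pass sequencing at ~4× implies, mean depth can be written as C = N·L/G for N reads of length L over a genome of size G, and under the classic Lander-Waterman (Poisson) assumption the fraction of bases covered by at least k reads is 1 - P(Poisson(C) < k). The sketch below evaluates these quantities; the helper names are hypothetical, the 3.1 Gb genome size and 100 bp read length are illustrative assumptions, and real coverage is less uniform than the Poisson model predicts. Under this model roughly 98% of positions receive at least one read at 4×, but only about 91% receive two or more, which is one reason pooling evidence across many samples, as in the multi-sample framework described above, helps at low depth.

```python
import math

def mean_coverage(n_reads, read_length, genome_size):
    """Mean sequencing depth C = N * L / G."""
    return n_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads N with N * L / G >= target coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

def fraction_covered(coverage, min_depth=1):
    """P(depth >= min_depth) when per-base depth ~ Poisson(coverage)."""
    p_below = sum(math.exp(-coverage) * coverage ** k / math.factorial(k)
                  for k in range(min_depth))
    return 1.0 - p_below

GENOME_SIZE = 3_100_000_000  # approximate human genome size in bp (assumption)
READ_LENGTH = 100            # hypothetical read length in bp

n = reads_needed(4, READ_LENGTH, GENOME_SIZE)
print(n, mean_coverage(n, READ_LENGTH, GENOME_SIZE))
print(round(fraction_covered(4.0), 3), round(fraction_covered(4.0, min_depth=2), 3))
```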