ABSTRACT
Recent large genome-wide association studies have identified multiple confident risk loci linked to addiction-associated behavioral traits. Most genetic variants linked to addiction-associated traits lie in noncoding regions of the genome, likely disrupting cis-regulatory element (CRE) function. CREs tend to be highly cell type-specific and may contribute to the functional development of the neural circuits underlying addiction. Yet, a systematic approach for predicting the impact of risk variants on the CREs of specific cell populations is lacking. To dissect the cell types and brain regions underlying addiction-associated traits, we applied stratified linkage disequilibrium score regression to compare genome-wide association studies to genomic regions collected from human and mouse assays for open chromatin, which is associated with CRE activity. We found enrichment of addiction-associated variants in putative CREs marked by open chromatin in neuronal (NeuN+) nuclei collected from multiple prefrontal cortical areas and striatal regions known to play major roles in reward and addiction. To further dissect the cell type-specific basis of addiction-associated traits, we also identified enrichments in human orthologs of open chromatin regions of female and male mouse neuronal subtypes: cortical excitatory, D1, D2, and PV. Last, we developed machine learning models to predict mouse cell type-specific open chromatin, enabling us to further categorize human NeuN+ open chromatin regions into cortical excitatory or striatal D1 and D2 neurons and predict the functional impact of addiction-associated genetic variants. 
Our results suggest that different neuronal subtypes within the reward system play distinct roles in the variety of traits that contribute to addiction. SIGNIFICANCE STATEMENT: We combine statistical genetic and machine learning techniques to find that the predisposition to nicotine, alcohol, and cannabis use behaviors can be partially explained by genetic variants in conserved regulatory elements within specific brain regions and neuronal subtypes of the reward system. Our computational framework can flexibly integrate open chromatin data across species to screen for putative causal variants in a cell type- and tissue-specific manner for numerous complex traits.
Subjects
Behavior, Addictive/genetics; Brain/physiology; Genetic Predisposition to Disease/genetics; Genetic Variation/physiology; Neurons/physiology; Regulatory Elements, Transcriptional/physiology; Animals; Behavior, Addictive/pathology; Brain/pathology; Databases, Genetic; Female; Humans; Male; Mice; Mice, Inbred C57BL; Mice, Transgenic; Neurons/pathology; Quantitative Trait Loci/genetics
ABSTRACT
BACKGROUND: Evolutionary conservation is an invaluable tool for inferring functional significance in the genome, including regions that are crucial across many species and those that have undergone convergent evolution. Computational methods to test for sequence conservation are dominated by algorithms that examine the ability of one or more nucleotides to align across large evolutionary distances. While these nucleotide alignment-based approaches have proven powerful for protein-coding genes and some non-coding elements, they fail to capture conservation of many enhancers, distal regulatory elements that control spatial and temporal patterns of gene expression. The function of enhancers is governed by a complex, often tissue- and cell type-specific code that links combinations of transcription factor binding sites and other regulation-related sequence patterns to regulatory activity. Thus, function of orthologous enhancer regions can be conserved across large evolutionary distances, even when nucleotide turnover is high. RESULTS: We present a new machine learning-based approach for evaluating enhancer conservation that leverages the combinatorial sequence code of enhancer activity rather than relying on the alignment of individual nucleotides. We first train a convolutional neural network model that can predict tissue-specific open chromatin, a proxy for enhancer activity, across mammals. Next, we apply that model to distinguish instances where the genome sequence would predict conserved function versus a loss of regulatory activity in that tissue. We present criteria for systematically evaluating model performance for this task and use them to demonstrate that our models accurately predict tissue-specific conservation and divergence in open chromatin between primate and rodent species, vastly out-performing leading nucleotide alignment-based approaches. 
We then apply our models to predict open chromatin at orthologs of brain and liver open chromatin regions across hundreds of mammals and find that brain enhancers associated with neuron activity have a stronger tendency than the general population to have predicted lineage-specific open chromatin. CONCLUSION: The framework presented here provides a mechanism to annotate tissue-specific regulatory function across hundreds of genomes and to study enhancer evolution using predicted regulatory differences rather than nucleotide-level conservation measurements.
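As an illustration of the kind of input a sequence-based convolutional network typically consumes, the sketch below one-hot encodes a DNA string and computes its reverse complement (often used to augment training data). The abstract does not describe the authors' actual preprocessing, so the function names, the A/C/G/T channel order, and the handling of ambiguous bases are assumptions made for this example only.

```python
# Illustrative sketch only: helper names and encoding choices are assumptions,
# not the preprocessing used in the study described above.
BASES = "ACGT"  # assumed channel order


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def reverse_complement(seq):
    """Return the reverse complement; unknown bases map to N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(b, "N") for b in reversed(seq.upper()))


if __name__ == "__main__":
    example = "ACGTN"
    print(one_hot(example))
    print(reverse_complement(example))
```

A model trained on such matrices scores the sequence itself rather than its alignment to other genomes, which is the distinction the abstract draws against nucleotide alignment-based conservation measures.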
Subjects
Chromatin; Enhancer Elements, Genetic; Animals; Chromatin/genetics; Humans; Mammals/genetics; Neural Networks, Computer; Nucleotides
ABSTRACT
Neuron subtype dysfunction is a key contributor to neurologic disease circuits, but identifying associated gene regulatory pathways is complicated by the molecular complexity of the brain. For example, parvalbumin-expressing (PV+) neurons in the external globus pallidus (GPe) are critically involved in the motor deficits of dopamine-depleted mouse models of Parkinson's disease, where cell type-specific optogenetic stimulation of PV+ neurons over other neuron populations rescues locomotion. Despite the distinct roles these cell types play in the neural circuit, the molecular correlates remain unknown because of the difficulty of isolating rare neuron subtypes. To address this issue, we developed a new viral affinity purification strategy, Cre-Specific Nuclear Anchored Independent Labeling, to isolate Cre recombinase-expressing (Cre+) nuclei from the adult mouse brain. Applying this technology, we performed targeted assessments of the cell type-specific transcriptomic and epigenetic effects of dopamine depletion on PV+ and PV- cells within three brain regions of male and female mice: GPe, striatum, and cortex. We found GPe PV+ neuron-specific gene expression changes that suggested increased hypoxia-inducible factor 2α signaling. Consistent with transcriptomic data, regions of open chromatin affected by dopamine depletion within GPe PV+ neurons were enriched for hypoxia-inducible factor family binding motifs. The gene expression and epigenomic experiments performed on PV+ neurons isolated by Cre-Specific Nuclear Anchored Independent Labeling identified a transcriptional regulatory network mediated by the neuroprotective factor Hif2a as underlying neural circuit differences in response to dopamine depletion. SIGNIFICANCE STATEMENT: Cre-Specific Nuclear Anchored Independent Labeling is an enhanced, virus-based approach to isolate nuclei of a specific cell type for transcriptome and epigenome interrogation that decreases dependency on transgenic animals.
Applying this technology to GPe parvalbumin-expressing neurons in a mouse model of Parkinson's disease, we discovered evidence for an upregulation of the oxygen homeostasis maintaining pathway involving Hypoxia-inducible factor 2α. These results provide new insight into how neuron subtypes outside the substantia nigra pars compacta may be compensating at a molecular level for differences in the motor production neural circuit during the progression of Parkinson's disease. Furthermore, they emphasize the utility of cell type-specific technologies, such as Cre-Specific Nuclear Anchored Independent Labeling, for isolated assessment of specific neuron subtypes in complex systems.
Subjects
Globus Pallidus/metabolism; Neurons/metabolism; Oxidative Stress/physiology; Parkinson Disease, Secondary/metabolism; Animals; Cerebral Cortex/metabolism; Corpus Striatum/metabolism; Mice; Mice, Transgenic; Oxidopamine; Parkinson Disease, Secondary/chemically induced
ABSTRACT
Knowledge concerning the taxonomic diversity of marine organisms is crucial for understanding processes associated with species diversification in geographic areas that are devoid of obvious barriers to dispersal. The marine gastropod family Conidae contains many species complexes due to lack of clear morphological distinctiveness and existence of morphological intergradations among described species. Conus flavidus Lamarck, 1810 and Conus frigidus Reeve, 1848 are currently recognized as distinct taxa, but are often difficult to distinguish by morphological characters and include several synonyms, including Conus peasei Brazier, 1877. C. peasei was originally described by Pease in 1861 (as Conus neglectus) based on slight morphological differences of a population of C. flavidus from Hawaii that distinguished it from C. flavidus from elsewhere. To evaluate the systematics of this group and specifically test the hypothesis of synonymy of C. peasei with C. flavidus, we examined molecular and morphometric data from specimens of C. flavidus, C. frigidus and C. peasei (i.e., C. flavidus from Hawaii). Multiple clades that contain individuals from particular geographic regions are apparent in gene trees constructed from sequences of a mitochondrial gene region. In particular, sequences of C. peasei cluster together separately from sequences of C. flavidus and C. frigidus. Although individuals of C. peasei, C. flavidus and C. frigidus each contain a unique set of alleles for a nuclear locus, a conotoxin gene, alleles of C. peasei are more similar to those of C. flavidus. In addition, sequences of a region of a second nuclear gene are identical among C. peasei and C. flavidus though they are distinct from sequences of C. frigidus. Morphometric data revealed that shells of C. peasei are distinct in some aspects, but are more similar to those of C. flavidus than to those of C. frigidus. Taken together, these results suggest that C. peasei represents a distinct species. 
Moreover, based on the contradictory relationships inferred from the mitochondrial and nuclear sequences (as well as morphometric data), C. peasei may have originated through past hybridization among the ancestral lineages that gave rise to C. flavidus and C. frigidus.
Subjects
Gastropoda/genetics; Animals; Conotoxins/genetics; Conus Snail/classification; Gastropoda/classification; Genes, Mitochondrial; Hawaii; Hybridization, Genetic; Phylogeny
ABSTRACT
Vocal production learning ("vocal learning") is a convergently evolved trait in vertebrates. To identify brain genomic elements associated with mammalian vocal learning, we integrated genomic, anatomical, and neurophysiological data from the Egyptian fruit bat (Rousettus aegyptiacus) with analyses of the genomes of 215 placental mammals. First, we identified a set of proteins evolving more slowly in vocal learners. Then, we discovered a vocal motor cortical region in the Egyptian fruit bat, an emergent vocal learner, and leveraged that knowledge to identify active cis-regulatory elements in the motor cortex of vocal learners. Machine learning methods applied to motor cortex open chromatin revealed 50 enhancers robustly associated with vocal learning whose activity tended to be lower in vocal learners. Our research implicates convergent losses of motor cortex regulatory elements in mammalian vocal learning evolution.
Subjects
Enhancer Elements, Genetic; Eutheria; Evolution, Molecular; Gene Expression Regulation; Motor Cortex; Motor Neurons; Proteins; Vocalization, Animal; Animals; Chiroptera/genetics; Chiroptera/physiology; Vocalization, Animal/physiology; Motor Cortex/cytology; Motor Cortex/physiology; Chromatin/metabolism; Motor Neurons/physiology; Larynx/physiology; Epigenesis, Genetic; Genome; Proteins/genetics; Proteins/metabolism; Amino Acid Sequence; Eutheria/genetics; Eutheria/physiology; Machine Learning
ABSTRACT
Protein-coding differences between species often fail to explain phenotypic diversity, suggesting the involvement of genomic elements that regulate gene expression such as enhancers. Identifying associations between enhancers and phenotypes is challenging because enhancer activity can be tissue-dependent and functionally conserved despite low sequence conservation. We developed the Tissue-Aware Conservation Inference Toolkit (TACIT) to associate candidate enhancers with species' phenotypes using predictions from machine learning models trained on specific tissues. Applying TACIT to associate motor cortex and parvalbumin-positive interneuron enhancers with neurological phenotypes revealed dozens of enhancer-phenotype associations, including brain size-associated enhancers that interact with genes implicated in microcephaly or macrocephaly. TACIT provides a foundation for identifying enhancers associated with the evolution of any convergently evolved phenotype in any large group of species with aligned genomes.
Subjects
Enhancer Elements, Genetic; Genetic Variation; Machine Learning; Mammals; Animals; Mammals/genetics; Phenotype
ABSTRACT
Although thousands of genomic regions have been associated with heritable human diseases, attempts to elucidate biological mechanisms are impeded by a general inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function that is agnostic to cell type or disease mechanism. Here, single base phyloP scores from the whole genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained, and likely functional. We compared these scores to large-scale genome annotation, genome-wide association studies (GWAS), copy number variation, clinical genetics findings, and cancer data sets. Evolutionarily constrained positions are enriched for variants explaining common disease heritability (more than any other functional annotation). Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.
ABSTRACT
Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth's vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.
Subjects
Eutheria; Evolution, Molecular; Animals; Female; Humans; Conserved Sequence/genetics; Eutheria/genetics; Genome, Human
ABSTRACT
Recent discoveries of extreme cellular diversity in the brain warrant rapid development of technologies to access specific cell populations within heterogeneous tissue. Available approaches for engineering-targeted technologies for new neuron subtypes are low yield, involving intensive transgenic strain or virus screening. Here, we present Specific Nuclear-Anchored Independent Labeling (SNAIL), an improved virus-based strategy for cell labeling and nuclear isolation from heterogeneous tissue. SNAIL works by leveraging machine learning and other computational approaches to identify DNA sequence features that confer cell type-specific gene activation and then make a probe that drives an affinity purification-compatible reporter gene. As a proof of concept, we designed and validated two novel SNAIL probes that target parvalbumin-expressing (PV+) neurons. Nuclear isolation using SNAIL in wild-type mice is sufficient to capture characteristic open chromatin features of PV+ neurons in the cortex, striatum, and external globus pallidus. The SNAIL framework also has high utility for multispecies cell probe engineering; expression from a mouse PV+ SNAIL enhancer sequence was enriched in PV+ neurons of the macaque cortex. Expansion of this technology has broad applications in cell type-specific observation, manipulation, and therapeutics across species and disease models.