Results 1 - 20 of 66
1.
Cell ; 155(5): 1022-33, 2013 Nov 21.
Article in English | MEDLINE | ID: mdl-24267888

ABSTRACT

Sequence polymorphisms linked to human diseases and phenotypes in genome-wide association studies often affect noncoding regions. A SNP within an intron of the gene encoding Interferon Regulatory Factor 4 (IRF4), a transcription factor with no known role in melanocyte biology, is strongly associated with sensitivity of skin to sun exposure, freckles, blue eyes, and brown hair color. Here, we demonstrate that this SNP lies within an enhancer of IRF4 transcription in melanocytes. The allele associated with this pigmentation phenotype impairs binding of the TFAP2A transcription factor that, together with the melanocyte master regulator MITF, regulates activity of the enhancer. Assays in zebrafish and mice reveal that IRF4 cooperates with MITF to activate expression of Tyrosinase (TYR), an essential enzyme in melanin synthesis. Our findings provide a clear example of a noncoding polymorphism that affects a phenotype by modulating a developmental gene regulatory network.


Subject(s)
Interferon Regulatory Factors/metabolism , Polymorphism, Single Nucleotide , Animals , Base Sequence , Enhancer Elements, Genetic , Humans , Interferon Regulatory Factors/chemistry , Interferon Regulatory Factors/genetics , Melanocytes/metabolism , Mice , Molecular Sequence Data , Pigmentation , Signal Transduction , Transcription Factor AP-2/chemistry , Transcription Factor AP-2/metabolism , Zebrafish
2.
BMC Genomics ; 24(1): 306, 2023 Jun 07.
Article in English | MEDLINE | ID: mdl-37286935

ABSTRACT

To overcome the ethical and technical limitations of in vivo human disease models, the broader scientific community frequently employs model organism-derived cell lines to investigate disease mechanisms, pathways, and therapeutic strategies. Despite the widespread use of certain in vitro models, many still lack contemporary genomic analysis supporting their use as a proxy for the affected human cells and tissues. Consequently, it is imperative to determine how accurately and effectively any proposed biological surrogate may reflect the biological processes it is assumed to model. One such cellular surrogate of human disease is the established mouse neural precursor cell line, SN4741, which has been used to elucidate mechanisms of neurotoxicity in Parkinson disease for over 25 years. Here, we use a combination of classic and contemporary genomic techniques - karyotyping, RT-qPCR, single cell RNA-seq, bulk RNA-seq, and ATAC-seq - to characterize the transcriptional landscape, chromatin landscape, and genomic architecture of this cell line, and evaluate its suitability as a proxy for midbrain dopaminergic neurons in the study of Parkinson disease. We find that SN4741 cells possess an unstable triploidy and consistently exhibit low expression of dopaminergic neuron markers across assays, even when the cell line is shifted to the non-permissive temperature that drives differentiation. The transcriptional signatures of SN4741 cells suggest that they are maintained in an undifferentiated state at the permissive temperature and differentiate into immature neurons at the non-permissive temperature; however, they may not be dopaminergic neuron precursors, as previously suggested. Additionally, the chromatin landscapes of SN4741 cells, in both the differentiated and undifferentiated states, are not concordant with the open chromatin profiles of ex vivo, mouse E15.5 forebrain- or midbrain-derived dopaminergic neurons.
Overall, our data suggest that SN4741 cells may reflect early aspects of neuronal differentiation but are likely not a suitable proxy for dopaminergic neurons as previously thought. The implications of this study extend broadly, illuminating the need for robust biological and genomic rationale underpinning the use of in vitro models of molecular processes.


Subject(s)
Dopaminergic Neurons , Parkinson Disease , Mice , Humans , Animals , Dopaminergic Neurons/metabolism , Parkinson Disease/genetics , Parkinson Disease/metabolism , Mesencephalon/metabolism , Cell Line , Cell Differentiation , Chromatin/metabolism
3.
Hum Mol Genet ; 30(6): 485-499, 2021 04 30.
Article in English | MEDLINE | ID: mdl-33693707

ABSTRACT

Pancreatic ductal adenocarcinoma (PDAC) is an aggressive form of cancer with high mortality. The cellular origins of PDAC are largely unknown; however, ductal cells, especially centroacinar cells (CACs), have several characteristics in common with PDAC, such as expression of SOX9 and components of the Notch-signaling pathway. Mutations in KRAS and alterations to Notch signaling are common in PDAC, and both these pathways regulate the transcription factor SOX9. To identify genes regulated by SOX9, we performed siRNA knockdown of SOX9 followed by RNA-seq in PANC-1s, a human PDAC cell line. We report 93 differentially expressed (DE) genes, with convergence on alterations to Notch-signaling pathways and ciliogenesis. These results point to SOX9 and Notch activity being in a positive feedback loop and SOX9 regulating cilia production in PDAC. We additionally performed ChIP-seq in PANC-1s to identify direct targets of SOX9 binding and integrated these results with our DE gene list. Nine of the top 10 downregulated genes have evidence of direct SOX9 binding at their promoter regions. One of these targets was the cancer stem cell marker EpCAM. Using whole-mount in situ hybridization to detect epcam transcript in zebrafish larvae, we demonstrated that epcam is a CAC marker and that Sox9 regulation of epcam expression is conserved in zebrafish. Additionally, we generated an epcam null mutant and observed pronounced defects in ciliogenesis during development. Our results provide a link between SOX9, EpCAM and ciliary repression that can be exploited in improving our understanding of the cellular origins and mechanisms of PDAC.


Subject(s)
Biomarkers, Tumor/metabolism , Carcinoma, Pancreatic Ductal/pathology , Cilia/genetics , Epithelial Cell Adhesion Molecule/metabolism , Pancreatic Neoplasms/pathology , SOX9 Transcription Factor/metabolism , Animals , Biomarkers, Tumor/genetics , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/metabolism , Cell Movement , Cell Proliferation , Cilia/metabolism , Epithelial Cell Adhesion Molecule/genetics , Humans , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/metabolism , SOX9 Transcription Factor/genetics , Signal Transduction , Zebrafish
4.
Genome Res ; 30(4): 528-539, 2020 04.
Article in English | MEDLINE | ID: mdl-32303558

ABSTRACT

Genome-wide association studies have implicated thousands of noncoding variants across common human phenotypes. However, they cannot directly inform the cellular context in which disease-associated variants act. Here, we use open chromatin profiles from discrete mouse cell populations to address this challenge. We applied stratified linkage disequilibrium score regression and evaluated heritability enrichment in 64 genome-wide association studies, emphasizing schizophrenia. We provide evidence that mouse-derived human open chromatin profiles can serve as powerful proxies for difficult-to-obtain human cell populations, facilitating the illumination of common disease heritability enrichment across an array of human phenotypes. We demonstrate that signatures from discrete subpopulations of cortical excitatory and inhibitory neurons are significantly enriched for schizophrenia heritability, with maximal enrichment in cortical layer V excitatory neurons. We also show that differences between schizophrenia and bipolar disorder are concentrated in excitatory neurons in cortical layers II-III, IV, and V, as well as the dentate gyrus. Finally, we leverage these data to fine-map variants in 177 schizophrenia loci, nominating variants in 104 of the 177. We integrate these data with transcription factor binding site, chromatin interaction, and validated enhancer data, placing variants in the cellular context where they may modulate risk.


Subject(s)
Cerebral Cortex/metabolism , Chromatin/genetics , Genetic Predisposition to Disease , Inheritance Patterns , Schizophrenia/genetics , Animals , Cerebral Cortex/pathology , Chromosome Mapping , Computational Biology/methods , Databases, Genetic , Diagnosis, Differential , Disease Models, Animal , Genetic Association Studies , Genome-Wide Association Study , Genomics/methods , Hippocampus/metabolism , Hippocampus/pathology , Humans , Mice , Neurons/metabolism , Polymorphism, Single Nucleotide , Schizophrenia/diagnosis
5.
J Neuroinflammation ; 19(1): 223, 2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36076238

ABSTRACT

Multifactorial diseases are characterized by inter-individual variation in etiology, age of onset, and penetrance. These diseases tend to be relatively common and arise from the combined action of genetic and environmental factors; however, parsing the convoluted mechanisms underlying these gene-by-environment interactions presents a significant challenge to their study and management. For neurodegenerative disorders, resolving this challenge is imperative, given the enormous health and societal burdens they impose. The mechanisms by which genetic and environmental effects may act in concert to destabilize homeostasis and elevate risk have become a major research focus in the study of common disease. Emphasis is further being placed on determining the extent to which a unifying biological principle may account for the progressively diminishing capacity of a system to buffer disease phenotypes, as risk for disease increases. Data emerging from studies of common, neurodegenerative diseases are providing insights to pragmatically connect mechanisms of genetic and environmental risk that previously seemed disparate. In this review, we discuss evidence positing inflammation as a unifying biological principle of homeostatic destabilization affecting the risk, onset, and progression of neurodegenerative diseases. Specifically, we discuss how genetic variation associated with Alzheimer disease and Parkinson disease may contribute to pro-inflammatory responses, how such underlying predisposition may be exacerbated by environmental insults, and how this common theme is being leveraged in the ongoing search for effective therapeutic interventions.


Subject(s)
Alzheimer Disease , Neurodegenerative Diseases , Parkinson Disease , Alzheimer Disease/genetics , Humans , Neurodegenerative Diseases/genetics , Neuroinflammatory Diseases , Parkinson Disease/epidemiology , Parkinson Disease/genetics , Risk Factors
6.
Am J Hum Genet ; 102(3): 427-446, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29499164

ABSTRACT

Genetic variation modulating risk of sporadic Parkinson disease (PD) has been primarily explored through genome-wide association studies (GWASs). However, like many other common genetic diseases, the impacted genes remain largely unknown. Here, we used single-cell RNA-seq to characterize dopaminergic (DA) neuron populations in the mouse brain at embryonic and early postnatal time points. These data facilitated unbiased identification of DA neuron subpopulations through their unique transcriptional profiles, including a postnatal neuroblast population and substantia nigra (SN) DA neurons. We use these population-specific data to develop a scoring system to prioritize candidate genes in all 49 GWAS intervals implicated in PD risk, including genes with known PD associations and many with extensive supporting literature. As proof of principle, we confirm that the nigrostriatal pathway is compromised in Cplx1-null mice. Ultimately, this systematic approach establishes biologically pertinent candidates and testable hypotheses for sporadic PD, informing a new era of PD genetic research.


Subject(s)
Dopaminergic Neurons/metabolism , Genetic Association Studies , Parkinson Disease/genetics , Parkinson Disease/pathology , Sequence Analysis, RNA , Single-Cell Analysis/methods , Animals , Cell Separation , Gene Regulatory Networks , Genetic Loci , Genetic Markers , Genome-Wide Association Study , Mice, Knockout , Substantia Nigra/pathology
7.
Am J Hum Genet ; 103(6): 874-892, 2018 12 06.
Article in English | MEDLINE | ID: mdl-30503521

ABSTRACT

The progressive loss of midbrain (MB) dopaminergic (DA) neurons defines the motor features of Parkinson disease (PD), and modulation of risk by common variants in PD has been well established through genome-wide association studies (GWASs). We acquired open chromatin signatures of purified embryonic mouse MB DA neurons because we anticipated that a fraction of PD-associated genetic variation might mediate the variants' effects within this neuronal population. Correlation with >2,300 putative enhancers assayed in mice revealed enrichment for MB cis-regulatory elements (CREs), and these data were reinforced by transgenic analyses of six additional sequences in zebrafish and mice. One CRE, within intron 4 of the familial PD gene SNCA, directed reporter expression in catecholaminergic neurons from transgenic mice and zebrafish. Sequencing of this CRE in 986 individuals with PD and 992 controls revealed two common variants associated with elevated PD risk. To assess potential mechanisms of action, we screened >16,000 proteins for DNA binding capacity and identified a subset whose binding is impacted by these enhancer variants. Additional genotyping across the SNCA locus identified a single PD-associated haplotype, containing the minor alleles of both of the aforementioned PD-risk variants. Our work posits a model for how common variation at SNCA might modulate PD risk and highlights the value of cell-context-dependent guided searches for functional non-coding variation.


Subject(s)
Chromatin/genetics , Dopaminergic Neurons/pathology , Enhancer Elements, Genetic/genetics , Genetic Predisposition to Disease/genetics , Parkinson Disease/genetics , alpha-Synuclein/genetics , Adult , Aged , Aged, 80 and over , Alleles , Animals , Disease Models, Animal , Female , Genotype , Humans , Introns/genetics , Male , Mice , Mice, Transgenic , Middle Aged , Pregnancy , Zebrafish
8.
Genomics ; 112(3): 2379-2384, 2020 05.
Article in English | MEDLINE | ID: mdl-31962144

ABSTRACT

Haploid cell lines are a valuable research tool with broad applicability for genetic assays. As such, the fully haploid human cell line, eHAP1, has been used in a wide array of studies. However, the absence of a corresponding reference genome sequence for this cell line has limited the potential for more widespread applications to experiments dependent on available sequence, like capture-clone methodologies. We generated ~15× coverage Nanopore long reads from ten GridION flowcells and used these data to assemble a de novo draft genome with minimap and miniasm, which we subsequently polished using Racon. This assembly was further polished using previously generated, low-coverage, Illumina short reads with Pilon and ntEdit. This resulted in a hybrid eHAP1 assembly with >90% complete BUSCO scores. We further assessed the eHAP1 long read data for structural variants using Sniffles and identified a variety of rearrangements, including a previously established Philadelphia translocation. Finally, we demonstrate how some of these variants overlap open chromatin regions, potentially impacting regulatory regions. By integrating both long and short reads, we generated a high-quality reference assembly for eHAP1 cells. The union of long and short reads demonstrates the utility of combining sequencing platforms to generate a high-quality reference genome de novo solely from low coverage data. We expect the resulting eHAP1 genome assembly to provide a useful resource to enable novel experimental applications in this important model cell line.


Subject(s)
Cell Line , Genome, Human , Haploidy , Genomic Structural Variation , Humans , Hybrid Cells , Nanopore Sequencing , Reference Standards
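
The "~15× coverage" figure in the abstract above is a mean sequencing depth: total sequenced bases divided by genome size. A minimal Python sketch of that arithmetic follows. The function name `mean_coverage` and the toy read lengths are hypothetical illustrations, not taken from the paper or from any assembler.

```python
def mean_coverage(read_lengths, genome_size):
    """Mean sequencing depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Toy example (hypothetical numbers): three reads over a 4-base "genome".
depth = mean_coverage([10, 20, 30], genome_size=4)
print(f"{depth:.1f}x")
```

Under the same formula, a depth of about 15× over a genome of G bases corresponds to roughly 15 × G sequenced bases in total.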
9.
Am J Hum Genet ; 98(1): 58-74, 2016 Jan 07.
Article in English | MEDLINE | ID: mdl-26749308

ABSTRACT

We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.


Subject(s)
Autistic Disorder/genetics , DNA/genetics , Genome, Human , Exome , Female , Humans , Male , Pedigree , Polymorphism, Single Nucleotide
10.
Am J Hum Genet ; 96(4): 581-96, 2015 Apr 02.
Article in English | MEDLINE | ID: mdl-25839327

ABSTRACT

Innervation of the gut is segmentally lost in Hirschsprung disease (HSCR), a consequence of cell-autonomous and non-autonomous defects in enteric neuronal cell differentiation, proliferation, migration, or survival. Rare, high-penetrance coding variants and common, low-penetrance non-coding variants in 13 genes are known to underlie HSCR risk, with the most frequent variants in the ret proto-oncogene (RET). We used a genome-wide association (220 trios) and replication (429 trios) study to reveal a second non-coding variant distal to RET and a non-coding allele on chromosome 7 within the class 3 Semaphorin gene cluster. Analysis in Ret wild-type and Ret-null mice demonstrates specific expression of Sema3a, Sema3c, and Sema3d in the enteric nervous system (ENS). In zebrafish embryos, sema3 knockdowns show reduction of migratory ENS precursors with complete ablation under conjoint ret loss of function. Seven candidate receptors of Sema3 proteins are also expressed within the mouse ENS and their expression is also lost in the ENS of Ret-null embryos. Sequencing of SEMA3A, SEMA3C, and SEMA3D in 254 HSCR-affected subjects followed by in silico protein structure modeling and functional analyses identified five disease-associated alleles with loss-of-function defects in semaphorin dimerization and binding to their cognate neuropilin and plexin receptors. Thus, semaphorin 3C/3D signaling is an evolutionarily conserved regulator of ENS development whose dysregulation is a cause of enteric aganglionosis.


Subject(s)
Epistasis, Genetic/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation , Hirschsprung Disease/genetics , Proto-Oncogene Proteins c-ret/genetics , Semaphorins/genetics , Animals , Base Sequence , Genome-Wide Association Study , Mice , Molecular Sequence Data , Semaphorins/deficiency , Semaphorins/metabolism , Sequence Analysis, DNA
11.
Arterioscler Thromb Vasc Biol ; 37(9): 1727-1731, 2017 09.
Article in English | MEDLINE | ID: mdl-28751573

ABSTRACT

OBJECTIVE: Previous genetic lineage tracing studies showed that Sox10+ cells differentiate into vascular mural cells, limited to neural crest-derived blood vessels in craniofacial tissues, aortic arch, pulmonary arch arteries, brachiocephalic, carotid arteries, and thymus. The purpose of this study was to investigate the contribution of Sox10+ cells to vascular development in other tissues and organs and their relationship with neural crest. APPROACH AND RESULTS: Using a genetic lineage tracing technique based on the Cre/LoxP system, we examined blood vessels in the adult organs of mice expressing Sox10-Cre/Rosa-LoxP-red fluorescent protein or Wnt1-Cre/Rosa-LoxP-red fluorescent protein by immunohistological analysis. In addition to previously reported tissues and organs derived from neural crest, we showed that Sox10+ cells also contributed to vascular mural cells in the lung, spleen, and kidney, which are of non-neural crest origin, as evidenced by red fluorescent protein-negative blood vessels in these 3 organs of Wnt1-Cre/Rosa-LoxP-red fluorescent protein mice. CONCLUSIONS: This study demonstrates that Sox10+ cells contribute to pericytes and smooth muscle cells in most parts of the body, including those of neural crest and non-neural crest origin, which has significant implications for vascular remodeling under physiological and pathological conditions.


Subject(s)
Cell Lineage , Kidney/blood supply , Lung/blood supply , Muscle, Smooth, Vascular/metabolism , Myocytes, Smooth Muscle/metabolism , Neural Crest/metabolism , Pericytes/metabolism , SOXE Transcription Factors/metabolism , Spleen/blood supply , Animals , Fluorescent Antibody Technique , Genotype , Integrases/genetics , Luminescent Proteins/biosynthesis , Luminescent Proteins/genetics , Mice, Transgenic , Morphogenesis , Muscle, Smooth, Vascular/cytology , Neovascularization, Physiologic , Neural Crest/cytology , Phenotype , SOXE Transcription Factors/genetics , Vascular Remodeling , Wnt1 Protein/genetics , Red Fluorescent Protein
12.
Hum Mol Genet ; 24(19): 5433-50, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26206884

ABSTRACT

SOX10 is required for melanocyte development and maintenance, and has been linked to melanoma initiation and progression. However, the molecular mechanisms by which SOX10 guides the appropriate gene expression programs necessary to promote the melanocyte lineage are not fully understood. Here we employ genetic and epigenomic analysis approaches to uncover novel genomic targets and previously unappreciated molecular roles of SOX10 in melanocytes. Through global analysis of SOX10-binding sites and epigenetic characteristics of chromatin states, we uncover an extensive catalog of SOX10 targets genome-wide. Our findings reveal that SOX10 predominantly engages 'open' chromatin regions and binds to distal regulatory elements, including novel and previously known melanocyte enhancers. Integrated chromatin occupancy and transcriptome analysis suggest a role for SOX10 in both transcriptional activation and repression to regulate functionally distinct classes of genes. We demonstrate that distinct epigenetic signatures and cis-regulatory sequence motifs predicted to bind putative co-regulatory transcription factors define SOX10-activated and SOX10-repressed target genes. Collectively, these findings uncover a central role of SOX10 as a global regulator of gene expression in the melanocyte lineage by targeting diverse regulatory pathways.


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks , Melanocytes/metabolism , Oligonucleotide Array Sequence Analysis/methods , SOXE Transcription Factors/metabolism , Animals , Binding Sites , Cell Line , Chromatin/genetics , Chromatin/metabolism , Epigenomics/methods , Melanocytes/cytology , Mice , SOXE Transcription Factors/chemistry , SOXE Transcription Factors/genetics
13.
Genome Res ; 22(11): 2278-89, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22759862

ABSTRACT

Illuminating the primary sequence encryption of enhancers is central to understanding the regulatory architecture of genomes. We have developed a machine learning approach to decipher motif patterns of hindbrain enhancers and identify 40,000 sequences in the human genome that we predict display regulatory control that includes the hindbrain. Consistent with their roles in hindbrain patterning, MEIS1, NKX6-1, as well as HOX and POU family binding motifs contributed strongly to this enhancer model. Predicted hindbrain enhancers are overrepresented at genes expressed in hindbrain and associated with nervous system development, and primarily reside in areas of open chromatin. In addition, 77 (0.2%) of these predictions are identified as hindbrain enhancers on the VISTA Enhancer Browser, and 26,000 (60%) overlap enhancer marks (H3K4me1 or H3K27ac). To validate these putative hindbrain enhancers, we selected 55 elements distributed throughout our predictions and six low scoring controls for evaluation in a zebrafish transgenic assay. When assayed in mosaic transgenic embryos, 51/55 elements directed expression in the central nervous system. Furthermore, 30/34 (88%) predicted enhancers analyzed in stable zebrafish transgenic lines directed expression in the larval zebrafish hindbrain. Subsequent analysis of sequence fragments selected based upon motif clustering further confirmed the critical role of the motifs contributing to the classifier. Our results demonstrate the existence of a primary sequence code characteristic of hindbrain enhancers. This code can be accurately extracted using machine-learning approaches and applied successfully for de novo identification of hindbrain enhancers. This study represents a critical step toward the dissection of regulatory control in specific neuronal subtypes.


Subject(s)
Enhancer Elements, Genetic , Rhombencephalon/metabolism , Sequence Analysis, DNA/methods , Transcription, Genetic , Algorithms , Animals , Chromatin/metabolism , Gene Expression Regulation, Developmental , Genome, Human , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Humans , POU Domain Factors/genetics , POU Domain Factors/metabolism , Rhombencephalon/growth & development , Transcription Factors/genetics , Transcription Factors/metabolism , Zebrafish
14.
Genome Res ; 22(11): 2290-301, 2012 Nov.
Article in English | MEDLINE | ID: mdl-23019145

ABSTRACT

We take a comprehensive approach to the study of regulatory control of gene expression in melanocytes that proceeds from large-scale enhancer discovery facilitated by ChIP-seq; to rigorous validation in silico, in vitro, and in vivo; and finally to the use of machine learning to elucidate a regulatory vocabulary with genome-wide predictive power. We identify 2489 putative melanocyte enhancer loci in the mouse genome by ChIP-seq for EP300 and H3K4me1. We demonstrate that these putative enhancers are evolutionarily constrained, enriched for sequence motifs predicted to bind key melanocyte transcription factors, located near genes relevant to melanocyte biology, and capable of driving reporter gene expression in melanocytes in culture (86%; 43/50) and in transgenic zebrafish (70%; 7/10). Next, using the sequences of these putative enhancers as a training set for a supervised machine learning algorithm, we develop a vocabulary of 6-mers predictive of melanocyte enhancer function. Lastly, we demonstrate that this vocabulary has genome-wide predictive power in both the mouse and human genomes. This study provides deep insight into the regulation of gene expression in melanocytes and demonstrates a powerful approach to the investigation of regulatory sequences that can be applied to other cell types.


Subject(s)
Artificial Intelligence , Chromatin Immunoprecipitation/methods , Enhancer Elements, Genetic , Melanocytes/metabolism , Algorithms , Animals , E1A-Associated p300 Protein/genetics , E1A-Associated p300 Protein/metabolism , Evolution, Molecular , Gene Expression Regulation , Genes, Reporter , Genome, Human , Histones/metabolism , Humans , Mice , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Zebrafish
15.
Nucleic Acids Res ; 41(Web Server issue): W544-56, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23771147

ABSTRACT

Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) detects genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.


Subject(s)
High-Throughput Nucleotide Sequencing , Regulatory Elements, Transcriptional , Sequence Analysis, DNA , Software , Support Vector Machine , Transcription Factors/metabolism , Animals , Binding Sites , Genomics , Humans , Internet , Mice
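
The "kmer sequence features" mentioned in the abstract above can be pictured as a fixed-length vector of k-mer frequencies per sequence, which a classifier such as an SVM then takes as input. The following Python sketch shows only that feature-extraction step. It is an illustration under stated assumptions, not the kmer-SVM implementation: the function name `kmer_features`, the frequency normalization, and the skipping of windows containing non-ACGT characters are all hypothetical choices.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=6):
    """Return k-mer frequencies for a DNA sequence as a fixed-length list.

    The vector has 4**k entries, ordered lexicographically over A, C, G, T.
    Windows containing other characters (e.g. N) are skipped (an assumption).
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    vocabulary = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


# Toy usage with k=2 so the vector stays small (16 entries).
features = kmer_features("ACGTACGTAC", k=2)
print(len(features))
```

With k=6 the vector has 4096 entries. One such vector per training sequence, labeled as positive or negative, would form the input matrix for whichever classifier is used.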
16.
Genome Res ; 21(7): 1139-49, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21628450

ABSTRACT

Plasticity of gene regulatory encryption can permit DNA sequence divergence without loss of function. Functional information is preserved through conservation of the composition of transcription factor binding sites (TFBS) in a regulatory element. We have developed a method that can accurately identify pairs of functional noncoding orthologs at evolutionarily diverged loci by searching for conserved TFBS arrangements. With an estimated 5% false-positive rate (FPR) in approximately 3000 human and zebrafish syntenic loci, we detected approximately 300 pairs of diverged elements that are likely to share common ancestry and have similar regulatory activity. By analyzing a pool of experimentally validated human enhancers, we demonstrated that 7/8 (88%) of their predicted functional orthologs retained in vivo regulatory control. Moreover, in 5/7 (71%) of assayed enhancer pairs, we observed concordant expression patterns. We argue that TFBS composition is often necessary to retain and sufficient to predict regulatory function in the absence of overt sequence conservation, revealing an entire class of functionally conserved, evolutionarily diverged regulatory elements that we term "covert."


Subject(s)
Conserved Sequence , Enhancer Elements, Genetic , Gene Expression Regulation, Developmental , Sequence Analysis, DNA/methods , Animals , Animals, Genetically Modified/genetics , Computational Biology/methods , Evolution, Molecular , Genetic Loci , Genome, Human , Humans , Models, Genetic , Oligonucleotide Array Sequence Analysis , Sequence Alignment , Synteny , Transcription Factors/genetics , Zebrafish/genetics
17.
HGG Adv ; 5(3): 100303, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38702885

ABSTRACT

Recent collaborative genome-wide association studies (GWAS) have identified >200 independent loci contributing to risk for schizophrenia (SCZ). The genes closest to these loci have diverse functions, supporting the potential involvement of multiple relevant biological processes, yet there is no direct evidence that individual variants are functional or directly linked to specific genes. Nevertheless, overlap with certain epigenetic marks suggests that most GWAS-implicated variants are regulatory. Based on the strength of association with SCZ and the presence of regulatory epigenetic marks, we chose one such variant near TSNARE1 and ADGRB1, rs4129585, to test for functional potential and assay differences that may drive the pathogenicity of the risk allele. We observed that the variant-containing sequence drives reporter expression in relevant neuronal populations in zebrafish. Next, we introduced each allele into human induced pluripotent cells and differentiated four isogenic clones homozygous for the risk allele and five clones homozygous for the non-risk allele into neural progenitor cells. Employing RNA sequencing, we found that the two alleles yield significant transcriptional differences in the expression of 109 genes at a false discovery rate (FDR) of <0.05 and 259 genes at an FDR of <0.1. We demonstrate that these genes are highly interconnected in pathways enriched for synaptic proteins, axon guidance, and regulation of synapse assembly. Exploration of genes near rs4129585 suggests that this variant does not regulate TSNARE1 transcripts, as previously thought, but may regulate the neighboring ADGRB1, a regulator of synaptogenesis. Our results suggest that rs4129585 is a functional common variant that acts in specific pathways likely involved in SCZ risk.
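
The gene counts reported above at FDR <0.05 and FDR <0.1 come from thresholding multiple-testing-adjusted p-values. The abstract does not say which FDR procedure was used, so the Python sketch below assumes the Benjamini-Hochberg adjustment purely for illustration. The function name `benjamini_hochberg` and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (hypothetical), one per gene.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
n_fdr_05 = sum(q < 0.05 for q in adj)
n_fdr_10 = sum(q < 0.10 for q in adj)
print(n_fdr_05, n_fdr_10)
```

Relaxing the threshold can only keep or enlarge the set of genes called significant, which is consistent with the larger count reported at the looser cutoff.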

18.
Hum Mol Genet ; 20(19): 3746-56, 2011 Oct 01.
Article in English | MEDLINE | ID: mdl-21737465

ABSTRACT

RET, a gene causatively mutated in Hirschsprung disease and cancer, has recently been implicated in breast cancer estrogen (E2) independence and tamoxifen resistance. RET displays both E2 and retinoic acid (RA)-dependent transcriptional modulation in E2-responsive breast cancers. However, the regulatory elements through which the steroid hormone transcriptional regulation of RET is mediated are poorly defined. Recent genome-wide chromatin immunoprecipitation-based studies have identified 10 putative E2 receptor-alpha (ESR1) and RA receptor alpha-binding sites at the RET locus, of which we demonstrate only two (RET -49.8 and RET +32.8) display significant E2 regulatory response when assayed independently in MCF-7 breast cancer cells. We demonstrate that endogenous RET expression and RET -49.8 regulatory activity are cooperatively regulated by E2 and RA in breast cancer cells. We identify key sequences that are required for RET -49.8 and RET +32.8 E2 responsiveness, including motifs known to be bound by ESR1, FOXA1 and TFAP2C. We also report that both RET -49.8 regulatory activity and endogenous RET expression are completely dependent on ESR1 for their (E2)-induction and that ESR1 is sufficient to mediate the E2-induced enhancer activity of RET -49.8 and RET +32.8. Finally, using zebrafish transgenesis, we also demonstrate that RET -49.8 directs reporter expression in the central nervous system and peripheral nervous system consistent with the endogenous ret expression. Taken collectively, these data suggest that RET transcription in breast cancer cells is modulated by E2 via ESR1 acting on multiple elements collectively.


Subject(s)
Breast Neoplasms/genetics , Enhancer Elements, Genetic , Estradiol/metabolism , Estrogens/metabolism , Gene Expression Regulation, Neoplastic , Proto-Oncogene Proteins c-ret/genetics , Response Elements , Tretinoin/metabolism , Animals , Binding Sites , Breast Neoplasms/metabolism , Cell Line, Tumor , Estrogen Receptor alpha/genetics , Estrogen Receptor alpha/metabolism , Female , Humans , Male , Protein Binding , Proto-Oncogene Proteins c-ret/metabolism , Zebrafish
19.
Article in English | MEDLINE | ID: mdl-20438361

ABSTRACT

Transcriptional regulation of gene expression plays a significant role in establishing the diversity of human cell types and biological functions from a common set of genes. The components of regulatory control in the human genome include cis-acting elements that act across immense genomic distances to influence the spatial and temporal distribution of gene expression. Here we review the established categories of distant-acting regulatory elements, discussing the classical and contemporary evidence of their regulatory potential and clinical importance. Current efforts to identify regulatory sequences throughout the genome and elucidate their biological significance depend heavily on advances in sequence conservation-based analyses and on increasingly large-scale efforts applying transgenic technologies in model organisms. We discuss the advantages and limitations of sequence conservation as a predictor of regulatory function and present complementary emerging technologies now being applied to annotate regulatory elements in vertebrate genomes.


Subject(s)
Gene Expression Regulation , Genome, Human , Regulatory Sequences, Nucleic Acid , Transcription, Genetic , Vertebrates/genetics , Animals , Genomics , Humans , Vertebrates/metabolism
20.
Am J Hum Genet ; 87(1): 60-74, 2010 Jul 09.
Article in English | MEDLINE | ID: mdl-20598273

ABSTRACT

The major gene for Hirschsprung disease (HSCR) encodes the receptor tyrosine kinase RET. In a study of 690 European- and 192 Chinese-descent probands and their parents or controls, we demonstrate the ubiquity of a >4-fold susceptibility from a C→T allele (rs2435357: p = 3.9 × 10^-43 in European ancestry; p = 1.1 × 10^-21 in Chinese samples) that probably arose once within the intronic RET enhancer MCS+9.7. With in vitro assays, we now show that the T variant disrupts a SOX10 binding site within MCS+9.7 that compromises RET transactivation. The T allele, with a control frequency of 20%-30%/47% and case frequency of 54%-62%/88% in European/Chinese-ancestry individuals, is involved in all forms of HSCR. It is marginally associated with proband gender (p = 0.13) and significantly so with length of aganglionosis (p = 7.6 × 10^-5) and familiality (p = 6.2 × 10^-4). The enhancer variant is more frequent in the common forms of male, short-segment, and simplex families whereas multiple, rare, coding mutations are the norm in the less common and more severe forms of female, long-segment, and multiplex families. The T variant also increases penetrance in patients with rare RET coding mutations. Thus, both rare and common mutations, individually and together, make contributions to the risk of HSCR. The distribution of RET variants in diverse HSCR patients suggests a "cellular-recessive" genetic model where both RET alleles' function is compromised. The RET allelic series, and its genotype-phenotype correlations, shows that success in variant identification in complex disorders may strongly depend on which patients are studied.


Subject(s)
Hirschsprung Disease/genetics , Proto-Oncogene Proteins c-ret/genetics , Asian People , Base Sequence , Case-Control Studies , Enhancer Elements, Genetic , Female , Gene Frequency , Genome-Wide Association Study , Haplotypes , Hirschsprung Disease/ethnology , Hirschsprung Disease/physiopathology , Humans , Male , Mutation , Penetrance , Polymorphism, Single Nucleotide , Protein Binding , Proto-Oncogene Proteins c-ret/metabolism , SOXE Transcription Factors/metabolism , Sex Factors , Transcriptional Activation , White People