Results 1 - 13 of 13
1.
Sci Adv ; 10(21): eadj4452, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781344

ABSTRACT

Most genetic variants associated with psychiatric disorders are located in noncoding regions of the genome. To investigate their functional implications, we integrate epigenetic data from the PsychENCODE Consortium and other published sources to construct a comprehensive atlas of candidate brain cis-regulatory elements. Using deep learning, we model these elements' sequence syntax and predict how binding sites for lineage-specific transcription factors contribute to cell type-specific gene regulation in various types of glia and neurons. The elements' evolutionary history suggests that new regulatory information in the brain emerges primarily via smaller sequence mutations within conserved mammalian elements rather than entirely new human- or primate-specific sequences. However, primate-specific candidate elements, particularly those active during fetal brain development and in excitatory neurons and astrocytes, are implicated in the heritability of brain-related human traits. Additionally, we introduce PsychSCREEN, a web-based platform offering interactive visualization of PsychENCODE-generated genetic and epigenetic data from diverse brain cell types in individuals with psychiatric disorders and healthy controls.


Subject(s)
Brain , Epigenesis, Genetic , Regulatory Sequences, Nucleic Acid , Humans , Brain/metabolism , Regulatory Sequences, Nucleic Acid/genetics , Animals , Evolution, Molecular , Mental Disorders/genetics , Regulatory Elements, Transcriptional/genetics , Neurons/metabolism , Gene Expression Regulation , Transcription Factors/genetics , Transcription Factors/metabolism
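Sequence-based deep learning models like the one described above generally read DNA as a one-hot matrix. The Python sketch below shows that input encoding. It assumes an A/C/G/T channel order and all-zero rows for N or other ambiguity codes, neither of which is specified in the abstract. The function names are hypothetical.

```python
# Channel order for the one-hot matrix (an assumption; any fixed order works).
BASES = "ACGT"
_INDEX = {base: i for i, base in enumerate(BASES)}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def one_hot(seq):
    """Encode DNA as a list of [A, C, G, T] rows; N and other codes become all zeros."""
    rows = []
    for base in seq.upper():
        row = [0] * len(BASES)
        idx = _INDEX.get(base)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def reverse_complement(seq):
    """Return the reverse complement so a model can score both strands."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


print(one_hot("ACGN"))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
print(reverse_complement("AACG"))  # CGTT
```

Many sequence models score both strands, which is why the reverse-complement helper sits next to the encoder.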
2.
Hepatol Commun ; 7(10), 2023 10 01.
Article in English | MEDLINE | ID: mdl-37756045

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have identified 30 risk loci for primary sclerosing cholangitis (PSC). Variants within these loci are found predominantly in noncoding regions of DNA, making their mechanisms of conferring risk hard to define. Epigenomic studies have shown that noncoding variants broadly impact regulatory element activity. The possible association of noncoding PSC variants with regulatory element activity has not been studied. We aimed to (1) determine whether the noncoding risk variants in PSC impact regulatory element function and (2) if so, assess the role these regulatory elements play in explaining the genetic risk for PSC. METHODS: Available epigenomic datasets were integrated to build a comprehensive atlas of cell type-specific regulatory elements, emphasizing PSC-relevant cell types. RNA-seq and ATAC-seq were performed on peripheral CD4+ T cells from 10 PSC patients and 11 healthy controls. Computational techniques were used to (1) study the enrichment of PSC-risk variants within regulatory elements, (2) correlate risk genotype with differences in regulatory element activity, and (3) identify regulatory elements differentially active and genes differentially expressed between PSC patients and controls. RESULTS: Noncoding PSC-risk variants are strongly enriched within immune-specific enhancers, particularly ones involved in T-cell response to antigenic stimulation. In total, 250 genes and >10,000 regulatory elements were identified that are differentially active between patients and controls. CONCLUSIONS: Mechanistic effects are proposed for variants at 6 PSC-risk loci where genotype was linked with differential T-cell regulatory element activity. Regulatory elements are shown to play a key role in PSC pathophysiology.


Subject(s)
Cholangitis, Sclerosing , Genome-Wide Association Study , Humans , Cholangitis, Sclerosing/genetics , Chromatin Immunoprecipitation Sequencing , Genotype
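Testing whether risk variants are enriched within regulatory elements can be illustrated with a simple binomial test against the fraction of the genome the elements cover. This is a minimal sketch under loud assumptions: variants are treated as independent (ignoring linkage disequilibrium, which real analyses must model), the counts in the usage line are toy numbers rather than the study's data, and the function names are hypothetical, not the authors' method.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment(n_in, n_total, frac_covered):
    """Fold enrichment and binomial p-value for n_in of n_total variants falling
    inside elements that cover frac_covered of the genome.
    Assumes independent variants, i.e. no linkage disequilibrium correction."""
    fold = (n_in / n_total) / frac_covered
    p_value = binomial_upper_tail(n_in, n_total, frac_covered)
    return fold, p_value


fold, p = enrichment(6, 30, 0.02)  # toy numbers, not from the study
print(round(fold, 2), p)
```

With these toy inputs, 6 of 30 variants landing in elements that cover 2% of the genome gives a fold enrichment of about 10.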
3.
Science ; 380(6643): eabn7930, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37104580

ABSTRACT

Understanding the regulatory landscape of the human genome is a long-standing objective of modern biology. Using the reference-free alignment across 241 mammalian genomes produced by the Zoonomia Consortium, we charted evolutionary trajectories for 0.92 million human candidate cis-regulatory elements (cCREs) and 15.6 million human transcription factor binding sites (TFBSs). We identified 439,461 cCREs and 2,024,062 TFBSs under evolutionary constraint. Genes near constrained elements perform fundamental cellular processes, whereas genes near primate-specific elements are involved in environmental interaction, including odor perception and immune response. About 20% of TFBSs are transposable element-derived and exhibit intricate patterns of gains and losses during primate evolution, whereas sequence variants associated with complex traits are enriched in constrained TFBSs. Our annotations illuminate the regulatory functions of the human genome.


Subject(s)
Evolution, Molecular , Genome, Human , Mammals , Regulatory Elements, Transcriptional , Transcription Factors , Animals , Humans , Binding Sites , DNA Transposable Elements , Mammals/classification , Mammals/genetics , Primates/classification , Primates/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Phylogeny
4.
Hum Mol Genet ; 31(R1): R114-R122, 2022 10 20.
Article in English | MEDLINE | ID: mdl-36083269

ABSTRACT

Every cell in the human body inherits a copy of the same genetic information. The three billion base pairs of DNA in the human genome, and the roughly 50 000 coding and non-coding genes they contain, must thus encode all the complexity of human development and cell and tissue type diversity. Differences in gene regulation, or the modulation of gene expression, enable individual cells to interpret the genome differently to carry out their specific functions. Here we discuss recent and ongoing efforts to build gene regulatory maps, which aim to characterize the regulatory roles of all sequences in a genome. Many researchers and consortia have identified such regulatory elements using functional assays and evolutionary analyses; we discuss the results, strengths and shortcomings of their approaches. We also discuss new techniques the field can leverage and emerging challenges it will face while striving to build gene regulatory maps of ever-increasing resolution and comprehensiveness.


Subject(s)
Gene Expression Regulation , Regulatory Sequences, Nucleic Acid , Humans , Gene Expression Regulation/genetics , Genome, Human/genetics , Chromosome Mapping , DNA/genetics
6.
Nucleic Acids Res ; 50(D1): D141-D149, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34755879

ABSTRACT

The human genome contains ∼2000 transcriptional regulatory proteins, including ∼1600 DNA-binding transcription factors (TFs) recognizing characteristic sequence motifs to exert regulatory effects on gene expression. The binding specificities of these factors have been profiled both in vitro, using techniques such as HT-SELEX, and in vivo, using techniques including ChIP-seq. We previously developed Factorbook, a TF-centric database of annotations, motifs, and integrative analyses based on ChIP-seq data from Phase II of the ENCODE Project. Here we present an update to Factorbook which significantly expands the breadth of cell type and TF coverage. The update includes an expanded motif catalog derived from thousands of ENCODE Phase II and III ChIP-seq experiments and HT-SELEX experiments; this motif catalog is integrated with the ENCODE registry of candidate cis-regulatory elements to annotate a comprehensive collection of genome-wide candidate TF binding sites. The database also offers novel tools for applying the motif models within machine learning frameworks and using these models for integrative analysis, including annotation of variants and disease and trait heritability. Factorbook is publicly available at www.factorbook.org; we will continue to expand the resource as ENCODE Phase IV data are released.


Subject(s)
Databases, Genetic , Nucleotide Motifs/genetics , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/genetics , Binding Sites/genetics , Gene Expression Regulation/genetics , Humans , Transcription Factors/classification
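Motif models of the kind catalogued in Factorbook are commonly applied as position weight matrices (PWMs) scored against a sequence. The sketch below converts per-position base counts into log2-odds scores and scans the forward strand. The uniform 0.25 background, the 0.5 pseudocount, and the score threshold are assumptions; this is a generic PWM scan, not Factorbook's own code or API.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts (a list of dicts) into log2-odds scores
    against a uniform background (an assumption)."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, score) for forward-strand windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    seq = seq.upper()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical GATA-like motif built from 10 identical sites.
gata = log_odds_pwm([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(scan("CCGATACCNGATA", gata, threshold=5.0))
```

Both GATA occurrences (starts 2 and 9) pass the threshold; windows overlapping the N are skipped rather than scored.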
7.
Genome Res ; 32(2): 389-402, 2022 02.
Article in English | MEDLINE | ID: mdl-34949670

ABSTRACT

Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated nor do they contain information on cell type-specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks are primarily proximal to GENCODE-annotated TSSs and are concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3' ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations are supported by epigenomic and other transcriptomic data sets. To show the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI genome-wide association study (GWAS) catalog and identified new candidate GWAS genes. Overall, our work shows the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.


Subject(s)
Gene Expression Regulation , Genome-Wide Association Study , Humans , Promoter Regions, Genetic , Transcription Initiation Site
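The paired-end linking step described above can be sketched as follows: for read pairs whose 5' read starts in a RAMPAGE peak, each mate's 3' end is checked against transcript exons, and the peak is linked to transcripts that collect enough support. The exon-overlap rule, the min_reads cutoff, and all names here are hypothetical illustrations, not the paper's published criteria; coordinates are assumed to lie on one chromosome and strand.

```python
from collections import Counter


def link_peak_to_transcripts(mate_ends, transcripts, min_reads=2):
    """Link one peak to transcripts using read-mate 3' ends (hypothetical helper).

    mate_ends: 3'-end positions of mates whose 5' read starts inside the peak.
    transcripts: transcript ID -> list of (start, end) exons, inclusive
    coordinates, same chromosome and strand as the peak (assumed).
    Returns IDs of transcripts whose exons contain at least min_reads mate ends.
    """
    support = Counter()
    for pos in mate_ends:
        for tx_id, exons in transcripts.items():
            if any(start <= pos <= end for start, end in exons):
                support[tx_id] += 1
    return sorted(tx_id for tx_id, n in support.items() if n >= min_reads)


transcripts = {"tx1": [(100, 200), (300, 400)], "tx2": [(100, 150)]}
print(link_peak_to_transcripts([120, 350, 360, 500], transcripts))  # ['tx1']
```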
8.
Prog Mol Biol Transl Sci ; 181: 31-43, 2021.
Article in English | MEDLINE | ID: mdl-34127199

ABSTRACT

The clustered regularly interspaced short palindromic repeats (CRISPR) technology is revolutionizing biological studies and holds tremendous promise for treating human diseases. However, a significant limitation of this technology is that modifications can occur at off-target sites lacking perfect complementarity to the single guide RNA (sgRNA) or the canonical protospacer-adjacent motif (PAM) sequence. Several in vivo and in vitro genome-wide off-target profiling approaches have been developed to assess the fidelity of gene editing. Of these, GUIDE-seq has become one of the most widely adopted and reproducible methods. To allow users to easily analyze GUIDE-seq data generated on any sequencing platform, we developed an open-source pipeline, GS-Preprocess, that takes standard base-call output in bcl format and generates all the input data required by the Bioconductor package GUIDEseq for off-target identification. Furthermore, we created a Docker image with GS-Preprocess, GUIDEseq, and all their R and system dependencies already installed. The bundled pipeline will empower end users to streamline the analysis of GUIDE-seq data and motivate their use of higher-throughput sequencing with increased multiplexing for GUIDE-seq experiments.


Subject(s)
CRISPR-Cas Systems , RNA, Guide, Kinetoplastida , CRISPR-Cas Systems/genetics , Gene Editing , High-Throughput Nucleotide Sequencing , Humans
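The notion of an off-target site that lacks perfect complementarity to the sgRNA or a canonical PAM can be made concrete with a mismatch count plus a PAM pattern check. This minimal sketch assumes an SpCas9-style NGG PAM located 3' of the protospacer on the same strand and ignores DNA/RNA bulges. It is not the GUIDEseq Bioconductor package's algorithm, and the function names and guide sequence are hypothetical.

```python
def pam_matches(observed, pattern="NGG"):
    """True if the observed PAM fits the pattern, where N matches any base."""
    observed, pattern = observed.upper(), pattern.upper()
    return len(observed) == len(pattern) and all(
        p == "N" or p == o for o, p in zip(observed, pattern)
    )


def score_site(guide, site, pam_pattern="NGG"):
    """Compare a guide with a candidate site laid out as protospacer + PAM.

    Returns (mismatch count in the protospacer, whether the PAM is canonical).
    Bulges and the reverse strand are not handled (simplifying assumptions).
    """
    guide, site = guide.upper(), site.upper()
    protospacer, pam = site[:len(guide)], site[len(guide):]
    mismatches = sum(g != s for g, s in zip(guide, protospacer))
    return mismatches, pam_matches(pam, pam_pattern)


guide = "GACGCATAAAGATGAGACGC"  # hypothetical 20-nt guide
print(score_site(guide, guide + "TGG"))            # (0, True): on-target
print(score_site(guide, "T" + guide[1:] + "AGG"))  # (1, True): one mismatch
print(score_site(guide, guide + "TAG"))            # (0, False): non-canonical PAM
```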
9.
Commun Biol ; 4(1): 239, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33619351

ABSTRACT

The morphologically and functionally distinct cell types of a multicellular organism are maintained by their unique epigenomes and gene expression programs. Phase III of the ENCODE Project profiled 66 mouse epigenomes across twelve tissues at daily intervals from embryonic day 11.5 to birth. Applying the ChromHMM algorithm to these epigenomes, we annotated eighteen chromatin states with characteristics of promoters, enhancers, transcribed regions, repressed regions, and quiescent regions. Our integrative analyses delineate the tissue specificity and developmental trajectory of the loci in these chromatin states. Approximately 0.3% of each epigenome is assigned to a bivalent chromatin state, which harbors both active marks and the repressive mark H3K27me3. Highly evolutionarily conserved, these loci are enriched in silencers bound by polycomb repressive complex proteins, and the transcription start sites of their silenced target genes. This collection of chromatin state assignments provides a useful resource for studying mammalian development.


Subject(s)
Chromatin Assembly and Disassembly , Epigenesis, Genetic , Epigenome , Animals , Binding Sites , DNA Methylation , Epigenomics , Gene Expression Regulation, Developmental , Gestational Age , Histones/metabolism , Mice, Inbred C57BL , Polycomb Repressive Complex 2/genetics , Polycomb Repressive Complex 2/metabolism , Promoter Regions, Genetic
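ChromHMM first binarizes each histone mark's read counts in fixed-size bins against a Poisson background and then learns chromatin states with a hidden Markov model. The sketch below reproduces only the binarization idea, with an assumed p-value cutoff of 1e-4 and the mean count across bins as the Poisson rate. It replaces the learned HMM with a simple rule: a bin counts as bivalent when an active mark (H3K4me3, chosen here as an assumption) and H3K27me3 are both called present. It illustrates the concept and is not ChromHMM itself; names and counts are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def binarize(counts, p_cutoff=1e-4):
    """Call a mark present (1) in bins whose count is unlikely under a Poisson
    background whose rate is the mean count across bins (assumed settings)."""
    lam = sum(counts) / len(counts)
    return [1 if poisson_upper_tail(c, lam) < p_cutoff else 0 for c in counts]


def bivalent_fraction(k4me3_counts, k27me3_counts, p_cutoff=1e-4):
    """Fraction of bins where both marks are present: a rule-based stand-in
    for a learned bivalent state, not ChromHMM's HMM."""
    active = binarize(k4me3_counts, p_cutoff)
    repressive = binarize(k27me3_counts, p_cutoff)
    both = sum(1 for a, r in zip(active, repressive) if a and r)
    return both / len(active)


k4me3 = [1] * 99 + [30]       # toy counts in 100 bins
k27me3 = [1] * 98 + [30, 30]
print(bivalent_fraction(k4me3, k27me3))  # 0.01
```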
10.
Nature ; 583(7818): 699-710, 2020 07.
Article in English | MEDLINE | ID: mdl-32728249

ABSTRACT

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE (ref. 1) and Roadmap Epigenomics (ref. 2) data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected data types associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.


Subject(s)
DNA/genetics , Databases, Genetic , Genome/genetics , Genomics , Molecular Sequence Annotation , Registries , Regulatory Sequences, Nucleic Acid/genetics , Animals , Chromatin/genetics , Chromatin/metabolism , DNA/chemistry , DNA Footprinting , DNA Methylation/genetics , DNA Replication Timing , Deoxyribonuclease I/metabolism , Genome, Human , Histones/metabolism , Humans , Mice , Mice, Transgenic , RNA-Binding Proteins/genetics , Transcription, Genetic/genetics , Transposases/metabolism
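Coverage figures such as the 7.9% of the human genome reported for the registry are computed by merging overlapping elements and dividing the covered bases by the genome size. A minimal sketch, assuming half-open (start, end) intervals grouped by chromosome; the function names and toy intervals are hypothetical.

```python
def merged_coverage(intervals):
    """Bases covered by half-open (start, end) intervals on one chromosome,
    counting overlapping stretches only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or touching: extend block
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def genome_fraction(intervals_by_chrom, genome_size):
    """Fraction of a genome of genome_size bases covered by the intervals."""
    covered = sum(merged_coverage(iv) for iv in intervals_by_chrom.values())
    return covered / genome_size


toy = {"chr1": [(0, 10), (5, 15)], "chr2": [(0, 50)]}  # hypothetical elements
print(genome_fraction(toy, 1000))  # 0.065
```

Merging first matters: summing raw element lengths would double-count overlapping elements and inflate the reported fraction.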
11.
Genome Biol ; 21(1): 17, 2020 01 22.
Article in English | MEDLINE | ID: mdl-31969180

ABSTRACT

BACKGROUND: Many genome-wide collections of candidate cis-regulatory elements (cCREs) have been defined using genomic and epigenomic data, but it remains a major challenge to connect these elements to their target genes. RESULTS: To facilitate the development of computational methods for predicting target genes, we develop a Benchmark of candidate Enhancer-Gene Interactions (BENGI) by integrating the recently developed Registry of cCREs with experimentally derived genomic interactions. We use BENGI to test several published computational methods for linking enhancers with genes, including signal correlation and the TargetFinder and PEP supervised learning methods. We find that while TargetFinder is the best-performing method, it is only modestly better than a baseline distance method for most benchmark datasets when trained and tested with the same cell type and that TargetFinder often does not outperform the distance method when applied across cell types. CONCLUSIONS: Our results suggest that current computational methods need to be improved and that BENGI presents a useful framework for method development and testing.


Subject(s)
Enhancer Elements, Genetic , Benchmarking , Data Curation , Gene Expression Regulation , Genomics , Machine Learning
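The baseline distance method and a precision-recall style evaluation can be sketched as follows: each candidate cCRE-gene pair is scored by negative genomic distance (closer pairs rank higher), and the ranking is summarized by average precision, a common estimate of the area under the precision-recall curve. BENGI's exact distance definition and metric settings may differ; names, coordinates, and labels here are hypothetical.

```python
def distance_scores(pairs):
    """Baseline scores for (cCRE midpoint, TSS) pairs: closer pairs score higher."""
    return [-abs(ccre - tss) for ccre, tss in pairs]


def average_precision(scores, labels):
    """Average precision over pairs ranked by descending score, an estimate of
    the area under the precision-recall curve (ties keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda item: -item[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each true positive
    return total / n_pos if n_pos else 0.0


pairs = [(1000, 1500), (1000, 50000), (2000, 2100), (2000, 90000)]  # toy coordinates
labels = [0, 1, 1, 0]  # 1 = interaction supported by experimental data (toy labels)
print(average_precision(distance_scores(pairs), labels))
```

Here the distal true pair is ranked below a proximal false one, so the score falls to 5/6; a learned method only adds value when it recovers such distal links better than distance does.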
12.
Nucleic Acids Res ; 46(21): 11184-11201, 2018 11 30.
Article in English | MEDLINE | ID: mdl-30137428

ABSTRACT

Enhancers are distal cis-regulatory elements that modulate gene expression. They are depleted of nucleosomes and enriched in specific histone modifications; thus, calling DNase-seq and histone mark ChIP-seq peaks can predict enhancers. We evaluated nine peak-calling algorithms for predicting enhancers validated by transgenic mouse assays. DNase and H3K27ac peaks were consistently more predictive than H3K4me1/2/3 and H3K9ac peaks. DFilter and Hotspot2 were the best DNase peak callers, while HOMER, MUSIC, MACS2, DFilter and F-seq were the best H3K27ac peak callers. We observed that the differential DNase or H3K27ac signals between two distant tissues increased the area under the precision-recall curve (PR-AUC) of DNase peaks by 17.5-166.7% and that of H3K27ac peaks by 7.1-22.2%. We further improved this differential signal method using multiple contrast tissues. Evaluated using a blind test, the differential H3K27ac signal method substantially improved PR-AUC from 0.48 to 0.75 for predicting heart enhancers. We further validated our approach using postnatal retina and cerebral cortex enhancers identified by massively parallel reporter assays, and observed improvements for both tissues. In summary, we compared nine peak callers and devised a superior method for predicting tissue-specific mouse developmental enhancers by reranking the called peaks.


Subject(s)
Algorithms , Chromatin/genetics , Computational Biology/methods , Enhancer Elements, Genetic/genetics , Histone Code/genetics , Animals , Binding Sites , Chromatin/metabolism , Histones/metabolism , Mice, Transgenic , Organ Specificity , Protein Processing, Post-Translational , Transcription Factors/metabolism
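The differential-signal idea in this abstract reranks peaks so that those strong in the target tissue but weak in contrast tissues rise to the top. The sketch below assumes the score is the target signal minus the mean signal across contrast tissues; the paper's exact formula may differ, and the function name and toy values are hypothetical.

```python
def differential_rerank(peaks, target, contrasts):
    """Rank peaks by target-tissue signal minus mean contrast-tissue signal.

    peaks: peak IDs; target: peak ID -> signal in the tissue of interest;
    contrasts: list of peak ID -> signal dicts for contrast tissues.
    Missing signals count as 0.0 (an assumption).
    """
    def score(peak):
        mean_contrast = sum(c.get(peak, 0.0) for c in contrasts) / len(contrasts)
        return target.get(peak, 0.0) - mean_contrast

    return sorted(peaks, key=score, reverse=True)


target = {"p1": 10.0, "p2": 8.0, "p3": 3.0}  # toy signals
contrasts = [{"p1": 9.0, "p2": 1.0, "p3": 0.0}, {"p1": 11.0, "p2": 3.0, "p3": 1.0}]
print(differential_rerank(["p1", "p2", "p3"], target, contrasts))  # ['p2', 'p3', 'p1']
```

The strongest raw peak, p1, drops to last because it is equally strong in every tissue, which is the intuition behind favoring tissue-specific signal.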
13.
J Immunol ; 190(11): 5578-87, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-23616578

ABSTRACT

Profiling studies of mRNA and microRNA, particularly microarray-based studies, have been extensively used to create compendia of genes that are preferentially expressed in the immune system. In some instances, functional studies have been subsequently pursued. Recent efforts such as the Encyclopedia of DNA Elements have demonstrated the benefit of coupling RNA sequencing analysis with information from expressed sequence tags (ESTs) for transcriptomic analysis. However, the full characterization and identification of transcripts that function as modulators of human immune responses remains incomplete. In this study, we demonstrate that an integrated analysis of human ESTs provides a robust platform to identify the immune transcriptome. Beyond recovering a reference set of immune-enriched genes and providing large-scale cross-validation of previous microarray studies, we discovered hundreds of novel genes preferentially expressed in the immune system, including noncoding RNAs. As a result, we have established the Immunogene database, representing an integrated EST road map of gene expression in human immune cells, which can be used to further investigate the function of coding and noncoding genes in the immune system. Using this approach, we have uncovered a unique metabolic gene signature of human macrophages and identified PRDM15 as a novel overexpressed gene in human lymphomas. Thus, we demonstrate the utility of EST profiling as a basis for further deconstruction of physiologic and pathologic immune processes.


Subject(s)
Expressed Sequence Tags , Gene Expression Profiling , Genome-Wide Association Study , Immune System/metabolism , Animals , Cluster Analysis , Computational Biology/methods , DNA-Binding Proteins/genetics , Databases, Nucleic Acid , Gene Regulatory Networks , Genomics , Humans , Immune System Diseases/genetics , Lymphoma, B-Cell/genetics , Mice , Molecular Sequence Annotation , RNA, Long Noncoding/genetics , Reproducibility of Results , Transcription Factors/genetics , Transcriptome
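Calling a gene preferentially expressed in the immune system from EST counts can be framed as an enrichment test: is the share of the gene's ESTs drawn from immune-cell libraries higher than the background share? This minimal sketch uses a hypergeometric upper-tail p-value. It is a generic test, not necessarily the criterion used to build the Immunogene database, and the names and counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, top + 1))
    return hits / comb(population, draws)


def immune_enrichment(gene_immune, gene_total, all_immune, all_total):
    """Fold enrichment and p-value for a gene whose gene_immune of gene_total
    ESTs come from immune libraries, given all_immune of all_total ESTs overall."""
    fold = (gene_immune / gene_total) / (all_immune / all_total)
    p_value = hypergeom_upper_tail(gene_immune, all_total, all_immune, gene_total)
    return fold, p_value


fold, p = immune_enrichment(8, 10, 1000, 10000)  # toy counts
print(round(fold, 2), p)
```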