Results 1 - 13 of 13
1.
Nature ; 583(7818): 699-710, 2020 07.
Article in English | MEDLINE | ID: mdl-32728249

ABSTRACT

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9% and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.
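
As a rough illustration of how a genome-coverage percentage like the ones quoted in this abstract can be derived, the sketch below merges overlapping intervals and divides the covered length by the genome length. The function names, the toy coordinates, and the half-open, single-chromosome interval convention are assumptions made for illustration; none of them come from the ENCODE pipeline.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumed convention)


def merged_length(intervals: Iterable[Interval]) -> int:
    """Total number of bases covered after merging overlapping or touching intervals."""
    total = 0
    current_start = None
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap precedes this interval: close the previous merged block.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def percent_covered(intervals: Iterable[Interval], genome_length: int) -> float:
    """Percentage of a genome of the given length that the intervals cover."""
    return 100.0 * merged_length(intervals) / genome_length


# Hypothetical toy elements on a 10,000-base "genome".
toy_elements = [(0, 100), (50, 150), (300, 350)]
toy_covered = merged_length(toy_elements)
toy_percent = percent_covered(toy_elements, 10_000)
```

Merging before summing matters because overlapping elements would otherwise be counted twice and inflate the percentage.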


Subjects
DNA/genetics , Genetic Databases , Genome/genetics , Genomics , Molecular Sequence Annotation , Registries , Nucleic Acid Regulatory Sequences/genetics , Animals , Chromatin/genetics , Chromatin/metabolism , DNA/chemistry , DNA Footprinting , DNA Methylation/genetics , DNA Replication Timing , Deoxyribonuclease I/metabolism , Human Genome , Histones/metabolism , Humans , Mice , Transgenic Mice , RNA-Binding Proteins/genetics , Genetic Transcription/genetics , Transposases/metabolism
2.
Genome Res ; 32(2): 389-402, 2022 02.
Article in English | MEDLINE | ID: mdl-34949670

ABSTRACT

Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated nor do they contain information on cell type-specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks are primarily proximal to GENCODE-annotated TSSs and are concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3' ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations are supported by epigenomic and other transcriptomic data sets. To show the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI genome-wide association study (GWAS) catalog and identified new candidate GWAS genes. Overall, our work shows the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.


Subjects
Gene Expression Regulation , Genome-Wide Association Study , Humans , Genetic Promoter Regions , Transcription Initiation Site
3.
Hum Mol Genet ; 31(R1): R114-R122, 2022 10 20.
Article in English | MEDLINE | ID: mdl-36083269

ABSTRACT

Every cell in the human body inherits a copy of the same genetic information. The three billion base pairs of DNA in the human genome, and the roughly 50 000 coding and non-coding genes they contain, must thus encode all the complexity of human development and cell and tissue type diversity. Differences in gene regulation, or the modulation of gene expression, enable individual cells to interpret the genome differently to carry out their specific functions. Here we discuss recent and ongoing efforts to build gene regulatory maps, which aim to characterize the regulatory roles of all sequences in a genome. Many researchers and consortia have identified such regulatory elements using functional assays and evolutionary analyses; we discuss the results, strengths and shortcomings of their approaches. We also discuss new techniques the field can leverage and emerging challenges it will face while striving to build gene regulatory maps of ever-increasing resolution and comprehensiveness.


Subjects
Gene Expression Regulation , Nucleic Acid Regulatory Sequences , Humans , Gene Expression Regulation/genetics , Human Genome/genetics , Chromosome Mapping , DNA/genetics
4.
Nucleic Acids Res ; 50(D1): D141-D149, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34755879

ABSTRACT

The human genome contains ∼2000 transcriptional regulatory proteins, including ∼1600 DNA-binding transcription factors (TFs) recognizing characteristic sequence motifs to exert regulatory effects on gene expression. The binding specificities of these factors have been profiled both in vitro, using techniques such as HT-SELEX, and in vivo, using techniques including ChIP-seq. We previously developed Factorbook, a TF-centric database of annotations, motifs, and integrative analyses based on ChIP-seq data from Phase II of the ENCODE Project. Here we present an update to Factorbook which significantly expands the breadth of cell type and TF coverage. The update includes an expanded motif catalog derived from thousands of ENCODE Phase II and III ChIP-seq experiments and HT-SELEX experiments; this motif catalog is integrated with the ENCODE registry of candidate cis-regulatory elements to annotate a comprehensive collection of genome-wide candidate TF binding sites. The database also offers novel tools for applying the motif models within machine learning frameworks and using these models for integrative analysis, including annotation of variants and disease and trait heritability. Factorbook is publicly available at www.factorbook.org; we will continue to expand the resource as ENCODE Phase IV data are released.


Subjects
Genetic Databases , Nucleotide Motifs/genetics , Nucleic Acid Regulatory Sequences/genetics , Transcription Factors/genetics , Binding Sites/genetics , Gene Expression Regulation/genetics , Humans , Transcription Factors/classification
6.
Nucleic Acids Res ; 46(21): 11184-11201, 2018 11 30.
Article in English | MEDLINE | ID: mdl-30137428

ABSTRACT

Enhancers are distal cis-regulatory elements that modulate gene expression. They are depleted of nucleosomes and enriched in specific histone modifications; thus, calling DNase-seq and histone mark ChIP-seq peaks can predict enhancers. We evaluated nine peak-calling algorithms for predicting enhancers validated by transgenic mouse assays. DNase and H3K27ac peaks were consistently more predictive than H3K4me1/2/3 and H3K9ac peaks. DFilter and Hotspot2 were the best DNase peak callers, while HOMER, MUSIC, MACS2, DFilter and F-seq were the best H3K27ac peak callers. We observed that the differential DNase or H3K27ac signals between two distant tissues increased the area under the precision-recall curve (PR-AUC) of DNase peaks by 17.5-166.7% and that of H3K27ac peaks by 7.1-22.2%. We further improved this differential signal method using multiple contrast tissues. Evaluated using a blind test, the differential H3K27ac signal method substantially improved PR-AUC from 0.48 to 0.75 for predicting heart enhancers. We further validated our approach using postnatal retina and cerebral cortex enhancers identified by massively parallel reporter assays, and observed improvements for both tissues. In summary, we compared nine peak callers and devised a superior method for predicting tissue-specific mouse developmental enhancers by reranking the called peaks.
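
The reranking idea in this abstract can be sketched as follows: score each called peak by a differential signal, rank the peaks, and summarize the ranking with an area under the precision-recall curve. This sketch assumes the differential signal is a plain subtraction of a contrast-tissue signal from the target-tissue signal, and it uses average precision as a step-wise PR-AUC estimate. Both choices, the function names, and the toy values are illustrative assumptions and not the authors' implementation.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision over a ranking (a common step-wise estimate of PR-AUC)."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")
    # Rank peaks from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives


def differential_scores(target: Sequence[float], contrast: Sequence[float]) -> List[float]:
    """Assumed differential signal: target-tissue signal minus contrast-tissue signal."""
    return [t - c for t, c in zip(target, contrast)]


# Hypothetical signals for five peaks; 1 marks a validated tissue-specific enhancer.
target_signal = [9.0, 8.0, 7.0, 6.0, 5.0]
contrast_signal = [8.5, 1.0, 6.5, 0.5, 4.0]
is_enhancer = [0, 1, 0, 1, 0]

raw_ap = average_precision(target_signal, is_enhancer)
diff_ap = average_precision(differential_scores(target_signal, contrast_signal), is_enhancer)
```

In this toy case the peaks with strong signal in both tissues drop in the ranking once the contrast signal is subtracted, so the two tissue-specific peaks move to the top.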


Subjects
Algorithms , Chromatin/genetics , Computational Biology/methods , Genetic Enhancer Elements/genetics , Histone Code/genetics , Animals , Binding Sites , Chromatin/metabolism , Histones/metabolism , Transgenic Mice , Organ Specificity , Post-Translational Protein Processing , Transcription Factors/metabolism
7.
J Immunol ; 190(11): 5578-87, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-23616578

ABSTRACT

Profiling studies of mRNA and microRNA, particularly microarray-based studies, have been extensively used to create compendia of genes that are preferentially expressed in the immune system. In some instances, functional studies have been subsequently pursued. Recent efforts such as the Encyclopedia of DNA Elements have demonstrated the benefit of coupling RNA sequencing analysis with information from expressed sequence tags (ESTs) for transcriptomic analysis. However, the full characterization and identification of transcripts that function as modulators of human immune responses remains incomplete. In this study, we demonstrate that an integrated analysis of human ESTs provides a robust platform to identify the immune transcriptome. Beyond recovering a reference set of immune-enriched genes and providing large-scale cross-validation of previous microarray studies, we discovered hundreds of novel genes preferentially expressed in the immune system, including noncoding RNAs. As a result, we have established the Immunogene database, representing an integrated EST road map of gene expression in human immune cells, which can be used to further investigate the function of coding and noncoding genes in the immune system. Using this approach, we have uncovered a unique metabolic gene signature of human macrophages and identified PRDM15 as a novel overexpressed gene in human lymphomas. Thus, we demonstrate the utility of EST profiling as a basis for further deconstruction of physiologic and pathologic immune processes.


Subjects
Expressed Sequence Tags , Gene Expression Profiling , Genome-Wide Association Study , Immune System/metabolism , Animals , Cluster Analysis , Computational Biology/methods , DNA-Binding Proteins/genetics , Nucleic Acid Databases , Gene Regulatory Networks , Genomics , Humans , Immune System Diseases/genetics , B-Cell Lymphoma/genetics , Mice , Molecular Sequence Annotation , Long Noncoding RNA/genetics , Reproducibility of Results , Transcription Factors/genetics , Transcriptome
8.
Sci Adv ; 10(21): eadj4452, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38781344

ABSTRACT

Most genetic variants associated with psychiatric disorders are located in noncoding regions of the genome. To investigate their functional implications, we integrate epigenetic data from the PsychENCODE Consortium and other published sources to construct a comprehensive atlas of candidate brain cis-regulatory elements. Using deep learning, we model these elements' sequence syntax and predict how binding sites for lineage-specific transcription factors contribute to cell type-specific gene regulation in various types of glia and neurons. The elements' evolutionary history suggests that new regulatory information in the brain emerges primarily via smaller sequence mutations within conserved mammalian elements rather than entirely new human- or primate-specific sequences. However, primate-specific candidate elements, particularly those active during fetal brain development and in excitatory neurons and astrocytes, are implicated in the heritability of brain-related human traits. Additionally, we introduce PsychSCREEN, a web-based platform offering interactive visualization of PsychENCODE-generated genetic and epigenetic data from diverse brain cell types in individuals with psychiatric disorders and healthy controls.


Subjects
Brain , Genetic Epigenesis , Nucleic Acid Regulatory Sequences , Humans , Brain/metabolism , Nucleic Acid Regulatory Sequences/genetics , Animals , Molecular Evolution , Mental Disorders/genetics , Transcriptional Regulatory Elements/genetics , Neurons/metabolism , Gene Expression Regulation , Transcription Factors/genetics , Transcription Factors/metabolism
9.
Science ; 380(6643): eabn7930, 2023 04 28.
Article in English | MEDLINE | ID: mdl-37104580

ABSTRACT

Understanding the regulatory landscape of the human genome is a long-standing objective of modern biology. Using the reference-free alignment across 241 mammalian genomes produced by the Zoonomia Consortium, we charted evolutionary trajectories for 0.92 million human candidate cis-regulatory elements (cCREs) and 15.6 million human transcription factor binding sites (TFBSs). We identified 439,461 cCREs and 2,024,062 TFBSs under evolutionary constraint. Genes near constrained elements perform fundamental cellular processes, whereas genes near primate-specific elements are involved in environmental interaction, including odor perception and immune response. About 20% of TFBSs are transposable element-derived and exhibit intricate patterns of gains and losses during primate evolution whereas sequence variants associated with complex traits are enriched in constrained TFBSs. Our annotations illuminate the regulatory functions of the human genome.


Subjects
Molecular Evolution , Human Genome , Mammals , Transcriptional Regulatory Elements , Transcription Factors , Animals , Humans , Binding Sites , DNA Transposable Elements , Mammals/classification , Mammals/genetics , Primates/classification , Primates/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Phylogeny
10.
Hepatol Commun ; 7(10)2023 10 01.
Article in English | MEDLINE | ID: mdl-37756045

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have identified 30 risk loci for primary sclerosing cholangitis (PSC). Variants within these loci are found predominantly in noncoding regions of DNA making their mechanisms of conferring risk hard to define. Epigenomic studies have shown noncoding variants broadly impact regulatory element activity. The possible association of noncoding PSC variants with regulatory element activity has not been studied. We aimed to (1) determine if the noncoding risk variants in PSC impact regulatory element function and (2) if so, assess the role these regulatory elements have in explaining the genetic risk for PSC. METHODS: Available epigenomic datasets were integrated to build a comprehensive atlas of cell type-specific regulatory elements, emphasizing PSC-relevant cell types. RNA-seq and ATAC-seq were performed on peripheral CD4+ T cells from 10 PSC patients and 11 healthy controls. Computational techniques were used to (1) study the enrichment of PSC-risk variants within regulatory elements, (2) correlate risk genotype with differences in regulatory element activity, and (3) identify regulatory elements differentially active and genes differentially expressed between PSC patients and controls. RESULTS: Noncoding PSC-risk variants are strongly enriched within immune-specific enhancers, particularly ones involved in T-cell response to antigenic stimulation. In total, 250 genes and >10,000 regulatory elements were identified that are differentially active between patients and controls. CONCLUSIONS: Mechanistic effects are proposed for variants at 6 PSC-risk loci where genotype was linked with differential T-cell regulatory element activity. Regulatory elements are shown to play a key role in PSC pathophysiology.


Subjects
Sclerosing Cholangitis , Genome-Wide Association Study , Humans , Sclerosing Cholangitis/genetics , Chromatin Immunoprecipitation Sequencing , Genotype
11.
Prog Mol Biol Transl Sci ; 181: 31-43, 2021.
Article in English | MEDLINE | ID: mdl-34127199

ABSTRACT

The clustered regularly interspaced short palindromic repeats (CRISPR) technology is revolutionizing biological studies and holds tremendous promise for treating human diseases. However, a significant limitation of this technology is that modifications can occur on off-target sites lacking perfect complementarity to the single guide RNA (sgRNA) or canonical protospacer-adjacent motif (PAM) sequence. Several in vivo and in vitro genome-wide off-target profiling approaches have been developed to inform on the fidelity of gene editing. Of these, GUIDE-seq has become one of the most widely adopted and reproducible methods. To allow users to easily analyze GUIDE-seq data generated on any sequencing platform, we developed an open-source pipeline, GS-Preprocess, that takes standard base-call output in BCL format and generates all the input data required for off-target identification with the Bioconductor package GUIDEseq. Furthermore, we created a Docker image with GS-Preprocess, GUIDE-seq, and all its R and system dependencies already installed. The bundled pipeline will empower end users to streamline the analysis of GUIDE-seq data and motivate their use of higher throughput sequencing with increased multiplexing for GUIDE-seq experiments.


Subjects
CRISPR-Cas Systems , Kinetoplastida Guide RNA , CRISPR-Cas Systems/genetics , Gene Editing , High-Throughput Nucleotide Sequencing , Humans
12.
Commun Biol ; 4(1): 239, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33619351

ABSTRACT

The morphologically and functionally distinct cell types of a multicellular organism are maintained by their unique epigenomes and gene expression programs. Phase III of the ENCODE Project profiled 66 mouse epigenomes across twelve tissues at daily intervals from embryonic day 11.5 to birth. Applying the ChromHMM algorithm to these epigenomes, we annotated eighteen chromatin states with characteristics of promoters, enhancers, transcribed regions, repressed regions, and quiescent regions. Our integrative analyses delineate the tissue specificity and developmental trajectory of the loci in these chromatin states. Approximately 0.3% of each epigenome is assigned to a bivalent chromatin state, which harbors both active marks and the repressive mark H3K27me3. Highly evolutionarily conserved, these loci are enriched in silencers bound by polycomb repressive complex proteins, and the transcription start sites of their silenced target genes. This collection of chromatin state assignments provides a useful resource for studying mammalian development.


Subjects
Chromatin Assembly and Disassembly , Genetic Epigenesis , Epigenome , Animals , Binding Sites , DNA Methylation , Epigenomics , Developmental Gene Expression Regulation , Gestational Age , Histones/metabolism , Inbred C57BL Mice , Polycomb Repressive Complex 2/genetics , Polycomb Repressive Complex 2/metabolism , Genetic Promoter Regions
13.
Genome Biol ; 21(1): 17, 2020 01 22.
Article in English | MEDLINE | ID: mdl-31969180

ABSTRACT

BACKGROUND: Many genome-wide collections of candidate cis-regulatory elements (cCREs) have been defined using genomic and epigenomic data, but it remains a major challenge to connect these elements to their target genes. RESULTS: To facilitate the development of computational methods for predicting target genes, we develop a Benchmark of candidate Enhancer-Gene Interactions (BENGI) by integrating the recently developed Registry of cCREs with experimentally derived genomic interactions. We use BENGI to test several published computational methods for linking enhancers with genes, including signal correlation and the TargetFinder and PEP supervised learning methods. We find that while TargetFinder is the best-performing method, it is only modestly better than a baseline distance method for most benchmark datasets when trained and tested with the same cell type and that TargetFinder often does not outperform the distance method when applied across cell types. CONCLUSIONS: Our results suggest that current computational methods need to be improved and that BENGI presents a useful framework for method development and testing.
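
A baseline distance method of the kind mentioned in this abstract can be sketched as ranking candidate target genes by how close their transcription start site (TSS) lies to the enhancer. The version below is one plausible reading, assuming distance is measured from the enhancer midpoint to the TSS on the same chromosome. The class names, gene names, and coordinates are hypothetical and are not taken from BENGI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site coordinate


@dataclass(frozen=True)
class Enhancer:
    name: str
    chrom: str
    start: int
    end: int


def distance_to_tss(enhancer: Enhancer, gene: Gene) -> Optional[int]:
    """Linear distance from the enhancer midpoint to the gene TSS, or None across chromosomes."""
    if enhancer.chrom != gene.chrom:
        return None
    midpoint = (enhancer.start + enhancer.end) // 2
    return abs(midpoint - gene.tss)


def rank_genes_by_distance(enhancer: Enhancer, genes: List[Gene]) -> List[str]:
    """Candidate target genes on the same chromosome, closest TSS first."""
    scored = []
    for gene in genes:
        distance = distance_to_tss(enhancer, gene)
        if distance is not None:
            scored.append((distance, gene.name))
    return [name for _, name in sorted(scored)]


# Hypothetical toy example.
toy_enhancer = Enhancer("enh1", "chr1", 10_000, 10_400)
toy_genes = [
    Gene("GENE_A", "chr1", 12_000),
    Gene("GENE_B", "chr1", 9_500),
    Gene("GENE_C", "chr2", 10_200),
]
toy_ranking = rank_genes_by_distance(toy_enhancer, toy_genes)
```

A baseline like this needs no training data, which is why it serves as a useful reference point when judging supervised methods.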


Subjects
Genetic Enhancer Elements , Benchmarking , Data Curation , Gene Expression Regulation , Genomics , Machine Learning