ABSTRACT
Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases.
Subjects
Blood Cells/cytology , Disease/genetics , Genetic Promoter Regions , Cell Lineage , Cell Separation , Chromatin , Genetic Enhancer Elements , Epigenomics , Genetic Predisposition to Disease , Genome-Wide Association Study , Hematopoiesis , Humans , Single Nucleotide Polymorphism , Quantitative Trait Loci
ABSTRACT
The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.
Subjects
Genetic Databases , Genome/genetics , Genomics , Molecular Sequence Annotation , Animals , Binding Sites , Chromatin/genetics , Chromatin/metabolism , DNA Methylation , Genetic Databases/standards , Genetic Databases/trends , Gene Expression Regulation/genetics , Human Genome/genetics , Genomics/standards , Genomics/trends , Histones/metabolism , Humans , Mice , Molecular Sequence Annotation/standards , Quality Control , Nucleic Acid Regulatory Sequences/genetics , Transcription Factors/metabolism
ABSTRACT
OBJECTIVE: Epigenetic mechanisms, including DNA methylation (DNAm), have been proposed to play a key role in Crohn's disease (CD) pathogenesis. However, the specific cell types and pathways affected as well as their potential impact on disease phenotype and outcome remain unknown. We set out to investigate the role of intestinal epithelial DNAm in CD pathogenesis. DESIGN: We generated 312 intestinal epithelial organoids (IEOs) from mucosal biopsies of 168 patients with CD (n=72), UC (n=23) and healthy controls (n=73). We performed genome-wide molecular profiling including DNAm, bulk as well as single-cell RNA sequencing. Organoids were subjected to gene editing and the functional consequences of DNAm changes evaluated using an organoid-lymphocyte coculture and a nucleotide-binding oligomerisation domain, leucine-rich repeat and CARD domain containing 5 (NLRC5) dextran sulphate sodium (DSS) colitis knock-out mouse model. RESULTS: We identified highly stable, CD-associated loss of DNAm at major histocompatibility complex (MHC) class 1 loci including NLRC5 and cognate gene upregulation. Single-cell RNA sequencing of primary mucosal tissue and IEOs confirmed the role of NLRC5 as transcriptional transactivator in the intestinal epithelium. Increased mucosal MHC-I and NLRC5 expression in adult and paediatric patients with CD was validated in additional cohorts and the functional role of MHC-I highlighted by demonstrating a relative protection from DSS-mediated mucosal inflammation in NLRC5-deficient mice. MHC-I DNAm in IEOs showed a significant correlation with CD disease phenotype and outcomes. Application of machine learning approaches enabled the development of a disease prognostic epigenetic molecular signature. CONCLUSIONS: Our study has identified epigenetically regulated intestinal epithelial MHC-I as a novel mechanism in CD pathogenesis.
Subjects
Crohn Disease , DNA Methylation , Genetic Epigenesis , Intestinal Mucosa , Organoids , Humans , Crohn Disease/genetics , Crohn Disease/pathology , Crohn Disease/metabolism , Organoids/metabolism , Organoids/pathology , Intestinal Mucosa/metabolism , Intestinal Mucosa/pathology , Mice , Animals , Female , Male , Knockout Mice , Biological Specimen Banks , Adult , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class I/metabolism , Animal Disease Models , Intracellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/metabolism
ABSTRACT
The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser, making SARS-CoV-2 the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.
Subjects
COVID-19/virology , Genetic Databases , SARS-CoV-2/genetics , Web Browser , Coronaviridae/genetics , Genetic Variation , Viral Genome , Humans , Molecular Sequence Annotation
ABSTRACT
Our understanding of the human genome has continuously expanded since its draft publication in 2001. Over the years, novel assays have allowed us to progressively overlay layers of knowledge above the raw sequence of A's, T's, G's, and C's. The reference human genome sequence is now a complex knowledge base maintained under the shared stewardship of multiple specialist communities. Its complexity stems from the fact that it is simultaneously a template for transcription, a record of evolution, a vehicle for genetics, and a functional molecule. In short, the human genome serves as a frame of reference at the intersection of a diversity of scientific fields. In recent years, the progressive fall in sequencing costs has given increasing importance to the quality of the human reference genome, as hundreds of thousands of individuals are being sequenced yearly, often for clinical applications. Also, novel sequencing-based assays shed light on novel functions of the genome, especially with respect to gene expression regulation. Keeping the human genome annotation up to date and accurate is therefore an ongoing partnership between reference annotation projects and the greater community worldwide.
Subjects
Human Genome , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/standards , Humans
ABSTRACT
BACKGROUND & AIMS: Gene expression patterns of CD8+ T cells have been reported to correlate with clinical outcomes of adults with inflammatory bowel diseases (IBD). We aimed to validate these findings in independent patient cohorts. METHODS: We obtained peripheral blood samples from 112 children with a new diagnosis of IBD (71 with Crohn's disease and 41 with ulcerative colitis) and 19 children without IBD (controls) and recorded medical information on disease activity and outcomes. CD8+ T cells were isolated from blood samples by magnetic bead sorting at the point of diagnosis and during the course of disease. Genome-wide transcription (n = 192) and DNA methylation (n = 66) profiles were generated using Affymetrix and Illumina arrays, respectively. Publicly available transcriptomes and DNA methylomes of CD8+ T cells from 3 adult patient cohorts with and without IBD were included in data analyses. RESULTS: Previously reported CD8+ T-cell prognostic expression and exhaustion signatures were only found in the original adult IBD patient cohort. These signatures could not be detected in either a pediatric or a second adult IBD cohort. In contrast, an association between CD8+ T-cell gene expression with age and sex was detected across all 3 cohorts. CD8+ gene transcription was clearly associated with IBD in the 2 cohorts that included non-IBD controls. Lastly, DNA methylation profiles of CD8+ T cells from children with Crohn's disease correlated with age but not with disease outcome. CONCLUSIONS: We were unable to validate previously reported findings of an association between CD8+ T-cell gene transcription and disease outcome in IBD. Our findings reveal the challenges of developing prognostic biomarkers for patients with IBD and the importance of their validation in large, independent cohorts before clinical application.
Subjects
CD8-Positive T-Lymphocytes/physiology , Inflammatory Bowel Diseases/diagnosis , Inflammatory Bowel Diseases/etiology , Adolescent , Adult , Age Factors , Case-Control Studies , Child , Preschool Child , DNA Methylation , Female , Humans , Male , Predictive Value of Tests , Prognosis , Genetic Transcription , Young Adult
ABSTRACT
MOTIVATION: Compared to traditional haploid reference genomes, graph genomes are an efficient and compact data structure for storing multiple genomic sequences, for storing polymorphisms or for mapping sequencing reads with greater sensitivity. Further, graphs are well-studied computer science objects that can be efficiently analyzed. However, their adoption in genomic research is slow, in part because of the cognitive difficulty in interpreting graphs. RESULTS: We present an intuitive graphical representation for graph genomes that re-uses well-honed techniques developed to display public transport networks, and demonstrate it as a web tool. AVAILABILITY AND IMPLEMENTATION: Code: https://github.com/vgteam/sequenceTubeMap. DEMONSTRATION: https://vgteam.github.io/sequenceTubeMap/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
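To make the idea of a graph genome concrete, the following is a minimal Python sketch of the data structure described above: nodes hold sequence segments and each stored genome is a named walk through the nodes. The class name, method names and toy sequences are illustrative assumptions and are not taken from the sequenceTubeMap code base.

```python
from collections import defaultdict


class SequenceGraph:
    """Toy sequence graph (hypothetical): nodes carry DNA segments, paths are node-ID walks."""

    def __init__(self):
        self.nodes = {}                # node id -> sequence segment
        self.edges = defaultdict(set)  # node id -> set of successor node ids
        self.paths = {}                # path name -> list of node ids

    def add_node(self, node_id, sequence):
        self.nodes[node_id] = sequence

    def add_path(self, name, node_ids):
        # Record every consecutive pair of nodes in the walk as an edge.
        for a, b in zip(node_ids, node_ids[1:]):
            self.edges[a].add(b)
        self.paths[name] = list(node_ids)

    def spell(self, name):
        # Reconstruct the linear sequence that a path represents.
        return "".join(self.nodes[n] for n in self.paths[name])


graph = SequenceGraph()
graph.add_node(1, "ACGT")
graph.add_node(2, "A")    # one allele at a polymorphic site
graph.add_node(3, "G")    # the alternative allele
graph.add_node(4, "TTC")
graph.add_path("reference", [1, 2, 4])
graph.add_path("sample", [1, 3, 4])

print(graph.spell("reference"))  # ACGTATTC
print(graph.spell("sample"))     # ACGTGTTC
```

Both genomes share nodes 1 and 4 and differ only at the bubble formed by nodes 2 and 3, which is why a graph can store many sequences compactly.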
Subjects
Algorithms , Genome , Software , Genomics , DNA Sequence Analysis
ABSTRACT
The Ensembl project has been aggregating, processing, integrating and redistributing genomic datasets since the initial releases of the draft human genome, with the aim of accelerating genomics research through rapid open distribution of public data. Large amounts of raw data are thus transformed into knowledge, which is made available via a multitude of channels, in particular our browser (http://www.ensembl.org). Over time, we have expanded in multiple directions. First, our resources describe multiple fields of genomics, in particular gene annotation, comparative genomics, genetics and epigenomics. Second, we cover a growing number of genome assemblies; Ensembl Release 90 contains exactly 100. Third, our databases feed simultaneously into an array of services designed around different use cases, ranging from quick browsing to genome-wide bioinformatic analysis. We present here the latest developments of the Ensembl project, with a focus on managing an increasing number of assemblies, supporting efforts in genome interpretation and improving our browser.
Subjects
Genetic Databases , Datasets as Topic , Genome , Information Dissemination , Animals , Epigenomics , Human Genome , Genome-Wide Association Study , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Vertebrates/genetics , Web Browser
ABSTRACT
Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensures uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license.
Subjects
Computational Biology/methods , Genetic Databases , Genomics/methods , Search Engine , Software , Web Browser , Animals , Data Mining , Molecular Evolution , Gene Expression Regulation , Genetic Variation , Human Genome , Humans , Molecular Sequence Annotation , Species Specificity , Vertebrates
ABSTRACT
The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license.
Subjects
Genetic Databases , Genomics , Molecular Sequence Annotation , Animals , Genes , Genetic Variation , Humans , Internet , Mice , Proteins/genetics , Rats , Nucleic Acid Regulatory Sequences , Software
ABSTRACT
Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license.
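As an illustration of querying the REST server from a program, here is a minimal Python sketch using only the standard library. It assumes the `lookup/symbol` endpoint and the HTTPS address of the server; the helper names and the example gene are illustrative choices, and the current endpoint list should be checked in the REST documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server named in the abstract is reached over HTTPS.
REST_SERVER = "https://rest.ensembl.org"


def build_lookup_url(species, symbol):
    """Build the URL for a gene-symbol lookup (helper name is hypothetical)."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    return REST_SERVER + path


def lookup_symbol(species, symbol):
    """Fetch the JSON record for a gene symbol; requires network access."""
    request = urllib.request.Request(
        build_lookup_url(species, symbol),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(build_lookup_url("homo_sapiens", "BRCA2"))
# With network access, lookup_symbol("homo_sapiens", "BRCA2") would return
# the gene record as a Python dictionary.
```

Because the interface is plain HTTP returning JSON, the same request can be issued from any language with an HTTP client.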
Subjects
Nucleic Acid Databases , Genomics , Animals , Genetic Epigenesis , Genetic Variation , Human Genome , Humans , Internet , Mice , Molecular Sequence Annotation , Nucleic Acid Regulatory Sequences , Software
ABSTRACT
BACKGROUND: The study of genomic variation has provided key insights into the functional role of mutations. Predominantly, studies have focused on single nucleotide variants (SNV), which are relatively easy to detect and can be described with rich mathematical models. However, it has been observed that genomes are highly plastic, and that whole regions can be moved, removed or duplicated in bulk. These structural variants (SV) have been shown to have significant impact on phenotype, but their study has been held back by the combinatorial complexity of the underlying models. RESULTS: We describe here a general model of structural variation that encompasses both balanced rearrangements and arbitrary copy-number variants (CNV). CONCLUSIONS: In this model, we show that the space of possible evolutionary histories that explain the structural differences between any two genomes can be sampled ergodically.
ABSTRACT
MOTIVATION: Using high-throughput sequencing, researchers are now generating hundreds of whole-genome assays to measure various features such as transcription factor binding, histone marks, DNA methylation or RNA transcription. Displaying so much data generally leads to a confusing accumulation of plots. We describe here a multithreaded library that computes statistics on large numbers of datasets (Wiggle, BigWig, Bed, BigBed and BAM), generating statistical summaries within minutes with limited memory requirements, whether on the whole genome or on selected regions. AVAILABILITY AND IMPLEMENTATION: The code is freely available under Apache 2.0 license at www.github.com/Ensembl/Wiggletools
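The kind of summary described here can be pictured with a small Python sketch that streams over several signal tracks and yields their per-position mean without holding every track in memory at once. This is a toy illustration under simplifying assumptions (tracks are equally long sequences of numbers); it does not use or reproduce the WiggleTools interface, and the function name is hypothetical.

```python
def mean_signal(*tracks):
    """Yield the per-position mean across several equally long signal tracks.

    Each track may be any iterable of numbers, so values can be streamed
    from disk one position at a time rather than loaded in full.
    """
    for values in zip(*tracks):
        yield sum(values) / len(values)


# Three toy tracks standing in for, e.g., replicate coverage profiles.
replicate_a = [1, 2, 3, 4]
replicate_b = [3, 4, 5, 6]
replicate_c = [2, 3, 4, 5]

summary = list(mean_signal(replicate_a, replicate_b, replicate_c))
print(summary)  # [2.0, 3.0, 4.0, 5.0]
```

Collapsing many tracks into one summary track in this way is what replaces the "confusing accumulation of plots" with a single view.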
Subjects
Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Genomic Library , Internet , Software
ABSTRACT
BACKGROUND: Parsimony and maximum likelihood methods of phylogenetic tree estimation and parsimony methods for genome rearrangements are central to the study of genome evolution, yet to date they have largely been pursued in isolation. RESULTS: We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describes all parsimonious interpretations of G, and this set can be explored with a few sampling moves. CONCLUSION: This theoretical study describes a model in which the inference of genome rearrangements and phylogeny can be unified under parsimony.
Subjects
Molecular Evolution , Genome , Algorithms , Likelihood Functions , Genetic Models
ABSTRACT
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.
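Contiguity is one of the properties assessed above. A widely used contiguity statistic is N50, the contig length at which contigs of that length or longer cover at least half of the assembled bases. The Python sketch below computes it; whether N50 matches the exact contiguity measures used in the competition is an assumption, and the function name is illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50 requires at least one contig length")


contigs = [2, 3, 4, 5, 6]  # toy contig lengths, 20 bases in total
print(n50(contigs))  # 5
```

In the toy example the two longest contigs (6 and 5) already cover 11 of 20 bases, so the N50 is 5.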
Subjects
Genome/physiology , Genomics/methods , DNA Sequence Analysis/methods
ABSTRACT
MOTIVATION: High-throughput sequencing has made the analysis of new model organisms more affordable. Although assembling a new genome can still be costly and difficult, it is possible to use RNA-seq to sequence mRNA. In the absence of a known genome, it is necessary to assemble these sequences de novo, taking into account possible alternative isoforms and the dynamic range of expression values. RESULTS: We present a software package named Oases designed to heuristically assemble RNA-seq reads in the absence of a reference genome, across a broad spectrum of expression values and in the presence of alternative isoforms. It achieves this by using an array of hash lengths, a dynamic filtering of noise, a robust resolution of alternative splicing events and the efficient merging of multiple assemblies. It was tested on human and mouse RNA-seq data and is shown to improve significantly on the transABySS and Trinity de novo transcriptome assemblers. AVAILABILITY AND IMPLEMENTATION: Oases is freely available under the GPL license at www.ebi.ac.uk/~zerbino/oases/.
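The "array of hash lengths" can be illustrated with a short Python sketch that builds one k-mer count table per hash length, reading hash length as k-mer length (an interpretive assumption). Short k-mers are found even in sparsely covered, weakly expressed transcripts, while long k-mers are more specific in well covered ones. This is not the Oases implementation; function names and reads are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across a set of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def multi_k_counts(reads, ks):
    """Build one k-mer table per hash length in ks."""
    return {k: kmer_counts(reads, k) for k in ks}


reads = ["ACGTAC", "CGTACG"]  # toy reads
tables = multi_k_counts(reads, ks=[3, 5])

print(tables[3]["CGT"])  # 2
print(len(tables[5]))    # 3
```

An assembler working this way would assemble the reads once per hash length and then merge the resulting assemblies, as the abstract outlines.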
Subjects
Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods , RNA Sequence Analysis/methods , Software , Algorithms , Alternative Splicing , Animals , Humans , Mice , Messenger RNA/genetics
ABSTRACT
BACKGROUND & AIMS: Human intestinal epithelial organoids (IEOs) are a powerful tool to model major aspects of intestinal development, health, and diseases because patient-derived cultures retain many features found in vivo. A necessary aspect of the organoid model is the requirement to expand cultures in vitro through several rounds of passaging. This is of concern because the passaging of cells has been shown to affect cell morphology, ploidy, and function. METHODS: Here, we analyzed 173 human IEO lines derived from the small and large bowel and examined the effect of culture duration on DNA methylation (DNAm). Furthermore, we tested the potential impact of DNAm changes on gene expression and cellular function. RESULTS: Our analyses show a reproducible effect of culture duration on DNAm in a large discovery cohort as well as 2 publicly available validation cohorts generated in different laboratories. Although methylation changes were seen in only approximately 8% of tested cytosine-phosphate-guanine dinucleotides (CpGs) and global cellular function remained stable, a subset of methylation changes correlated with altered gene expression at baseline as well as in response to inflammatory cytokine exposure and withdrawal of Wnt agonists. Importantly, epigenetic changes were found to be enriched in genomic regions associated with colonic cancer and distant to the site of replication, indicating similarities to malignant transformation. CONCLUSIONS: Our study shows distinct culture-associated epigenetic changes in mucosa-derived human IEOs, some of which appear to impact gene transcription and cellular function. These findings highlight the need for future studies in this area and the importance of considering passage number as a potentially confounding factor.
Subjects
DNA Methylation , Organoids , Humans , Intestines , Genetic Epigenesis , Intestinal Mucosa
ABSTRACT
As computational modeling becomes more essential to analyze and understand biological regulatory mechanisms, governance of the many databases and knowledge bases that support this domain is crucial to guarantee reliability and interoperability of resources. To address this, the COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC, CA15205, www.greekc.org) organized nine workshops in a four-year period, starting September 2016. The workshops brought together a wide range of experts from all over the world working on various steps in the knowledge management process that focuses on understanding gene regulatory mechanisms. The discussions between ontologists, curators, text miners, biologists, bioinformaticians, philosophers and computational scientists spawned a host of activities aimed to standardize and update existing knowledge management workflows and involve end-users in the process of designing the Gene Regulation Knowledge Commons (GRKC). Here the GREEKC consortium describes its main achievements in improving this GRKC.
Subjects
Gene Expression Regulation , Reproducibility of Results
ABSTRACT
Many gene expression quantitative trait locus (eQTL) studies have published their summary statistics, which can be used to gain insight into complex human traits by downstream analyses, such as fine mapping and co-localization. However, technical differences between these datasets are a barrier to their widespread use. Consequently, target genes for most genome-wide association study (GWAS) signals have still not been identified. Here, we present the eQTL Catalogue (https://www.ebi.ac.uk/eqtl), a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies. We find that, for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies. Although most QTLs were shared between most bulk tissues, we identified a greater diversity of cell-type-specific QTLs from purified cell types, a subset of which also manifested as new disease co-localizations. Our summary statistics are freely available to enable the systematic interpretation of human GWAS associations across many cell types and tissues.