Search | BVS Aleitamento Materno

1.

Single-cell transcriptome and accessible chromatin dynamics during endocrine pancreas development.

Duvall, Eliza; Benitez, Cecil M; Tellez, Krissie; Enge, Martin; Pauerstein, Philip T; Li, Lingyu; Baek, Songjoon; Quake, Stephen R; Smith, Jason P; Sheffield, Nathan C; Kim, Seung K; Arda, H Efsun.

Proc Natl Acad Sci U S A ; 119(26): e2201267119, 2022 06 28.

Article in English | MEDLINE | ID: mdl-35733248

ABSTRACT

Delineating gene regulatory networks that orchestrate cell-type specification is a continuing challenge for developmental biologists. Single-cell analyses offer opportunities to address these challenges and accelerate discovery of rare cell lineage relationships and mechanisms underlying hierarchical lineage decisions. Here, we describe the molecular analysis of mouse pancreatic endocrine cell differentiation using single-cell transcriptomics, chromatin accessibility assays coupled to genetic labeling, and cytometry-based cell purification. We uncover transcription factor networks that delineate β-, α-, and δ-cell lineages. Through genomic footprint analysis, we identify transcription factor-regulatory DNA interactions governing pancreatic cell development at unprecedented resolution. Our analysis suggests that the transcription factor Neurog3 may act as a pioneer transcription factor to specify the pancreatic endocrine lineage. These findings could improve protocols to generate replacement endocrine cells from renewable sources, like stem cells, for diabetes therapy.

Subjects

Basic Helix-Loop-Helix Transcription Factors; Chromatin; Islets of Langerhans; Nerve Tissue Proteins; Transcriptome; Animals; Basic Helix-Loop-Helix Transcription Factors/genetics; Basic Helix-Loop-Helix Transcription Factors/metabolism; Cell Differentiation/genetics; Cell Lineage/genetics; Chromatin/genetics; Chromatin/metabolism; Gene Expression Regulation, Developmental; Islets of Langerhans/growth & development; Islets of Langerhans/metabolism; Mice; Nerve Tissue Proteins/genetics; Nerve Tissue Proteins/metabolism; Single-Cell Analysis

2.

GEOfetch: a command-line tool for downloading data and standardized metadata from GEO and SRA.

Khoroshevskyi, Oleksandr; LeRoy, Nathan; Reuter, Vincent P; Sheffield, Nathan C.

Bioinformatics ; 39(3)2023 03 01.

Article in English | MEDLINE | ID: mdl-36857584

ABSTRACT

MOTIVATION: The Gene Expression Omnibus has become an important source of biological data for secondary analysis. However, there is no simple, programmatic way to download data and metadata from Gene Expression Omnibus (GEO) in a standardized annotation format. RESULTS: To address this, we present GEOfetch, a command-line tool that downloads and organizes data and metadata from GEO and SRA. GEOfetch formats the downloaded metadata as a Portable Encapsulated Project, providing a universal format for the reanalysis of public data. AVAILABILITY AND IMPLEMENTATION: GEOfetch is available on Bioconda and the Python Package Index (PyPI).

Subjects

Gene Expression; Metadata; Computational Biology

3.

excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies.

Ogata, Jonathan D; Mu, Wancen; Davis, Eric S; Xue, Bingjie; Harrell, J Chuck; Sheffield, Nathan C; Phanstiel, Douglas H; Love, Michael I; Dozmorov, Mikhail G.

Bioinformatics ; 39(4)2023 04 03.

Article in English | MEDLINE | ID: mdl-37067481

ABSTRACT

SUMMARY: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available primarily through ENCODE and Github. However, the variety of exclusion sets creates uncertainty which sets to use. Furthermore, gap regions (e.g. centromeres, telomeres, short arms) create additional considerations in generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies, and types of exclusion regions. For human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most well-curated and annotated, and sets generated by the Blacklist tool for other organisms. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/excluderanges/. Package website: https://dozmorovlab.github.io/excluderanges/.

Subjects

Genome, Human; Software; Animals; Humans; Mice; Uncertainty

4.

GenomicDistributions: fast analysis of genomic intervals with Bioconductor.

Kupkova, Kristyna; Mosquera, Jose Verdezoto; Smith, Jason P; Stolarczyk, Michal; Danehy, Tessa L; Lawson, John T; Xue, Bingjie; Stubbs, John T; LeRoy, Nathan; Sheffield, Nathan C.

BMC Genomics ; 23(1): 299, 2022 Apr 12.

Article in English | MEDLINE | ID: mdl-35413804

ABSTRACT

BACKGROUND: Epigenome analysis relies on defined sets of genomic regions output by widely used assays such as ChIP-seq and ATAC-seq. Statistical analysis and visualization of genomic region sets is essential to answer biological questions in gene regulation. As the epigenomics community continues generating data, there will be an increasing need for software tools that can efficiently deal with more abundant and larger genomic region sets. Here, we introduce GenomicDistributions, an R package for fast and easy summarization and visualization of genomic region data. RESULTS: GenomicDistributions offers a broad selection of functions to calculate properties of genomic region sets, such as feature distances, genomic partition overlaps, and more. GenomicDistributions functions are meticulously optimized for best-in-class speed and generally outperform comparable functions in existing R packages. GenomicDistributions also offers plotting functions that produce editable ggplot objects. All GenomicDistributions functions follow a uniform naming scheme and can handle either single or multiple region set inputs. CONCLUSIONS: GenomicDistributions offers a fast and scalable tool for exploratory genomic region set analysis and visualization. GenomicDistributions excels in user-friendliness, flexibility of outputs, breadth of functions, and computational performance. GenomicDistributions is available from Bioconductor ( https://bioconductor.org/packages/release/bioc/html/GenomicDistributions.html ).

Subjects

Genomics; Software; Chromatin Immunoprecipitation Sequencing; Epigenomics; Genome

5.

IGD: high-performance search for large-scale genomic interval datasets.

Feng, Jianglin; Sheffield, Nathan C.

Bioinformatics ; 37(1): 118-120, 2021 Apr 09.

Article in English | MEDLINE | ID: mdl-33367484

ABSTRACT

SUMMARY: Databases of large-scale genome projects now contain thousands of genomic interval datasets. These data are a critical resource for understanding the function of DNA. However, our ability to examine and integrate interval data of this scale is limited. Here, we introduce the integrated genome database (IGD), a method and tool for searching genome interval datasets more than three orders of magnitude faster than existing approaches, while using only one hundredth of the memory. IGD uses a novel linear binning method that allows us to scale analysis to billions of genomic regions. AVAILABILITY AND IMPLEMENTATION: https://github.com/databio/IGD. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

6.

Embeddings of genomic region sets capture rich biological associations in lower dimensions.

Gharavi, Erfaneh; Gu, Aaron; Zheng, Guangtao; Smith, Jason P; Cho, Hyun Jae; Zhang, Aidong; Brown, Donald E; Sheffield, Nathan C.

Bioinformatics ; 37(23): 4299-4306, 2021 12 07.

Article in English | MEDLINE | ID: mdl-34156475

ABSTRACT

MOTIVATION: Genomic region sets summarize functional genomics data and define locations of interest in the genome such as regulatory regions or transcription factor binding sites. The number of publicly available region sets has increased dramatically, leading to challenges in data analysis. RESULTS: We propose a new method to represent genomic region sets as vectors, or embeddings, using an adapted word2vec approach. We compared our approach to two simpler methods based on interval unions or term frequency-inverse document frequency and evaluated the methods in three ways: First, by classifying the cell line, antibody or tissue type of the region set; second, by assessing whether similarity among embeddings can reflect simulated random perturbations of genomic regions; and third, by testing robustness of the proposed representations to different signal thresholds for calling peaks. Our word2vec-based region set embeddings reduce dimensionality from more than a hundred thousand to 100 without significant loss in classification performance. The vector representation could identify cell line, antibody and tissue type with over 90% accuracy. We also found that the vectors could quantitatively summarize simulated random perturbations to region sets and are more robust to subsampling the data derived from different peak calling thresholds. Our evaluations demonstrate that the vectors retain useful biological information in relatively lower-dimensional spaces. We propose that vector representation of region sets is a promising approach for efficient analysis of genomic region data. AVAILABILITY AND IMPLEMENTATION: https://github.com/databio/regionset-embedding. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genomics; Protein Binding

7.

Refget: standardized access to reference sequences.

Yates, Andrew D; Adams, Jeremy; Chaturvedi, Somesh; Davies, Robert M; Laird, Matthew; Leinonen, Rasko; Nag, Rishi; Sheffield, Nathan C; Hofmann, Oliver; Keane, Thomas M.

Bioinformatics ; 38(1): 299-300, 2021 12 22.

Article in English | MEDLINE | ID: mdl-34260694

ABSTRACT

MOTIVATION: Reference sequences are essential in creating a baseline of knowledge for many common bioinformatics methods, especially those using genomic sequencing. RESULTS: We have created refget, a Global Alliance for Genomics and Health API specification to access reference sequences and sub-sequences using an identifier derived from the sequence itself. We present four reference implementations across in-house and cloud infrastructure, a compliance suite and a web report used to ensure specification conformity across implementations. AVAILABILITY AND IMPLEMENTATION: The refget specification can be found at: https://w3id.org/ga4gh/refget. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genomics; Software
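
The idea in the refget abstract above, an identifier derived from the sequence itself, can be illustrated with a short Python sketch. The abstract does not name the digest algorithms, so everything specific here is an assumption: the function names `normalize`, `md5_id`, and `sha512t24u_id` are hypothetical, the normalization (strip whitespace, uppercase) is an illustrative choice, and the two checksums (MD5, and a SHA-512 digest truncated to 24 bytes and base64url-encoded) reflect a recollection of the GA4GH conventions that should be checked against the specification.

```python
import base64
import hashlib


def normalize(sequence: str) -> str:
    """Assumed normalization: drop all whitespace and uppercase the residues."""
    return "".join(sequence.split()).upper()


def md5_id(sequence: str) -> str:
    """Hex MD5 digest of the normalized sequence."""
    return hashlib.md5(normalize(sequence).encode("ascii")).hexdigest()


def sha512t24u_id(sequence: str) -> str:
    """SHA-512 digest truncated to 24 bytes, then base64url-encoded.

    24 bytes encode to exactly 32 characters, so no '=' padding appears.
    """
    digest = hashlib.sha512(normalize(sequence).encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


if __name__ == "__main__":
    # The same sequence always yields the same identifier, however it is formatted.
    print(md5_id("acgt\n"), md5_id("ACGT"))
    print(sha512t24u_id("ACGT"))
```

Because the identifier depends only on the sequence content, two servers holding the same sequence would compute the same key without coordinating names or accession numbers.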

8.

Augmented Interval List: a novel data structure for efficient genomic interval search.

Feng, Jianglin; Ratan, Aakrosh; Sheffield, Nathan C.

Bioinformatics ; 35(23): 4907-4911, 2019 12 01.

Article in English | MEDLINE | ID: mdl-31150060

ABSTRACT

MOTIVATION: Genomic data is frequently stored as segments or intervals. Because this data type is so common, interval-based comparisons are fundamental to genomic analysis. As the volume of available genomic data grows, developing efficient and scalable methods for searching interval data is necessary. RESULTS: We present a new data structure, the Augmented Interval List (AIList), to enumerate intersections between a query interval q and an interval set R. An AIList is constructed by first sorting R as a list by the interval start coordinate, then decomposing it into a few approximately flattened components (sublists), and then augmenting each sublist with the running maximum interval end. The query time for AIList is O(log₂N + n + m), where n is the number of overlaps between R and q, N is the number of intervals in the set R and m is the average number of extra comparisons required to find the n overlaps. Tested on real genomic interval datasets, AIList code runs 5-18 times faster than standard high-performance code based on augmented interval-trees, nested containment lists or R-trees (BEDTools). For large datasets, the memory-usage for AIList is 4-60% of other methods. The AIList data structure, therefore, provides a significantly improved fundamental operation for highly scalable genomic data analysis. AVAILABILITY AND IMPLEMENTATION: An implementation of the AIList data structure with both construction and search algorithms is available at http://ailist.databio.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genomics; Software; Algorithms; Genome
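
The construction described in the AIList abstract above (sort by start, augment with the running maximum end, then binary-search and scan) can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it keeps a single sorted list and omits the decomposition into sublists, the names `build_index` and `query` are hypothetical, and intervals are assumed to be half-open `(start, end)` pairs.

```python
from bisect import bisect_left
from itertools import accumulate


def build_index(intervals):
    """Sort intervals by start and record the running maximum end."""
    ordered = sorted(intervals)
    starts = [start for start, _ in ordered]
    # max_ends[i] is the largest end among ordered[0..i].
    max_ends = list(accumulate((end for _, end in ordered), max))
    return ordered, starts, max_ends


def query(index, q_start, q_end):
    """Return the indexed intervals overlapping the half-open query [q_start, q_end)."""
    ordered, starts, max_ends = index
    hits = []
    # Binary search for the last interval whose start lies before q_end.
    i = bisect_left(starts, q_end) - 1
    # Scan backwards; once the running maximum end is <= q_start,
    # no earlier interval can reach the query, so the scan stops.
    while i >= 0 and max_ends[i] > q_start:
        if ordered[i][1] > q_start:
            hits.append(ordered[i])
        i -= 1
    return hits


if __name__ == "__main__":
    idx = build_index([(1, 5), (3, 8), (10, 12), (2, 20), (15, 18)])
    print(sorted(query(idx, 9, 11)))
```

In this single-list version one long interval, such as `(2, 20)` above, keeps the running maximum high and forces extra comparisons during the backward scan. That is the cost the abstract's decomposition into approximately flattened sublists is described as addressing.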

9.

Renin Cell Development: Insights From Chromatin Accessibility and Single-Cell Transcriptomics.

Martini, Alexandre G; Smith, Jason P; Medrano, Silvia; Finer, Gal; Sheffield, Nathan C; Sequeira-Lopez, Maria Luisa S; Gomez, R Ariel.

Circ Res ; 133(4): 369-371, 2023 08 04.

Article in English | MEDLINE | ID: mdl-37395102

Subjects

Chromatin; Renin; Renin/genetics; Renin/metabolism; Chromatin/genetics; Transcriptome; Kidney/metabolism; Gene Expression Profiling; Single-Cell Analysis

10.

LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysis.

Nagraj, V P; Magee, Neal E; Sheffield, Nathan C.

Nucleic Acids Res ; 46(W1): W194-W199, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29878235

ABSTRACT

The past few years have seen an explosion of interest in understanding the role of regulatory DNA. This interest has driven large-scale production of functional genomics data and analytical methods. One popular analysis is to test for enrichment of overlaps between a query set of genomic regions and a database of region sets. In this way, new genomic data can be easily connected to annotations from external data sources. Here, we present an interactive interface for enrichment analysis of genomic locus overlaps using a web server called LOLAweb. LOLAweb accepts a set of genomic ranges from the user and tests it for enrichment against a database of region sets. LOLAweb renders results in an R Shiny application to provide interactive visualization features, enabling users to filter, sort, and explore enrichment results dynamically. LOLAweb is built and deployed in a Linux container, making it scalable to many concurrent users on our servers and also enabling users to download and run LOLAweb locally.

Subjects

Genomics/methods; Software; Animals; Databases, Genetic; Humans; Internet; Mice; Regulatory Sequences, Nucleic Acid; User-Computer Interface
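
The overlap-enrichment test described in the LOLAweb abstract above can be illustrated with a small Python sketch. The abstract does not state which statistic is used, so the one-sided hypergeometric (Fisher-style) test here is an assumption, as are the function names `overlaps_any` and `enrichment` and the representation of regions as `(chrom, start, end)` tuples with half-open coordinates. The query is assumed to be a subset of a user-supplied background ("universe") of regions.

```python
from math import comb


def overlaps_any(region, region_set):
    """True if the region overlaps at least one region in region_set."""
    chrom, start, end = region
    return any(c == chrom and s < end and start < e for c, s, e in region_set)


def enrichment(query, universe, db_set):
    """Count query overlaps with db_set and compute a one-sided p-value.

    The p-value is P(X >= k) for a hypergeometric draw of len(query)
    regions from the universe, where k is the observed overlap count.
    """
    n_universe = len(universe)
    n_universe_hits = sum(overlaps_any(r, db_set) for r in universe)
    n_query = len(query)
    k = sum(overlaps_any(r, db_set) for r in query)
    upper = min(n_universe_hits, n_query)
    tail = sum(
        comb(n_universe_hits, i) * comb(n_universe - n_universe_hits, n_query - i)
        for i in range(k, upper + 1)
    )
    return k, tail / comb(n_universe, n_query)


if __name__ == "__main__":
    universe = [("chr1", 100 * i, 100 * i + 10) for i in range(10)]
    db_set = [("chr1", 0, 5), ("chr1", 105, 120), ("chr1", 200, 201)]
    query = universe[:3]
    print(enrichment(query, universe, db_set))
```

Repeating this test against every region set in a database and ranking by p-value gives the kind of sortable result table the abstract describes; a real analysis would also need multiple-testing correction, which this sketch leaves out.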

11.

Coloc-stats: a unified web interface to perform colocalization analysis of genomic features.

Simovski, Boris; Kanduri, Chakravarthi; Gundersen, Sveinung; Titov, Dmytro; Domanska, Diana; Bock, Christoph; Bossini-Castillo, Lara; Chikina, Maria; Favorov, Alexander; Layer, Ryan M; Mironov, Andrey A; Quinlan, Aaron R; Sheffield, Nathan C; Trynka, Gosia; Sandve, Geir K.

Nucleic Acids Res ; 46(W1): W186-W193, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29873782

ABSTRACT

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.

Subjects

Genomics/methods; Software; Chromatin Immunoprecipitation; GATA1 Transcription Factor/metabolism; Internet; Sequence Analysis, DNA; User-Computer Interface

12.

MIRA: an R package for DNA methylation-based inference of regulatory activity.

Lawson, John T; Tomazou, Eleni M; Bock, Christoph; Sheffield, Nathan C.

Bioinformatics ; 34(15): 2649-2650, 2018 08 01.

Article in English | MEDLINE | ID: mdl-29506020

ABSTRACT

Summary: DNA methylation contains information about the regulatory state of the cell. MIRA aggregates genome-scale DNA methylation data into a DNA methylation profile for a given region set with shared biological annotation. Using this profile, MIRA infers and scores the collective regulatory activity for the region set. MIRA facilitates regulatory analysis in situations where classical regulatory assays would be difficult and allows public sources of region sets to be leveraged for novel insight into the regulatory state of DNA methylation datasets. Availability and implementation: http://bioconductor.org/packages/MIRA.

Subjects

DNA Methylation; Epigenomics/methods; Sequence Analysis, DNA/methods; Software; Biological Ontologies; Computational Biology/methods

13.

BART: a transcription factor prediction tool with query gene sets or epigenomic profiles.

Wang, Zhenjia; Civelek, Mete; Miller, Clint L; Sheffield, Nathan C; Guertin, Michael J; Zang, Chongzhi.

Bioinformatics ; 34(16): 2867-2869, 2018 08 15.

Article in English | MEDLINE | ID: mdl-29608647

ABSTRACT

Summary: Identification of functional transcription factors that regulate a given gene set is an important problem in gene regulation studies. Conventional approaches for identifying transcription factors, such as DNA sequence motif analysis, are unable to predict functional binding of specific factors and not sensitive enough to detect factors binding at distal enhancers. Here, we present binding analysis for regulation of transcription (BART), a novel computational method and software package for predicting functional transcription factors that regulate a query gene set or associate with a query genomic profile, based on more than 6000 existing ChIP-seq datasets for over 400 factors in human or mouse. This method demonstrates the advantage of utilizing publicly available data for functional genomics research. Availability and implementation: BART is implemented in Python and available at http://faculty.virginia.edu/zanglab/bart. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Epigenomics; Software; Transcription Factors/analysis; Animals; Databases, Genetic; Gene Expression Regulation; Humans; Mice; Sequence Analysis, DNA/methods; Transcription Factors/genetics

14.

ChIPmentation: fast, robust, low-input ChIP-seq for histones and transcription factors.

Schmidl, Christian; Rendeiro, André F; Sheffield, Nathan C; Bock, Christoph.

Nat Methods ; 12(10): 963-965, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26280331

ABSTRACT

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is widely used to map histone marks and transcription factor binding throughout the genome. Here we present ChIPmentation, a method that combines chromatin immunoprecipitation with sequencing library preparation by Tn5 transposase ('tagmentation'). ChIPmentation introduces sequencing-compatible adaptors in a single-step reaction directly on bead-bound chromatin, which reduces time, cost and input requirements, thus providing a convenient and broadly useful alternative to existing ChIP-seq protocols.

Subjects

Chromatin Immunoprecipitation/methods; Histones/metabolism; Transcription Factors/metabolism; Chromatin Immunoprecipitation/economics; Chromatin Immunoprecipitation/instrumentation; Genome, Human; Humans; K562 Cells; Transcription Factors/analysis

15.

The accessible chromatin landscape of the human genome.

Thurman, Robert E; Rynes, Eric; Humbert, Richard; Vierstra, Jeff; Maurano, Matthew T; Haugen, Eric; Sheffield, Nathan C; Stergachis, Andrew B; Wang, Hao; Vernot, Benjamin; Garg, Kavita; John, Sam; Sandstrom, Richard; Bates, Daniel; Boatman, Lisa; Canfield, Theresa K; Diegel, Morgan; Dunn, Douglas; Ebersol, Abigail K; Frum, Tristan; Giste, Erika; Johnson, Audra K; Johnson, Ericka M; Kutyavin, Tanya; Lajoie, Bryan; Lee, Bum-Kyu; Lee, Kristen; London, Darin; Lotakis, Dimitra; Neph, Shane; Neri, Fidencio; Nguyen, Eric D; Qu, Hongzhu; Reynolds, Alex P; Roach, Vaughn; Safi, Alexias; Sanchez, Minerva E; Sanyal, Amartya; Shafer, Anthony; Simon, Jeremy M; Song, Lingyun; Vong, Shinny; Weaver, Molly; Yan, Yongqi; Zhang, Zhancheng; Zhang, Zhuzhu; Lenhard, Boris; Tewari, Muneesh; Dorschner, Michael O; Hansen, R Scott.

Nature ; 489(7414): 75-82, 2012 Sep 06.

Article in English | MEDLINE | ID: mdl-22955617

ABSTRACT

DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ~2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ~580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.

Subjects

Chromatin/genetics; Chromatin/metabolism; DNA/genetics; Encyclopedias as Topic; Genome, Human/genetics; Molecular Sequence Annotation; Regulatory Sequences, Nucleic Acid/genetics; DNA Footprinting; DNA Methylation; DNA-Binding Proteins/metabolism; Deoxyribonuclease I/metabolism; Evolution, Molecular; Genomics; Humans; Mutation Rate; Promoter Regions, Genetic/genetics; Transcription Factors/metabolism; Transcription Initiation Site; Transcription, Genetic

16.

LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor.

Sheffield, Nathan C; Bock, Christoph.

Bioinformatics ; 32(4): 587-9, 2016 Feb 15.

Article in English | MEDLINE | ID: mdl-26508757

ABSTRACT

Genomic datasets are often interpreted in the context of large-scale reference databases. One approach is to identify significantly overlapping gene sets, which works well for gene-centric data. However, many types of high-throughput data are based on genomic regions. Locus Overlap Analysis (LOLA) provides easy and automatable enrichment analysis for genomic region sets, thus facilitating the interpretation of functional genomics and epigenomics data. AVAILABILITY AND IMPLEMENTATION: R package available in Bioconductor and on the following website: http://lola.computational-epigenetics.org.

Subjects

Computational Biology/methods; Epigenomics/methods; Genome, Human; Genomics/methods; Regulatory Sequences, Nucleic Acid/genetics; Software; Databases, Factual; Gene Regulatory Networks; Humans; Sequence Analysis, DNA

17.

Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions.

Sheffield, Nathan C; Thurman, Robert E; Song, Lingyun; Safi, Alexias; Stamatoyannopoulos, John A; Lenhard, Boris; Crawford, Gregory E; Furey, Terrence S.

Genome Res ; 23(5): 777-88, 2013 May.

Article in English | MEDLINE | ID: mdl-23482648

ABSTRACT

Regulatory elements recruit transcription factors that modulate gene expression distinctly across cell types, but the relationships among these remain elusive. To address this, we analyzed matched DNase-seq and gene expression data for 112 human samples representing 72 cell types. We first defined more than 1800 clusters of DNase I hypersensitive sites (DHSs) with similar tissue specificity of DNase-seq signal patterns. We then used these to uncover distinct associations between DHSs and promoters, CpG islands, conserved elements, and transcription factor motif enrichment. Motif analysis within clusters identified known and novel motifs in cell-type-specific and ubiquitous regulatory elements and supports a role for AP-1 regulating open chromatin. We developed a classifier that accurately predicts cell-type lineage based on only 43 DHSs and evaluated the tissue of origin for cancer cell types. A similar classifier identified three sex-specific loci on the X chromosome, including the XIST lincRNA locus. By correlating DNase I signal and gene expression, we predicted regulated genes for more than 500K DHSs. Finally, we introduce a web resource to enable researchers to use these results to explore these regulatory patterns and better understand how expression is modulated within and across human cell types.

Subjects

Cells/metabolism; DNA-Binding Proteins/genetics; Deoxyribonuclease I/genetics; Regulatory Elements, Transcriptional/genetics; Regulatory Sequences, Nucleic Acid/genetics; Binding Sites/genetics; Cells/classification; Cells/cytology; Chromatin/genetics; Chromosome Mapping; Gene Expression Regulation; Genome, Human; Humans; Hypersensitivity; Organ Specificity; Protein Binding/genetics; Transcription Factor AP-1/genetics

18.

Predicting cell-type-specific gene expression from regions of open chromatin.

Natarajan, Anirudh; Yardimci, Galip Gürkan; Sheffield, Nathan C; Crawford, Gregory E; Ohler, Uwe.

Genome Res ; 22(9): 1711-22, 2012 Sep.

Article in English | MEDLINE | ID: mdl-22955983

ABSTRACT

Complex patterns of cell-type-specific gene expression are thought to be achieved by combinatorial binding of transcription factors (TFs) to sequence elements in regulatory regions. Predicting cell-type-specific expression in mammals has been hindered by the oftentimes unknown location of distal regulatory regions. To alleviate this bottleneck, we used DNase-seq data from 19 diverse human cell types to identify proximal and distal regulatory elements at genome-wide scale. Matched expression data allowed us to separate genes into classes of cell-type-specific up-regulated, down-regulated, and constitutively expressed genes. CG dinucleotide content and DNA accessibility in the promoters of these three classes of genes displayed substantial differences, highlighting the importance of including these aspects in modeling gene expression. We associated DNase I hypersensitive sites (DHSs) with genes, and trained classifiers for different expression patterns. TF sequence motif matches in DHSs provided a strong performance improvement in predicting gene expression over the typical baseline approach of using proximal promoter sequences. In particular, we achieved competitive performance when discriminating up-regulated genes from different cell types or genes up- and down-regulated under the same conditions. We identified previously known and new candidate cell-type-specific regulators. The models generated testable predictions of activating or repressive functions of regulators. DNase I footprints for these regulators were indicative of their direct binding to DNA. In summary, we successfully used information of open chromatin obtained by a single assay, DNase-seq, to address the problem of predicting cell-type-specific gene expression in mammalian organisms directly from regulatory sequence.

Subjects

Chromatin Assembly and Disassembly; DNA Footprinting/methods; Gene Expression Regulation; Regulatory Sequences, Nucleic Acid; Base Composition; Binding Sites/genetics; Cell Line; Cluster Analysis; Deoxyribonuclease I/metabolism; Gene Expression Profiling; Genome; Humans; Organ Specificity/genetics; Promoter Regions, Genetic; Protein Binding; Transcription Factors/metabolism; Transcription Initiation Site

19.

Extensive evolutionary changes in regulatory element activity during human origins are associated with altered gene expression and positive selection.

Shibata, Yoichiro; Sheffield, Nathan C; Fedrigo, Olivier; Babbitt, Courtney C; Wortham, Matthew; Tewari, Alok K; London, Darin; Song, Lingyun; Lee, Bum-Kyu; Iyer, Vishwanath R; Parker, Stephen C J; Margulies, Elliott H; Wray, Gregory A; Furey, Terrence S; Crawford, Gregory E.

PLoS Genet ; 8(6): e1002789, 2012 Jun.

Article in English | MEDLINE | ID: mdl-22761590

ABSTRACT

Understanding the molecular basis for phenotypic differences between humans and other primates remains an outstanding challenge. Mutations in non-coding regulatory DNA that alter gene expression have been hypothesized as a key driver of these phenotypic differences. This has been supported by differential gene expression analyses in general, but not by the identification of specific regulatory elements responsible for changes in transcription and phenotype. To identify the genetic source of regulatory differences, we mapped DNaseI hypersensitive (DHS) sites, which mark all types of active gene regulatory elements, genome-wide in the same cell type isolated from human, chimpanzee, and macaque. Most DHS sites were conserved among all three species, as expected based on their central role in regulating transcription. However, we found evidence that several hundred DHS sites were gained or lost on the lineages leading to modern human and chimpanzee. Species-specific DHS site gains are enriched near differentially expressed genes, are positively correlated with increased transcription, show evidence of branch-specific positive selection, and overlap with active chromatin marks. Species-specific sequence differences in transcription factor motifs found within these DHS sites are linked with species-specific changes in chromatin accessibility. Together, these indicate that the regulatory elements identified here are genetic contributors to transcriptional and phenotypic differences among primate species.

Subjects

Deoxyribonuclease I/genetics; Evolution, Molecular; Primates/genetics; Regulatory Sequences, Nucleic Acid/genetics; Transcription, Genetic; Animals; Binding Sites/genetics; Cell Line; Chromatin/genetics; Gene Expression Regulation; Genome, Human; Humans; Mutation; Nucleotide Motifs; Phenotype; Selection, Genetic; Species Specificity; Transcription Factors/genetics

20.

Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity.

Song, Lingyun; Zhang, Zhancheng; Grasfeder, Linda L; Boyle, Alan P; Giresi, Paul G; Lee, Bum-Kyu; Sheffield, Nathan C; Gräf, Stefan; Huss, Mikael; Keefe, Damian; Liu, Zheng; London, Darin; McDaniell, Ryan M; Shibata, Yoichiro; Showers, Kimberly A; Simon, Jeremy M; Vales, Teresa; Wang, Tianyuan; Winter, Deborah; Zhang, Zhuzhu; Clarke, Neil D; Birney, Ewan; Iyer, Vishwanath R; Crawford, Gregory E; Lieb, Jason D; Furey, Terrence S.

Genome Res ; 21(10): 1757-67, 2011 Oct.

Article in English | MEDLINE | ID: mdl-21750106

ABSTRACT

The human body contains thousands of unique cell types, each with specialized functions. Cell identity is governed in large part by gene transcription programs, which are determined by regulatory elements encoded in DNA. To identify regulatory elements active in seven cell lines representative of diverse human cell types, we used DNase-seq and FAIRE-seq (Formaldehyde Assisted Isolation of Regulatory Elements) to map "open chromatin." Over 870,000 DNaseI or FAIRE sites, which correspond tightly to nucleosome-depleted regions, were identified across the seven cell lines, covering nearly 9% of the genome. The combination of DNaseI and FAIRE is more effective than either assay alone in identifying likely regulatory elements, as judged by coincidence with transcription factor binding locations determined in the same cells. Open chromatin common to all seven cell types tended to be at or near transcription start sites and to be coincident with CTCF binding sites, while open chromatin sites found in only one cell type were typically located away from transcription start sites and contained DNA motifs recognized by regulators of cell-type identity. We show that open chromatin regions bound by CTCF are potent insulators. We identified clusters of open regulatory elements (COREs) that were physically near each other and whose appearance was coordinated among one or more cell types. Gene expression and RNA Pol II binding data support the hypothesis that COREs control gene activity required for the maintenance of cell-type identity. This publicly available atlas of regulatory elements may prove valuable in identifying noncoding DNA sequence variants that are causally linked to human disease.

Subjects

Chromatin/metabolism; Chromosome Mapping; Regulatory Elements, Transcriptional; Sequence Analysis, DNA/methods; Base Sequence; Binding Sites; CCCTC-Binding Factor; Cell Differentiation/genetics; Cell Line; Gene Expression Regulation; Humans; Protein Binding; Repressor Proteins/metabolism; Transcription, Genetic; Transcriptional Activation
