Results 1 - 20 of 80
1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subject(s)
Epigenome , Quantitative Trait Loci , Genome-Wide Association Study , Genomics , Phenotype , Single Nucleotide Polymorphism
2.
Cell ; 177(2): 231-242, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30951667

ABSTRACT

The Extracellular RNA Communication Consortium (ERCC) was launched to accelerate progress in the new field of extracellular RNA (exRNA) biology and to establish whether exRNAs and their carriers, including extracellular vesicles (EVs), can mediate intercellular communication and be utilized for clinical applications. Phase 1 of the ERCC focused on exRNA/EV biogenesis and function, discovery of exRNA biomarkers, development of exRNA/EV-based therapeutics, and construction of a robust set of reference exRNA profiles for a variety of biofluids. Here, we present progress by ERCC investigators in these areas, and we discuss collaborative projects directed at development of robust methods for EV/exRNA isolation and analysis and tools for sharing and computational analysis of exRNA profiling data.


Subject(s)
Cell-Free Nucleic Acids/genetics , Cell-Free Nucleic Acids/metabolism , Extracellular Vesicles/genetics , Biomarkers , Humans , Knowledge Bases , MicroRNAs/genetics , RNA/genetics
3.
Cell ; 177(2): 463-477.e15, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30951672

ABSTRACT

To develop a map of cell-cell communication mediated by extracellular RNA (exRNA), the NIH Extracellular RNA Communication Consortium created the exRNA Atlas resource (https://exrna-atlas.org). The Atlas version 4P1 hosts 5,309 exRNA-seq and exRNA qPCR profiles from 19 studies and a suite of analysis and visualization tools. To analyze variation between profiles, we apply computational deconvolution. The analysis leads to a model with six exRNA cargo types (CT1, CT2, CT3A, CT3B, CT3C, CT4), each detectable in multiple biofluids (serum, plasma, CSF, saliva, urine). Five of the cargo types associate with known vesicular and non-vesicular (lipoprotein and ribonucleoprotein) exRNA carriers. To validate utility of this model, we re-analyze an exercise response study by deconvolution to identify physiologically relevant response pathways that were not detected previously. To enable wide application of this model, as part of the exRNA Atlas resource, we provide tools for deconvolution and analysis of user-provided case-control studies.


Subject(s)
Cell Communication/physiology , RNA/metabolism , Adult , Body Fluids/chemistry , Cell-Free Nucleic Acids/metabolism , Circulating MicroRNA/metabolism , Extracellular Vesicles/metabolism , Female , Humans , Male , Reproducibility of Results , RNA Sequence Analysis/methods , Software
4.
Nature ; 583(7818): 699-710, 2020 07.
Article in English | MEDLINE | ID: mdl-32728249

ABSTRACT

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.


Subject(s)
DNA/genetics , Genetic Databases , Genome/genetics , Genomics , Molecular Sequence Annotation , Registries , Nucleic Acid Regulatory Sequences/genetics , Animals , Chromatin/genetics , Chromatin/metabolism , DNA/chemistry , DNA Footprinting , DNA Methylation/genetics , DNA Replication Timing , Deoxyribonuclease I/metabolism , Human Genome , Histones/metabolism , Humans , Mice , Transgenic Mice , RNA-Binding Proteins/genetics , Genetic Transcription/genetics , Transposases/metabolism
5.
Nat Methods ; 17(8): 807-814, 2020 08.
Article in English | MEDLINE | ID: mdl-32737473

ABSTRACT

Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers and promoters.


Subject(s)
Genetic Epigenesis/physiology , Automated Pattern Recognition/methods , Animals , Cell Line , Drosophila , Histones/genetics , Histones/metabolism , Humans , Mice , Transgenic Mice , Reproducibility of Results
7.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Developmental Gene Expression Regulation/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Genetic Models , Molecular Sequence Annotation , Genetic Promoter Regions/genetics , Pupa/genetics , Pupa/growth & development , Untranslated RNA/genetics , RNA Sequence Analysis
8.
Nature ; 512(7515): 453-6, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164757

ABSTRACT

Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Molecular Evolution , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Transcription Factors/metabolism , Animals , Binding Sites , Caenorhabditis elegans/growth & development , Chromatin Immunoprecipitation , Conserved Sequence/genetics , Drosophila melanogaster/growth & development , Developmental Gene Expression Regulation/genetics , Genome/genetics , Humans , Molecular Sequence Annotation , Nucleotide Motifs/genetics , Organ Specificity/genetics , Transcription Factors/genetics
9.
Trends Genet ; 32(5): 251-253, 2016 05.
Article in English | MEDLINE | ID: mdl-27005445

ABSTRACT

The emergence of collective creative enterprise such as large scientific consortia is a unique feature in modern scientific research. We analyzed the temporal co-authorship network structures of ENCODE and modENCODE consortia. Our analysis revealed that the consortium members work closely as a community whereas non-members collaborate in the scale of a few laboratories. We also identified a few brokers playing an important role to facilitate collaborations with outside researchers.


Subject(s)
Cooperative Behavior , Research Peer Review/trends , Humans
10.
Bioinformatics ; 34(1): 1-8, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28961734

ABSTRACT

Motivation: Analysis of RNA sequencing (RNA-Seq) data in human saliva is challenging. Lack of standardization and unification of the bioinformatic procedures undermines saliva's diagnostic potential. Thus, it motivated us to perform this study. Results: We applied principal pipelines for bioinformatic analysis of small RNA-Seq data of saliva of 98 healthy Korean volunteers including either direct or indirect mapping of the reads to the human genome using Bowtie1. Analysis of alignments to exogenous genomes by another pipeline revealed that almost all of the reads map to bacterial genomes. Thus, salivary exRNA has fundamental properties that warrant the design of unique additional steps while performing the bioinformatic analysis. Our pipelines can serve as potential guidelines for processing of RNA-Seq data of human saliva. Availability and implementation: Processing and analysis results of the experimental data generated by the exceRpt (v4.6.3) small RNA-seq pipeline (github.gersteinlab.org/exceRpt) are available from exRNA atlas (exrna-atlas.org). Alignment to exogenous genomes and their quantification results were used in this paper for the analyses of small RNAs of exogenous origin. Contact: dtww@ucla.edu.


Subject(s)
Computational Biology/methods , RNA Sequence Analysis/methods , Software , High-Throughput Nucleotide Sequencing/methods , Humans , RNA , Saliva/chemistry
11.
J Proteome Res ; 17(10): 3431-3444, 2018 10 05.
Article in English | MEDLINE | ID: mdl-30125121

ABSTRACT

Cellular control of gene expression is a complex process that is subject to multiple levels of regulation, but ultimately it is the protein produced that determines the biosynthetic state of the cell. One way that a cell can regulate the protein output from each gene is by expressing alternate isoforms with distinct amino acid sequences. These isoforms may exhibit differences in localization and binding interactions that can have profound functional implications. High-throughput liquid chromatography tandem mass spectrometry proteomics (LC-MS/MS) relies on enzymatic digestion and has lower coverage and sensitivity than transcriptomic profiling methods such as RNA-seq. Digestion results in predictable fragmentation of a protein, which can limit the generation of peptides capable of distinguishing between isoforms. Here we exploit transcript-level expression from RNA-seq to set prior likelihoods and enable protein isoform abundances to be directly estimated from LC-MS/MS, an approach derived from the principle that most genes appear to be expressed as a single dominant isoform in a given cell type or tissue. Through this deep integration of RNA-seq and LC-MS/MS data from the same sample, we show that a principal isoform can be identified in >80% of gene products in homogeneous HEK293 cell culture and >70% of proteins detected in complex human brain tissue. We demonstrate that the incorporation of translatome data from ribosome profiling further refines this process. Defining isoforms in experiments with matched RNA-seq/translatome and proteomic data increases the functional relevance of such data sets and will further broaden our understanding of multilevel control of gene expression.


Subject(s)
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Proteome/metabolism , Proteomics/methods , Algorithms , Alternative Splicing , Liquid Chromatography/methods , HEK293 Cells , Humans , Protein Biosynthesis/genetics , Protein Isoforms/genetics , Protein Isoforms/metabolism , Proteome/genetics , Reproducibility of Results , Ribosomes/genetics , Ribosomes/metabolism , Tandem Mass Spectrometry/methods
12.
BMC Genomics ; 19(1): 331, 2018 May 05.
Article in English | MEDLINE | ID: mdl-29728066

ABSTRACT

BACKGROUND: Evolving interest in comprehensively profiling the full range of small RNAs present in small tissue biopsies and in circulating biofluids, and how the profile differs with disease, has launched small RNA sequencing (RNASeq) into more frequent use. However, known biases associated with small RNASeq, compounded by low RNA inputs, have been both a significant concern and a hurdle to widespread adoption. As RNASeq is becoming a viable choice for the discovery of small RNAs in low input samples and more labs are employing it, there should be benchmark datasets to test and evaluate the performance of new sequencing protocols and operators. In a recent publication from the National Institute of Standards and Technology, Pine et al., 2018, the investigators used a commercially available set of three tissues and tested performance across labs and platforms. RESULTS: In this paper, we further tested the performance of low RNA input in three commonly used and commercially available RNASeq library preparation kits; NEB Next, NEXTFlex, and TruSeq small RNA library preparation. We evaluated the performance of the kits at two different sites, using three different tissues (brain, liver, and placenta) with high (1 µg) and low RNA (10 ng) input from tissue samples, or 5.0, 3.0, 2.0, 1.0, 0.5, and 0.2 ml starting volumes of plasma. As there has been a lack of robust validation platforms for differentially expressed miRNAs, we also compared low input RNASeq data with their expression profiles on three different platforms (Abcam Fireplex, HTG EdgeSeq, and Qiagen miRNome). CONCLUSIONS: The concordance of RNASeq results on these three platforms was dependent on the RNA expression level; the higher the expression, the better the reproducibility. The results provide an extensive analysis of small RNASeq kit performance using low RNA input, and replication of these data on three downstream technologies.


Subject(s)
Gene Library , RNA/metabolism , Brain/metabolism , Female , High-Throughput Nucleotide Sequencing , Humans , Liver/metabolism , MicroRNAs/analysis , MicroRNAs/chemistry , Placenta/metabolism , Pregnancy , Principal Component Analysis , RNA/chemistry , Diagnostic Reagent Kits , RNA Sequence Analysis
13.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.


Subject(s)
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Human Genome/genetics , Molecular Sequence Annotation , Nucleic Acid Regulatory Sequences/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Single Nucleotide Polymorphism/genetics , Protein Interaction Maps , Untranslated RNA/genetics , Untranslated RNA/metabolism , Genetic Selection/genetics , Transcription Initiation Site
14.
Nature ; 489(7414): 101-8, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955620

ABSTRACT

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.


Subject(s)
DNA/genetics , Encyclopedias as Topic , Human Genome/genetics , Molecular Sequence Annotation , Nucleic Acid Regulatory Sequences/genetics , Genetic Transcription/genetics , Transcriptome/genetics , Alleles , Cell Line , Intergenic DNA/genetics , Genetic Enhancer Elements , Exons/genetics , Gene Expression Profiling , Genes/genetics , Genomics , Humans , Polyadenylation/genetics , Protein Isoforms/genetics , RNA/biosynthesis , RNA/genetics , RNA Editing/genetics , RNA Splicing/genetics , Nucleic Acid Repetitive Sequences/genetics , RNA Sequence Analysis
15.
Stroke ; 48(4): 828-834, 2017 04.
Article in English | MEDLINE | ID: mdl-28289238

ABSTRACT

BACKGROUND AND PURPOSE: There is increasing interest in extracellular RNAs (ex-RNAs), with numerous reports of associations between selected microRNAs (miRNAs) and a variety of cardiovascular disease phenotypes. Previous studies of ex-RNAs in relation to risk for cardiovascular disease have investigated small numbers of patients and assayed only candidate miRNAs. No human studies have investigated links between novel ex-RNAs and stroke. METHODS: We conducted unbiased next-generation sequencing using plasma from 40 participants of the FHS (Framingham Heart Study; Offspring Cohort Exam 8) followed by high-throughput polymerase chain reaction of 471 ex-RNAs. The reverse transcription quantitative polymerase chain reaction included 331 of the most abundant miRNAs, 43 small nucleolar RNAs, and 97 piwi-interacting RNAs in 2763 additional FHS participants and explored the relations of ex-RNAs and prevalent (n=63) and incident (n=51) stroke and coronary heart disease (prevalent=286, incident=69). RESULTS: After adjustment for multiple cardiovascular disease risk factors, 7 ex-RNAs were associated with stroke prevalence or incidence; there were no ex-RNA associated with prevalent or incident coronary heart disease. Statistically significant ex-RNA associations with stroke were specific, with no overlap between prevalent and incident events. CONCLUSIONS: This is the largest study of ex-RNAs in relation to stroke using an unbiased approach in an observational cohort and the first large study to examine human small noncoding RNAs beyond miRNAs. These results demonstrate that when studied in a large observational cohort, extracellular miRNAs are associated with stroke risk.


Subject(s)
Coronary Disease/blood , MicroRNAs/blood , Small Interfering RNA/blood , Small Nucleolar RNA/blood , Stroke/blood , Aged , Cohort Studies , Coronary Disease/epidemiology , Female , High-Throughput Nucleotide Sequencing , Humans , Incidence , Male , Massachusetts/epidemiology , Middle Aged , Prevalence , Stroke/epidemiology
16.
Proc Natl Acad Sci U S A ; 111(37): 13361-6, 2014 Sep 16.
Article in English | MEDLINE | ID: mdl-25157146

ABSTRACT

Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Phylogeny , Pseudogenes/genetics , Animals , Molecular Evolution , Genetic Association Studies , Humans , Molecular Sequence Annotation , Genetic Promoter Regions/genetics , Nucleic Acid Sequence Homology
17.
PLoS Comput Biol ; 11(4): e1004132, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25884877

ABSTRACT

The topology of the gene-regulatory network has been extensively analyzed. Now, given the large amount of available functional genomic data, it is possible to go beyond this and systematically study regulatory circuits in terms of logic elements. To this end, we present Loregic, a computational method integrating gene expression and regulatory network data, to characterize the cooperativity of regulatory factors. Loregic uses all 16 possible two-input-one-output logic gates (e.g. AND or XOR) to describe triplets of two factors regulating a common target. We attempt to find the gate that best matches each triplet's observed gene expression pattern across many conditions. We make Loregic available as a general-purpose tool (github.com/gersteinlab/loregic). We validate it with known yeast transcription-factor knockout experiments. Next, using human ENCODE ChIP-Seq and TCGA RNA-Seq data, we are able to demonstrate how Loregic characterizes complex circuits involving both proximally and distally regulating transcription factors (TFs) and also miRNAs. Furthermore, we show that MYC, a well-known oncogenic driving TF, can be modeled as acting independently from other TFs (e.g., using OR gates) but antagonistically with repressing miRNAs. Finally, we inter-relate Loregic's gate logic with other aspects of regulation, such as indirect binding via protein-protein interactions, feed-forward loop motifs and global regulatory hierarchy.


Subject(s)
Gene Regulatory Networks/genetics , Regulator Genes/genetics , Logistic Models , Genetic Models , Transcription Factors/genetics , Transcriptional Activation/genetics , Algorithms , Animals , Computer Simulation , Gene Expression Regulation/genetics , Humans , Leukemia/genetics , MicroRNAs/genetics
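
The gate-matching idea in the Loregic abstract above can be illustrated with a small sketch. This is not the Loregic implementation (which is available at the URL given in the abstract); it is a toy version that assumes expression has already been binarized to 0/1 and scores a gate by the fraction of conditions it reproduces. The names `GATES`, `NAMED`, and `best_gate` are hypothetical.

```python
from itertools import product

# The four possible input states for two regulatory factors.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# All 16 two-input-one-output gates, keyed by their output bits
# for the inputs in the order listed above.
GATES = {bits: dict(zip(INPUTS, bits)) for bits in product((0, 1), repeat=4)}

# Human-readable names for a few familiar gates (illustrative subset).
NAMED = {(0, 0, 0, 1): "AND", (0, 1, 1, 1): "OR", (0, 1, 1, 0): "XOR"}


def best_gate(samples):
    """Pick the gate that agrees with the most samples.

    samples: list of (factor1, factor2, target) tuples, each value 0 or 1,
    one tuple per condition. Returns (gate_bits, fraction_matched).
    The scoring rule is an assumption made for this sketch.
    """
    def score(bits):
        table = GATES[bits]
        return sum(table[(a, b)] == t for a, b, t in samples)

    bits = max(GATES, key=score)
    return bits, score(bits) / len(samples)
```

For example, a triplet whose target is on only when both factors are on would be matched to the AND gate: `best_gate([(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)])` returns the AND truth table with every condition matched.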
18.
Nat Rev Genet ; 11(8): 559-71, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20628352

ABSTRACT

Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.


Subject(s)
Intergenic DNA/genetics , Human Genome , Genomics/methods , Animals , Chromosome Mapping , Conserved Sequence , DNA Transposable Elements , Genomics/trends , Humans , Pseudogenes , Transcriptional Regulatory Elements , Sequence Alignment , DNA Sequence Analysis , Tandem Repeat Sequences
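
The sequence of steps described in the abstract above (per-base signal, smoothing, segmentation into small blocks, grouping into larger annotations) can be sketched as follows. This is a toy illustration, not a method from the review: the moving-average smoother, the fixed threshold, and the gap-based merging used as a crude stand-in for clustering are all assumptions, and the function names are hypothetical.

```python
def smooth(signal, window=3):
    """Centered moving average; the window shrinks at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def segment(signal, threshold):
    """Return half-open (start, end) blocks where signal >= threshold."""
    blocks, start = [], None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i  # a block opens here
        elif value < threshold and start is not None:
            blocks.append((start, i))  # the block closes before position i
            start = None
    if start is not None:
        blocks.append((start, len(signal)))
    return blocks


def merge_blocks(blocks, max_gap):
    """Join consecutive blocks separated by at most max_gap positions."""
    merged = []
    for start, end in blocks:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

A typical use would chain the three steps, for example `merge_blocks(segment(smooth(raw), threshold=3), max_gap=2)`, where `raw` is a list of per-base signal values; the threshold and gap values here are arbitrary.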
19.
Genome Res ; 22(9): 1658-67, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955978

ABSTRACT

Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.


Subject(s)
Gene Expression Regulation , Genomics , Transcription Factors/metabolism , Genetic Transcription , Base Composition , Binding Sites/genetics , Cell Line , Chromatin/genetics , Chromatin/metabolism , Computational Biology/methods , Histones/genetics , Humans , Biological Models , Genetic Promoter Regions , Protein Binding/genetics , Transcription Initiation Site
20.
Genome Res ; 22(9): 1813-31, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955991

ABSTRACT

Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.


Subject(s)
Chromatin Immunoprecipitation/methods , Genetic Databases , High-Throughput Nucleotide Sequencing/methods , Animals , Genome/genetics , Genomics/methods , Guidelines as Topic , Histones/metabolism , Humans , Internet , Transcription Factors/metabolism