Results 1 - 20 of 55
1.
Cell ; 155(3): 713-24, 2013 Oct 24.
Article in English | MEDLINE | ID: mdl-24243024

ABSTRACT

Different trans-acting factors (TFs) collaborate and act in concert at distinct loci to perform accurate regulation of their target genes. To date, the cobinding of TF pairs has been investigated in a limited context both in terms of the number of factors within a cell type and across cell types and the extent of combinatorial colocalizations. Here, we use an approach to analyze TF colocalization within a cell type and across multiple cell lines at an unprecedented level. We extend this approach with large-scale mass spectrometry analysis of immunoprecipitations of 50 TFs. Our combined approach reveals large numbers of interesting TF-TF associations. We observe extensive change in TF colocalizations both within a cell type exposed to different conditions and across multiple cell types. We show distinct functional annotations and properties of different TF cobinding patterns and provide insights into the complex regulatory landscape of the cell.


Subject(s)
Artificial Intelligence, DNA Sequence Analysis, Transcription Factors/metabolism, Binding Sites, Cell Line, Chromatin Immunoprecipitation, Gene Regulatory Networks, Humans, Nucleic Acid Regulatory Sequences
2.
Cell ; 148(6): 1293-307, 2012 Mar 16.
Article in English | MEDLINE | ID: mdl-22424236

ABSTRACT

Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.


Subject(s)
Human Genome, Genomics, Precision Medicine, Type 2 Diabetes Mellitus/genetics, Female, Gene Expression Profiling, Humans, Male, Metabolomics, Middle Aged, Mutation, Proteomics, Respiratory Syncytial Viruses/isolation & purification, Rhinovirus/isolation & purification
3.
Genome Res ; 33(5): 741-749, 2023 May.
Article in English | MEDLINE | ID: mdl-37156622

ABSTRACT

Recombinant plasmid vectors are versatile tools that have facilitated discoveries in molecular biology, genetics, proteomics, and many other fields. As the enzymatic and bacterial processes used to create recombinant DNA can introduce errors, sequence validation is an essential step in plasmid assembly. Sanger sequencing is the current standard for plasmid validation; however, this method is limited by an inability to sequence through complex secondary structure and lacks scalability when applied to full-plasmid sequencing of multiple plasmids owing to read-length limits. Although high-throughput sequencing does provide full-plasmid sequencing at scale, it is impractical and costly when used outside of library-scale validation. Here, we present Oxford nanopore-based rapid analysis of multiplexed plasmids (OnRamp), an alternative method for routine plasmid validation that combines the advantages of high-throughput sequencing's full-plasmid coverage and scalability with Sanger's affordability and accessibility by leveraging nanopore's long-read sequencing technology. We include customized wet-laboratory protocols for plasmid preparation along with a pipeline designed for analysis of read data obtained using these protocols. This analysis pipeline is deployed on the OnRamp web app, which generates alignments between actual and predicted plasmid sequences, quality scores, and read-level views. OnRamp is designed to be broadly accessible regardless of programming experience to facilitate more widespread adoption of long-read sequencing for routine plasmid validation. Here we describe the OnRamp protocols and pipeline and show our ability to obtain full sequences from pooled plasmids while detecting sequence variation even in regions of high secondary structure at less than half the cost of equivalent Sanger sequencing.


Subject(s)
Bacterial Genome, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis/methods, Plasmids/genetics, High-Throughput Nucleotide Sequencing/methods, Proteomics
4.
Nucleic Acids Res ; 50(1): e6, 2022 01 11.
Article in English | MEDLINE | ID: mdl-34648033

ABSTRACT

Understanding the functional consequences of genetic variation in the non-coding regions of the human genome remains a challenge. We introduce here a computational tool, TURF, to prioritize regulatory variants with tissue-specific function by leveraging evidence from functional genomics experiments, including over 3000 functional genomics datasets from the ENCODE project provided in the RegulomeDB database. TURF is able to generate prediction scores at both organism and tissue/organ-specific levels for any non-coding variant on the genome. Using validated variants from MPRA experiments, we show that TURF achieves top overall prediction performance. We also demonstrate how TURF can pick out the regulatory variants with tissue-specific function from a candidate list from association studies. Furthermore, we found that various GWAS traits showed enrichment of regulatory variants predicted by TURF scores in the trait-relevant organs, which indicates that these variants can be a valuable source for future studies.


Subject(s)
Human Genome, Genomics/methods, Software, Cell Line, Data Analysis, Humans
5.
Genome Res ; 30(7): 1040-1046, 2020 07.
Article in English | MEDLINE | ID: mdl-32660981

ABSTRACT

Transcription is tightly regulated by cis-regulatory DNA elements where transcription factors (TFs) can bind. Thus, identification of TF binding sites (TFBSs) is key to understanding gene expression and whole regulatory networks within a cell. The standard approaches used for TFBS prediction, such as position weight matrices (PWMs) and chromatin immunoprecipitation followed by sequencing (ChIP-seq), are widely used but have their drawbacks, including high false-positive rates and limited antibody availability, respectively. Several computational footprinting algorithms have been developed to detect TFBSs by investigating chromatin accessibility patterns; however, these also have limitations. We have developed a footprinting method to predict TF footprints in active chromatin elements (TRACE) to improve the prediction of TFBS footprints. TRACE incorporates DNase-seq data and PWMs within a multivariate hidden Markov model (HMM) to detect footprint-like regions with matching motifs. TRACE is an unsupervised method that accurately annotates binding sites for specific TFs automatically with no requirement for pregenerated candidate binding sites or ChIP-seq training data. Compared with published footprinting algorithms, TRACE has the best overall performance with the distinct advantage of targeting multiple motifs in a single model.


Subject(s)
Chromatin/metabolism, DNA Footprinting/methods, DNA Sequence Analysis, Transcription Factors/metabolism, Binding Sites, Cell Line, Deoxyribonucleases, Humans, K562 Cells, Markov Chains, Nucleotide Motifs
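
Note: the position weight matrix (PWM) component mentioned in the abstract above can be illustrated with a minimal sketch. This is not TRACE itself, which combines PWM scores with DNase-seq signal inside a multivariate HMM; it only shows how a log-odds PWM might be built from a count matrix and slid along a sequence. The count matrix, pseudocount, uniform background, and function names are all hypothetical, chosen for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency (an illustrative simplification).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm):
    """Return (offset, score) for every window as wide as the motif.

    The sequence is assumed to contain only uppercase A, C, G, T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits

# Hypothetical 4-position motif whose counts favor "TGCA".
counts = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]
pwm = build_pwm(counts)
hits = scan("AATGCATT", pwm)
best_offset, best_score = max(hits, key=lambda hit: hit[1])
```

Ranking windows by score alone is what gives plain PWM scanning the high false-positive rate the abstract mentions; a footprinting method adds accessibility data to decide which motif matches are likely to be bound.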
6.
Cell ; 132(2): 311-22, 2008 Jan 25.
Article in English | MEDLINE | ID: mdl-18243105

ABSTRACT

Mapping DNase I hypersensitive (HS) sites is an accurate method of identifying the location of genetic regulatory elements, including promoters, enhancers, silencers, insulators, and locus control regions. We employed high-throughput sequencing and whole-genome tiled array strategies to identify DNase I HS sites within human primary CD4+ T cells. Combining these two technologies, we have created a comprehensive and accurate genome-wide open chromatin map. Surprisingly, only 16%-21% of the identified 94,925 DNase I HS sites are found in promoters or first exons of known genes, but nearly half of the most open sites are in these regions. In conjunction with expression, motif, and chromatin immunoprecipitation data, we find evidence of cell-type-specific characteristics, including the ability to identify transcription start sites and locations of different chromatin marks utilized in these cells. In addition, and unexpectedly, our analyses have uncovered detailed features of nucleosome structure.


Subject(s)
Chromatin/genetics, Human Genome/genetics, Algorithms, Area Under Curve, Binding Sites, CD4-Positive T-Lymphocytes/cytology, Cell Nucleus/metabolism, Chromatin Immunoprecipitation, Chromosome Mapping/methods, Human Chromosomes, Deoxyribonuclease I/chemistry, Deoxyribonuclease I/pharmacology, Human Genome/immunology, Histones/chemistry, Humans, Nucleosomes/chemistry, Oligonucleotide Array Sequence Analysis, Genetic Promoter Regions, ROC Curve, Sensitivity and Specificity, DNA Sequence Analysis, Transcription Factors/metabolism
7.
Proc Natl Acad Sci U S A ; 117(48): 30799-30804, 2020 12 01.
Article in English | MEDLINE | ID: mdl-33199612

ABSTRACT

Eukaryotic genomes are pervasively transcribed, yet most transcribed sequences lack conservation or known biological functions. In Arabidopsis thaliana, RNA polymerase V (Pol V) produces noncoding transcripts, which base pair with small interfering RNA (siRNA) and allow specific establishment of RNA-directed DNA methylation (RdDM) on transposable elements. Here, we show that Pol V transcribes much more broadly than previously expected, including subsets of both heterochromatic and euchromatic regions. At already established RdDM targets, Pol V and siRNA work together to maintain silencing. In contrast, some euchromatic sequences do not give rise to siRNA but are covered by low levels of Pol V transcription, which is needed to establish RdDM de novo if a transposon is reactivated. We propose a model where Pol V surveils the genome to make it competent to silence newly activated or integrated transposons. This indicates that pervasive transcription of nonconserved sequences may serve an essential role in maintenance of genome integrity.


Subject(s)
DNA-Directed RNA Polymerases/metabolism, Genome, Untranslated RNA, Genetic Transcription, Arabidopsis/genetics, Arabidopsis/metabolism, Arabidopsis Proteins/metabolism, DNA Transposable Elements, Plant Gene Expression Regulation, Gene Silencing, Biological Models, Multiprotein Complexes/metabolism, Substrate Specificity
8.
BMC Bioinformatics ; 23(1): 317, 2022 Aug 04.
Article in English | MEDLINE | ID: mdl-35927613

ABSTRACT

MOTIVATION: Aberrant DNA methylation in transcription factor binding sites has been shown to lead to anomalous gene regulation that is strongly associated with human disease. However, the majority of methylation-sensitive positions within transcription factor binding sites remain unknown. Here we introduce SEMplMe, a computational tool to generate predictions of the effect of methylation on transcription factor binding strength in every position within a transcription factor's motif. RESULTS: SEMplMe uses ChIP-seq and whole genome bisulfite sequencing to predict effects of methylation within binding sites. SEMplMe validates known methylation sensitive and insensitive positions within a binding motif, identifies cell type specific transcription factor binding driven by methylation, and outperforms SELEX-based predictions for CTCF. These predictions can be used to identify aberrant sites of DNA methylation contributing to human disease. AVAILABILITY AND IMPLEMENTATION: SEMplMe is available from https://github.com/Boyle-Lab/SEMplMe.


Subject(s)
DNA Methylation, Transcription Factors, Binding Sites, Gene Expression Regulation, Humans, Protein Binding, Transcription Factors/metabolism
9.
Am J Hum Genet ; 102(1): 103-115, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29290336

ABSTRACT

Atrial fibrillation (AF) is a common cardiac arrhythmia and a major risk factor for stroke, heart failure, and premature death. The pathogenesis of AF remains poorly understood, which contributes to the current lack of highly effective treatments. To understand the genetic variation and biology underlying AF, we undertook a genome-wide association study (GWAS) of 6,337 AF individuals and 61,607 AF-free individuals from Norway, including replication in an additional 30,679 AF individuals and 278,895 AF-free individuals. Through genotyping and dense imputation mapping from whole-genome sequencing, we tested almost nine million genetic variants across the genome and identified seven risk loci, including two novel loci. One novel locus (lead single-nucleotide variant [SNV] rs12614435; p = 6.76 × 10⁻¹⁸) comprised intronic and several highly correlated missense variants situated in the I-, A-, and M-bands of titin, which is the largest protein in humans and responsible for the passive elasticity of heart and skeletal muscle. The other novel locus (lead SNV rs56202902; p = 1.54 × 10⁻¹¹) covered a large, gene-dense chromosome 1 region that has previously been linked to cardiac conduction. Pathway and functional enrichment analyses suggested that many AF-associated genetic variants act through a mechanism of impaired muscle cell differentiation and tissue formation during fetal heart development.


Subject(s)
Atrial Fibrillation/genetics, Genetic Loci, Genetic Predisposition to Disease, Genome-Wide Association Study, Heart/embryology, Nucleic Acid Regulatory Sequences/genetics, Humans, Inheritance Patterns/genetics, Multifactorial Inheritance/genetics, Organ Specificity/genetics, Physical Chromosome Mapping, Quantitative Trait Loci/genetics, Reproducibility of Results, Risk Factors
10.
Bioinformatics ; 36(2): 364-372, 2020 01 15.
Article in English | MEDLINE | ID: mdl-31373606

ABSTRACT

MOTIVATION: Genome-wide association studies have revealed that 88% of disease-associated single-nucleotide polymorphisms (SNPs) reside in noncoding regions. However, noncoding SNPs remain understudied, partly because they are challenging to prioritize for experimental validation. To address this deficiency, we developed the SNP effect matrix pipeline (SEMpl). RESULTS: SEMpl estimates transcription factor-binding affinity by observing differences in chromatin immunoprecipitation followed by deep sequencing signal intensity for SNPs within functional transcription factor-binding sites (TFBSs) genome-wide. By cataloging the effects of every possible mutation within the TFBS motif, SEMpl can predict the consequences of SNPs to transcription factor binding. This knowledge can be used to identify potential disease-causing regulatory loci. AVAILABILITY AND IMPLEMENTATION: SEMpl is available from https://github.com/Boyle-Lab/SEM_CPP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study, Single Nucleotide Polymorphism, Binding Sites, Chromatin Immunoprecipitation, Protein Binding, Transcription Factors
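
Note: the idea of cataloging the effect of every possible base at each motif position, as described in the abstract above, can be sketched as a small matrix of log ratios. This is a toy illustration only and not the SEMpl implementation: the signal values are invented, and the normalization used here (log2 of mean signal relative to a single "endogenous" signal value) is an assumption about one plausible way to express such effects. The function and variable names are hypothetical.

```python
import math

def effect_matrix(mean_signal, endogenous_signal):
    """Build a per-position table of log2(mean signal / endogenous signal).

    mean_signal: list of dicts, one per motif position, mapping each base to
    the mean ChIP-seq signal observed when that base occupies the position.
    endogenous_signal: assumed reference signal for unmutated sites.
    """
    return [
        {base: math.log2(signal / endogenous_signal)
         for base, signal in column.items()}
        for column in mean_signal
    ]

def snp_effect(matrix, position, ref, alt):
    """Predicted change in score when ref is replaced by alt at a position.

    Negative values suggest weaker predicted binding, positive values stronger.
    """
    return matrix[position][alt] - matrix[position][ref]

# Invented signal values for a 3-position motif (illustration only).
mean_signal = [
    {"A": 8.0, "C": 2.0, "G": 1.0, "T": 4.0},
    {"A": 1.0, "C": 8.0, "G": 2.0, "T": 2.0},
    {"A": 4.0, "C": 4.0, "G": 8.0, "T": 4.0},
]
matrix = effect_matrix(mean_signal, endogenous_signal=8.0)
delta = snp_effect(matrix, position=0, ref="A", alt="G")
```

Unlike a PWM, which is built from base frequencies, a table like this is built from measured signal, so each entry reads directly as an estimated change in binding.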
11.
BMC Bioinformatics ; 21(1): 416, 2020 Sep 22.
Article in English | MEDLINE | ID: mdl-32962625

ABSTRACT

BACKGROUND: Comparative genomics studies are growing in number partly because of their unique ability to provide insight into shared and divergent biology between species. Of particular interest is the use of phylogenetic methods to infer the evolutionary history of cis-regulatory sequence features, which contribute strongly to phenotypic divergence and are frequently gained and lost in eutherian genomes. Understanding the mechanisms by which cis-regulatory element turnover generates emergent phenotypes is crucial to our understanding of adaptive evolution. Ancestral reconstruction methods can place species-specific cis-regulatory features in their evolutionary context, thus increasing our understanding of the process of regulatory sequence turnover. However, applying these methods to gain and loss of cis-regulatory features historically required complex workflows, preventing widespread adoption by the broad scientific community. RESULTS: MapGL simplifies phylogenetic inference of the evolutionary history of short genomic sequence features by combining the necessary steps into a single piece of software with a simple set of inputs and outputs. We show that MapGL can reliably disambiguate the mechanisms underlying differential regulatory sequence content across a broad range of phylogenetic topologies and evolutionary distances. Thus, MapGL provides the necessary context to evaluate how genomic sequence gain and loss contribute to species-specific divergence. CONCLUSIONS: MapGL makes phylogenetic inference of species-specific sequence gain and loss easy for both expert and non-expert users, making it a powerful tool for gaining novel insights into genome evolution.


Subject(s)
Molecular Evolution, Genome/genetics, Genomics/methods, Nucleic Acid Regulatory Sequences, Software, Animals, Humans, Mammals/genetics, Phenotype, Phylogeny
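
Note: the gain-versus-loss question in the abstract above can be illustrated with a deliberately simplified rule. This is not MapGL's algorithm, which performs phylogenetic inference over a tree; the sketch below assumes only boolean presence/absence calls for the orthologous sequence in a target species, a query species, and a set of outgroups, and applies a naive parsimony-style argument. All names and labels are hypothetical.

```python
def classify_feature(in_target, in_query, in_outgroups):
    """Label a target-species feature by where its sequence is also found.

    in_target: True if the sequence is present in the target species.
    in_query: True if an orthologous sequence exists in the query species.
    in_outgroups: iterable of booleans, one per outgroup species.

    Simplifying assumption: presence in any outgroup implies the sequence is
    ancestral. A real phylogenetic method would weigh the tree topology.
    """
    if not in_target:
        raise ValueError("feature must be present in the target species")
    if in_query:
        return "ortholog"
    if any(in_outgroups):
        # Ancestral sequence is missing from the query: likely lost there.
        return "loss_in_query"
    # No sign of the sequence outside the target: likely gained there.
    return "gain_in_target"

labels = [
    classify_feature(True, True, [True, False]),
    classify_feature(True, False, [True, False]),
    classify_feature(True, False, [False, False]),
]
```

Distinguishing the last two cases is the point of such an analysis: a feature absent from the query species may be a target-lineage innovation or a query-lineage deletion, and those have different evolutionary interpretations.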
12.
Trends Genet ; 33(1): 34-45, 2017 01.
Article in English | MEDLINE | ID: mdl-27939749

ABSTRACT

One of the formative goals of genetics research is to understand how genetic variation leads to phenotypic differences and human disease. Genome-wide association studies (GWASs) bring us closer to this goal by linking variation with disease faster than ever before. Despite this, GWASs alone are unable to pinpoint disease-causing single nucleotide polymorphisms (SNPs). Noncoding SNPs, which represent the majority of GWAS SNPs, present a particular challenge. To address this challenge, an array of computational tools designed to prioritize and predict the function of noncoding GWAS SNPs have been developed. However, fewer than 40% of GWAS publications from 2015 utilized these tools. We discuss several leading methods for annotating noncoding variants and how they can be integrated into research pipelines in hopes that they will be broadly applied in future GWAS analyses.


Subject(s)
Computational Biology, Genome-Wide Association Study, Single Nucleotide Polymorphism/genetics, Nucleic Acid Regulatory Sequences/genetics, Genetic Predisposition to Disease, Humans, Molecular Sequence Annotation
13.
Nature ; 512(7515): 400-5, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164749

ABSTRACT

Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors and regulatory proteins across multiple stages of Caenorhabditis elegans development by performing 241 ChIP-seq (chromatin immunoprecipitation followed by sequencing) experiments. Integration of regulatory binding and cellular-resolution expression data produced a spatiotemporally resolved metazoan transcription factor binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of transcription factors, characterizing the genomic coverage and clustering of regulatory binding, the binding preferences of, and biological processes regulated by, transcription factors, the global transcription factor co-associations and genomic subdomains that suggest shared patterns of regulation, and identifying key transcription factors and transcription factor co-associations for fate specification of individual lineages and cell types.


Subject(s)
Caenorhabditis elegans/growth & development, Caenorhabditis elegans/genetics, Developmental Gene Expression Regulation/genetics, Helminth Genome/genetics, Spatio-Temporal Analysis, Transcription Factors/metabolism, Animals, Binding Sites, Caenorhabditis elegans/cytology, Caenorhabditis elegans/embryology, Caenorhabditis elegans Proteins/metabolism, Cell Lineage, Chromatin Immunoprecipitation, Genomics, Larva/cytology, Larva/genetics, Larva/growth & development, Larva/metabolism, Protein Binding
14.
Nature ; 515(7527): 371-375, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409826

ABSTRACT

To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.


Subject(s)
Conserved Sequence/genetics, Genome/genetics, Genomics, Nucleic Acid Regulatory Sequences/genetics, Transcription Factors/metabolism, Animals, Cell Line, Chromatin/genetics, Chromatin/metabolism, Genetic Enhancer Elements/genetics, Humans, Mice, Single Nucleotide Polymorphism/genetics
15.
Nature ; 512(7515): 453-6, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164757

ABSTRACT

Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.


Subject(s)
Caenorhabditis elegans/genetics, Drosophila melanogaster/genetics, Molecular Evolution, Gene Expression Regulation/genetics, Gene Regulatory Networks/genetics, Transcription Factors/metabolism, Animals, Binding Sites, Caenorhabditis elegans/growth & development, Chromatin Immunoprecipitation, Conserved Sequence/genetics, Drosophila melanogaster/growth & development, Developmental Gene Expression Regulation/genetics, Genome/genetics, Humans, Molecular Sequence Annotation, Nucleotide Motifs/genetics, Organ Specificity/genetics, Transcription Factors/genetics
16.
Nucleic Acids Res ; 46(4): 1878-1894, 2018 02 28.
Article in English | MEDLINE | ID: mdl-29361190

ABSTRACT

The mouse is widely used as a system to study human genetic mechanisms. However, extensive rewiring of transcriptional regulatory networks often confounds translation of findings between human and mouse. Site-specific gain and loss of individual transcription factor binding sites (TFBS) has caused functional divergence of orthologous regulatory loci, and so we must look beyond this positional conservation to understand common themes of regulatory control. Fortunately, transcription factor co-binding patterns shared across species often perform conserved regulatory functions. These can be compared to 'regulatory sentences' that retain the same meanings regardless of sequence and species context. By analyzing TFBS co-occupancy patterns observed in four human and mouse cell types, we learned a regulatory grammar: the rules by which TFBS are combined into meaningful regulatory sentences. Different parts of this grammar associate with specific sets of functional annotations regardless of sequence conservation and predict functional signatures more accurately than positional conservation. We further show that both species-specific and conserved portions of this grammar are involved in gene expression divergence and human disease risk. These findings expand our understanding of transcriptional regulatory mechanisms, suggesting that phenotypic divergence and disease risk are driven by a complex interplay between deeply conserved and species-specific transcriptional regulatory pathways.


Subject(s)
Gene Expression Regulation, Mice/genetics, Transcription Factors/metabolism, Animals, Base Sequence, Binding Sites, Chromatin, Conserved Sequence, Disease/genetics, Molecular Evolution, Genetic Loci, Humans, Immune System, Single Nucleotide Polymorphism, Species Specificity
17.
Hum Mutat ; 40(9): 1292-1298, 2019 09.
Article in English | MEDLINE | ID: mdl-31228310

ABSTRACT

Here we present a computational model, Score of Unified Regulatory Features (SURF), that predicts functional variants in enhancer and promoter elements. SURF is trained on data from massively parallel reporter assays and predicts the effect of variants on reporter expression levels. It achieved the top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" challenge. We also show that features queried through RegulomeDB, which are direct annotations from functional genomics data, help improve prediction accuracy beyond transfer learning features from DNA sequence-based deep learning models. Some of the most important features include DNase footprints, especially when coupled with complementary ChIP-seq data. Furthermore, we found our model achieved good performance in predicting allele-specific transcription factor binding events. As an extension to the current scoring system in RegulomeDB, we expect our computational model to prioritize variants in regulatory regions, thus helping the understanding of functional variants in noncoding regions that lead to disease.


Subject(s)
Genetic Enhancer Elements, Genetic Variation, Genomics/methods, Genetic Promoter Regions, Deep Learning, Genetic Predisposition to Disease, Human Genome, Humans, Genetic Models, DNA Sequence Analysis/methods
18.
Hum Mutat ; 40(9): 1280-1291, 2019 09.
Article in English | MEDLINE | ID: mdl-31106481

ABSTRACT

The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.


Subject(s)
DNA/chemistry, Epigenomics/methods, Point Mutation, Binding Sites, Cell Line, Chromatin/genetics, DNA/metabolism, Genetic Enhancer Elements, Genetic Predisposition to Disease, Humans, Machine Learning, Genetic Promoter Regions, Transcription Factors/metabolism
19.
Trends Genet ; 32(4): 238-249, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26962025

ABSTRACT

The ENCODE project represents a major leap from merely describing and comparing genomic sequences to surveying them for direct indicators of function. The astounding quantity of data produced by the ENCODE consortium can serve as a map to locate specific landmarks, guide hypothesis generation, and lead us to principles and mechanisms underlying genome biology. Despite its broad appeal, the size and complexity of the repository can be intimidating to prospective users. We present here some background about the ENCODE data, survey the resources available for accessing them, and describe a few simple principles to help prospective users choose the data type(s) that best suit their needs, where to get them, and how to use them to their best advantage.


Subject(s)
Genomics, Genetic Databases, Humans, Internet, Single Nucleotide Polymorphism
20.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.


Subject(s)
DNA/genetics, Encyclopedias as Topic, Gene Regulatory Networks/genetics, Human Genome/genetics, Molecular Sequence Annotation, Nucleic Acid Regulatory Sequences/genetics, Transcription Factors/metabolism, Alleles, Cell Line, GATA1 Transcription Factor/metabolism, Gene Expression Profiling, Genomics, Humans, K562 Cells, Organ Specificity, Phosphorylation/genetics, Single Nucleotide Polymorphism/genetics, Protein Interaction Maps, Untranslated RNA/genetics, Untranslated RNA/metabolism, Genetic Selection/genetics, Transcription Initiation Site