1.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
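
The feed-forward loops mentioned in the abstract above are a standard network motif: a regulator A targets B, B targets C, and A also targets C directly. The following Python sketch enumerates such triples in a small directed graph. The function name and the toy edge list are hypothetical and are not taken from the study.

```python
from itertools import permutations


def feed_forward_loops(edges):
    """Return all (a, b, c) triples where a->b, b->c and a->c are edges."""
    edge_set = set(edges)
    # Sort the nodes so the output order is deterministic.
    nodes = sorted({node for edge in edge_set for node in edge})
    return [
        (a, b, c)
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    ]


if __name__ == "__main__":
    # Toy regulatory graph (hypothetical): A regulates B and C, B regulates C.
    toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
    print(feed_forward_loops(toy_edges))
```

Testing whether a motif is enriched would additionally require comparing the observed count with counts from randomized networks, which this sketch does not attempt.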


Subject(s)
DNA/genetics, Encyclopedias as Topic, Gene Regulatory Networks/genetics, Human Genome/genetics, Molecular Sequence Annotation, Nucleic Acid Regulatory Sequences/genetics, Transcription Factors/metabolism, Alleles, Cell Line, GATA1 Transcription Factor/metabolism, Gene Expression Profiling, Genomics, Humans, K562 Cells, Organ Specificity, Phosphorylation/genetics, Single Nucleotide Polymorphism/genetics, Protein Interaction Maps, Untranslated RNA/genetics, Untranslated RNA/metabolism, Genetic Selection/genetics, Transcription Initiation Site
2.
Genome Res ; 22(9): 1658-67, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955978

ABSTRACT

Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.


Subject(s)
Gene Expression Regulation, Genomics, Transcription Factors/metabolism, Genetic Transcription, Base Composition, Binding Sites/genetics, Cell Line, Chromatin/genetics, Chromatin/metabolism, Computational Biology/methods, Histones/genetics, Humans, Biological Models, Genetic Promoter Regions, Protein Binding/genetics, Transcription Initiation Site
3.
Nucleic Acids Res ; 40(Database issue): D687-94, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22009677

ABSTRACT

About one-fifth of the genes in the budding yeast are essential for haploid viability and cannot be functionally assessed using standard genetic approaches such as gene deletion. To facilitate genetic analysis of essential genes, we and others have assembled collections of yeast strains expressing temperature-sensitive (ts) alleles of essential genes. To explore the phenotypes caused by essential gene mutation we used a panel of genetically engineered fluorescent markers to explore the morphology of cells in the ts strain collection using high-throughput microscopy. Here, we describe the design and implementation of an online database, PhenoM (Phenomics of yeast Mutants), for storing, retrieving, visualizing and data mining the quantitative single-cell measurements extracted from micrographs of the ts mutant cells. PhenoM allows users to rapidly search and retrieve raw images and their quantified morphological data for genes of interest. The database also provides several data-mining tools, including a PhenoBlast module for phenotypic comparison between mutant strains and a Gene Ontology module for functional enrichment analysis of gene sets showing similar morphological alterations. The current PhenoM version 1.0 contains 78,194 morphological images and 1,909,914 cells covering six subcellular compartments or structures for 775 ts alleles spanning 491 essential genes. PhenoM is freely available at http://phenom.ccbr.utoronto.ca/.


Subject(s)
Genetic Databases, Essential Genes, Fungal Genes, Mutation, Phenotype, Saccharomyces cerevisiae/genetics, Data Mining, Saccharomyces cerevisiae/cytology
4.
Proc Natl Acad Sci U S A ; 107(23): 10472-7, 2010 Jun 08.
Article in English | MEDLINE | ID: mdl-20489180

ABSTRACT

Gene regulation is a multistep process involving stochastic biochemical reactions, which leads to expression noise, i.e., the cell-to-cell stochastic fluctuation in protein abundance. Such expression noise can give rise to drastically diverse phenotypes, even within isogenic cell populations. Although numerous biophysical approaches have been proposed to model the origin and propagation of expression noise in biological networks, these models essentially characterize the innate stochastic dynamics in gene regulation in a mechanistic way. In this work, by investigating expression noise in the context of yeast cellular networks, we place the biophysical formalism onto solid genetic ground. At the sequence level, we show that extremely noisy genes are highly conserved in their coding sequences. At the level of cellular networks, where natural selection is manifested by topological constraints, we show that genes with varying expression noise are modularly organized in the protein interaction network and are positioned in an orderly manner in the gene regulatory network. We demonstrate that these topological constraints are highly predictive of stochastic gene expression, which allowed us to confidently predict stochastic expression for more than 2,000 yeast genes whose expression noise was previously not known. We validated the predictions by high-content cell imaging. Our approach makes genome-wide prediction of stochastic gene expression feasible, and such predictability in turn suggests that expression noise is an evolvable genetic trait.


Subject(s)
Fungal Gene Expression Regulation, Fungal Genome, Saccharomyces cerevisiae/genetics, DNA Sequence Analysis/methods, Gene Regulatory Networks
5.
Bioinformatics ; 27(23): 3221-7, 2011 Dec 01.
Article in English | MEDLINE | ID: mdl-22039215

ABSTRACT

MOTIVATION: ChIP-seq and ChIP-chip experiments have been widely used to identify transcription factor (TF) binding sites and target genes. Conventionally, a fairly 'simple' approach is employed for target gene identification e.g. finding genes with binding sites within 2 kb of a transcription start site (TSS). However, this does not take into account the number of sites upstream of the TSS, their exact positioning or the fact that different TFs appear to act at different characteristic distances from the TSS. RESULTS: Here we propose a probabilistic model called target identification from profiles (TIP) that quantitatively measures the regulatory relationships between TFs and target genes. For each TF, our model builds a characteristic, averaged profile of binding around the TSS and then uses this to weight the sites associated with a given gene, providing a continuous-valued 'regulatory' score relating each TF and potential target. Moreover, the score can readily be turned into a ranked list of target genes and an estimate of significance, which is useful for case-dependent downstream analysis. CONCLUSION: We show the advantages of TIP by comparing it to the 'simple' approach on several representative datasets, using motif occurrence and relationship to knock-out experiments as metrics of validation. Moreover, we show that the probabilistic model is not as sensitive to various experimental parameters (including sequencing depth and peak-calling method) as the simple approach; in fact, the lesser dependence on sequencing depth potentially utilizes the result of a ChIP-seq experiment in a more 'cost-effective' manner. CONTACT: mark.gerstein@yale.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
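
As a rough illustration of the scoring idea described in the abstract above (not the authors' implementation), the following Python sketch builds a position-wise average binding profile around the TSS, uses it to weight each gene's binding signal, and converts the weighted sums to z-scores for ranking. The function names, the toy signal values, and the z-score normalization are assumptions made for this example.

```python
from statistics import mean, pstdev


def average_profile(signals):
    """Average binding signal per position across genes, scaled to sum to 1."""
    profiles = list(signals.values())
    n_positions = len(profiles[0])
    avg = [mean(p[i] for p in profiles) for i in range(n_positions)]
    total = sum(avg)
    if total == 0:
        raise ValueError("no binding signal in any gene")
    return [a / total for a in avg]


def regulatory_scores(signals):
    """Weight each gene's signal by the average profile, then z-score."""
    weights = average_profile(signals)
    raw = {
        gene: sum(w * s for w, s in zip(weights, profile))
        for gene, profile in signals.items()
    }
    values = list(raw.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return {gene: 0.0 for gene in raw}
    return {gene: (r - mu) / sd for gene, r in raw.items()}


def rank_targets(signals):
    """Genes ordered from highest to lowest regulatory score."""
    scores = regulatory_scores(signals)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical binding signal in five bins centred on each gene's TSS.
    toy_signals = {
        "geneA": [0, 1, 8, 1, 0],
        "geneB": [0, 0, 1, 0, 0],
        "geneC": [5, 0, 0, 0, 0],
    }
    print(rank_targets(toy_signals))  # ['geneA', 'geneC', 'geneB']
```

In this toy input, geneC has more total signal than geneB but it lies far from the positions where the factor typically binds, so the profile weighting narrows the gap between them compared with a raw signal count.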


Subject(s)
Statistical Models, Transcription Factors/metabolism, Amino Acid Motifs, Animals, Binding Sites, Chromatin Immunoprecipitation, Estrogen Receptor alpha/metabolism, Gene Expression Regulation, Mice, Oligonucleotide Array Sequence Analysis, Protein Binding, STAT4 Transcription Factor/metabolism, DNA Sequence Analysis, Transcription Factors/chemistry, Transcription Factors/genetics, Transcription Initiation Site
6.
PLoS Comput Biol ; 6(8), 2010 Aug 26.
Article in English | MEDLINE | ID: mdl-20865155

ABSTRACT

Variations in gene expression level might lead to phenotypic diversity across individuals or populations. Although many human genes are found to have differential mRNA levels between populations, the extent to which gene expression varies within and between populations remains largely elusive. To investigate the dynamic range of gene expression, we analyzed the expression variability of ∼18,000 human genes across individuals within HapMap populations. Although ∼20% of human genes show differentiated mRNA levels between populations, our results show that the expression variability of most human genes in one population does not deviate significantly from that in another population, except for a small fraction that show substantially higher expression variability in a particular population. By associating expression variability with sequence polymorphism, we found, intriguingly, that SNPs in the untranslated regions (5' and 3' UTRs) of these variable genes show consistently elevated population heterozygosity. We performed differential expression analysis on a genome-wide scale, and found substantially reduced expression variability for a large number of genes, prohibiting them from being differentially expressed between populations. Functional analysis revealed that genes with the greatest within-population expression variability are significantly enriched for chemokine signaling in HIV-1 infection, and for HIV-interacting proteins that control viral entry, replication, and propagation. This observation, combined with the finding that known human HIV host factors show substantially elevated expression variability, suggests that gene expression variability might explain differential HIV susceptibility across individuals.


Subject(s)
Gene Expression Profiling, Genetic Predisposition to Disease, Genetic Variation, HIV Infections/genetics, HIV-1, Genetic Models, Chemokines/genetics, Female, Heterozygote, Humans, Male, Single Nucleotide Polymorphism, Population/genetics, Untranslated Regions/genetics, Virus Internalization, Virus Replication/genetics
7.
Neural Netw ; 93: 219-229, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28668660

ABSTRACT

Stochastic Gradient Descent (SGD) updates a Convolutional Neural Network (CNN) with a noisy gradient computed from a random batch, and each batch updates the network exactly once per epoch. This model applies the same training effort to each batch, but it overlooks the fact that the gradient variance, induced by Sampling Bias and Intrinsic Image Difference, produces different training dynamics on different batches. In this paper, we develop a new training strategy for SGD, referred to as Inconsistent Stochastic Gradient Descent (ISGD), to address this problem. The core concept of ISGD is inconsistent training, which dynamically adjusts the training effort with respect to the loss. ISGD models training as a stochastic process that gradually reduces the mean batch loss, and it uses a dynamic upper control limit to identify a large-loss batch on the fly. ISGD stays on the identified batch to accelerate training with additional gradient updates, and it also has a constraint to penalize drastic parameter changes. ISGD is straightforward and computationally efficient, and it requires no auxiliary memory. A series of empirical evaluations on real-world datasets and networks demonstrates the promising performance of inconsistent training.
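
The control-limit idea in the abstract above can be sketched on a toy one-parameter regression. This is a minimal illustration, assuming the upper control limit is the running mean of past batch losses plus k standard deviations; the model, the function names, and the parameters `k` and `max_extra` are hypothetical, and the constraint that penalizes drastic parameter changes is omitted.

```python
from statistics import mean, pstdev


def upper_control_limit(losses, k=3.0):
    """Mean of past batch losses plus k population standard deviations."""
    return mean(losses) + k * pstdev(losses)


def batch_loss(w, batch):
    """Mean squared error of the toy model y = w * x on one batch."""
    return mean((w * x - y) ** 2 for x, y in batch)


def batch_grad(w, batch):
    """Gradient of the mean squared error with respect to w."""
    return mean(2 * (w * x - y) * x for x, y in batch)


def train(batches, epochs=5, lr=0.05, k=3.0, max_extra=3):
    """SGD that spends extra updates on batches whose loss exceeds the limit."""
    w, history, extra_updates = 0.0, [], 0
    for _ in range(epochs):
        for batch in batches:
            history.append(batch_loss(w, batch))
            w -= lr * batch_grad(w, batch)  # the regular SGD step
            extra = 0
            # Stay on a large-loss batch for a bounded number of extra steps.
            while (
                len(history) >= 3
                and extra < max_extra
                and batch_loss(w, batch) > upper_control_limit(history[:-1], k)
            ):
                w -= lr * batch_grad(w, batch)
                extra += 1
            extra_updates += extra
    return w, extra_updates


if __name__ == "__main__":
    # Toy data generated from y = 2x, split into three uneven batches.
    toy_batches = [[(1, 2), (2, 4)], [(3, 6)], [(1, 2)]]
    print(train(toy_batches))
```

Capping the extra steps with `max_extra` is a design choice of this sketch that keeps a single outlier batch from dominating an epoch.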


Subject(s)
Neural Networks (Computer), Stochastic Processes
8.
Nat Biotechnol ; 29(4): 361-7, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21441928

ABSTRACT

Conditional temperature-sensitive (ts) mutations are valuable reagents for studying essential genes in the yeast Saccharomyces cerevisiae. We constructed 787 ts strains, covering 497 (∼45%) of the 1,101 essential yeast genes, with ∼30% of the genes represented by multiple alleles. All of the alleles are integrated into their native genomic locus in the S288C common reference strain and are linked to a kanMX selectable marker, allowing further genetic manipulation by synthetic genetic array (SGA)-based, high-throughput methods. We show two such manipulations: barcoding of 440 strains, which enables chemical-genetic suppression analysis, and the construction of arrays of strains carrying different fluorescent markers of subcellular structure, which enables quantitative analysis of phenotypes using high-content screening. Quantitative analysis of a GFP-tubulin marker identified roles for cohesin and condensin genes in spindle disassembly. This mutant collection should facilitate a wide range of systematic studies aimed at understanding the functions of essential genes.


Subject(s)
Essential Genes, Fungal Genome, Saccharomyces cerevisiae/genetics, Temperature, Alleles, Genetic Databases, Fungal Genes, Lethal Genes, Genetic Engineering/methods, Genetic Loci, Mass Spectrometry/methods, Microarray Analysis/methods, Confocal Microscopy, Mutation, Phenotype, Plasmids, Messenger RNA, Saccharomyces cerevisiae/growth & development, Single-Cell Analysis, Tubulin/analysis
9.
J Bioinform Comput Biol ; 7(6): 955-72, 2009 Dec.
Article in English | MEDLINE | ID: mdl-20014473

ABSTRACT

Due to the difficulties in identifying microRNA (miRNA) targets experimentally in a high-throughput manner, several computational approaches have been proposed. To this date, most leading algorithms are based on sequence information alone. However, there has been limited overlap between these predictions, implying high false-positive rates, which underlines the limitation of sequence-based approaches. Considering the repressive nature of miRNAs at the mRNA translational level, here we describe a probabilistic model to make predictions by combining sequence complementarity, miRNA expression level, and protein abundance. Our underlying assumption is that, given sequence complementarity between a miRNA and its putative mRNA targets, the miRNA expression level should be high and the protein abundance of the mRNA should be low. Having identified a set of confident predictions, we then built a second probabilistic model to trace back to the mRNA expression of the confident targets to investigate the mechanisms of the miRNA-mediated post-transcriptional regulation. Our results suggest that translational repression (which has no effect on mRNA level), instead of mRNA degradation, is the dominant mechanism in miRNA regulation. This observation explained the previously observed discordant correlation between mRNA expression and protein abundance.


Subject(s)
Algorithms, Gene Targeting/methods, MicroRNAs/genetics, Genetic Models, Statistical Models, Proteome/genetics, RNA Sequence Analysis/methods, Base Sequence, Computer Simulation, Molecular Sequence Data
10.
J Comput Biol ; 16(3): 457-74, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19254184

ABSTRACT

Biological sequence classification (such as protein remote homology detection) based solely on sequence data is an important problem in computational biology, especially in the current genomics era, when large amounts of sequence data are becoming available. Support vector machines (SVMs) based on mismatch string kernels were previously applied to this problem with reasonable success. However, they still perform poorly on difficult protein families. In this paper, we propose two approaches to the protein remote homology detection problem: one uses a convex combination of random-walk kernels to approximate the random-walk kernel with the optimal number of random steps, and the other constructs an empirical-map kernel using a profile kernel. Both resulting kernels make use of a large amount of pairwise sequence similarity information and unlabeled data, and have much better prediction performance than the best profile kernel directly derived from protein sequences. On a competitive Structural Classification Of Proteins (SCOP) benchmark dataset, the overall mean ROC(50) scores that we obtained on 54 protein families using both approaches are above 0.90, which significantly outperforms previously published results.
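
To make the first approach in the abstract above more concrete, the sketch below forms a convex combination of t-step random-walk matrices built from a pairwise similarity matrix. It assumes that the t-step walk is the t-th power of the row-normalized similarity matrix, which may differ from the kernel definition used in the paper; the function names, the toy matrix, and the fixed weights are hypothetical (the paper's weights are presumably chosen by an optimization that is not reproduced here).

```python
def row_normalize(sim):
    """Turn a non-negative similarity matrix into a transition matrix."""
    return [[value / sum(row) for value in row] for row in sim]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def combined_random_walk(sim, weights):
    """Return sum_t weights[t-1] * P**t, where P is the row-normalized sim."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    p = row_normalize(sim)
    n = len(p)
    power = p  # holds P**t for the current number of steps t
    combined = [[0.0] * n for _ in range(n)]
    for step, w in enumerate(weights):
        if step > 0:
            power = matmul(power, p)
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * power[i][j]
    return combined


if __name__ == "__main__":
    # Toy similarities: sequences 0 and 2 are related only through sequence 1.
    toy_sim = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
    for row in combined_random_walk(toy_sim, [0.5, 0.3, 0.2]):
        print(row)
```

In the toy matrix, sequences 0 and 2 have zero direct similarity, yet the combined matrix gives them a positive entry because multi-step walks pass through sequence 1, which is the intuition behind using random walks for remote homology.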


Subject(s)
Computational Biology/methods, Proteins/chemistry, Proteins/classification, Amino Acid Sequence, Tertiary Protein Structure, Protein Sequence Analysis, Amino Acid Sequence Homology