Results 1 - 20 of 86
1.
Nat Commun ; 15(1): 4110, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38750024

ABSTRACT

Maturation of eukaryotic pre-mRNAs via splicing and polyadenylation is modulated across cell types and conditions by a variety of RNA-binding proteins (RBPs). Although there exist over 1,500 RBPs in human cells, their binding motifs and functions still remain to be elucidated, especially in the complex environment of tissues and in the context of diseases. To overcome the lack of methods for the systematic and automated detection of sequence motif-guided pre-mRNA processing regulation from RNA sequencing (RNA-Seq) data, we have developed MAPP (Motif Activity on Pre-mRNA Processing). Applying MAPP to RBP knock-down experiments reveals that many RBPs regulate both splicing and polyadenylation of nascent transcripts by acting on similar sequence motifs. MAPP not only infers these sequence motifs, but also unravels the position-dependent impact of the RBPs on pre-mRNA processing. Interestingly, all investigated RBPs that act on both splicing and 3' end processing exhibit a consistently repressive or activating effect on both processes, providing a first glimpse of the underlying mechanism. Applying MAPP to normal and malignant brain tissue samples unveils that the motifs bound by the PTBP1 and RBFOX RBPs coordinately drive the oncogenic splicing program active in glioblastomas, demonstrating that MAPP paves the way for characterizing pre-mRNA processing regulators under physiological and pathological conditions.


Subject(s)
Polyadenylation , RNA Precursors , RNA Splicing , RNA-Binding Proteins , Humans , RNA-Binding Proteins/metabolism , RNA-Binding Proteins/genetics , RNA Precursors/metabolism , RNA Precursors/genetics , Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Neoplasms/metabolism , Nucleotide Motifs , Polypyrimidine Tract-Binding Protein/metabolism , Polypyrimidine Tract-Binding Protein/genetics , RNA Splicing Factors/metabolism , RNA Splicing Factors/genetics , Heterogeneous-Nuclear Ribonucleoproteins/metabolism , Heterogeneous-Nuclear Ribonucleoproteins/genetics , RNA, Messenger/metabolism , RNA, Messenger/genetics
3.
Genome Biol ; 24(1): 77, 2023 04 17.
Article in English | MEDLINE | ID: mdl-37069586

ABSTRACT

We present RCRUNCH, an end-to-end solution to CLIP data analysis for identification of binding sites and sequence specificity of RNA-binding proteins. RCRUNCH can analyze not only reads that map uniquely to the genome but also those that map to multiple genome locations or across splice boundaries and can consider various types of background in the estimation of read enrichment. By applying RCRUNCH to the eCLIP data from the ENCODE project, we have constructed a comprehensive and homogeneous resource of in-vivo-bound RBP sequence motifs. RCRUNCH automates the reproducible analysis of CLIP data, enabling studies of post-transcriptional control of gene expression.


Subject(s)
RNA-Binding Proteins , RNA , RNA/metabolism , Sequence Analysis, RNA , Binding Sites/genetics , Protein Binding , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
4.
Proc Natl Acad Sci U S A ; 120(8): e2211091120, 2023 02 21.
Article in English | MEDLINE | ID: mdl-36780518

ABSTRACT

Microbes in the wild face highly variable and unpredictable environments and are naturally selected for their average growth rate across environments. Apart from using sensory regulatory systems to adapt in a targeted manner to changing environments, microbes employ bet-hedging strategies where cells in an isogenic population switch stochastically between alternative phenotypes. Yet, bet-hedging suffers from a fundamental trade-off: Increasing the phenotype-switching rate increases the rate at which maladapted cells explore alternative phenotypes but also increases the rate at which cells switch out of a well-adapted state. Consequently, it is currently believed that bet-hedging strategies are effective only when the number of possible phenotypes is limited and when environments last for sufficiently many generations. However, recent experimental results show that gene expression noise generally decreases with growth rate, suggesting that phenotype-switching rates may systematically decrease with growth rate. Such growth rate dependent stability (GRDS) causes cells to be more explorative when maladapted and more phenotypically stable when well-adapted, and we show that GRDS can almost completely overcome the trade-off that limits bet-hedging, allowing for effective adaptation even when environments are diverse and change rapidly. We further show that even a small decrease in switching rates of faster-growing phenotypes can substantially increase long-term fitness of bet-hedging strategies. Together, our results suggest that stochastic strategies may play an even bigger role for microbial adaptation than hitherto appreciated.


Subject(s)
Acclimatization , Biological Evolution , Phenotype , Adaptation, Physiological/genetics
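
Illustrative note on the bet-hedging abstract above: the cost side of the trade-off can be sketched with a two-phenotype model in a single fixed environment. The model, the function name `long_term_growth_rate`, and all rate values below are assumptions chosen for illustration; this is not the authors' model. The long-term growth rate is taken as the leading eigenvalue of the 2x2 growth-and-switching matrix, and a lower switching rate for the faster-growing phenotype stands in for growth rate dependent stability (GRDS).

```python
import math


def long_term_growth_rate(g_fast, g_slow, s_fast, s_slow):
    """Leading eigenvalue of a 2x2 growth-and-switching matrix.

    Hypothetical toy model (not taken from the paper):
        dN_fast/dt = (g_fast - s_fast) * N_fast + s_slow * N_slow
        dN_slow/dt = s_fast * N_fast + (g_slow - s_slow) * N_slow
    g_* are growth rates, s_* are rates of switching OUT of each phenotype.
    """
    a = g_fast - s_fast
    d = g_slow - s_slow
    b = s_slow
    c = s_fast
    trace = a + d
    det = a * d - b * c
    # The discriminant equals (a - d)**2 + 4*b*c, so it is never negative.
    return 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))


# Same exploration rate out of the maladapted state (0.1) in both cases.
constant_switching = long_term_growth_rate(1.0, 0.0, s_fast=0.1, s_slow=0.1)
grds_like = long_term_growth_rate(1.0, 0.0, s_fast=0.01, s_slow=0.1)
print(constant_switching, grds_like)
```

In this toy setting the GRDS-like population loses less growth to cells switching out of the well-adapted state, while maladapted cells explore at the same rate in both cases. The sketch does not model changing environments.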
5.
EMBO J ; 41(24): e111132, 2022 12 15.
Article in English | MEDLINE | ID: mdl-36345783

ABSTRACT

The cerebral cortex contains billions of neurons, and their disorganization or misspecification leads to neurodevelopmental disorders. Understanding how the plethora of projection neuron subtypes are generated by cortical neural stem cells (NSCs) is a major challenge. Here, we focused on elucidating the transcriptional landscape of murine embryonic NSCs, basal progenitors (BPs), and newborn neurons (NBNs) throughout cortical development. We uncover dynamic shifts in transcriptional space over time and heterogeneity within each progenitor population. We identified signature hallmarks of NSC, BP, and NBN clusters and predict active transcriptional nodes and networks that contribute to neural fate specification. We find that the expression of receptors, ligands, and downstream pathway components is highly dynamic over time and throughout the lineage implying differential responsiveness to signals. Thus, we provide an expansive compendium of gene expression during cortical development that will be an invaluable resource for studying neural developmental processes and neurodevelopmental disorders.


Subject(s)
Neural Stem Cells , Neurons , Animals , Mice , Cell Differentiation , Cell Lineage/genetics , Cerebral Cortex , Embryonic Stem Cells , Neurogenesis/genetics , Neurons/metabolism
6.
Nat Genet ; 54(7): 1037-1050, 2022 07.
Article in English | MEDLINE | ID: mdl-35789323

ABSTRACT

Zebrafish, a popular organism for studying embryonic development and for modeling human diseases, has so far lacked a systematic functional annotation program akin to those in other animal models. To address this, we formed the international DANIO-CODE consortium and created a central repository to store and process zebrafish developmental functional genomic data. Our data coordination center ( https://danio-code.zfin.org ) combines a total of 1,802 sets of unpublished and re-analyzed published genomic data, which we used to improve existing annotations and show its utility in experimental design. We identified over 140,000 cis-regulatory elements throughout development, including classes with distinct features dependent on their activity in time and space. We delineated the distinct distance topology and chromatin features between regulatory elements active during zygotic genome activation and those active during organogenesis. Finally, we matched regulatory elements and epigenomic landscapes between zebrafish and mouse and predicted functional relationships between them beyond sequence similarity, thus extending the utility of zebrafish developmental genomics to mammals.


Subject(s)
Databases, Genetic , Gene Expression Regulation, Developmental , Genome , Genomics , Regulatory Sequences, Nucleic Acid , Zebrafish Proteins , Zebrafish , Animals , Chromatin/genetics , Genome/genetics , Humans , Mice , Molecular Sequence Annotation , Organogenesis/genetics , Regulatory Sequences, Nucleic Acid/genetics , Zebrafish/embryology , Zebrafish/genetics , Zebrafish Proteins/genetics
7.
PLoS Biol ; 19(12): e3001491, 2021 12.
Article in English | MEDLINE | ID: mdl-34919538

ABSTRACT

Although it is well appreciated that gene expression is inherently noisy and that transcriptional noise is encoded in a promoter's sequence, little is known about the extent to which noise levels of individual promoters vary across growth conditions. Using flow cytometry, we here quantify transcriptional noise in Escherichia coli genome-wide across 8 growth conditions and find that noise levels systematically decrease with growth rate, with a condition-dependent lower bound on noise. Whereas constitutive promoters consistently exhibit low noise in all conditions, regulated promoters are both more noisy on average and more variable in noise across conditions. Moreover, individual promoters show highly distinct variation in noise across conditions. We show that a simple model of noise propagation from regulators to their targets can explain a significant fraction of the variation in relative noise levels and identifies TFs that most contribute to both condition-specific and condition-independent noise propagation. In addition, analysis of the genome-wide correlation structure of various gene properties shows that gene regulation, expression noise, and noise plasticity are all positively correlated genome-wide and vary independently of variations in absolute expression, codon bias, and evolutionary rate. Together, our results show that while absolute expression noise tends to decrease with growth rate, relative noise levels of genes are highly condition-dependent and determined by the propagation of noise through the gene regulatory network.


Subject(s)
Escherichia coli/genetics , Gene Expression Regulation, Bacterial/genetics , Promoter Regions, Genetic/genetics , Escherichia coli Proteins/genetics , Gene Expression/genetics , Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Genes, Reporter/genetics , Transcriptome/genetics
9.
Nat Biotechnol ; 39(8): 1008-1016, 2021 08.
Article in English | MEDLINE | ID: mdl-33927416

ABSTRACT

Despite substantial progress in single-cell RNA-seq (scRNA-seq) data analysis methods, there is still little agreement on how to best normalize such data. Starting from the basic requirements that inferred expression states should correct for both biological and measurement sampling noise and that changes in expression should be measured in terms of fold changes, we here derive a Bayesian normalization procedure called Sanity (SAmpling-Noise-corrected Inference of Transcription activitY) from first principles. Sanity estimates expression values and associated error bars directly from raw unique molecular identifier (UMI) counts without any tunable parameters. Using simulated and real scRNA-seq datasets, we show that Sanity outperforms other normalization methods on downstream tasks, such as finding nearest-neighbor cells and clustering cells into subtypes. Moreover, we show that by systematically overestimating the expression variability of genes with low expression and by introducing spurious correlations through mapping the data to a lower-dimensional representation, other methods yield severely distorted pictures of the data.


Subject(s)
RNA-Seq/methods , Single-Cell Analysis/methods , Transcriptome/genetics , Animals , Bayes Theorem , Cells, Cultured , Cluster Analysis , Databases, Genetic , Humans , Mice , Models, Statistical
10.
Elife ; 10, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33416498

ABSTRACT

Although recombination is accepted to be common in bacteria, for many species robust phylogenies with well-resolved branches can be reconstructed from whole genome alignments of strains, and these are generally interpreted to reflect clonal relationships. Using new methods based on the statistics of single-nucleotide polymorphism (SNP) splits, we show that this interpretation is incorrect. For many species, each locus has recombined many times along its line of descent, and instead of many loci supporting a common phylogeny, the phylogeny changes many thousands of times along the genome alignment. Analysis of the patterns of allele sharing among strains shows that bacterial populations cannot be approximated as either clonal or freely recombining but are structured such that recombination rates between lineages vary over several orders of magnitude, with a unique pattern of rates for each lineage. Thus, rather than reflecting clonal ancestry, whole genome phylogenies reflect distributions of recombination rates.


Subject(s)
Bacteria/genetics , Genome, Bacterial , Phylogeny , Recombination, Genetic , Bacillus subtilis/classification , Bacillus subtilis/genetics , Bacteria/classification , Escherichia coli/classification , Escherichia coli/genetics , Evolution, Molecular , Helicobacter pylori/classification , Helicobacter pylori/genetics , Mycobacterium tuberculosis/classification , Mycobacterium tuberculosis/genetics , Polymorphism, Single Nucleotide , Salmonella enterica/classification , Salmonella enterica/genetics , Sequence Analysis, DNA , Staphylococcus aureus/classification , Staphylococcus aureus/genetics , Whole Genome Sequencing
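
Illustrative note on the SNP-split abstract above: a biallelic SNP divides the strains into two groups (a split), and two splits can sit on the same tree only if at least one of the four pairwise intersections of their sides is empty (the standard four-gamete compatibility test). The sketch below shows only that generic test; the function names and toy alignment columns are assumptions, and the paper's own statistics are not reproduced here.

```python
def snp_split(column):
    """Return the set of strain indices carrying one allele of a biallelic
    SNP column, or None if the column is not biallelic."""
    alleles = sorted(set(column))
    if len(alleles) != 2:
        return None
    return frozenset(i for i, base in enumerate(column) if base == alleles[0])


def compatible(split_a, split_b, n_strains):
    """Four-gamete test: two splits fit a single tree if and only if at
    least one intersection between a side of A and a side of B is empty."""
    everyone = frozenset(range(n_strains))
    sides_a = (split_a, everyone - split_a)
    sides_b = (split_b, everyone - split_b)
    return any(not (x & y) for x in sides_a for y in sides_b)


# Toy alignment columns over four strains (hypothetical data).
col1, col2, col3 = "AAGG", "AGAG", "AAAG"
print(compatible(snp_split(col1), snp_split(col2), 4))  # conflicting splits
print(compatible(snp_split(col1), snp_split(col3), 4))  # nested splits
```

Pairs of SNPs that fail this test cannot both be explained by one phylogeny without recombination or repeated mutation, which is why counting such conflicts along an alignment is informative about recombination.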
11.
PLoS Biol ; 18(12): e3000952, 2020 12.
Article in English | MEDLINE | ID: mdl-33270631

ABSTRACT

Populations of bacteria often undergo a lag in growth when switching conditions. Because growth lags can be large compared to typical doubling times, variations in growth lag are an important but often overlooked component of bacterial fitness in fluctuating environments. We here explore how growth lag variation is determined for the archetypical switch from glucose to lactose as a carbon source in Escherichia coli. First, we show that single-cell lags are bimodally distributed and controlled by a single-molecule trigger. That is, gene expression noise causes the population before the switch to divide into subpopulations with zero and nonzero lac operon expression. While "sensorless" cells with zero preexisting lac expression at the switch have long lags because they are unable to sense the lactose signal, any nonzero lac operon expression suffices to ensure a short lag. Second, we show that the growth lag at the population level depends crucially on the fraction of sensorless cells and that this fraction in turn depends sensitively on the growth condition before the switch. Consequently, even small changes in basal expression can significantly affect the fraction of sensorless cells, and thereby population lags and fitness under switching conditions, and may thus be subject to significant natural selection. Indeed, we show that condition-dependent population lags vary across wild E. coli isolates. Since many sensory genes are naturally expressed at low levels in conditions where their inducer is not present, bimodal responses due to subpopulations of sensorless cells may be a general mechanism inducing phenotypic heterogeneity and controlling population lags in switching environments. This mechanism also illustrates how gene expression noise can turn even a simple sensory gene circuit into a bet-hedging module and underlines the profound role of gene expression noise in regulatory responses.


Subject(s)
Escherichia coli/metabolism , Gene Expression Regulation, Bacterial/genetics , Genetic Fitness/physiology , Bacteria/genetics , Bacteria/metabolism , Environment , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Gene Expression Regulation, Bacterial/physiology , Gene Regulatory Networks/genetics , Gene-Environment Interaction , Genetic Fitness/genetics , Glucose/metabolism , Lac Operon , Lactose/metabolism , Phenotype
12.
PLoS One ; 15(10): e0240233, 2020.
Article in English | MEDLINE | ID: mdl-33045012

ABSTRACT

Fluorescence flow cytometry is increasingly being used to quantify single-cell expression distributions in bacteria in high-throughput. However, there has been no systematic investigation into the best practices for quantitative analysis of such data, what systematic biases exist, and what accuracy and sensitivity can be obtained. We investigate these issues by measuring the same E. coli strains carrying fluorescent reporters using both flow cytometry and microscopic setups and systematically comparing the resulting single-cell expression distributions. Using these results, we develop methods for rigorous quantitative inference of single-cell expression distributions from fluorescence flow cytometry data. First, we present a Bayesian mixture model to separate debris from viable cells using all scattering signals. Second, we show that cytometry measurements of fluorescence are substantially affected by autofluorescence and shot noise, which can be mistaken for intrinsic noise in gene expression, and present methods to correct for these using calibration measurements. Finally, we show that because forward- and side-scatter signals scale non-linearly with cell size, and are also affected by a substantial shot noise component that cannot be easily calibrated unless independent measurements of cell size are available, it is not possible to accurately estimate the variability in the sizes of individual cells using flow cytometry measurements alone. To aid other researchers with quantitative analysis of flow cytometry expression data in bacteria, we distribute E-Flow, an open-source R package that implements our methods for filtering debris and for estimating true biological expression means and variances from the fluorescence signal. The package is available at https://github.com/vanNimwegenLab/E-Flow.


Subject(s)
Escherichia coli/genetics , Flow Cytometry , Genes, Bacterial , Single-Cell Analysis , Transcriptome , Flow Cytometry/methods , Fluorescence , Green Fluorescent Proteins/genetics , Microscopy, Fluorescence
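
Illustrative note on the flow cytometry abstract above: the idea of correcting measured fluorescence for autofluorescence and shot noise can be sketched with a simple additive model. Everything here is an assumption for illustration: independent noise sources whose variances add, shot-noise variance proportional to the reporter signal, and the hypothetical function `corrected_moments`. It is not the E-Flow API and not necessarily the authors' procedure.

```python
from statistics import fmean, pvariance


def corrected_moments(measured, autofluorescence, shot_noise_coeff):
    """Estimate mean and variance of the reporter signal alone.

    Assumed model (hypothetical): measured = reporter + autofluorescence
    + shot noise, with independent terms, and a shot-noise variance equal
    to shot_noise_coeff times the mean reporter signal.
    measured: fluorescence of reporter-carrying cells.
    autofluorescence: fluorescence of reporter-free calibration cells.
    """
    mean_signal = fmean(measured) - fmean(autofluorescence)
    var_signal = (
        pvariance(measured)
        - pvariance(autofluorescence)
        - shot_noise_coeff * mean_signal
    )
    # Finite samples can push the estimate below zero; clip at zero.
    return mean_signal, max(var_signal, 0.0)


# Toy numbers (hypothetical, arbitrary fluorescence units).
mean_est, var_est = corrected_moments(
    measured=[110.0, 130.0, 150.0, 170.0],
    autofluorescence=[18.0, 20.0, 22.0],
    shot_noise_coeff=0.5,
)
print(mean_est, var_est)
```

Without such a correction, the extra variance from the instrument and from autofluorescence would be attributed to gene expression noise, which is the pitfall the abstract warns about.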
13.
Sci Rep ; 10(1): 4625, 2020 03 13.
Article in English | MEDLINE | ID: mdl-32170161

ABSTRACT

Neural stem cells (NSCs) generate neurons of the cerebral cortex with distinct morphologies and functions. How specific neuron production, differentiation and migration are orchestrated is unclear. Hippo signaling regulates gene expression through Tead transcription factors (TFs). We show that Hippo transcriptional coactivators Yap1/Taz and the Teads have distinct functions during cortical development. Yap1/Taz promote NSC maintenance and Satb2+ neuron production at the expense of Tbr1+ neuron generation. However, Teads have moderate effects on NSC maintenance and do not affect Satb2+ neuron differentiation. Conversely, whereas Tead2 blocks Tbr1+ neuron formation, Tead1 and Tead3 promote this early fate. In addition, we found that Hippo effectors regulate neuronal migration to the cortical plate (CP) in a reciprocal fashion, that ApoE, Dab2 and Cyr61 are Tead targets, and these contribute to neuronal fate determination and migration. Our results indicate that multifaceted Hippo signaling is pivotal in different aspects of cortical development.


Subject(s)
Cerebral Cortex/growth & development , DNA-Binding Proteins/genetics , Signal Transduction , Transcription Factors/metabolism , Animals , Cell Adhesion Molecules, Neuronal/genetics , Cell Line , Cerebral Cortex/metabolism , Chromatin Immunoprecipitation , DNA-Binding Proteins/metabolism , Extracellular Matrix Proteins/genetics , Female , Hippo Signaling Pathway , Humans , Mice , Nerve Tissue Proteins/genetics , Neural Stem Cells , Organ Specificity , Protein Serine-Threonine Kinases/genetics , Reelin Protein , Serine Endopeptidases/genetics , TEA Domain Transcription Factors , Transcription Factors/genetics
14.
Elife ; 8, 2019 11 11.
Article in English | MEDLINE | ID: mdl-31710292

ABSTRACT

Living cells proliferate by completing and coordinating two cycles, a division cycle controlling cell size and a DNA replication cycle controlling the number of chromosomal copies. It remains unclear how bacteria such as Escherichia coli tightly coordinate those two cycles across a wide range of growth conditions. Here, we used time-lapse microscopy in combination with microfluidics to measure growth, division and replication in single E. coli cells in both slow and fast growth conditions. To compare different phenomenological cell cycle models, we introduce a statistical framework assessing their ability to capture the correlation structure observed in the data. In combination with stochastic simulations, our data indicate that the cell cycle is driven from one initiation event to the next rather than from birth to division and is controlled by two adder mechanisms: the added volume since the last initiation event determines the timing of both the next division and replication initiation events.


Subject(s)
Cell Cycle/genetics , Chromosomes, Bacterial/genetics , DNA Replication/genetics , DNA, Bacterial/genetics , Escherichia coli/genetics , Cell Division/genetics , Escherichia coli/cytology , Escherichia coli/growth & development , Microfluidic Analytical Techniques/methods , Microscopy, Fluorescence , Microscopy, Phase-Contrast , Models, Genetic , Single-Cell Analysis/methods , Time-Lapse Imaging/methods
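
Illustrative note on the cell cycle abstract above: an "adder" means that a fixed amount of volume is added between two events, regardless of the starting size. The sketch below uses the simplest textbook variant, a noise-free birth-to-division adder with symmetric division, to show why an adder keeps cell size stable. The abstract's model instead runs from one initiation event to the next with two adders, which this sketch does not reproduce; the function name and numbers are assumptions.

```python
def birth_sizes(first_birth_size, added_size, generations):
    """Follow one lineage under a deterministic adder rule.

    Each cell divides once it has grown by `added_size` since birth, and
    the daughter that is followed inherits half of the division size.
    """
    sizes = [first_birth_size]
    for _ in range(generations):
        division_size = sizes[-1] + added_size
        sizes.append(division_size / 2.0)
    return sizes


# A cell born four times too large relaxes toward birth size = added size.
sizes = birth_sizes(first_birth_size=4.0, added_size=1.0, generations=20)
print(sizes[:4], sizes[-1])
```

Because each division halves the deviation from the steady-state birth size, size errors decay geometrically over generations without the cell needing to measure its absolute size.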
15.
Genome Res ; 29(7): 1164-1177, 2019 07.
Article in English | MEDLINE | ID: mdl-31138617

ABSTRACT

Although ChIP-seq has become a routine experimental approach for quantitatively characterizing the genome-wide binding of transcription factors (TFs), computational analysis procedures remain far from standardized, making it difficult to compare ChIP-seq results across experiments. In addition, although genome-wide binding patterns must ultimately be determined by local constellations of DNA-binding sites, current analysis is typically limited to identifying enriched motifs in ChIP-seq peaks. Here we present Crunch, a completely automated computational method that performs all ChIP-seq analysis from quality control through read mapping and peak detecting and that integrates comprehensive modeling of the ChIP signal in terms of known and novel binding motifs, quantifying the contribution of each motif and annotating which combinations of motifs explain each binding peak. By applying Crunch to 128 data sets from the ENCODE Project, we show that Crunch outperforms current peak finders and find that TFs naturally separate into "solitary TFs," for which a single motif explains the ChIP-peaks, and "cobinding TFs," for which multiple motifs co-occur within peaks. Moreover, for most data sets, the motifs that Crunch identified de novo outperform known motifs, and both the set of cobinding motifs and the top motif of solitary TFs are consistent across experiments and cell lines. Crunch is implemented as a web server, enabling standardized analysis of any collection of ChIP-seq data sets by simply uploading raw sequencing data. Results are provided both in a graphical web interface and as downloadable files.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Computational Biology/methods , Transcription Factors/metabolism , Amino Acid Motifs , Animals , Binding Sites , Datasets as Topic , Humans , Nucleotide Motifs , Quality Control , Regulatory Sequences, Nucleic Acid
16.
Mol Syst Biol ; 14(8): e8266, 2018 08 27.
Article in English | MEDLINE | ID: mdl-30150282

ABSTRACT

miRNAs are small RNAs that regulate gene expression post-transcriptionally. By repressing the translation and promoting the degradation of target mRNAs, miRNAs may reduce the cell-to-cell variability in protein expression, induce correlations between target expression levels, and provide a layer through which targets can influence each other's expression as "competing RNAs" (ceRNAs). However, experimental evidence for these behaviors is limited. Combining mathematical modeling with RNA sequencing of individual human embryonic kidney cells in which the expression of two distinct miRNAs was induced over a wide range, we have inferred parameters describing the response of hundreds of miRNA targets to miRNA induction. Individual targets have widely different response dynamics, and only a small proportion of predicted targets exhibit high sensitivity to miRNA induction. Our data reveal for the first time the response parameters of the entire network of endogenous miRNA targets to miRNA induction, demonstrating that miRNAs correlate target expression and at the same time increase the variability in expression of individual targets across cells. The approach is generalizable to other miRNAs and post-transcriptional regulators to improve the understanding of gene expression dynamics in individual cell types.


Subject(s)
Gene Regulatory Networks/genetics , MicroRNAs/genetics , RNA, Messenger/genetics , Single-Cell Analysis , Computational Biology , Gene Expression Profiling , Gene Expression Regulation/genetics , HEK293 Cells , Humans , Models, Theoretical , Sequence Analysis, RNA
17.
Genome Biol ; 19(1): 44, 2018 03 28.
Article in English | MEDLINE | ID: mdl-29592812

ABSTRACT

The length of 3' untranslated regions (3' UTRs) is regulated in relation to cellular state. To uncover key regulators of poly(A) site use in specific conditions, we have developed PAQR, a method for quantifying poly(A) site use from RNA sequencing data, and KAPAC, an approach that infers activities of oligomeric sequence motifs on poly(A) site choice. Application of PAQR and KAPAC to RNA sequencing data from normal and tumor tissue samples uncovers motifs that can explain changes in cleavage and polyadenylation in specific cancers. In particular, our analysis points to polypyrimidine tract binding protein 1 as a regulator of poly(A) site choice in glioblastoma.


Subject(s)
3' Untranslated Regions , Polyadenylation , Sequence Analysis, RNA , Glioblastoma/genetics , Glioblastoma/metabolism , Humans , Male , Nucleotide Motifs , Polypyrimidine Tract-Binding Protein/metabolism , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , RNA-Binding Proteins/metabolism , mRNA Cleavage and Polyadenylation Factors/metabolism
18.
Nat Commun ; 9(1): 212, 2018 01 15.
Article in English | MEDLINE | ID: mdl-29335514

ABSTRACT

Much is still not understood about how gene regulatory interactions control cell fate decisions in single cells, in part due to the difficulty of directly observing gene regulatory processes in vivo. We introduce here a novel integrated setup consisting of a microfluidic chip and accompanying analysis software that enable long-term quantitative tracking of growth and gene expression in single cells. The dual-input Mother Machine (DIMM) chip enables controlled and continuous variation of external conditions, allowing direct observation of gene regulatory responses to changing conditions in single cells. The Mother Machine Analyzer (MoMA) software achieves unprecedented accuracy in segmenting and tracking cells, and streamlines high-throughput curation with a novel leveraged editing procedure. We demonstrate the power of the method by uncovering several novel features of an iconic gene regulatory program: the induction of Escherichia coli's lac operon in response to a switch from glucose to lactose.


Subject(s)
Gene Expression Regulation, Bacterial , Microfluidic Analytical Techniques/methods , Single-Cell Analysis/methods , Software , Algorithms , Cell Tracking/instrumentation , Cell Tracking/methods , Escherichia coli/cytology , Escherichia coli/drug effects , Escherichia coli/genetics , Glucose/pharmacology , Lac Operon/genetics , Lactose/pharmacology , Single-Cell Analysis/instrumentation
19.
PLoS Comput Biol ; 13(7): e1005176, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28753602

ABSTRACT

Gene regulatory networks are ultimately encoded by the sequence-specific binding of transcription factors (TFs) to short DNA segments. Although it is customary to represent the binding specificity of a TF by a position-specific weight matrix (PSWM), which assumes each position within a site contributes independently to the overall binding affinity, evidence has been accumulating that there can be significant dependencies between positions. Unfortunately, methodological challenges have so far hindered the development of a practical and generally accepted extension of the PSWM model. On the one hand, simple models that only consider dependencies between nearest-neighbor positions are easy to use in practice, but fail to account for the distal dependencies that are observed in the data. On the other hand, models that allow for arbitrary dependencies are prone to overfitting, requiring regularization schemes that are difficult to use in practice for non-experts. Here we present a new regulatory motif model, called dinucleotide weight tensor (DWT), that incorporates arbitrary pairwise dependencies between positions in binding sites, rigorously from first principles, and free from tunable parameters. We demonstrate the power of the method on a large set of ChIP-seq data sets, showing that DWTs outperform both PSWMs and motif models that only incorporate nearest-neighbor dependencies. We also demonstrate that DWTs outperform two previously proposed methods. Finally, we show that DWTs inferred from ChIP-seq data also outperform PSWMs on HT-SELEX data for the same TF, suggesting that DWTs capture inherent biophysical properties of the interactions between the DNA binding domains of TFs and their binding sites. We make a suite of DWT tools available at dwt.unibas.ch, that allow users to automatically perform 'motif finding', i.e. the inference of DWT motifs from a set of sequences, binding site prediction with DWTs, and visualization of DWT 'dilogo' motifs.


Subject(s)
Binding Sites/genetics , Computational Biology/methods , DNA , Nucleotide Motifs/genetics , Transcription Factors , DNA/chemistry , DNA/genetics , DNA/metabolism , Models, Statistical , RNA/chemistry , RNA/genetics , RNA/metabolism , Sequence Analysis, DNA , Transcription Factors/chemistry , Transcription Factors/genetics , Transcription Factors/metabolism
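
Illustrative note on the motif-model abstract above: the baseline it refers to, a position-specific weight matrix (PSWM), can be sketched in a few lines. The sketch estimates per-position base frequencies from aligned sites and scores a sequence as a sum of per-position log-odds, which is exactly the independence assumption the abstract describes. The function names, the pseudocount of 1, the uniform background, and the toy sites are assumptions; the DWT model itself and the tools at dwt.unibas.ch are not reproduced here.

```python
import math

BASES = "ACGT"


def build_pswm(sites, pseudocount=1.0):
    """Per-position base probabilities from equal-length aligned sites."""
    length = len(sites[0])
    pswm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pswm.append({base: counts[base] / total for base in BASES})
    return pswm


def log_odds_score(pswm, seq, background=0.25):
    """Sum of per-position log-odds against a uniform background.

    The score is a plain sum over positions: each position contributes
    independently of every other position.
    """
    return sum(math.log(col[base] / background) for col, base in zip(pswm, seq))


# Toy aligned binding sites (hypothetical data).
sites = ["ACGT", "ACGT", "ACGA", "TCGT"]
pswm = build_pswm(sites)
print(log_odds_score(pswm, "ACGT"), log_odds_score(pswm, "TTTT"))
```

A model with pairwise dependencies would add terms that depend on pairs of positions jointly, so that the preferred base at one position can change with the base at another.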