Results 1 - 20 of 34
1.
Nat Neurosci ; 26(2): 339-349, 2023 02.
Article in English | MEDLINE | ID: mdl-36635497

ABSTRACT

Recent experiments have revealed that neural population codes in many brain areas continuously change even when animals have fully learned and stably perform their tasks. This representational 'drift' naturally leads to questions about its causes, dynamics and functions. Here we explore the hypothesis that neural representations optimize a representational objective with a degenerate solution space, and noisy synaptic updates drive the network to explore this (near-)optimal space causing representational drift. We illustrate this idea and explore its consequences in simple, biologically plausible Hebbian/anti-Hebbian network models of representation learning. We find that the drifting receptive fields of individual neurons can be characterized by a coordinated random walk, with effective diffusion constants depending on various parameters such as learning rate, noise amplitude and input statistics. Despite such drift, the representational similarity of population codes is stable over time. Our model recapitulates experimental observations in the hippocampus and posterior parietal cortex and makes testable predictions that can be probed in future experiments.


Subjects
Brain; Learning; Animals; Learning/physiology; Neurons/physiology; Hippocampus; Head; Models, Neurological
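
The claim in entry 1 that individual tuning can drift while representational similarity stays fixed can be illustrated with a toy calculation. The sketch below is not the authors' network model: it assumes, purely for illustration, that drift acts as a rotation of a two-neuron population code, and the stimulus responses are made-up numbers. It shows that the matrix of pairwise dot products between population vectors is unchanged by such a rotation.

```python
import math


def rotate(points, angle):
    """Rotate each 2-D population vector by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def similarity_matrix(points):
    """Matrix of pairwise dot products between population vectors."""
    return [[px * qx + py * qy for qx, qy in points] for px, py in points]


# Hypothetical responses of a two-neuron population to three stimuli.
codes_day1 = [(1.0, 0.0), (0.5, 0.5), (0.0, 2.0)]
# Assumption: "drift" is modelled as a rotation, one direction of a
# degenerate solution space in which all solutions are equally good.
codes_day2 = rotate(codes_day1, 0.7)

sim1 = similarity_matrix(codes_day1)
sim2 = similarity_matrix(codes_day2)
# Largest entry-wise change in the similarity matrix (zero up to rounding).
max_change = max(
    abs(a - b) for row1, row2 in zip(sim1, sim2) for a, b in zip(row1, row2)
)
```

Each neuron's response to a given stimulus differs between `codes_day1` and `codes_day2`, yet `max_change` is zero up to floating-point rounding, which is the sense in which a population code can be stable while single-neuron tuning is not.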
2.
Phys Rev Lett ; 129(13): 136402, 2022 Sep 23.
Article in English | MEDLINE | ID: mdl-36206431

ABSTRACT

We perform a data-driven dimensionality reduction of the scale-dependent four-point vertex function characterizing the functional renormalization group (FRG) flow for the widely studied two-dimensional t-t' Hubbard model on the square lattice. We demonstrate that a deep learning architecture based on a neural ordinary differential equation solver in a low-dimensional latent space efficiently learns the FRG dynamics that delineates the various magnetic and d-wave superconducting regimes of the Hubbard model. We further present a dynamic mode decomposition analysis that confirms that a small number of modes are indeed sufficient to capture the FRG dynamics. Our Letter demonstrates the possibility of using artificial intelligence to extract compact representations of the four-point vertex functions for correlated electrons, a goal of utmost importance for the success of cutting-edge quantum field theoretical methods for tackling the many-electron problem.

3.
Neural Comput ; 34(4): 891-938, 2022 03 23.
Article in English | MEDLINE | ID: mdl-35026035

ABSTRACT

The brain must extract behaviorally relevant latent variables from the signals streamed by the sensory organs. Such latent variables are often encoded in the dynamics that generated the signal rather than in the specific realization of the waveform. Therefore, one problem faced by the brain is to segment time series based on underlying dynamics. We present two algorithms for performing this segmentation task that are biologically plausible, which we define as acting in a streaming setting and all learning rules being local. One algorithm is model based and can be derived from an optimization problem involving a mixture of autoregressive processes. This algorithm relies on feedback in the form of a prediction error and can also be used for forecasting future samples. In some brain regions, such as the retina, the feedback connections necessary to use the prediction error for learning are absent. For this case, we propose a second, model-free algorithm that uses a running estimate of the autocorrelation structure of the signal to perform the segmentation. We show that both algorithms do well when tasked with segmenting signals drawn from autoregressive models with piecewise-constant parameters. In particular, the segmentation accuracy is similar to that obtained from oracle-like methods in which the ground-truth parameters of the autoregressive models are known. We also test our methods on data sets generated by alternating snippets of voice recordings. We provide implementations of our algorithms at https://github.com/ttesileanu/bio-time-series.


Subjects
Algorithms; Brain; Image Processing, Computer-Assisted/methods; Learning; Time Factors
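
The segmentation task in entry 3 can be made concrete with an oracle-style baseline of the kind the abstract uses as a reference point. The sketch below is not either of the authors' algorithms (those learn online with local rules; see the linked repository for their implementations). It assumes the AR(1) coefficients are known in advance, and the coefficients, noise level, and window length are arbitrary illustrative choices. Each sample is assigned to the model with the smallest squared prediction error summed over a trailing window.

```python
import random


def simulate(coeffs, segment_len, noise=0.1, seed=0):
    """Concatenate AR(1) segments x[t] = a * x[t-1] + noise, one per coefficient."""
    rng = random.Random(seed)
    x, signal, labels = 0.0, [], []
    for k, a in enumerate(coeffs):
        for _ in range(segment_len):
            x = a * x + rng.gauss(0.0, noise)
            signal.append(x)
            labels.append(k)
    return signal, labels


def segment(signal, coeffs, window=25):
    """Label samples 1..end with the index of the best-predicting AR(1) model."""
    # errors[k][t] is the squared one-step error of model k on sample t + 1.
    errors = [
        [(signal[t] - a * signal[t - 1]) ** 2 for t in range(1, len(signal))]
        for a in coeffs
    ]
    guesses = []
    for t in range(len(errors[0])):
        lo = max(0, t - window + 1)
        scores = [sum(e[lo:t + 1]) for e in errors]
        guesses.append(scores.index(min(scores)))
    return guesses


true_coeffs = [0.9, -0.9]  # assumed known, which is what makes this an oracle
signal, labels = simulate(true_coeffs, segment_len=300)
guesses = segment(signal, true_coeffs)
# The first sample has no predecessor, so compare against labels[1:].
accuracy = sum(g == l for g, l in zip(guesses, labels[1:])) / len(guesses)
```

Most of the errors of this baseline fall just after the switch point, where the trailing window still contains samples from the previous regime; a shorter window reacts faster but is noisier.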
4.
Neural Comput ; 33(9): 2309-2352, 2021 08 19.
Article in English | MEDLINE | ID: mdl-34412114

ABSTRACT

Cortical pyramidal neurons receive inputs from multiple distinct neural populations and integrate these inputs in separate dendritic compartments. We explore the possibility that cortical microcircuits implement canonical correlation analysis (CCA), an unsupervised learning method that projects the inputs onto a common subspace so as to maximize the correlations between the projections. To this end, we seek a multichannel CCA algorithm that can be implemented in a biologically plausible neural network. For biological plausibility, we require that the network operates in the online setting and its synaptic update rules are local. Starting from a novel CCA objective function, we derive an online optimization algorithm whose optimization steps can be implemented in a single-layer neural network with multicompartmental neurons and local non-Hebbian learning rules. We also derive an extension of our online CCA algorithm with adaptive output rank and output whitening. Interestingly, the extension maps onto a neural network whose neural architecture and synaptic updates resemble neural circuitry and non-Hebbian plasticity observed in the cortex.


Subjects
Canonical Correlation Analysis; Neural Networks, Computer; Algorithms; Neurons
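
As a reminder of what CCA in entry 4 computes, the simplest special case is worth spelling out. When each of the two views is a single scalar, any nonzero projection weights only rescale the inputs, so the maximal correlation between projections is just the absolute Pearson correlation. The sketch below covers only that scalar case under made-up data; it is not the multichannel online algorithm described in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def canonical_correlation_1d(x, y):
    """Scalar-view CCA: the best projection weights only fix the sign."""
    return abs(pearson(x, y))


# Hypothetical inputs arriving at two dendritic compartments (made-up values).
view_a = [0.1, 0.4, 0.35, 0.8, 1.0]
view_b = [-0.2, -0.9, -0.6, -1.7, -2.1]

rho = canonical_correlation_1d(view_a, view_b)
# Rescaling either view leaves the canonical correlation unchanged.
rho_rescaled = canonical_correlation_1d(
    [3 * a for a in view_a], [-5 * b for b in view_b]
)
```

With vector-valued views the same quantity is obtained from a generalized eigenvalue problem rather than a single correlation coefficient.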
5.
Front Comput Neurosci ; 14: 55, 2020.
Article in English | MEDLINE | ID: mdl-32694989

ABSTRACT

Normative models of neural computation offer simplified yet lucid mathematical descriptions of murky biological phenomena. Previously, online Principal Component Analysis (PCA) was used to model a network of single-compartment neurons accounting for weighted summation of upstream neural activity in the soma and Hebbian/anti-Hebbian synaptic learning rules. However, synaptic plasticity in biological neurons often depends on the integration of synaptic currents over a dendritic compartment rather than total current in the soma. Motivated by this observation, we model a pyramidal neuronal network using online Canonical Correlation Analysis (CCA). Given two related datasets represented by distal and proximal dendritic inputs, CCA projects them onto the subspace which maximizes the correlation between their projections. First, adopting a normative approach and starting from a single-channel CCA objective function, we derive an online gradient-based optimization algorithm whose steps can be interpreted as the operation of a pyramidal neuron. To model networks of pyramidal neurons, we introduce a novel multi-channel CCA objective function, and derive from it an online gradient-based optimization algorithm whose steps can be interpreted as the operation of a pyramidal neuron network including its architecture, dynamics, and synaptic learning rules. Next, we model a neuron with more than two dendritic compartments by deriving its operation from a known objective function for multi-view CCA. Finally, we confirm the functionality of our networks via numerical simulations. Overall, our work presents a simplified but informative abstraction of learning in a pyramidal neuron network, and demonstrates how such networks can integrate multiple sources of inputs.

6.
Neural Comput ; 30(1): 84-124, 2018 01.
Article in English | MEDLINE | ID: mdl-28957017

ABSTRACT

Modeling self-organization of neural networks for unsupervised learning using Hebbian and anti-Hebbian plasticity has a long history in neuroscience. Yet derivations of single-layer networks with such local learning rules from principled optimization objectives became possible only recently, with the introduction of similarity matching objectives. What explains the success of similarity matching objectives in deriving neural networks with local learning rules? Here, using dimensionality reduction as an example, we introduce several variable substitutions that illuminate the success of similarity matching. We show that the full network objective may be optimized separately for each synapse using local learning rules in both the offline and online settings. We formalize the long-standing intuition of the rivalry between Hebbian and anti-Hebbian rules by formulating a min-max optimization problem. We introduce a novel dimensionality reduction objective using fractional matrix exponents. To illustrate the generality of our approach, we apply it to a novel formulation of dimensionality reduction combined with whitening. We confirm numerically that the networks with learning rules derived from principled objectives perform better than those with heuristic learning rules.


Subjects
Learning/physiology; Models, Neurological; Neural Pathways/physiology; Neurons/physiology; Synapses/physiology; Algorithms; Game Theory; Humans
7.
Synth Biol (Oxf) ; 2(1): ysx005, 2017 Jan.
Article in English | MEDLINE | ID: mdl-32995506

ABSTRACT

Quantifying the effect of vital resources on transcription (TX) and translation (TL) helps to understand the degree to which the concentration of each resource must be regulated for achieving homeostasis. Utilizing the synthetic TX-TL system, we study the impact of nucleotide triphosphates (NTPs) and magnesium (Mg2+) on gene expression. Recent observations of the counter-intuitive phenomenon of suppression of gene expression at high NTP concentrations have led to the speculation that such suppression is due to the consumption of resources by TX, hence leaving fewer resources for TL. In this work, we investigate an alternative hypothesis: direct suppression of the TL rate via stoichiometric mismatch in necessary reagents. We observe NTP-dependent suppression even in the early phase of gene expression, contradicting the resource-limitation argument. To further decouple the contributions of TX and TL, we performed gene expression experiments with purified messenger RNA (mRNA). Simultaneously monitoring mRNA and protein abundances allowed us to extract a time-dependent translation rate. Measuring TL rates for different Mg2+ and NTP concentrations, we observe a complex resource dependence. We demonstrate that TL is the rate-limiting process that is directly inhibited by high NTP concentrations. Additional Mg2+ can partially reverse this inhibition. In several experiments, we observe two maxima of the TL rate viewed as a function of both Mg2+ and NTP concentration, which can be explained in terms of an NTP-independent effect on the ribosome complex and an NTP-Mg2+ titration effect. The non-trivial compensatory effects of abundance of different vital resources signal the presence of complex regulatory mechanisms to achieve optimal gene expression.
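
One way to read "extract a time-dependent translation rate" from simultaneous mRNA and protein measurements is sketched below. The abstract does not give the estimator, so this is an assumption: protein is taken to accumulate as dP/dt = k_TL(t) * m(t), with protein degradation and maturation delays ignored, and the rate is estimated by finite differences. The function name and the numbers are hypothetical.

```python
def translation_rate(times, mrna, protein):
    """Estimate k_TL on each sampling interval, assuming dP/dt = k_TL * mRNA."""
    rates = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        dp_dt = (protein[i + 1] - protein[i]) / dt
        # Use the interval-averaged mRNA level as the denominator.
        m_mid = 0.5 * (mrna[i] + mrna[i + 1])
        rates.append(dp_dt / m_mid)
    return rates


# Made-up time course: constant mRNA and linearly increasing protein.
times = [0.0, 1.0, 2.0, 3.0]
mrna = [2.0, 2.0, 2.0, 2.0]
protein = [0.0, 1.0, 2.0, 3.0]
rates = translation_rate(times, mrna, protein)
```

On real data the finite difference amplifies measurement noise, so some smoothing of the protein trace before differentiation would usually be needed.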

8.
BMC Genomics ; 16: 982, 2015 Nov 21.
Article in English | MEDLINE | ID: mdl-26589460

ABSTRACT

BACKGROUND: Circular chromosome conformation capture (4C) has provided important insights into three dimensional (3D) genome organization and its critical impact on the regulation of gene expression. We developed a new quantitative framework based on polymer physics for the analysis of paired-end sequencing 4C (PE-4Cseq) data. We applied this strategy to the study of chromatin interaction changes upon a 4.3 Mb DNA deletion in mouse region 4E2. RESULTS: A significant number of differentially interacting regions (DIRs) and chromatin compaction changes were detected in the deletion chromosome compared to a wild-type (WT) control. Selected DIRs were validated by 3D DNA FISH experiments, demonstrating the robustness of our pipeline. Interestingly, significant overlaps of DIRs with CTCF/Smc1 binding sites and differentially expressed genes were observed. CONCLUSIONS: Altogether, our PE-4Cseq analysis pipeline provides a comprehensive characterization of DNA deletion effects on chromatin structure and function.


Subjects
Chromatin/genetics; Chromatin/metabolism; Computational Biology; Sequence Deletion; Alleles; Animals; Chromosomes, Mammalian; Computational Biology/methods; DNA Copy Number Variations; Gene Expression; Genomics/methods; Genotype; High-Throughput Nucleotide Sequencing/methods; In Situ Hybridization, Fluorescence; Mice; Polymorphism, Single Nucleotide; Reproducibility of Results
9.
Phys Rev Lett ; 115(4): 048101, 2015 Jul 24.
Article in English | MEDLINE | ID: mdl-26252709

ABSTRACT

The dynamics of proteins in the unfolded state can be quantified in computer simulations by calculating a spectrum of relaxation times which describes the time scales over which the population fluctuations decay to equilibrium. If the unfolded state space is discretized, we can evaluate the relaxation time of each state. We derive a simple relation that shows the mean first passage time to any state is equal to the relaxation time of that state divided by the equilibrium population. This explains why mean first passage times from state to state within the unfolded ensemble can be very long but the energy landscape can still be smooth (minimally frustrated). In fact, when the folding kinetics is two-state, all of the unfolded state relaxation times within the unfolded free energy basin are faster than the folding time. This result supports the well-established funnel energy landscape picture and resolves an apparent contradiction between this model and the recently proposed kinetic hub model of protein folding. We validate these concepts by analyzing a Markov state model of the kinetics in the unfolded state and folding of the miniprotein NTL9 (where NTL9 is the N-terminal domain of the ribosomal protein L9), constructed from a 2.9 ms simulation provided by D. E. Shaw Research.


Subjects
Models, Chemical; Proteins/chemistry; Arabidopsis Proteins/chemistry; Kinetics; Markov Chains; Protein Folding; Thermodynamics; Transcription Factors/chemistry
10.
PLoS One ; 9(12): e113516, 2014.
Article in English | MEDLINE | ID: mdl-25536038

ABSTRACT

In addition to gene network switches, local epigenetic modifications to DNA and histones play an important role in all-or-none cellular decision-making. Here, we study the dynamical design of a well-characterized epigenetic chromatin switch: the yeast SIR system, in order to understand the origin of the stability of epigenetic states. We study hysteresis in this system by perturbing it with a histone deacetylase inhibitor. We find that SIR silencing has many characteristics of a non-linear bistable system, as observed in conventional genetic switches, which are based on activities of a few promoters affecting each other through the abundance of their gene products. Quite remarkably, our experiments in yeast telomeric silencing show a very distinctive pattern when it comes to the transition from bistability to monostability. In particular, the loss of the stable silenced state, upon increasing the inhibitor concentration, does not seem to show the expected saddle node behavior, instead looking like a supercritical pitchfork bifurcation. In other words, the 'off' state merges with the 'on' state at a threshold concentration leading to a single state, as opposed to the two states remaining distinct up to the threshold and exhibiting a discontinuous jump from the 'off' to the 'on' state. We argue that this is an inevitable consequence of silenced and active regions coexisting with dynamic domain boundaries. The experimental observations in our study therefore have broad implications for the understanding of chromatin silencing in yeast and beyond.


Subjects
Chromatin/metabolism; Epigenesis, Genetic; Gene Silencing; Saccharomyces cerevisiae/genetics; Telomere/genetics; Gene Expression Regulation, Fungal; Models, Genetic; Saccharomyces cerevisiae Proteins/genetics; Sirtuin 2/genetics
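
The distinction drawn in entry 10 between a saddle-node transition and a supercritical pitchfork can be seen in the textbook one-dimensional normal forms. The sketch below uses those normal forms only; it is not the authors' model of SIR silencing, and the control parameter `r` is a generic stand-in rather than inhibitor concentration. In the pitchfork, two stable states approach each other and merge continuously. In the saddle-node, a stable state collides with an unstable one and both disappear, which is what forces a discontinuous jump to a distant state.

```python
import math


def pitchfork_stable_states(r):
    """Stable fixed points of dx/dt = r*x - x**3 (supercritical pitchfork)."""
    if r <= 0:
        return [0.0]
    root = math.sqrt(r)
    return [-root, root]


def saddle_node_fixed_points(r):
    """Fixed points of dx/dt = r - x**2 as (stable, unstable).

    The pair coincides at r = 0 and no fixed point exists for r < 0.
    """
    if r < 0:
        return None
    root = math.sqrt(r)
    return (root, -root)


# Distance between the two stable pitchfork branches as r is lowered to zero.
gaps = [
    pitchfork_stable_states(r)[-1] - pitchfork_stable_states(r)[0]
    for r in (1.0, 0.25, 0.01, 0.0)
]
```

The entries of `gaps` shrink smoothly to zero, whereas `saddle_node_fixed_points` returns `None` as soon as `r` becomes negative: the stable state is lost without first approaching any other stable state.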
11.
PLoS Comput Biol ; 9(7): e1003121, 2013.
Article in English | MEDLINE | ID: mdl-23874171

ABSTRACT

We introduce and analyze a minimal model of epigenetic silencing in budding yeast, built upon known biomolecular interactions in the system. Doing so, we identify the epigenetic marks essential for the bistability of epigenetic states. The model explicitly incorporates two key chromatin marks, namely H4K16 acetylation and H3K79 methylation, and explores whether the presence of multiple marks leads to qualitatively different system behavior. We find that having both modifications is important for the robustness of epigenetic silencing. Besides the silenced and transcriptionally active fates of chromatin, our model leads to a novel state with bivalent (i.e., both active and silencing) marks under certain perturbations (knock-out mutations, inhibition or enhancement of enzymatic activity). The bivalent state appears under several perturbations and is shown to result in patchy silencing. We also show that the titration effect, owing to a limited supply of silencing proteins, can result in counter-intuitive responses. The design principles of the silencing system are systematically investigated, and disparate experimental observations are assessed within a single theoretical framework. Specifically, we discuss the behavior of Sir protein recruitment, spreading and stability of silenced regions in commonly studied mutants (e.g., sas2Δ, dot1Δ), illuminating the controversial role of Dot1 in the systems biology of yeast silencing.


Subjects
Chromatin/genetics; Epigenesis, Genetic; Gene Silencing; Acetylation; DNA Methylation; Histones/metabolism; Sirtuins/genetics; Sirtuins/metabolism
12.
Phys Biol ; 10(3): 036005, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23588040

ABSTRACT

Epigenetic mechanisms of silencing via heritable chromatin modifications play a major role in gene regulation and cell fate specification. We consider a model of epigenetic chromatin silencing in budding yeast, study its bifurcation diagram, and characterize the bistable and monostable regimes. The main focus of this paper is to examine how perturbations that alter the activity of histone-modifying enzymes affect the epigenetic states. We analyze the implications of holding constant the total number of silencing proteins, given by the sum of the proteins bound to nucleosomes and those available in the ambient pool. This constraint couples different regions of chromatin through the shared reservoir of ambient silencing proteins. We show that the response of the system to perturbations depends dramatically on the titration effect caused by the above constraint. In particular, for a certain range of overall abundance of silencing proteins, the hysteresis loop changes qualitatively, with certain jumps replaced by a continuous merger of different states. In addition, we find a nonmonotonic dependence of gene expression on the rate of histone deacetylation activity of Sir2. We discuss how these qualitative predictions of our model could be compared with experimental studies of the yeast system under anti-silencing drugs.


Subjects
Chromatin/genetics; Gene Expression Regulation, Fungal; Gene Silencing; Models, Genetic; Saccharomycetales/genetics; Acetylation; Chromatin/metabolism; Fungal Proteins/genetics; Fungal Proteins/metabolism; Saccharomycetales/metabolism; Sirtuin 2/genetics; Sirtuin 2/metabolism; Stochastic Processes
13.
J Comput Biol ; 19(10): 1162-75, 2012 Oct.
Article in English | MEDLINE | ID: mdl-23057825

ABSTRACT

Scaffolding is an important subproblem in de novo genome assembly, in which mate pair data are used to construct a linear sequence of contigs separated by gaps. Here we present SLIQ, a set of simple linear inequalities derived from the geometry of contigs on the line that can be used to predict the relative positions and orientations of contigs from individual mate pair reads and thus produce a contig digraph. The SLIQ inequalities can also filter out unreliable mate pairs and can be used as a preprocessing step for any scaffolding algorithm. We tested the SLIQ inequalities on five real data sets ranging in complexity from simple bacterial genomes to complex mammalian genomes and compared the results to the majority voting procedure used by many other scaffolding algorithms. SLIQ predicted the relative positions and orientations of the contigs with high accuracy in all cases and gave more accurate position predictions than majority voting for complex genomes, in particular the human genome. Finally, we present a simple scaffolding algorithm that produces linear scaffolds given a contig digraph. We show that our algorithm is very efficient compared to other scaffolding algorithms while maintaining high accuracy in predicting both contig positions and orientations for real data sets.


Subjects
Algorithms; Contig Mapping/methods; Genome, Human; Sequence Analysis, DNA/methods; Humans
14.
J Biol Chem ; 287(24): 20248-57, 2012 Jun 08.
Article in English | MEDLINE | ID: mdl-22518845

ABSTRACT

Action across long distances on chromatin is a hallmark of eukaryotic transcriptional regulation. Although chromatin structure per se can support long-range interactions, the mechanisms of efficient communication between widely spaced DNA modules in chromatin remain a mystery. The molecular simulations described herein suggest that transient binary internucleosomal interactions can mediate distant communication in chromatin. Electrostatic interactions between the N-terminal tails of the core histones and DNA enhance the computed probability of juxtaposition of sites that lie far apart along the DNA sequence. Experimental analysis of the rates of communication in chromatin constructs confirms that long-distance communication occurs efficiently and independently of distance on tail-containing, but not on tailless, chromatin. Taken together, our data suggest that internucleosomal interactions involving the histone tails are essential for highly efficient, long-range communication between regulatory elements and their targets in eukaryotic genomes.


Subjects
DNA/chemistry; Models, Molecular; Nucleosomes/chemistry; DNA/metabolism; Eukaryota/chemistry; Eukaryota/metabolism; Histones; Nucleosomes/metabolism; Static Electricity
15.
Proc Natl Acad Sci U S A ; 108(50): 19919-24, 2011 Dec 13.
Article in English | MEDLINE | ID: mdl-22123989

ABSTRACT

Long-distance regulatory interactions between enhancers and their target genes are commonplace in higher eukaryotes. Interposed boundaries or insulators are able to block these long-distance regulatory interactions. The mechanistic basis for insulator activity and how it relates to enhancer action-at-a-distance remains unclear. Here we explore the idea that topological loops could simultaneously account for regulatory interactions of distal enhancers and the insulating activity of boundary elements. We show that while loop formation is not in itself sufficient to explain action at a distance, incorporating transient nonspecific and moderate attractive interactions between the chromatin fibers strongly enhances long-distance regulatory interactions and is sufficient to generate a euchromatin-like state. Under these same conditions, the subdivision of the loop into two topologically independent loops by insulators inhibits interdomain interactions. The underlying cause of this effect is a suppression of crossings in the contact map at intermediate distances. Thus our model simultaneously accounts for regulatory interactions at a distance and the insulator activity of boundary elements. This unified model of the regulatory roles of chromatin loops makes several testable predictions that could be confronted with in vitro experiments, as well as genomic chromatin conformation capture and fluorescent microscopic approaches.


Subjects
Chromatin/metabolism; Enhancer Elements, Genetic; Insulator Elements/genetics; Models, Biological; Models, Molecular; Time Factors
16.
Mol Cell Biol ; 31(8): 1701-9, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21300780

ABSTRACT

Transcriptome profiling studies have recently uncovered a large number of noncoding RNA transcripts (ncRNAs) in eukaryotic organisms, and there is growing interest in their role in the cell. For example, in haploid Saccharomyces cerevisiae cells, the expression of an overlapping antisense ncRNA, referred to here as RME2 (Regulator of Meiosis 2), prevents IME4 expression. In diploid cells, the a1-α2 complex represses the transcription of RME2, allowing IME4 to be induced during meiosis. In this study we show that antisense transcription across the IME4 promoter region does not block transcription factors from binding and is not required for repression. Mutational analyses found that sequences within the IME4 open reading frame (ORF) are required for the repression mediated by RME2 transcription. These results support a model where transcription of RME2 blocks the elongation of the full-length IME4 transcript but not its initiation. We have found that another antisense transcript, called RME3, represses ZIP2 in a cell-type-specific manner. These results suggest that regulated antisense transcription may be a widespread mechanism for the control of gene expression and may account for the roles of some of the previously uncharacterized ncRNAs in yeast.


Subjects
DNA, Antisense/genetics; Gene Expression Regulation, Fungal; Saccharomyces cerevisiae/genetics; Transcription, Genetic; Open Reading Frames; Promoter Regions, Genetic; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism
17.
J Stat Phys ; 142(6): 1187-1205, 2011 Apr.
Article in English | MEDLINE | ID: mdl-22851788

ABSTRACT

Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.

18.
Biosystems ; 102(1): 49-54, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20655355

ABSTRACT

Current biological models of epigenetic switches built on chromatin modifications lead to strong constraints on the repertoire of dynamic behaviors for the system. We use the structure of the bifurcation diagram of the underlying dynamical system to explain the existing single cell data in silencing by the SIR system in yeast.


Subjects
Chromatin/genetics; Epigenesis, Genetic; Gene Silencing; Models, Theoretical; Saccharomyces cerevisiae/genetics
19.
BMC Bioinformatics ; 11: 345, 2010 Jun 24.
Article in English | MEDLINE | ID: mdl-20576136

ABSTRACT

BACKGROUND: High throughput sequencing (HTS) platforms produce gigabases of short read (<100 bp) data per run. While these short reads are adequate for resequencing applications, de novo assembly of moderate size genomes from such reads remains a significant challenge. These limitations could be partially overcome by utilizing mate pair technology, which provides pairs of short reads separated by a known distance along the genome. RESULTS: We have developed SOPRA, a tool designed to exploit the mate pair/paired-end information for assembly of short reads. The main focus of the algorithm is selecting a sufficiently large subset of simultaneously satisfiable mate pair constraints to achieve a balance between the size and the quality of the output scaffolds. Scaffold assembly is presented as an optimization problem for variables associated with vertices and with edges of the contig connectivity graph. Vertices of this graph are individual contigs with edges drawn between contigs connected by mate pairs. Similar graph problems have been invoked in the context of shotgun sequencing and scaffold building for previous generation of sequencing projects. However, given the error-prone nature of HTS data and the fundamental limitations from the shortness of the reads, the ad hoc greedy algorithms used in the earlier studies are likely to lead to poor quality results in the current context. SOPRA circumvents this problem by treating all the constraints on equal footing for solving the optimization problem, the solution itself indicating the problematic constraints (chimeric/repetitive contigs, etc.) to be removed. The process of solving and removing of constraints is iterated till one reaches a core set of consistent constraints. For SOLiD sequencer data, SOPRA uses a dynamic programming approach to robustly translate the color-space assembly to base-space. 
For assessing the quality of an assembly, we report the no-match/mismatch error rate as well as the rates of various rearrangement errors. CONCLUSIONS: Applying SOPRA to real data from bacterial genomes, we were able to assemble contigs into scaffolds of significant length (N50 up to 200 Kb) with very few errors introduced in the process. In general, the methodology presented here will allow better scaffold assemblies of any type of mate pair sequencing data.


Subjects
Algorithms; Sequence Analysis, DNA/methods; Bacteria/genetics; Contig Mapping; Genome, Bacterial
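
Entry 19 reports scaffold quality as N50, a standard assembly statistic: the length L such that scaffolds of length at least L together cover at least half of the total assembled length. The sketch below computes it for a made-up list of scaffold lengths; it is a generic illustration and not code from SOPRA.

```python
def n50(lengths):
    """Return L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths in kb: the total is 290, so half is 145.
scaffold_lengths = [80, 70, 50, 40, 30, 20]
scaffold_n50 = n50(scaffold_lengths)  # 80 + 70 = 150 >= 145, so N50 is 70
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract reports mismatch and rearrangement error rates alongside it.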
20.
BMC Bioinformatics ; 10: 208, 2009 Jul 07.
Article in English | MEDLINE | ID: mdl-19583839

ABSTRACT

BACKGROUND: DNA sequence binding motifs for several important transcription factors happen to be self-overlapping. Many of the current regulatory site identification methods do not explicitly take into account the overlapping sites. Moreover, most methods use arbitrary thresholds and fail to provide a biophysical interpretation of statistical quantities. In addition, commonly used approaches do not include the location of a site with respect to the transcription start site (TSS) in an integrated probabilistic framework while identifying sites. Ignoring these features can lead to inaccurate predictions as well as incorrect design and interpretation of experimental results. RESULTS: We have developed a tool based on a Hidden Markov Model (HMM) that identifies binding location of transcription factors with preference for self-overlapping DNA motifs by combining the effects of their alternative binding modes. Interpreting HMM parameters as biophysical quantities, this method uses the occupancy probability of a transcription factor on a DNA sequence as the discriminant function, earning the algorithm the name OHMM: Occupancy via Hidden Markov Model. OHMM learns the classification threshold by training emission probabilities using unaligned sequences containing known sites and estimating transition probabilities to reflect site density in all promoters in a genome. While identifying sites, it adjusts parameters to model site density changing with the distance from the transcription start site. Moreover, it provides guidance for designing padding sequences in gel shift experiments. In the context of binding sites to transcription factor NF-kappaB, we find that the occupancy probability predicted by OHMM correlates well with the binding affinity in gel shift experiments. High evolutionary conservation scores and enrichment in experimentally verified regulated genes suggest that NF-kappaB binding sites predicted by our method are likely to be functional. 
CONCLUSION: Our method deals specifically with identifying locations with multiple overlapping binding sites by computing the local occupancy of the transcription factor. Moreover, considering OHMM as a biophysical model allows us to learn the classification threshold in a principled manner. Another feature of OHMM is that we allow transition probabilities to change with location relative to the TSS. OHMM could be used to predict physical occupancy, and provides guidance for proper design of gel-shift experiments. Based upon our predictions, new insights into NF-kappaB function and regulation and possible new biological roles of NF-kappaB were uncovered.


Subjects
Computational Biology/methods; Markov Chains; Transcription Factors/chemistry; Transcription Factors/metabolism; Algorithms; Base Sequence; Binding Sites; Molecular Sequence Data