Results 1 - 20 of 29
1.
J Neurosci ; 40(1): 171-190, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31694962

ABSTRACT

Origin and functions of intermittent transitions among sleep stages, including brief awakenings and arousals, constitute a challenge to the current homeostatic framework for sleep regulation, focusing on factors modulating sleep over large time scales. Here we propose that the complex micro-architecture characterizing sleep on scales of seconds and minutes results from intrinsic non-equilibrium critical dynamics. We investigate θ- and δ-wave dynamics in control rats and in rats where the sleep-promoting ventrolateral preoptic nucleus (VLPO) is lesioned (male Sprague-Dawley rats). We demonstrate that bursts in θ and δ cortical rhythms exhibit complex temporal organization, with long-range correlations and robust duality of power-law (θ-bursts, active phase) and exponential-like (δ-bursts, quiescent phase) duration distributions, features typical of non-equilibrium systems self-organizing at criticality. We show that such non-equilibrium behavior relates to anti-correlated coupling between θ- and δ-bursts, persists across a range of time scales, and is independent of the dominant physiologic state; indications of a basic principle in sleep regulation. Further, we find that VLPO lesions lead to a modulation of cortical dynamics resulting in altered dynamical parameters of θ- and δ-bursts and significant reduction in θ-δ coupling. Our empirical findings and model simulations demonstrate that θ-δ coupling is essential for the emerging non-equilibrium critical dynamics observed across the sleep-wake cycle, and indicate that VLPO neurons may have a dual role for both sleep and arousal/brief wake activation.
The uncovered critical behavior in sleep- and wake-related cortical rhythms indicates a mechanism essential for the micro-architecture of spontaneous sleep-stage and arousal transitions within a novel, non-homeostatic paradigm of sleep regulation. SIGNIFICANCE STATEMENT: We show that the complex micro-architecture of sleep-stage/arousal transitions arises from intrinsic non-equilibrium critical dynamics, connecting the temporal organization of dominant cortical rhythms with empirical observations across scales. We link such behavior to a sleep-promoting neuronal population, and demonstrate that VLPO lesion (a model of insomnia) alters dynamical features of θ and δ rhythms, and leads to significant reduction in θ-δ coupling. This indicates that VLPO neurons may have a dual role for both sleep and arousal/brief wake control. The reported empirical findings and modeling simulations constitute the first evidence of a neurophysiological fingerprint of self-organization and criticality in sleep- and wake-related cortical rhythms; a mechanism essential for spontaneous sleep-stage and arousal transitions that lays the basis for a novel, non-homeostatic paradigm of sleep regulation.


Subjects
Sleep/physiology , Wakefulness/physiology , Animals , Delta Rhythm , Electroencephalography , Male , Preoptic Area/injuries , Preoptic Area/physiology , Rats , Rats, Sprague-Dawley , Sleep Stages/physiology , Specific Pathogen-Free Organisms , Theta Rhythm
2.
Entropy (Basel) ; 24(1)2021 Dec 29.
Article in English | MEDLINE | ID: mdl-35052087

ABSTRACT

Detrended Fluctuation Analysis (DFA) has become a standard method to quantify the correlations and scaling properties of real-world complex time series. For a given scale ℓ of observation, DFA provides the function F(ℓ), which quantifies the fluctuations of the time series around the local trend, which is subtracted (detrended). If the time series exhibits scaling properties, then F(ℓ) ∼ ℓ^α asymptotically, and the scaling exponent α is typically estimated as the slope of a linear fit in the log F(ℓ) vs. log(ℓ) plot. In this way, α measures the strength of the correlations and characterizes the underlying dynamical system. However, in many cases, and especially in physiological time series, the scaling behavior is different at short and long scales, resulting in log F(ℓ) vs. log(ℓ) plots with two different slopes, α1 at short scales and α2 at large scales of observation. These two exponents are usually associated with the existence of different mechanisms that work at distinct time scales acting on the underlying dynamical system. Here, however, and since the power-law behavior of F(ℓ) is asymptotic, we question the use of α1 to characterize the correlations at short scales. To this end, we show first that, even for artificial time series with perfect scaling, i.e., with a single exponent α valid for all scales, DFA provides an α1 value that systematically overestimates the true exponent α. Second, when artificial time series with two different scaling exponents at short and large scales are considered, the α1 value provided by DFA not only can severely underestimate or overestimate the true short-scale exponent, but also depends on the value of the large-scale exponent.
This behavior should prevent the use of α1 to describe the scaling properties at short scales: if DFA is used in two time series with the same scaling behavior at short scales but very different scaling properties at large scales, very different values of α1 will be obtained, although the short scale properties are identical. These artifacts may lead to wrong interpretations when analyzing real-world time series: on the one hand, for time series with truly perfect scaling, the spurious value of α1 could lead to wrongly thinking that there exists some specific mechanism acting only at short time scales in the dynamical system. On the other hand, for time series with true different scaling at short and large scales, the incorrect α1 value would not characterize properly the short scale behavior of the dynamical system.
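
The procedure this abstract describes (integrate the series, subtract a local trend in windows of length ℓ, then read α off the slope of log F(ℓ) vs. log(ℓ)) can be sketched as follows. This is a minimal illustration and not the authors' code: linear detrending, non-overlapping windows, and the particular scales and series length are all assumptions made for the example.

```python
import math
import random

def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def dfa(series, scales):
    """Return F(l) for each window length l, using linear detrending."""
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for value in series:                      # integrated (cumulative) series
        total += value - mean
        profile.append(total)
    fluctuations = []
    for l in scales:
        t = list(range(l))
        squared_residuals = []
        # Non-overlapping windows of length l (an assumption of this sketch).
        for start in range(0, len(profile) - l + 1, l):
            window = profile[start:start + l]
            slope, intercept = linear_fit(t, window)
            squared_residuals.extend(
                (y - (slope * ti + intercept)) ** 2 for ti, y in zip(t, window)
            )
        fluctuations.append(math.sqrt(sum(squared_residuals) / len(squared_residuals)))
    return fluctuations

def scaling_exponent(scales, fluctuations):
    """Slope of log F(l) against log l."""
    slope, _ = linear_fit([math.log(l) for l in scales],
                          [math.log(f) for f in fluctuations])
    return slope

random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(8192)]  # uncorrelated test signal
scales = [16, 32, 64, 128, 256]                              # illustrative choice
alpha = scaling_exponent(scales, dfa(white_noise, scales))
print(f"alpha for white noise: {alpha:.2f}")
```

For uncorrelated noise the fitted exponent should come out close to 0.5. Fitting the slope separately over only the smallest scales is where, according to the abstract, the estimate α1 becomes unreliable.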

3.
Chaos ; 30(8): 083140, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32872793

ABSTRACT

The observable outputs of many complex dynamical systems consist of time series exhibiting autocorrelation functions of great diversity of behaviors, including long-range power-law autocorrelation functions, as a signature of interactions operating at many temporal or spatial scales. Often, numerical algorithms able to generate correlated noises reproducing the properties of real time series are used to study and characterize such systems. Typically, many of those algorithms produce a Gaussian time series. However, the real, experimentally observed time series are often non-Gaussian and may follow distributions with a diversity of behaviors concerning the support, the symmetry, or the tail properties. It is always possible to transform a correlated Gaussian time series into a time series with a different marginal distribution, but the question is how this transformation affects the behavior of the autocorrelation function. Here, we study analytically and numerically how the Pearson correlation of two Gaussian variables changes when the variables are transformed to follow a different destination distribution. Specifically, we consider bounded and unbounded distributions, symmetric and non-symmetric distributions, and distributions with different tail properties from decays faster than exponential to heavy-tail cases including power laws, and we find how these properties affect the correlation of the final variables. We extend these results to Gaussian time series that are transformed to have a different marginal distribution, and show how the autocorrelation function of the final non-Gaussian time series depends on the Gaussian correlations and on the final marginal distribution. As an application of our results, we propose how to generalize standard algorithms producing a Gaussian power-law correlated time series in order to create a synthetic time series with an arbitrary distribution and controlled power-law correlations.
Finally, we show a practical example of this algorithm by generating time series mimicking the marginal distribution and the power-law tail of the autocorrelation function of real time series: the absolute returns of stock prices.
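
The central question of this abstract, how the Pearson correlation of two Gaussian variables changes once they are mapped to another marginal distribution, can be illustrated numerically. The sketch below is not the authors' algorithm: it assumes a unit-rate exponential as the destination distribution and uses the usual CDF/inverse-CDF mapping; the correlation value 0.6 and the sample size are arbitrary.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def gaussian_to_exponential(z):
    """Map a standard Gaussian value to a unit-rate exponential value."""
    u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Gaussian CDF
    u = min(u, 1.0 - 1e-16)                          # guard against log(0)
    return -math.log1p(-u)                           # inverse CDF of Exp(1)

random.seed(1)
rho = 0.6                                            # illustrative Gaussian correlation
xs, ys = [], []
for _ in range(20000):
    a = random.gauss(0.0, 1.0)
    b = rho * a + math.sqrt(1.0 - rho * rho) * random.gauss(0.0, 1.0)
    xs.append(a)
    ys.append(b)

r_gauss = pearson(xs, ys)
r_exp = pearson([gaussian_to_exponential(v) for v in xs],
                [gaussian_to_exponential(v) for v in ys])
print(f"Gaussian: {r_gauss:.3f}  exponential marginals: {r_exp:.3f}")
```

Because the mapping is monotonic, rank order is preserved, yet the linear correlation of the transformed pair is smaller than that of the Gaussian pair. This is why a generator targeting a given autocorrelation in the final series has to account for the destination distribution.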

4.
Nucleic Acids Res ; 45(D1): D97-D103, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794041

ABSTRACT

The 2017 update of NGSmethDB stores whole genome methylomes generated from short-read data sets obtained by bisulfite sequencing (WGBS) technology. To generate high-quality methylomes, stringent quality controls were integrated with third-party software, adding also a two-step mapping process to exploit the advantages of the new genome assembly models. The samples were all profiled under constant parameter settings, thus enabling comparative downstream analyses. Besides a significant increase in the number of samples, NGSmethDB now includes two additional data-types, which are a valuable resource for the discovery of methylation epigenetic biomarkers: (i) differentially methylated single-cytosines; and (ii) methylation segments (i.e. genome regions of homogeneous methylation). The NGSmethDB back-end is now based on MongoDB, a NoSQL hierarchical database using JSON-formatted documents and dynamic schemas, thus accelerating sample comparative analyses. Besides conventional database dumps, track hubs were implemented, which improved database access, visualization in genome browsers and comparative analyses to third-party annotations. In addition, the database can also be accessed through a RESTful API. Lastly, a Python client and a multiplatform virtual machine allow for program-driven access from the user's desktop. This way, private methylation data can be compared to NGSmethDB without the need to upload them to public servers. Database website: http://bioinfo2.ugr.es/NGSmethDB.


Subjects
DNA Methylation , Databases, Nucleic Acid , Animals , Cytosine/metabolism , Genome , Humans
5.
Chaos ; 29(12): 123114, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31893647

ABSTRACT

Despite the widespread diffusion of nonlinear methods for heart rate variability (HRV) analysis, the presence and the extent to which nonlinear dynamics contribute to short-term HRV are still controversial. This work aims at testing the hypothesis that different types of nonlinearity can be observed in HRV depending on the method adopted and on the physiopathological state. Two entropy-based measures of time series complexity (normalized complexity index, NCI) and regularity (information storage, IS), and a measure quantifying deviations from linear correlations in a time series (Gaussian linear contrast, GLC), are applied to short HRV recordings obtained in young (Y) and old (O) healthy subjects and in myocardial infarction (MI) patients monitored in the resting supine position and in the upright position reached through head-up tilt. The method of surrogate data is employed to detect the presence and quantify the contribution of nonlinear dynamics to HRV. We find that the three measures differ both in their variations across groups and conditions and in the percentage and strength of nonlinear HRV dynamics. NCI and IS displayed opposite variations, suggesting more complex dynamics in O and MI compared to Y and less complex dynamics during tilt. The strength of nonlinear dynamics is reduced by tilt using all measures in Y, while only GLC detects a significant strengthening of such dynamics in MI. A large percentage of detected nonlinear dynamics is revealed only by the IS measure in the Y group at rest, with a decrease in O and MI and during T, while NCI and GLC detect lower percentages in all groups and conditions. While these results suggest that distinct dynamic structures may lie beneath short-term HRV in different physiological states and pathological conditions, the strong dependence on the measure adopted and on their implementation suggests that physiological interpretations should be provided with caution.


Subjects
Heart Rate/physiology , Nonlinear Dynamics , Adult , Entropy , Female , Humans , Male , Middle Aged , Time Factors
6.
Biology (Basel) ; 12(6)2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37372134

ABSTRACT

As the genome carries the historical information of a species' biotic and environmental interactions, analyzing changes in genome structure over time by using powerful statistical physics methods (such as entropic segmentation algorithms, fluctuation analysis in DNA walks, or measures of compositional complexity) provides valuable insights into genome evolution. Nucleotide frequencies tend to vary along the DNA chain, resulting in a hierarchically patchy chromosome structure with heterogeneities at different length scales that range from a few nucleotides to tens of millions of them. Fluctuation analysis reveals that these compositional structures can be classified into three main categories: (1) short-range heterogeneities (below a few kilobase pairs (Kbp)) primarily attributed to the alternation of coding and noncoding regions, interspersed or tandem repeats densities, etc.; (2) isochores, spanning tens to hundreds of tens of Kbp; and (3) superstructures, reaching sizes of tens of megabase pairs (Mbp) or even larger. The obtained isochore and superstructure coordinates in the first complete T2T human sequence are now shared in a public database. In this way, interested researchers can use T2T isochore data, as well as the annotations for different genome elements, to check a specific hypothesis about genome structure. Similarly to other levels of biological organization, a hierarchical compositional structure is prevalent in the genome. Once the compositional structure of a genome is identified, various measures can be derived to quantify the heterogeneity of such structure. The distribution of segment G+C content has recently been proposed as a new genome signature that proves to be useful for comparing complete genomes. Another meaningful measure is the sequence compositional complexity (SCC), which has been used for genome structure comparisons. 
Lastly, we review the recent genome comparisons in species of the ancient phylum Cyanobacteria, conducted by phylogenetic regression of SCC against time, which have revealed positive trends towards higher genome complexity. These findings provide the first evidence for a driven progressive evolution of genome compositional structure.

7.
J Theor Biol ; 297: 127-36, 2012 Mar 21.
Article in English | MEDLINE | ID: mdl-22226985

ABSTRACT

Relevant words in literary texts (key words) are known to be clustered, while common words are randomly distributed. Given the clustered distribution of many functional genome elements, we hypothesize that the biological text par excellence, the DNA sequence, might behave in the same way: k-length words (k-mers) with a clear function may be spatially clustered along the one-dimensional chromosome sequence, while less-important, non-functional words may be randomly distributed. To explore this linguistic analogy, we calculate a clustering coefficient for each k-mer (k=2-9bp) in human and mouse chromosome sequences, and then check whether clustered words are enriched in the functional part of the genome. First, we found a positive general trend relating clustering level and word enrichment within exons and Transcription Factor Binding Sites (TFBSs), while a much weaker relation exists for repeats, and no relation at all exists for introns. Second, we found that 38.45% of the 200 top-clustered 8-mers, but only 7.70% of the non-clustered words, are represented in known motif databases. Third, enrichment/depletion experiments show that highly clustered words are significantly enriched in exons and TFBSs, while they are depleted in introns and repetitive DNA. Considering exons and TFBSs together, 1417 (or 72.26%) in human and 1385 (or 72.97%) in mouse of the top-clustered 8-mers showed a statistically significant association to either exons or TFBSs, thus strongly supporting the link between word clustering and biological function. Lastly, we identified a subset of clustered, diagnostic words that are enriched in exons but depleted in introns, and therefore might help to discriminate between these two gene regions. The clustering of DNA words thus appears as a novel principle to detect functionality in genome sequences.
As evolutionary conservation is not a prerequisite, the proof of principle described here may open new ways to detect species-specific functional DNA sequences and to improve gene and promoter predictions, thus contributing to the quest for function in the genome.


Subjects
DNA/genetics , Models, Genetic , Algorithms , Animals , Base Sequence , Binding Sites/genetics , Cluster Analysis , Exons/genetics , Humans , Introns/genetics , Linguistics , Mice , Species Specificity , Transcription Factors/genetics
8.
Physica A ; 390(23-24): 4057-4072, 2011 Nov 01.
Article in English | MEDLINE | ID: mdl-25392599

ABSTRACT

We investigate how various coarse-graining (signal quantization) methods affect the scaling properties of long-range power-law correlated and anti-correlated signals, quantified by the detrended fluctuation analysis. Specifically, for coarse-graining in the magnitude of a signal, we consider (i) the Floor, (ii) the Symmetry and (iii) the Centro-Symmetry coarse-graining methods. We find that for anti-correlated signals coarse-graining in the magnitude leads to a crossover to random behavior at large scales, and that as the width Δ of the coarse-graining partition interval increases, this crossover moves to intermediate and small scales. In contrast, the scaling of positively correlated signals is less affected by the coarse-graining, with no observable changes when Δ < 1, while for Δ > 1 a crossover appears at small scales and moves to intermediate and large scales with increasing Δ. For very rough coarse-graining (Δ > 3) based on the Floor and Symmetry methods, the position of the crossover stabilizes, in contrast to the Centro-Symmetry method where the crossover continuously moves across scales and leads to a random behavior at all scales; thus indicating a much stronger effect of the Centro-Symmetry compared to the Floor and the Symmetry method. For coarse-graining in time, where data points are averaged in non-overlapping time windows, we find that the scaling for both anti-correlated and positively correlated signals is practically preserved. The results of our simulations are useful for the correct interpretation of the correlation and scaling properties of symbolic sequences.
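
The two kinds of coarse-graining named in this abstract can be sketched in a few lines. The abstract does not define the Floor, Symmetry or Centro-Symmetry methods precisely, so the magnitude quantizer below, which rounds each value down to a grid of width Δ, is only an assumed reading of the Floor method; the time coarse-graining follows the abstract's description of averaging in non-overlapping windows. Function names and sample values are hypothetical.

```python
import math

def coarse_grain_magnitude(signal, delta):
    """Quantize each value onto a grid of width delta by rounding down.

    Assumed interpretation of a 'Floor'-type method; the paper's exact
    definition is not given in the abstract.
    """
    return [math.floor(value / delta) * delta for value in signal]

def coarse_grain_time(signal, window):
    """Average the signal in consecutive non-overlapping windows."""
    return [sum(signal[i:i + window]) / window
            for i in range(0, len(signal) - window + 1, window)]

signal = [0.2, 1.7, -0.4, 2.9, 3.1, -1.2, 0.8, 0.1]   # made-up sample values
floored = coarse_grain_magnitude(signal, 1.0)
averaged = coarse_grain_time(signal, 2)
print(floored)
print(averaged)
```

Magnitude quantization discards small fluctuations below Δ, whereas time averaging shortens the series by the window factor; the abstract reports that only the former substantially alters the measured scaling.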

9.
Sci Rep ; 10(1): 19073, 2020 11 04.
Article in English | MEDLINE | ID: mdl-33149190

ABSTRACT

Progressive evolution, or the tendency towards increasing complexity, is a controversial issue in biology, whose resolution entails a proper measurement of complexity. Genomes are the best entities to address this challenge, as they encode the historical information of a species' biotic and environmental interactions. As a case study, we have measured genome sequence complexity in the ancient phylum Cyanobacteria. To arrive at an appropriate measure of genome sequence complexity, we have chosen metrics that do not decipher biological functionality but that show strong phylogenetic signal. Using a ridge regression of those metrics against root-to-tip distance, we detected positive trends towards higher complexity in three of them. Lastly, we applied three standard tests to detect if progressive evolution is passive or driven: the minimum, ancestor-descendant, and sub-clade tests. These results provide evidence for driven progressive evolution at the genome-level in the phylum Cyanobacteria.


Subjects
Cyanobacteria/genetics , Evolution, Molecular , Genome, Bacterial , Cyanobacteria/classification , Phylogeny
10.
BMC Evol Biol ; 8: 107, 2008 Apr 11.
Article in English | MEDLINE | ID: mdl-18405379

ABSTRACT

BACKGROUND: The phylogenetic distribution of large-scale genome structure (i.e. mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences, and the recently developed computational methods to directly analyze patchiness on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. RESULTS: The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, through shuffling experiments of selected genome regions, computationally-identified, isochore-like regions were identified as the biological source for the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana) and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. CONCLUSION: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.


Subjects
Genome/genetics , Isochores/genetics , Phylogeny , Animals , Arabidopsis/genetics , Computational Biology , Dogs , Genome, Fungal/genetics , Genome, Human/genetics , Genome, Plant/genetics , Humans , Mice , Pan troglodytes/genetics , Rats , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA , Species Specificity
11.
Physiol Meas ; 39(8): 084008, 2018 08 31.
Article in English | MEDLINE | ID: mdl-30091423

ABSTRACT

OBJECTIVE: In this work we analyze differences in nonlinear properties between rest and exercise and also study the permanent effects of physical exercise on heart rate dynamics. APPROACH: It has been shown that physical exercise alters heart dynamics by increasing heart rate and decreasing variability, modifying spectral power and linear correlations, etc. We hypothesize that physical exercise should also reduce nonlinearity in the heartbeat time series. To quantify nonlinearity in the heartbeat time series, we use an index of nonlinearity recently proposed by Bernaola et al. based on correlations of the magnitude time series. MAIN RESULTS: Our results confirm our initial hypothesis of loss of nonlinearity during physical exercise. Moreover, regarding the permanent effects of physical exercise on heart rate dynamics, we also find that aerobic physical training tends to increase nonlinearity in heart dynamics during rest. SIGNIFICANCE: It is well-known that heart dynamics are controlled by complex interactions between the sympathetic and parasympathetic branches of the autonomic nervous system. Moreover, these two branches act in a competing way, resulting in a clear parasympathetic withdrawal and sympathetic activation during physical exercise. We associate these interactions during physical exercise with a drastic loss of nonlinear properties in the heartbeat time series, revealing the importance of nonlinearity measures in the study of complex systems.


Subjects
Exercise/physiology , Heart/physiology , Nonlinear Dynamics , Rest/physiology , Adult , Heart Rate , Humans , Male
12.
Phys Rev E ; 96(3-1): 032218, 2017 Sep.
Article in English | MEDLINE | ID: mdl-29347013

ABSTRACT

The correlation properties of the magnitude of a time series are associated with nonlinear and multifractal properties and have been applied in a great variety of fields. Here we have obtained the analytical expression of the autocorrelation of the magnitude series (C_{|x|}) of a linear Gaussian noise as a function of its autocorrelation (C_{x}). For both models and natural signals, the deviation of C_{|x|} from its expectation in linear Gaussian noises can be used as an index of nonlinearity that can be applied to relatively short records and does not require the presence of scaling in the time series under study. In a model of artificial Gaussian multifractal signal we use this approach to analyze the relation between nonlinearity and multifractality and show that the former implies the latter but the reverse is not true. We also apply this approach to analyze experimental data: heart-beat records during rest and moderate exercise. For each individual subject, we observe higher nonlinearities during rest. This behavior also holds on average for the analyzed set of 10 semiprofessional soccer players. This result agrees with the fact that other measures of complexity are dramatically reduced during exercise and can shed light on its relationship with the withdrawal of parasympathetic tone and/or the activation of sympathetic activity during physical activity.
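
The abstract does not reproduce the analytical expression relating C_{|x|} to C_{x}. As a hedged illustration, the sketch below uses the standard textbook result for two zero-mean, unit-variance jointly Gaussian variables with correlation c, namely corr(|x|, |y|) = 2(sqrt(1 - c^2) + c·arcsin(c) - 1)/(π - 2), and checks it against a Monte Carlo estimate. Whether this matches the form given in the paper is an assumption; the function name and test values are illustrative.

```python
import math
import random

def magnitude_correlation(c):
    """Correlation of |x| and |y| for zero-mean jointly Gaussian x, y
    with correlation c (standard bivariate-Gaussian result, assumed here)."""
    return 2.0 * (math.sqrt(1.0 - c * c) + c * math.asin(c) - 1.0) / (math.pi - 2.0)

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(2)
c = 0.7                                   # illustrative Gaussian correlation
abs_x, abs_y = [], []
for _ in range(50000):
    a = random.gauss(0.0, 1.0)
    b = c * a + math.sqrt(1.0 - c * c) * random.gauss(0.0, 1.0)
    abs_x.append(abs(a))
    abs_y.append(abs(b))

expected = magnitude_correlation(c)
observed = pearson(abs_x, abs_y)
print(f"expected {expected:.3f}, observed {observed:.3f}")
```

In the spirit of the abstract, an index of nonlinearity would compare the magnitude correlation measured in a signal with the value expected for a linear Gaussian process having the same C_{x}; a systematic gap between the two points to nonlinearity.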


Subjects
Fractals , Models, Theoretical , Nonlinear Dynamics , Athletes , Heart Rate , Humans , Male , Rest/physiology , Running/physiology , Soccer , Time Factors , Young Adult
13.
Nucleic Acids Res ; 32(Web Server issue): W287-92, 2004 Jul 01.
Article in English | MEDLINE | ID: mdl-15215396

ABSTRACT

Isochores are long genome segments homogeneous in G+C. Here, we describe an algorithm (IsoFinder) running on the web (http://bioinfo2.ugr.es/IsoF/isofinder.html) able to predict isochores at the sequence level. We move a sliding pointer from left to right along the DNA sequence. At each position of the pointer, we compute the mean G+C values to the left and to the right of the pointer. We then determine the position of the pointer for which the difference between left and right mean values (as measured by the t-statistic) reaches its maximum. Next, we determine the statistical significance of this potential cutting point, after filtering out short-scale heterogeneities below 3 kb by applying a coarse-graining technique. Finally, the program checks whether this significance exceeds a probability threshold. If so, the sequence is cut at this point into two subsequences; otherwise, the sequence remains undivided. The procedure continues recursively for each of the two resulting subsequences created by each cut. This leads to the decomposition of a chromosome sequence into long homogeneous genome regions (LHGRs) with well-defined mean G+C contents, each significantly different from the G+C contents of the adjacent LHGRs. Most LHGRs can be identified with Bernardi's isochores, given their correlation with biological features such as gene density, SINE and LINE (short, long interspersed repetitive elements) densities, recombination rate or single nucleotide polymorphism variability. The resulting isochore maps are available at our web site (http://bioinfo2.ugr.es/isochores/), and also at the UCSC Genome Browser (http://genome.cse.ucsc.edu/).
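
The recursive procedure described in this abstract (slide a pointer, compare left and right mean G+C with a t-statistic, cut at the maximum if it is significant, then repeat on each half) can be sketched as below. This is a simplified illustration and not the IsoFinder program: it works on a list of per-window G+C values, uses a pooled-variance t-statistic, and replaces the significance test and the 3 kb coarse-graining filter with a plain threshold `t_threshold`, which is a hypothetical stand-in.

```python
import math

def t_statistic(left, right):
    """Pooled-variance Student t-statistic for the difference of two means."""
    n1, n2 = len(left), len(right)
    m1, m2 = sum(left) / n1, sum(right) / n2
    ss1 = sum((v - m1) ** 2 for v in left)
    ss2 = sum((v - m2) ** 2 for v in right)
    pooled = (ss1 + ss2) / (n1 + n2 - 2)
    if pooled == 0.0:
        return 0.0 if m1 == m2 else math.inf
    return (m1 - m2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))

def best_cut(gc_values, min_size=2):
    """Pointer position maximizing |t| between the left and right parts."""
    best_pos, best_t = None, 0.0
    for pos in range(min_size, len(gc_values) - min_size + 1):
        t = abs(t_statistic(gc_values[:pos], gc_values[pos:]))
        if t > best_t:
            best_pos, best_t = pos, t
    return best_pos, best_t

def segment(gc_values, t_threshold, offset=0):
    """Recursively cut the sequence while the best cut exceeds the threshold.

    Returns a list of (start, end) index pairs, end exclusive.
    """
    pos, t = best_cut(gc_values)
    if pos is None or t < t_threshold:
        return [(offset, offset + len(gc_values))]
    return (segment(gc_values[:pos], t_threshold, offset)
            + segment(gc_values[pos:], t_threshold, offset + pos))

# Made-up G+C fractions with one compositional change in the middle.
gc = [0.40, 0.42, 0.39, 0.41, 0.40, 0.41, 0.55, 0.57, 0.56, 0.54, 0.56, 0.55]
segments = segment(gc, t_threshold=5.0)
print(segments)
```

With these values the only cut that clears the threshold is the one between the low-G+C and high-G+C halves, so the sequence is decomposed into two homogeneous regions and neither half is split further.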


Subjects
Computational Biology , Genomics , Isochores/chemistry , Software , Algorithms , Computer Graphics , Internet , Major Histocompatibility Complex , User-Computer Interface
14.
Phys Rev E ; 93: 042201, 2016 04.
Article in English | MEDLINE | ID: mdl-27176287

ABSTRACT

We systematically study the scaling properties of the magnitude and sign of the fluctuations in correlated time series, which is a simple and useful approach to distinguish between systems with different dynamical properties but the same linear correlations. First, we decompose artificial long-range power-law linearly correlated time series into magnitude and sign series derived from the consecutive increments in the original series, and we study their correlation properties. We find analytical expressions for the correlation exponent of the sign series as a function of the exponent of the original series. Such expressions are necessary for modeling surrogate time series with desired scaling properties. Next, we study linear and nonlinear correlation properties of series composed as products of independent magnitude and sign series. These surrogate series can be considered as a zero-order approximation to the analysis of the coupling of magnitude and sign in real data, a problem still open in many fields. We find analytical results for the scaling behavior of the composed series as a function of the correlation exponents of the magnitude and sign series used in the composition, and we determine the ranges of magnitude and sign correlation exponents leading to either single scaling or to crossover behaviors. Finally, we obtain how the linear and nonlinear properties of the composed series depend on the correlation exponents of their magnitude and sign series. Based on this information we propose a method to generate surrogate series with controlled correlation exponent and multifractal spectrum.
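
The decomposition this abstract starts from, splitting the consecutive increments of a series into a magnitude series and a sign series and recombining them as a product, is simple enough to show directly. The sketch below is only illustrative; in particular, assigning sign 0 to a zero increment is a convention assumed here, and the sample values are made up.

```python
def increments(series):
    """Consecutive differences x[i+1] - x[i]."""
    return [b - a for a, b in zip(series, series[1:])]

def magnitude_and_sign(series):
    """Split the increments of a series into magnitude and sign series."""
    diffs = increments(series)
    magnitude = [abs(d) for d in diffs]
    sign = [(d > 0) - (d < 0) for d in diffs]   # +1, -1, or 0 for no change
    return magnitude, sign

def compose(magnitude, sign):
    """Element-wise product of a magnitude series and a sign series."""
    return [m * s for m, s in zip(magnitude, sign)]

series = [1.0, 1.5, 1.2, 1.2, 2.0]              # made-up sample values
magnitude, sign = magnitude_and_sign(series)
recomposed = compose(magnitude, sign)
print(sign)
```

Multiplying a magnitude series by its own sign series recovers the original increments. The surrogate series discussed in the abstract instead pair a magnitude series with an independently generated sign series, so that the correlations of each component can be controlled separately.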


Subjects
Linear Models , Nonlinear Dynamics , Algorithms , Fourier Analysis , Fractals , Time Factors
15.
Phys Rev E ; 94(5-1): 052302, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27967154

ABSTRACT

Symbolic sequences have been extensively investigated in the past few years within the framework of statistical physics. Paradigmatic examples of such sequences are written texts, and deoxyribonucleic acid (DNA) and protein sequences. In these examples, the spatial distribution of a given symbol (a word, a DNA motif, an amino acid) is a key property usually related to the symbol importance in the sequence: The more uneven and far from random the symbol distribution, the higher the relevance of the symbol to the sequence. Thus, many techniques of analysis measure in some way the deviation of the symbol spatial distribution with respect to the random expectation. The problem is then to know the spatial distribution corresponding to randomness, which is typically considered to be either the geometric or the exponential distribution. However, these distributions are only valid for very large symbolic sequences and for many occurrences of the analyzed symbol. Here, we obtain analytically the exact, randomly expected spatial distribution valid for any sequence length and any symbol frequency, and we study its main properties. The knowledge of the distribution allows us to define a measure able to properly quantify the deviation from randomness of the symbol distribution, especially for short sequences and low symbol frequency. We apply the measure to the problem of keyword detection in written texts and to study amino acid clustering in protein sequences. In texts, we show how the results improve with respect to previous methods when short texts are analyzed. In proteins, which are typically short, we show how the measure quantifies unambiguously the amino acid clustering and characterizes its spatial distribution.


Subjects
Amino Acids/chemistry , Computational Biology/methods , Models, Theoretical , Probability , Algorithms , Amino Acid Sequence , Cluster Analysis , Periodicity , Proteins/chemistry , Sequence Analysis
16.
Phys Rev E Stat Nonlin Soft Matter Phys ; 71(1 Pt 1): 011104, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15697577

ABSTRACT

When investigating the dynamical properties of complex multiple-component physical and physiological systems, it is often the case that the measurable system's output does not directly represent the quantity we want to probe in order to understand the underlying mechanisms. Instead, the output signal is often a linear or nonlinear function of the quantity of interest. Here, we investigate how various linear and nonlinear transformations affect the correlation and scaling properties of a signal, using the detrended fluctuation analysis (DFA) which has been shown to accurately quantify power-law correlations in nonstationary signals. Specifically, we study the effect of three types of transforms: (i) linear (y_i = a x_i + b), (ii) nonlinear polynomial (y_i = a x_i^k), and (iii) nonlinear logarithmic [y_i = log(x_i + Δ)] filters. We compare the correlation and scaling properties of signals before and after the transform. We find that linear filters do not change the correlation properties, while the effect of nonlinear polynomial and logarithmic filters strongly depends on (a) the strength of correlations in the original signal, (b) the power k of the polynomial filter, and (c) the offset Δ in the logarithmic filter. We further apply the DFA method to investigate the "apparent" scaling of three analytic functions: (i) exponential [exp(±x + a)], (ii) logarithmic [log(x + a)], and (iii) power law [(x + a)^λ], which are often encountered as trends in physical and biological processes. While these three functions have different characteristics, we find that there is a broad range of values for parameter a common for all three functions, where the slope of the DFA curves is identical. We further note that the DFA results obtained for a class of other analytic functions can be reduced to these three typical cases.
We systematically test the performance of the DFA method when estimating long-range power-law correlations in the output signals for different parameter values in the three types of filters and the three analytic functions we consider.
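
The claim that a linear filter leaves the scaling unchanged can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes first-order DFA (linear detrending in non-overlapping boxes), a white-noise test signal, and illustrative values a = 3, b = 5; the function names and box sizes are hypothetical choices made here.

```python
import math
import random


def dfa_fluctuation(signal, scales):
    """Return the DFA-1 fluctuation F(n) for each box size n in `scales`."""
    mean = sum(signal) / len(signal)
    # Integrated profile of the mean-subtracted signal.
    profile = []
    total = 0.0
    for value in signal:
        total += value - mean
        profile.append(total)

    fluctuations = []
    for n in scales:
        n_boxes = len(profile) // n
        # Closed-form least-squares line over x = 0 .. n-1 in each box.
        x_mean = (n - 1) / 2.0
        sxx = sum((i - x_mean) ** 2 for i in range(n))
        squared_residuals = 0.0
        for b in range(n_boxes):
            box = profile[b * n:(b + 1) * n]
            y_mean = sum(box) / n
            slope = sum((i - x_mean) * (y - y_mean)
                        for i, y in enumerate(box)) / sxx
            squared_residuals += sum(
                (y - y_mean - slope * (i - x_mean)) ** 2
                for i, y in enumerate(box))
        fluctuations.append(math.sqrt(squared_residuals / (n_boxes * n)))
    return fluctuations


def scaling_exponent(scales, fluctuations):
    """Least-squares slope of log F(n) versus log n."""
    lx = [math.log(n) for n in scales]
    ly = [math.log(f) for f in fluctuations]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(4096)]  # uncorrelated test signal
y = [3.0 * v + 5.0 for v in x]                     # linear filter, a=3, b=5 (assumed)
scales = [8, 16, 32, 64, 128]
alpha_x = scaling_exponent(scales, dfa_fluctuation(x, scales))
alpha_y = scaling_exponent(scales, dfa_fluctuation(y, scales))
print(alpha_x, alpha_y)
```

Because the offset b is removed with the mean and the factor a only rescales F(n), the two fitted slopes agree up to floating-point error. The polynomial and logarithmic filters from the abstract can be tried by changing the single line that defines y.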


Subjects
Algorithms , Biological Models , Statistical Models , Nonlinear Dynamics , Animals , Computer Simulation , Humans , Statistics as Topic
17.
Gene ; 333: 121-33, 2004 May 26.
Article in English | MEDLINE | ID: mdl-15177687

ABSTRACT

The sequencing of prokaryotic genomes covering a wide taxonomic range has sparked renewed interest in intrachromosomal compositional (GC) heterogeneity, largely in view of lateral transfers. We present here a brief overview of some methods for visualizing and quantifying GC variation in prokaryotes. We used these methods to examine heterogeneity levels in sequenced prokaryotes, for a range of scales or stringencies. Some species are consistently homogeneous, whereas others are markedly heterogeneous in comparison, in particular Aeropyrum pernix, Xylella fastidiosa, Mycoplasma genitalium, Enterococcus faecalis, Bacillus subtilis, Pyrobaculum aerophilum, Vibrio vulnificus chromosome I, Deinococcus radiodurans chromosome II and Halobacterium. As we discuss here, the wide range of heterogeneities calls for reexamination of an accepted belief, namely that the endogenous DNA of bacteria and archaea should typically exhibit low intrachromosomal GC contrasts. Supplementary results for all species analyzed are available at our website: http://bioinfo2.ugr.es/prok.


Subjects
Base Composition/genetics , Bacterial DNA/genetics , Bacterial Genome , Algorithms , Base Pairing/genetics , Density Gradient Centrifugation , Cesium , Chlorides , Archaeal Chromosomes/genetics , Bacterial Chromosomes/genetics , Codon/genetics , Archaeal DNA/chemistry , Archaeal DNA/genetics , Bacterial DNA/chemistry , Archaeal Genome , Isochores/genetics
18.
Gene ; 300(1-2): 117-27, 2002 Oct 30.
Article in English | MEDLINE | ID: mdl-12468093

ABSTRACT

The human genome is a mosaic of isochores, which are long DNA segments (≫300 kbp) relatively homogeneous in G+C. Human isochores were first identified by density-gradient ultracentrifugation of bulk DNA, and differ in important features, e.g. genes are found predominantly in the GC-richest isochores. Here, we use a reliable segmentation method to partition the longest contigs in the human genome draft sequence into long homogeneous genome regions (LHGRs), thereby revealing the isochore structure of the human genome. The advantages of the isochore maps presented here are: (1) sequence heterogeneities at different scales are shown in the same plot; (2) pair-wise compositional differences between adjacent regions are all statistically significant; (3) isochore boundaries are accurately defined to single base pair resolution; and (4) both gradual and abrupt isochore boundaries are simultaneously revealed. Taking advantage of the wide sample of genome sequence analyzed, we investigate the correspondence between LHGRs and true human isochores revealed through DNA centrifugation. LHGRs show many of the typical isochore features, mainly size distribution, G+C range, and proportions of the isochore classes. The relative density of genes, Alu and long interspersed nuclear element repeats and the different types of single nucleotide polymorphisms on LHGRs also coincide with expectations in true isochores. Potential applications of isochore maps range from the improvement of gene-finding algorithms to the prediction of linkage disequilibrium levels in association studies between marker genes and complex traits. The coordinates for the LHGRs identified in all the contigs longer than 2 Mb in the human genome sequence are available at the online resource on isochore mapping: http://bioinfo2.ugr.es/isochores.


Subjects
Human Genome , Isochores/genetics , Alu Elements/genetics , Base Composition , Chromosome Mapping , Human Chromosomes Pair 21/genetics , Human Chromosomes Pair 22/genetics , DNA/chemistry , DNA/genetics , Genes/genetics , Humans , Long Interspersed Nucleotide Elements/genetics , Single Nucleotide Polymorphism/genetics
19.
Comput Biol Chem ; 27(1): 5-10, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12798034

ABSTRACT

The isochore concept in the human genome sequence was challenged in an analysis by the International Human Genome Sequencing Consortium (IHGSC). We argue here that a statement in the IHGSC's analysis concerning the existence of isochores is misleading, because the homogeneity was not examined at a large enough length scale and consequently an inappropriate statistical test was applied. A test of the existence of isochores should be equivalent to a test of homogeneity or equality of windowed GC%. The statistical test applied in the IHGSC's analysis, the binomial test, is a test of whether individual bases are independent and identically distributed (iid). For testing the existence of isochores, or homogeneity in windowed GC%, we propose to use another statistical test: the analysis of variance (ANOVA). It can be shown that DNA sequences that are rejected by the binomial test may not be rejected by the ANOVA test.
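
As a rough illustration of testing homogeneity in windowed GC%, the sketch below computes a one-way ANOVA F statistic over groups of windows. It is not the procedure from the paper: the synthetic sequences, the 100-base window, the split into two regions, and all function names are assumptions made here, and no p-value is computed.

```python
import random


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq, window):
    """GC% of consecutive non-overlapping windows of length `window`."""
    return [gc_percent(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def random_dna(length, gc_fraction):
    """Synthetic iid sequence with the given expected GC fraction."""
    return "".join(
        random.choice("GC") if random.random() < gc_fraction
        else random.choice("AT")
        for _ in range(length))


random.seed(1)
window = 100  # assumed window size
# Two regions with different GC content (heterogeneous case).
hetero = [windowed_gc(random_dna(2000, 0.4), window),
          windowed_gc(random_dna(2000, 0.6), window)]
# Two regions with the same GC content (homogeneous case).
homo = [windowed_gc(random_dna(2000, 0.5), window),
        windowed_gc(random_dna(2000, 0.5), window)]
f_hetero = anova_f(hetero)
f_homo = anova_f(homo)
print(f_hetero, f_homo)
```

A large F indicates that mean GC% differs between regions by more than the window-to-window scatter within them. To turn F into a significance level, it would be compared against the F distribution with (k - 1, N - k) degrees of freedom.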


Subjects
Isochores/chemistry , Analysis of Variance , Base Composition , Binomial Distribution , Computational Biology/methods , Computational Biology/statistics & numerical data , CpG Islands , GC Rich Sequence , Human Genome , Humans , Statistical Models , DNA Sequence Analysis/statistics & numerical data
20.
Phys Rev E Stat Nonlin Soft Matter Phys ; 65(5 Pt 1): 051909, 2002 May.
Article in English | MEDLINE | ID: mdl-12059595

ABSTRACT

Genomic DNA is fragmented into segments using the Jensen-Shannon divergence. Use of this criterion results in the fragments being entropically homogeneous to within a predefined level of statistical significance. Application of this procedure is made to complete genomes of organisms from archaebacteria, eubacteria, and eukaryotes. The distribution of fragment lengths in bacterial and primitive eukaryotic DNAs shows two distinct regimes of power-law scaling. The characteristic length separating these two regimes appears to be an intrinsic property of the sequence rather than a finite-size artifact, and is independent of the significance level used in segmenting a given genome. Fragment length distributions obtained in the segmentation of the genomes of more highly evolved eukaryotes do not have such distinct regimes of power-law behavior.
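
The core step of such a segmentation can be sketched as follows, under stated assumptions: the sketch finds only the single cut point that maximizes the length-weighted Jensen-Shannon divergence between the left and right parts, and it omits the significance test and the recursive application that the abstract describes. The function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy in bits of a base-count table."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_cut(seq):
    """Return (position, divergence) of the cut maximizing the weighted
    Jensen-Shannon divergence between seq[:position] and seq[position:]."""
    n = len(seq)
    whole = Counter(seq)
    h_whole = entropy(whole, n)
    left = Counter()
    right = Counter(whole)
    best_pos, best_div = None, -1.0
    for i in range(1, n):
        # Move one base from the right part to the left part.
        base = seq[i - 1]
        left[base] += 1
        right[base] -= 1
        div = (h_whole
               - (i / n) * entropy(left, i)
               - ((n - i) / n) * entropy(right, n - i))
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div


# Toy sequence: an AT-only half followed by a GC-only half.
toy = "AT" * 50 + "GC" * 50
position, divergence = best_cut(toy)
print(position, divergence)
```

In a full segmentation, a cut would be accepted only if its divergence passes the chosen significance level, and the procedure would then be repeated on each resulting fragment until no further significant cut is found.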
