ABSTRACT
BACKGROUND: Genomes are inherently inhomogeneous, with features such as base composition, recombination, gene density, and gene expression varying along chromosomes. Evolutionary, biological, and biomedical analyses aim to quantify this variation, account for it during inference procedures, and ultimately determine the causal processes behind it. Since sequential observations along chromosomes are not independent, it is unsurprising that autocorrelation patterns have been observed, e.g., in human base composition. In this article, we develop a class of Hidden Markov Models (HMMs) called oHMMed (ordered HMM with emission densities; the corresponding R package of the same name is available on CRAN). These models identify the number of comparably homogeneous regions within autocorrelated observed sequences. The regions are modelled as discrete hidden states; the observed data points are realisations of continuous probability distributions with state-specific means that enable ordering of these distributions. The observed sequence is labelled according to the hidden states, with transitions permitted only between states that are neighbours within the ordering of their associated distributions. The parameters that characterise these state-specific distributions are inferred.
RESULTS: We apply our oHMMed algorithms to the proportion of G and C bases (modelled as a mixture of normal distributions) and the number of genes (modelled as a mixture of Poisson-gamma distributions) in windows along the human, mouse, and fruit fly genomes. This results in a partitioning of the genomes into regions by statistically distinguishable averages of these features, and in a characterisation of their continuous patterns of variation. With regard to the genomic G and C proportion, this latter result distinguishes oHMMed from segmentation algorithms based on isochore or compositional domain theory. We further use oHMMed to conduct a detailed analysis of variation of chromatin accessibility (ATAC-seq) and epigenetic markers H3K27ac and H3K27me3 (modelled as a mixture of Poisson-gamma distributions) along human chromosome 1 and their correlations.
CONCLUSIONS: Our algorithms provide a biologically assumption-free approach to characterising genomic landscapes shaped by continuous, autocorrelated patterns of variation. Despite this, the resulting genome segmentation enables extraction of compositionally distinct regions for further downstream analyses.
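The ordering constraint described in the abstract can be illustrated with a minimal Python sketch. It is not the oHMMed package or its inference procedure: the state means, the shared standard deviation, and the `stay` probability are fixed by hand here (all hypothetical values), whereas the article infers such parameters. The function names are likewise illustrative. The sketch builds a transition matrix that only allows moves between states with adjacent means, then labels a sequence of window-level GC proportions with the standard Viterbi algorithm.

```python
import math

def tridiagonal_transitions(n_states, stay=0.9):
    """Transition matrix in which a state may only persist or move to a
    neighbouring state in the ordering (hypothetical 'stay' probability)."""
    T = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n_states]
        if not neighbours:          # single-state edge case
            T[i][i] = 1.0
            continue
        T[i][i] = stay
        for j in neighbours:
            T[i][j] = (1.0 - stay) / len(neighbours)
    return T

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi(obs, means, sd, T):
    """Most likely hidden-state path for normal emissions with ordered means."""
    n = len(means)
    log_T = [[math.log(p) if p > 0 else -math.inf for p in row] for row in T]
    # Uniform initial distribution over states (an assumption of this sketch).
    score = [math.log(1.0 / n) + normal_logpdf(obs[0], means[k], sd) for k in range(n)]
    back = []
    for x in obs[1:]:
        prev, ptr, score = score, [], []
        for k in range(n):
            best = max(range(n), key=lambda j: prev[j] + log_T[j][k])
            ptr.append(best)
            score.append(prev[best] + log_T[best][k] + normal_logpdf(x, means[k], sd))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical GC proportions in consecutive genomic windows.
gc = [0.35, 0.36, 0.34, 0.45, 0.46, 0.55, 0.56, 0.54]
T = tridiagonal_transitions(3)
path = viterbi(gc, means=[0.35, 0.45, 0.55], sd=0.02, T=T)
```

Because direct transitions between the lowest and highest state have probability zero, any decoded path must pass through the intermediate state, which is what lets the labelling reflect gradual rather than abrupt changes in the feature.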
Subject(s)
Genome, Genomics, Animals, Humans, Mice, Markov Chains, Base Composition, Probability, Algorithms
ABSTRACT
Short-term load forecasting is regarded as a promising technology for demand prediction and is among the most critical inputs for scheduling power plant units. It is therefore imperative to develop new methods that support such power system operations for electricity management. This paper proposes an approach for short-term electric load forecasting that combines long short-term memory networks with an improved sine cosine algorithm called MetaREC. First, long short-term memory networks, a special kind of recurrent neural network, are used so that the dispatching commands can store and transmit both long-term and short-term memories. Next, four important parameters are determined using the sine cosine algorithm based on a logistic chaos operator and a multilevel modulation factor, which overcomes the inaccuracy that manual selection of parameter values introduces into long short-term memory network predictions. Moreover, the MetaREC method outperforms others with regard to convergence accuracy and convergence speed on a variety of test functions. Finally, MetaREC_long short-term memory is compared with a back propagation neural network, long short-term memory networks with default parameters, long short-term memory networks with the conventional sine cosine algorithm, and long short-term memory networks with whale optimization for power load forecasting on a real electric load dataset. Simulation results demonstrate that multiple forecasts with MetaREC_long short-term memory achieve high accuracy and stability for short-term power load forecasting.
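For orientation, the conventional sine cosine algorithm that MetaREC builds on can be sketched in a few lines of Python. This is not the MetaREC method: the abstract does not specify how the logistic chaos operator or the multilevel modulation factor are defined, nor which four parameters are tuned. The sketch below assumes one plausible use of a logistic map (seeding the initial population) and omits the modulation factor entirely. All function names, the sphere test function, and the default settings are hypothetical.

```python
import math
import random

def logistic_map_population(n_agents, dim, lower, upper, seed_value=0.7):
    """Initial population drawn from the logistic map x <- 4x(1-x).
    Using the map for initialisation is an assumption of this sketch."""
    x = seed_value
    population = []
    for _ in range(n_agents):
        agent = []
        for _ in range(dim):
            x = 4.0 * x * (1.0 - x)
            agent.append(lower + x * (upper - lower))
        population.append(agent)
    return population

def sine_cosine_minimize(objective, dim, lower, upper,
                         n_agents=20, n_iter=200, a=2.0, rng=None):
    """Conventional sine cosine algorithm (minimisation)."""
    rng = rng or random.Random(0)
    population = logistic_map_population(n_agents, dim, lower, upper)
    best = min(population, key=objective)[:]
    best_value = objective(best)
    for t in range(n_iter):
        r1 = a - t * a / n_iter  # step amplitude decays linearly to zero
        for agent in population:
            for d in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                step = r1 * wave * abs(r3 * best[d] - agent[d])
                # Clamp to the search bounds.
                agent[d] = min(upper, max(lower, agent[d] + step))
            value = objective(agent)
            if value < best_value:
                best, best_value = agent[:], value
    return best, best_value

def sphere(v):
    """Simple benchmark test function with its minimum at the origin."""
    return sum(c * c for c in v)

best, best_value = sine_cosine_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
```

In a forecasting setting, `objective` would instead train a long short-term memory network with the candidate parameter values and return its validation error, which makes each evaluation far more expensive than this benchmark function.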