ABSTRACT
Primary microRNAs (miRNAs) are the precursors of miRNAs that modulate the expression of most mRNAs in humans. They fold into a hairpin structure that is cleaved at its base by an enzyme complex known as the Microprocessor (Drosha/DGCR8). While many of the molecular details are known, a complete understanding of what features distinguish primary miRNAs from hairpin structures in other transcripts is still lacking. We develop a massively parallel functional assay termed Dro-seq (Drosha sequencing) that enables testing of hundreds of known primary miRNA substrates and thousands of single-nucleotide variants. We find that an additional feature of primary miRNAs, the Shannon entropy describing the structural ensemble, is important for processing. In a deep mutagenesis experiment, we observe that particular apical loop U bases, likely recognized by DGCR8, are important for efficient processing. These findings build on existing knowledge about primary miRNA maturation by the Microprocessor and further explore the substrate RNA sequence-structure relationship.
Subjects
MicroRNAs , Multiprotein Complexes , Nucleic Acid Conformation , RNA Processing, Post-Transcriptional , RNA-Binding Proteins , Ribonuclease III , Animals , Humans , MicroRNAs/chemistry , MicroRNAs/genetics , MicroRNAs/metabolism , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/metabolism , Ribonuclease III/chemistry , Ribonuclease III/metabolism , Sf9 Cells , Spodoptera
ABSTRACT
Motif discovery and characterization are important for gene regulation analysis. The lack of intuitive and integrative web servers impedes the effective use of motifs. Most motif discovery web tools either are not designed for non-expert users or lack optimization steps when used with default settings. Here we describe bipartite motifs learning (BML), a parameter-free web server that provides a user-friendly portal for online discovery and analysis of sequence motifs, using high-throughput sequencing data as the input. BML utilizes both a position weight matrix and a dinucleotide weight matrix, the latter of which enables the expression of the interdependencies of neighboring bases. When input parameters concerning the motifs are given, BML achieves significantly higher accuracy than other available tools for motif finding. When no parameters are given by non-expert users, BML, unlike other tools, employs a learning method to identify motifs automatically and achieves accuracy comparable to the scenario where the parameters are set. The BML web server is freely available at http://motif.t-ridership.com/ (https://github.com/Mohammad-Vahed/BML).
Subjects
Nucleotide Motifs , Software , Transcription Factors/metabolism , Web Browser , Algorithms , Arabidopsis , Binding Sites , High-Throughput Nucleotide Sequencing , Humans , Position-Specific Scoring Matrices , Sequence Analysis, DNA
ABSTRACT
INTRODUCTION: Multiple analysis techniques evaluate electrograms during atrial fibrillation (AF), but none have been established to guide catheter ablation. This study compares electrogram properties recorded from multiple right (RA) and left atrial (LA) sites. METHODS: Multisite LA/RA mapping (281 ± 176/239 ± 166 sites/patient) was performed in 42 patients (30 males, age 63 ± 9 years) undergoing first (n = 32) or redo-AF ablation (n = 10). All electrogram recordings were visually reviewed and artifactual signals were excluded, leaving a total of 21 846 for analysis. Electrogram characteristics evaluated were cycle length (CL), amplitude, Shannon's entropy (ShEn), fractionation interval, dominant frequency, organizational index, and cycle length of most recurrent morphology (CLR) from morphology recurrence plot analysis. RESULTS: Electrogram characteristics were correlated to each other. All pairwise comparisons were significant (p < .001) except for dominant frequency and CLR (p = .59), and amplitude and dominant frequency (p = .38). Only ShEn and fractionation interval demonstrated a strong negative correlation (r = -.94). All other pairwise comparisons were poorly to moderately correlated. The relationships are highly conserved among patients, in the RA versus LA, and in those undergoing initial versus redo ablations. Antiarrhythmic drug therapy did not have a significant effect on electrogram characteristics, except minimum ShEn. Electrogram characteristics associated with ablation outcome were shorter minimum CLR, lower minimum ShEn, and longer minimum CL. There was minimal overlap between the top 10 sites identified by one electrogram characteristic and the top 10 sites identified by the other 10 characteristics. CONCLUSION: Multiple techniques can be employed for electrogram analysis in AF. In this analysis of eight different electrogram characteristics, seven were poorly to moderately correlated and do not identify similar locations. Only some characteristics were predictive of ablation outcome. Further studies to consider electrogram properties, perhaps in combination, for categorizing and/or mapping AF are warranted.
Subjects
Atrial Appendage , Atrial Fibrillation , Catheter Ablation , Male , Humans , Middle Aged , Aged , Atrial Fibrillation/diagnosis , Atrial Fibrillation/surgery , Heart Atria , Catheter Ablation/adverse effects , Catheter Ablation/methods
ABSTRACT
The increasing risk from human action and multi-hazard interaction has substantially complicated the hazard-society relationship. Underlying vulnerabilities are crucial in predicting the probable impact of multi-hazards. Thus, the evaluation of social vulnerability is decisive in inferring the driving factors and preparing mitigation strategies. The Himalayan landscape is prone to multiple hazards and possesses a multitude of vulnerabilities owing to a changing human landscape. Thus, an attempt has been made to inquire into the underlying socioeconomic factors enhancing the susceptibility of the region to multi-hazards. The social vulnerability index (SVIent) has been introduced, consisting of 13 indicators and 33 variables. The variables have been standardized using the maximum and minimum normalization method, and the relative importance of each indicator has been determined using the Shannon entropy method to compute SVIent. The findings revealed that female population, population above 60 years old, net irrigated area, migrant population, dilapidated houses, nonworkers, banks, and nonworkers seeking jobs were relatively significant contributors to the vulnerability. The western part of the study area was classified in the highly vulnerable category (SVI > 0.40628), attributed to high dependence, a higher share of unemployed workers, and high poverty. The SVIent was positively correlated with unemployment, socioeconomic status, migration, dependency, and household structure, with the correlations significant in a two-tailed test. The study can inform the decisions of policymakers and stakeholders in framing mitigation strategies and policy documents.
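The two steps named in this abstract, min-max normalization followed by Shannon entropy weighting, can be sketched as below. This is a minimal illustration of the generic entropy weight method, not the authors' implementation: the function names and the toy matrix are hypothetical, and every indicator is assumed to be oriented so that a larger value means higher vulnerability.

```python
import math

def min_max_normalize(column):
    """Rescale a list of values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def entropy_weights(matrix):
    """Return one weight per column (rows = spatial units, columns = indicators)."""
    n_rows = len(matrix)
    if n_rows < 2:
        raise ValueError("need at least two rows to compute an entropy")
    diversities = []
    for raw in zip(*matrix):
        col = min_max_normalize(list(raw))
        total = sum(col)
        if total == 0:
            diversities.append(0.0)  # a constant indicator carries no information
            continue
        probs = [v / total for v in col]
        # Shannon entropy scaled to [0, 1]; terms with p = 0 contribute 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n_rows)
        diversities.append(1.0 - entropy)
    total_div = sum(diversities)
    if total_div == 0:
        raise ValueError("all indicators are constant; weights are undefined")
    return [d / total_div for d in diversities]

def composite_index(matrix):
    """Weighted sum of the normalized indicators for each row."""
    weights = entropy_weights(matrix)
    columns = [min_max_normalize(list(raw)) for raw in zip(*matrix)]
    return [sum(w * col[i] for w, col in zip(weights, columns))
            for i in range(len(matrix))]

# Hypothetical toy data: 3 units x 3 indicators (the third indicator is constant).
toy = [[0.2, 10, 5], [0.5, 30, 5], [0.9, 20, 5]]
print(entropy_weights(toy))
print(composite_index(toy))
```

In this scheme an indicator whose values are spread more unevenly across units has lower entropy and therefore receives a larger weight, while a constant indicator receives a weight of zero.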
ABSTRACT
Pilot behavior is crucial for aviation safety. This study aims to investigate the EEG characteristics of pilots, refine training assessment methodologies, and bolster flight safety measures. The collected EEG signals underwent initial preprocessing. The EEG characteristic analysis was performed during left and right turns, involving the calculation of the energy ratio of beta waves and Shannon entropy. The psychological workload of pilots during different flight phases was quantified as well. Based on the EEG characteristics, the pilots' psychological workload was classified through the use of a support vector machine (SVM). The study results showed significant changes in the energy ratio of beta waves and Shannon entropy during left and right turns compared to the cruising phase. Additionally, the pilots' psychological workload was found to have increased during these turning phases. Using support vector machines to detect the pilots' psychological workload, the classification accuracy for the training set was 98.92%, while for the test set, it was 93.67%. This research holds significant importance in understanding pilots' psychological workload.
Subjects
Electroencephalography , Pilots , Support Vector Machine , Workload , Humans , Electroencephalography/methods , Pilots/psychology , Workload/psychology , Male , Adult , Aviation
ABSTRACT
Early detection of agricultural drought can alert farmers and authorities, enhancing the resilience of the food sector. A framework is proposed for developing a novel regional agricultural drought index (RegCDI) by combining remotely sensed vegetation health, soil moisture and crop water stress via a transparent Shannon's entropy weighting method. The framework consists of the selection of suitable datasets based on their regional performance, the aggregation of selected drought indicators, the validation of the combined index against crop yield, and the testing of predictive capabilities. The creation and performance of RegCDI are demonstrated for the drought-prone Indian state of Odisha. MODIS surface reflectance is selected for crop water stress and GLDAS-2 for assessing soil moisture deficits and vegetation health. Three selected indicators (SMCI, TCI, and SIWSI-1) are combined into RegCDI for Odisha. The performance of RegCDI is evaluated (a) against other popular drought indices and (b) by comparing with seasonal crop yields. RegCDI is used to identify drought hotspots based on drought severity, duration, and propensity over the study area. A reforecast evaluation of RegCDI (up to three months ahead) showed that the indicators based on soil moisture deficit and crop water stress could predict drought conditions up to two months ahead with no less than 80% accuracy. This demonstrated the potential of the RegCDI framework and its component indicators for early warning of drought in Odisha.
Subjects
Agriculture , Crops, Agricultural , Droughts , Environmental Monitoring , Remote Sensing Technology , Environmental Monitoring/methods , Agriculture/methods , India , Soil/chemistry
ABSTRACT
Predicting fluctuations in groundwater level (GWL) is vital in water resources management and planning, because groundwater acts as a reserve water reservoir, particularly in arid and semi-arid climates. In this research, a novel hybrid algorithm is proposed for estimating GWL values in the Tabriz plain of Iran by combining the artificial neural network (ANN) algorithm with the newly developed nature-inspired Coot and Honey Badger metaheuristic optimization algorithms. Various combinations of meteorological data such as temperature, evaporation, and precipitation, previous GWL values, and the month and year values of the data were used to evaluate the algorithm's success. Furthermore, the Shannon entropy of model performance was assessed according to 44 different statistical indicators, classified into two classes: accuracy and error. The best statistical indicator was then selected as the one with the highest Shannon entropy. The results of the best model and the best scenario were analyzed. Results indicated that the value of Shannon entropy is higher for the accuracy class than for the error class. For the accuracy and error classes, respectively, the Akaike information criterion (AIC) and residual sum of squares (RSS) indexes, with the highest entropy values of 12.72 and 7.3, are the best indicators, and the Legate-McCabe efficiency (LME) and normalized root mean square error-mean (NRMSE-Mean) indexes, with the lowest entropy values of 3.7 and -8.3, are the worst indicators. According to the best-indicator evaluation in the testing phase, the AIC values for the HBA-ANN, COOT-ANN, and standalone ANN models are -344, -332.8, and -175.8, respectively. Furthermore, the proposed metaheuristic algorithms significantly improve the performance of the standalone ANN model and offer satisfactory GWL prediction results. Finally, it was concluded that the Honey Badger optimization algorithm gave better GWL predictions than the Coot optimization algorithm.
Subjects
Groundwater , Mustelidae , Animals , Iran , Entropy , Environmental Monitoring/methods , Algorithms
ABSTRACT
Entropy estimation is a fundamental problem in information theory that has applications in various fields, including physics, biology, and computer science. Estimating the entropy of discrete sequences can be challenging due to limited data and the lack of unbiased estimators. Most existing entropy estimators are designed for sequences of independent events and their performances vary depending on the system being studied and the available data size. In this work, we compare different entropy estimators and their performance when applied to Markovian sequences. Specifically, we analyze both binary Markovian sequences and Markovian systems in the undersampled regime. We calculate the bias, standard deviation, and mean squared error for some of the most widely employed estimators. We discuss the limitations of entropy estimation as a function of the transition probabilities of the Markov processes and the sample size. Overall, this paper provides a comprehensive comparison of entropy estimators and their performance in estimating entropy for systems with memory, which can be useful for researchers and practitioners in various fields.
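The kind of comparison this abstract describes can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes two standard estimators (the plug-in estimator and its Miller-Madow correction), a two-state Markov chain with hypothetical transition probabilities, and a simple block-difference estimate of the entropy rate. All function names are illustrative.

```python
import math
import random
from collections import Counter

def plugin_entropy(symbols):
    """Maximum-likelihood (plug-in) Shannon entropy in bits."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def miller_madow_entropy(symbols):
    """Plug-in estimate plus the Miller-Madow correction (K - 1) / (2 N ln 2), in bits."""
    n = len(symbols)
    k = len(set(symbols))
    return plugin_entropy(symbols) + (k - 1) / (2 * n * math.log(2))

def binary_markov_entropy_rate(p01, p10):
    """Exact entropy rate (bits/symbol) of a two-state chain, P(0->1)=p01, P(1->0)=p10."""
    def h(p):
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    pi0 = p10 / (p01 + p10)  # stationary probability of state 0
    return pi0 * h(p01) + (1 - pi0) * h(p10)

def simulate_binary_markov(p01, p10, length, seed=0):
    """Generate a 0/1 sequence from the chain, starting in state 0."""
    rng = random.Random(seed)
    state, out = 0, []
    for _ in range(length):
        out.append(state)
        flip = p01 if state == 0 else p10
        if rng.random() < flip:
            state = 1 - state
    return out

def block_entropy_rate(symbols, block=2, estimator=plugin_entropy):
    """Estimate the entropy rate as H(blocks of size `block`) - H(blocks of size `block` - 1)."""
    def blocks(k):
        return [tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)]
    return estimator(blocks(block)) - estimator(blocks(block - 1))

seq = simulate_binary_markov(0.1, 0.1, 20000, seed=1)
print("exact entropy rate :", binary_markov_entropy_rate(0.1, 0.1))
print("single-symbol H    :", plugin_entropy(seq))
print("block-difference   :", block_entropy_rate(seq))
```

For a chain with memory, the single-symbol entropy overstates the information per symbol because it ignores the dependence between consecutive symbols. The block-difference estimate accounts for that dependence when the chain is first order and the sample is large.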
ABSTRACT
In the case of certain chemical compounds, especially organic ones, electrons can be delocalized between different atoms within the molecule. These resulting bonds, known as resonance bonds, pose a challenge not only in theoretical descriptions of the studied system but also present difficulties in simulating such systems using molecular dynamics methods. In computer simulations of such systems, it is often common practice to use fractional bonds as an averaged value across equivalent structures, known as a resonance hybrid. This paper presents the results of the analysis of five forms of C60 fullerene polymorphs: one with all bonds being resonance, three with all bonds being integer (singles and doubles in different configurations), one with the majority of bonds being integer (singles and doubles), and ten bonds (within two opposite pentagons) valued at one and a half. The analysis involved the Shannon entropy value for bond length distributions and the eigenfrequency of intrinsic vibrations (first vibrational mode), reflecting the stiffness of the entire structure. The maps of the electrostatic potential distribution around the investigated structures are presented and the dipole moment was estimated. Introducing asymmetry in bond redistribution by incorporating mixed bonds (integer and partial), in contrast to variants with equivalent bonds, resulted in a significant change in the examined observables.
ABSTRACT
Brushed DC motors and generators (DCMs) are extensively used in various industrial applications, including the automotive industry, where they are critical for electric vehicles (EVs) due to their high torque, power, and efficiency. Despite their advantages, DCMs are prone to premature failure due to sparking between brushes and commutators, which can lead to significant economic losses. This study proposes two approaches for determining the temporal and frequency evolution of Shannon entropy in armature current and stray flux signals. One approach indirectly achieves this through prior analysis using the Short-Time Fourier Transform (STFT), while the other applies the Stockwell Transform (S-Transform) directly. Experimental results show that increased sparking activity generates significant low-frequency harmonics, which are more pronounced compared to mid and high-frequency ranges, leading to a substantial rise in system entropy. This finding enables the introduction of fault-severity indicators or Key Performance Indicators (KPIs) that relate the current condition of commutation quality to a baseline established under healthy conditions. The proposed technique can be used as a predictive maintenance tool to detect and assess sparking phenomena in DCMs, providing early warnings of component failure and performance degradation, thereby enhancing the reliability and availability of these machines.
ABSTRACT
In addition to their importance in statistical thermodynamics, probabilistic entropy measurements are crucial for understanding and analyzing complex systems, with diverse applications in time series and one-dimensional profiles. However, extending these methods to two- and three-dimensional data still requires further development. In this study, we present a new method for classifying spatiotemporal processes based on entropy measurements. To test and validate the method, we selected five classes of similar processes related to the evolution of random patterns: (i) white noise; (ii) red noise; (iii) weak turbulence from reaction-diffusion; (iv) hydrodynamic fully developed turbulence; and (v) plasma turbulence from MHD. Considering seven possible ways to measure entropy from a matrix, we present the method as a parameter space composed of the two best separating measures of the five selected classes. The results highlight better combined performance of Shannon permutation entropy (SHp) and a new approach based on Tsallis Spectral Permutation Entropy (Sqs). Notably, our observations reveal the segregation of reaction terms in this SHp×Sqs space, a result that identifies specific sectors for each class of dynamic process, and it can be used to train machine learning models for the automatic classification of complex spatiotemporal patterns.
ABSTRACT
The influence of the collective and quantum effects on the Shannon information entropy for atomic states in dense nonideal plasma was investigated. The interaction potential, which takes into account the effect of quantum non-locality as well as electronic correlations, was used to solve the Schrödinger equation for the hydrogen atom. It is shown that taking into account ionic screening leads to an increase in entropy, while taking into account only electronic screening does not lead to significant changes.
ABSTRACT
We developed a macroscopic description of evolutionary dynamics by following the temporal dynamics of the total Shannon entropy of sequences, denoted by S, and the average Hamming distance between them, denoted by H. We argue that a biological system can persist in the so-called quasi-equilibrium state for an extended period, characterized by strong correlations between S and H, before undergoing a phase transition to another quasi-equilibrium state. To demonstrate the results, we conducted a statistical analysis of SARS-CoV-2 data from the United Kingdom during the period between March 2020 and December 2023. From a purely theoretical perspective, this allowed us to systematically study various types of phase transitions described by a discontinuous change in the thermodynamic parameters. From a more practical point of view, the analysis can be used, for example, as an early warning system for pandemics.
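The two macroscopic quantities named in this abstract can be computed for a small set of aligned sequences as sketched below. The abstract does not give the exact definitions, so this sketch makes its own assumptions: S is taken as the sum of per-site Shannon entropies (in bits) over an alignment, and H as the mean Hamming distance over all unordered pairs. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def total_site_entropy(sequences):
    """Sum of per-column Shannon entropies (bits) of equal-length sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    n = len(sequences)
    total = 0.0
    for site in zip(*sequences):
        counts = Counter(site)
        # Each term p * log2(p) is <= 0, so subtracting the sum adds entropy.
        total -= sum((c / n) * math.log2(c / n) for c in counts.values())
    return total

def mean_hamming_distance(sequences):
    """Average number of differing positions over all unordered sequence pairs."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

# Hypothetical toy alignment.
toy = ["ACGT", "ACGA", "ACTA", "ACTA"]
print("S =", total_site_entropy(toy))
print("H =", mean_hamming_distance(toy))
```

Tracking both values over successive time windows would give the kind of (S, H) trajectory the abstract refers to.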
ABSTRACT
The processes involved in encoding and decoding signals in the human brain are a continually studied topic, as neuronal information flow involves complex nonlinear dynamics. This study examines awake human intracranial electroencephalography (iEEG) data from normal brain regions to explore how biological sex influences these dynamics. The iEEG data were analyzed using permutation entropy and statistical complexity in the time domain and power spectrum calculations in the frequency domain. The Bandt and Pompe method was used to assess time series causality by associating probability distributions based on ordinal patterns with the signals. Due to the invasive nature of data acquisition, the study encountered limitations such as small sample sizes and potential sources of error. Nevertheless, the high spatial resolution of iEEG allows detailed analysis and comparison of specific brain regions. The results reveal differences between sexes in brain regions, observed through power spectrum, entropy, and complexity analyses. Significant differences were found in the left supramarginal gyrus, posterior cingulate, supplementary motor cortex, middle temporal gyrus, and right superior temporal gyrus. This study emphasizes the importance of considering sex as a biological variable in brain dynamics research, which is essential for improving the diagnosis and treatment of neurological and psychiatric disorders.
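The Bandt and Pompe approach mentioned in this abstract maps each short window of a signal to the ordinal pattern of its values and then takes the Shannon entropy of the pattern distribution. The sketch below is a generic illustration, not the study's pipeline: the embedding order, the delay, the tie-breaking rule, and the test signals are all assumptions chosen for the example.

```python
import math
import random
from collections import Counter

def ordinal_pattern(window):
    """Indices that sort `window` in ascending order (ties broken by position)."""
    return tuple(sorted(range(len(window)), key=lambda i: (window[i], i)))

def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal-pattern distribution of `series`."""
    n_windows = len(series) - (order - 1) * delay
    if order < 2 or n_windows < 1:
        raise ValueError("series too short for the requested order and delay")
    counts = Counter(
        ordinal_pattern([series[start + k * delay] for k in range(order)])
        for start in range(n_windows)
    )
    probs = [c / n_windows for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    if normalize:
        entropy /= math.log(math.factorial(order))  # maximum is log(order!)
    return entropy

rng = random.Random(1)
noise = [rng.random() for _ in range(5000)]   # irregular signal
ramp = list(range(50))                        # perfectly ordered signal
print("noise:", permutation_entropy(noise))
print("ramp :", permutation_entropy(ramp))
```

A monotonic signal produces a single ordinal pattern and therefore zero entropy, whereas uncorrelated noise spreads almost evenly over all order! patterns and gives a normalized value close to 1.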
ABSTRACT
Over the past decade and a half, dynamic functional imaging has revealed low-dimensional brain connectivity measures, identified potential common human spatial connectivity states, tracked the transition patterns of these states, and demonstrated meaningful transition alterations in disorders and over the course of development. Recently, researchers have begun to analyze these data from the perspective of dynamic systems and information theory in the hopes of understanding how these dynamics support less easily quantified processes, such as information processing, cortical hierarchy, and consciousness. Little attention has been paid to the effects of psychiatric disease on these measures, however. We begin to rectify this by examining the complexity of subject trajectories in state space through the lens of information theory. Specifically, we identify a basis for the dynamic functional connectivity state space and track subject trajectories through this space over the course of the scan. The dynamic complexity of these trajectories is assessed along each dimension of the proposed basis space. Using these estimates, we demonstrate that schizophrenia patients display substantially simpler trajectories than demographically matched healthy controls and that this drop in complexity concentrates along specific dimensions. We also demonstrate that entropy generation in at least one of these dimensions is linked to cognitive performance. Overall, the results suggest great value in applying dynamic systems theory to problems of neuroimaging and reveal a substantial drop in the complexity of schizophrenia patients' brain function.
ABSTRACT
Methods used in topological data analysis naturally capture higher-order interactions in point cloud data embedded in a metric space. This methodology was recently extended to data living in an information space, by which we mean a space measured with an information theoretical distance. One such setting is a finite collection of discrete probability distributions embedded in the probability simplex measured with the relative entropy (Kullback-Leibler divergence). More generally, one can work with a Bregman divergence parameterized by a different notion of entropy. While theoretical algorithms exist for this setup, there is a paucity of implementations for exploring and comparing geometric-topological properties of various information spaces. The interest of this work is therefore twofold. First, we propose the first robust algorithms and software for geometric and topological data analysis in information space. Perhaps surprisingly, despite working with Bregman divergences, our design reuses robust libraries for the Euclidean case. Second, using the new software, we take the first steps towards understanding the geometric-topological structure of these spaces. In particular, we compare them with the more familiar spaces equipped with the Euclidean and Fisher metrics.
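The relation between relative entropy and Bregman divergences that this abstract relies on can be checked numerically. The sketch below is a generic illustration and is unrelated to the authors' software: it assumes strictly positive discrete distributions that sum to 1, and all function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats for discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue            # 0 * log(0 / q) is taken as 0
        if qi == 0:
            return math.inf     # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

def bregman_divergence(F, grad_F, p, q):
    """Generic Bregman divergence F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))

def neg_entropy(p):
    """Negative Shannon entropy, the generator that yields relative entropy."""
    return sum(x * math.log(x) for x in p if x > 0)

def neg_entropy_grad(p):
    """Gradient of the negative entropy; requires strictly positive entries."""
    return [math.log(x) + 1.0 for x in p]

def sq_norm(p):
    """Squared Euclidean norm, the generator that yields squared distance."""
    return sum(x * x for x in p)

def sq_norm_grad(p):
    return [2.0 * x for x in p]

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
print("KL(p||q)          :", kl_divergence(p, q))
print("Bregman, neg. ent.:", bregman_divergence(neg_entropy, neg_entropy_grad, p, q))
print("KL(q||p)          :", kl_divergence(q, p))
print("Bregman, sq. norm :", bregman_divergence(sq_norm, sq_norm_grad, p, q))
```

On the probability simplex the negative-entropy generator reproduces the relative entropy, which is not symmetric in its arguments, while the squared-norm generator reproduces the squared Euclidean distance.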
ABSTRACT
BACKGROUND: Protein methylation, a post-translational modification, is crucial in regulating various cellular functions. Identifying arginine methylation is required to understand crucial biochemical activities and biological functions, such as gene regulation and signal transduction. Some experimental methods, including ChIP-chip, mass spectrometry, and methylation-specific antibodies, exist for identifying methylated proteins, but these methods are expensive and tedious. As a result, computational methods based on machine learning play an efficient role in predicting arginine methylation sites. RESULTS: In this research, a novel method called PRMxAI is proposed to predict arginine methylation sites. PRMxAI extracts sequence-based features, such as dipeptide composition, physicochemical properties, amino acid composition, and information theory-based features (Arimoto, Havrda-Charvat, Renyi, and Shannon entropy), to represent the protein sequences in numerical format. Various machine learning algorithms, such as decision trees, naive Bayes, random forest, support vector machines, and k-nearest neighbors, are implemented to select the best classifier. The random forest algorithm is selected as the underlying classifier for the PRMxAI model. The performance of PRMxAI is evaluated by employing 10-fold cross-validation, and it yields 87.17% and 90.40% accuracy on mono-methylarginine and di-methylarginine data sets, respectively. This research also examines the impact of various features on both data sets using explainable artificial intelligence. CONCLUSIONS: The proposed PRMxAI shows the effectiveness of the features for predicting arginine methylation sites. Additionally, the SHapley Additive exPlanation method is used to interpret the predictive mechanism of the proposed model. The results indicate that the proposed PRMxAI model outperforms other state-of-the-art predictors.
Subjects
Amino Acids , Arginine , Amino Acids/metabolism , Arginine/chemistry , Arginine/metabolism , Methylation , Artificial Intelligence , Bayes Theorem , Protein Processing, Post-Translational , Algorithms
ABSTRACT
Mutations are the cause of several diseases as well as the underlying force of evolution. A thorough understanding of their biophysical consequences is essential. We present a computational framework for evaluating different levels of mutual information (MI) and its dependence on mutation. We used molecular dynamics trajectories of the third PDZ domain and its different mutations. Nonlinear MI between all residue pairs is calculated using tensor Hermite polynomials up to the fifth order and compared with results from a multivariate Gaussian distribution of joint probabilities. We show that MI can be written as the sum of a Gaussian and a nonlinear component. Results for the PDZ domain show that the Gaussian term gives a sufficiently accurate representation of MI when compared with nonlinear terms up to the fifth order. Changes in MI between residue pairs show the characteristic patterns resulting from specific mutations. The emergence of new peaks in the MI versus residue index plots of mutated PDZ shows how mutation may change allosteric pathways. Triple correlations are characterized by evaluating MI between triplets of residues. We observed that certain triplets are strongly affected by mutation. The susceptibility of residues to perturbation is obtained from MI and discussed in terms of linear response theory.
Subjects
Molecular Dynamics Simulation , Proteins , Proteins/genetics , Proteins/chemistry , PDZ Domains , Mutation , Normal Distribution
ABSTRACT
Cell-based models provide a helpful approach for simulating complex systems that exhibit adaptive, resilient qualities, such as cancer. Their focus on individual cell interactions makes them a particularly appropriate strategy to study cancer therapies' effects, which are often designed to disrupt single-cell dynamics. In this work, we propose them as viable methods for studying the time evolution of cancer imaging biomarkers (IBM). We propose a cellular automata model for tumor growth and three different therapies: chemotherapy, radiotherapy, and immunotherapy, following well-established modeling procedures documented in the literature. The model generates a sequence of tumor images, from which time series of two biomarkers, entropy and fractal dimension, are obtained. Our model shows that the fractal dimension increased faster at the onset of cancer cell dissemination. At the same time, entropy was more responsive to changes induced in the tumor by the different therapy modalities. These observations suggest that the prognostic value of the proposed biomarkers could vary considerably with time. Thus, it is essential to assess their use at different stages of cancer and for different imaging modalities. Another observation derived from the results was that both biomarkers varied slowly when the applied therapy attacked cancer cells scattered across the automaton's area, leaving multiple independent clusters of cells at the end of the treatment. Thus, patterns of change in the simulated biomarker time series could reflect essential qualities of the spatial action of a given cancer intervention.
Subjects
Fractals , Neoplasms , Humans , Cellular Automata , Entropy , Neoplasms/diagnosis , Neoplasms/therapy , Biomarkers
ABSTRACT
Metabolic scaling provides valuable information about the physiological and ecological functions of organisms, although few studies have quantified the metabolic scaling exponent (b) of communities under natural conditions. Maximum entropy theory of ecology (METE) is a constraint-based unified theory with the potential to empirically assess the spatial variation of metabolic scaling. Our main goal is to develop a novel method of estimating b within a community by integrating metabolic scaling and METE. We also aim to study the relationships between the estimated b and environmental variables across communities. We developed a new METE framework to estimate b in 118 stream fish communities in the north-eastern Iberian Peninsula. We first extended the original maximum entropy model by parameterizing b in the model prediction of the community-level individual size distributions and compared our results with empirical and theoretical predictions. We then tested the effects of abiotic conditions, species composition and human disturbance on the spatial variation of community-level b. We found that community-level b of the best maximum entropy models showed great spatial variability, ranging from 0.25 to 2.38. The mean exponent (b = 0.93) resembled the community-aggregated mean values from three previous metabolic scaling meta-analyses, all of which were greater than the theoretical predictions of 0.67 and 0.75. Furthermore, the generalized additive model showed that b reached a maximum at the intermediate mean annual precipitation level and declined significantly as human disturbance intensified. The parameterized METE is proposed here as a novel framework for estimating the metabolic pace of life of stream fish communities. The large spatial variation of b may reflect the combined effects of environmental constraints and species interactions, which likely have important feedback on the structure and function of natural communities. Our newly developed framework can also be applied to study the impact of global environmental pressures on metabolic scaling and energy use in other ecosystems.