ABSTRACT
Febrile seizures during early childhood are a relevant risk factor for the development of mesial temporal lobe epilepsy. Nevertheless, the molecular mechanisms induced by febrile seizures that render the brain susceptible or not susceptible to epileptogenesis remain poorly understood. Because the temporal investigation of such mechanisms in human patients is impossible, rat models of hyperthermia-induced febrile seizures have been used for that purpose. Here we conducted a temporal analysis of the transcriptomic and microRNA changes in the ventral CA3 of rats that develop (HS group) or do not develop (HNS group) seizures after hyperthermic insult on the eleventh postnatal day. The selected time intervals corresponded to the acute, latent, and chronic phases of the disease. We found that the transcriptional differences between the HS and the HNS groups are related to inflammatory pathways, immune response, neurogenesis, and dendritogenesis in the latent and chronic phases. Additionally, the HNS group expressed a greater number of miRNAs (some abundantly expressed) as compared to the HS group. These results indicate that HNS rats were able to modulate their inflammatory response after insult, thus presenting better tissue repair and re-adaptation. Potential therapeutic targets, including genes, miRNAs and signaling pathways involved in epileptogenesis, were identified.
Subject(s)
Hyperthermia, Induced , MicroRNAs , Seizures, Febrile , Humans , Child, Preschool , Rats , Animals , Seizures, Febrile/genetics , Transcriptome , Hippocampus , MicroRNAs/genetics , Disease Susceptibility
ABSTRACT
Since the molecular mechanisms determining COVID-19 severity are not yet well understood, there is a demand for biomarkers derived from comparative transcriptome analyses of mild and severe cases, combined with patients' clinico-demographic and laboratory data. Here the transcriptomic response of human leukocytes to SARS-CoV-2 infection was investigated by focusing on the differences between mild and severe cases and between age subgroups (younger and older adults). Three transcriptional modules correlated with these traits were functionally characterized, as well as 23 differentially expressed genes (DEGs) associated with disease severity. One module, correlated with severe cases and older patients, had an overrepresentation of genes involved in innate immune response and in neutrophil activation, whereas two other modules, correlated with disease severity and younger patients, harbored genes involved in the innate immune response to viral infections, and in the regulation of this response. This transcriptomic mechanism could be related to the better outcome observed in younger COVID-19 patients. The DEGs, all hyper-expressed in the group of severe cases, were mostly involved in neutrophil activation and in the p53 pathway, and are therefore related to inflammation and lymphopenia. These biomarkers may be useful for achieving a better stratification of risk factors in COVID-19.
Subject(s)
Age Factors , COVID-19 , Patient Acuity , Humans , Biomarkers/metabolism , COVID-19/genetics , Leukocytes/metabolism , SARS-CoV-2/metabolism , Transcriptome
ABSTRACT
BACKGROUND: In this study, clustering was performed using a bitmap representation of HIV reverse transcriptase and protease sequences, to produce an unsupervised classification of HIV sequences. The classification will aid our understanding of the interactions between mutations and drug resistance. A total of 10,229 HIV genomic sequences from the protease and reverse transcriptase regions of the pol gene, with antiretroviral resistance-related mutations represented in an 82-dimensional binary vector space, were analyzed. RESULTS: A new cluster representation was proposed using an image inspired by microarray data, such that the rows in the image represented the protein sequences from the genotype data and the columns represented the presence or absence of mutations at each protein position. The visualization of the clusters showed that some mutations frequently occur together and are probably related to an epistatic phenomenon. CONCLUSION: We described a methodology based on the application of a pattern recognition algorithm using binary data to suggest clusters of mutations that can easily be discriminated by cluster viewing schemes.
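The binary representation described above can be illustrated with a minimal sketch. It assumes each sequence has already been reduced to the set of mutations it carries; the mutation labels (`mut_A`, `mut_B`, `mut_C`) and the helper names are hypothetical and are not taken from the study. The sketch builds the presence/absence matrix (rows are sequences, columns are mutations) and counts how often each pair of mutations occurs together, which is the kind of co-occurrence the cluster visualization is meant to expose. It does not reproduce the clustering algorithm used by the authors.

```python
from itertools import combinations


def to_binary_matrix(samples, mutations):
    """Rows are sequences, columns are mutations; 1 marks presence, 0 absence."""
    return [[1 if m in sample else 0 for m in mutations] for sample in samples]


def cooccurrence_counts(matrix, mutations):
    """Count, for every pair of mutations, the sequences carrying both."""
    counts = {}
    for i, j in combinations(range(len(mutations)), 2):
        counts[(mutations[i], mutations[j])] = sum(row[i] & row[j] for row in matrix)
    return counts


# Toy data with hypothetical mutation labels (not real study data).
mutations = ["mut_A", "mut_B", "mut_C"]
samples = [{"mut_A", "mut_B"}, {"mut_A", "mut_B", "mut_C"}, {"mut_C"}]

matrix = to_binary_matrix(samples, mutations)
pairs = cooccurrence_counts(matrix, mutations)
```

In the toy data, `mut_A` and `mut_B` appear together in two of the three sequences, so that pair would stand out as a candidate co-occurring group.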
Subject(s)
Algorithms , Drug Resistance, Viral/genetics , HIV Protease/genetics , HIV Reverse Transcriptase/genetics , HIV-1/genetics , Mutation/genetics , Anti-HIV Agents/pharmacology , Genotype , HIV Infections/drug therapy , HIV Infections/epidemiology , HIV Infections/virology , HIV-1/drug effects , HIV-1/enzymology , Humans
ABSTRACT
BACKGROUND: A current challenge in gene annotation is to define gene function in the context of a network of relationships instead of considering single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how the components of this network interact with each other and keep their functions stable. However, in general there is not sufficient data to accurately recover the GNs from their expression levels, leading to the curse of dimensionality, in which the number of variables is higher than the number of samples. One way to mitigate this problem is to integrate biological data instead of using only the expression profiles in the inference process. The use of several kinds of biological information in inference methods has increased significantly in recent years, in order to better recover the connections between genes and reduce false positives. What makes this strategy so interesting is the possibility of confirming known connections through the included biological data, and the possibility of discovering new relationships between genes from the expression data. Although several works in data integration have increased the performance of network inference methods, the real contribution of each type of biological information to the obtained improvement is not clear. METHODS: We propose a methodology to include biological information in an inference algorithm in order to assess the prediction gain obtained by using biological information and expression profiles together. We also evaluated and compared the gain of adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. RESULTS AND CONCLUSIONS: This work presents a first comparison of the gain in the use of prior biological information in the inference of GNs by considering the eukaryote organism P. falciparum. Our results indicate that information based on direct interaction can produce a higher improvement in the gain than data about a less specific relationship such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for the improvement of the inference. We also compared the gain in the inference of the global network and of only the hubs. The results indicate that the use of biological information can improve the identification of the most connected proteins.
Subject(s)
Gene Regulatory Networks , Algorithms , Databases, Genetic , Genome , Plasmodium falciparum/genetics , Protein Interaction Maps , Proteins/metabolism
ABSTRACT
Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for the validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Network (AGN) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG).
The experimental results indicate that the inference method was sensitive to average degree
Subject(s)
Computational Biology/methods , Gene Regulatory Networks/genetics , Models, Genetic , Software Validation , Systems Biology/methods , Algorithms , Artificial Intelligence , Computer Simulation , Gene Expression , Synthetic Biology , Time Factors
ABSTRACT
BACKGROUND: The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to the short time series data in the face of the high complexity of the networks and the intrinsic noise of the expression measurements. In order to improve the accuracy of GRN inference methods based on entropy (mutual information), a new criterion function is here proposed. RESULTS: In this paper we introduce the use of the generalized entropy proposed by Tsallis for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach and the conditional entropy is applied as the criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks and its gene transfer function is obtained by random drawing from the set of possible Boolean functions, thus creating its dynamics. On the other hand, the DREAM time series data vary in network size and their topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. CONCLUSIONS: A remarkable improvement in accuracy was observed in the experimental results, with the non-Shannon entropy reducing the number of false connections in the inferred topology.
The best value obtained for the free parameter of the Tsallis entropy was on average in the range 2.5 ≤ q ≤ 3.5 (hence, subextensive entropy), which opens new perspectives for GRN inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/.
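As a rough illustration of the criterion function mentioned above, the sketch below computes the standard Tsallis entropy, S_q = (1 - Σ p_i^q) / (q - 1), and a conditional version in which the entropy of the target gene is averaged over the observed predictor states. Weighting each predictor state by its relative frequency is an assumption made here for simplicity and may differ from the exact form implemented in DimReduction; the function names are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q of a discrete distribution; Shannon (nats) when q == 1."""
    probs = [p for p in probs if p > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def conditional_tsallis_entropy(predictor_states, target_states, q):
    """Mean Tsallis entropy of the target given each observed predictor state.

    Assumption: each predictor state is weighted by its relative frequency.
    """
    groups = defaultdict(Counter)
    for x, y in zip(predictor_states, target_states):
        groups[x][y] += 1
    total = len(target_states)
    h = 0.0
    for counts in groups.values():
        n = sum(counts.values())
        h += (n / total) * tsallis_entropy([c / n for c in counts.values()], q)
    return h


# A predictor that fully determines the target gives zero conditional entropy;
# an uninformative predictor leaves the target as uncertain as before.
informative = conditional_tsallis_entropy([0, 0, 1, 1], [0, 0, 1, 1], q=2.0)
uninformative = conditional_tsallis_entropy([0, 1, 0, 1], [0, 0, 1, 1], q=2.0)
```

In a feature selection setting, candidate predictor sets with lower conditional entropy would be preferred for a given target gene; tuples of predictor states can be passed as `predictor_states` when more than one predictor is considered.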
Subject(s)
Computational Biology/methods , Entropy , Gene Regulatory Networks , Models, Genetic , Time Factors
ABSTRACT
BACKGROUND: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). There are many genomic and proteomic applications that rely on feature selection to answer questions such as selecting signature genes which are informative about some biological state, e.g., normal tissues and several types of cancer; or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point out the best solution for each application. RESULTS: The intent of this work is to provide an open-source multiplatform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system. CONCLUSION: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software's effectiveness in two distinct types of biological problems. In addition, the environment can be used in different pattern recognition applications, although its main focus is on bioinformatics tasks.
Subject(s)
Computational Biology/methods , Genomics/methods , Pattern Recognition, Automated/methods , Software , Algorithms , Bayes Theorem , Data Interpretation, Statistical , Internet , Markov Chains , Models, Genetic , Reproducibility of Results , User-Computer Interface
ABSTRACT
An important topic in genomic sequence analysis is the identification of protein coding regions. In this context, several coding DNA model-independent methods, based on the occurrence of specific patterns of nucleotides at coding regions, have been proposed. Nonetheless, these methods have not been completely suitable due to their dependence on an empirically pre-defined window length required for a local analysis of a DNA region. We introduce a method, based on a modified Gabor-wavelet transform (MGWT), for the identification of protein coding regions. This novel transform is tuned to analyze periodic signal components and presents the advantage of being independent of the window length. We compared the performance of the MGWT with other methods using eukaryote datasets. The results show that the MGWT outperforms all assessed model-independent methods with respect to identification accuracy. These results indicate that the source of at least part of the identification errors produced by the previous methods is the fixed working scale. The new method not only avoids this source of errors, but also makes available a tool for detailed exploration of the nucleotide occurrence.
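For context, the window-length dependence mentioned above can be seen in a minimal sketch of the classical fixed-window approach, which measures the period-3 component of the four nucleotide indicator sequences. This is the kind of baseline the abstract contrasts with, not the MGWT itself, whose definition is not given here; the function names and the `window` and `step` parameters are hypothetical.

```python
import cmath


def period3_power(seq):
    """Power of the period-3 Fourier component, summed over the A, C, G, T
    indicator sequences and normalized by the sequence length."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        component = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, b in enumerate(seq)
            if b == base
        )
        total += abs(component) ** 2
    return total / n


def sliding_period3(seq, window, step=3):
    """Period-3 power in fixed-length windows; results depend on `window`."""
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


periodic = period3_power("ACG" * 10)   # strict period-3 toy sequence
flat = period3_power("A" * 30)         # no period-3 structure
profile = sliding_period3("A" * 30 + "ACG" * 10, window=30, step=30)
```

Because every score is computed over a window of fixed length, a poorly chosen `window` can blur short coding regions or add noise to long ones, which is the limitation a scale-adaptive transform aims to avoid.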
Subject(s)
DNA/genetics , Proteins/genetics , Sequence Analysis, DNA/statistics & numerical data , Computational Biology , Databases, Nucleic Acid , Databases, Protein , Globins/genetics , Humans , Models, Statistical , Pattern Recognition, Automated , Signal Processing, Computer-Assisted
ABSTRACT
BACKGROUND: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray techniques. Due to important differences in the probabilistic models of microarray and SAGE technologies, it is important to develop suitable techniques to select specific genes from SAGE measurements. RESULTS: A new framework to select specific genes that distinguish different biological states based on the analysis of SAGE data is proposed. The new framework applies the bolstered error for the identification of strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify the genes that distinguish the different states with more reliability among all gene groups selected by the strong genes method. A score that takes into account the credibility and the bolstered error values in order to rank the groups of considered genes is proposed. Results obtained using SAGE data from gliomas are presented, thus corroborating the introduced methodology. CONCLUSION: The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis. The additional statistical information provided by the probabilistic model is incorporated in the methodology described in the paper. The introduced method is suitable to identify signature genes that lead to a good separation of the biological states using SAGE and may be adapted for other counting methods such as Massively Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful to generate classifiers.
Subject(s)
Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Oligonucleotide Array Sequence Analysis , Astrocytoma/genetics , Astrocytoma/pathology , Brain/metabolism , Gene Library , Glioblastoma/genetics , Humans , Models, Statistical
ABSTRACT
This paper describes a data mining environment for knowledge discovery in bioinformatics applications. The system has a generic kernel that implements the mining functions to be applied to primary input databases of biomedical information organized in a warehouse architecture. Both supervised and unsupervised classification can be implemented within the kernel and applied to data extracted from the primary database, with the results being suitably stored in a complex object database for knowledge discovery. The kernel also includes a specific high-performance library that allows the mining functions to be designed and applied on parallel machines. The experimental results obtained by the application of the kernel functions are reported.
Subject(s)
Computational Biology , Knowledge , Databases as Topic , Gene Expression Profiling , Systems Integration
ABSTRACT
Leiomyoma is a benign smooth muscle tumor of the uterus that affects many women in active reproductive life. It is composed of bundles of smooth muscle cells surrounded by extracellular matrix. We have recently shown that the glycosylation of extracellular matrix proteoglycans is modified in leiomyoma: increased amounts of galactosaminoglycans with structural modifications are present. The data presented here show that decorin is present in both normal myometrium and leiomyoma but tumoral decorin is glycosylated with longer galactosaminoglycan side chains. Furthermore, these chains contain a higher D-glucuronate/L-iduronate ratio, as compared to normal tissue. To determine if these changes in proteoglycan glycosylation correlate with modifications in the extracellular matrix organization, we compared the general structural architecture of leiomyoma to normal myometrium. By histochemical and immunofluorescence methods, we found a reorganization of muscle fibers and extracellular matrix, with changes in the distribution of glycoproteins, proteoglycans, and collagen. Thin reticular fibers, possibly composed of types I and III collagen, were replaced by thick fibers, possibly richer in type I collagen. Type I collagen colocalized with decorin both in leiomyoma and normal myometrium, in contrast to type IV collagen, which did not. The relative amount of decorin was increased and the distribution of decorin and collagen was totally modified in the tumor, as compared to the normal myometrium. These findings reveal that not only is decorin structure modified in leiomyoma, but the tissue architecture is also changed, especially concerning the extracellular matrix.