Results 1 - 7 of 7
1.
Brief Bioinform ; 22(5), 2021 Sep 02.
Article in English | MEDLINE | ID: mdl-33585910

ABSTRACT

As a consequence of the various genomic sequencing projects, an increasing volume of biological sequence data is being produced. Although machine learning algorithms have been successfully applied to a large number of genomic sequence-related problems, the results are largely affected by the type and number of features extracted. This effect has motivated new algorithms and pipeline proposals, mainly involving feature extraction problems, in which extracting significant discriminatory information from a biological set is challenging. Considering this, our work proposes a new study of feature extraction approaches based on mathematical features (numerical mapping with Fourier, entropy and complex networks). As a case study, we analyze long non-coding RNA sequences. The work is divided into three studies. First, we assessed our proposal on the most addressed problem in our review, i.e. lncRNA versus mRNA; second, we validated the mathematical features on different classification problems that predict the class of lncRNA, e.g. circular RNA sequences; third, we analyzed their robustness in scenarios with imbalanced data. The experimental results demonstrated three main contributions: first, an in-depth study of several mathematical features; second, a new feature extraction pipeline; and third, its high performance and robustness for distinct RNA sequence classification. Availability: https://github.com/Bonidia/FeatureExtraction_BiologicalSequences.
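To make two of the named feature families concrete, here is a minimal sketch, not the authors' pipeline (which is in the linked repository): Shannon entropy over k-mer frequencies, and a Fourier power spectrum computed on a binary indicator ("Voss") numerical mapping. The function names, the choice of the Voss mapping, the U-to-T conversion and the summary statistics are all illustrative assumptions.

```python
import cmath
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy (bits) of the k-mer frequency distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in Counter(kmers).values())


def voss_power_spectrum(seq: str) -> list[float]:
    """Sum over A, C, G, T of |DFT|^2 of each binary indicator sequence (Voss mapping)."""
    n = len(seq)
    spectrum = [0.0] * n
    for base in "ACGT":
        indicator = [1.0 if s == base else 0.0 for s in seq]
        for f in range(n):
            # Naive O(n^2) discrete Fourier transform; adequate for short sequences.
            coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                        for t, x in enumerate(indicator))
            spectrum[f] += abs(coeff) ** 2
    return spectrum


def features(seq: str) -> dict[str, float]:
    """Hypothetical feature vector: k-mer entropy plus simple spectrum statistics."""
    seq = seq.upper().replace("U", "T")  # treat RNA input in the DNA alphabet
    n = len(seq)
    spec = voss_power_spectrum(seq)
    # Power at frequency n/3: the period-3 signal typical of protein-coding regions.
    period3 = spec[round(n / 3)] if n >= 3 else 0.0
    mean_power = sum(spec) / n if n else 0.0
    return {
        "entropy_k2": kmer_entropy(seq, 2),
        "mean_power": mean_power,
        "peak_to_average": period3 / mean_power if mean_power else 0.0,
    }


print(features("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"))
```

The period-3 ratio is included because a coding/non-coding task such as lncRNA versus mRNA is exactly where that spectral signal is expected to discriminate; a real pipeline would use an FFT and many more descriptors.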


Subject(s)
Computational Biology/methods , Deep Learning , Models, Theoretical , RNA, Circular/genetics , RNA, Long Noncoding/genetics , RNA, Messenger/genetics , Base Sequence/genetics , Entropy , Fourier Analysis , Humans , Open Reading Frames , RNA, Circular/classification , RNA, Long Noncoding/classification , RNA, Messenger/classification
2.
Data Brief ; 23: 103652, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30788393

ABSTRACT

Agribusiness has great relevance in the world's economy. It has a considerable impact on the gross national product of several nations and is the major driver of many national economies. Nowadays, from planting to harvesting, it is mandatory and crucial to apply some kind of technology to optimize a single process, or even the entire cropping chain. For instance, digital image analysis combined with machine learning methods can be applied to obtain and guarantee a higher-quality harvest, leading not only to greater profit for producers but also to better, lower-cost products for final consumers. To make this possible, this work describes a visual feature dataset from soybean seed images obtained from the tetrazolium test. This test can determine how healthy a given seed is (e.g. how much the plant will produce, or whether it is resistant to inclement weather, among others). To answer these questions, we propose this dataset as the cornerstone for an effective classification of soybean seed vigor (an extremely tiresome human visual inspection process). Moreover, as one of the most prominent international commodities, soybean production must follow a rigid quality control process to be part of world trade. Hence, small mistakes in defining the vigor of a given seed lot can lead to huge losses.

3.
Gene ; 541(2): 129-37, 2014 May 15.
Article in English | MEDLINE | ID: mdl-24631265

ABSTRACT

Inference of gene regulatory networks (GRNs) is one of the most challenging research problems in Systems Biology. In this investigation, a new GRN inference methodology called Entropic Biological Score (EBS) is proposed, which linearly combines the mean conditional entropy (MCE) computed from expression levels with a Biological Score (BS) obtained by integrating different biological data sources. EBS is validated with the Cell Cycle-related functional annotation information available from the Munich Information Center for Protein Sequences (MIPS), and compared with existing GRN inference methods such as MRNET, ARACNE, CLR and MCE. For real networks, EBS, which integrates different data sources, performs better than these inference methods. The best results for EBS are obtained with weights w1 = 0.2 and w2 = 0.8 for the MCE and BS values, respectively; with these weights, approximately 40% of the inferred connections are correct, significantly more than with related methods. The results also indicate that the expression profile is able to recover some true connections that are not present in the biological annotations, raising the possibility of discovering new relations between genes.
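The abstract gives the weights but not the exact formula, so the following is only a sketch under stated assumptions: expression values are already discretized, the entropy term is a plain conditional entropy H(Y|X) averaged over observed predictor states (without the penalty for unobserved states that some MCE variants apply), it is rescaled as 1 - H/log2(levels) so that higher means more informative, and the biological score is already in [0, 1]. The names `conditional_entropy` and `ebs_score` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(target: list[int], predictor: list[int]) -> float:
    """H(Y|X) in bits from paired discrete samples: sum over x of P(x) * H(Y | X = x)."""
    n = len(target)
    groups: dict[int, list[int]] = defaultdict(list)
    for x, y in zip(predictor, target):
        groups[x].append(y)
    h = 0.0
    for ys in groups.values():
        m = len(ys)
        h_given_x = -sum((c / m) * math.log2(c / m) for c in Counter(ys).values())
        h += (m / n) * h_given_x
    return h


def ebs_score(target: list[int], predictor: list[int], biological_score: float,
              w1: float = 0.2, w2: float = 0.8, n_levels: int = 3) -> float:
    """Hypothetical EBS-style score for a candidate edge predictor -> target.

    Assumption (not from the paper): the entropy term is mapped to [0, 1] as
    1 - H(Y|X) / log2(n_levels), where n_levels is the number of target states.
    """
    entropy_term = 1.0 - conditional_entropy(target, predictor) / math.log2(n_levels)
    return w1 * entropy_term + w2 * biological_score


# A predictor that fully determines the target gives entropy_term = 1.
print(ebs_score([0, 1, 2, 0], [1, 2, 0, 1], biological_score=0.5))
```

With w1 + w2 = 1 and both terms in [0, 1], the score stays in [0, 1], which keeps candidate edges comparable across targets; the published method may normalize differently.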


Subject(s)
Cell Cycle/genetics , Computational Biology/methods , Gene Regulatory Networks , Entropy , Gene Expression , Models, Theoretical , Phenotype , Protein Interaction Mapping
4.
Malar J ; 12: 69, 2013 Feb 21.
Article in English | MEDLINE | ID: mdl-23433077

ABSTRACT

BACKGROUND: Plasmodium vivax malaria clinical outcomes are a consequence of the interaction of multiple parasite, environmental and host factors. The host molecular and genetic determinants driving susceptibility to disease severity in this infection are largely unknown. Here, a network analysis of large-scale data from a significant number of individuals with different clinical presentations of P. vivax malaria was performed in an attempt to identify patterns of association between various candidate biomarkers and the clinical outcomes.
METHODS: A retrospective analysis of 530 individuals from the Brazilian Amazon was performed, including P. vivax-infected individuals who developed different clinical outcomes (148 asymptomatic malaria, 187 symptomatic malaria, 13 severe non-lethal malaria, and six severe lethal malaria) as well as 176 non-infected controls. Plasma levels of liver transaminases, bilirubins, creatinine, fibrinogen, C-reactive protein, superoxide dismutase (SOD)-1, haem oxygenase (HO)-1 and a panel of multiple cytokines and chemokines were measured and compared between the clinical groups using network analysis.
RESULTS: Non-infected individuals displayed several statistically significant interactions in the networks, including associations of IL-10 and IL-4 levels with the chemokine CXCL9. Individuals with asymptomatic malaria displayed multiple significant interactions involving IL-4. Subjects with mild or severe non-lethal malaria displayed a substantial loss of interactions in the networks, and TNF was more frequently significantly associated with other parameters. Cases of lethal P. vivax malaria were associated with significant interactions between TNF, ALT, HO-1 and SOD-1.
CONCLUSIONS: The findings imply that clinical immunity to P. vivax malaria is associated with multiple significant interactions in the network, mostly involving IL-4, while lethality is linked to a systematic reduction in the complexity of these interactions and to an increase in connections between markers linked to haemolysis-induced damage.
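The abstract does not say which association measure or cutoff defined a network edge. As an illustrative sketch only, assuming Spearman rank correlation and a hypothetical |rho| cutoff, a per-group biomarker network could be built as below; comparing the edge count between clinical groups is one crude proxy for the "complexity" of interactions mentioned in the conclusions. The function names and the default cutoff are assumptions, and a real analysis would also test significance.

```python
import itertools
import math


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    if n == 0:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def marker_network(data: dict[str, list[float]], cutoff: float = 0.5) -> set[tuple[str, str]]:
    """Edges between markers (measured in the same individuals) with |rho| >= cutoff."""
    return {(a, b) for a, b in itertools.combinations(sorted(data), 2)
            if abs(spearman(data[a], data[b])) >= cutoff}
```

Rank correlation is used here because cytokine levels are typically skewed; building one network per clinical group and counting edges mirrors the kind of between-group comparison the abstract describes.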


Subject(s)
Malaria, Vivax/immunology , Malaria, Vivax/pathology , Plasmodium vivax/immunology , Adolescent , Adult , Blood Chemical Analysis , Brazil , Female , Host-Pathogen Interactions , Humans , Male , Middle Aged , Retrospective Studies , Young Adult
5.
Toxicology ; 304: 100-8, 2013 Feb 08.
Article in English | MEDLINE | ID: mdl-23274088

ABSTRACT

Pteridium aquilinum, one of the most important poisonous plants in the world, is known to be carcinogenic to animals and humans. Our previous studies showed that the immunosuppressive effects of ptaquiloside, its main toxic agent, were prevented by selenium in mouse natural killer (NK) cells. We also verified that this immunosuppression facilitated the development of cancer. Here, we performed gene expression microarray analysis of splenic NK cells from mice treated for 14 days with ptaquiloside (5.3 mg/kg) and/or selenium (1.3 mg/kg) to identify transcripts altered by ptaquiloside that could be linked to the immunosuppression and that would be prevented by selenium. Transcriptome analysis of ptaquiloside samples revealed 872 differentially expressed transcripts (fold change > 2 and p < 0.05): 77 up-regulated and 795 down-regulated. Gene ontology analysis mapped the up-regulated transcripts to three main biological processes (cellular ion homeostasis, negative regulation of apoptosis and regulation of transcription). Given the immunosuppressive effect of ptaquiloside, we hypothesized that two genes involved in cellular ion homeostasis, metallothionein 1 (Mt1) and metallothionein 2 (Mt2), could be implicated, because Mt1 and Mt2 are responsible for zinc homeostasis and a reduction of free intracellular zinc impairs NK functions. We confirm these hypotheses and show increased expression of metallothionein in splenic NK cells and a reduction in free intracellular zinc following treatment with ptaquiloside, both of which were completely prevented by selenium co-treatment. These findings could help avoid the higher susceptibility to cancer induced by the immunosuppressive effects of P. aquilinum.
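The selection criterion (fold change > 2 and p < 0.05) amounts to a simple two-sided filter. The sketch below assumes fold change is stored as a ratio of treated to control expression, so "fold change > 2" counts a transcript as up-regulated when the ratio exceeds 2 and as down-regulated when it falls below 1/2; the `Transcript` record and its field names are hypothetical.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    probe_id: str   # hypothetical identifier
    ratio: float    # treated / control expression (assumed representation)
    p_value: float


def classify(transcripts: list[Transcript], min_fold: float = 2.0,
             alpha: float = 0.05) -> tuple[list[str], list[str]]:
    """Return (up, down) probe IDs passing |fold change| > min_fold and p < alpha."""
    up, down = [], []
    for t in transcripts:
        # Skip non-significant entries and non-positive ratios (not valid fold changes).
        if t.p_value >= alpha or t.ratio <= 0:
            continue
        if t.ratio > min_fold:
            up.append(t.probe_id)
        elif t.ratio < 1.0 / min_fold:
            down.append(t.probe_id)
    return up, down
```

Under this reading, the differentially expressed total is simply `len(up) + len(down)`, which is consistent with the abstract's 77 up-regulated plus 795 down-regulated making 872.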


Subject(s)
Indans/toxicity , Killer Cells, Natural/drug effects , Metallothionein/genetics , Selenium/pharmacology , Sesquiterpenes/toxicity , Animals , Apoptosis/drug effects , Carcinogens/toxicity , Down-Regulation/drug effects , Gene Expression Profiling , Killer Cells, Natural/metabolism , Male , Mice , Mice, Inbred C57BL , Oligonucleotide Array Sequence Analysis , Pteridium/chemistry , Spleen/cytology , Spleen/drug effects , Spleen/metabolism , Transcription, Genetic/drug effects , Transcriptome , Up-Regulation/drug effects , Zinc/metabolism
6.
BMC Genomics ; 13 Suppl 6: S7, 2012.
Article in English | MEDLINE | ID: mdl-23134775

ABSTRACT

BACKGROUND: A current challenge in gene annotation is to define gene function in the context of the network of relationships instead of using single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how the components of this network interact with each other and keep their functions stable. However, there are generally insufficient data to accurately recover GNs from expression levels, leading to the curse of dimensionality, in which the number of variables exceeds the number of samples. One way to mitigate this problem is to integrate biological data, instead of using only the expression profiles, in the inference process. The use of multiple types of biological information in inference methods has increased significantly, with the aim of better recovering the connections between genes and reducing false positives. What makes this strategy attractive is the possibility of confirming known connections through the included biological data, and of discovering new relationships between genes from the expression data. Although several data-integration works have improved the performance of network inference methods, the actual contribution of each type of biological information to the improvement obtained is not clear.
METHODS: We propose a methodology to include biological information in an inference algorithm in order to assess the prediction gain of using biological information and expression profiles together. We also evaluated and compared the gain of adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO.
RESULTS AND CONCLUSIONS: This work presents a first comparison of the gain from using prior biological information in the inference of GNs, considering the eukaryotic organism P. falciparum. Our results indicate that information based on direct interaction can produce a greater improvement than data about less specific relationships, such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving inference. We also compared the gain in inferring the global network versus only the hubs. The results indicate that the use of biological information can improve the identification of the most connected proteins.
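The abstract does not define "gain" formally. One hedged way to make the comparison concrete, assuming a reference network is available, is to score each inferred edge set by precision and recall against it, take the precision difference with and without prior information as the gain, and compare hubs as the top-k nodes by degree. Edges are modeled as two-element frozensets (undirected); every name here is hypothetical.

```python
from collections import Counter


def precision_recall(inferred: set[frozenset], reference: set[frozenset]) -> tuple[float, float]:
    """Precision and recall of an inferred undirected edge set against a reference."""
    tp = len(inferred & reference)
    precision = tp / len(inferred) if inferred else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall


def gain(with_prior: set[frozenset], expression_only: set[frozenset],
         reference: set[frozenset]) -> float:
    """Hypothetical gain: precision improvement when biological information is added."""
    return (precision_recall(with_prior, reference)[0]
            - precision_recall(expression_only, reference)[0])


def hubs(edges: set[frozenset], k: int) -> set:
    """The k highest-degree nodes; ties are broken arbitrarily."""
    degree = Counter(node for e in edges for node in e)
    return {node for node, _ in degree.most_common(k)}
```

Running `gain` once per information source (for example protein-protein interaction versus KEGG) and once on hub-only subnetworks would reproduce the shape of the comparison described, though the paper's own metric may differ.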


Subject(s)
Gene Regulatory Networks , Algorithms , Databases, Genetic , Genome , Plasmodium falciparum/genetics , Protein Interaction Maps , Proteins/metabolism
7.
J Comput Biol ; 18(10): 1353-67, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21548810

ABSTRACT

Thanks to recent advances in molecular biology, allied to an ever-increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for validating gene network modeling and identification, which comprises three main aspects: (1) generation of Artificial Gene Network (AGN) models through theoretical models of complex networks, which are used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, founded on a feature selection approach in which a target gene is fixed and the expression profiles of all other genes are observed in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdős-Rényi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabási-Albert (BA), and geographical networks (GG).
The experimental results indicate that the inference method was sensitive to average degree variation, with its network recovery rate decreasing as the average degree increased. The signal size was important for the inference method to achieve better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensitive to distinct structures of interaction among genes, behaving similarly when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for validating the inferred networks by identifying some properties of the evaluated method, and it can be extended to other inference methods.
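As a minimal sketch of the first and third steps of such a framework (not the authors' software), the following generates undirected Erdős-Rényi and Barabási-Albert topologies with the standard library and measures a recovery rate as the fraction of original edges present in an inferred network. The temporal expression simulation and the feature-selection inference step are omitted; the function names, the recovery-rate definition and the complete-graph seed used for BA (one common variant) are assumptions.

```python
import random
from typing import Optional


def erdos_renyi(n: int, avg_degree: float, seed: Optional[int] = None) -> set[frozenset]:
    """G(n, p) with p chosen so the expected average degree equals avg_degree."""
    if n < 2:
        return set()
    rng = random.Random(seed)
    p = avg_degree / (n - 1)
    return {frozenset((i, j)) for i in range(n) for j in range(i + 1, n) if rng.random() < p}


def barabasi_albert(n: int, m: int, seed: Optional[int] = None) -> set[frozenset]:
    """Preferential attachment: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Seed with a complete graph on m + 1 nodes so every node starts with nonzero degree.
    edges = {frozenset((i, j)) for i in range(m + 1) for j in range(i + 1, m + 1)}
    # Each node appears once per incident edge, so uniform sampling is degree-weighted.
    pool = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets: set[int] = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in targets:
            edges.add(frozenset((new, t)))
            pool.extend((new, t))
    return edges


def average_degree(edges: set[frozenset], n: int) -> float:
    return 2 * len(edges) / n


def recovery_rate(original: set[frozenset], inferred: set[frozenset]) -> float:
    """Fraction of the original network's edges that the inferred network recovers."""
    return len(original & inferred) / len(original) if original else 0.0
```

Sweeping `avg_degree` (for ER) or `m` (for BA) and plotting `recovery_rate` for a given inference method is one way to probe the sensitivity to average degree reported above.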


Subject(s)
Computational Biology/methods , Gene Regulatory Networks/genetics , Models, Genetic , Software Validation , Systems Biology/methods , Algorithms , Artificial Intelligence , Computer Simulation , Gene Expression , Synthetic Biology , Time Factors