1.
Genes (Basel) ; 14(6)2023 06 11.
Article in English | MEDLINE | ID: mdl-37372430

ABSTRACT

The likelihood of being diagnosed with thyroid cancer has increased in recent years; it is the fastest-expanding cancer in the United States, and its incidence has tripled in the last three decades. In particular, Papillary Thyroid Carcinoma (PTC) is the most common type of cancer affecting the thyroid. It is a slow-growing cancer and, thus, it can usually be cured. However, given the worrying increase in the diagnosis of this type of cancer, the discovery of new genetic markers for accurate treatment and prognosis is crucial. In the present study, the aim is to identify putative genes that may be specifically relevant in PTC through bioinformatic analysis of several public gene expression datasets and clinical information. Two datasets from Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) dataset were studied. Statistics and machine learning methods were sequentially employed to retrieve a final small cluster of genes of interest: PTGFR, ZMAT3, GABRB2, and DPP6. Kaplan-Meier plots were employed to assess the expression levels regarding overall survival and relapse-free survival. Furthermore, a manual bibliographic search for each gene was carried out, and a Protein-Protein Interaction (PPI) network was built to verify existing associations among them, followed by a new enrichment analysis. The results revealed that all the genes are highly relevant in the context of thyroid cancer and, of particular interest, PTGFR and DPP6 have not yet been associated with the disease, thus making them worthy of further investigation as to their relationship to PTC.


Subject(s)
Gene Expression Regulation, Neoplastic , Thyroid Neoplasms , Humans , Thyroid Cancer, Papillary/metabolism , Neoplasm Recurrence, Local/genetics , Thyroid Neoplasms/pathology , Computational Biology , Gene Expression
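The Kaplan-Meier analysis mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `kaplan_meier` and the toy cohort are assumptions made for illustration, and a real analysis would normally rely on a dedicated survival-analysis library. The estimator multiplies, at each observed event time, the fraction of at-risk subjects who survive that time.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per subject
    events: 1 if the event (e.g. death or relapse) was observed, 0 if censored
    """
    observed = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every subject leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(previous) * (1 - d / n_at_risk)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical toy cohort (months of follow-up); 0 marks a censored subject
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.2f}")
```

Censored subjects never trigger a drop in the curve; they only shrink the risk set for later event times, which is what distinguishes this estimator from a naive survival fraction.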
2.
Polymers (Basel) ; 15(5)2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36904566

ABSTRACT

Artificial intelligence (AI) is an emerging technology that is revolutionizing the discovery of new materials. One key application of AI is virtual screening of chemical libraries, which enables the accelerated discovery of materials with desired properties. In this study, we developed computational models to predict the dispersancy efficiency of oil and lubricant additives, a critical property in their design that can be estimated through a quantity named blotter spot. We propose a comprehensive approach that combines machine learning techniques with visual analytics strategies in an interactive tool that supports domain experts' decision-making. We evaluated the proposed models quantitatively and illustrated their benefits through a case study. Specifically, we analyzed a series of virtual polyisobutylene succinimide (PIBSI) molecules derived from a known reference substrate. Our best-performing probabilistic model was Bayesian Additive Regression Trees (BART), which achieved a mean absolute error of 5.50±0.34 and a root mean square error of 7.56±0.47, as estimated through 5-fold cross-validation. To facilitate future research, we have made the dataset, including the potential dispersants used for modeling, publicly available. Our approach can help accelerate the discovery of new oil and lubricant additives, and our interactive tool can aid domain experts in making informed decisions based on blotter spot and other key properties.
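The evaluation protocol named in the abstract above (mean absolute error and root mean square error estimated through 5-fold cross-validation) can be sketched as follows. This is an illustrative sketch only: the model here is a trivial baseline that predicts the training mean, standing in for a real regressor such as BART, and the data, the function names, and the choice to summarize folds by mean and standard deviation are all assumptions, not details taken from the study.

```python
import math
import statistics

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate_mean_baseline(y, k=5):
    """Per-fold MAE and RMSE for a baseline that predicts the training mean."""
    maes, rmses = [], []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(y) if i not in held_out]
        prediction = statistics.fmean(train)  # placeholder for a fitted model
        errors = [y[i] - prediction for i in test_idx]
        maes.append(statistics.fmean([abs(e) for e in errors]))
        rmses.append(math.sqrt(statistics.fmean([e * e for e in errors])))
    return maes, rmses

# Hypothetical target values, not data from the study
y = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0]
maes, rmses = cross_validate_mean_baseline(y, k=5)
print(f"MAE  = {statistics.fmean(maes):.2f} +/- {statistics.stdev(maes):.2f}")
print(f"RMSE = {statistics.fmean(rmses):.2f} +/- {statistics.stdev(rmses):.2f}")
```

RMSE is never smaller than MAE on the same fold, because squaring weights large errors more heavily; a wide gap between the two hints at a few large outlying errors.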

3.
J Chem Inf Model ; 62(24): 6342-6351, 2022 12 26.
Article in English | MEDLINE | ID: mdl-36066065

ABSTRACT

The Ames mutagenicity test constitutes the most frequently used assay to estimate the mutagenic potential of drug candidates. While this test employs experimental results using various strains of Salmonella typhimurium, the vast majority of the published in silico models for predicting mutagenicity do not take into account the test results of the individual experiments conducted for each strain. Instead, such QSAR models are generally trained employing overall labels (i.e., mutagenic and nonmutagenic). Recently, neural-based models combined with multitask learning strategies have yielded interesting results in different domains, given their capabilities to model multitarget functions. In this scenario, we propose a novel neural-based QSAR model to predict mutagenicity that leverages experimental results from different strains involved in the Ames test by means of a multitask learning approach. To the best of our knowledge, the modeling strategy hereby proposed has not been applied to model Ames mutagenicity previously. The results yielded by our model surpass those obtained by single-task modeling strategies, such as models that predict the overall Ames label or ensemble models built from individual strains. For reproducibility and accessibility purposes, all source code and datasets used in our experiments are publicly available.


Subject(s)
Mutagens , Neural Networks, Computer , Mutagens/toxicity , Reproducibility of Results , Mutagenesis , Computer Simulation , Mutagenicity Tests/methods
4.
J Chem Phys ; 156(20): 204903, 2022 May 28.
Article in English | MEDLINE | ID: mdl-35649865

ABSTRACT

The artificial intelligence-based prediction of the mechanical properties derived from the tensile test plays a key role in assessing the application profile of new polymeric materials, especially in the design stage, prior to synthesis. This strategy saves time and resources when creating new polymers with improved properties that are increasingly demanded by the market. A quantitative structure-property relationship (QSPR) model for tensile strength at break is presented in this work. The QSPR methodology applied here is based on machine learning tools, visual analytics methods, and expert-in-the-loop strategies. From the whole study, a QSPR model composed of five molecular descriptors that achieved a correlation coefficient of 0.9226 is proposed. We applied visual analytics tools at two levels of analysis: a more general one in which models are discarded for redundant information metrics and a deeper one in which a chemistry expert can make decisions on the composition of the model in terms of subsets of molecular descriptors, from a physical-chemical point of view. In this way, with the present work, we close a contribution cycle to polymer informatics, providing QSPR models oriented to the prediction of mechanical properties related to the tensile test.


Subject(s)
Artificial Intelligence , Polymers , Informatics , Polymers/chemistry , Quantitative Structure-Activity Relationship
5.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34498670

ABSTRACT

With the consolidation of deep learning in drug discovery, several novel algorithms for learning molecular representations have been proposed. Despite the interest of the community in developing new methods for learning molecular embeddings and their theoretical benefits, comparing molecular embeddings with each other and with traditional representations is not straightforward, which in turn hinders the process of choosing a suitable representation for Quantitative Structure-Activity Relationship (QSAR) modeling. A reason behind this issue is the difficulty of conducting a fair and thorough comparison of the different existing embedding approaches, which requires numerous experiments on various datasets and training scenarios. To close this gap, we reviewed the literature on methods for molecular embeddings and reproduced three unsupervised and two supervised molecular embedding techniques recently proposed in the literature. We compared these five methods concerning their performance in QSAR scenarios using different classification and regression datasets. We also compared these representations to traditional molecular representations, namely molecular descriptors and fingerprints. As opposed to the expected outcome, our experimental setup consisting of over 25,000 trained models and statistical tests revealed that the predictive performance using molecular embeddings did not significantly surpass that of traditional representations. Although supervised embeddings yielded competitive results compared with those using traditional molecular representations, unsupervised embeddings tended to perform worse than traditional representations. Our results highlight the need for conducting a careful comparison and analysis of the different embedding techniques prior to using them in drug design tasks and motivate a discussion about the potential of molecular embeddings in computer-aided drug design.


Subject(s)
Algorithms , Quantitative Structure-Activity Relationship
6.
IEEE Trans Vis Comput Graph ; 27(2): 891-901, 2021 02.
Article in English | MEDLINE | ID: mdl-33048734

ABSTRACT

In the modern drug discovery process, medicinal chemists deal with the complexity of analysis of large ensembles of candidate molecules. Computational tools, such as dimensionality reduction (DR) and classification, are commonly used to efficiently process the multidimensional space of features. These underlying calculations often hinder interpretability of results and prevent experts from assessing the impact of individual molecular features on the resulting representations. To provide a solution for scrutinizing such complex data, we introduce ChemVA, an interactive application for the visual exploration of large molecular ensembles and their features. Our tool consists of multiple coordinated views: Hexagonal view, Detail view, 3D view, Table view, and a newly proposed Difference view designed for the comparison of DR projections. These views display DR projections combined with biological activity, selected molecular features, and confidence scores for each of these projections. This conjunction of views allows the user to drill down through the dataset and to efficiently select candidate compounds. Our approach was evaluated on two case studies of finding structurally similar ligands with similar binding affinity to a target protein, as well as on an external qualitative evaluation. The results suggest that our system allows effective visual inspection and comparison of different high-dimensional molecular representations. Furthermore, ChemVA assists in the identification of candidate compounds while providing information on the certainty behind different molecular representations.


Subject(s)
Computer Graphics , Proteins
7.
J Chem Inf Model ; 60(2): 592-603, 2020 02 24.
Article in English | MEDLINE | ID: mdl-31790226

ABSTRACT

The feature selection (FS) process is a key step in the Quantitative Structure-Property Relationship (QSPR) modeling of physicochemical properties in cheminformatics. In particular, the inference of QSPR models for polymeric material properties constitutes a complex problem because of the uncertainty introduced by the polydispersity of these materials. The main challenge is how to capture the polydispersity information from the molecular weight distribution (MWD) curve to achieve a more effective computational representation of polymeric materials. To date, most of the existing QSPR techniques use only a single molecule to represent each of these materials, so polydispersity is not considered. Consequently, QSPR models obtained by these approaches are oversimplified. For this reason, we introduced in a previous work a new FS algorithm called Feature Selection for Random Variables with Discrete Distribution (FS4RVDD), which allows dealing with polydisperse data. In the present paper, we evaluate both the scalability and the robustness of the FS4RVDD algorithm. To this end, we generated synthetic data by varying and combining different parameters: the size of the database, the cardinality of the selected feature subsets, the presence of noise in the data, and the type of correlation (linear and nonlinear). Moreover, the performance obtained by FS4RVDD was contrasted with traditional FS techniques applied to different simplified representations of polymeric materials. The obtained results show that the FS4RVDD algorithm outperformed the traditional FS methods in all proposed scenarios, which suggests the need for an algorithm such as FS4RVDD to deal with the uncertainty that polydispersity introduces in human-made polymers.


Subject(s)
Algorithms , Polymers/chemistry , Models, Molecular , Molecular Conformation , Molecular Weight , Quantitative Structure-Activity Relationship
8.
Comput Methods Programs Biomed ; 177: 211-218, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31319950

ABSTRACT

BACKGROUND AND OBJECTIVE: Gene regulatory networks (GRNs) are essential for understanding most molecular processes. In this context, the so-called model-free approaches have an advantage in modeling the complex topologies behind these dynamic molecular networks, since most GRNs are difficult to map correctly by any other mathematical model. Abstract model-free approaches, also known as rule-based extraction methods, offer valuable benefits when performing data-driven analysis, such as requiring the least amount of data and simplifying the inference of large models at a faster analysis speed. In particular, GRNCOP2 is a combinatorial optimization method with an adaptive criterion for the discretization of gene expression data and high performance, in contrast to other rule-based extraction methods for discovering GRNs. However, the analysis of the large relational structures of the networks inferred by GRNCOP2 requires the support of effective tools for interactive network visualization and topological analysis of the extracted associations. This need motivated the integration of GRNCOP2 into the Cytoscape ecosystem in order to benefit from Cytoscape's core functionality, as well as all the other apps in its ecosystem. METHODS: In this paper, we introduce the implementation of a GRNCOP2 Cytoscape app. This incorporation into the Cytoscape platform includes new functionality for GRN visualizations, dynamic user interaction and integration with other apps for topological analysis of the networks. RESULTS: In order to demonstrate the usefulness of integrating GRNCOP2 in Cytoscape, the new app was used to tackle a novel use case for GRNCOP2: the analysis of crosstalk between pathways. In this regard, datasets associated with Alzheimer's disease (AD) were analyzed using the GRNCOP2 app and other apps of the Cytoscape ecosystem by performing a topological analysis of the AD progression and its synchronization with the Ubiquitin Mediated Proteolysis pathway. Finally, the biological relevance of the findings achieved by this new app was evaluated by searching for evidence in the literature. CONCLUSIONS: The proposed crosstalk analysis with the new GRNCOP2 app focused on assessing the phase of Alzheimer's disease progression in which the coordination with the Ubiquitin Mediated Proteolysis pathway increases, and on identifying the genes that explain the signalling between these cellular processes. Both questions were explored by topological contrastive analysis of the GRNs generated by the GRNCOP2 app, where several facilities of Cytoscape were exploited. The topological patterns inferred by this new app are consistent with biological evidence reported in the scientific literature, illustrating the effectiveness of using this new GRNCOP2 app in pathway analysis. AVAILABILITY: The GRNCOP2 app is freely available at the official Cytoscape app store: http://apps.cytoscape.org/apps/grncop2.


Subject(s)
Alzheimer Disease/physiopathology , Gene Regulatory Networks , Medical Informatics/methods , Proteolysis , Software , Ubiquitin/metabolism , Algorithms , Alzheimer Disease/metabolism , Computational Biology/methods , Computer Graphics , Disease Progression , Gene Expression , Humans , Models, Statistical , Signal Transduction , User-Computer Interface
9.
Sci Rep ; 9(1): 9102, 2019 06 24.
Article in English | MEDLINE | ID: mdl-31235739

ABSTRACT

Alzheimer's disease is one of the most common neurodegenerative disorders in the elderly population. The β-site amyloid cleavage enzyme 1 (BACE1) is the major constituent of amyloid plaques and plays a central role in this brain pathogenesis; thus, it constitutes an auspicious pharmacological target for its treatment. In this paper, a QSAR model for identification of potential inhibitors of BACE1 protein is designed by using classification methods. For building this model, a database with 215 molecules collected from different sources has been assembled. This dataset contains diverse compounds with different scaffolds and physical-chemical properties, covering a wide chemical space in the drug-like range. The most distinctive aspect of the applied QSAR strategy is the combination of hybridization with backward elimination of models, which contributes to improving the quality of the final QSAR model. Another relevant step is the visual analysis of the molecular descriptors, which helps ensure the absence of information redundancy in the model. The QSAR model performance has been assessed by traditional metrics, and the final proposed model has low cardinality and reaches a high percentage of correctly classified chemical compounds.


Subject(s)
Alzheimer Disease/drug therapy , Amyloid Precursor Protein Secretases/antagonists & inhibitors , Protease Inhibitors/chemistry , Protease Inhibitors/pharmacology , Quantitative Structure-Activity Relationship , Alzheimer Disease/enzymology , Computer Simulation , Machine Learning , Protease Inhibitors/therapeutic use
10.
Biomed Res Int ; 2019: 2905203, 2019.
Article in English | MEDLINE | ID: mdl-30906770

ABSTRACT

The selection of the most relevant molecular descriptors to describe a target variable in the context of QSAR (Quantitative Structure-Activity Relationship) modelling is a challenging combinatorial optimization problem. In this paper, a novel software tool for addressing this task in the context of regression and classification modelling is presented. The methodology that implements the tool is organized into two phases. The first phase uses a multiobjective evolutionary technique to perform the selection of subsets of descriptors. The second phase performs an external validation of the chosen descriptors subsets in order to improve reliability. The tool functionalities have been illustrated through a case study for the estimation of the ready biodegradation property as an example of classification QSAR modelling. The results obtained show the usefulness and potential of this novel software tool that aims to reduce the time and costs of development in the drug discovery process.


Subject(s)
Machine Learning , Models, Molecular , Software , Quantitative Structure-Activity Relationship
11.
J Integr Bioinform ; 16(1)2019 Feb 14.
Article in English | MEDLINE | ID: mdl-30763264

ABSTRACT

Parkinson's disease is one of the most common neurodegenerative illnesses in older persons and the leucine-rich repeat kinase 2 (LRRK2) is an auspicious target for its pharmacological treatment. In this work, quantitative structure-activity relationship (QSAR) models for identification of putative inhibitors of LRRK2 protein are developed by using an in-house chemical library and several machine learning techniques. The methodology applied in this paper has two steps: first, alternative subsets of molecular descriptors useful for characterizing LRRK2 inhibitors are chosen by a multi-objective feature selection method; secondly, QSAR models are learned by using these subsets and three different strategies for supervised learning. The qualities of all these QSAR models are compared by classical metrics and the best models are discussed in statistical and physicochemical terms.


Subject(s)
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/antagonists & inhibitors , Models, Molecular , Parkinson Disease/drug therapy , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Quantitative Structure-Activity Relationship , Computer Simulation , Humans , Molecular Structure , Parkinson Disease/enzymology
12.
Sci Rep ; 7(1): 2403, 2017 05 25.
Article in English | MEDLINE | ID: mdl-28546583

ABSTRACT

Quantitative structure-activity relationship modeling using machine learning techniques constitutes a complex computational problem, where the identification of the most informative molecular descriptors for predicting a specific target property plays a critical role. Two main general approaches can be used for this modeling procedure: feature selection and feature learning. In this paper, a performance comparative study of two state-of-art methods related to these two approaches is carried out. In particular, regression and classification models for three different issues are inferred using both methods under different experimental scenarios: two drug-like properties, such as blood-brain-barrier and human intestinal absorption, and enantiomeric excess, as a measurement of purity used for chiral substances. Beyond the contrastive analysis of feature selection and feature learning methods as competitive approaches, the hybridization of these strategies is also evaluated based on previous results obtained in material sciences. From the experimental results, it can be concluded that there is not a clear winner between both approaches because the performance depends on the characteristics of the compound databases used for modeling. Nevertheless, in several cases, it was observed that the accuracy of the models can be improved by combining both approaches when the molecular descriptor sets provided by feature selection and feature learning contain complementary information.


Subject(s)
Drug Discovery , Machine Learning , Models, Molecular , Quantitative Structure-Activity Relationship , Algorithms , Blood-Brain Barrier/drug effects , Blood-Brain Barrier/metabolism , Chemical Phenomena , Drug Discovery/methods , Humans , Intestinal Absorption/drug effects , Software
13.
Evol Bioinform Online ; 12: 247-251, 2016.
Article in English | MEDLINE | ID: mdl-27812277

ABSTRACT

The identification of nested motifs in genomic sequences is a complex computational problem. The detection of these patterns is important to allow the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. In this study, a de novo strategy for detecting patterns that represent nested motifs was designed based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories, motifs within other motifs, motifs flanked by other motifs, and motifs of large size. The methodology used in this study, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to identify putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which is called Mamushka.

14.
Biosystems ; 150: 1-12, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27521767

ABSTRACT

Detection of crosstalk among pathways is a challenging task, which requires the identification of different types of interactions associated with cellular processes. A common strategy used in bioinformatics consists in extrapolating pathway associations from the pairwise analysis of some genes related to them, using gene expression data and topological information. PET, the method proposed in this paper, goes a step further by incorporating a strategy for the detection of correlation across conditions between differentially expressed genes based on biclustering analysis. In order to evaluate the performance of this new approach, a comparison with two recently published algorithms was carried out. The methods were contrasted in the inference of pathway associations from Alzheimer's disease datasets, where the new proposal presents a higher crosstalk discovery rate. Finally, the analysis of the biological relevance of the pathway associations inferred by PET has shown the soundness of the extracted knowledge.


Subject(s)
Databases, Genetic , Gene Expression Profiling/methods , Gene Expression Regulation , Algorithms , Alzheimer Disease/diagnosis , Alzheimer Disease/genetics , Cluster Analysis , Humans
15.
J Integr Bioinform ; 13(2): 286, 2016 Nov 27.
Article in English | MEDLINE | ID: mdl-28187416

ABSTRACT

Several feature extraction approaches for QSPR modelling in Cheminformatics are discussed in this paper. In particular, this work is focused on the use of these strategies for predicting mechanical properties, which are relevant for the design of polymeric materials. The methodology analysed in this study employs a feature learning method that uses a quantification process of 2D structural characterization of materials with the autoencoder method. Alternative QSPR models inferred for tensile strength at break (a well-known mechanical property of polymers) are presented. These alternative models are contrasted to QSPR models obtained by feature selection technique by using accuracy measures and a visual analytic tool. The results show evidence about the benefits of combining feature learning approaches with feature selection methods for the design of QSPR models.


Subject(s)
Models, Chemical , Polymers/chemistry , Tensile Strength
16.
Brief Bioinform ; 17(5): 758-70, 2016 09.
Article in English | MEDLINE | ID: mdl-26438418

ABSTRACT

Gene expression measurements represent the most important source of biological data used to unveil the interaction and functionality of genes. In this regard, several data mining and machine learning algorithms have been proposed that require, in a number of cases, some kind of data discretization to perform the inference. Selection of an appropriate discretization process has a major impact on the design and outcome of the inference algorithms, as there are a number of relevant issues that need to be considered. This study presents a revision of the current state-of-the-art discretization techniques, together with the key subjects that need to be considered when designing or selecting a discretization approach for gene expression data.


Subject(s)
Gene Expression , Algorithms , Data Mining , Gene Expression Profiling
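To make the notion of discretizing gene expression data concrete, the sketch below contrasts two generic unsupervised schemes, equal-width and equal-frequency binning. These two schemes are chosen here purely for illustration; the function names and the toy expression values are assumptions and are not taken from the review.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Assign each value to a bin so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels

# Hypothetical expression levels for one gene across six samples
expression = [0.1, 0.2, 0.3, 0.4, 5.0, 9.0]
width_labels = equal_width_bins(expression, 3)
freq_labels = equal_frequency_bins(expression, 3)
print(width_labels)  # [0, 0, 0, 0, 1, 2]
print(freq_labels)   # [0, 0, 1, 1, 2, 2]
```

The two outputs differ on the same data: equal-width binning lets the two high values stretch the scale so that four samples collapse into the lowest bin, while equal-frequency binning balances the bins but separates values as close as 0.2 and 0.3. Note that this simple rank-based version may place tied values in different bins. Trade-offs of this kind are why the choice of discretization affects downstream inference.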
17.
J Cheminform ; 7: 39, 2015.
Article in English | MEDLINE | ID: mdl-26300983

ABSTRACT

BACKGROUND: The design of QSAR/QSPR models is a challenging problem, where the selection of the most relevant descriptors constitutes a key step of the process. Several feature selection methods that address this step concentrate on statistical associations among descriptors and target properties, whereas chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating domain experts' knowledge into the selection process is needed to increase confidence in the final set of descriptors. RESULTS: In this paper, we propose a software tool, named Visual and Interactive DEscriptor ANalysis (VIDEAN), that combines statistical methods with interactive visualizations for choosing a set of descriptors for predicting a target property. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of data, aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties and candidate subsets of descriptors. The capabilities of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use this tool to choose one subset of descriptors from a group of candidate subsets, or to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. CONCLUSIONS: The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy and high statistical performance in a visual exploratory way. Therefore, it is possible to conclude that the resulting tool allows the integration of a chemist's expertise in the descriptor selection process with a low cognitive effort, in contrast with the alternative of using an ad hoc manual analysis of the selected descriptors. GRAPHICAL ABSTRACT: VIDEAN allows the visual analysis of candidate subsets of descriptors for QSAR/QSPR. In the two panels on the top, users can interactively explore numerical correlations as well as co-occurrences in the candidate subsets through two interactive graphs.

18.
BMC Syst Biol ; 8 Suppl 2: S7, 2014.
Article in English | MEDLINE | ID: mdl-25032889

ABSTRACT

BACKGROUND: The development of high-throughput omics technologies enabled genome-wide measurements of the activity of cellular elements and provides the analytical resources for the progress of the Systems Biology discipline. Analysis and interpretation of gene expression data has evolved from the gene to the pathway and interaction level, i.e. from the detection of differentially expressed genes, to the establishment of gene interaction networks and the identification of enriched functional categories. Still, the understanding of biological systems requires a further level of analysis that addresses the characterization of the interaction between functional modules. RESULTS: We present a novel computational methodology to study the functional interconnections among the molecular elements of a biological system. The PANA approach uses high-throughput genomics measurements and a functional annotation scheme to extract an activity profile from each functional block (or pathway), followed by machine-learning methods to infer the relationships between these functional profiles. The result is a global, interconnected network of pathways that represents the functional cross-talk within the molecular system. We have applied this approach to describe the functional transcriptional connections during the yeast cell cycle and to identify pathways that change their connectivity in a disease condition using an Alzheimer example. CONCLUSIONS: PANA is a useful tool to deepen our understanding of the functional interdependences that operate within complex biological systems. We show that the approach is algorithmically consistent and that the inferred network is well supported by the available functional data. The method allows the dissection of the molecular basis of the functional connections, and we describe the different regulatory mechanisms that explain the network topology obtained for the yeast cell cycle data.


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Systems Biology/methods , Alzheimer Disease/genetics , Cell Cycle/genetics , DNA Replication/genetics , Gluconeogenesis/genetics , Glycolysis/genetics , Oxidative Phosphorylation , Proteolysis , Purines/metabolism , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Ubiquitin/metabolism
19.
Molecules ; 17(12): 14937-53, 2012 Dec 17.
Article in English | MEDLINE | ID: mdl-23247367

ABSTRACT

Volatile organic compounds (VOCs) are contained in a variety of chemicals that can be found in household products and may have undesirable effects on health. Thereby, it is important to model blood-to-liver partition coefficients (log P(liver)) for VOCs in a fast and inexpensive way. In this paper, we present two new quantitative structure-property relationship (QSPR) models for the prediction of log P(liver), where we also propose a hybrid approach for the selection of the descriptors. This hybrid methodology combines a machine learning method with a manual selection based on expert knowledge. This allows obtaining a set of descriptors that is interpretable in physicochemical terms. Our regression models were trained using decision trees and neural networks and validated using an external test set. Results show high prediction accuracy compared to previous log P(liver) models, and the descriptor selection approach provides a means to get a small set of descriptors that is in agreement with theoretical understanding of the target property.


Subject(s)
Gases , Models, Theoretical , Quantitative Structure-Activity Relationship , Volatile Organic Compounds , Animals , Artificial Intelligence , Gases/chemistry , Gases/toxicity , Humans , Liver/drug effects , Rats , Volatile Organic Compounds/chemistry , Volatile Organic Compounds/toxicity
20.
BMC Bioinformatics ; 12: 123, 2011 Apr 27.
Article in English | MEDLINE | ID: mdl-21524308

ABSTRACT

BACKGROUND: Gene regulatory networks have an essential role in every process of life. In this regard, the amount of genome-wide time series data is becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes. RESULTS: This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. In this sense, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the contrasted methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experimentation has exhibited the soundness and scalability of the new method which inferred highly-related statistically-significant gene associations. CONCLUSIONS: A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Gene Regulatory Networks , Oligonucleotide Array Sequence Analysis/methods , Saccharomyces cerevisiae/genetics , Databases, Genetic , Gene Expression Regulation , Saccharomyces cerevisiae Proteins/genetics