1.
J Chem Inf Model ; 59(6): 2702-2713, 2019 Jun 24.
Article in English | MEDLINE | ID: mdl-30908028

ABSTRACT

The ability to search for a query molecule in massive molecular repositories is a fundamental task in chemoinformatics and drug discovery. Chemical fingerprints are commonly used to characterize the structure and properties of molecules. Some fingerprints, particularly unfolded fingerprints, are often extremely high-dimensional and sparse, with only a few features having a positive value. In this work, we propose a new searching algorithm, RISC, which exploits sparsity in high-dimensional fingerprints to derive effective pruning mechanisms and dramatically speed up searching. RISC is robust enough to work on both binary and nonbinary chemical fingerprints. Extensive experiments on Range Queries and Top-k Queries across several molecular repositories demonstrate that for fingerprints of dimension 2048 and above, which is often the case with unfolded fingerprints, RISC is consistently faster than the state-of-the-art techniques. The source code of our implementation is available at http://www.cse.iitd.ac.in/~sayan/software.html.
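To make the idea of sparsity-based pruning concrete, here is a minimal sketch of a range query over sparse binary fingerprints. It is not the RISC algorithm, whose pruning rules are not given in the abstract; it assumes Tanimoto similarity and uses only the standard size bound (similarity cannot exceed the ratio of the smaller to the larger bit count). The names `tanimoto` and `range_query` and the toy data are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a sparse binary fingerprint is stored as the set of its "on" indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def range_query(
    query: Fingerprint, database: Dict[str, Fingerprint], threshold: float
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs with similarity >= threshold, best first."""
    hits = []
    nq = len(query)
    for name, fp in database.items():
        nf = len(fp)
        larger = max(nq, nf)
        # Size bound: Tanimoto <= min(|A|, |B|) / max(|A|, |B|), so a candidate
        # whose bit count is too different can be skipped without intersecting.
        if larger and min(nq, nf) / larger < threshold:
            continue
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((name, sim))
    return sorted(hits, key=lambda hit: -hit[1])


# Toy data (hypothetical molecule names and bit indices).
database = {
    "m1": frozenset({1, 5, 9}),
    "m2": frozenset({1, 5, 9, 12}),
    "m3": frozenset(range(100, 140)),
}
hits = range_query(frozenset({1, 5, 9}), database, threshold=0.7)
```

In this toy run, `m3` is rejected by the size bound alone, so its set intersection is never computed; the benefit of such bounds grows with the dimensionality and sparsity of the fingerprints.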


Subject(s)
Algorithms , Drug Discovery/methods , Small Molecule Libraries/chemistry , Software , Cheminformatics/methods , Humans , Models, Chemical
2.
Bioinformatics ; 33(24): 3955-3963, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28961716

ABSTRACT

MOTIVATION: The ability to predict pathways for biosynthesis of metabolites is very important in metabolic engineering. It is possible to mine the repertoire of biochemical transformations from reaction databases, and apply the knowledge to predict reactions to synthesize new molecules. However, this usually involves a careful understanding of the mechanism and the knowledge of the exact bonds being created and broken. There is a need for a method to rapidly predict reactions for synthesizing new molecules, which relies only on the structures of the molecules, without demanding additional information such as thermodynamics or hand-curated reactant mapping, which are often hard to obtain accurately. RESULTS: We here describe a robust method based on subgraph mining, to predict a series of biochemical transformations, which can convert between two (even previously unseen) molecules. We first describe a reliable method based on subgraph edit distance to map reactants and products, using only their chemical structures. Having mapped reactants and products, we identify the reaction centre and its neighbourhood, the reaction signature, and store this in a reaction rule network. This novel representation enables us to rapidly predict pathways, even between previously unseen molecules. We demonstrate this ability by predicting pathways to molecules not present in the KEGG database. We also propose a heuristic that predominantly recovers natural biosynthetic pathways from amongst hundreds of possible alternatives, through a directed search of the reaction rule network, enabling us to provide a reliable ranking of the different pathways. Our approach scales well, even to databases with >100 000 reactions. AVAILABILITY AND IMPLEMENTATION: A Java-based implementation of our algorithms is available at https://github.com/RamanLab/ReactionMiner. CONTACT: sayanranu@cse.iitd.ac.in or kraman@iitm.ac.in. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
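As a rough illustration of searching a rule network for a pathway, the sketch below runs a breadth-first search over a toy directed graph. It is a simplification under stated assumptions, not the ReactionMiner implementation: nodes here are plain molecule labels and edges carry hypothetical rule names, whereas the abstract describes storing reaction signatures and ranking pathways with a heuristic.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Assumption: network[molecule] lists (rule_name, product) pairs.
Network = Dict[str, List[Tuple[str, str]]]


def find_pathway(
    network: Network, source: str, target: str
) -> Optional[List[Tuple[str, str, str]]]:
    """Return the shortest list of (substrate, rule, product) steps, or None."""
    if source == target:
        return []
    parents: Dict[str, Optional[Tuple[str, str]]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for rule, product in network.get(node, []):
            if product in parents:
                continue  # already reached by a path that is at least as short
            parents[product] = (node, rule)
            if product == target:
                # Walk back from the target to the source to recover the steps.
                steps = []
                current = target
                while parents[current] is not None:
                    previous, used_rule = parents[current]
                    steps.append((previous, used_rule, current))
                    current = previous
                return steps[::-1]
            queue.append(product)
    return None


# Toy network with hypothetical molecule and rule names.
network = {
    "A": [("r1", "B"), ("r2", "C")],
    "B": [("r3", "D")],
    "C": [("r4", "D")],
}
pathway = find_pathway(network, "A", "D")
```

Breadth-first search returns one shortest pathway only; ranking many alternatives, as the abstract describes, would need a scoring function on top of this.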


Subject(s)
Algorithms , Computational Biology/methods , Data Mining , Metabolic Networks and Pathways , Metabolic Engineering , Molecular Structure
3.
Phys Chem Chem Phys ; 19(31): 20891-20903, 2017 Aug 09.
Article in English | MEDLINE | ID: mdl-28745340

ABSTRACT

How many structurally different microscopic routes are accessible to a protein molecule while folding? This has been a challenging question to address experimentally as single-molecule studies are constrained by the limited number of observed folding events while ensemble measurements, by definition, report only an average and not the distribution of the quantity under study. Atomistic simulations, on the other hand, are restricted by sampling and the inability to reproduce thermodynamic observables directly. We overcome these bottlenecks in the current work and provide a quantitative description of folding pathway heterogeneity by developing a comprehensive, scalable and yet experimentally consistent approach combining concepts from statistical mechanics, physical kinetics and graph theory. We quantify the folding pathway heterogeneity of five single-domain proteins under two thermodynamic conditions from an analysis of 100 000 folding events generated from a statistical mechanical model incorporating the detailed energetics from more than a million conformational states. The resulting microstate energetics predicts the results of protein engineering experiments, the thermodynamic stabilities of secondary-structure segments from NMR studies, and the end-to-end distance estimates from single-molecule force spectroscopy measurements. We find that a minimum of ∼3-200 microscopic routes, with a diverse ensemble of transition-path structures, are required to account for the total folding flux across the five proteins and the thermodynamic conditions. The partitioning of flux amongst the numerous pathways is shown to be subtly dependent on the experimental conditions that modulate protein stability, topological complexity and the structural resolution at which the folding events are observed. 
Our predictive methodology thus reveals the presence of rich ensembles of folding mechanisms that are generally invisible in experiments, reconciles the contradictory observations from experiments and simulations and provides an experimentally consistent avenue to quantify folding heterogeneity.
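The notion of a minimum number of routes needed to account for the folding flux can be sketched as a simple counting exercise. The example below is an illustration only: the flux values and the coverage fraction are hypothetical, and the abstract does not state how the authors define or threshold the flux.

```python
from typing import Sequence


def min_routes_for_flux(fluxes: Sequence[float], fraction: float = 0.99) -> int:
    """Smallest number of routes whose combined flux reaches `fraction` of the total."""
    total = sum(fluxes)
    if total <= 0:
        raise ValueError("total flux must be positive")
    running = 0.0
    # Taking routes in order of decreasing flux gives the smallest possible count.
    for count, flux in enumerate(sorted(fluxes, reverse=True), start=1):
        running += flux
        if running >= fraction * total:
            return count
    return len(fluxes)


# Hypothetical per-route fluxes (arbitrary units).
routes_needed = min_routes_for_flux([50, 30, 15, 5], fraction=0.9)
```

A distribution dominated by one route yields a small count, while a flat distribution over many routes yields a large one, which is one simple way to express pathway heterogeneity as a single number.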


Subject(s)
Proteins/chemistry , Cluster Analysis , Markov Chains , Protein Folding , Protein Structure, Secondary , Thermodynamics
4.
Phys Chem Chem Phys ; 17(41): 27264-9, 2015 Nov 07.
Article in English | MEDLINE | ID: mdl-26421497

ABSTRACT

We show that the phosphorylation of 4E-BP2 acts as a triggering event to shape its folding-function landscape that is delicately balanced between conflicting favorable energetics and intrinsically unfavorable topological connectivity. We further provide first evidence that the fitness landscapes of proteins at the threshold of disorder can differ considerably from ordered domains.


Subject(s)
Eukaryotic Initiation Factors/chemistry , Eukaryotic Initiation Factors/metabolism , Protein Folding , Thermodynamics , Humans , Phosphorylation , Protein Conformation
5.
J Chem Inf Model ; 51(5): 1106-21, 2011 May 23.
Article in English | MEDLINE | ID: mdl-21488651

ABSTRACT

We propose a novel method for pharmacophore analysis by examining the Joint Pharmacophore Space of chemical compounds, targets, and chemical/biological properties. The proposed approach is a notable deviation from existing techniques that analyze compounds on a target-by-target basis, aimed at extracting and optimizing a specific pharmacophore. The underlying geometry of the pharmacophores is responsible for binding between compounds and targets as well as properties of compounds such as Blood Brain Barrier permeability. The identification of this joint space enables us to cluster and classify similar pharmacophores based on geometric arrangements, analyze the diversity of this space, ascribe positive/negative properties to the subspaces, and query and mine a database of compounds for presence or absence of activity. Extensive experiments are carried out to validate the presence of subspaces that uniquely identify geometric configurations conforming to certain biological activities. The discriminative potential of these subspaces is also verified by employing them as a molecular descriptor. Empirical results show promising performance in terms of classification quality highlighting the utility of mining the joint pharmacophore space.


Subject(s)
Algorithms , Antineoplastic Agents/chemistry , Drug Discovery , Antineoplastic Agents/pharmacology , Binding Sites , Blood-Brain Barrier , Capillary Permeability , Cell Line, Tumor , Data Mining , Databases, Chemical , Humans , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Ligands , Molecular Conformation , Molecular Targeted Therapy
6.
J Chem Inf Model ; 49(11): 2537-50, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19928835

ABSTRACT

The increased availability of large repositories of chemical compounds has created new challenges in designing efficient molecular querying and mining systems. Molecular classification is an important problem in drug development where libraries of chemical compounds are screened and molecules with the highest probability of success against a given target are selected. We have developed a technique called GraphSig to mine significantly over-represented molecular substructures in a given class of molecules. GraphSig successfully overcomes the scalability bottleneck of mining patterns at a low frequency. Patterns mined by GraphSig display correlation with biological activities and serve as an excellent platform on which to build molecular analysis tools. The potential of GraphSig as a chemical descriptor is explored, and support vector machines are used to classify molecules described by patterns mined using GraphSig. Furthermore, the over-represented patterns are more informative than features generated exhaustively by traditional fingerprints; this has potential in providing scaffolds and lead generation. Extensive experiments are carried out to evaluate the proposed techniques, and empirical results show promising performance in terms of classification quality. An implementation of the algorithm is available free for academic use at http://www.uweb.ucsb.edu/~sayan/software/GraphSig.tar.
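One common way to decide whether a substructure is significantly over-represented in a class is a binomial tail test against a background rate. The sketch below shows that generic test; it is an assumption for illustration and not necessarily the significance model used by GraphSig. The function names, counts, and the `alpha` cutoff are hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_over_represented(
    hits_in_class: int, class_size: int, background_rate: float, alpha: float = 0.01
) -> bool:
    """True if seeing at least `hits_in_class` occurrences is unlikely by chance."""
    return binomial_tail(class_size, hits_in_class, background_rate) < alpha


# Hypothetical counts: a substructure found in 12 of 40 active molecules,
# while it occurs in 5% of the background library.
significant = is_over_represented(12, 40, background_rate=0.05)
```

A test of this kind can flag patterns that are rare in absolute terms yet far more frequent in the active class than the background rate predicts, which is why frequency thresholds alone are not a substitute for it.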


Subject(s)
Information Storage and Retrieval , Molecular Structure
7.
J Phys Chem Lett ; 9(7): 1771-1777, 2018 Apr 05.
Article in English | MEDLINE | ID: mdl-29565127

ABSTRACT

The inherent conflict between noncovalent interactions and the large conformational entropy of the polypeptide chain forces folding reactions and their mechanisms to deviate significantly from chemical reactions. Accordingly, measures of structure in the transition state ensemble (TSE) are strongly influenced by the underlying distributions of microscopic folding pathways that are challenging to discern experimentally. Here, we present a detailed analysis of 150,000 folding transition paths of five proteins at three different thermodynamic conditions from an experimentally consistent statistical mechanical model. We find that the underlying TSE structural distributions are rarely unimodal, and the average experimental measures arise from complex underlying distributions. Unfolding pathways also exhibit subtle differences from folding counterparts due to a combination of Hammond behavior and native-state movements. Local interactions and topological complexity, to a lesser extent, are found to determine pathway heterogeneity, underscoring the importance of the balance between local and nonlocal energetics in protein folding.


Subject(s)
Protein Folding , Proteins/chemistry , Bacillus subtilis , Entropy , Kinetics , Models, Chemical , Models, Molecular , Phase Transition , Protein Conformation , Protein Domains , Thermotoga maritima
8.
Mol Inform ; 30(9): 809-15, 2011 Sep.
Article in English | MEDLINE | ID: mdl-27467413

ABSTRACT

Identifying the overrepresented substructures from a set of molecules with similar activity is a common task in chemical informatics. Existing substructure miners are deterministic, requiring the activity of all mined molecules to be known with high confidence. In contrast, we introduce pGraphSig, a probabilistic structure miner, which effectively mines structures from noisy data, where many molecules are labeled with their probability of being active. We benchmark pGraphSig on data from several small-molecule high throughput screens, finding that it can more effectively identify overrepresented structures than a deterministic structure miner.
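When activity labels are probabilities rather than certainties, a substructure's support in the active class can be replaced by its expected support. The sketch below illustrates that idea under simple assumptions; it is not the pGraphSig algorithm, and the function names and toy values are hypothetical.

```python
from typing import Sequence


def expected_support(contains: Sequence[bool], p_active: Sequence[float]) -> float:
    """Expected number of active molecules that contain the substructure."""
    return sum(p for has_it, p in zip(contains, p_active) if has_it)


def enrichment(contains: Sequence[bool], p_active: Sequence[float]) -> float:
    """Ratio of expected support to the value expected if activity were
    independent of the substructure (1.0 means no enrichment)."""
    n_with = sum(1 for has_it in contains if has_it)
    total_active = sum(p_active)
    if n_with == 0 or total_active == 0:
        return 0.0
    baseline = n_with * total_active / len(p_active)
    return expected_support(contains, p_active) / baseline


# Toy screen: four molecules, the first two contain the substructure.
contains = [True, True, False, False]
p_active = [0.9, 0.8, 0.1, 0.2]
ratio = enrichment(contains, p_active)
```

Here the two molecules carrying the substructure hold most of the probability mass of activity, so the ratio is well above 1.0 even though no molecule is labeled active with certainty.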
