RESUMO
MOTIVATION: A factory in a metabolic network specifies how to produce target molecules from source compounds through biochemical reactions, properly accounting for reaction stoichiometry to conserve or not deplete intermediate metabolites. While finding factories is a fundamental problem in systems biology, available methods do not consider the number of reactions used, nor address negative regulation. METHODS: We introduce the new problem of finding optimal factories that use the fewest reactions, for the first time incorporating both first- and second-order negative regulation. We model this problem with directed hypergraphs, prove it is NP-complete, solve it via mixed-integer linear programming, and accommodate second-order negative regulation by an iterative approach that generates next-best factories. RESULTS: This optimization-based approach is remarkably fast in practice, typically finding optimal factories in a few seconds, even for metabolic networks involving tens of thousands of reactions and metabolites, as demonstrated through comprehensive experiments across all instances from standard reaction databases. AVAILABILITY AND IMPLEMENTATION: Source code for an implementation of our new method for optimal factories with negative regulation in a new tool called Odinn, together with all datasets, is available free for non-commercial use at http://odinn.cs.arizona.edu.
Assuntos
Algoritmos , Redes e Vias Metabólicas , Programação Linear , Software , Biologia de SistemasRESUMO
MOTIVATION: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy. METHOD: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods. RESULTS: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2-10%, and Q3 accuracy by more than 1-3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction. AVAILABILITY AND IMPLEMENTATION: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu.
Assuntos
Algoritmos , Proteínas , Análise por Conglomerados , Biologia Computacional , Estrutura Secundária de ProteínaRESUMO
Perhaps the most fundamental model in synthetic and systems biology for inferring pathways in metabolic reaction networks is a metabolic factory: a system of reactions that starts from a set of source compounds and produces a set of target molecules, while conserving or not depleting intermediate metabolites. Finding a shortest factory-that minimizes a sum of real-valued weights on its reactions to infer the most likely pathway-is NP-complete. The current state-of-the-art for shortest factories solves a mixed-integer linear program with a major drawback: it requires the user to set a critical parameter, where too large a value can make optimal solutions infeasible, while too small a value can yield degenerate solutions due to numerical error. We present the first robust algorithm for optimal factories that is both parameter-free (relieving the user from determining a parameter setting) and degeneracy-free (guaranteeing it finds an optimal nondegenerate solution). We also give for the first time a complete characterization of the graph-theoretic structure of shortest factories, that reveals an important class of degenerate solutions which was overlooked and potentially output by the prior state-of-the-art.We show degeneracy is precisely due to invalid stoichiometries in reactions, and provide an efficient algorithm for identifying all such misannotations in a metabolic network. In addition we settle the relationship between the two established pathway models of hyperpaths and factories by proving hyperpaths actually comprise a subclass of factories. Comprehensive experiments over all instances from the standard metabolic reaction databases in the literature demonstrate our parameter-free exact algorithm is fast in practice, quickly finding optimal factories in large real-world networks containing thousands of reactions. A preliminary implementation of our robust algorithm for shortest factories in a new tool called Freeia is available free for research use at http://freeia.cs.arizona.edu.
Assuntos
Algoritmos , Redes e Vias Metabólicas , Biologia de Sistemas/métodos , Modelos Biológicos , Biologia Computacional/métodosRESUMO
Signaling and metabolic pathways, which consist of chains of reactions that produce target molecules from source compounds, are cornerstones of cellular biology. Properly modeling the reaction networks that represent such pathways requires directed hypergraphs, where each molecule or compound maps to a vertex, and each reaction maps to a hyperedge directed from its set of input reactants to its set of output products. Inferring the most likely series of reactions that produces a given set of targets from a given set of sources, where for each reaction its reactants are produced by prior reactions in the series, corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. We give the first exact algorithm for general shortest hyperpaths that can find provably optimal solutions for large, real-world, reaction networks. In particular, we derive a novel graph-theoretic characterization of hyperpaths, which we leverage in a new integer linear programming formulation of shortest hyperpaths that for the first time handles cycles, and develop a cutting-plane algorithm that can solve this integer linear program to optimality in practice. Through comprehensive experiments over all of the thousands of instances from the standard Reactome and NCI-PID reaction databases, we demonstrate that our cutting-plane algorithm quickly finds an optimal hyperpath-inferring the most likely pathway-with a median running time of under 10 seconds, and a maximum time of less than 30 minutes, even on instances with thousands of reactions. We also explore for the first time how well hyperpaths infer true pathways, and show that shortest hyperpaths accurately recover known pathways, typically with very high precision and recall. Source code implementing our cutting-plane algorithm for shortest hyperpaths is available free for research use in a new tool called Mmunin.
Assuntos
Biologia Computacional , Software , Algoritmos , Redes e Vias Metabólicas , Transdução de SinaisRESUMO
BACKGROUND: Cell signaling pathways, which are a series of reactions that start at receptors and end at transcription factors, are basic to systems biology. Properly modeling the reactions in such pathways requires directed hypergraphs, where an edge is now directed between two sets of vertices. Inferring a pathway by the most parsimonious series of reactions corresponds to finding a shortest hyperpath in a directed hypergraph, which is NP-complete. The current state-of-the-art for shortest hyperpaths in cell signaling hypergraphs solves a mixed-integer linear program to find an optimal hyperpath that is restricted to be acyclic, and offers no efficiency guarantees. RESULTS: We present, for the first time, a heuristic for general shortest hyperpaths that properly handles cycles, and is guaranteed to be efficient. We show the heuristic finds provably optimal hyperpaths for the class of singleton-tail hypergraphs, and also give a practical algorithm for tractably generating all source-sink hyperpaths. The accuracy of the heuristic is demonstrated through comprehensive experiments on all source-sink instances from the standard NCI-PID and Reactome pathway databases, which show it finds a hyperpath that matches the state-of-the-art mixed-integer linear program on over 99% of all instances that are acyclic. On instances where only cyclic hyperpaths exist, the heuristic surpasses the state-of-the-art, which finds no solution; on every such cyclic instance, enumerating all source-sink hyperpaths shows the solution found by the heuristic was in fact optimal. CONCLUSIONS: The new shortest hyperpath heuristic is both fast and accurate. This makes finding source-sink hyperpaths, which in general may contain cycles, now practical for real cell signaling networks. AVAILABILITY: Source code for the hyperpath heuristic in a new tool we call Hhugin (as well as for hyperpath enumeration, and all dataset instances) is available free for non-commercial use at http://hhugin.cs.arizona.edu.
RESUMO
In this meeting overview, we summarise the scientific program and organisation of the 16th International Society for Computational Biology Student Council Symposium in 2020 (ISCB SCS2020). This symposium was the first virtual edition in an uninterrupted series of symposia that has been going on for 15 years, aiming to unite computational biology students and early career researchers across the globe.
Assuntos
Biologia Computacional , Estudantes , Humanos , PesquisadoresRESUMO
Technetium poses an environmental hazard because of its radioactivity and long half-life. It exists in the form of pertechnetate in the environment and can be modeled by the nonradioactive ion perrhenate, since pertechnetate and perrhenate have the same geometry and similar chemical properties. In this research, a new zinc cyclen resorcinarene cavitand (ZCR) column was used in ion chromatography (IC) to efficiently separate perrhenate. Ion chromatography has the advantage of requiring almost no sample preparation for water samples. The ZCR column demonstrated the ability to separate anions: fluoride, chloride, nitrate, sulfate, phosphate, perchlorate, and perrhenate by gradient 2-60 mM NaOH. Unlike other columns, the new column material was selective in retaining perrhenate. The ZCR column also gave a linear range from 2.0 to 1000 mg L-1 for perrhenate with R2 > 0.997. There was a logarithmic relationship between the concentration of perrhenate and its retention time. Excellent perrhenate recovery was achieved on the ZCR column when river water was spiked with perrhenate and perrhenate was preconcentrated. The efficient separations of perrhenate by the ZCR column will potentially assist in pertechnetate separations.