Search | VHL Regional Portal

1.

Constructing phylogenetic networks via cherry picking and machine learning.

Bernardini, Giulia; van Iersel, Leo; Julien, Esther; Stougie, Leen.

Algorithms Mol Biol ; 18(1): 13, 2023 Sep 16.

Article in English | MEDLINE | ID: mdl-37717003

ABSTRACT

BACKGROUND: Combining a set of phylogenetic trees into a single phylogenetic network that explains all of them is a fundamental challenge in evolutionary studies. Existing methods are computationally expensive and can either handle only small numbers of phylogenetic trees or are limited to severely restricted classes of networks. RESULTS: In this paper, we apply the recently-introduced theoretical framework of cherry picking to design a class of efficient heuristics that are guaranteed to produce a network containing each of the input trees, for practical-size datasets consisting of binary trees. Some of the heuristics in this framework are based on the design and training of a machine learning model that captures essential information on the structure of the input trees and guides the algorithms towards better solutions. We also propose simple and fast randomised heuristics that prove to be very effective when run multiple times. CONCLUSIONS: Unlike the existing exact methods, our heuristics are applicable to datasets of practical size, and the experimental study we conducted on both simulated and real data shows that these solutions are qualitatively good, always within some small constant factor from the optimum. Moreover, our machine-learned heuristics are one of the first applications of machine learning to phylogenetics and show its promise.

2.

A duality based 2-approximation algorithm for maximum agreement forest.

Olver, Neil; Schalekamp, Frans; van der Ster, Suzanne; Stougie, Leen; van Zuylen, Anke.

Math Program ; 198(1): 811-853, 2023.

Article in English | MEDLINE | ID: mdl-36845754

ABSTRACT

We give a 2-approximation algorithm for the Maximum Agreement Forest problem on two rooted binary trees. This NP-hard problem has been studied extensively in the past two decades, since it can be used to compute the rooted Subtree Prune-and-Regraft (rSPR) distance between two phylogenetic trees. Our algorithm is combinatorial and its running time is quadratic in the input size. To prove the approximation guarantee, we construct a feasible dual solution for a novel exponential-size linear programming formulation. In addition, we show this linear program has a smaller integrality gap than previously known formulations, and we give an equivalent compact formulation, showing that it can be solved in polynomial time.

3.

Applicability of several rooted phylogenetic network algorithms for representing the evolutionary history of SARS-CoV-2.

Wallin, Rosanne; van Iersel, Leo; Kelk, Steven; Stougie, Leen.

BMC Ecol Evol ; 21(1): 220, 2021 12 07.

Article in English | MEDLINE | ID: mdl-34876022

ABSTRACT

BACKGROUND: Rooted phylogenetic networks are used to display complex evolutionary history involving so-called reticulation events, such as genetic recombination. Various methods have been developed to construct such networks, using for example a multiple sequence alignment or multiple phylogenetic trees as input data. Coronaviruses are known to recombine frequently, but rooted phylogenetic networks have not yet been used extensively to describe their evolutionary history. Here, we created a workflow to compare the evolutionary history of SARS-CoV-2 with other SARS-like viruses using several rooted phylogenetic network inference algorithms. This workflow includes filtering noise from sets of phylogenetic trees by contracting edges based on branch length and bootstrap support, followed by resolution of multifurcations. We explored the running times of the network inference algorithms, the impact of filtering on the properties of the produced networks, and attempted to derive biological insights regarding the evolution of SARS-CoV-2 from them. RESULTS: The network inference algorithms are capable of constructing rooted phylogenetic networks for coronavirus data, although running-time limitations require restricting such datasets to a relatively small number of taxa. Filtering generally reduces the number of reticulations in the produced networks and increases their temporal consistency. Taxon bat-SL-CoVZC45 emerges as a major and structural source of discordance in the dataset. The tested algorithms often indicate that SARS-CoV-2/RaTG13 is a tree-like clade, with possibly some reticulate activity further back in their history. A smaller number of constructed networks posit SARS-CoV-2 as a possible recombinant, although this might be a methodological artefact arising from the interaction of bat-SL-CoVZC45 discordance and the optimization criteria used. CONCLUSION: Our results demonstrate that as part of a wider workflow and with careful attention paid to running time, rooted phylogenetic network algorithms are capable of producing plausible networks from coronavirus data. These networks partly corroborate existing theories about SARS-CoV-2, and partly produce new avenues for exploration regarding the location and significance of reticulate activity within the wider group of SARS-like viruses. Our workflow may serve as a model for pipelines in which phylogenetic network algorithms can be used to analyse different datasets and test different hypotheses.

Subject(s)

COVID-19 , SARS-CoV-2 , Algorithms , Humans , Phylogeny

4.

MOOMIN - Mathematical explOration of 'Omics data on a MetabolIc Network.

Pusa, Taneli; Ferrarini, Mariana Galvão; Andrade, Ricardo; Mary, Arnaud; Marchetti-Spaccamela, Alberto; Stougie, Leen; Sagot, Marie-France.

Bioinformatics ; 36(2): 514-523, 2020 01 15.

Article in English | MEDLINE | ID: mdl-31504164

ABSTRACT

MOTIVATION: Analysis of differential expression of genes is often performed to understand how the metabolic activity of an organism is impacted by a perturbation. However, because the system of metabolic regulation is complex and all changes are not directly reflected in the expression levels, interpreting these data can be difficult. RESULTS: In this work, we present a new algorithm and computational tool that uses a genome-scale metabolic reconstruction to infer metabolic changes from differential expression data. Using the framework of constraint-based analysis, our method produces a qualitative hypothesis of a change in metabolic activity. In other words, each reaction of the network is inferred to have increased, decreased, or remained unchanged in flux. In contrast to similar previous approaches, our method does not require a biological objective function and does not assign on/off activity states to genes. An implementation is provided and it is available online. We apply the method to three published datasets to show that it successfully accomplishes its two main goals: confirming or rejecting metabolic changes suggested by differentially expressed genes based on how well they fit in as parts of a coordinated metabolic change, as well as inferring changes in reactions whose genes did not undergo differential expression. AVAILABILITY AND IMPLEMENTATION: github.com/htpusa/moomin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Metabolic Networks and Pathways , Algorithms , Computational Biology , Genome , Models, Biological

5.

Full-length de novo viral quasispecies assembly through variation graph construction.

Baaijens, Jasmijn A; Van der Roest, Bastiaan; Köster, Johannes; Stougie, Leen; Schönhuth, Alexander.

Bioinformatics ; 35(24): 5086-5094, 2019 12 15.

Article in English | MEDLINE | ID: mdl-31147688

ABSTRACT

MOTIVATION: Viruses populate their hosts as a viral quasispecies: a collection of genetically related mutant strains. Viral quasispecies assembly is the reconstruction of strain-specific haplotypes from read data, and predicting their relative abundances within the mix of strains is an important step for various treatment-related reasons. Reference genome independent ('de novo') approaches have yielded benefits over reference-guided approaches, because reference-induced biases can become overwhelming when dealing with divergent strains. While being very accurate, extant de novo methods only yield rather short contigs. The remaining challenge is to reconstruct full-length haplotypes together with their abundances from such contigs. RESULTS: We present Virus-VG as a de novo approach to viral haplotype reconstruction from preassembled contigs. Our method constructs a variation graph from the short input contigs without making use of a reference genome. Then, to obtain paths through the variation graph that reflect the original haplotypes, we solve a minimization problem that yields a selection of maximal-length paths that is, optimal in terms of being compatible with the read coverages computed for the nodes of the variation graph. We output the resulting selection of maximal length paths as the haplotypes, together with their abundances. Benchmarking experiments on challenging simulated and real datasets show significant improvements in assembly contiguity compared to the input contigs, while preserving low error rates compared to the state-of-the-art viral quasispecies assemblers. AVAILABILITY AND IMPLEMENTATION: Virus-VG is freely available at https://bitbucket.org/jbaaijens/virus-vg. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Quasispecies , Algorithms , Genome , Haplotypes , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software

6.

On Unrooted and Root-Uncertain Variants of Several Well-Known Phylogenetic Network Problems.

van Iersel, Leo; Kelk, Steven; Stamoulis, Georgios; Stougie, Leen; Boes, Olivier.

Algorithmica ; 80(11): 2993-3022, 2018.

Article in English | MEDLINE | ID: mdl-30956378

ABSTRACT

The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view accurately inferring the root location in a phylogenetic tree is notoriously difficult and poor root placement can artificially inflate the hybridization number. To this end we study a number of relaxed variants of this problem. We start by showing that the fundamental problem of determining whether an unrooted phylogenetic network displays (i.e. embeds) an unrooted phylogenetic tree, is NP-hard. On the positive side we show that this problem is FPT in reticulation number. In the rooted case the corresponding FPT result is trivial, but here we require more subtle argumentation. Next we show that the hybridization number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the tree bisection and reconnect distance of the two unrooted trees. In the third part of the paper we consider the "root uncertain" variant of hybridization number. Here we are free to choose the root location in each of a set of unrooted input trees such that the hybridization number of the resulting rooted trees is minimized. On the negative side we show that this problem is APX-hard. On the positive side, we show that the problem is FPT in the hybridization number, via kernelization, for any number of input trees.

7.

Enumeration of minimal stoichiometric precursor sets in metabolic networks.

Andrade, Ricardo; Wannagat, Martin; Klein, Cecilia C; Acuña, Vicente; Marchetti-Spaccamela, Alberto; Milreu, Paulo V; Stougie, Leen; Sagot, Marie-France.

Algorithms Mol Biol ; 11: 25, 2016.

Article in English | MEDLINE | ID: mdl-27679654

ABSTRACT

BACKGROUND: What an organism needs at least from its environment to produce a set of metabolites, e.g. target(s) of interest and/or biomass, has been called a minimal precursor set. Early approaches to enumerate all minimal precursor sets took into account only the topology of the metabolic network (topological precursor sets). Due to cycles and the stoichiometric values of the reactions, it is often not possible to produce the target(s) from a topological precursor set in the sense that there is no feasible flux. Although considering the stoichiometry makes the problem harder, it enables to obtain biologically reasonable precursor sets that we call stoichiometric. Recently a method to enumerate all minimal stoichiometric precursor sets was proposed in the literature. The relationship between topological and stoichiometric precursor sets had however not yet been studied. RESULTS: Such relationship between topological and stoichiometric precursor sets is highlighted. We also present two algorithms that enumerate all minimal stoichiometric precursor sets. The first one is of theoretical interest only and is based on the above mentioned relationship. The second approach solves a series of mixed integer linear programming problems. We compared the computed minimal precursor sets to experimentally obtained growth media of several Escherichia coli strains using genome-scale metabolic networks. CONCLUSIONS: The results show that the second approach efficiently enumerates minimal precursor sets taking stoichiometry into account, and allows for broad in silico studies of strains or species interactions that may help to understand e.g. pathotype and niche-specific metabolic capabilities. sasita is written in Java, uses cplex as LP solver and can be downloaded together with all networks and input files used in this paper at http://www.sasita.gforge.inria.fr.

8.

A Combinatorial Algorithm for Microbial Consortia Synthetic Design.

Julien-Laferrière, Alice; Bulteau, Laurent; Parrot, Delphine; Marchetti-Spaccamela, Alberto; Stougie, Leen; Vinga, Susana; Mary, Arnaud; Sagot, Marie-France.

Sci Rep ; 6: 29182, 2016 07 04.

Article in English | MEDLINE | ID: mdl-27373593

ABSTRACT

Synthetic biology has boomed since the early 2000s when it started being shown that it was possible to efficiently synthetize compounds of interest in a much more rapid and effective way by using other organisms than those naturally producing them. However, to thus engineer a single organism, often a microbe, to optimise one or a collection of metabolic tasks may lead to difficulties when attempting to obtain a production system that is efficient, or to avoid toxic effects for the recruited microorganism. The idea of using instead a microbial consortium has thus started being developed in the last decade. This was motivated by the fact that such consortia may perform more complicated functions than could single populations and be more robust to environmental fluctuations. Success is however not always guaranteed. In particular, establishing which consortium is best for the production of a given compound or set thereof remains a great challenge. This is the problem we address in this paper. We thus introduce an initial model and a method that enable to propose a consortium to synthetically produce compounds that are either exogenous to it, or are endogenous but where interaction among the species in the consortium could improve the production line.

Subject(s)

Algorithms , Microbial Consortia , Synthetic Biology/methods , Acetates/metabolism , Bacteria/metabolism , Biotechnology , Glycerol/metabolism , Propylene Glycols/metabolism

9.

WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads.

Patterson, Murray; Marschall, Tobias; Pisanti, Nadia; van Iersel, Leo; Stougie, Leen; Klau, Gunnar W; Schönhuth, Alexander.

J Comput Biol ; 22(6): 498-509, 2015 Jun.

Article in English | MEDLINE | ID: mdl-25658651

ABSTRACT

The human genome is diploid, which requires assigning heterozygous single nucleotide polymorphisms (SNPs) to the two copies of the genome. The resulting haplotypes, lists of SNPs belonging to each copy, are crucial for downstream analyses in population genetics. Currently, statistical approaches, which are oblivious to direct read information, constitute the state-of-the-art. Haplotype assembly, which addresses phasing directly from sequencing reads, suffers from the fact that sequencing reads of the current generation are too short to serve the purposes of genome-wide phasing. While future-technology sequencing reads will contain sufficient amounts of SNPs per read for phasing, they are also likely to suffer from higher sequencing error rates. Currently, no haplotype assembly approaches exist that allow for taking both increasing read length and sequencing error information into account. Here, we suggest WhatsHap, the first approach that yields provably optimal solutions to the weighted minimum error correction problem in runtime linear in the number of SNPs. WhatsHap is a fixed parameter tractable (FPT) approach with coverage as the parameter. We demonstrate that WhatsHap can handle datasets of coverage up to 20×, and that 15× are generally enough for reliably phasing long reads, even at significantly elevated sequencing error rates. We also find that the switch and flip error rates of the haplotypes we output are favorable when comparing them with state-of-the-art statistical phasers.

Subject(s)

Haplotypes/genetics , Sequence Analysis, DNA/methods , Diploidy , Genetics, Population/methods , Genome, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Polymorphism, Single Nucleotide/genetics

10.

Fast flux module detection using matroid theory.

Reimers, Arne C; Bruggeman, Frank J; Olivier, Brett G; Stougie, Leen.

J Comput Biol ; 22(5): 414-24, 2015 May.

Article in English | MEDLINE | ID: mdl-25565150

ABSTRACT

Flux balance analysis (FBA) is one of the most often applied methods on genome-scale metabolic networks. Although FBA uniquely determines the optimal yield, the pathway that achieves this is usually not unique. The analysis of the optimal-yield flux space has been an open challenge. Flux variability analysis is only capturing some properties of the flux space, while elementary mode analysis is intractable due to the enormous number of elementary modes. However, it has been found by Kelk et al. (2012) that the space of optimal-yield fluxes decomposes into flux modules. These decompositions allow a much easier but still comprehensive analysis of the optimal-yield flux space. Using the mathematical definition of module introduced by Müller and Bockmayr (2013b), we discovered useful connections to matroid theory, through which efficient algorithms enable us to compute the decomposition into modules in a few seconds for genome-scale networks. Using that every module can be represented by one reaction that represents its function, in this article, we also present a method that uses this decomposition to visualize the interplay of modules. We expect the new method to replace flux variability analysis in the pipelines for metabolic networks.

Subject(s)

Algorithms , Genome , Metabolic Networks and Pathways/genetics , Models, Statistical , Computer Graphics , Escherichia coli/genetics , Helicobacter pylori/genetics , Humans , Mathematical Computing , Mycobacterium/genetics , Mycobacterium tuberculosis/genetics , Saccharomyces cerevisiae/genetics , Staphylococcus aureus/genetics , Time Factors

11.

Telling metabolic stories to explore metabolomics data: a case study on the yeast response to cadmium exposure.

Milreu, Paulo Vieira; Klein, Cecilia Coimbra; Cottret, Ludovic; Acuña, Vicente; Birmelé, Etienne; Borassi, Michele; Junot, Christophe; Marchetti-Spaccamela, Alberto; Marino, Andrea; Stougie, Leen; Jourdan, Fabien; Crescenzi, Pierluigi; Lacroix, Vincent; Sagot, Marie-France.

Bioinformatics ; 30(1): 61-70, 2014 Jan 01.

Article in English | MEDLINE | ID: mdl-24167155

ABSTRACT

MOTIVATION: The increasing availability of metabolomics data enables to better understand the metabolic processes involved in the immediate response of an organism to environmental changes and stress. The data usually come in the form of a list of metabolites whose concentrations significantly changed under some conditions, and are thus not easy to interpret without being able to precisely visualize how such metabolites are interconnected. RESULTS: We present a method that enables to organize the data from any metabolomics experiment into metabolic stories. Each story corresponds to a possible scenario explaining the flow of matter between the metabolites of interest. These scenarios may then be ranked in different ways depending on which interpretation one wishes to emphasize for the causal link between two affected metabolites: enzyme activation, enzyme inhibition or domino effect on the concentration changes of substrates and products. Equally probable stories under any selected ranking scheme can be further grouped into a single anthology that summarizes, in a unique subnetwork, all equivalently plausible alternative stories. An anthology is simply a union of such stories. We detail an application of the method to the response of yeast to cadmium exposure. We use this system as a proof of concept for our method, and we show that we are able to find a story that reproduces very well the current knowledge about the yeast response to cadmium. We further show that this response is mostly based on enzyme activation. We also provide a framework for exploring the alternative pathways or side effects this local response is expected to have in the rest of the network. We discuss several interpretations for the changes we see, and we suggest hypotheses that could in principle be experimentally tested. Noticeably, our method requires simple input data and could be used in a wide variety of applications. AVAILABILITY AND IMPLEMENTATION: The code for the method presented in this article is available at http://gobbolino.gforge.inria.fr.

Subject(s)

Cadmium/pharmacology , Metabolomics/methods , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/metabolism , Enzyme Activation , Glutathione/biosynthesis

12.

Ancient dispersal of the human fungal pathogen Cryptococcus gattii from the Amazon rainforest.

Hagen, Ferry; Ceresini, Paulo C; Polacheck, Itzhack; Ma, Hansong; van Nieuwerburgh, Filip; Gabaldón, Toni; Kagan, Sarah; Pursall, E Rhiannon; Hoogveld, Hans L; van Iersel, Leo J J; Klau, Gunnar W; Kelk, Steven M; Stougie, Leen; Bartlett, Karen H; Voelz, Kerstin; Pryszcz, Leszek P; Castañeda, Elizabeth; Lazera, Marcia; Meyer, Wieland; Deforce, Dieter; Meis, Jacques F; May, Robin C; Klaassen, Corné H W; Boekhout, Teun.

PLoS One ; 8(8): e71148, 2013.

Article in English | MEDLINE | ID: mdl-23940707

ABSTRACT

Over the past two decades, several fungal outbreaks have occurred, including the high-profile 'Vancouver Island' and 'Pacific Northwest' outbreaks, caused by Cryptococcus gattii, which has affected hundreds of otherwise healthy humans and animals. Over the same time period, C. gattii was the cause of several additional case clusters at localities outside of the tropical and subtropical climate zones where the species normally occurs. In every case, the causative agent belongs to a previously rare genotype of C. gattii called AFLP6/VGII, but the origin of the outbreak clades remains enigmatic. Here we used phylogenetic and recombination analyses, based on AFLP and multiple MLST datasets, and coalescence gene genealogy to demonstrate that these outbreaks have arisen from a highly-recombining C. gattii population in the native rainforest of Northern Brazil. Thus the modern virulent C. gattii AFLP6/VGII outbreak lineages derived from mating events in South America and then dispersed to temperate regions where they cause serious infections in humans and animals.

Subject(s)

Cryptococcosis/microbiology , Cryptococcus gattii/genetics , Animals , Brazil , British Columbia/epidemiology , Cells, Cultured , Cryptococcosis/epidemiology , Cryptococcus gattii/classification , Cryptococcus gattii/pathogenicity , Disease Outbreaks , Genes, Fungal , Humans , Likelihood Functions , Macrophages/microbiology , Mice , Mice, Inbred BALB C , Molecular Sequence Data , Multilocus Sequence Typing , Mycological Typing Techniques , Northwestern United States/epidemiology , Phylogeny , Polymorphism, Restriction Fragment Length , Trees , Tropical Climate , Virulence

13.

A realistic model under which the genetic code is optimal.

Buhrman, Harry; van der Gulik, Peter T S; Klau, Gunnar W; Schaffner, Christian; Speijer, Dave; Stougie, Leen.

J Mol Evol ; 77(4): 170-84, 2013 Oct.

Article in English | MEDLINE | ID: mdl-23877342

ABSTRACT

The genetic code has a high level of error robustness. Using values of hydrophobicity scales as a proxy for amino acid character, and the mean square measure as a function quantifying error robustness, a value can be obtained for a genetic code which reflects the error robustness of that code. By comparing this value with a distribution of values belonging to codes generated by random permutations of amino acid assignments, the level of error robustness of a genetic code can be quantified. We present a calculation in which the standard genetic code is shown to be optimal. We obtain this result by (1) using recently updated values of polar requirement as input; (2) fixing seven assignments (Ile, Trp, His, Phe, Tyr, Arg, and Leu) based on aptamer considerations; and (3) using known biosynthetic relations of the 20 amino acids. This last point is reflected in an approach of subdivision (restricting the random reallocation of assignments to amino acid subgroups, the set of 20 being divided in four such subgroups). The three approaches to explain robustness of the code (specific selection for robustness, amino acid-RNA interactions leading to assignments, or a slow growth process of assignment patterns) are reexamined in light of our findings. We offer a comprehensive hypothesis, stressing the importance of biosynthetic relations, with the code evolving from an early stage with just glycine and alanine, via intermediate stages, towards 64 codons carrying todays meaning.

Subject(s)

Amino Acids/chemistry , Amino Acids/genetics , Genetic Code , Models, Genetic , Aptamers, Peptide/chemistry , Aptamers, Peptide/genetics , Codon , Evolution, Molecular , Protein Biosynthesis/genetics , Protein Biosynthesis/physiology

14.

Networks: expanding evolutionary thinking.

Bapteste, Eric; van Iersel, Leo; Janke, Axel; Kelchner, Scot; Kelk, Steven; McInerney, James O; Morrison, David A; Nakhleh, Luay; Steel, Mike; Stougie, Leen; Whitfield, James.

Trends Genet ; 29(8): 439-41, 2013 Aug.

Article in English | MEDLINE | ID: mdl-23764187

ABSTRACT

Networks allow the investigation of evolutionary relationships that do not fit a tree model. They are becoming a leading tool for describing the evolutionary relationships between organisms, given the comparative complexities among genomes.

Subject(s)

Biological Evolution , Models, Genetic , Phylogeny , Animals , Genome , Saccharomyces/genetics

15.

Charge group partitioning in biomolecular simulation.

Canzar, Stefan; El-Kebir, Mohammed; Pool, René; Elbassioni, Khaled; Mark, Alan E; Geerke, Daan P; Stougie, Leen; Klau, Gunnar W.

J Comput Biol ; 20(3): 188-98, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23461571

ABSTRACT

Molecular simulation techniques are increasingly being used to study biomolecular systems at an atomic level. Such simulations rely on empirical force fields to represent the intermolecular interactions. There are many different force fields available--each based on a different set of assumptions and thus requiring different parametrization procedures. Recently, efforts have been made to fully automate the assignment of force-field parameters, including atomic partial charges, for novel molecules. In this work, we focus on a problem arising in the automated parametrization of molecules for use in combination with the GROMOS family of force fields: namely, the assignment of atoms to charge groups such that for every charge group the sum of the partial charges is ideally equal to its formal charge. In addition, charge groups are required to have size at most k. We show NP-hardness and give an exact algorithm that solves practical problem instances to provable optimality in a fraction of a second.

Subject(s)

Adenosine Triphosphate/chemistry , Amino Acids/chemistry , Electrons , Molecular Dynamics Simulation , Hydrogen-Ion Concentration , Thermodynamics , Water/chemistry

16.

Optimal flux spaces of genome-scale stoichiometric models are determined by a few subnetworks.

Kelk, Steven M; Olivier, Brett G; Stougie, Leen; Bruggeman, Frank J.

Sci Rep ; 2: 580, 2012.

Article in English | MEDLINE | ID: mdl-22896812

ABSTRACT

The metabolism of organisms can be studied with comprehensive stoichiometric models of their metabolic networks. Flux balance analysis (FBA) calculates optimal metabolic performance of stoichiometric models. However, detailed biological interpretation of FBA is limited because, in general, a huge number of flux patterns give rise to the same optimal performance. The complete description of the resulting optimal solution spaces was thus far a computationally intractable problem. Here we present CoPE-FBA: Comprehensive Polyhedra Enumeration Flux Balance Analysis, a computational method that solves this problem. CoPE-FBA indicates that the thousands to millions of optimal flux patterns result from a combinatorial explosion of flux patterns in just a few metabolic sub-networks. The entire optimal solution space can now be compactly described in terms of the topology of these sub-networks. CoPE-FBA simplifies the biological interpretation of stoichiometric models of metabolism, and provides a profound understanding of metabolic flexibility in optimal states.

Subject(s)

Computational Biology/methods , Metabolic Networks and Pathways , Metabolomics/methods , Models, Biological , Computer Simulation , Escherichia coli/metabolism , Glucose/metabolism , Software

17.

Algorithms and complexity of enumerating minimal precursor sets in genome-wide metabolic networks.

Acuña, Vicente; Milreu, Paulo Vieira; Cottret, Ludovic; Marchetti-Spaccamela, Alberto; Stougie, Leen; Sagot, Marie-France.

Bioinformatics ; 28(19): 2474-83, 2012 Oct 01.

Article in English | MEDLINE | ID: mdl-22782547

ABSTRACT

MOTIVATION: In the context of studying whole metabolic networks and their interaction with the environment, the following question arises: given a set of target metabolites T and a set of possible external source metabolites , which are the minimal subsets of that are able to produce all the metabolites in T. Such subsets are called the minimal precursor sets of T. The problem is then whether we can enumerate all of them efficiently. RESULTS: We propose a new characterization of precursor sets as the inputs of reaction sets called factories and an efficient algorithm to decide if a set of sources is precursor set of T. We show proofs of hardness for the problems of finding a precursor set of minimum size and of enumerating all minimal precursor sets T. We propose two new algorithms which, despite the hardness of the enumeration problem, allow to enumerate all minimal precursor sets in networks with up to 1000 reactions. AVAILABILITY: Source code and datasets used in our benchmarks are freely available for download at http://sites.google.com/site/pitufosoftware/download. CONTACT: vicente77@gmail.com, pvmilreu@gmail.com or marie-france.sagot@inria.fr.

Subject(s)

Algorithms , Computational Biology/methods , Metabolic Networks and Pathways , Models, Theoretical

18.

Some mathematical refinements concerning error minimization in the genetic code.

Buhrman, Harry; van der Gulik, Peter T S; Kelk, Steven M; Koolen, Wouter M; Stougie, Leen.

IEEE/ACM Trans Comput Biol Bioinform ; 8(5): 1358-72, 2011.

Article in English | MEDLINE | ID: mdl-21358008

ABSTRACT

The genetic code is known to have a high level of error robustness and has been shown to be very error robust compared to randomly selected codes, but to be significantly less error robust than a certain code found by a heuristic algorithm. We formulate this optimization problem as a Quadratic Assignment Problem and use this to formally verify that the code found by the heuristic algorithm is the global optimum. We also argue that it is strongly misleading to compare the genetic code only with codes sampled from the fixed block model, because the real code space is orders of magnitude larger. We thus enlarge the space from which random codes can be sampled from approximately 2.433 × 10(18) codes to approximately 5.908 × 10(45) codes. We do this by leaving the fixed block model, and using the wobble rules to formulate the characteristics acceptable for a genetic code. By relaxing more constraints, three larger spaces are also constructed. Using a modified error function, the genetic code is found to be more error robust compared to a background of randomly generated codes with increasing space size. We point out that these results do not necessarily imply that the code was optimized during evolution for error minimization, but that other mechanisms could be the reason for this error robustness.

Subject(s)

Algorithms , Computational Biology/methods , Genetic Code , Models, Genetic , Evolution, Molecular

19.

Graph-based analysis of the metabolic exchanges between two co-resident intracellular symbionts, Baumannia cicadellinicola and Sulcia muelleri, with their insect host, Homalodisca coagulata.

Cottret, Ludovic; Milreu, Paulo Vieira; Acuña, Vicente; Marchetti-Spaccamela, Alberto; Stougie, Leen; Charles, Hubert; Sagot, Marie-France.

PLoS Comput Biol ; 6(9)2010 Sep 02.

Article in English | MEDLINE | ID: mdl-20838465

ABSTRACT

Endosymbiotic bacteria from different species can live inside cells of the same eukaryotic organism. Metabolic exchanges occur between host and bacteria but also between different endocytobionts. Since a complete genome annotation is available for both, we built the metabolic network of two endosymbiotic bacteria, Sulcia muelleri and Baumannia cicadellinicola, that live inside specific cells of the sharpshooter Homalodisca coagulata and studied the metabolic exchanges involving transfers of carbon atoms between the three. We automatically determined the set of metabolites potentially exogenously acquired (seeds) for both metabolic networks. We show that the number of seeds needed by both bacteria in the carbon metabolism is extremely reduced. Moreover, only three seeds are common to both metabolic networks, indicating that the complementarity of the two metabolisms is not only manifested in the metabolic capabilities of each bacterium, but also by their different use of the same environment. Furthermore, our results show that the carbon metabolism of S. muelleri may be completely independent of the metabolic network of B. cicadellinicola. On the contrary, the carbon metabolism of the latter appears dependent on the metabolism of S. muelleri, at least for two essential amino acids, threonine and lysine. Next, in order to define which subsets of seeds (precursor sets) are sufficient to produce the metabolites involved in a symbiotic function, we used a graph-based method, PITUFO, that we recently developed. Our results highly refine our knowledge about the complementarity between the metabolisms of the two bacteria and their host. We thus indicate seeds that appear obligatory in the synthesis of metabolites are involved in the symbiotic function. Our results suggest both B. cicadellinicola and S. muelleri may be completely independent of the metabolites provided by the co-resident endocytobiont to produce the carbon backbone of the metabolites provided to the symbiotic system (., thr and lys are only exploited by B. cicadellinicola to produce its proteins).

Subject(s)

Bacteroidetes/metabolism , Gammaproteobacteria/metabolism , Hemiptera/metabolism , Hemiptera/microbiology , Animals , Bacteroidetes/pathogenicity , Computational Biology/methods , Gammaproteobacteria/pathogenicity , Host-Pathogen Interactions/physiology , Metabolic Networks and Pathways/physiology , Metabolome , Models, Biological , Symbiosis

20.

A note on the complexity of finding and enumerating elementary modes.

Acuña, Vicente; Marchetti-Spaccamela, Alberto; Sagot, Marie-France; Stougie, Leen.

Biosystems ; 99(3): 210-4, 2010 Mar.

Article in English | MEDLINE | ID: mdl-19962421

ABSTRACT

In the context of the study into elementary modes of metabolic networks, we prove two complexity results. Enumerating elementary modes containing a specific reaction is hard in an enumeration complexity sense. The decision problem if there exists an elementary mode containing two specific reactions is NP-complete. The complexity of enumerating all elementary modes remains open.

Subject(s)

Metabolic Networks and Pathways , Models, Biological , Metabolic Networks and Pathways/physiology

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL