1.
Bioinformatics ; 39(11), 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37889263

ABSTRACT

MOTIVATION: Balanced Minimum Evolution (BME) is a powerful distance-based phylogenetic estimation model introduced by Desper and Gascuel and now implemented in popular tools for phylogenetic analysis. It was proven to be computationally less demanding than more sophisticated estimation methods, e.g., maximum likelihood or Bayesian inference, while preserving statistical consistency and the ability to run with almost any kind of data for which a dissimilarity measure is available. BME can be stated as a nonlinear, non-convex combinatorial optimization problem, usually referred to as the Balanced Minimum Evolution Problem (BMEP). Currently, the state of the art among approximate methods for the BMEP is FastME (version 2.0), a software package which implements several deterministic phylogenetic construction heuristics combined with a local search on specific neighbourhoods derived from classical topological tree rearrangements. These combinations, however, may not guarantee convergence to close-to-optimal solutions because of limited exploration of the solution space, a phenomenon that is exacerbated when tackling molecular datasets with a large number of taxa. RESULTS: To overcome these convergence issues, in this article we propose a novel metaheuristic, named PhyloES, which combines an exploration phase based on Evolution Strategies, a special type of evolutionary algorithm, with a refinement phase based on two local search algorithms. Extensive computational experiments show that PhyloES consistently outperforms FastME, especially on larger datasets, providing solutions that not only have a shorter tree length but also differ significantly in topology. AVAILABILITY AND IMPLEMENTATION: The software and the data are available at https://github.com/andygaspar/PHYLOES.
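For readers unfamiliar with the BME criterion mentioned above, the following minimal Python sketch evaluates Pauplin's balanced tree-length formula, L(T) = sum over leaf pairs of D[i][j] * 2^(-t_ij), where t_ij is the number of edges between leaves i and j. The networkx tree and the distance matrix are made-up toy data, and this is only the objective function, not the PhyloES algorithm.

```python
# Minimal sketch of the BME objective (Pauplin's tree-length formula).
# The example tree and distances are illustrative, not real data.
import itertools
import networkx as nx

def bme_length(tree: nx.Graph, leaves, D):
    """Balanced minimum evolution length of an unrooted tree."""
    length = 0.0
    for i, j in itertools.combinations(range(len(leaves)), 2):
        t_ij = nx.shortest_path_length(tree, leaves[i], leaves[j])  # edge count
        length += D[i][j] * 2.0 ** (-t_ij)
    return length

# Toy example: unrooted binary tree on four taxa ((A,B),(C,D)).
T = nx.Graph([("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")])
taxa = ["A", "B", "C", "D"]
D = [[0, 2, 5, 6],
     [2, 0, 5, 6],
     [5, 5, 0, 3],
     [6, 6, 3, 0]]
print(bme_length(T, taxa, D))  # 4.0 for this toy instance
```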


Subject(s)
Algorithms , Models, Genetic , Phylogeny , Bayes Theorem , Software , Evolution, Molecular
2.
Article in English | MEDLINE | ID: mdl-26353381

ABSTRACT

Ductal Carcinoma In Situ (DCIS) is a precursor lesion of Invasive Ductal Carcinoma (IDC) of the breast. Investigating its temporal progression could provide fundamental new insights for the development of better diagnostic tools to predict which cases of DCIS will progress to IDC. We investigate the problem of reconstructing a plausible progression from single-cell data sampled from an individual with synchronous DCIS and IDC. Specifically, by using a number of assumptions derived from the observation of cellular atypia occurring in IDC, we design a possible predictive model using integer linear programming (ILP). Computational experiments carried out on a preexisting data set of 13 patients with simultaneous DCIS and IDC show that the corresponding predicted progression models can be classified into categories with specific evolutionary characteristics. The approach provides new insights into mechanisms of clonal progression in breast cancers and illustrates the power of ILP for similar problems of reconstructing tumor evolution scenarios under complex sets of constraints.


Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/physiopathology , Carcinoma, Ductal/genetics , Carcinoma, Ductal/physiopathology , Computational Biology/methods , Algorithms , Breast Neoplasms/diagnosis , Carcinoma, Ductal/diagnosis , Clonal Evolution , Cluster Analysis , Female , Humans , Models, Genetic
3.
Algorithms Mol Biol ; 8(1): 3, 2013 Jan 23.
Article in English | MEDLINE | ID: mdl-23343437

ABSTRACT

BACKGROUND: Phylogeny estimation from aligned haplotype sequences has attracted increasing attention in recent years owing to its importance in the analysis of fine-scale genetic data. Its application fields range from medical research to drug discovery, epidemiology, and population dynamics. The literature on molecular phylogenetics proposes a number of criteria for selecting a phylogeny from among plausible alternatives. Usually, such criteria can be expressed by means of objective functions, and the phylogenies that optimize them are referred to as optimal. One of the most important estimation criteria is parsimony, which states that the optimal phylogeny T* for a set H of n haplotype sequences over a common set of variable loci is the one that satisfies the following requirements: (i) it has the shortest length and (ii) for each pair of distinct haplotypes h_i, h_j ∈ H, the sum of the edge weights on the path from h_i to h_j in T* is not smaller than the observed number of changes between h_i and h_j. Finding the most parsimonious phylogeny for H involves solving an optimization problem, called the Most Parsimonious Phylogeny Estimation Problem (MPPEP), which is NP-hard in many of its versions. RESULTS: In this article we investigate a recent version of the MPPEP that arises when the input data consist of single nucleotide polymorphism haplotypes extracted from a population of individuals on a common genomic region. Specifically, we explore the prospects for improving on the implicit enumeration strategy used in previous work by means of a novel problem formulation together with a series of strengthening valid inequalities and preliminary symmetry-breaking constraints, which more tightly bound the solution space and accelerate the implicit enumeration of possible optimal phylogenies. We present the basic formulation and then introduce a series of provably valid constraints to reduce the solution space. We then show that these constraints often lead to significant reductions in the gap between the optimal solution and its non-integral linear programming bound relative to the prior art, as well as to substantially faster processing of moderately hard problem instances. CONCLUSION: We provide an indication of the conditions under which such an optimal enumeration approach is likely to be feasible, suggesting that these strategies are usable for relatively large numbers of taxa, although with stricter limits on the number of variable sites. The work thus provides methodology suitable for provably optimal solution of some harder instances that resist all prior approaches.
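The parsimony criterion stated above can be checked directly for a candidate tree. The hedged sketch below (toy data, plain networkx, not the paper's integer programming formulation) reports the tree length and verifies that every leaf-to-leaf path is at least as long as the corresponding Hamming distance between haplotypes.

```python
# Hedged sketch of the parsimony requirements (i) and (ii) described above.
import itertools
import networkx as nx

def hamming(h1, h2):
    return sum(a != b for a, b in zip(h1, h2))

def check_parsimony_conditions(tree: nx.Graph, haplotypes: dict):
    """haplotypes maps leaf names to 0/1 strings over the same loci."""
    total_length = tree.size(weight="weight")            # requirement (i): tree length
    for u, v in itertools.combinations(haplotypes, 2):
        path_len = nx.shortest_path_length(tree, u, v, weight="weight")
        if path_len < hamming(haplotypes[u], haplotypes[v]):  # requirement (ii)
            return total_length, False
    return total_length, True

# Toy instance (hypothetical data).
T = nx.Graph()
T.add_weighted_edges_from([("h1", "x", 1), ("h2", "x", 1), ("x", "h3", 2)])
H = {"h1": "0011", "h2": "0010", "h3": "1110"}
print(check_parsimony_conditions(T, H))  # (4, True) for this toy tree
```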

4.
Article in English | MEDLINE | ID: mdl-24407298

ABSTRACT

A loss of heterozygosity (LOH) event occurs when, by the laws of Mendelian inheritance, an individual should be heterozygous at a given site but, due to a deletion polymorphism, is not. Deletions play an important role in human disease, and their detection could provide fundamental insights for the development of new diagnostics and treatments. In this paper, we investigate the parsimonious loss of heterozygosity problem (PLOHP), i.e., the problem of partitioning suspected polymorphisms from a set of individuals into a minimum number of deletion areas. Specifically, we extend Halldórsson et al.'s work by providing a more general formulation of the PLOHP and by showing how one can incorporate different recombination rates and prior knowledge about the locations of deletions. Moreover, we show that the PLOHP can be formulated as a specific version of the clique partition problem in a particular class of graphs called undirected catch-point interval graphs, and we prove its NP-hardness in general. Finally, we provide a state-of-the-art integer programming (IP) formulation and strengthening valid inequalities to solve exactly real instances of the PLOHP containing up to 9,000 individuals and 3,000 SNPs. Our results give perspectives on the mathematics of the PLOHP and suggest new directions for the development of future efficient exact solution approaches.
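As an illustration of the clique partition view mentioned above, here is a hedged, generic clique-partition integer program written with PuLP on a toy graph; it is not the paper's strengthened formulation and it ignores the catch-point interval graph structure.

```python
# Generic clique-partition ILP on a toy graph (illustrative only).
import itertools
import pulp

def min_clique_partition(nodes, edges):
    n = len(nodes)
    adj = {frozenset(e) for e in edges}
    prob = pulp.LpProblem("clique_partition", pulp.LpMinimize)
    y = pulp.LpVariable.dicts("y", (nodes, range(n)), cat="Binary")  # node -> clique slot
    z = pulp.LpVariable.dicts("z", range(n), cat="Binary")           # slot is used
    prob += pulp.lpSum(z[c] for c in range(n))        # minimize number of cliques
    for v in nodes:                                   # each node in exactly one clique
        prob += pulp.lpSum(y[v][c] for c in range(n)) == 1
    for c in range(n):
        for v in nodes:
            prob += y[v][c] <= z[c]
        for u, v in itertools.combinations(nodes, 2): # non-adjacent nodes cannot share a clique
            if frozenset((u, v)) not in adj:
                prob += y[u][c] + y[v][c] <= 1
    for c in range(n - 1):                            # mild symmetry breaking
        prob += z[c] >= z[c + 1]
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return int(pulp.value(prob.objective))

print(min_clique_partition(["a", "b", "c", "d"],
                           [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))  # -> 2
```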


Subject(s)
Computational Biology/methods , Loss of Heterozygosity , Models, Genetic , Algorithms , Gene Deletion , Genome, Human , Genome-Wide Association Study , Heterozygote , Humans , Polymorphism, Genetic , Polymorphism, Single Nucleotide , Recombination, Genetic , Sequence Homology, Nucleic Acid , Software
5.
PLoS One ; 6(3): e17937, 2011 Mar 25.
Article in English | MEDLINE | ID: mdl-21464966

ABSTRACT

The Pure Parsimony Haplotyping (PPH) problem is an NP-hard combinatorial optimization problem that consists of finding the minimum number of haplotypes necessary to explain a given set of genotypes. PPH has attracted increasing attention in recent years owing to its importance in the analysis of fine-scale genetic data. Its application fields range from mapping complex disease genes and inferring population histories to drug design, functional genomics, and pharmacogenetics. In this article we investigate, for the first time, a recent version of PPH called the Pure Parsimony Haplotyping problem under Uncertain Data (PPH-UD). This version mainly arises when the input genotypes are not accurate, i.e., when some single nucleotide polymorphisms are missing or affected by errors. We propose an exact approach to the solution of PPH-UD based on an extended version of the class representative model of Catanzaro et al. [1], currently the state-of-the-art integer programming model for PPH. The model is efficient, accurate, compact, polynomial-sized, easy to implement, solvable with any mixed integer programming solver, and usable in all cases for which the parsimony criterion is well suited for haplotype estimation.
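To make the parsimony criterion concrete, the following hedged brute-force sketch (not the class representative integer programming model of the paper) finds a minimum set of haplotypes explaining a small toy set of genotypes, where 2 marks a heterozygous site.

```python
# Brute-force illustration of the Pure Parsimony Haplotyping criterion.
# Genotypes use 0/1 for homozygous sites and 2 for heterozygous sites.
import itertools

def compatible_haplotypes(genotype):
    """All haplotypes that can take part in a resolution of this genotype."""
    choices = [("0",) if s == "0" else ("1",) if s == "1" else ("0", "1") for s in genotype]
    return {"".join(h) for h in itertools.product(*choices)}

def explains(h1, h2, g):
    return all((a == b == s) if s in "01" else (a != b) for a, b, s in zip(h1, h2, g))

def pure_parsimony(genotypes):
    universe = sorted(set().union(*(compatible_haplotypes(g) for g in genotypes)))
    for k in range(1, len(universe) + 1):             # try haplotype sets of increasing size
        for subset in itertools.combinations(universe, k):
            if all(any(explains(h1, h2, g)
                       for h1, h2 in itertools.combinations_with_replacement(subset, 2))
                   for g in genotypes):
                return set(subset)
    return set(universe)

print(pure_parsimony(["022", "212", "220"]))  # a minimum explaining set of 3 haplotypes
```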


Subject(s)
Databases, Genetic , Haplotypes/genetics , Models, Genetic , Uncertainty , Humans
6.
Hum Immunol ; 71(8): 783-8, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20447432

ABSTRACT

We present an integer programming model for human leukocyte antigen (HLA) association studies based on the parsimony criterion. The model is simple, compact, easy to implement, and able to handle datasets containing up to 200 phenotypes. Computational experiments carried out on patients affected by psoriasis and severe alopecia areata show that the model is consistent with the experimental haplotype frequencies and, at least for the considered diseases, yields highly reliable predictions. These promising results provide perspective on computer-aided association studies and encourage the development of efficient exact computational approaches for haplotype estimation.


Subject(s)
Alopecia Areata/genetics , Computational Biology/methods , HLA Antigens/genetics , Psoriasis/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Algorithms , Alopecia Areata/immunology , Child , Child, Preschool , Female , Gene Frequency , Haplotypes , Humans , Infant , Male , Middle Aged , Models, Genetic , Phenotype , Psoriasis/immunology , Reproducibility of Results , Severity of Illness Index , Young Adult
7.
BMC Evol Biol ; 7: 228, 2007 Nov 15.
Article in English | MEDLINE | ID: mdl-18005416

ABSTRACT

BACKGROUND: Distance matrix methods constitute a major family of phylogenetic estimation methods, and the minimum evolution (ME) principle (which aims at recovering the phylogeny with the shortest length) is one of the most commonly used optimality criteria for estimating phylogenetic trees. The major difficulty for its application is that the number of possible phylogenies grows exponentially with the number of taxa analyzed, and the minimum evolution problem is known to belong to the class of NP-hard problems. RESULTS: In this paper, we introduce an Ant Colony Optimization (ACO) algorithm to estimate phylogenies under the minimum evolution principle. ACO is an optimization technique inspired by the foraging behavior of real ant colonies. This behavior is exploited in artificial ant colonies to search for approximate solutions to discrete optimization problems. CONCLUSION: We show that the ACO algorithm is potentially competitive with state-of-the-art algorithms for the minimum evolution principle. This is the first application of an ACO algorithm to the phylogenetic estimation problem.
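As background on the technique, the sketch below is a generic ACO skeleton applied to a tiny route-finding toy, not the phylogenetic algorithm of the paper: artificial ants build solutions probabilistically from pheromone trails, and the trails evaporate and are reinforced in proportion to solution quality. All parameter values are arbitrary.

```python
# Generic Ant Colony Optimization skeleton on a toy shortest-route problem.
import random

def aco_route(dist, n_ants=10, n_iters=50, rho=0.5, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]               # pheromone trails
    best_route, best_len = None, float("inf")
    for _ in range(n_iters):
        routes = []
        for _ in range(n_ants):
            route = [rng.randrange(n)]
            while len(route) < n:                     # probabilistic construction
                i = route[-1]
                cand = [j for j in range(n) if j not in route]
                w = [tau[i][j] / (dist[i][j] + 1e-9) for j in cand]
                route.append(rng.choices(cand, weights=w)[0])
            length = sum(dist[route[k]][route[k + 1]] for k in range(n - 1))
            routes.append((length, route))
            if length < best_len:
                best_len, best_route = length, route
        for i in range(n):                            # evaporation
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        for length, route in routes:                  # reinforcement
            for k in range(n - 1):
                a, b = route[k], route[k + 1]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best_route, best_len

D = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
print(aco_route(D))
```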


Subject(s)
Algorithms , Ants/genetics , Evolution, Molecular , Phylogeny , Animals , Ants/physiology , Behavior, Animal
8.
Evol Bioinform Online ; 2: 145-55, 2007 Feb 04.
Article in English | MEDLINE | ID: mdl-19455208

ABSTRACT

The General Time Reversible (GTR) model of nucleotide substitution is at the core of many distance-based and character-based phylogeny inference methods. The procedure described by Waddell and Steel (1997) for estimating distances and instantaneous substitution rate matrices, R, under the GTR model is known to be inapplicable under some conditions, i.e., conditions under which the GTR model itself becomes inapplicable. Here, we simulate the evolution of DNA sequences along 12 trees characterized by different combinations of tree length, (non-)homogeneity of the substitution rate matrix R, and sequence length. We then evaluate both the frequency with which the GTR model is inapplicable for estimating distances and the accuracy of the inferred alignments. Our results indicate that the inapplicability of Waddell and Steel's procedure is a real practical issue and illustrate that the probability of this inapplicability is a function of substitution rates and sequence length. We also discuss the implications of our results for current implementations of maximum likelihood and Bayesian methods.
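The inapplicability discussed above can be illustrated numerically. The hedged sketch below implements the standard GTR distance d = -trace(Pi log(Pi^-1 F)) of Waddell and Steel (1997) for a pair of aligned sequences and reports inapplicability when Pi^-1 F has a non-positive eigenvalue; the two toy sequences are made up.

```python
# Numerical illustration of GTR distance estimation and its inapplicability.
import numpy as np
from scipy.linalg import logm

BASES = "ACGT"

def gtr_distance(seq1, seq2):
    F = np.zeros((4, 4))
    for a, b in zip(seq1, seq2):                      # divergence (pattern) matrix
        F[BASES.index(a), BASES.index(b)] += 1.0
    F = (F + F.T) / (2.0 * len(seq1))                 # symmetrize and normalize
    pi = F.sum(axis=1)                                # empirical base frequencies
    M = np.linalg.inv(np.diag(pi)) @ F
    eigvals = np.linalg.eigvals(M)
    if np.any(eigvals.real <= 0.0):
        return None                                   # matrix logarithm undefined: inapplicable
    return -np.trace(np.diag(pi) @ logm(M)).real

print(gtr_distance("ACGTACGTACGTACGT", "ACGTACGAACGTTCGT"))
```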

9.
Bioinformatics ; 22(6): 708-15, 2006 Mar 15.
Article in English | MEDLINE | ID: mdl-16397009

ABSTRACT

MOTIVATION: The general-time-reversible (GTR) model is one of the most popular models of nucleotide substitution because it constitutes a good trade-off between mathematical tractability and biological realism. However, when it is applied to infer evolutionary distances and/or instantaneous rate matrices, the GTR model appears more prone to inapplicability than more restrictive time-reversible models. Although it has previously been noted that this inapplicability stems from the impossibility of computing the logarithm of a matrix with negative eigenvalues, the issue has not been investigated further. RESULTS: Here, we formally characterize the mathematical conditions leading to the inapplicability of the GTR model and discuss their biological interpretation. We investigate the relationship between the occurrence of negative eigenvalues on the one hand and sequence length and sequence divergence on the other. We then propose a possible reformulation of previous procedures in terms of a non-linear optimization problem. We analytically investigate the effect of our approach on the estimated evolutionary distances and transition probability matrix. Finally, we analyse the quality of the proposed solution and discuss a numerical example.
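One way to read the proposed reformulation, sketched here under assumptions and not as the authors' exact formulation, is to fit a GTR-like matrix Q (scaled by time) to the observed divergence matrix by non-linear least squares instead of taking the matrix logarithm directly, which remains well defined even when the logarithm does not exist. The objective, parameterization, and toy data below are illustrative choices.

```python
# Hypothetical non-linear least-squares reformulation of GTR distance estimation.
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def fit_gtr_distance(F, pi):
    """F: symmetric observed divergence matrix; pi: base frequencies."""
    idx = [(i, j) for i in range(4) for j in range(i + 1, 4)]   # 6 exchangeabilities

    def build_Q(theta):
        Q = np.zeros((4, 4))
        for (i, j), t in zip(idx, theta):
            Q[i, j] = t * pi[j]
            Q[j, i] = t * pi[i]
        np.fill_diagonal(Q, -Q.sum(axis=1))           # rows sum to zero
        return Q

    def loss(theta):
        P = expm(build_Q(theta))                      # predicted transition probabilities
        return np.sum((np.diag(pi) @ P - F) ** 2)

    res = minimize(loss, x0=np.full(6, 0.1), bounds=[(0.0, None)] * 6)
    Q = build_Q(res.x)
    return -np.sum(pi * np.diag(Q))                   # expected substitutions per site

# Toy, made-up data: mildly diverged sequences with equal base frequencies.
pi = np.full(4, 0.25)
F = np.full((4, 4), 0.01)
np.fill_diagonal(F, 0.22)
print(fit_gtr_distance(F, pi))
```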


Subject(s)
Algorithms , Base Pair Mismatch/genetics , Chromosome Mapping/methods , Evolution, Molecular , Linkage Disequilibrium/genetics , Models, Genetic , Sequence Analysis, DNA/methods , Computer Simulation , Genetic Variation/genetics , Nonlinear Dynamics , Phylogeny