RESUMO
In 1965, Jack Edmonds characterized pairs of graphs G and G* with a bijection between their edge sets that form a pair of dual graphs realizing the vertices and countries of a map embedded in a surface. A necessary condition is that, if d = (d1, , dn) and t = (t1, , tm) denote the degree sequences of two such graphs, then ∑ i = 1 n d i = ∑ j = 1 m t j = 2 l , where l is the number of edges in each of the two graphs and χ = n + m - l is the Euler characteristic of the surface. However, this condition is not sufficient, and it is an open question to characterize bi-vectors (d, t) that are geographic, that is, that can be realized as the degree sequences of pairs G and G* of surface-embedded graphs. The above question is a special case of the following one. A multigraph G is even if each vertex has even degree and 3-colored if G is equipped with a fixed proper coloring of its vertex set assigning each vertex a color in the set {1,2,3}. Let G be a 3-colored even multigraph embedded in a surface S so that every face is a triangle. Denote by d = (d1, , dn), t = (t1, , tm), and δ = (δ1, ..., , δk) the sequences of half-degrees of vertices of G of colors 1, 2, and 3, respectively. Then, ∑ i = 1 n d i = ∑ j = 1 m t j = ∑ µ = 1 k t µ = l , where χ = n + k + m - l is the Euler characteristic of the surface S. A tri-vector (d, t, δ) satisfying the above conditions is called feasible. A feasible tri-vector is called geographic if it is realized by a 3-colored triangulation of a surface. Geographic tri-vectors extend the concept of geographic bi-vectors. We present a dataset of geographic bi-vectors and tri-vectors, along with realizations proving that they are geographic.
RESUMO
We generalize the notion of λ-superstrings, presented in a previous paper, to the notion of weighted λ-superstrings. This generalization entails an important improvement in the applications to vaccine designs, as it allows epitopes to be weighted by their immunogenicities. Motivated by these potential applications of constructing short weighted λ-superstrings to vaccine design, we approach this problem in two ways. First, we formalize the problem as a combinatorial optimization problem (in fact, as two polynomially equivalent problems) and develop an integer programming (IP) formulation for solving it optimally. Second, we describe a model that also takes into account good pairwise alignments of the obtained superstring with the input strings, and present a genetic algorithm that solves the problem approximately. We apply both algorithms to a set of 169 strings corresponding to the Nef protein taken from patiens infected with HIV-1. In the IP-based algorithm, we take the epitopes and the estimation of the immunogenicities from databases of experimental epitopes. In the genetic algorithm we take as candidate epitopes all 9-mers present in the 169 strings and estimate their immunogenicities using a public bioinformatics tool. Finally, we used several bioinformatic tools to evaluate the properties of the candidates generated by our method, which indicated that we can score high immunogenic λ-superstrings that at the same time present similar conformations to the Nef virus proteins.
Assuntos
Cadeias lambda de Imunoglobulina/imunologia , Vacinas/síntese química , Vacinas contra a AIDS/síntese química , Vacinas contra a AIDS/imunologia , Algoritmos , Epitopos/genética , Epitopos/imunologia , HIV-1/genética , HIV-1/imunologia , Humanos , Cadeias lambda de Imunoglobulina/genética , Modelos Teóricos , Alinhamento de Sequência , Vacinas/imunologia , Produtos do Gene nef do Vírus da Imunodeficiência Humana/genética , Produtos do Gene nef do Vírus da Imunodeficiência Humana/imunologiaRESUMO
MOTIVATION: Discovering the evolution of a tumor may help identify driver mutations and provide a more comprehensive view on the history of the tumor. Recent studies have tackled this problem using multiple samples sequenced from a tumor, and due to clinical implications, this has attracted great interest. However, such samples usually mix several distinct tumor subclones, which confounds the discovery of the tumor phylogeny. RESULTS: We study a natural problem formulation requiring to decompose the tumor samples into several subclones with the objective of forming a minimum perfect phylogeny. We propose an Integer Linear Programming formulation for it, and implement it into a method called MIPUP. We tested the ability of MIPUP and of four popular tools LICHeE, AncesTree, CITUP, Treeomics to reconstruct the tumor phylogeny. On simulated data, MIPUP shows up to a 34% improvement under the ancestor-descendant relations metric. On four real datasets, MIPUP's reconstructions proved to be generally more faithful than those of LICHeE. AVAILABILITY AND IMPLEMENTATION: MIPUP is available at https://github.com/zhero9/MIPUP as open source. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Assuntos
Neoplasias , Humanos , Mutação , Neoplasias/genética , Filogenia , Programação Linear , SoftwareRESUMO
Hajirasouliha and Raphael (WABI 2014) proposed a model for deconvoluting mixed tumor samples measured from a collection of high-throughput sequencing reads. This is related to understanding tumor evolution and critical cancer mutations. In short, their formulation asks to split each row of a binary matrix so that the resulting matrix corresponds to a perfect phylogeny and has the minimum number of rows among all matrices with this property. In this paper, we disprove several claims about this problem, including an NP-hardness proof of it. However, we show that the problem is indeed NP-hard, by providing a different proof. We also prove NP-completeness of a variant of this problem proposed in the same paper. On the positive side, we propose an efficient (though not necessarily optimal) heuristic algorithm based on coloring co-comparability graphs, and a polynomial time algorithm for solving the problem optimally on matrix instances in which no column is contained in both columns of a pair of conflicting columns. Implementations of these algorithms are freely available at https://github.com/alexandrutomescu/MixedPerfectPhylogeny.
Assuntos
Algoritmos , Biologia Computacional/métodos , Neoplasias/genética , Filogenia , Humanos , Modelos Genéticos , Neoplasias/classificaçãoRESUMO
We present two new problems of combinatorial optimization and discuss their applications to the computational design of vaccines. In the shortest λ-superstring problem, given a family S1,...,S(k) of strings over a finite alphabet, a set Τ of "target" strings over that alphabet, and an integer λ, the task is to find a string of minimum length containing, for each i, at least λ target strings as substrings of S(i). In the shortest λ-cover superstring problem, given a collection X1,...,X(n) of finite sets of strings over a finite alphabet and an integer λ, the task is to find a string of minimum length containing, for each i, at least λ elements of X(i) as substrings. The two problems are polynomially equivalent, and the shortest λ-cover superstring problem is a common generalization of two well known combinatorial optimization problems, the shortest common superstring problem and the set cover problem. We present two approaches to obtain exact or approximate solutions to the shortest λ-superstring and λ-cover superstring problems: one based on integer programming, and a hill-climbing algorithm. An application is given to the computational design of vaccines and the algorithms are applied to experimental data taken from patients infected by H5N1 and HIV-1.