Pesquisa | Biblioteca Virtual em Saúde

Generalizations of the genomic rank distance to indels.

Pereira Zanetti, João Paulo; Peres Oliveira, Lucas; Chindelevitch, Leonid; Meidanis, João.

Bioinformatics ; 39(3)2023 03 01.

Artigo em Inglês | MEDLINE | ID: mdl-36790056

RESUMO

MOTIVATION: The rank distance model represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to accommodate a broader range of biological contexts. We approach this generalization by using a matrix representation of genomes. This leads to simple distance formulas and sorting algorithms for genomes with different gene contents, but without duplications. RESULTS: We generalize the rank distance to genomes with different gene content in two different ways. The first approach adds insertions, deletions and the substitution of a single extremity to the basic operations. We show how to efficiently compute this distance. To avoid genomes with incomplete markers, our alternative distance, the rank-indel distance, only uses insertions and deletions of entire chromosomes. We construct phylogenetic trees with our distances and the DCJ-Indel distance for simulated data and real prokaryotic genomes, and compare them against reference trees. For simulated data, our distances outperform the DCJ-Indel distance using the Quartet metric as baseline. This suggests that rank distances are more robust for comparing distantly related species. For real prokaryotic genomes, all rearrangement-based distances yield phylogenetic trees that are topologically distant from the reference (65% similarity with Quartet metric), but are able to cluster related species within their respective clades and distinguish the Shigella strains as the farthest relative of the Escherichia coli strains, a feature not seen in the reference tree. AVAILABILITY AND IMPLEMENTATION: Code and instructions are available at https://github.com/meidanis-lab/rank-indel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Genômica , Modelos Genéticos , Filogenia , Genoma , Mutação INDEL , Algoritmos

On the rank-distance median of 3 permutations.

Chindelevitch, Leonid; Pereira Zanetti, João Paulo; Meidanis, João.

BMC Bioinformatics ; 19(Suppl 6): 142, 2018 05 08.

Artigo em Inglês | MEDLINE | ID: mdl-29745865

RESUMO

BACKGROUND: Recently, Pereira Zanetti, Biller and Meidanis have proposed a new definition of a rearrangement distance between genomes. In this formulation, each genome is represented as a matrix, and the distance d is the rank distance between these matrices. Although defined in terms of matrices, the rank distance is equal to the minimum total weight of a series of weighted operations that leads from one genome to the other, including inversions, translocations, transpositions, and others. The computational complexity of the median-of-three problem according to this distance is currently unknown. The genome matrices are a special kind of permutation matrices, which we study in this paper. In their paper, the authors provide an [Formula: see text] algorithm for determining three candidate medians, prove the tight approximation ratio [Formula: see text], and provide a sufficient condition for their candidates to be true medians. They also conduct some experiments that suggest that their method is accurate on simulated and real data. RESULTS: In this paper, we extend their results and provide the following: Three invariants characterizing the problem of finding the median of 3 matrices A sufficient condition for uniqueness of medians that can be checked in O(n) A faster, [Formula: see text] algorithm for determining the median under this condition A new heuristic algorithm for this problem based on compressed sensing A [Formula: see text] algorithm that exactly solves the problem when the inputs are orthogonal matrices, a class that includes both permutations and genomes as special cases. CONCLUSIONS: Our work provides the first proof that, with respect to the rank distance, the problem of finding the median of 3 genomes, as well as the median of 3 permutations, is exactly solvable in polynomial time, a result which should be contrasted with its NP-hardness for the DCJ (double cut-and-join) distance and most other families of genome rearrangement operations. This result, backed by our experimental tests, indicates that the rank distance is a viable alternative to the DCJ distance widely used in genome comparisons.

Assuntos

Modelos Genéticos , Algoritmos , Simulação por Computador , Bases de Dados Genéticas , Rearranjo Gênico , Genoma , Genômica/métodos , Mutação/genética

Median Approximations for Genomes Modeled as Matrices.

Zanetti, Joao Paulo Pereira; Biller, Priscila; Meidanis, Joao.

Bull Math Biol ; 78(4): 786-814, 2016 04.

Artigo em Inglês | MEDLINE | ID: mdl-27072561

RESUMO

The genome median problem is an important problem in phylogenetic reconstruction under rearrangement models. It can be stated as follows: Given three genomes, find a fourth that minimizes the sum of the pairwise rearrangement distances between it and the three input genomes. In this paper, we model genomes as matrices and study the matrix median problem using the rank distance. It is known that, for any metric distance, at least one of the corners is a [Formula: see text]-approximation of the median. Our results allow us to compute up to three additional matrix median candidates, all of them with approximation ratios at least as good as the best corner, when the input matrices come from genomes. We also show a class of instances where our candidates are optimal. From the application point of view, it is usually more interesting to locate medians farther from the corners, and therefore, these new candidates are potentially more useful. In addition to the approximation algorithm, we suggest a heuristic to get a genome from an arbitrary square matrix. This is useful to translate the results of our median approximation algorithm back to genomes, and it has good results in our tests. To assess the relevance of our approach in the biological context, we ran simulated evolution tests and compared our solutions to those of an exact DCJ median solver. The results show that our method is capable of producing very good candidates.

Assuntos

Genoma , Modelos Genéticos , Algoritmos , Simulação por Computador , Evolução Molecular , Conceitos Matemáticos , Modelos Estatísticos , Filogenia

Evolution of genes neighborhood within reconciled phylogenies: an ensemble approach.

Chauve, Cedric; Ponty, Yann; Zanetti, João.

BMC Bioinformatics ; 16 Suppl 19: S6, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-26696141

RESUMO

CONTEXT: The reconstruction of evolutionary scenarios for whole genomes in terms of genome rearrangements is a fundamental problem in evolutionary and comparative genomics. The DeCo algorithm, recently introduced by Bérard et al., computes parsimonious evolutionary scenarios for gene adjacencies, from pairs of reconciled gene trees. However, as for many combinatorial optimization algorithms, there can exist many co-optimal, or slightly sub-optimal, evolutionary scenarios that deserve to be considered. CONTRIBUTION: We extend the DeCo algorithm to sample evolutionary scenarios from the whole solution space under the Boltzmann distribution, and also to compute Boltzmann probabilities for specific ancestral adjacencies. RESULTS: We apply our algorithms to a dataset of mammalian gene trees and adjacencies, and observe a significant reduction of the number of syntenic conflicts observed in the resulting ancestral gene adjacencies.

Assuntos

Algoritmos , Biologia Computacional/métodos , Evolução Molecular , Filogenia , Animais , Bases de Dados Genéticas , Genoma , Mamíferos/genética , Probabilidade , Temperatura

Counting Sorting Scenarios and Intermediate Genomes for the Rank Distance.

Zanetti, Joao Paulo Pereira; Oliveira, Lucas Peres; Meidanis, Joao; Chindelevitch, Leonid.

IEEE/ACM Trans Comput Biol Bioinform ; PP2023 May 18.

Artigo em Inglês | MEDLINE | ID: mdl-37200133

RESUMO

An important problem in genome comparison is the genome sorting problem, that is, the problem of finding a sequence of basic operations that transforms one genome into another whose length (possibly weighted) equals the distance between them. These sequences are called optimal sorting scenarios. However, there is usually a large number of such scenarios, and a naïve algorithm is very likely to be biased towards a specific type of scenario, impairing its usefulness in real-world applications. One way to go beyond the traditional sorting algorithms is to explore all possible solutions, looking at all the optimal sorting scenarios instead of just an arbitrary one. Another related approach is to analyze all the intermediate genomes, that is, all the genomes that can occur in an optimal sorting scenario. In this paper, we show how to enumerate the optimal sorting scenarios and the intermediate genomes between any two given genomes, under the rank distance.

Algorithms and Complexity Results for Genome Mapping Problems.

Rajaraman, Ashok; Zanetti, Joao Paulo Pereira; Manuch, Jan; Chauve, Cedric.

IEEE/ACM Trans Comput Biol Bioinform ; 14(2): 418-430, 2017.

Artigo em Inglês | MEDLINE | ID: mdl-26887011

RESUMO

Genome mapping algorithms aim at computing an ordering of a set of genomic markers based on local ordering information such as adjacencies and intervals of markers. In most genome mapping models, markers are assumed to occur uniquely in the resulting map. We introduce algorithmic questions that consider repeats, i.e., markers that can have several occurrences in the resulting map. We show that, provided with an upper bound on the copy number of repeated markers and with intervals that span full repeat copies, called repeat spanning intervals, the problem of deciding if a set of adjacencies and repeat spanning intervals admits a genome representation is tractable if the target genome can contain linear and/or circular chromosomal fragments. We also show that extracting a maximum cardinality or weight subset of repeat spanning intervals given a set of adjacencies that admits a genome realization is NP-hard but fixed-parameter tractable in the maximum copy number and the number of adjacent repeats, and tractable if intervals contain a single repeated marker.

Assuntos

Algoritmos , Mapeamento Cromossômico/métodos , Genômica/métodos , Sequências Repetitivas de Ácido Nucleico/genética , Software

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA