Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 26
Filtrar
1.
Bioinformatics ; 39(1)2023 01 01.
Artículo en Inglés | MEDLINE | ID: mdl-36637196

RESUMEN

MOTIVATION: The phylogenetic signal of structural variation informs a more comprehensive understanding of evolution. As (near-)complete genome assembly becomes more commonplace, the next methodological challenge for inferring genome rearrangement trees is the identification of syntenic blocks of orthologous sequences. In this article, we studied 94 reference quality genomes of primarily Mycobacterium tuberculosis (Mtb) isolates as a benchmark to evaluate these methods. The clonal nature of Mtb evolution, the manageable genome sizes, along with substantial levels of structural variation make this an ideal benchmarking dataset. RESULTS: We tested several methods for detecting homology and obtaining syntenic blocks and two methods for inferring phylogenies from them, then compared the resulting trees to the standard method's tree, inferred from nucleotide substitutions. We found that, not only the choice of methods, but also their parameters can impact results, and that the tree inference method had less impact than the block determination method. Interestingly, a rearrangement tree based on blocks from the Cactus whole-genome aligner was fully compatible with the highly supported branches of the substitution-based tree, enabling the combination of the two into a high-resolution supertree. Overall, our results indicate that accurate trees can be inferred using genome rearrangements, but the choice of the methods for inferring homology requires care. AVAILABILITY AND IMPLEMENTATION: Analysis scripts and code written for this study are available at https://gitlab.com/LPCDRP/rearrangement-homology.pub and https://gitlab.com/LPCDRP/syntement. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Mycobacterium tuberculosis , Filogenia , Mycobacterium tuberculosis/genética , Genoma , Sintenía
2.
Bioinformatics ; 35(14): i117-i126, 2019 07 15.
Artículo en Inglés | MEDLINE | ID: mdl-31510664

RESUMEN

MOTIVATION: Genome rearrangements drastically change gene order along great stretches of a chromosome. There has been initial evidence that these apparently non-local events in the 1D sense may have breakpoints that are close in the 3D sense. We harness the power of the Double Cut and Join model of genome rearrangement, along with Hi-C chromosome conformation capture data to test this hypothesis between human and mouse. RESULTS: We devise novel statistical tests that show that indeed, rearrangement scenarios that transform the human into the mouse gene order are enriched for pairs of breakpoints that have frequent chromosome interactions. This is observed for both intra-chromosomal breakpoint pairs, as well as for inter-chromosomal pairs. For intra-chromosomal rearrangements, the enrichment exists from close (<20 Mb) to very distant (100 Mb) pairs. Further, the pattern exists across multiple cell lines in Hi-C data produced by different laboratories and at different stages of the cell cycle. We show that similarities in the contact frequencies between these many experiments contribute to the enrichment. We conclude that either (i) rearrangements usually involve breakpoints that are spatially close or (ii) there is selection against rearrangements that act on spatially distant breakpoints. AVAILABILITY AND IMPLEMENTATION: Our pipeline is freely available at https://bitbucket.org/thekswenson/locality. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Cromatina , Genoma , Programas Informáticos , Animales , Ciclo Celular , Puntos de Rotura del Cromosoma , Cromosomas , Humanos , Mamíferos , Ratones
3.
BMC Bioinformatics ; 17: 30, 2016 Jan 13.
Artículo en Inglés | MEDLINE | ID: mdl-26757899

RESUMEN

BACKGROUND: In recent years, many studies focused on the description and comparison of large sets of related bacteriophage genomes. Due to the peculiar mosaic structure of these genomes, few informative approaches for comparing whole genomes exist: dot plots diagrams give a mostly qualitative assessment of the similarity/dissimilarity between two or more genomes, and clustering techniques are used to classify genomes. Multiple alignments are conspicuously absent from this scene. Indeed, whole genome aligners interpret lack of similarity between sequences as an indication of rearrangements, insertions, or losses. This behavior makes them ill-prepared to align bacteriophage genomes, where even closely related strains can accomplish the same biological function with highly dissimilar sequences. RESULTS: In this paper, we propose a multiple alignment strategy that exploits functional collinearity shared by related strains of bacteriophages, and uses partial orders to capture mosaicism of sets of genomes. As classical alignments do, the computed alignments can be used to predict that genes have the same biological function, even in the absence of detectable similarity. The Alpha aligner implements these ideas in visual interactive displays, and is used to compute several examples of alignments of Staphylococcus aureus and Mycobacterium bacteriophages, involving up to 29 genomes. Using these datasets, we prove that Alpha alignments are at least as good as those computed by standard aligners. Comparison with the progressive Mauve aligner - which implements a partial order strategy, but whose alignments are linearized - shows a greatly improved interactive graphic display, while avoiding misalignments. CONCLUSIONS: Multiple alignments of whole bacteriophage genomes work, and will become an important conceptual and visual tool in comparative genomics of sets of related strains. A python implementation of Alpha, along with installation instructions for Ubuntu and OSX, is available on bitbucket (https://bitbucket.org/thekswenson/alpha).


Asunto(s)
Bacteriófagos/genética , Genoma Viral , Mycobacterium/virología , Alineación de Secuencia/métodos , Staphylococcus aureus/virología , Algoritmos , Biología Computacional/métodos , Genómica/métodos
4.
BMC Genomics ; 17(Suppl 10): 786, 2016 11 11.
Artículo en Inglés | MEDLINE | ID: mdl-28185551

RESUMEN

BACKGROUND: Transcriptome reconstruction, defined as the identification of all protein isoforms that may be expressed by a gene, is a notably difficult computational task. With real data, the best methods based on RNA-seq data identify barely 21 % of the expressed transcripts. While waiting for algorithms and sequencing techniques to improve - as has been strongly suggested in the literature - it is important to evaluate assisted transcriptome prediction; this is the question of how alternative transcription in one species performs as a predictor of protein isoforms in another relatively close species. Most evidence-based gene predictors use transcripts from other species to annotate a genome, but the predictive power of procedures that use exclusively transcripts from external species has never been quantified. The cornerstone of such an evaluation is the correct identification of pairs of transcripts with the same splicing patterns, called splicing orthologs. RESULTS: We propose a rigorous procedural definition of splicing orthologs, based on the identification of all ortholog pairs of splicing sites in the nucleotide sequences, and alignments at the protein level. Using our definition, we compared 24 382 human transcripts and 17 909 mouse transcripts from the highly curated CCDS database, and identified 11 122 splicing orthologs. In prediction mode, we show that human transcripts can be used to infer over 62 % of mouse protein isoforms. When restricting the predictions to transcripts known eight years ago, the percentage grows to 74 %. Using CCDS timestamped releases, we also analyze the evolution of the number of splicing orthologs over the last decade. CONCLUSIONS: Alternative splicing is now recognized to play a major role in the protein diversity of eukaryotic organisms, but definitions of spliced isoform orthologs are still approximate. Here we propose a definition adapted to the subtle variations of conserved alternative splicing sites, and use it to validate numerous accurate orthologous isoform predictions.


Asunto(s)
Algoritmos , Proteínas/genética , Transcriptoma , Empalme Alternativo , Animales , Biología Computacional , Exones , Humanos , Ratones , Isoformas de Proteínas/química , Isoformas de Proteínas/genética , Isoformas de Proteínas/metabolismo , Proteínas/química , Proteínas/metabolismo , ARN/química , ARN/genética , ARN/metabolismo
5.
BMC Bioinformatics ; 14 Suppl 15: S17, 2013.
Artículo en Inglés | MEDLINE | ID: mdl-24564731

RESUMEN

BACKGROUND: Viruses that infect bacteria, called phages, are well-known for their extreme mosaicism, in which an individual genome shares many different parts with many others. The mechanisms for creating these mosaics are largely unknown but are believed to be recombinations, either illegitimate, or partly homologous. In order to reconstruct the history of these recombinations, we need to identify the positions where recombinations may have occurred, and develop algorithms to generate and explore the possible reconstructions. RESULTS: We first show that, provided that their gene order is co-linear, genomes of phages can be aligned, even if large parts of their sequences lack any detectable similarity and are annotated hypothetical proteins. We give such an alignment for 31 Staphylococcus aureus phage genomes, and algorithms that can be used in any similar context. These alignments provide the datasets needed for a combinatorial study of recombinations. We next reconstruct the most likely recombination history of the set of 31 phages, under the hypothesis that recombinations are partly homologous. This history relies on the computational identification of missing phages. CONCLUSIONS: This first combinatorial study of modular recombinations acts as a proof of concept. We show that alignments of whole genomes are feasible for large sets of phages, and that this representation yields data that can be used to reconstruct parts of the evolutionary history of these organisms.


Asunto(s)
Bacteriófagos/genética , Genoma Viral , Recombinación Genética , Staphylococcus aureus/virología , Algoritmos , Análisis de Secuencia de ADN , Staphylococcus aureus/genética
6.
BMC Bioinformatics ; 14 Suppl 15: S5, 2013.
Artículo en Inglés | MEDLINE | ID: mdl-24564227

RESUMEN

BACKGROUND: Reconciled gene trees yield orthology and paralogy relationships between genes. This information may however contradict other information on orthology and paralogy provided by other footprints of evolution, such as conserved synteny. RESULTS: We explore a way to include external information on orthology in the process of gene tree construction. Given an initial gene tree and a set of orthology constraints on pairs of genes or on clades, we give polynomial-time algorithms for producing a modified gene tree satisfying the set of constraints, that is as close as possible to the original one according to the Robinson-Foulds distance. We assess the validity of the modifications we propose by computing the likelihood ratio between initial and modified trees according to sequence alignments on Ensembl trees, showing that often the two trees are statistically equivalent. AVAILABILITY: Software and data available upon request to the corresponding author.


Asunto(s)
Alineación de Secuencia , Algoritmos , Animales , Evolución Molecular , Humanos , Filogenia , Programas Informáticos , Sintenía
7.
BMC Bioinformatics ; 13 Suppl 19: S15, 2012.
Artículo en Inglés | MEDLINE | ID: mdl-23281654

RESUMEN

BACKGROUND: Reconciliation is the classical method for inferring a duplication and loss history from a set of extant genes. It is based upon the notion of embedding the gene tree into the species tree, the incongruence between the two indicating evidence for duplication and loss. However, results obtained by this method are highly dependent upon the considered species and gene trees. Thus, painstaking attention has been given to the development of methods for reconstructing accurate gene trees. RESULTS: This paper highlights the fact that errors in gene trees are not the only reasons for the inference of an erroneous duplication-loss history. More precisely, we prove that, under certain reasonable hypotheses based on the widely accepted link between function and sequence constraints, even a well-supported gene tree yield a reconciliation that does not correspond to the true history. We then provide the theoretical underpinnings for a conservative approach to infer histories given such gene trees. We apply our method to the mammalian interleukin-1 (IL) gene tree, that has been used as a model example to illustrate the role of reconciliation.


Asunto(s)
Evolución Molecular , Duplicación de Gen , Genes , Filogenia , Algoritmos , Animales , Genoma , Interleucina-1/genética , Análisis de Secuencia de ADN
8.
BMC Bioinformatics ; 13 Suppl 19: S16, 2012.
Artículo en Inglés | MEDLINE | ID: mdl-23281701

RESUMEN

Understanding the history of a gene family that evolves through duplication, speciation, and loss is a fundamental problem in comparative genomics. Features such as function, position, and structural similarity between genes are intimately connected to this history; relationships between genes such as orthology (genes related through a speciation event) or paralogy (genes related through a duplication event) are usually correlated with these features. For example, recent work has shown that in human and mouse there is a strong connection between function and inparalogs, the paralogs that were created since the speciation event separating the human and mouse lineages. Methods exist for detecting inparalogs that either use information from only two species, or consider a set of species but rely on clustering methods. In this paper we present a graph-theoretic approach for finding lower bounds on the number of inparalogs for a given set of species; we pose an edge covering problem on the similarity graph and give an efficient 2/3-approximation as well as a faster heuristic. Since the physical position of inparalogs corresponding to recent speciations is not likely to have changed since the duplication, we also use our predictions to estimate the types of duplications that have occurred in some vertebrates and drosophila.


Asunto(s)
Evolución Molecular , Genómica/métodos , Familia de Multigenes , Análisis de Secuencia de ADN/métodos , Animales , Duplicación de Gen , Humanos , Ratones , Filogenia , Ratas
9.
Artículo en Inglés | MEDLINE | ID: mdl-31217125

RESUMEN

The maximum agreement subtree method determines the consensus of a collection of phylogenetic trees by identifying maximum cardinality subsets of leaves for which all input trees agree. The trees induced by these maximum cardinality subsets are maximum agreement subtrees (MASTs). A single MAST may be misleading, since there can exist two MASTs which share almost no leaves; nevertheless, it may be impossible to inspect all MASTs, since the number of MASTs can be exponential in the number of leaves. To overcome this drawback, Swenson et al. suggested to further summarize the information common to all MASTs by their intersection, which is called the kernel agreement subtree (KAST). The construction of the KAST is the focus of this paper. Swenson et al. had an O(kn3 + n4 + nd + 1) time algorithm for computing the KAST of k trees on n leaves, in which at least one tree has maximum degree d. In this paper, an O(kn3 + nd)-time algorithm is presented. We demonstrate the efficiency of our algorithm on simulated trees as well as on ribosomal RNA alignments, where trees with 13,000 taxa took only hours to process, whereas the previous algorithm did not terminate after a week of computation.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Filogenia , Alineación de Secuencia/métodos , Secuencia de Consenso , Genes de ARNr/genética , Análisis de Secuencia de ARN/métodos
10.
BMC Bioinformatics ; 11 Suppl 1: S54, 2010 Jan 18.
Artículo en Inglés | MEDLINE | ID: mdl-20122229

RESUMEN

BACKGROUND: The rapidly increasing availability of whole-genome sequences has enabled the study of whole-genome evolution. Evolutionary mechanisms based on genome rearrangements have attracted much attention and given rise to many models; somewhat independently, the mechanisms of gene duplication and loss have seen much work. However, the two are not independent and thus require a unified treatment, which remains missing to date. Moreover, existing rearrangement models do not fit the dichotomy between most prokaryotic genomes (one circular chromosome) and most eukaryotic genomes (multiple linear chromosomes). RESULTS: To handle rearrangements, gene duplications and losses, we propose a new evolutionary model and the corresponding method for estimating true evolutionary distance. Our model, inspired from the DCJ model, is simple and the first to respect the prokaryotic/eukaryotic structural dichotomy. Experimental results on a wide variety of genome structures demonstrate the very high accuracy and robustness of our distance estimator. CONCLUSION: We give the first robust, statistically based, estimate of genomic pairwise distances based on rearrangements, duplications and losses, under a model that respects the structural dichotomy between prokaryotic and eukaryotic genomes. Accurate and robust estimates in true evolutionary distances should translate into much better phylogenetic reconstructions as well as more accurate genomic alignments, while our new model of genome rearrangements provides another refinement in simplicity and verisimilitude.


Asunto(s)
Evolución Molecular , Duplicación de Gen , Reordenamiento Génico/genética , Genómica/métodos , Genoma , Filogenia
11.
BMC Bioinformatics ; 11 Suppl 1: S30, 2010 Jan 18.
Artículo en Inglés | MEDLINE | ID: mdl-20122203

RESUMEN

BACKGROUND: The study of genome rearrangements has become a mainstay of phylogenetics and comparative genomics. Fundamental in such a study is the median problem: given three genomes find a fourth that minimizes the sum of the evolutionary distances between itself and the given three. Many exact algorithms and heuristics have been developed for the inversion median problem, of which the best known is MGR. RESULTS: We present a unifying framework for median heuristics, which enables us to clarify existing strategies and to place them in a partial ordering. Analysis of this framework leads to a new insight: the best strategies continue to refer to the input data rather than reducing the problem to smaller instances. Using this insight, we develop a new heuristic for inversion medians that uses input data to the end of its computation and leverages our previous work with DCJ medians. Finally, we present the results of extensive experimentation showing that our new heuristic outperforms all others in accuracy and, especially, in running time: the heuristic typically returns solutions within 1% of optimal and runs in seconds to minutes even on genomes with 25'000 genes--in contrast, MGR can take days on instances of 200 genes and cannot be used beyond 1'000 genes. CONCLUSION: Finding good rearrangement medians, in particular inversion medians, had long been regarded as the computational bottleneck in whole-genome studies. Our new heuristic for inversion medians, ASM, which dominates all others in our framework, puts that issue to rest by providing near-optimal solutions within seconds to minutes on even the largest genomes.


Asunto(s)
Algoritmos , Genoma , Genómica/métodos , Evolución Molecular , Reordenamiento Génico , Filogenia
12.
PLoS One ; 15(2): e0228676, 2020.
Artículo en Inglés | MEDLINE | ID: mdl-32040487

RESUMEN

Production of the Panton-Valentine leukocidin (PVL) by Staphylococcus aureus is mediated via the genes lukS-PV and lukF-PV which are carried on bacteriophage ϕSa2. PVL is associated with S. aureus strains that cause serious infections and clones of community-associated methicillin-resistant S. aureus (CA-MRSA) that have additionally disseminated widely. In Western Australia (WA) the original CA-MRSA were PVL negative however, between 2005 and 2008, following the introduction of eight international PVL-positive CA-MRSA, PVL-positive WA CA-MRSA were found. There was concern that PVL bacteriophages from the international clones were transferring into the local clones, therefore a comparative study of PVL-carrying ϕSa2 prophage genomes from historic WA PVL-positive S. aureus and representatives of all PVL-positive CA-MRSA isolated in WA between 2005 and 2008 was performed. The prophages were classified into two genera and three PVL bacteriophage groups and had undergone many recombination events during their evolution. Comparative analysis of mosaic regions of selected bacteriophages using the Alignments of bacteriophage genomes (Alpha) aligner revealed novel recombinations and modules. There was heterogeneity in the chromosomal integration sites, the lysogeny regulation regions, the defence and DNA processing modules, the structural and packaging modules and the lukSF-PV genes. One WA CA-MRSA (WA518751) and one international clone (Korean Clone) have probably acquired PVL-carrying ϕSa2 in WA, however these clones did not disseminate in the community. Genetic heterogeneity made it impossible to trace the source of the PVL prophages in the other WA clones. Against this background of PVL prophage diversity, the sequence of one group, the ϕSa2USA/ϕSa2wa-st93 group, was remarkably stable over at least 20 years and associated with the highly virulent USA300 and ST93-IVa CA-MRSA lineages that have disseminated globally.


Asunto(s)
Toxinas Bacterianas/genética , Bacteriófagos/genética , Exotoxinas/genética , Leucocidinas/genética , Staphylococcus aureus Resistente a Meticilina/virología , Linaje de la Célula , ADN Bacteriano/genética , Genotipo , Geografía , Lisogenia , Staphylococcus aureus Resistente a Meticilina/genética , Epidemiología Molecular , Tipificación de Secuencias Multilocus , Sistemas de Lectura Abierta , Profagos/genética , Factores de Virulencia/genética , Australia Occidental
13.
BMC Bioinformatics ; 10 Suppl 1: S7, 2009 Jan 30.
Artículo en Inglés | MEDLINE | ID: mdl-19208174

RESUMEN

BACKGROUND: Reconstructing complete ancestral genomes (at least in terms of their gene inventory and arrangement) is attracting much interest due to the rapidly increasing availability of whole genome sequences. While modest successes have been reported for mammalian and even vertebrate genomes, more divergent groups continue to pose a stiff challenge, mostly because current models of genomic evolution support too many choices. RESULTS: We describe a novel type of genomic signature based on rearrangements that characterizes evolutionary changes that must be common to all minimal rearrangement scenarios; by focusing on global patterns of rearrangements, such signatures bypass individual variations and sharply restrict the search space. We present the results of extensive simulation studies demonstrating that these signatures can be used to reconstruct accurate ancestral genomes and phylogenies even for widely divergent collections. CONCLUSION: Focusing on genome triples rather than genomes pairs unleashes the full power of evolutionary analysis. Our genomic signature captures shared evolutionary events and thus can form the basis of a robust analysis and reconstruction of evolutionary history.


Asunto(s)
Inversión Cromosómica/genética , Evolución Molecular , Genoma , Animales , Reordenamiento Génico , Humanos
14.
BMC Bioinformatics ; 10 Suppl 1: S6, 2009 Jan 30.
Artículo en Inglés | MEDLINE | ID: mdl-19208163

RESUMEN

BACKGROUND: Given three signed permutations, an inversion median is a fourth permutation that minimizes the sum of the pairwise inversion distances between it and the three others. This problem is NP-hard as well as hard to approximate. Yet median-based approaches to phylogenetic reconstruction have been shown to be among the most accurate, especially in the presence of long branches. Most existing approaches have used heuristics that attempt to find a longest sequence of inversions from one of the three permutations that, at each step in the sequence, moves closer to the other two permutations; yet very little is known about the quality of solutions returned by such approaches. RESULTS: Recently, Arndt and Tang took a step towards finding longer such sequences by using sets of commuting inversions. In this paper, we formalize the problem of finding such sequences of inversions with what we call signatures and provide algorithms to find maximum cardinality sets of commuting and noninterfering inversions. CONCLUSION: Our results offer a framework in which to study the inversion median problem, faster algorithms to obtain good medians, and an approach to study characteristic events along an evolutionary path.


Asunto(s)
Algoritmos , Filogenia , Inversión Cromosómica , Biología Computacional/métodos , Evolución Molecular
15.
Algorithms Mol Biol ; 14: 15, 2019.
Artículo en Inglés | MEDLINE | ID: mdl-31360217

RESUMEN

This paper generalizes previous studies on genome rearrangement under biological constraints, using double cut and join (DCJ). We propose a model for weighted DCJ, along with a family of optimization problems called φ -MCPS (Minimum Cost Parsimonious Scenario), that are based on labeled graphs. We show how to compute solutions to general instances of φ -MCPS, given an algorithm to compute φ -MCPS on a circular genome with exactly one occurrence of each gene. These general instances can have an arbitrary number of circular and linear chromosomes, and arbitrary gene content. The practicality of the framework is displayed by presenting polynomial-time algorithms that generalize the results of Bulteau, Fertin, and Tannier on the Sorting by wDCJs and indels in intergenes problem, and that generalize previous results on the Minimum Local Parsimonious Scenario problem.

16.
Algorithms Mol Biol ; 13: 9, 2018.
Artículo en Inglés | MEDLINE | ID: mdl-29755580

RESUMEN

BACKGROUND: The double cut and join (DCJ) model of genome rearrangement is well studied due to its mathematical simplicity and power to account for the many events that transform gene order. These studies have mostly been devoted to the understanding of minimum length scenarios transforming one genome into another. In this paper we search instead for rearrangement scenarios that minimize the number of rearrangements whose breakpoints are unlikely due to some biological criteria. One such criterion has recently become accessible due to the advent of the Hi-C experiment, facilitating the study of 3D spacial distance between breakpoint regions. RESULTS: We establish a link between the minimum number of unlikely rearrangements required by a scenario and the problem of finding a maximum edge-disjoint cycle packing on a certain transformed version of the adjacency graph. This link leads to a 3/2-approximation as well as an exact integer linear programming formulation for our problem, which we prove to be NP-complete. We also present experimental results on fruit flies, showing that Hi-C data is informative when used as a criterion for rearrangements. CONCLUSIONS: A new variant of the weighted DCJ distance problem is addressed that ignores scenario length in its objective function. A solution to this problem provides a lower bound on the number of unlikely moves necessary when transforming one gene order into another. This lower bound aids in the study of rearrangement scenarios with respect to chromatin structure, and could eventually be used in the design of a fixed parameter algorithm with a more general objective function.

17.
Algorithms Mol Biol ; 11: 13, 2016.
Artículo en Inglés | MEDLINE | ID: mdl-27190550

RESUMEN

BACKGROUND: Traditionally, the merit of a rearrangement scenario between two gene orders has been measured based on a parsimony criteria alone; two scenarios with the same number of rearrangements are considered equally good. In this paper, we acknowledge that each rearrangement has a certain likelihood of occurring based on biological constraints, e.g. physical proximity of the DNA segments implicated or repetitive sequences. RESULTS: We propose optimization problems with the objective of maximizing overall likelihood, by weighting the rearrangements. We study a binary weight function suitable to the representation of sets of genome positions that are most likely to have swapped adjacencies. We give a polynomial-time algorithm for the problem of finding a minimum weight double cut and join scenario among all minimum length scenarios. In the process we solve an optimization problem on colored noncrossing partitions, which is a generalization of the Maximum Independent Set problem on circle graphs. CONCLUSIONS: We introduce a model for weighting genome rearrangements and show that under simple yet reasonable conditions, a fundamental distance can be computed in polynomial time. This is achieved by solving a generalization of the Maximum Independent Set problem on circle graphs. Several variants of the problem are also mentioned.

18.
Algorithms Mol Biol ; 7(1): 31, 2012 Nov 20.
Artículo en Inglés | MEDLINE | ID: mdl-23167951

RESUMEN

BACKGROUND: Reconciliation is the commonly used method for inferring the evolutionary scenario for a gene family. It consists in "embedding" inferred gene trees into a known species tree, revealing the evolution of the gene family by duplications and losses. When a species tree is not known, a natural algorithmic problem is to infer a species tree from a set of gene trees, such that the corresponding reconciliation minimizes the number of duplications and/or losses. The main drawback of reconciliation is that the inferred evolutionary scenario is strongly dependent on the considered gene trees, as few misplaced leaves may lead to a completely different history, with significantly more duplications and losses. RESULTS: In this paper, we take advantage of certain gene trees' properties in order to preprocess them for reconciliation or species tree inference. We flag certain duplication vertices of a gene tree, the "non-apparent duplication" (NAD) vertices, as resulting from the misplacement of leaves. In the case of species tree inference, we develop a polynomial-time heuristic for removing the minimum number of species leading to a set of gene trees that exhibit no NAD vertices with respect to at least one species tree. In the case of reconciliation, we consider the optimization problem of removing the minimum number of leaves or species leading to a tree without any NAD vertex. We develop a polynomial-time algorithm that is exact for two special classes of gene trees, and show a good performance on simulated data sets in the general case.

19.
Artículo en Inglés | MEDLINE | ID: mdl-22231622

RESUMEN

A Maximum Agreement SubTree (MAST) is a largest subtree common to a set of trees and serves as a summary of common substructure in the trees. A single MAST can be misleading, however, since there can be an exponential number of MASTs, and two MASTs for the same tree set do not even necessarily share any leaves. In this paper, we introduce the notion of the Kernel Agreement SubTree (KAST), which is the summary of the common substructure in all MASTs, and show that it can be calculated in polynomial time (for trees with bounded degree). Suppose the input trees represent competing hypotheses for a particular phylogeny. We explore the utility of the KAST as a method to discern the common structure of confidence, and as a measure of how confident we are in a given tree set. We also show the trend of the KAST, as compared to other consensus methods, on the set of all trees visited during a Bayesian analysis of flatworm genomes.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Filogenia , Animales , Teorema de Bayes , Genoma Bacteriano/genética , Genoma de los Helmintos , Modelos Genéticos , Platelmintos/genética , Proteobacteria
20.
Algorithms Mol Biol ; 6: 11, 2011 Apr 19.
Artículo en Inglés | MEDLINE | ID: mdl-21504604

RESUMEN

We describe an average-case O(n2) algorithm to list all reversals on a signed permutation π that, when applied to π, produce a permutation that is closer to the identity. This algorithm is optimal in the sense that, the time it takes to write the list is Ω(n2) in the worst case.

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA