Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 39
Filtrar
1.
Syst Biol ; 71(6): 1391-1403, 2022 10 12.
Artículo en Inglés | MEDLINE | ID: mdl-35426933

RESUMEN

A large variety of pairwise measures of similarity or dissimilarity have been developed for comparing phylogenetic trees, for example, species trees or gene trees. Due to its intuitive definition in terms of tree clades and bipartitions and its computational efficiency, the Robinson-Foulds (RF) distance is the most widely used for trees with unweighted edges and labels restricted to leaves (representing the genetic elements being compared). However, in the case of gene trees, an important information revealing the nature of the homologous relation between gene pairs (orthologs, paralogs, and xenologs) is the type of event associated to each internal node of the tree, typically speciations or duplications, but other types of events may also be considered, such as horizontal gene transfers. This labeling of internal nodes is usually inferred from a gene tree/species tree reconciliation method. Here, we address the problem of comparing such event-labeled trees. The problem differs from the classical problem of comparing uniformly labeled trees (all labels belonging to the same alphabet) that may be done using the Tree Edit Distance (TED) mainly due to the fact that, in our case, two different alphabets are considered for the leaves and internal nodes of the tree, and leaves are not affected by edit operations. We propose an extension of the RF distance to event-labeled trees, based on edit operations comparable to those considered for TED: node insertion, node deletion, and label substitution. We show that this new Labeled Robinson-Foulds (LRF) distance can be computed in linear time, in addition of maintaining other desirable properties: being a metric, reducing to RF for trees with no labels on internal nodes and maintaining an intuitive interpretation. The algorithm for computing the LRF distance enables novel analyses on event-label trees such as reconciled gene trees. Here, we use it to study the impact of taxon sampling on labeled gene tree inference and conclude that denser taxon sampling yields trees with better topology but worse labeling. [Algorithms; combinatorics; gene trees; phylogenetics; Robinson-Foulds; tree distance.].


Asunto(s)
Algoritmos , Transferencia de Gen Horizontal , Filogenia
2.
Bioinformatics ; 37(Suppl_1): i120-i132, 2021 07 12.
Artículo en Inglés | MEDLINE | ID: mdl-34252921

RESUMEN

MOTIVATION: It is largely established that all extant mitochondria originated from a unique endosymbiotic event integrating an α-proteobacterial genome into an eukaryotic cell. Subsequently, eukaryote evolution has been marked by episodes of gene transfer, mainly from the mitochondria to the nucleus, resulting in a significant reduction of the mitochondrial genome, eventually completely disappearing in some lineages. However, in other lineages such as in land plants, a high variability in gene repertoire distribution, including genes encoded in both the nuclear and mitochondrial genome, is an indication of an ongoing process of Endosymbiotic Gene Transfer (EGT). Understanding how both nuclear and mitochondrial genomes have been shaped by gene loss, duplication and transfer is expected to shed light on a number of open questions regarding the evolution of eukaryotes, including rooting of the eukaryotic tree. RESULTS: We address the problem of inferring the evolution of a gene family through duplication, loss and EGT events, the latter considered as a special case of horizontal gene transfer occurring between the mitochondrial and nuclear genomes of the same species (in one direction or the other). We consider both EGT events resulting in maintaining (EGTcopy) or removing (EGTcut) the gene copy in the source genome. We present a linear-time algorithm for computing the DLE (Duplication, Loss and EGT) distance, as well as an optimal reconciled tree, for the unitary cost, and a dynamic programming algorithm allowing to output all optimal reconciliations for an arbitrary cost of operations. We illustrate the application of our EndoRex software and analyze different costs settings parameters on a plant dataset and discuss the resulting reconciled trees. AVAILABILITY AND IMPLEMENTATION: EndoRex implementation and supporting data are available on the GitHub repository via https://github.com/AEVO-lab/EndoRex.


Asunto(s)
Evolución Molecular , Transferencia de Gen Horizontal , Algoritmos , Duplicación de Gen , Genoma , Filogenia , Simbiosis/genética
3.
BMC Genomics ; 21(Suppl 10): 779, 2020 Nov 18.
Artículo en Inglés | MEDLINE | ID: mdl-33208096

RESUMEN

BACKGROUND: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). RESULTS: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting "good" edges, i.e. edges shared between the two trees. CONCLUSIONS: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions.Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf .


Asunto(s)
Algoritmos , Filogenia
4.
Mol Biol Evol ; 36(4): 766-783, 2019 04 01.
Artículo en Inglés | MEDLINE | ID: mdl-30698742

RESUMEN

Genetic code deviations involving stop codons have been previously reported in mitochondrial genomes of several green plants (Viridiplantae), most notably chlorophyte algae (Chlorophyta). However, as changes in codon recognition from one amino acid to another are more difficult to infer, such changes might have gone unnoticed in particular lineages with high evolutionary rates that are otherwise prone to codon reassignments. To gain further insight into the evolution of the mitochondrial genetic code in green plants, we have conducted an in-depth study across mtDNAs from 51 green plants (32 chlorophytes and 19 streptophytes). Besides confirming known stop-to-sense reassignments, our study documents the first cases of sense-to-sense codon reassignments in Chlorophyta mtDNAs. In several Sphaeropleales, we report the decoding of AGG codons (normally arginine) as alanine, by tRNA(CCU) of various origins that carry the recognition signature for alanine tRNA synthetase. In Chromochloris, we identify tRNA variants decoding AGG as methionine and the synonymous codon CGG as leucine. Finally, we find strong evidence supporting the decoding of AUA codons (normally isoleucine) as methionine in Pycnococcus. Our results rely on a recently developed conceptual framework (CoreTracker) that predicts codon reassignments based on the disparity between DNA sequence (codons) and the derived protein sequence. These predictions are then validated by an evaluation of tRNA phylogeny, to identify the evolution of new tRNAs via gene duplication and loss, and structural modifications that lead to the assignment of new tRNA identities and a change in the genetic code.


Asunto(s)
Chlorophyta/genética , Evolución Molecular , Código Genético , Genoma Mitocondrial , Filogenia , ARN de Transferencia/genética
5.
BMC Genomics ; 19(Suppl 2): 102, 2018 May 09.
Artículo en Inglés | MEDLINE | ID: mdl-29764363

RESUMEN

BACKGROUND: Several methods have been developed for the accurate reconstruction of gene trees. Some of them use reconciliation with a species tree to correct, a posteriori, errors in gene trees inferred from multiple sequence alignments. Unfortunately the best fit to sequence information can be lost during this process. RESULTS: We describe GATC, a new algorithm for reconstructing a binary gene tree with branch length. GATC returns optimal solutions according to a measure combining both tree likelihood (according to sequence evolution) and a reconciliation score under the Duplication-Transfer-Loss (DTL) model. It can either be used to construct a gene tree from scratch or to correct trees infered by existing reconstruction method, making it highly flexible to various input data types. The method is based on a genetic algorithm acting on a population of trees at each step. It substantially increases the efficiency of the phylogeny space exploration, reducing the risk of falling into local minima, at a reasonable computational time. We have applied GATC to a dataset of simulated cyanobacterial phylogenies, as well as to an empirical dataset of three reference gene families, and showed that it is able to improve gene tree reconstructions compared with current state-of-the-art algorithms. CONCLUSION: The proposed algorithm is able to accurately reconstruct gene trees and is highly suitable for the construction of reference trees. Our results also highlight the efficiency of multi-objective optimization algorithms for the gene tree reconstruction problem. GATC is available on Github at: https://github.com/UdeM-LBIT/GATC .


Asunto(s)
Cianobacterias/genética , Genes Bacterianos , Genómica/métodos , Algoritmos , Evolución Molecular , Duplicación de Gen , Internet , Modelos Genéticos , Familia de Multigenes , Filogenia
6.
Bioinformatics ; 33(21): 3331-3339, 2017 Nov 01.
Artículo en Inglés | MEDLINE | ID: mdl-28655158

RESUMEN

MOTIVATION: Codon reassignments have been reported across all domains of life. With the increasing number of sequenced genomes, the development of systematic approaches for genetic code detection is essential for accurate downstream analyses. Three automated prediction tools exist so far: FACIL, GenDecoder and Bagheera; the last two respectively restricted to metazoan mitochondrial genomes and CUG reassignments in yeast nuclear genomes. These tools can only analyze a single genome at a time and are often not followed by a validation procedure, resulting in a high rate of false positives. RESULTS: We present CoreTracker, a new algorithm for the inference of sense-to-sense codon reassignments. CoreTracker identifies potential codon reassignments in a set of related genomes, then uses statistical evaluations and a random forest classifier to predict those that are the most likely to be correct. Predicted reassignments are then validated through a phylogeny-aware step that evaluates the impact of the new genetic code on the protein alignment. Handling simultaneously a set of genomes in a phylogenetic framework, allows tracing back the evolution of each reassignment, which provides information on its underlying mechanism. Applied to metazoan and yeast genomes, CoreTracker significantly outperforms existing methods on both precision and sensitivity. AVAILABILITY AND IMPLEMENTATION: CoreTracker is written in Python and available at https://github.com/UdeM-LBIT/CoreTracker. CONTACT: mabrouk@iro.umontreal.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Núcleo Celular/genética , Codón , Genoma Mitocondrial , Genómica/métodos , Análisis de Secuencia de ADN/métodos , Programas Informáticos , Animales , Código Genético , Genoma Fúngico , Filogenia , Levaduras/genética
7.
Mol Biol Evol ; 32(6): 1643-56, 2015 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-25660374

RESUMEN

OrthoAlign, an algorithm for the gene order alignment problem (alignment of orthologs), accounting for most genome-wide evolutionary events such as duplications, losses, rearrangements, and substitutions, was presented. OrthoAlign was used in a phylogenetic framework to infer the evolution of transfer RNA repertoires of 50 fully sequenced bacteria in the Bacillus genus. A prevalence of gene duplications and losses over rearrangement events was observed. The average rate of duplications inferred in Bacillus was 24 times lower than the one reported in Escherichia coli, whereas the average rates of losses and inversions were both 12 times lower. These rates were extremely low, suggesting a strong selective pressure acting on tRNA gene repertoires in Bacillus. An exhaustive analysis of the type, location, distribution, and length of evolutionary events was provided, together with ancestral configurations. OrthoAlign can be downloaded at: http://www.iro.umontreal.ca/~mabrouk/.


Asunto(s)
Bacillus/genética , Genes Bacterianos , ARN Bacteriano/genética , ARN de Transferencia/genética , Algoritmos , Bacillus/clasificación , Proteínas Bacterianas/genética , Secuencia de Bases , Simulación por Computador , Bases de Datos Genéticas , Escherichia coli/genética , Evolución Molecular , Duplicación de Gen , Orden Génico , Modelos Genéticos , Datos de Secuencia Molecular , Operón/genética , Filogenia , Reproducibilidad de los Resultados , Alineación de Secuencia
8.
Bioinformatics ; 36(Suppl_1): i1-i2, 2020 07 01.
Artículo en Inglés | MEDLINE | ID: mdl-32657353
9.
BMC Bioinformatics ; 16 Suppl 14: S4, 2015.
Artículo en Inglés | MEDLINE | ID: mdl-26451911

RESUMEN

Combining a set of trees on partial datasets into a single tree is a classical method for inferring large phylogenetic trees. Ideally, the combined tree should display each input partial tree, which is only possible if input trees do not contain contradictory phylogenetic information. The simplest version of the supertree problem is thus to state whether a set of trees is compatible, and if so, construct a tree displaying them all. Classically, supertree methods have been applied to the reconstruction of species trees. Here we rather consider reconstructing a super gene tree in light of a known species tree S. We define the supergenetree problem as finding, among all supertrees displaying a set of input gene trees, one supertree minimizing a reconciliation distance with S. We first show how classical exact methods to the supertree problem can be extended to the supergenetree problem. As all these methods are highly exponential, we also exhibit a natural greedy heuristic for the duplication cost, based on minimizing the set of duplications preceding the first speciation event. We then show that both the supergenetree problem and its restriction to minimizing duplications preceding the first speciation are NP-hard to approximate within a n1-ϵ factor, for any 0 < ϵ < 1. Finally, we show that a restriction of this problem to uniquely labeled speciation gene trees, which is relevant to many biological applications, is also NP-hard. Therefore, we introduce new avenues in the field of supertrees, and set the theoretical basis for the exploration of various algorithmic aspects of the problems.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Evolución Molecular , Especiación Genética , Filogenia , Animales , Humanos , Modelos Genéticos , Programas Informáticos
10.
Bioinformatics ; 35(14): i1-i2, 2019 07 15.
Artículo en Inglés | MEDLINE | ID: mdl-31510708
11.
Bioinformatics ; 30(17): i519-26, 2014 Sep 01.
Artículo en Inglés | MEDLINE | ID: mdl-25161242

RESUMEN

MOTIVATION: Large-scale methods for inferring gene trees are error-prone. Correcting gene trees for weakly supported features often results in non-binary trees, i.e. trees with polytomies, thus raising the natural question of refining such polytomies into binary trees. A feature pointing toward potential errors in gene trees are duplications that are not supported by the presence of multiple gene copies. RESULTS: We introduce the problem of refining polytomies in a gene tree while minimizing the number of created non-apparent duplications in the resulting tree. We show that this problem can be described as a graph-theoretical optimization problem. We provide a bounded heuristic with guaranteed optimality for well-characterized instances. We apply our algorithm to a set of ray-finned fish gene trees from the Ensembl database to illustrate its ability to correct dubious duplications. AVAILABILITY AND IMPLEMENTATION: The C++ source code for the algorithms and simulations described in the article are available at http://www-ens.iro.umontreal.ca/~lafonman/software.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Genes , Filogenia , Algoritmos , Animales , Peces/genética
12.
BMC Genomics ; 15 Suppl 6: S5, 2014.
Artículo en Inglés | MEDLINE | ID: mdl-25572278

RESUMEN

We relate the comparison of gene orders to an alignment problem. Our evolutionary model accounts for both rearrangement and content-modifying events. We present a heuristic based on dynamic programming for the inference of the median of three genomes and apply it in a phylogenetic framework. multiOrthoAlign is shown accurate on simulated and real datasets, and shown to significantly improve the running-time of DupLoCut, an "almost" exact algorithm based on linear programming, developed recently for the same problem.


Asunto(s)
Orden Génico , Genoma , Genómica/métodos , Modelos Genéticos , Algoritmos , Evolución Molecular , Filogenia
13.
BMC Genomics ; 15 Suppl 6: S12, 2014.
Artículo en Inglés | MEDLINE | ID: mdl-25572629

RESUMEN

BACKGROUND: A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family  G. But is a given set  C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for  G? While previous studies have focused on full sets of constraints, here we consider the general case where  C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is  C satisfiable, i.e. can we find an event-labeled gene tree G inducing  C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? RESULTS: Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.


Asunto(s)
Evolución Molecular , Modelos Genéticos , Familia de Multigenes , Filogenia , Algoritmos
14.
BMC Bioinformatics ; 14 Suppl 15: S5, 2013.
Artículo en Inglés | MEDLINE | ID: mdl-24564227

RESUMEN

BACKGROUND: Reconciled gene trees yield orthology and paralogy relationships between genes. This information may however contradict other information on orthology and paralogy provided by other footprints of evolution, such as conserved synteny. RESULTS: We explore a way to include external information on orthology in the process of gene tree construction. Given an initial gene tree and a set of orthology constraints on pairs of genes or on clades, we give polynomial-time algorithms for producing a modified gene tree satisfying the set of constraints, that is as close as possible to the original one according to the Robinson-Foulds distance. We assess the validity of the modifications we propose by computing the likelihood ratio between initial and modified trees according to sequence alignments on Ensembl trees, showing that often the two trees are statistically equivalent. AVAILABILITY: Software and data available upon request to the corresponding author.


Asunto(s)
Alineación de Secuencia , Algoritmos , Animales , Evolución Molecular , Humanos , Filogenia , Programas Informáticos , Sintenía
15.
Algorithms Mol Biol ; 18(1): 9, 2023 Jul 30.
Artículo en Inglés | MEDLINE | ID: mdl-37518001

RESUMEN

Reconciling a non-binary gene tree with a binary species tree can be done efficiently in the absence of horizontal gene transfers, but becomes NP-hard in the presence of gene transfers. Here, we focus on the special case of endosymbiotic gene transfers (EGT), i.e. transfers between the mitochondrial and nuclear genome of the same species. More precisely, given a multifurcated (non-binary) gene tree with leaves labeled 0 or 1 depending on whether the corresponding genes belong to the mitochondrial or nuclear genome of the corresponding species, we investigate the problem of inferring a most parsimonious Duplication, Loss and EGT (DLE) Reconciliation of any binary refinement of the tree. We present a general two-steps method: ignoring the 0-1 labeling of leaves, output a binary resolution minimizing the Duplication and Loss (DL) Reconciliation and then, for such resolution, assign a known number of 0s and 1s to the leaves in a way minimizing EGT events. While the first step corresponds to the well studied non-binary DL-Reconciliation problem, the complexity of the label assignment problem corresponding to the second step is unknown. We show that this problem is NP-complete, even when the tree is restricted to a single polytomy, and even if transfers can occur in only one direction. We present a general algorithm solving each polytomy separately, which is shown optimal for a unitary cost of operation, and a polynomial-time algorithm for solving a polytomy in the special case where genes are specific to a single genome (mitochondrial or nuclear) in all but one species. This work represents the first algorithmic study for reconciliation with endosymbiotic gene transfers in the case of a multifurcated gene tree.

16.
BMC Bioinformatics ; 13 Suppl 19: S15, 2012.
Artículo en Inglés | MEDLINE | ID: mdl-23281654

RESUMEN

BACKGROUND: Reconciliation is the classical method for inferring a duplication and loss history from a set of extant genes. It is based upon the notion of embedding the gene tree into the species tree, the incongruence between the two indicating evidence for duplication and loss. However, results obtained by this method are highly dependent upon the considered species and gene trees. Thus, painstaking attention has been given to the development of methods for reconstructing accurate gene trees. RESULTS: This paper highlights the fact that errors in gene trees are not the only reasons for the inference of an erroneous duplication-loss history. More precisely, we prove that, under certain reasonable hypotheses based on the widely accepted link between function and sequence constraints, even a well-supported gene tree yield a reconciliation that does not correspond to the true history. We then provide the theoretical underpinnings for a conservative approach to infer histories given such gene trees. We apply our method to the mammalian interleukin-1 (IL) gene tree, that has been used as a model example to illustrate the role of reconciliation.


Asunto(s)
Evolución Molecular , Duplicación de Gen , Genes , Filogenia , Algoritmos , Animales , Genoma , Interleucina-1/genética , Análisis de Secuencia de ADN
17.
BMC Bioinformatics ; 13 Suppl 19: S4, 2012.
Artículo en Inglés | MEDLINE | ID: mdl-23281872

RESUMEN

BACKGROUND: The "small phylogeny" problem consists in inferring ancestral genomes associated with each internal node of a phylogenetic tree of a set of extant species. Existing methods can be grouped into two main categories: the distance-based methods aiming at minimizing a total branch length, and the synteny-based (or mapping) methods that first predict a collection of relations between ancestral markers in term of "synteny", and then assemble this collection into a set of Contiguous Ancestral Regions (CARs). The predicted CARs are likely to be more reliable as they are more directly deduced from observed conservations in extant species. However the challenge is to end up with a completely assembled genome. RESULTS: We develop a new synteny-based method that is flexible enough to handle a model of evolution involving whole genome duplication events, in addition to rearrangements, gene insertions, and losses. Ancestral relationships between markers are defined in term of Gapped Adjacencies, i.e. pairs of markers separated by up to a given number of markers. It improves on a previous restricted to direct adjacencies, which revealed a high accuracy for adjacency prediction, but with the drawback of being overly conservative, i.e. of generating a large number of CARs. Applying our algorithm on various simulated data sets reveals good performance as we usually end up with a completely assembled genome, while keeping a low error rate. AVAILABILITY: All source code is available at http://www.iro.umontreal.ca/~mabrouk.


Asunto(s)
Mapeo Contig/métodos , Evolución Molecular , Genoma , Algoritmos , Filogenia , Sintenía
18.
BMC Bioinformatics ; 12 Suppl 9: S2, 2011 Oct 05.
Artículo en Inglés | MEDLINE | ID: mdl-22152029

RESUMEN

BACKGROUND: Tandemly Arrayed Gene (TAG) clusters are groups of paralogous genes that are found adjacent on a chromosome. TAGs represent an important repertoire of genes in eukaryotes. In addition to tandem duplication events, TAG clusters are affected during their evolution by other mechanisms, such as inversion and deletion events, that affect the order and orientation of genes. The DILTAG algorithm developed in 1 makes it possible to infer a set of optimal evolutionary histories explaining the evolution of a single TAG cluster, from an ancestral single gene, through tandem duplications (simple or multiple, direct or inverted), deletions and inversion events. RESULTS: We present a general methodology, which is an extension of DILTAG, for the study of the evolutionary history of a set of orthologous TAG clusters in multiple species. In addition to the speciation events reflected by the phylogenetic tree of the considered species, the evolutionary events that are taken into account are simple or multiple tandem duplications, direct or inverted, simple or multiple deletions, and inversions. We analysed the performance of our algorithm on simulated data sets and we applied it to the protocadherin gene clusters of human, chimpanzee, mouse and rat. CONCLUSIONS: Our results obtained on simulated data sets showed a good performance in inferring the total number and size distribution of duplication events. A limitation of the algorithm is however in dealing with multiple gene deletions, as the algorithm is highly exponential in this case, and becomes quickly intractable.


Asunto(s)
Algoritmos , Evolución Molecular , Duplicación de Gen , Familia de Multigenes , Animales , Eliminación de Gen , Humanos , Ratones , Filogenia , Ratas
19.
Mol Biol Evol ; 27(4): 761-72, 2010 Apr.
Artículo en Inglés | MEDLINE | ID: mdl-19903657

RESUMEN

Gene duplication is frequent within gene clusters and plays a fundamental role in evolution by providing a source of new genetic material upon which natural selection can act. Although classical phylogenetic inference methods provide some insight into the evolutionary history of a gene cluster, they are not sufficient alone to differentiate single- from multiple gene duplication events and to answer other questions regarding the nature and size of evolutionary events. In this paper, we present an algorithm allowing to infer a set of optimal evolutionary histories for a gene cluster in a single species, according to a general cost model involving variable length duplications (in tandem or inverted), deletions, and inversions. We applied our algorithm to the human olfactory receptor and protocadherin gene clusters, showing that the duplication size distribution differs significantly between the two gene families. The algorithm is available through a web interface at http://www-lbit.iro.umontreal.ca/DILTAG/.


Asunto(s)
Algoritmos , Duplicación de Gen , Cadherinas/genética , Evolución Molecular , Humanos , Familia de Multigenes , Receptores Odorantes/genética
20.
Algorithms Mol Biol ; 15: 12, 2020.
Artículo en Inglés | MEDLINE | ID: mdl-32508979

RESUMEN

The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem which consists in inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree, to a set of trees, accounting for segmental duplications and losses. Existency of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existency also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess time efficiency of the former exponential time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA