Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 28
Filtrar
Mais filtros

Base de dados
Tipo de documento
Intervalo de ano de publicação
1.
Bioinformatics ; 40(Suppl 1): i237-i246, 2024 06 28.
Artigo em Inglês | MEDLINE | ID: mdl-38940169

RESUMO

MOTIVATION: Noncoding RNAs (ncRNAs) express their functions by adopting molecular structures. Specifically, RNA secondary structures serve as a relatively stable intermediate step before tertiary structures, offering a reliable signature of molecular function. Consequently, within an RNA functional family, secondary structures are generally more evolutionarily conserved than sequences. Conversely, homologous RNA families grouped within an RNA clan share ancestors but typically exhibit structural differences. Inferring the evolution of RNA structures within RNA families and clans is crucial for gaining insights into functional adaptations over time and providing clues about the Ancient RNA World Hypothesis. RESULTS: We introduce the median problem and the small parsimony problem for ncRNA families, where secondary structures are represented as leaf-labeled trees. We utilize the Robinson-Foulds (RF) tree distance, which corresponds to a specific edit distance between RNA trees, and a new metric called the Internal-Leafset (IL) distance. While the RF tree distance compares sets of leaves descending from internal nodes of two RNA trees, the IL distance compares the collection of leaf-children of internal nodes. The latter is better at capturing differences in structural elements of RNAs than the RF distance, which is more focused on base pairs. We also consider a more general tree edit distance that allows the mapping of base pairs that are not perfectly aligned. We study the theoretical complexity of the median problem and the small parsimony problem under the three distance metrics and various biologically relevant constraints, and we present polynomial-time maximum parsimony algorithms for solving some versions of the problems. Our algorithms are applied to ncRNA families from the RFAM database, illustrating their practical utility. AVAILABILITY AND IMPLEMENTATION: https://github.com/bmarchand/rna\_small\_parsimony.


Assuntos
Conformação de Ácido Nucleico , RNA não Traduzido , RNA não Traduzido/genética , RNA não Traduzido/química , Algoritmos , Evolução Molecular , Análise de Sequência de RNA/métodos , Biologia Computacional/métodos
2.
Bioinformatics ; 37(Suppl_1): i120-i132, 2021 07 12.
Artigo em Inglês | MEDLINE | ID: mdl-34252921

RESUMO

MOTIVATION: It is largely established that all extant mitochondria originated from a unique endosymbiotic event integrating an α-proteobacterial genome into an eukaryotic cell. Subsequently, eukaryote evolution has been marked by episodes of gene transfer, mainly from the mitochondria to the nucleus, resulting in a significant reduction of the mitochondrial genome, eventually completely disappearing in some lineages. However, in other lineages such as in land plants, a high variability in gene repertoire distribution, including genes encoded in both the nuclear and mitochondrial genome, is an indication of an ongoing process of Endosymbiotic Gene Transfer (EGT). Understanding how both nuclear and mitochondrial genomes have been shaped by gene loss, duplication and transfer is expected to shed light on a number of open questions regarding the evolution of eukaryotes, including rooting of the eukaryotic tree. RESULTS: We address the problem of inferring the evolution of a gene family through duplication, loss and EGT events, the latter considered as a special case of horizontal gene transfer occurring between the mitochondrial and nuclear genomes of the same species (in one direction or the other). We consider both EGT events resulting in maintaining (EGTcopy) or removing (EGTcut) the gene copy in the source genome. We present a linear-time algorithm for computing the DLE (Duplication, Loss and EGT) distance, as well as an optimal reconciled tree, for the unitary cost, and a dynamic programming algorithm allowing to output all optimal reconciliations for an arbitrary cost of operations. We illustrate the application of our EndoRex software and analyze different costs settings parameters on a plant dataset and discuss the resulting reconciled trees. AVAILABILITY AND IMPLEMENTATION: EndoRex implementation and supporting data are available on the GitHub repository via https://github.com/AEVO-lab/EndoRex.


Assuntos
Evolução Molecular , Transferência Genética Horizontal , Algoritmos , Duplicação Gênica , Genoma , Filogenia , Simbiose/genética
3.
J Math Biol ; 83(1): 10, 2021 07 03.
Artigo em Inglês | MEDLINE | ID: mdl-34218334

RESUMO

Several implicit methods to infer horizontal gene transfer (HGT) focus on pairs of genes that have diverged only after the divergence of the two species in which the genes reside. This situation defines the edge set of a graph, the later-divergence-time (LDT) graph, whose vertices correspond to genes colored by their species. We investigate these graphs in the setting of relaxed scenarios, i.e., evolutionary scenarios that encompass all commonly used variants of duplication-transfer-loss scenarios in the literature. We characterize LDT graphs as a subclass of properly vertex-colored cographs, and provide a polynomial-time recognition algorithm as well as an algorithm to construct a relaxed scenario that explains a given LDT. An edge in an LDT graph implies that the two corresponding genes are separated by at least one HGT event. The converse is not true, however. We show that the complete xenology relation is described by an rs-Fitch graph, i.e., a complete multipartite graph satisfying constraints on the vertex coloring. This class of vertex-colored graphs is also recognizable in polynomial time. We finally address the question "how much information about all HGT events is contained in LDT graphs" with the help of simulations of evolutionary scenarios with a wide range of duplication, loss, and HGT events. In particular, we show that a simple greedy graph editing scheme can be used to efficiently detect HGT events that are implicitly contained in LDT graphs.


Assuntos
Algoritmos , Transferência Genética Horizontal , Filogenia
4.
BMC Genomics ; 21(Suppl 2): 198, 2020 Apr 16.
Artigo em Inglês | MEDLINE | ID: mdl-32299350

RESUMO

BACKGROUND: During cancer progression, malignant cells accumulate somatic mutations that can lead to genetic aberrations. In particular, evolutionary events akin to segmental duplications or deletions can alter the copy-number profile (CNP) of a set of genes in a genome. Our aim is to compute the evolutionary distance between two cells for which only CNPs are known. This asks for the minimum number of segmental amplifications and deletions to turn one CNP into another. This was recently formalized into a model where each event is assumed to alter a copy-number by 1 or -1, even though these events can affect large portions of a chromosome. RESULTS: We propose a general cost framework where an event can modify the copy-number of a gene by larger amounts. We show that any cost scheme that allows segmental deletions of arbitrary length makes computing the distance strongly NP-hard. We then devise a factor 2 approximation algorithm for the problem when copy-numbers are non-zero and provide an implementation called cnp2cnp. We evaluate our approach experimentally by reconstructing simulated cancer phylogenies from the pairwise distances inferred by cnp2cnp and compare it against two other alternatives, namely the MEDICC distance and the Euclidean distance. CONCLUSIONS: The experimental results show that our distance yields more accurate phylogenies on average than these alternatives if the given CNPs are error-free, but that the MEDICC distance is slightly more robust against error in the data. In all cases, our experiments show that either our approach or the MEDICC approach should preferred over the Euclidean distance.


Assuntos
Neoplasias/genética , Algoritmos , Simulação por Computador , Variações do Número de Cópias de DNA , Bases de Dados Genéticas , Evolução Molecular , Amplificação de Genes , Genoma Humano , Humanos , Modelos Genéticos , Filogenia , Deleção de Sequência
5.
BMC Genomics ; 21(Suppl 10): 779, 2020 Nov 18.
Artigo em Inglês | MEDLINE | ID: mdl-33208096

RESUMO

BACKGROUND: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). RESULTS: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting "good" edges, i.e. edges shared between the two trees. CONCLUSIONS: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions.Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf .


Assuntos
Algoritmos , Filogenia
6.
Bioinformatics ; 34(13): i366-i375, 2018 07 01.
Artigo em Inglês | MEDLINE | ID: mdl-29950018

RESUMO

Motivation: When gene duplication occurs, one of the copies may become free of selective pressure and evolve at an accelerated pace. This has important consequences on the prediction of orthology relationships, since two orthologous genes separated by divergence after duplication may differ in both sequence and function. In this work, we make the distinction between the primary orthologs, which have not been affected by accelerated mutation rates on their evolutionary path, and the secondary orthologs, which have. Similarity-based prediction methods will tend to miss secondary orthologs, whereas phylogeny-based methods cannot separate primary and secondary orthologs. However, both types of orthology have applications in important areas such as gene function prediction and phylogenetic reconstruction, motivating the need for methods that can distinguish the two types. Results: We formalize the notion of divergence after duplication and provide a theoretical basis for the inference of primary and secondary orthologs. We then put these ideas to practice with the Hybrid Prediction of Paralogs and Orthologs (HyPPO) framework, which combines ideas from both similarity and phylogeny approaches. We apply our method to simulated and empirical datasets and show that we achieve superior accuracy in predicting primary orthologs, secondary orthologs and paralogs. Availability and implementation: HyPPO is a modular framework with a core developed in Python and is provided with a variety of C++ modules. The source code is available at https://github.com/manuellafond/HyPPO. Supplementary information: Supplementary data are available at Bioinformatics online.


Assuntos
Biologia Computacional/métodos , Evolução Molecular , Duplicação Gênica , Software , Eucariotos/genética , Taxa de Mutação , Filogenia , Análise de Sequência de DNA/métodos
7.
PLoS Comput Biol ; 13(8): e1005611, 2017 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-28763439

RESUMO

Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic network-that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose "horizontal" moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and "vertical" moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves-named rNNI and rSPR-reduce to the best-known moves on rooted phylogenetic trees, nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results-separating the contributions of horizontal and vertical moves-we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. Our goal is to provide a solid basis for practical phylogenetic network reconstruction.


Assuntos
Biologia Computacional/métodos , Rearranjo Gênico/genética , Modelos Genéticos , Filogenia , Animais , Hominidae/genética , Humanos
8.
BMC Bioinformatics ; 16 Suppl 14: S4, 2015.
Artigo em Inglês | MEDLINE | ID: mdl-26451911

RESUMO

Combining a set of trees on partial datasets into a single tree is a classical method for inferring large phylogenetic trees. Ideally, the combined tree should display each input partial tree, which is only possible if input trees do not contain contradictory phylogenetic information. The simplest version of the supertree problem is thus to state whether a set of trees is compatible, and if so, construct a tree displaying them all. Classically, supertree methods have been applied to the reconstruction of species trees. Here we rather consider reconstructing a super gene tree in light of a known species tree S. We define the supergenetree problem as finding, among all supertrees displaying a set of input gene trees, one supertree minimizing a reconciliation distance with S. We first show how classical exact methods to the supertree problem can be extended to the supergenetree problem. As all these methods are highly exponential, we also exhibit a natural greedy heuristic for the duplication cost, based on minimizing the set of duplications preceding the first speciation event. We then show that both the supergenetree problem and its restriction to minimizing duplications preceding the first speciation are NP-hard to approximate within a n1-ϵ factor, for any 0 < ϵ < 1. Finally, we show that a restriction of this problem to uniquely labeled speciation gene trees, which is relevant to many biological applications, is also NP-hard. Therefore, we introduce new avenues in the field of supertrees, and set the theoretical basis for the exploration of various algorithmic aspects of the problems.


Assuntos
Algoritmos , Biologia Computacional/métodos , Evolução Molecular , Especiação Genética , Filogenia , Animais , Humanos , Modelos Genéticos , Software
9.
Bioinformatics ; 30(17): i519-26, 2014 Sep 01.
Artigo em Inglês | MEDLINE | ID: mdl-25161242

RESUMO

MOTIVATION: Large-scale methods for inferring gene trees are error-prone. Correcting gene trees for weakly supported features often results in non-binary trees, i.e. trees with polytomies, thus raising the natural question of refining such polytomies into binary trees. A feature pointing toward potential errors in gene trees are duplications that are not supported by the presence of multiple gene copies. RESULTS: We introduce the problem of refining polytomies in a gene tree while minimizing the number of created non-apparent duplications in the resulting tree. We show that this problem can be described as a graph-theoretical optimization problem. We provide a bounded heuristic with guaranteed optimality for well-characterized instances. We apply our algorithm to a set of ray-finned fish gene trees from the Ensembl database to illustrate its ability to correct dubious duplications. AVAILABILITY AND IMPLEMENTATION: The C++ source code for the algorithms and simulations described in the article are available at http://www-ens.iro.umontreal.ca/~lafonman/software.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Genes , Filogenia , Algoritmos , Animais , Peixes/genética
10.
BMC Genomics ; 15 Suppl 6: S12, 2014.
Artigo em Inglês | MEDLINE | ID: mdl-25572629

RESUMO

BACKGROUND: A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family  G. But is a given set  C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for  G? While previous studies have focused on full sets of constraints, here we consider the general case where  C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is  C satisfiable, i.e. can we find an event-labeled gene tree G inducing  C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? RESULTS: Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.


Assuntos
Evolução Molecular , Modelos Genéticos , Família Multigênica , Filogenia , Algoritmos
11.
Algorithms Mol Biol ; 19(1): 6, 2024 Feb 06.
Artigo em Inglês | MEDLINE | ID: mdl-38321476

RESUMO

BACKGROUND: Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well-known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity, but still have retained enough important common motifs that allow them to have common character traits, for instance the same functional or expression profile. A phylogeny that has two separate clades that acquired the same character independently might indicate the presence of a transfer even in the absence of sequence similarity. OUR CONTRIBUTIONS: We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa under the assumption that characters have unique births, and that once a character is gained it is rarely lost. Examples of such traits include transposable elements, biochemical markers and emergence of organelles, just to name a few. We study the differences between our model and two similar models: perfect phylogenetic networks and ancestral recombination networks. Our goals are to initiate a study on the structural and algorithmic properties of perfect transfer networks. We then show that in polynomial time, one can decide whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa. We finally provide lower and upper bounds on the number of transfers required to explain a set of taxa, in the worst case.

12.
J Comput Biol ; 31(9): 815-833, 2024 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-39088355

RESUMO

Gene duplication has a central role in evolution; still, little is known on the fates of the duplicated copies, their relative frequency, and on how environmental conditions affect them. Moreover, the lack of rigorous definitions concerning the fate of duplicated genes hinders the development of a global vision of this process. In this paper, we present a new framework aiming at characterizing and formally differentiating the fate of duplicated genes. Our framework has been tested via simulations, where the evolution of populations has been simulated using aevol, an in silico experimental evolution platform. Our results show several patterns that confirm some of the conclusions from previous studies, while also exhibiting new tendencies; this may open up new avenues to better understand the role of duplications as a driver of evolution.


Assuntos
Evolução Molecular , Duplicação Gênica , Modelos Genéticos , Simulação por Computador , Humanos
13.
J Comput Biol ; 31(4): 360-379, 2024 04.
Artigo em Inglês | MEDLINE | ID: mdl-38117611

RESUMO

Phylogenetic networks are increasingly being considered better suited to represent the complexity of the evolutionary relationships between species. One class of phylogenetic networks that have received a lot of attention recently is the class of orchard networks, which is composed of networks that can be reduced to a single leaf using cherry reductions. Cherry reductions, also called cherry-picking operations, remove either a leaf of a simple cherry (sibling leaves sharing a parent) or a reticulate edge of a reticulate cherry (two leaves whose parents are connected by a reticulate edge). In this article, we present a fixed-parameter tractable algorithm to solve the problem of finding a maximum agreement cherry-reduced subnetwork (MACRS) between two rooted binary level-1 networks. This is the first exact algorithm proposed to solve the MACRS problem. As proven in an earlier work, there is a direct relationship between finding an MACRS and calculating a distance based on cherry operations. As a result, the proposed algorithm also provides a distance that can be used for the comparison of level-1 networks.


Assuntos
Algoritmos , Filogenia , Biologia Computacional/métodos
14.
BMC Bioinformatics ; 14 Suppl 15: S5, 2013.
Artigo em Inglês | MEDLINE | ID: mdl-24564227

RESUMO

BACKGROUND: Reconciled gene trees yield orthology and paralogy relationships between genes. This information may however contradict other information on orthology and paralogy provided by other footprints of evolution, such as conserved synteny. RESULTS: We explore a way to include external information on orthology in the process of gene tree construction. Given an initial gene tree and a set of orthology constraints on pairs of genes or on clades, we give polynomial-time algorithms for producing a modified gene tree satisfying the set of constraints, that is as close as possible to the original one according to the Robinson-Foulds distance. We assess the validity of the modifications we propose by computing the likelihood ratio between initial and modified trees according to sequence alignments on Ensembl trees, showing that often the two trees are statistically equivalent. AVAILABILITY: Software and data available upon request to the corresponding author.


Assuntos
Alinhamento de Sequência , Algoritmos , Animais , Evolução Molecular , Humanos , Filogenia , Software , Sintenia
15.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1654-1666, 2023.
Artigo em Inglês | MEDLINE | ID: mdl-35349447

RESUMO

In phylogenetic networks, picking a cherry consists of removing a leaf that shares a parent with another leaf, or removing a reticulate edge whose endpoints are parents of leaves. Cherry-picking operations were recently shown to have several structural and algorithmic applications in the study of networks, for instance in determining their reconstructibility or in solving the network hybridization and network containment problems. In particular, some networks within certain classes are isomorphic if they can be reduced to a single node by the same sequence of cherry-picking operations. Therefore, cherry-picking sequences contain information on the level of similarity between two networks. In this paper, we expand on this idea by devising four novel distances on networks based on cherry picking and their reverse operation. We provide bounds between these distances and show that three of them are equal despite their different formulations. We also show that computing these three equivalent distances is NP-hard, even when restricted to comparing a tree and a network. On the positive side, we show that they can be computed in quadratic time on two trees, providing a new comparative measure for phylogenetic trees that can be computed efficiently.


Assuntos
Algoritmos , Hibridização Genética , Filogenia , Modelos Genéticos
16.
Algorithms Mol Biol ; 18(1): 16, 2023 Nov 08.
Artigo em Inglês | MEDLINE | ID: mdl-37940998

RESUMO

BACKGROUND: Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestors coincides with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph. RESULTS: Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. While both LDT and PDT graphs are cographs, this is not true for the EDT graph in general. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial-time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial-time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.

17.
J Bioinform Comput Biol ; 19(6): 2140010, 2021 12.
Artigo em Inglês | MEDLINE | ID: mdl-34775924

RESUMO

Clustering genes in similarity graphs is a popular approach for orthology prediction. Most algorithms group genes without considering their species, which results in clusters that contain several paralogous genes. Moreover, clustering is known to be problematic when in-paralogs arise from ancient duplications. Recently, we proposed a two-step process that avoids these problems. First, we infer clusters of only orthologs (i.e. with only genes from distinct species), and second, we infer the missing inter-cluster orthologs. In this paper, we focus on the first step, which leads to a problem we call Colorful Clustering. In general, this is as hard as classical clustering. However, in similarity graphs, the number of species is usually small, as well as the neighborhood size of genes in other species. We therefore study the problem of clustering in which the number of colors is bounded by [Formula: see text], and each gene has at most [Formula: see text] neighbors in another species. We show that the well-known cluster editing formulation remains NP-hard even when [Formula: see text] and [Formula: see text]. We then propose a fixed-parameter algorithm in [Formula: see text] to find the single best cluster in the graph. We implemented this algorithm and included it in the aforementioned two-step approach. Experiments on simulated data show that this approach performs favorably to applying only an unconstrained clustering step.


Assuntos
Algoritmos , Análise por Conglomerados
18.
Comput Biol Chem ; 93: 107516, 2021 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-34082320

RESUMO

Assembly is a fundamental task in genome sequencing, and many assemblers have been made available in the last decade. Because of the wide range of possible choices, it can be hard to determine which tool or parameter to use for a specific genome sequencing project. In this paper, we propose a consensus approach that takes the best parts of several contigs datasets produced by different methods, and combines them into a better assembly. This amounts to orienting and ordering sets of contigs, which can be viewed as an optimization problem where the aim is to find an alignment of two fragmented strings that maximizes an arbitrary scoring function between matched characters. In this work, we investigate the computational complexity of this problem. We first show that it is NP-hard, even in an alphabet with only two symbols and with all scores being either 0 or 1. On the positive side, we propose an efficient, quadratic time algorithm that achieves approximation factor 3.

19.
Algorithms Mol Biol ; 15: 16, 2020.
Artigo em Inglês | MEDLINE | ID: mdl-32843891

RESUMO

BACKGROUND: The history of gene families-which are equivalent to event-labeled gene trees-can to some extent be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as whether inferred event-labeled gene trees are "biologically feasible" which is the case if one can find a species tree with which the gene tree can be reconciled in a time-consistent way. RESULTS: In this contribution, we consider event-labeled gene trees that contain speciations, duplications as well as horizontal gene transfer (HGT) and we assume that the species tree is unknown. Although many problems become NP-hard as soon as HGT and time-consistency are involved, we show, in contrast, that the problem of finding a time-consistent species tree for a given event-labeled gene can be solved in polynomial-time. We provide a cubic-time algorithm to decide whether a "time-consistent" species tree for a given event-labeled gene tree exists and, in the affirmative case, to construct the species tree within the same time-complexity.

20.
Algorithms Mol Biol ; 15: 8, 2020.
Artigo em Inglês | MEDLINE | ID: mdl-32391071

RESUMO

BACKGROUND: In the field of genome rearrangement algorithms, models accounting for gene duplication lead often to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, the problem is NP-complete for most extensions of these models accounting for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and the Small Parsimony problem, are intractable for most duplication-free models, with some exceptions, for example the Single-Cut-or-Join (SCJ) model. RESULTS: We introduce a variant of the SCJ distance that accounts for duplicated genes, in the context of directed evolution from an ancestral genome to a descendant genome where orthology relations between ancestral genes and their descendant are known. Our model includes two duplication mechanisms: single-gene tandem duplication and the creation of single-gene circular chromosomes. We prove that in this model, computing the directed distance and a parsimonious evolutionary scenario in terms of SCJ and single-gene duplication events can be done in linear time. We also show that the directed median problem is tractable for this distance, while the rooted median problem, where we assume that one of the given genomes is ancestral to the median, is NP-complete. We also describe an Integer Linear Program for solving this problem. We evaluate the directed distance and rooted median algorithms on simulated data. CONCLUSION: Our results provide a simple genome rearrangement model, extending the SCJ model to account for single-gene duplications, for which we prove a mix of tractability and hardness results. For the NP-complete rooted median problem, we design a simple Integer Linear Program. Our publicly available implementation of these algorithms for the directed distance and median problems allow to solve efficiently these problems on large instances.

SELEÇÃO DE REFERÊNCIAS
Detalhe da pesquisa