Results 1 - 20 of 67
1.
Nucleic Acids Res ; 52(D1): D529-D535, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37843103

ABSTRACT

To date, the databases built to gather information on gene orthology do not provide end-users with descriptors of the molecular evolution information and phylogenetic pattern of these orthologues. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of coding sequences in mammalian genomes. OrthoMaM version 12 includes 15,868 alignments of orthologous coding sequences (CDS) from the 190 complete mammalian genomes currently available. All annotations and 1-to-1 orthology assignments are based on NCBI. Orthologous CDS can be mined for potential informative markers at the different taxonomic levels of the mammalian tree. To this end, several evolutionary descriptors of DNA sequences are provided for querying purposes (e.g. base composition and relative substitution rate). The graphical web interface allows the user to easily browse and sort the results of combined queries. The corresponding multiple sequence alignments and ML trees, inferred using state-of-the-art approaches, are available for download both at the nucleotide and amino acid levels. OrthoMaM v12 can be used by researchers interested either in reconstructing the phylogenetic relationships of mammalian taxa or in understanding the evolutionary dynamics of coding sequences in their genomes. OrthoMaM is available for browsing, querying and complete or filtered download at https://orthomam.mbb.cnrs.fr/.


Subject(s)
Databases, Genetic; Genomics; Animals; Base Sequence; Genome; Genomics/methods; Mammals/classification; Mammals/genetics; Phylogeny; Biological Evolution
2.
Syst Biol ; 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38330161

ABSTRACT

The evolution of gene families is complex, involving gene-level evolutionary events such as gene duplication, horizontal gene transfer, and gene loss (DTL), and other processes such as incomplete lineage sorting (ILS). Because of this, topological differences often exist between gene trees and species trees. A number of models have been recently developed to explain these discrepancies, the most realistic of which attempt to consider both gene-level events and ILS. When unified in a single model, the interaction between ILS and gene-level events can cause polymorphism in gene copy number, which we refer to as copy number hemiplasy (CNH). In this paper we extend the Wright-Fisher process to include duplications and losses over several species, and show that the probability of CNH for this process can be significant. We study how well two unified models - MLMSC (MultiLocus MultiSpecies Coalescent), which models CNH, and DLCoal (Duplication, Loss, and Coalescence), which does not - approximate the Wright-Fisher process with duplication and loss. We then study the effect of CNH on gene family evolution by comparing MLMSC and DLCoal. We generate comparable gene trees under both models, showing significant differences in various summary statistics; most importantly, CNH reduces the number of gene copies greatly. If this is not taken into account, the traditional method of estimating duplication rates (by counting the number of gene copies) becomes inaccurate. The simulated gene trees are also used for species tree inference with the summary methods ASTRAL and ASTRAL-Pro, demonstrating that their accuracy, based on CNH-unaware simulations calibrated on real data, may have been overestimated.

3.
Mol Biol Evol ; 40(10), 2023 10 04.
Article in English | MEDLINE | ID: mdl-37794645

ABSTRACT

Pangolins form a group of scaly mammals that are trafficked at record numbers for their meat and purported medicinal properties. Despite their conservation concern, knowledge of their evolution is limited by a paucity of genomic data. We aim to produce exhaustive genomic resources that include 3,238 orthologous genes and whole-genome polymorphisms to assess the evolution of all eight extant pangolin species. Robust orthologous gene-based phylogenies recovered the monophyly of the three genera and highlighted the existence of an undescribed species closely related to Southeast Asian pangolins. Signatures of middle Miocene admixture between an extinct, possibly European, lineage and the ancestor of Southeast Asian pangolins, provide new insights into the early evolutionary history of the group. Demographic trajectories and genome-wide heterozygosity estimates revealed contrasts between continental versus island populations and species lineages, suggesting that conservation planning should consider intraspecific patterns. With the expected loss of genomic diversity from recent, extensive trafficking not yet realized in pangolins, we recommend that populations be genetically surveyed to anticipate any deleterious impact of the illegal trade. Finally, we produce a complete set of genomic resources that will be integral for future conservation management and forensic endeavors for pangolins, including tracing their illegal trade. These comprise the completion of whole-genomes for pangolins through the hybrid assembly of the first reference genome for the giant pangolin (Smutsia gigantea) and new draft genomes (∼43x-77x) for four additional species, as well as a database of orthologous genes with over 3.4 million polymorphic sites.


Subject(s)
Mammals; Pangolins; Animals; Pangolins/genetics; Mammals/genetics; Genome; Phylogeny; Genomics
4.
Bioinformatics ; 38(15): 3725-3733, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35713506

ABSTRACT

MOTIVATION: Phylogenetic networks can represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting (ILS). Unfortunately, this induces a very high computational complexity and current tools can only analyze small datasets. RESULTS: We present NetRAX, a tool for maximum likelihood (ML) inference of phylogenetic networks in the absence of ILS. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees, and extends them to phylogenetic networks via the notion of 'displayed trees'. NetRAX can infer ML phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format. On simulated data, our results show a very low relative difference in Bayesian Information Criterion (BIC) score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8000 sites, 30 taxa and 3 reticulations completes within a few minutes on a standard laptop. AVAILABILITY AND IMPLEMENTATION: Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms; Phylogeny; Bayes Theorem; Sequence Alignment; Likelihood Functions
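
The abstract of record 4 evaluates inferred networks by their relative difference in BIC score. The sketch below only illustrates the standard BIC formula, k ln(n) - 2 ln(L). It is not NetRAX code: the function names, the way parameters and observations are counted, and the definition of the relative difference are all assumptions made for illustration.

```python
from math import log


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian Information Criterion: k * ln(n) - 2 * ln(L).

    n_params (k) and n_obs (n) are supplied by the caller; how a network
    inference tool counts them is not specified here.
    """
    return n_params * log(n_obs) - 2.0 * log_likelihood


def relative_bic_difference(bic_inferred, bic_true):
    """Hypothetical helper: difference relative to the reference score."""
    return (bic_inferred - bic_true) / abs(bic_true)


if __name__ == "__main__":
    # Made-up numbers, used only to exercise the functions.
    reference = bic(log_likelihood=-52000.0, n_params=60, n_obs=8000)
    candidate = bic(log_likelihood=-52010.0, n_params=58, n_obs=8000)
    print(relative_bic_difference(candidate, reference))
```

Lower BIC is better, so adding a reticulation is only rewarded when the gain in log-likelihood outweighs the penalty for the extra parameters.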
5.
Syst Biol ; 71(3): 526-546, 2022 04 19.
Article in English | MEDLINE | ID: mdl-34324671

ABSTRACT

Introgression is an important biological process affecting at least 10% of the extant species in the animal kingdom. Introgression significantly impacts inference of phylogenetic species relationships where a strictly binary tree model cannot adequately explain reticulate net-like species relationships. Here, we use phylogenomic approaches to understand patterns of introgression along the evolutionary history of a unique, nonmodel insect system: dragonflies and damselflies (Odonata). We demonstrate that introgression is a pervasive evolutionary force across various taxonomic levels within Odonata. In particular, we show that the morphologically "intermediate" species of Anisozygoptera (one of the three primary suborders within Odonata besides Zygoptera and Anisoptera), which retain phenotypic characteristics of the other two suborders, experienced high levels of introgression likely coming from zygopteran genomes. Additionally, we find evidence for multiple cases of deep inter-superfamilial ancestral introgression. [Gene flow; Odonata; phylogenomics; reticulate evolution.].


Subject(s)
Odonata; Animals; Genome; Insects/anatomy & histology; Odonata/anatomy & histology; Odonata/genetics; Phylogeny
6.
Syst Biol ; 70(4): 822-837, 2021 06 16.
Article in English | MEDLINE | ID: mdl-33169795

ABSTRACT

Incomplete lineage sorting (ILS), the interaction between coalescence and speciation, can generate incongruence between gene trees and species trees, as can gene duplication (D), transfer (T), and loss (L). These processes are usually modeled independently, but in reality, ILS can affect gene copy number polymorphism, that is, interfere with DTL. This has been previously recognized, but not treated in a satisfactory way, mainly because DTL events are naturally modeled forward-in-time, while ILS is naturally modeled backward-in-time with the coalescent. Here, we consider the joint action of ILS and DTL on the gene tree/species tree problem in all its complexity. In particular, we show that the interaction between ILS and duplications/transfers (without losses) can result in patterns usually interpreted as resulting from gene loss, and that the realized rate of D, T, and L becomes nonhomogeneous in time when ILS is taken into account. We introduce algorithmic solutions to these problems. Our new model, the multilocus multispecies coalescent, which also accounts for any level of linkage between loci, generalizes the multispecies coalescent (MSC) model and offers a versatile, powerful framework for proper simulation, and inference of gene family evolution. [Gene duplication; gene loss; horizontal gene transfer; incomplete lineage sorting; multispecies coalescent; hemiplasy; recombination.].


Subject(s)
Evolution, Molecular; Gene Duplication; Models, Genetic; Multigene Family; Computer Simulation; Gene Transfer, Horizontal; Genetic Speciation; Phylogeny
7.
PLoS Comput Biol ; 17(9): e1008380, 2021 09.
Article in English | MEDLINE | ID: mdl-34478440

ABSTRACT

For various species, high-quality sequences and complete genomes are now available for many individuals. This makes data analysis challenging, as methods need to be not only accurate but also time-efficient given the tremendous amount of data to process. In this article, we introduce an efficient method to infer the evolutionary history of individuals under the multispecies coalescent model in networks (MSNC). Phylogenetic networks are an extension of phylogenetic trees that can contain reticulate nodes, which make it possible to model complex biological events such as horizontal gene transfer, hybridization and introgression. We present a novel way to compute the likelihood of biallelic markers sampled along genomes whose evolution involved such events. This likelihood computation is at the heart of a Bayesian network inference method called SnappNet, as it extends the Snapp method, which infers evolutionary trees under the multispecies coalescent model, to networks. SnappNet is available as a package of the well-known BEAST 2 software. Recently, the MCMC_BiMarkers method, implemented in PhyloNet, also extended Snapp to networks. Both methods take biallelic markers as input, rely on the same model of evolution and sample networks in a Bayesian framework, though using different methods for computing priors. However, SnappNet relies on algorithms that are exponentially more time-efficient on non-trivial networks. Using simulations, we compare the performance of SnappNet and MCMC_BiMarkers. We show that both methods have similar abilities to recover simple networks, but SnappNet is more accurate than MCMC_BiMarkers on more complex network scenarios. Also, on complex networks, SnappNet is found to be much faster than MCMC_BiMarkers in terms of the time required for the likelihood computation. We finally illustrate SnappNet's performance on a rice data set. SnappNet infers a scenario that is consistent with previous results and provides additional understanding of rice evolution.


Subject(s)
Markov Chains; Monte Carlo Method; Phylogeny; Algorithms; Bayes Theorem; Computational Biology/methods; Evolution, Molecular; Genes, Plant; Likelihood Functions; Oryza/classification; Oryza/genetics
8.
J Math Biol ; 85(3): 22, 2022 08 17.
Article in English | MEDLINE | ID: mdl-35976512

ABSTRACT

Summary methods seek to infer a species tree from a set of gene trees. A desirable property of such methods is that of statistical consistency; that is, the probability of inferring the wrong species tree (the error probability) tends to 0 as the number of input gene trees becomes large. A popular paradigm is to infer a species tree that agrees with the maximum number of quartets from the input set of gene trees; this has been proved to be statistically consistent under several models of gene evolution. In this paper, we study the asymptotic behaviour of the error probability of such methods in this limit, and show that it decays exponentially. For a 4-taxon species tree, we derive a closed form for the asymptotic behaviour in terms of the probability that the gene evolution process produces the correct topology. We also derive bounds for the sample complexity (the number of gene trees required to infer the true species tree with a given probability), which outperform existing bounds. We then extend our results to bounds for the asymptotic behaviour of the error probability for any species tree, and compare these to the true error probability for some model species trees using simulations.


Subject(s)
Evolution, Molecular; Models, Genetic; Genetic Speciation; Phylogeny; Probability
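
The 4-taxon case in record 8 can be illustrated numerically. The sketch below is not the paper's derivation. It assumes the plain multispecies coalescent, under which the true unrooted quartet topology has probability 1 - (2/3)e^(-t) for an internal branch of length t (in coalescent units) and each alternative has probability (1/3)e^(-t). It also assumes that a tie counts as an error. The function names are hypothetical.

```python
from math import comb, exp


def quartet_probs(t):
    """MSC probabilities of the true topology and of each alternative."""
    minor = exp(-t) / 3.0
    return 1.0 - 2.0 * minor, minor


def error_probability(n, t):
    """Exact probability that the true topology is not the strict
    plurality winner among n independent gene trees (ties are errors)."""
    p, q = quartet_probs(t)
    err = 0.0
    for a in range(n + 1):              # gene trees with the true topology
        for b in range(n - a + 1):      # gene trees with alternative 1
            c = n - a - b               # gene trees with alternative 2
            if not (a > b and a > c):
                err += comb(n, a) * comb(n - a, b) * p**a * q**(b + c)
    return err


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, error_probability(n, t=0.5))
```

Printing the values for growing n shows the error shrinking quickly, which is the behaviour the abstract describes as exponential decay.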
9.
Mol Biol Evol ; 37(11): 3292-3307, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32886770

ABSTRACT

Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these methods, including the widely used ASTRAL, can only handle single-copy gene trees and do not attempt to model gene duplication and gene loss. As a result, most phylogenomic studies have focused on single-copy genes and have discarded large parts of the data. Here, we first propose a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy. We then introduce a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) to find the species tree that optimizes our quartet similarity measure using dynamic programming. By studying its performance on an extensive collection of simulated data sets and on real data sets, we show that ASTRAL-Pro is more accurate than alternative methods.


Subject(s)
Genetic Techniques; Phylogeny; Algorithms; Plants/genetics; Yeasts/genetics
10.
Bioinformatics ; 36(18): 4822-4824, 2020 09 15.
Article in English | MEDLINE | ID: mdl-33085745

ABSTRACT

MOTIVATION: Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated in standard phylogenetic software and they either lack performance on certain functions, or usability for biologists. RESULTS: We present Treerecs, a phylogenetic software tool based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphic output, and can be used along with methods for phylogenetic inference on multiple alignments like PLL and Seaview. AVAILABILITY AND IMPLEMENTATION: Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.


Subject(s)
Algorithms; Evolution, Molecular; Phylogeny; Sequence Alignment; Software
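
Duplication-loss reconciliation, as mentioned in record 10, is classically based on mapping each gene-tree node to the lowest common ancestor (LCA) of its species in the species tree. The sketch below shows that textbook idea only. It is not Treerecs code, and the nested-tuple tree encoding and all function names are assumptions made for illustration.

```python
def leaf_set(tree):
    """Set of species labels below a node (trees are nested tuples)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree):
    """Leaf sets of every node of the species tree, leaves included."""
    clades = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            clades.extend(all_clades(child))
    return clades


def lca_map(gene_node, species_clades):
    """Smallest species clade containing all species under gene_node."""
    species = leaf_set(gene_node)
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """A gene node is a duplication if it maps to the same species
    node as at least one of its children (LCA reconciliation)."""
    clades = all_clades(species_tree)

    def visit(node):
        if isinstance(node, str):
            return 0
        here = lca_map(node, clades)
        is_dup = any(lca_map(child, clades) == here for child in node)
        return int(is_dup) + sum(visit(child) for child in node)

    return visit(gene_tree)


if __name__ == "__main__":
    species = (("A", "B"), "C")
    gene = ((("A", "B"), "C"), ("A", "B"))  # gene leaves carry species names
    print(count_duplications(gene, species))
```

Gene-tree leaves are labelled directly with species names here, which keeps the example short; a real tool would need a separate gene-to-species mapping.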
11.
Theor Popul Biol ; 137: 22-31, 2021 02.
Article in English | MEDLINE | ID: mdl-33333117

ABSTRACT

The multispecies coalescent process models the genealogical relationships of genes sampled from several species, enabling useful predictions about phenomena such as the discordance between a gene tree and the species phylogeny due to incomplete lineage sorting. Conversely, knowledge of large collections of gene trees can inform us about several aspects of the species phylogeny, such as its topology and ancestral population sizes. A fundamental open problem in this context is how to efficiently compute the probability of a gene tree topology, given the species phylogeny. Although a number of algorithms for this task have been proposed, they either produce approximate results, or, when they are exact, they do not scale to large data sets. In this paper, we present some progress towards exact and efficient computation of the probability of a gene tree topology. We provide a new algorithm that, given a species tree and the number of genes sampled for each species, calculates the probability that the gene tree topology will be concordant with the species tree. Moreover, we provide an algorithm that computes the probability of any specific gene tree topology concordant with the species tree. Both algorithms run in polynomial time and have been implemented in Python. Experiments show that they are able to analyze data sets where thousands of genes are sampled in a matter of minutes to hours.


Subject(s)
Algorithms; Models, Genetic; Genetic Speciation; Phylogeny; Probability
12.
Syst Biol ; 69(1): 38-60, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31062850

ABSTRACT

Evolutionary relationships have remained unresolved in many well-studied groups, even though advances in next-generation sequencing and analysis, using approaches such as transcriptomics, anchored hybrid enrichment, or ultraconserved elements, have brought systematics to the brink of whole genome phylogenomics. Recently, it has become possible to sequence the entire genomes of numerous non-model organisms in parallel at reasonable cost, particularly with shotgun sequencing. Here, we identify orthologous coding sequences from whole-genome shotgun sequences, which we then use to investigate the relevance and power of phylogenomic relationship inference and time-calibrated tree estimation. We study an iconic group of butterflies, the swallowtails of the family Papilionidae, that has remained phylogenetically unresolved, with continued debate about the timing of their diversification. Low-coverage whole genomes were obtained using Illumina shotgun sequencing for all genera. Genome assembly coupled to BLAST-based orthology searches allowed extraction of 6621 orthologous protein-coding genes for 45 Papilionidae species and 16 outgroup species (with 32% missing data after cleaning phases). Supermatrix phylogenomic analyses were performed with both maximum-likelihood (IQ-TREE) and Bayesian mixture models (PhyloBayes) for amino acid sequences, which produced a fully resolved phylogeny providing new insights into controversial relationships. Species tree reconstruction from gene trees was performed with ASTRAL and SuperTriplets and recovered the same phylogeny. We estimated gene and site concordance factors to complement traditional node-support measures, which strengthens the robustness of inferred phylogenies. Bayesian estimates of divergence times based on a reduced data set (760 orthologs and 12% missing data) indicate a mid-Cretaceous origin of Papilionoidea around 99.2 Ma (95% credibility interval: 68.6-142.7 Ma) and Papilionidae around 71.4 Ma (49.8-103.6 Ma), with subsequent diversification of modern lineages well after the Cretaceous-Paleogene event. These results show that shotgun sequencing of whole genomes, even when highly fragmented, represents a powerful approach to phylogenomics and molecular dating in a group that has previously been refractory to resolution.


Subject(s)
Biological Evolution; Butterflies/classification; Butterflies/genetics; Genome, Insect/genetics; Phylogeny; Animals; Time
13.
J Math Biol ; 83(5): 52, 2021 10 21.
Article in English | MEDLINE | ID: mdl-34676444

ABSTRACT

Measures of phylogenetic balance, such as the Colless and Sackin indices, play an important role in phylogenetics. Unfortunately, these indices are specifically designed for phylogenetic trees, and do not extend naturally to phylogenetic networks (which are increasingly used to describe reticulate evolution). This led us to consider a lesser-known balance index, whose definition is based on a probabilistic interpretation that is equally applicable to trees and to networks. This index, known as the B2 index, was first proposed by Shao and Sokal (Syst Zool 39(3): 266-276, 1990). Surprisingly, it does not seem to have been studied mathematically since. Likewise, it is used only sporadically in the biological literature, where it tends to be viewed as arcane. In this paper, we study mathematical properties of B2 such as its expectation and variance under the most common models of random trees and its extremal values over various classes of phylogenetic networks. We also assess its relevance in biological applications, and find it to be comparable to that of the Colless and Sackin indices. Altogether, our results call for a reevaluation of the status of this somewhat forgotten measure of phylogenetic balance.


Subject(s)
Algorithms; Biological Evolution; Phylogeny
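
The probabilistic interpretation mentioned in record 13 can be sketched as follows, assuming the usual definition of the B2 index: the Shannon entropy (base 2) of the distribution over leaves reached by a walk that starts at the root and picks a child uniformly at random at every step. The dictionary encoding of the tree or network and the function names are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
from collections import defaultdict
from math import log2


def leaf_probabilities(children, root):
    """Probability of ending at each leaf for a uniform random descent.

    `children` maps a node to the list of its children; nodes absent
    from the mapping are leaves. Works for trees and for DAGs, where a
    leaf accumulates probability over all paths leading to it.
    """
    probs = defaultdict(float)
    stack = [(root, 1.0)]
    while stack:
        node, p = stack.pop()
        kids = children.get(node, [])
        if not kids:
            probs[node] += p
        else:
            for kid in kids:
                stack.append((kid, p / len(kids)))
    return dict(probs)


def b2_index(children, root):
    """Shannon entropy (in bits) of the leaf distribution."""
    return -sum(p * log2(p) for p in leaf_probabilities(children, root).values())


if __name__ == "__main__":
    balanced = {"r": ["u", "v"], "u": ["a", "b"], "v": ["c", "d"]}
    caterpillar = {"r": ["a", "x"], "x": ["b", "y"], "y": ["c", "d"]}
    network = {"r": ["u", "v"], "u": ["a", "h"], "v": ["h", "b"], "h": ["c"]}
    for name, g in (("balanced", balanced), ("caterpillar", caterpillar), ("network", network)):
        print(name, b2_index(g, "r"))
```

Because the walk only needs parent-to-child edges, the same function handles the reticulate node `h` in the network example without any special case, which is the property the abstract highlights.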
14.
Mol Biol Evol ; 36(4): 861-862, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30698751

ABSTRACT

We present version 10 of OrthoMaM, a database of orthologous mammalian markers. OrthoMaM is already 11 years old and has improved continuously since the outset, providing high-quality alignments and phylogenetic trees computed with state-of-the-art methods on up-to-date data. The main contribution of this version is the increase in the number of taxa: 116 mammalian genomes for 14,509 one-to-one orthologous genes. This has been made possible by combining genomic data deposited in Ensembl with additional good-quality genomes only available in NCBI. Version 10 users will benefit from pipeline improvements and a completely redesigned web interface.


Subject(s)
Databases, Genetic; Genome; Mammals/genetics; Phylogeny; Sequence Alignment; Animals
15.
PLoS Comput Biol ; 15(9): e1007347, 2019 09.
Article in English | MEDLINE | ID: mdl-31509525

ABSTRACT

Phylogenetic networks generalize phylogenetic trees by allowing the modeling of reticulate evolutionary events. Among the different kinds of phylogenetic networks that have been proposed in the literature, the subclass of binary tree-child networks is one of the most studied. However, very little is known about the combinatorial structure of these networks. In this paper we address the problem of efficiently generating all possible binary tree-child (BTC) networks with a given number of leaves via reduction/augmentation operations that extend and generalize analogous operations for phylogenetic trees, and are biologically relevant. Since our solution is recursive, this also provides us with a recurrence relation giving an upper bound on the number of such networks. We also show how the operations introduced in this paper can be employed to extend the evolutionary history of a set of sequences, represented by a BTC network, to include a new sequence. A Python implementation of the algorithms described in this paper, along with some computational experiments, can be downloaded from https://github.com/bielcardona/TCGenerators.


Subject(s)
Computational Biology/methods; Models, Genetic; Phylogeny; Algorithms; Computer Simulation
16.
PLoS Comput Biol ; 15(10): e1007440, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31596844

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pcbi.1007347.].

17.
Nucleic Acids Res ; 46(D1): D718-D725, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29149270

ABSTRACT

ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates.


Subject(s)
Databases, Genetic; Datasets as Topic; Genome; Urochordata/genetics; Animals; Biological Evolution; Ciona intestinalis/genetics; DNA/metabolism; Data Mining; Evolution, Molecular; Gene Expression; Gene Ontology; Internet; Molecular Sequence Annotation; Phylogeny; Protein Binding; Species Specificity; Transcription Factors/metabolism; Transcription, Genetic; Vertebrates/genetics; Web Browser
18.
Bioinformatics ; 34(21): 3646-3652, 2018 11 01.
Article in English | MEDLINE | ID: mdl-29762653

ABSTRACT

Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, or loss), along with a mapping onto a species tree. Many algorithms and software tools produce or use reconciliations, but often in different formats that vary in the types of events considered or in whether the species tree is dated. This complicates the comparison and communication between different programs. Results: Here, we gather a consortium of software developers in gene tree/species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation: http://phylariane.univ-lyon1.fr/recphyloxml/.


Subject(s)
Evolution, Molecular; Gene Duplication; Algorithms; Phylogeny; Software
19.
Syst Biol ; 67(3): 518-542, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29272537

ABSTRACT

Phylogenetic networks are well suited to represent evolutionary histories comprising reticulate evolution. Several methods aiming at reconstructing explicit phylogenetic networks have been developed in the last two decades. In this article, we propose a new definition of maximum parsimony for phylogenetic networks that makes it possible to model biological scenarios that cannot be modeled by the definitions currently present in the literature (namely, the "hardwired" and "softwired" parsimony). Building on this new definition, we provide several algorithmic results that lay the foundations for new parsimony-based methods for phylogenetic network reconstruction.


Subject(s)
Classification/methods; Models, Biological; Phylogeny; Algorithms
20.
J Math Biol ; 78(1-2): 527-547, 2019 01.
Article in English | MEDLINE | ID: mdl-30121824

ABSTRACT

Phylogenetic networks are often constructed by merging multiple conflicting phylogenetic signals into a directed acyclic graph. It is interesting to explore whether a network constructed in this way induces biologically relevant phylogenetic signals that were not present in the input. Here we show that, given a multiple alignment A for a set of taxa X and a rooted phylogenetic network N whose leaves are labelled by X, it is NP-hard to locate a most parsimonious phylogenetic tree displayed by N (with respect to A) even when the level of N (the maximum number of reticulation nodes within a biconnected component) is 1 and A contains only 2 distinct states. (If, additionally, gaps are allowed, the problem becomes APX-hard.) We also show that under the same conditions, and assuming a simple binary symmetric model of character evolution, finding a most likely tree displayed by the network is NP-hard. These negative results contrast with earlier work on parsimony in which it is shown that if A consists of a single column the problem is fixed parameter tractable in the level. We conclude with a discussion of why, despite the NP-hardness, both the parsimony and likelihood problem can likely be well-solved in practice.


Subject(s)
Models, Genetic; Phylogeny; Algorithms; Animals; Computational Biology; Evolution, Molecular; Genetic Speciation; Humans; Mathematical Concepts