Search | VHL Regional Portal

1.

The Space of Equidistant Phylogenetic Cactuses.

Huber, Katharina T; Moulton, Vincent; Owen, Megan; Spillner, Andreas; St John, Katherine.

Ann Comb ; 28(1): 1-32, 2024.

Article in English | MEDLINE | ID: mdl-38433929

ABSTRACT

An equidistant X-cactus is a type of rooted, arc-weighted, directed acyclic graph with leaf set X, that is used in biology to represent the evolutionary history of a set X of species. In this paper, we introduce and investigate the space of equidistant X-cactuses. This space contains, as a subset, the space of ultrametric trees on X that was introduced by Gavryushkin and Drummond. We show that equidistant-cactus space is a CAT(0)-metric space which implies, for example, that there are unique geodesic paths between points. As a key step to proving this, we present a combinatorial result concerning ranked rooted X-cactuses. In particular, we show that such graphs can be encoded in terms of a pairwise compatibility condition arising from a poset of collections of pairs of subsets of X that satisfy certain set-theoretic properties. As a corollary, we also obtain an encoding of ranked, rooted X-trees in terms of partitions of X, which provides an alternative proof that the space of ultrametric trees on X is CAT(0). We expect that our results will provide the basis for novel ways to perform statistical analyses on collections of equidistant X-cactuses, as well as new directions for defining and understanding spaces of more general, arc-weighted phylogenetic networks.

2.

A distance-based model for convergent evolution.

Holland, Barbara; Huber, Katharina T; Moulton, Vincent.

J Math Biol ; 88(2): 17, 2024 01 18.

Article in English | MEDLINE | ID: mdl-38238584

ABSTRACT

Convergent evolution is an important process in which independent species evolve similar features usually over a long period of time. It occurs with many different species across the tree of life, and is often caused by the fact that species have to adapt to similar environmental niches. In this paper, we introduce and study properties of a distance-based model for convergent evolution in which we assume that two ancestral species converge for a certain period of time within a collection of species that have otherwise evolved according to an evolutionary clock. Under these assumptions it follows that we obtain a distance on the collection that is a modification of an ultrametric distance arising from an equidistant phylogenetic tree. As well as characterising when this modified distance is a tree metric, we give conditions in terms of the model's parameters for when it is still possible to recover the underlying tree and also its height, even in case the modified distance is not a tree metric.
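The two notions contrasted in this abstract, ultrametric distances and tree metrics, can be checked with the standard three-point and four-point conditions. The following is a minimal sketch, not the paper's model: the helper names, the four taxa, and the numeric distances are illustrative assumptions, and the input is assumed to already be a metric.

```python
from itertools import combinations


def symmetric(pairs, taxa):
    """Build a symmetric distance table from a dict of unordered pairs (hypothetical helper)."""
    d = {x: {y: 0.0 for y in taxa} for x in taxa}
    for (x, y), value in pairs.items():
        d[x][y] = d[y][x] = float(value)
    return d


def is_ultrametric(d, taxa, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for x, y, z in combinations(taxa, 3):
        _, middle, largest = sorted([d[x][y], d[x][z], d[y][z]])
        if largest - middle > tol:
            return False
    return True


def is_tree_metric(d, taxa, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for w, x, y, z in combinations(taxa, 4):
        sums = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]])
        if sums[2] - sums[1] > tol:
            return False
    return True


taxa = ["a", "b", "c", "d"]

# Distances induced by an equidistant tree ((a,b),(c,d)) with illustrative heights.
clock = symmetric({("a", "b"): 2, ("c", "d"): 2, ("a", "c"): 4,
                   ("a", "d"): 4, ("b", "c"): 4, ("b", "d"): 4}, taxa)

# Illustrative modification: b and c are made closer than the clock predicts.
modified = symmetric({("a", "b"): 2, ("c", "d"): 2, ("a", "c"): 4,
                      ("a", "d"): 4, ("b", "c"): 3, ("b", "d"): 4}, taxa)

print(is_ultrametric(clock, taxa), is_tree_metric(clock, taxa))        # True True
print(is_ultrametric(modified, taxa), is_tree_metric(modified, taxa))  # False False
```

In this toy example, shortening a single distance breaks both conditions, which is the kind of situation in which recovering the underlying tree becomes a non-trivial question.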

Subject(s)

Evolution, Molecular; Models, Genetic; Phylogeny

3.

Diversities and the Generalized Circumradius.

Bryant, David; Huber, Katharina T; Moulton, Vincent; Tupper, Paul F.

Discrete Comput Geom ; 70(4): 1862-1883, 2023.

Article in English | MEDLINE | ID: mdl-38022897

ABSTRACT

The generalized circumradius of a set of points A ⊆ R^d with respect to a convex body K equals the minimum value of λ ≥ 0 such that a translate of λK contains A. Each choice of K gives a different function on the set of bounded subsets of R^d; we characterize which functions can arise in this way. Our characterization draws on the theory of diversities, a recently introduced generalization of metrics from functions on pairs to functions on finite subsets. We additionally investigate functions which arise by restricting the generalized circumradius to a finite subset of R^d. We obtain elegant characterizations in the case that K is a simplex or parallelotope.
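For one simple choice of K the definition above has a closed form. The sketch below assumes K is an axis-aligned box centred at the origin, a special case of a parallelotope. The function name and the sample points are illustrative assumptions, not taken from the paper. A translate of λK contains A exactly when, in every coordinate, the spread of A is at most the side length of λK.

```python
def box_circumradius(points, half_widths):
    """Smallest lam >= 0 such that some translate of lam*K contains all points,
    where K is the box prod_i [-half_widths[i], half_widths[i]] (assumed shape of K)."""
    ratios = []
    for i, half_width in enumerate(half_widths):
        coords = [p[i] for p in points]
        spread = max(coords) - min(coords)
        # lam*K has side length 2*lam*half_width along coordinate i.
        ratios.append(spread / (2.0 * half_width))
    return max(ratios)


points = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0)]

print(box_circumradius(points, (1.0, 1.0)))  # 1.5
print(box_circumradius(points, (1.0, 3.0)))  # 1.0
```

Changing the shape of K changes the value, which illustrates the remark that each choice of K gives a different function on bounded sets.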

4.

The rigid hybrid number for two phylogenetic trees.

Huber, Katharina T; Linz, Simone; Moulton, Vincent.

J Math Biol ; 82(5): 40, 2021 03 26.

Article in English | MEDLINE | ID: mdl-33770290

ABSTRACT

Recently there has been considerable interest in the problem of finding a phylogenetic network with a minimum number of reticulation vertices which displays a given set of phylogenetic trees, that is, a network with minimum hybrid number. Such networks are useful for representing the evolution of species whose genomes have undergone processes such as lateral gene transfer and recombination that cannot be represented appropriately by a phylogenetic tree. Even so, as was recently pointed out in the literature, insisting that a network displays the set of trees can be an overly restrictive assumption when modeling certain evolutionary phenomena such as incomplete lineage sorting. In this paper, we thus consider the less restrictive notion of rigidly displaying, which we introduce and study here. More specifically, we characterize when two trees can be rigidly displayed by a certain type of phylogenetic network called a temporal tree-child network in terms of fork-picking sequences. These are sequences of special subconfigurations of the two trees related to the well-studied cherry-picking sequences. We also show that, in case it exists, the rigid hybrid number for two phylogenetic trees is given by a minimum weight fork-picking sequence for the trees. Finally, we consider the relationship between the rigid hybrid number and three closely related numbers: the weak, beaded, and temporal hybrid numbers. In particular, we show that these numbers can all be different even for a fixed pair of trees, and also present an infinite family of pairs of trees which demonstrates that the difference between the rigid hybrid number and the temporal hybrid number for two phylogenetic trees on the same set of n leaves can grow at least linearly with n.

Subject(s)

Models, Genetic; Phylogeny; Algorithms; Humans; Hybridization, Genetic

5.

Evolution through segmental duplications and losses: a Super-Reconciliation approach.

Delabre, Mattéo; El-Mabrouk, Nadia; Huber, Katharina T; Lafond, Manuel; Moulton, Vincent; Noutahi, Emmanuel; Castellanos, Miguel Sautie.

Algorithms Mol Biol ; 15: 12, 2020.

Article in English | MEDLINE | ID: mdl-32508979

ABSTRACT

The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists of inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree to a set of trees, accounting for segmental duplications and losses. Existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the former exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.

6.

Reconciling event-labeled gene trees with MUL-trees and species networks.

Hellmuth, Marc; Huber, Katharina T; Moulton, Vincent.

J Math Biol ; 79(5): 1885-1925, 2019 10.

Article in English | MEDLINE | ID: mdl-31410552

ABSTRACT

Phylogenomics commonly aims to construct evolutionary trees from genomic sequence information. One way to approach this problem is to first estimate event-labeled gene trees (i.e., rooted trees whose non-leaf vertices are labeled by speciation or gene duplication events), and to then look for a species tree which can be reconciled with this tree through a reconciliation map between the trees. In practice, however, it can happen that there is no such map from a given event-labeled tree to any species tree. An important situation where this might arise is where the species evolution is better represented by a network instead of a tree. In this paper, we therefore consider the problem of reconciling event-labeled trees with species networks. In particular, we prove that any event-labeled gene tree can be reconciled with some network and that, under certain mild assumptions on the gene tree, the network can even be assumed to be multi-arc free. To prove this result, we show that we can always reconcile the gene tree with some multi-labeled (MUL-)tree, which can then be "folded up" to produce the desired reconciliation and network. In addition, we study the interplay between reconciliation maps from event-labeled gene trees to MUL-trees and networks. Our results could be useful for understanding how genomes have evolved after undergoing complex evolutionary events such as polyploidy.

Subject(s)

Evolution, Molecular; Gene Regulatory Networks; Models, Genetic; Phylogeny; Algorithms; Gene Duplication; Genetic Speciation; Mathematical Concepts

7.

OSF-Builder: A New Tool for Constructing and Representing Evolutionary Histories Involving Introgression.

Scholz, Guillaume E; Popescu, Andrei-Alin; Taylor, Martin I; Moulton, Vincent; Huber, Katharina T.

Syst Biol ; 68(5): 717-729, 2019 09 01.

Article in English | MEDLINE | ID: mdl-30668824

ABSTRACT

Introgression is an evolutionary process which provides an important source of innovation for evolution. Although various methods have been used to detect introgression, very few methods are currently available for constructing evolutionary histories involving introgression. In this article, we propose a new method for constructing such evolutionary histories whose starting point is a species forest (consisting of a collection of lineage trees, usually arising as a collection of clades or monophyletic groups in a species tree), and a gene tree for a specific allele of interest, or allele tree for short. Our method is based on representing introgression in terms of a certain "overlay" of the allele tree over the lineage trees, called an overlaid species forest (OSF). OSFs are similar to phylogenetic networks although a key difference is that they typically have multiple roots because each monophyletic group in the species tree has a different point of origin. Employing a new model for introgression, we derive an efficient algorithm for building OSFs called OSF-Builder that is guaranteed to return an optimal OSF in the sense that the number of potential introgression events is minimized. As well as using simulations to assess the performance of OSF-Builder, we illustrate its use on a butterfly data set in which introgression has been previously inferred. The OSF-Builder software is available for download from https://www.uea.ac.uk/computing/software/OSF-Builder.

Subject(s)

Biological Evolution; Classification/methods; Software

8.

Phylogenetic Flexibility via Hall-Type Inequalities and Submodularity.

Huber, Katharina T; Moulton, Vincent; Steel, Mike.

Bull Math Biol ; 81(2): 598-617, 2019 02.

Article in English | MEDLINE | ID: mdl-29589255

ABSTRACT

Given a collection [Formula: see text] of subsets of a finite set X, we say that [Formula: see text] is phylogenetically flexible if, for any collection R of rooted phylogenetic trees whose leaf sets comprise the collection [Formula: see text], R is compatible (i.e. there is a rooted phylogenetic X-tree that displays each tree in R). We show that [Formula: see text] is phylogenetically flexible if and only if it satisfies a Hall-type inequality condition of being 'slim'. Using submodularity arguments, we show that there is a polynomial-time algorithm for determining whether or not [Formula: see text] is slim. This 'slim' condition reduces to a simpler inequality in the case where all of the sets in [Formula: see text] have size 3, a property we call 'thin'. Thin sets were recently shown to be equivalent to the existence of an (unrooted) tree for which the median function provides an injective mapping to its vertex set; we show here that the unrooted tree in this representation can always be chosen to be a caterpillar tree. We also characterise when a collection [Formula: see text] of subsets of size 2 is thin (in terms of the flexibility of total orders rather than phylogenies) and show that this holds if and only if an associated bipartite graph is a forest. The significance of our results for phylogenetics is in providing precise and efficiently verifiable conditions under which supertree methods that require consistent inputs of trees can be applied to any input trees on given subsets of species.

Subject(s)

Models, Genetic; Phylogeny; Algorithms; Computational Biology; Evolution, Molecular; Genomics/statistics & numerical data; Mathematical Concepts; Models, Statistical

9.

Exploring and Visualizing Spaces of Tree Reconciliations.

Huber, Katharina T; Moulton, Vincent; Sagot, Marie-France; Sinaimeri, Blerina.

Syst Biol ; 68(4): 607-618, 2019 07 01.

Article in English | MEDLINE | ID: mdl-30418649

ABSTRACT

Tree reconciliation is the mathematical tool that is used to investigate the coevolution of organisms, such as hosts and parasites. A common approach to tree reconciliation involves specifying a model that assigns costs to certain events, such as cospeciation, and then tries to find a mapping between two specified phylogenetic trees which minimizes the total cost of the implied events. For such models, it has been shown that there may be a huge number of optimal solutions, or at least solutions that are close to optimal. It is therefore of interest to be able to systematically compare and visualize whole collections of reconciliations between a specified pair of trees. In this article, we consider various metrics on the set of all possible reconciliations between a pair of trees, some that have been defined before but also new metrics that we shall propose. We show that the diameter for the resulting spaces of reconciliations can in some cases be determined theoretically, information that we use to normalize and compare properties of the metrics. We also implement the metrics and compare their behavior on several host parasite data sets, including the shapes of their distributions. In addition, we show that in combination with multidimensional scaling, the metrics can be useful for visualizing large collections of reconciliations, much in the same way as phylogenetic tree metrics can be used to explore collections of phylogenetic trees. Implementations of the metrics can be downloaded from: https://team.inria.fr/erable/en/team-members/blerina-sinaimeri/reconciliation-distances/.

Subject(s)

Classification/methods; Host-Parasite Interactions/physiology; Phylogeny; Models, Biological

10.

Quarnet Inference Rules for Level-1 Networks.

Huber, Katharina T; Moulton, Vincent; Semple, Charles; Wu, Taoyang.

Bull Math Biol ; 80(8): 2137-2153, 2018 08.

Article in English | MEDLINE | ID: mdl-29869043

ABSTRACT

An important problem in phylogenetics is the construction of phylogenetic trees. One way to approach this problem, known as the supertree method, involves inferring a phylogenetic tree with leaves consisting of a set X of species from a collection of trees, each having leaf-set some subset of X. In the 1980s, Colonius and Schulze gave certain inference rules for deciding when a collection of 4-leaved trees, one for each 4-element subset of X, can be simultaneously displayed by a single supertree with leaf-set X. Recently, it has become of interest to extend this and related results to phylogenetic networks. These are a generalization of phylogenetic trees which can be used to represent reticulate evolution (where species can come together to form a new species). It has recently been shown that a certain type of phylogenetic network, called a (unrooted) level-1 network, can essentially be constructed from 4-leaved trees. However, the problem of providing appropriate inference rules for such networks remains unresolved. Here, we show that by considering 4-leaved networks, called quarnets, as opposed to 4-leaved trees, it is possible to provide such rules. In particular, we show that these rules can be used to characterize when a collection of quarnets, one for each 4-element subset of X, can all be simultaneously displayed by a level-1 network with leaf-set X. The rules are an intriguing mixture of tree inference rules, and an inference rule for building up a cyclic ordering of X from orderings on subsets of X of size 4. This opens up several new directions of research for inferring phylogenetic networks from smaller ones, which could yield new algorithms for solving the supernetwork problem in phylogenetics.

Subject(s)

Models, Biological; Phylogeny; Biological Evolution; Genetic Speciation; Mathematical Concepts

11.

Recovering normal networks from shortest inter-taxa distance information.

Bordewich, Magnus; Huber, Katharina T; Moulton, Vincent; Semple, Charles.

J Math Biol ; 77(3): 571-594, 2018 09.

Article in English | MEDLINE | ID: mdl-29478083

ABSTRACT

Phylogenetic networks are a type of leaf-labelled, acyclic, directed graph used by biologists to represent the evolutionary history of species whose past includes reticulation events. A phylogenetic network is tree-child if each non-leaf vertex is the parent of a tree vertex or a leaf. Up to a certain equivalence, it has been recently shown that, under two different types of weightings, edge-weighted tree-child networks are determined by their collection of distances between each pair of taxa. However, the size of these collections can be exponential in the size of the taxa set. In this paper, we show that, if we have no "shortcuts", that is, the networks are normal, the same results are obtained with only a quadratic number of inter-taxa distances by using the shortest distance between each pair of taxa. The proofs are constructive and give cubic-time algorithms in the size of the taxa sets for building such weighted networks.

Subject(s)

Biological Evolution; Models, Biological; Phylogeny; Algorithms; Computer Simulation; Mathematical Concepts

12.

Bounds for phylogenetic network space metrics.

Francis, Andrew; Huber, Katharina T; Moulton, Vincent; Wu, Taoyang.

J Math Biol ; 76(5): 1229-1248, 2018 04.

Article in English | MEDLINE | ID: mdl-28836230

ABSTRACT

Phylogenetic networks are a generalization of phylogenetic trees that allow for representation of reticulate evolution. Recently, a space of unrooted phylogenetic networks was introduced, where such a network is a connected graph in which every vertex has degree 1 or 3 and whose leaf-set is a fixed set X of taxa. This space, denoted [Formula: see text], is defined in terms of two operations on networks (the nearest neighbor interchange and triangle operations), which can be used to transform any network with leaf set X into any other network with that leaf set. In particular, it gives rise to a metric d on [Formula: see text] which is given by the smallest number of operations required to transform one network in [Formula: see text] into another in [Formula: see text]. The metric generalizes the well-known NNI-metric on phylogenetic trees which has been intensively studied in the literature. In this paper, we derive a bound for the metric d as well as a related metric [Formula: see text] which arises when restricting d to the subset of [Formula: see text] consisting of all networks with [Formula: see text] vertices, [Formula: see text]. We also introduce two new metrics on networks (the SPR and TBR metrics), which generalize the metrics on phylogenetic trees with the same name, and give bounds for these new metrics. We expect our results to eventually have applications to the development and understanding of network search algorithms.

Subject(s)

Biological Evolution; Models, Biological; Phylogeny; Algorithms; Mathematical Concepts

13.

Transforming phylogenetic networks: Moving beyond tree space.

Huber, Katharina T; Moulton, Vincent; Wu, Taoyang.

J Theor Biol ; 404: 30-39, 2016 09 07.

Article in English | MEDLINE | ID: mdl-27224010

ABSTRACT

Phylogenetic networks are a generalization of phylogenetic trees that are used to represent reticulate evolution. Unrooted phylogenetic networks form a special class of such networks, which naturally generalize unrooted phylogenetic trees. In this paper we define two operations on unrooted phylogenetic networks, one of which is a generalization of the well-known nearest-neighbor interchange (NNI) operation on phylogenetic trees. We show that any unrooted phylogenetic network can be transformed into any other such network using only these operations. This generalizes the well-known fact that any phylogenetic tree can be transformed into any other such tree using only NNI operations. It also allows us to define a generalization of tree space and to define some new metrics on unrooted phylogenetic networks. To prove our main results, we employ some fascinating new connections between phylogenetic networks and cubic graphs that we have recently discovered. Our results should be useful in developing new strategies to search for optimal phylogenetic networks, a topic that has recently generated some interest in the literature, as well as for providing new ways to compare networks.
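For readers unfamiliar with the classical NNI operation on trees that this work generalizes, the sketch below shows it on the smallest possible case. It covers only the tree operation, not the network operations defined in the paper. The representation (an internal edge written as a pair of pairs of subtrees) and the function name are illustrative assumptions.

```python
def nni_neighbors(split):
    """Return the two trees obtained by an NNI across one internal edge.

    `split` is ((a, b), (c, d)): subtrees a and b hang off one end of the
    edge, c and d off the other. An NNI swaps one subtree from each side.
    """
    (a, b), (c, d) = split
    return [((a, c), (b, d)), ((a, d), (b, c))]


quartet = (("w", "x"), ("y", "z"))
for neighbor in nni_neighbors(quartet):
    print(neighbor)
```

On four leaves there are exactly three binary unrooted tree shapes, so the two neighbours returned here, together with the input, exhaust the space. This is the simplest instance of the connectivity statement for tree space.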

Subject(s)

Phylogeny; Models, Biological

14.

Folding and unfolding phylogenetic trees and networks.

Huber, Katharina T; Moulton, Vincent; Steel, Mike; Wu, Taoyang.

J Math Biol ; 73(6-7): 1761-1780, 2016 12.

Article in English | MEDLINE | ID: mdl-27107869

ABSTRACT

Phylogenetic networks are rooted, labelled directed acyclic graphs which are commonly used to represent reticulate evolution. There is a close relationship between phylogenetic networks and multi-labelled trees (MUL-trees). Indeed, any phylogenetic network N can be "unfolded" to obtain a MUL-tree U(N) and, conversely, a MUL-tree T can in certain circumstances be "folded" to obtain a phylogenetic network F(T) that exhibits T. In this paper, we study properties of the operations U and F in more detail. In particular, we introduce the class of stable networks, phylogenetic networks N for which F(U(N)) is isomorphic to N, characterise such networks, and show that they are related to the well-known class of tree-sibling networks. We also explore how the concept of displaying a tree in a network N can be related to displaying the tree in the MUL-tree U(N). To do this, we develop a phylogenetic analogue of graph fibrations. This allows us to view U(N) as the analogue of the universal cover of a digraph, and to establish a close connection between displaying trees in U(N) and reconciling phylogenetic trees with networks.

Subject(s)

Classification/methods; Models, Genetic; Phylogeny; Algorithms

15.

Spaces of phylogenetic networks from generalized nearest-neighbor interchange operations.

Huber, Katharina T; Linz, Simone; Moulton, Vincent; Wu, Taoyang.

J Math Biol ; 72(3): 699-725, 2016 Feb.

Article in English | MEDLINE | ID: mdl-26037483

ABSTRACT

Phylogenetic networks are a generalization of evolutionary or phylogenetic trees that are used to represent the evolution of species which have undergone reticulate evolution. In this paper we consider spaces of such networks defined by some novel local operations that we introduce for converting one phylogenetic network into another. These operations are modeled on the well-studied nearest-neighbor interchange operations on phylogenetic trees, and lead to natural generalizations of the tree spaces that have been previously associated to such operations. We present several results on spaces of some relatively simple networks, called level-1 networks, including the size of the neighborhood of a fixed network, and bounds on the diameter of the metric defined by taking the smallest number of operations required to convert one network into another. We expect that our results will be useful in the development of methods for systematically searching for optimal phylogenetic networks using, for example, likelihood and Bayesian approaches.

Subject(s)

Biological Evolution; Models, Biological; Phylogeny; Algorithms; Bayes Theorem; Computational Biology; Likelihood Functions; Mathematical Concepts

16.

PSIKO2: a fast and versatile tool to infer population stratification on various levels in GWAS.

Popescu, Andrei-Alin; Huber, Katharina T.

Bioinformatics ; 31(21): 3552-4, 2015 Nov 01.

Article in English | MEDLINE | ID: mdl-26142187

ABSTRACT

Genome-wide association studies are an invaluable tool for identifying genotypic loci linked with agriculturally important traits or certain diseases. The signal on which such studies rely can, however, be obscured by population stratification, making it necessary to account for it in some way. Population stratification is dependent on when admixture happened and thus can occur at various levels. To aid in its inference at the genome level, we recently introduced psiko, and comparison with leading methods indicates that it has attractive properties. However, until now, it could not be used for local ancestry inference, which is preferable in cases of recent admixture as the genome level tends to be too coarse to properly account for processes acting on small segments of a genome. To also bring the powerful ideas underpinning psiko to bear in such studies, we extended it to psiko2, which we introduce here. AVAILABILITY AND IMPLEMENTATION: Source code, binaries and user manual are freely available at https://www.uea.ac.uk/computing/psiko. CONTACT: Andrei-Alin.Popescu@uea.ac.uk or Katharina.Huber@cmp.uea.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome-Wide Association Study/methods; Software; Genotype; Humans; Population Groups/genetics

17.

Reconstructing (super)trees from data sets with missing distances: not all is lost.

Kettleborough, George; Dicks, Jo; Roberts, Ian N; Huber, Katharina T.

Mol Biol Evol ; 32(6): 1628-42, 2015 Jun.

Article in English | MEDLINE | ID: mdl-25657329

ABSTRACT

The wealth of phylogenetic information accumulated over many decades of biological research, coupled with recent technological advances in molecular sequence generation, presents significant opportunities for researchers to investigate relationships across and within the kingdoms of life. However, to make best use of this data wealth, several problems must first be overcome. One key problem is finding effective strategies to deal with missing data. Here, we introduce Lasso, a novel heuristic approach for reconstructing rooted phylogenetic trees from distance matrices with missing values, for data sets where a molecular clock may be assumed. Contrary to other phylogenetic methods on partial data sets, Lasso possesses desirable properties such as its reconstructed trees being both unique and edge-weighted. These properties are achieved by Lasso restricting its leaf set to a large subset of all possible taxa, which in many practical situations is the entire taxa set. Furthermore, the Lasso approach is distance-based, rendering it very fast to run and suitable for data sets of all sizes, including large data sets such as those generated by modern Next Generation Sequencing technologies. To better understand the performance of Lasso, we assessed it by means of artificial and real biological data sets, showing its effectiveness in the presence of missing data. Furthermore, by formulating the supermatrix problem as a particular case of the missing data problem, we assessed Lasso's ability to reconstruct supertrees. We demonstrate that, although not specifically designed for such a purpose, Lasso performs better than or comparably with five leading supertree algorithms on a challenging biological data set. Finally, we make freely available a software implementation of Lasso so that researchers may, for the first time, perform both rooted tree and supertree reconstruction with branch lengths on their own partial data sets.

Subject(s)

Databases, Genetic; Models, Genetic; Phylogeny; Algorithms; High-Throughput Nucleotide Sequencing/methods; Saccharomyces cerevisiae/classification; Saccharomyces cerevisiae/genetics; Software; Triticum/classification; Triticum/genetics

18.

How much information is needed to infer reticulate evolutionary histories?

Huber, Katharina T; Van Iersel, Leo; Moulton, Vincent; Wu, Taoyang.

Syst Biol ; 64(1): 102-11, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25236959

ABSTRACT

Phylogenetic networks are a generalization of evolutionary trees and are an important tool for analyzing reticulate evolutionary histories. Recently, there has been great interest in developing new methods to construct rooted phylogenetic networks, that is, networks whose internal vertices correspond to hypothetical ancestors, whose leaves correspond to sampled taxa, and in which vertices with more than one parent correspond to taxa formed by reticulate evolutionary events such as recombination or hybridization. Several methods for constructing evolutionary trees use the strategy of building up a tree from simpler building blocks (such as triplets or clusters), and so it is natural to look for ways to construct networks from smaller networks. In this article, we shall demonstrate a fundamental issue with this approach. Namely, we show that even if we are given all of the subnetworks induced on all proper subsets of the leaves of some rooted phylogenetic network, we still do not have all of the information required to completely determine that network. This implies that even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history. We also discuss some potential consequences of this result for constructing phylogenetic networks.

Subject(s)

Classification/methods; Phylogeny; Cryptococcus gattii/classification; Cryptococcus gattii/genetics; Models, Theoretical

19.

A novel and fast approach for population structure inference using kernel-PCA and optimization.

Popescu, Andrei-Alin; Harper, Andrea L; Trick, Martin; Bancroft, Ian; Huber, Katharina T.

Genetics ; 198(4): 1421-31, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25326237

ABSTRACT

Population structure is a confounding factor in genome-wide association studies, increasing the rate of false positive associations. To correct for it, several model-based algorithms such as ADMIXTURE and STRUCTURE have been proposed. These tend to suffer from the fact that they have a considerable computational burden, limiting their applicability when used with large datasets, such as those produced by next generation sequencing techniques. To address this, nonmodel based approaches such as sparse nonnegative matrix factorization (sNMF) and EIGENSTRAT have been proposed, which scale better with larger data. Here we present a novel nonmodel-based approach, population structure inference using kernel-PCA and optimization (PSIKO), which is based on a unique combination of linear kernel-PCA and least-squares optimization and allows for the inference of admixture coefficients, principal components, and number of founder populations of a dataset. PSIKO has been compared against existing leading methods on a variety of simulation scenarios, as well as on real biological data. We found that in addition to producing results of the same quality as other tested methods, PSIKO scales extremely well with dataset size, being considerably (up to 30 times) faster for longer sequences than even state-of-the-art methods such as sNMF. PSIKO and accompanying manual are freely available at https://www.uea.ac.uk/computing/psiko.
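The linear kernel-PCA ingredient named in this abstract can be illustrated in a few lines. This is a minimal sketch of generic linear kernel PCA only, not PSIKO's implementation: the least-squares step and the inference of admixture coefficients are omitted, the toy genotype rows are invented for illustration, and power iteration stands in for a proper eigensolver so that the example needs nothing beyond the standard library.

```python
import math


def centered_linear_kernel(rows):
    """Gram matrix K[i][j] = <x_i, x_j>, double-centered in feature space."""
    n = len(rows)
    k = [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
         for i in range(n)]
    row_mean = [sum(r) / n for r in k]
    total_mean = sum(row_mean) / n
    # K is symmetric, so its column means equal its row means.
    return [[k[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
            for i in range(n)]


def top_component(kc, iterations=200):
    """Leading eigenvalue and unit eigenvector of kc by power iteration."""
    n = len(kc)
    start_norm = math.sqrt(sum((i + 1) ** 2 for i in range(n)))
    v = [(i + 1) / start_norm for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        w = [sum(kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(kc[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v


# Invented allele counts (0, 1 or 2) for four individuals at four loci.
genotypes = [[0, 0, 2, 2], [0, 1, 2, 2], [2, 2, 0, 0], [2, 2, 0, 1]]

kc = centered_linear_kernel(genotypes)
eigenvalue, vector = top_component(kc)
scores = [math.sqrt(eigenvalue) * x for x in vector]  # first principal component scores
print([round(s, 3) for s in scores])
```

In this toy data set the sign of the first score separates the first two individuals from the last two. Because the kernel matrix has one row per individual, its size depends on the number of individuals rather than on the number of loci.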

Subject(s)

Genetics, Population/methods; Genome-Wide Association Study/methods; Principal Component Analysis; Algorithms; Computer Simulation; Datasets as Topic; Inbreeding; Polymorphism, Single Nucleotide

20.

Lassoing and corralling rooted phylogenetic trees.

Huber, Katharina T; Popescu, Andrei-Alin.

Bull Math Biol ; 75(3): 444-65, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23381929

ABSTRACT

The construction of a dendrogram on a set of individuals is a key component of a genome-wide association study. However, even with modern sequencing technologies, the distances on the individuals required for the construction of such a structure may not always be reliable, making it tempting to exclude them from an analysis. This, in turn, results in an input set for dendrogram construction that consists of only partial distance information, which raises the following fundamental question. For what (proper) subsets of a dendrogram's leaf set can we uniquely reconstruct the dendrogram from the distances that it induces on the elements of such a subset? By formalizing a dendrogram in terms of an edge-weighted, rooted, phylogenetic tree on a pre-given finite set X with |X| ≥ 3 whose edge-weighting is equidistant and subsets Y of X for which the distance between every pair of elements in Y is known in terms of sets [Formula: see text] of 2-subsets of X, we investigate this problem from the perspective of when such a tree is lassoed, that is, uniquely determined by the elements in [Formula: see text]. For this, we consider four different formalizations of the idea of "uniquely determining" giving rise to four distinct types of lassos. We present characterizations for all of them in terms of the child-edge graphs of the interior vertices of such a tree. Our characterizations imply in particular that in case the tree in question is binary, then all four types of lasso must coincide.

Subject(s)

Models, Genetic; Phylogeny
