Search | BVS Regional Portal

A divide-and-conquer method for scalable phylogenetic network inference from multilocus data.

Zhu, Jiafan; Liu, Xinhao; Ogilvie, Huw A; Nakhleh, Luay K.

Bioinformatics ; 35(14): i370-i378, 2019 07 15.

Article in English | MEDLINE | ID: mdl-31510688

ABSTRACT

MOTIVATION: Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes. RESULTS: In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets (networks on three taxa) to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied the performance of these algorithms, in terms of both running time and accuracy, on simulated as well as biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. AVAILABILITY AND IMPLEMENTATION: We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms, Phylogeny, Evolution, Molecular, Genome, Sequence Alignment, Software
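The abstract above frames the choice of taxon subsets as a Hitting Set problem solved with a simple heuristic. As a rough illustration only, the sketch below assumes the requirement is that every triple of taxa falls inside at least one chosen subset of a fixed size `k`, and applies a generic greedy covering rule (repeatedly take the candidate that contains the most still-uncovered triples). The function `greedy_triple_cover` and the parameter `k` are hypothetical; this is not necessarily the heuristic implemented in PhyloNet.

```python
from itertools import combinations


def greedy_triple_cover(taxa, k):
    """Greedily choose size-k taxon subsets until every triple of taxa lies in one.

    Hypothetical illustration of a greedy covering heuristic. It enumerates all
    C(n, k) candidate subsets, so it is only practical for small n.
    """
    if not 3 <= k <= len(taxa):
        raise ValueError("k must satisfy 3 <= k <= number of taxa")
    ordered = sorted(taxa)
    uncovered = set(combinations(ordered, 3))
    candidates = [frozenset(c) for c in combinations(ordered, k)]
    chosen = []
    while uncovered:
        # Score each candidate by how many still-uncovered triples it contains.
        best = max(candidates, key=lambda s: sum(1 for t in uncovered if s.issuperset(t)))
        chosen.append(best)
        # Drop the triples that the newly chosen subset covers.
        uncovered = {t for t in uncovered if not best.issuperset(t)}
    return chosen


if __name__ == "__main__":
    for subset in greedy_triple_cover(["A", "B", "C", "D", "E", "F"], 4):
        print(sorted(subset))
```

With six taxa and `k = 4` there are 20 triples and each subset contains 4 of them, so at least 5 subsets are needed; greedy covering rules do not guarantee a minimum-size solution in general.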

Inference of species phylogenies from bi-allelic markers using pseudo-likelihood.

Zhu, Jiafan; Nakhleh, Luay.

Bioinformatics ; 34(13): i376-i385, 2018 07 01.

Article in English | MEDLINE | ID: mdl-29950004

ABSTRACT

Motivation: Phylogenetic networks represent reticulate evolutionary histories. Statistical methods for their inference under the multispecies coalescent have recently been developed. A particularly powerful approach uses data that consist of bi-allelic markers (e.g. single nucleotide polymorphism data) and allows for exact likelihood computations of phylogenetic networks while numerically integrating over all possible gene trees per marker. While the approach has good accuracy in terms of estimating the network and its parameters, likelihood computations remain a major computational bottleneck and limit the method's applicability. Results: In this article, we first demonstrate why likelihood computations of networks take orders of magnitude more time when compared to trees. We then propose an approach for inference of phylogenetic networks based on pseudo-likelihood using bi-allelic markers. We demonstrate the scalability and accuracy of phylogenetic network inference via pseudo-likelihood computations on simulated data. Furthermore, we demonstrate aspects of robustness of the method to violations of the underlying assumptions of the employed statistical model. Finally, we demonstrate the application of the method to biological data. The proposed method allows for analyzing larger datasets in terms of the numbers of taxa and reticulation events. While pseudo-likelihood had been proposed before for data consisting of gene trees, the work here uses sequence data directly, offering several advantages as we discuss. Availability and implementation: The methods have been implemented in PhyloNet (http://bioinfocs.rice.edu/phylonet).

Subject(s)

Alleles, Computational Biology/methods, Models, Genetic, Phylogeny, Software, Evolution, Molecular, Probability
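The abstract above notes that pseudo-likelihood had previously been proposed for gene-tree data. To illustrate that general idea only (not the bi-allelic method of this paper), the sketch below scores a species tree from rooted gene-tree triples: each taxon triple contributes its own multinomial log-likelihood, and the contributions are summed as if the triples were independent, which is what makes the total a pseudo-likelihood rather than a full likelihood. It assumes the standard multispecies-coalescent result that, for a triple whose species tree is ((a,b),c) with internal branch length t in coalescent units, the gene-tree topology ((a,b),c) has probability 1 - (2/3)exp(-t) and each of the other two has probability (1/3)exp(-t). The record layout and function names are hypothetical.

```python
import math
from itertools import combinations


def gene_triple_probs(taxa, species_cherry, t):
    """MSC probabilities of the three rooted gene-tree topologies on one taxon triple.

    Topologies are keyed by their cherry (a frozenset of two taxa); species_cherry is
    the cherry in the species tree and t the internal branch length (coalescent units).
    """
    mismatch = math.exp(-t) / 3
    return {
        cherry: (1 - 2 * mismatch) if cherry == species_cherry else mismatch
        for cherry in map(frozenset, combinations(taxa, 2))
    }


def triple_log_pseudo_likelihood(records):
    """Sum the per-triple multinomial log-likelihoods (constant terms dropped).

    Each record is (taxa, species_cherry, t, counts), where counts maps a
    gene-tree cherry to the number of gene trees showing that topology.
    """
    total = 0.0
    for taxa, species_cherry, t, counts in records:
        for cherry, p in gene_triple_probs(taxa, species_cherry, t).items():
            total += counts.get(cherry, 0) * math.log(p)
    return total


if __name__ == "__main__":
    ab, ac, bc = frozenset("AB"), frozenset("AC"), frozenset("BC")
    counts = {ab: 80, ac: 10, bc: 10}
    for t in (0.1, math.log(10 / 3), 5.0):
        print(round(t, 3), triple_log_pseudo_likelihood([(("A", "B", "C"), ab, t, counts)]))
```

With 80 of 100 gene trees matching the species-tree cherry, the score peaks at t = ln(10/3), about 1.20, where 1 - (2/3)exp(-t) equals the observed frequency 0.8.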

Inferring Phylogenetic Networks Using PhyloNet.

Wen, Dingqiao; Yu, Yun; Zhu, Jiafan; Nakhleh, Luay.

Syst Biol ; 67(4): 735-740, 2018 07 01.

Article in English | MEDLINE | ID: mdl-29514307

ABSTRACT

PhyloNet was released in 2008 as a software package for representing and analyzing phylogenetic networks. At the time of its release, the main functionalities in PhyloNet consisted of measures for comparing network topologies and a single heuristic for reconciling gene trees with a species tree. Since then, PhyloNet has grown significantly. The software package now includes a wide array of methods for inferring phylogenetic networks from data sets of unlinked loci while accounting for both reticulation (e.g., hybridization) and incomplete lineage sorting. In particular, PhyloNet now allows for maximum parsimony, maximum likelihood, and Bayesian inference of phylogenetic networks from gene tree estimates. Furthermore, Bayesian inference directly from sequence data (sequence alignments or biallelic markers) is implemented. Maximum parsimony is based on an extension of the "minimizing deep coalescences" criterion to phylogenetic networks, whereas maximum likelihood and Bayesian inference are based on the multispecies network coalescent. All methods allow for multiple individuals per species. As computing the likelihood of a phylogenetic network is computationally hard, PhyloNet allows for evaluation and inference of networks using a pseudolikelihood measure. PhyloNet summarizes the results of the various analyses and generates phylogenetic networks in the extended Newick format that is readily viewable by existing visualization software.

Subject(s)

Evolution, Molecular, Phylogeny, Software, Bayes Theorem, Hybridization, Genetic, Sequence Alignment
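The PhyloNet abstract above mentions output in the extended Newick format. In that format a reticulation node carries a `#`-prefixed label (for example `#H1`) that is written once for each of its incoming edges, with the node's subtree spelled out at one of those occurrences. A minimal sketch, assuming labels of the form `#` followed by optional letters and then digits, counts the distinct reticulation labels and how often each occurs. The regular expression and the function name are hypothetical, and the sketch does not handle `#` characters inside quoted labels or comments.

```python
import re
from collections import Counter

# Hypothetical pattern: '#', an optional type such as H or LGT, then a numeric tag.
RETICULATION_LABEL = re.compile(r"#([A-Za-z]*\d+)")


def reticulation_counts(extended_newick: str) -> dict:
    """Map each reticulation label to the number of times it occurs in the string.

    Each occurrence corresponds to one incoming (parent) edge of that node.
    """
    return dict(Counter(RETICULATION_LABEL.findall(extended_newick)))


if __name__ == "__main__":
    net = "((A,(B)#H1),(C,#H1));"
    counts = reticulation_counts(net)
    print(len(counts), "reticulation(s):", counts)
```

The number of keys in the result is the number of reticulation nodes, and a count of 2 for a label means that node has two parents.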

Bayesian inference of phylogenetic networks from bi-allelic genetic markers.

Zhu, Jiafan; Wen, Dingqiao; Yu, Yun; Meudt, Heidi M; Nakhleh, Luay.

PLoS Comput Biol ; 14(1): e1005932, 2018 01.

Article in English | MEDLINE | ID: mdl-29320496

ABSTRACT

Phylogenetic networks are rooted, directed, acyclic graphs that model reticulate evolutionary histories. Recently, statistical methods were devised for inferring such networks from either gene tree estimates or the sequence alignments of multiple unlinked loci. Bi-allelic markers, most notably single nucleotide polymorphisms (SNPs) and amplified fragment length polymorphisms (AFLPs), provide a powerful source of genome-wide data. In a recent paper, a method called SNAPP was introduced for statistical inference of species trees from unlinked bi-allelic markers. The generative process assumed by the method combined a model of evolution for the bi-allelic markers with the multispecies coalescent. A novel component of the method was a polynomial-time algorithm for exact computation of the likelihood of a fixed species tree via integration over all possible gene trees for a given marker. Here we report on a method for Bayesian inference of phylogenetic networks from bi-allelic markers. Our method significantly extends the algorithm for exact computation of phylogenetic network likelihood via integration over all possible gene trees. Unlike the case of species trees, the algorithm is no longer polynomial-time on all instances of phylogenetic networks. Furthermore, the method utilizes a reversible-jump MCMC technique to sample the posterior of phylogenetic networks given bi-allelic marker data. Our method performs very well in terms of accuracy and robustness, as we demonstrate on simulated data as well as on a data set of multiple New Zealand species of the plant genus Ourisia (Plantaginaceae). We implemented the method in the publicly available, open-source PhyloNet software package.

Subject(s)

Genes, Plant, Genetic Markers, Phylogeny, Plantaginaceae/genetics, Algorithms, Alleles, Bayes Theorem, Computational Biology, Computer Simulation, Likelihood Functions, Models, Genetic, New Zealand, Nucleic Acid Hybridization, Plantaginaceae/physiology, Polymorphism, Single Nucleotide, Probability, Recombination, Genetic, Software

In the light of deep coalescence: revisiting trees within networks.

Zhu, Jiafan; Yu, Yun; Nakhleh, Luay.

BMC Bioinformatics ; 17(Suppl 14): 415, 2016 Nov 11.

Article in English | MEDLINE | ID: mdl-28185572

ABSTRACT

BACKGROUND: Phylogenetic networks model reticulate evolutionary histories. The last two decades have seen an increased interest in establishing mathematical results and developing computational methods for inferring and analyzing these networks. A salient concept underlying a great majority of these developments has been the notion that a network displays a set of trees and those trees can be used to infer, analyze, and study the network. RESULTS: In this paper, we show that in the presence of coalescence effects, the set of displayed trees is not sufficient to capture the network. We formally define the set of parental trees of a network and make three contributions based on this definition. First, we extend the notion of anomaly zone to phylogenetic networks and report on anomaly results for different networks. Second, we demonstrate how coalescence events could negatively affect the ability to infer a species tree that could be augmented into the correct network. Third, we demonstrate how a phylogenetic network can be viewed as a mixture model that lends itself to a novel inference approach via gene tree clustering. CONCLUSIONS: Our results demonstrate the limitations of focusing on the set of trees displayed by a network when analyzing and inferring the network. Our findings can form the basis for achieving higher accuracy when inferring phylogenetic networks and open up new avenues for research in this area, including new problem formulations based on the notion of a network's parental trees.

Subject(s)

Models, Genetic, Algorithms, Biological Evolution, Phylogeny
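The abstract above contrasts a network's displayed trees with its parental trees. Parental trees are defined in the paper and are not reproduced here; as a minimal sketch of displayed trees only, assuming the usual definition, each displayed tree is obtained by keeping exactly one incoming edge at every reticulation node, so a network whose r reticulations each have two parents displays at most 2^r distinct trees. The network is encoded as a hypothetical child-list dictionary, and each tree is returned as its set of nonempty leaf clusters, which identifies a rooted topology without having to suppress the unary nodes that edge deletion leaves behind. The function names are hypothetical.

```python
from itertools import product


def leaf_clusters(children, root, leaves):
    """Return the set of nonempty leaf sets found below each node of a rooted tree."""
    found = set()

    def below(node):
        if node in leaves:
            cluster = frozenset([node])
        else:
            cluster = frozenset().union(*(below(c) for c in children.get(node, [])))
        if cluster:  # dead-end internal nodes yield an empty set, which is skipped
            found.add(cluster)
        return cluster

    below(root)
    return frozenset(found)


def displayed_trees(children, leaves):
    """Enumerate distinct displayed trees: keep one parent edge per reticulation node."""
    parents = {}
    for u, kids in children.items():
        for v in kids:
            parents.setdefault(v, []).append(u)
    root = next(u for u in children if u not in parents)
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    trees = set()
    for choice in product(*(parents[h] for h in reticulations)):
        keep = dict(zip(reticulations, choice))
        # Keep edge u -> v unless v is a reticulation whose chosen parent is not u.
        tree = {u: [v for v in kids if keep.get(v, u) == u] for u, kids in children.items()}
        trees.add(leaf_clusters(tree, root, leaves))
    return trees


if __name__ == "__main__":
    # Reticulation h (above leaf B) has two parents, x and y.
    net = {"r": ["x", "y"], "x": ["A", "h"], "y": ["h", "C"], "h": ["B"]}
    for t in displayed_trees(net, {"A", "B", "C"}):
        print(sorted(sorted(c) for c in t))
```

In the example network, keeping the edge from x yields the topology ((A,B),C) and keeping the edge from y yields (A,(B,C)).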
