1.
Bioinformatics ; 40(2), 2024 02 01.
Article in English | MEDLINE | ID: mdl-38243701

ABSTRACT

MOTIVATION: Advancements in high-throughput genomic sequencing are delivering genomic pathogen data at an unprecedented rate, positioning statistical phylogenetics as a critical tool to monitor infectious diseases globally. This rapid growth spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences N. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters that traditionally takes O(N^2) operations using the standard pruning algorithm. A recent study proposes an approach to calculate this gradient in O(N), enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the calculation of the gradient computationally tractable for nucleotide-based models but falls short in performance for larger state-space size models, such as Markov-modulated and codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and result in many-fold higher speedups over previous CPU implementations.
RESULTS: We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples exploring complete genomes from 997 dengue viruses, 62 carnivore mitochondria and 49 yeasts, and observe a >128-fold speedup over the CPU implementation for codon-based models and >8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task previously intractable.
AVAILABILITY AND IMPLEMENTATION: We provide an implementation of our GPU algorithms in BEAGLE v4.0.0 (https://github.com/beagle-dev/beagle-lib), an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs. We employ a BEAGLE-implementation using the Bayesian phylogenetics framework BEAST (https://github.com/beast-dev/beast-mcmc).
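
Illustrative note (not part of the record): the abstract contrasts an O(N^2) gradient with an O(N) one. The sketch below shows one common way a linear-time gradient can be organised: a post-order pass (pruning) followed by a pre-order pass, after which each branch derivative costs constant work. It is a serial, single-site, pure-Python sketch, not the BEAGLE GPU implementation. The JC69 model, the binary tree, and all names (`jc69`, `log_likelihood_and_gradient`, the toy tree) are assumptions made for the example.

```python
import math

STATES = "ACGT"

def jc69(t):
    """Return the JC69 transition matrix P(t) and its derivative dP/dt."""
    e = math.exp(-4.0 * t / 3.0)
    P = [[0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e for j in range(4)]
         for i in range(4)]
    dP = [[-e if i == j else e / 3.0 for j in range(4)] for i in range(4)]
    return P, dP

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

def log_likelihood_and_gradient(children, tip_state, length, root="root"):
    """Log-likelihood of one site and d(log L)/d(branch length) for every branch."""
    post, msg = {}, {}  # post-order partials; msg[c] = P_c(t_c) applied to post[c]

    def post_order(node):
        if node in tip_state:
            post[node] = [1.0 if s == tip_state[node] else 0.0 for s in STATES]
            return
        post[node] = [1.0] * 4
        for c in children[node]:
            post_order(c)
            msg[c] = matvec(jc69(length[c])[0], post[c])
            post[node] = [a * b for a, b in zip(post[node], msg[c])]

    post_order(root)
    lik = sum(0.25 * x for x in post[root])  # uniform JC69 root frequencies
    grad = {}

    def pre_order(node, pre):
        # pre[s]: probability of all data outside node's subtree, jointly with state s at node
        for c in children.get(node, []):
            q = list(pre)
            for d in children[node]:  # one sibling per node in a binary tree
                if d != c:
                    q = [a * b for a, b in zip(q, msg[d])]
            P, dP = jc69(length[c])
            # L = sum_s q[s] * (P post[c])[s], so only P needs differentiating
            grad[c] = sum(a * b for a, b in zip(q, matvec(dP, post[c]))) / lik
            pre_c = [sum(q[s] * P[s][x] for s in range(4)) for x in range(4)]
            pre_order(c, pre_c)

    pre_order(root, [0.25] * 4)
    return math.log(lik), grad

# Hypothetical three-taxon example
children = {"root": ["u", "t3"], "u": ["t1", "t2"]}
tip_state = {"t1": "A", "t2": "C", "t3": "A"}
length = {"t1": 0.1, "t2": 0.2, "u": 0.05, "t3": 0.3}
log_lik, grad = log_likelihood_and_gradient(children, tip_state, length)
print(log_lik, grad)
```

Each node is visited twice, so the work grows linearly with the number of branches. Recomputing the full likelihood once per branch to obtain a finite-difference gradient would instead grow quadratically.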


Subject(s)
Algorithms , Software , Phylogeny , Bayes Theorem , Codon , Nucleotides
2.
PLoS Comput Biol ; 19(4): e1011084, 2023 04.
Article in English | MEDLINE | ID: mdl-37099595

ABSTRACT

Bayesian inference for phylogenetics is a gold standard for computing distributions of phylogenies. However, Bayesian phylogenetics faces the challenging computational problem of moving throughout the high-dimensional space of trees. Fortunately, hyperbolic space offers a low dimensional representation of tree-like data. In this paper, we embed genomic sequences as points in hyperbolic space and perform hyperbolic Markov Chain Monte Carlo for Bayesian inference in this space. The posterior probability of an embedding is computed by decoding a neighbour-joining tree from the embedding locations of the sequences. We empirically demonstrate the fidelity of this method on eight data sets. We systematically investigated the effect of embedding dimension and hyperbolic curvature on the performance in these data sets. The sampled posterior distribution recovers the splits and branch lengths to a high degree over a range of curvatures and dimensions. We systematically investigated the effects of the embedding space's curvature and dimension on the Markov Chain's performance, demonstrating the suitability of hyperbolic space for phylogenetic inference.
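
Illustrative note (not part of the record): the abstract describes embedding sequences as points in hyperbolic space and decoding a tree from their locations. A minimal sketch of the pairwise-distance step follows, using the Poincaré ball model. The choice of model, the way curvature rescales distances, and the names `poincare_distance` and `distance_matrix` are assumptions for the example; the paper may use a different parametrisation.

```python
import math

def poincare_distance(u, v, curvature=-1.0):
    """Geodesic distance between two points of the open unit ball.

    Assumed convention: coordinates live in the unit ball and a curvature of
    -k (k > 0) rescales curvature -1 distances by 1 / sqrt(k).
    """
    if curvature >= 0:
        raise ValueError("hyperbolic space needs negative curvature")
    nu = sum(x * x for x in u)
    nv = sum(x * x for x in v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv))
    return math.acosh(arg) / math.sqrt(-curvature)

def distance_matrix(points, curvature=-1.0):
    """All pairwise distances, e.g. as input to a neighbour-joining routine."""
    return [[poincare_distance(p, q, curvature) for q in points] for p in points]

# Hypothetical embedding of three sequences in two dimensions
points = [(0.0, 0.0), (0.5, 0.0), (0.0, -0.7)]
for row in distance_matrix(points):
    print(row)
```

Distances grow rapidly as points approach the boundary of the ball, which is what lets a low-dimensional embedding mimic tree-like distances.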


Subject(s)
Phylogeny , Bayes Theorem , Algorithms
3.
Syst Biol ; 69(2): 280-293, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31504997

ABSTRACT

Bayesian Markov chain Monte Carlo explores tree space slowly, in part because it frequently returns to the same tree topology. An alternative strategy would be to explore tree space systematically, and never return to the same topology. In this article, we present an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, which we show to be a good approximation of the high posterior set of tree topologies on the data sets analyzed. Here, "likelihood" of a topology refers to the tree likelihood for the corresponding tree with optimized branch lengths. We call this method "phylogenetic topographer" (PT). The PT strategy is very simple: starting in a number of local topology maxima (obtained by hill-climbing from random starting points), explore out using local topology rearrangements, only continuing through topologies that are better than some likelihood threshold below the best observed topology. We show that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies. By using a nonblocking hash table keyed on unique representations of tree topologies, we avoid visiting topologies more than once across all concurrent threads exploring tree space. We demonstrate that PT can be used directly to approximate a Bayesian consensus tree topology. When combined with an accurate means of evaluating per-topology marginal likelihoods, PT gives an alternative procedure for obtaining Bayesian posterior distributions on phylogenetic tree topologies.
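
Illustrative note (not part of the record): the search strategy described above can be sketched as a breadth-first exploration that expands only topologies within a log-likelihood threshold of the best one seen so far, and that records every visited topology so none is evaluated twice. The sketch is serial and uses an ordinary Python set where the paper uses a nonblocking hash table shared across threads. Integers stand in for tree topologies, and `explore`, `toy_neighbors`, and `toy_log_lik` are hypothetical names.

```python
import math
from collections import deque

def explore(starts, neighbors, log_lik, delta):
    """Return {topology: log-likelihood} for topologies within delta of the best."""
    visited = set(starts)  # topologies already queued or evaluated
    scores = {}
    best = -math.inf
    queue = deque(starts)
    while queue:
        topo = queue.popleft()
        score = log_lik(topo)  # each topology is evaluated exactly once
        scores[topo] = score
        best = max(best, score)
        if score < best - delta:
            continue  # too poor: keep the score but do not expand further
        for nb in neighbors(topo):
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    return {t: s for t, s in scores.items() if s >= best - delta}

def normalize(kept):
    """Normalized likelihoods, used as a proxy for posterior probabilities."""
    top = max(kept.values())
    total = sum(math.exp(s - top) for s in kept.values())
    return {t: math.exp(s - top) / total for t, s in kept.items()}

# Toy stand-ins: "topologies" 0..20 on a line, with a single peak at 10
def toy_neighbors(t):
    return [i for i in (t - 1, t + 1) if 0 <= i <= 20]

def toy_log_lik(t):
    return -float((t - 10) ** 2)

kept = explore([10], toy_neighbors, toy_log_lik, delta=4.5)
print(sorted(kept), normalize(kept))
```

In a real application `log_lik` would optimize branch lengths for the given topology and `neighbors` would enumerate local rearrangements. Both are left abstract here.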


Subject(s)
Classification/methods , Phylogeny , Algorithms , Bayes Theorem , Likelihood Functions
4.
Syst Biol ; 69(2): 209-220, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31504998

ABSTRACT

The marginal likelihood of a model is a key quantity for assessing the evidence provided by the data in support of a model. The marginal likelihood is the normalizing constant for the posterior density, obtained by integrating the product of the likelihood and the prior with respect to model parameters. Thus, the computational burden of computing the marginal likelihood scales with the dimension of the parameter space. In phylogenetics, where we work with tree topologies that are high-dimensional models, standard approaches to computing marginal likelihoods are very slow. Here, we study methods to quickly compute the marginal likelihood of a single fixed tree topology. We benchmark the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real data sets under the JC69 model. These methods include several new ones that we develop explicitly to solve this problem, as well as existing algorithms that we apply to phylogenetic models for the first time. Altogether, our results show that the accuracy of these methods varies widely, and that accuracy does not necessarily correlate with computational burden. Our newly developed methods are orders of magnitude faster than standard approaches, and in some cases, their accuracy rivals the best established estimators.
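
Illustrative note (not part of the record): the marginal likelihood is the integral of likelihood times prior over the model parameters. The sketch below estimates it by averaging the likelihood over draws from the prior, on a toy model where the exact answer is known. The binomial model, the uniform prior, and the function names are assumptions chosen for the example. This is not a phylogenetic model, and the estimator is not claimed to be one of the 19 methods benchmarked in the paper.

```python
import math
import random

def binomial_likelihood(p, k, n):
    """Probability of k successes in n trials given success probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def naive_marginal_likelihood(k, n, draws=20000, seed=0):
    """Average the likelihood over draws from a Uniform(0, 1) prior on p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += binomial_likelihood(rng.random(), k, n)
    return total / draws

# With a uniform prior the integral has the closed form 1 / (n + 1).
exact = 1.0 / (10 + 1)
estimate = naive_marginal_likelihood(k=3, n=10)
print(exact, estimate)
```

In one dimension this estimator works well. As the number of parameters grows, prior draws rarely land where the likelihood is high, which is one reason faster or more accurate estimators are of interest.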


Subject(s)
Classification/methods , Phylogeny , Computational Biology/standards , Likelihood Functions
5.
PLoS Comput Biol ; 15(4): e1006650, 2019 04.
Article in English | MEDLINE | ID: mdl-30958812

ABSTRACT

Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.


Subject(s)
Bayes Theorem , Biological Evolution , Phylogeny , Software , Animals , Computational Biology , Computer Simulation , Molecular Evolution , Humans , Markov Chains , Genetic Models , Monte Carlo Method
6.
Mol Biol Evol ; 35(1): 242-246, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29029199

ABSTRACT

Phylogenetics has seen a steady increase in data set size and substitution model complexity, which require increasing amounts of computational power to compute likelihoods. This motivates strategies to approximate the likelihood functions for branch length optimization and Bayesian sampling. In this article, we develop an approximation to the 1D likelihood function as parametrized by a single branch length. Our method uses a four-parameter surrogate function abstracted from the simplest phylogenetic likelihood function, the binary symmetric model. We show that it offers a surrogate that can be fit over a variety of branch lengths, that it is applicable to a wide variety of models and trees, and that it can be used effectively as a proposal mechanism for Bayesian sampling. The method is implemented as a stand-alone open-source C library for calling from phylogenetics algorithms; it has proven essential for good performance of our online phylogenetic algorithm sts.
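
Illustrative note (not part of the record): a four-parameter surrogate derived from the binary symmetric model can be sketched as follows. Under that model, the log-likelihood of a branch length t with c unchanged and m changed sites takes the form c log((1 + z)/2) + m log((1 - z)/2), where z decays exponentially in t. Adding a rate r and an offset b gives four parameters. The exact parametrisation and the names c, m, r, b are assumptions here and may differ from the library described in the abstract.

```python
import math

def surrogate_log_likelihood(t, c, m, r, b):
    """Binary-symmetric-style surrogate for a one-branch log-likelihood.

    Assumes t + b > 0 so that z < 1 and the second logarithm is defined.
    """
    z = math.exp(-r * (t + b))
    return c * math.log((1.0 + z) / 2.0) + m * math.log((1.0 - z) / 2.0)

def surrogate_argmax(c, m, r, b):
    """Closed-form maximizer: setting the derivative to zero gives z = (c - m) / (c + m)."""
    if c <= m:
        return math.inf  # the surrogate keeps increasing with t
    t_star = -math.log((c - m) / (c + m)) / r - b
    return max(t_star, 0.0)  # branch lengths cannot be negative

# Hypothetical fitted parameters
c, m, r, b = 90.0, 10.0, 1.0, 0.1
t_hat = surrogate_argmax(c, m, r, b)
print(t_hat, surrogate_log_likelihood(t_hat, c, m, r, b))
```

Because the maximizer has a closed form, a fitted surrogate can suggest a branch length without further evaluations of the full likelihood.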


Subject(s)
Likelihood Functions , Phylogeny , DNA Sequence Analysis/methods , Algorithms , Bayes Theorem , Molecular Evolution , Markov Chains , Genetic Models , Monte Carlo Method , DNA Sequence Analysis/statistics & numerical data
7.
Plasmid ; 102: 56-61, 2019 03.
Article in English | MEDLINE | ID: mdl-30885788

ABSTRACT

IncHI2-ST1 plasmids play an important role in co-mobilizing genes conferring resistance to critically important antibiotics and heavy metals. Here we present the identification and analysis of IncHI2-ST1 plasmid pSPRC-Echo1, isolated from an Enterobacter hormaechei strain from a Sydney hospital, which predates other multi-drug resistant IncHI2-ST1 plasmids reported from Australia. Our time-resolved phylogeny analysis indicates pSPRC-Echo1 represents a new lineage of IncHI2-ST1 plasmids and shows how their diversification relates to the era of antibiotics.


Subject(s)
Phylogeny , Plasmids/genetics , Chromosome Mapping , DNA Transposable Elements/genetics , Time Factors
8.
Syst Biol ; 67(3): 490-502, 2018 May 01.
Article in English | MEDLINE | ID: mdl-29186587

ABSTRACT

Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop "guided" proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy.
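
Illustrative note (not part of the record): in sequential Monte Carlo, each particle carries a weight that is updated when new data arrive, and a poorly matched proposal shows up as a collapse of the effective sample size. The sketch below shows the generic weight update and the effective sample size diagnostic. It is not the OPSMC algorithm itself; the function names and the numeric inputs are hypothetical, and the proposal density is assumed to be normalized.

```python
import math

def update_log_weights(log_w, log_post_new, log_post_old, log_proposal):
    """Generic SMC update: w_new = w_old * post_new / (post_old * proposal)."""
    return [w + a - b - q
            for w, a, b, q in zip(log_w, log_post_new, log_post_old, log_proposal)]

def normalize_log_weights(log_w):
    """Turn log-weights into weights summing to 1, avoiding overflow."""
    top = max(log_w)
    unnorm = [math.exp(x - top) for x in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def effective_sample_size(weights):
    """Between 1 (one particle dominates) and len(weights) (all equal)."""
    return 1.0 / sum(w * w for w in weights)

# Hypothetical values for four particles after one new sequence is attached
log_w = [0.0, 0.0, 0.0, 0.0]
log_post_old = [-100.0, -101.0, -100.5, -102.0]
log_post_new = [-120.0, -125.0, -121.0, -140.0]
log_proposal = [-2.0, -2.5, -2.0, -3.0]
new_log_w = update_log_weights(log_w, log_post_new, log_post_old, log_proposal)
weights = normalize_log_weights(new_log_w)
print(weights, effective_sample_size(weights))
```

A proposal that places new sequences where the posterior is high keeps the incremental weights similar across particles and the effective sample size close to the particle count.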


Subject(s)
Classification/methods , Biological Models , Phylogeny , Algorithms , Bayes Theorem , Internet , Monte Carlo Method
9.
BMC Evol Biol ; 17(1): 118, 2017 05 25.
Article in English | MEDLINE | ID: mdl-28545432

ABSTRACT

BACKGROUND: Wild birds are the major reservoir hosts for influenza A viruses (AIVs) and have been implicated in the emergence of pandemic events in livestock and human populations. Understanding how AIVs spread within and across continents is therefore critical to the development of successful strategies to manage and reduce the impact of influenza outbreaks. In North America many bird species undergo seasonal migratory movements along a North-South axis, thereby providing opportunities for viruses to spread over long distances. However, the role played by such avian flyways in shaping the genetic structure of AIV populations remains uncertain. RESULTS: To assess the relative contribution of bird migration along flyways to the genetic structure of AIV we performed a large-scale phylogeographic study of viruses sampled in the USA and Canada, involving the analysis of 3805 to 4505 sequences from 36 to 38 geographic localities depending on the gene segment data set. To assist in this we developed a maximum likelihood-based genetic algorithm to explore a wide range of complex spatial models, depicting a more complete picture of the migration network than determined previously. CONCLUSIONS: Based on phylogenies estimated from nucleotide sequence data sets, our results show that AIV migration rates are significantly higher within than between flyways, indicating that the migratory patterns of birds play a key role in viral dispersal. These findings provide valuable insights into the evolution, maintenance and transmission of AIVs, in turn allowing the development of improved programs for surveillance and risk assessment.


Subject(s)
Animal Migration , Birds/virology , Influenza in Birds/virology , Animals , Wild Animals , Canada/epidemiology , Disease Outbreaks , Humans , Influenza A virus/genetics , Influenza in Birds/epidemiology , Likelihood Functions , Phylogeny , Phylogeography , United States/epidemiology
10.
J Virol ; 89(18): 9689-92, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26136576

ABSTRACT

Influenza B virus causes significant disease but remains understudied in tropical regions. We sequenced 72 influenza B viruses collected in Kuala Lumpur, Malaysia, from 1995 to 2008. The predominant circulating lineage (Victoria or Yamagata) changed every 1 to 3 years, and these shifts were associated with increased incidence of influenza B. We also found poor lineage matches with recommended influenza virus vaccine strains. While most influenza B virus lineages in Malaysia were short-lived, one circulated for 3 to 4 years.


Subject(s)
Molecular Evolution , Influenza B virus/genetics , Human Influenza/genetics , Base Sequence , Female , Humans , Human Influenza/epidemiology , Malaysia/epidemiology , Male , Molecular Sequence Data
11.
BMC Evol Biol ; 15: 120, 2015 Jun 26.
Article in English | MEDLINE | ID: mdl-26111936

ABSTRACT

BACKGROUND: Wild birds are the major reservoir hosts for influenza A viruses, occasionally transmitting to other species such as domesticated poultry. Despite an abundance of genomic data from avian influenza virus (AIV), little is known about whether AIV evolves differently in wild birds and poultry, although this is critical to revealing the dynamics and time-scale of viral evolution. In particular, because environmental (water-borne) transmission is more common in wild birds, which may reduce the number of replications per unit time, it is possible that evolutionary rates are systematically lower in wild birds than in poultry. RESULTS: We estimated rates of nucleotide substitution in two AIV subtypes that are strongly associated with infections in wild birds - H4 and H6 - and compared these to rates in the H5N1 subtype that has circulated in poultry for almost two decades. Our analyses of three internal genes confirm that H4 and H6 viruses are evolving significantly more slowly than H5N1 viruses, suggesting that evolutionary rates of AIV are reduced in wild birds. This result was verified by the analysis of a poultry-associated H6 lineage that exhibited a markedly higher substitution rate than those H6 viruses circulating in wild birds. Interestingly, we also observed a significant difference in evolutionary rate between H4 and H6, despite frequent reassortment among them. CONCLUSIONS: AIV experiences markedly different evolutionary dynamics between wild birds and poultry. These results suggest that rate heterogeneity among viral subtypes and ecological groupings should be taken into account when estimating evolutionary rates and divergence times.


Subject(s)
Influenza A Virus H5N1 Subtype/genetics , Influenza A virus/genetics , Influenza in Birds/virology , Poultry Diseases/virology , Animals , Wild Animals , Biological Evolution , Birds , Influenza A virus/classification , Poultry
12.
PLoS Pathog ; 9(8): e1003570, 2013.
Article in English | MEDLINE | ID: mdl-24009503

ABSTRACT

Wild birds have been implicated in the emergence of human and livestock influenza. The successful prediction of viral spread and disease emergence, as well as formulation of preparedness plans have been hampered by a critical lack of knowledge of viral movements between different host populations. The patterns of viral spread and subsequent risk posed by wild bird viruses therefore remain unpredictable. Here we analyze genomic data, including 287 newly sequenced avian influenza A virus (AIV) samples isolated over a 34-year period of continuous systematic surveillance of North American migratory birds. We use a Bayesian statistical framework to test hypotheses of viral migration, population structure and patterns of genetic reassortment. Our results reveal that despite the high prevalence of Charadriiformes infected in Delaware Bay this host population does not appear to significantly contribute to the North American AIV diversity sampled in Anseriformes. In contrast, influenza viruses sampled from Anseriformes in Alberta are representative of the AIV diversity circulating in North American Anseriformes. While AIV may be restricted to specific migratory flyways over short time frames, our large-scale analysis showed that the long-term persistence of AIV was independent of bird flyways with migration between populations throughout North America. Analysis of long-term surveillance data provides vital insights to develop appropriately informed predictive models critical for pandemic preparedness and livestock protection.


Subject(s)
Animal Migration , Charadriiformes/virology , Influenza A virus , Influenza in Birds/epidemiology , Biological Models , Animals , Humans , Influenza in Birds/transmission , North America/epidemiology
13.
BMC Evol Biol ; 14: 163, 2014 Jul 24.
Article in English | MEDLINE | ID: mdl-25055743

ABSTRACT

BACKGROUND: Early methods for estimating divergence times from gene sequence data relied on the assumption of a molecular clock. More sophisticated methods were created to model rate variation and used auto-correlation of rates, local clocks, or the so-called "uncorrelated relaxed clock" where substitution rates are assumed to be drawn from a parametric distribution. In the case of Bayesian inference methods the impact of the prior on branching times is not clearly understood, and if the amount of data is limited the posterior could be strongly influenced by the prior. RESULTS: We develop a maximum likelihood method, Physher, that uses local or discrete clocks to estimate evolutionary rates and divergence times from heterochronous sequence data. Using two empirical data sets we show that our discrete clock estimates are similar to those obtained by other methods, and that Physher outperformed some methods in the estimation of the root age of an influenza virus data set. A simulation analysis suggests that Physher can outperform a Bayesian method when the real topology contains two long branches below the root node, even when evolution is strongly clock-like. CONCLUSIONS: These results suggest it is advisable to use a variety of methods to estimate evolutionary rates and divergence times from heterochronous sequence data. Physher and the associated data sets used here are available online at http://code.google.com/p/physher/.


Subject(s)
Likelihood Functions , Genetic Models , Phylogeny , Bayes Theorem , Biological Evolution , Computer Simulation , Molecular Evolution , Influenza B virus/genetics
14.
J Virol ; 87(18): 10182-9, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23864623

ABSTRACT

Influenza A H10N7 virus with a hemagglutinin gene of North American origin was detected in Australian chickens and poultry abattoir workers in New South Wales, Australia, in 2010 and in chickens in Queensland, Australia, on a mixed chicken and domestic duck farm in 2012. We investigated their genomic origins by sequencing full and partial genomes of H10 viruses isolated from wild aquatic birds and poultry in Australia and analyzed them with all available avian influenza virus sequences from Oceania and representative viruses from North America and Eurasia. Our analysis showed that the H10N7 viruses isolated from poultry were similar to those that have been circulating since 2009 in Australian aquatic birds and that their initial transmission into Australia occurred during 2007 and 2008. The H10 viruses that appear to have developed endemicity in Australian wild aquatic birds were derived from several viruses circulating in waterfowl along various flyways. Their hemagglutinin gene was derived from aquatic birds in the western states of the United States, whereas the neuraminidase was closely related to that from viruses previously detected in waterfowl in Japan. The remaining genes were derived from Eurasian avian influenza virus lineages. Our analysis of virological data spanning 40 years in Oceania indicates that the long-term evolutionary dynamics of avian influenza viruses in Australia may be determined by climatic changes. The introduction and long-term persistence of avian influenza virus lineages were observed during periods with increased rainfall, whereas bottlenecks and extinction were observed during phases of widespread decreases in rainfall. These results extend our understanding of factors affecting the dynamics of avian influenza and provide important considerations for surveillance and disease control strategies.


Subject(s)
Influenza A virus/classification , Influenza A virus/isolation & purification , Influenza in Birds/epidemiology , Influenza in Birds/virology , Animals , Australia/epidemiology , Birds , Cluster Analysis , Molecular Evolution , Influenza Virus Hemagglutinin Glycoproteins/genetics , Influenza A virus/genetics , Molecular Epidemiology , Molecular Sequence Data , Phylogeny , Poultry , Viral RNA/genetics , DNA Sequence Analysis
15.
Emerg Infect Dis ; 19(9), 2013.
Article in English | MEDLINE | ID: mdl-23968540

ABSTRACT

Human infection with avian influenza A(H9N2) virus was identified in Bangladesh in 2011. Surveillance for influenza viruses in apparently healthy poultry in live-bird markets in Bangladesh during 2008-2011 showed that subtype H9N2 viruses are isolated year-round, whereas highly pathogenic subtype H5N1 viruses are co-isolated with subtype H9N2 primarily during the winter months. Phylogenetic analysis of the subtype H9N2 viruses showed that they are reassortants possessing 3 gene segments related to subtype H7N3; the remaining gene segments were from the subtype H9N2 G1 clade. We detected no reassortment with subtype H5N1 viruses. Serologic analyses of subtype H9N2 viruses from chickens revealed antigenic conservation, whereas analyses of viruses from quail showed antigenic drift. Molecular analysis showed that multiple mammalian-specific mutations have become fixed in the subtype H9N2 viruses, including changes in the hemagglutinin, matrix, and polymerase proteins. Our results indicate that these viruses could mutate to be transmissible from birds to mammals, including humans.


Subject(s)
Influenza A Virus H9N2 Subtype/genetics , Influenza A Virus H9N2 Subtype/immunology , Influenza in Birds/epidemiology , Human Influenza/epidemiology , Animals , Viral Antigens/immunology , Bangladesh/epidemiology , Chickens , Viral Genes , Humans , Influenza A Virus H9N2 Subtype/classification , Influenza in Birds/virology , Human Influenza/virology , Molecular Sequence Data , Phylogeny , Prevalence , Quail
16.
Mol Biol Evol ; 29(2): 451-6, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22045998

ABSTRACT

Rate heterogeneity among lineages is a common feature of molecular evolution, and it has long impeded our ability to accurately estimate the age of evolutionary divergence events. The development of relaxed molecular clocks, which model variable substitution rates among lineages, was intended to rectify this problem. Major subtypes of pandemic HIV-1 group M are thought to exemplify closely related lineages with different substitution rates. Here, we report that inferring the time of most recent common ancestor of all these subtypes in a single phylogeny under a single (relaxed) molecular clock produces significantly different dates for many of the subtypes than does analysis of each subtype on its own. We explore various methods to ameliorate this problem. We conclude that current molecular dating methods are inadequate for dealing with this type of substitution rate variation in HIV-1. Through simulation, we show that heterotachy causes root ages to be overestimated.


Subject(s)
Molecular Evolution , Genetic Variation , HIV-1/classification , HIV-1/genetics , Biological Evolution , HIV-1/metabolism , Humans , Genetic Models , Mutation Rate , Phylogeny , Time Factors
17.
Proc Natl Acad Sci U S A ; 107(23): 10561-6, 2010 Jun 08.
Article in English | MEDLINE | ID: mdl-20498054

ABSTRACT

We investigated two mitochondrial genes (cytb and cox1), one plastid gene (tufA), and one nuclear gene (ldh) in blood samples from 12 chimpanzees and two gorillas from Cameroon and one lemur from Madagascar. One gorilla sample is related to Plasmodium falciparum, thus confirming the recently reported presence in gorillas of this parasite. The second gorilla sample is more similar to the recently defined Plasmodium gaboni than to the P. falciparum-Plasmodium reichenowi clade, but distinct from both. Two chimpanzee samples are P. falciparum. A third sample is P. reichenowi and two others are P. gaboni. The other chimpanzee samples are different from those in the ape clade: two are Plasmodium ovale, and one is Plasmodium malariae. That is, we have found three human Plasmodium parasites in chimpanzees. Four chimpanzee samples were mixed: one species was P. reichenowi; the other species was P. gaboni in three samples and P. ovale in the fourth sample. The lemur sample, provisionally named Plasmodium malagasi, is a sister lineage to the large cluster of primate parasites that does not include P. falciparum or ape parasites, suggesting that the falciparum + ape parasite cluster (Laverania clade) may have evolved from a parasite present in hosts not ancestral to the primates. If malignant malaria were eradicated from human populations, chimpanzees, in addition to gorillas, might serve as a reservoir for P. falciparum.


Subject(s)
Gorilla gorilla/parasitology , Lemur/parasitology , Pan troglodytes/parasitology , Plasmodium falciparum/genetics , Animals , Molecular Sequence Data , Phylogeny
18.
Algorithms Mol Biol ; 18(1): 10, 2023 Jul 31.
Article in English | MEDLINE | ID: mdl-37525243

ABSTRACT

Bayesian phylogenetics is a computationally challenging inferential problem. Classical methods are based on random-walk Markov chain Monte Carlo (MCMC), where random proposals are made on the tree parameter and the continuous parameters simultaneously. Variational phylogenetics is a promising alternative to MCMC, in which one fits an approximating distribution to the unnormalized phylogenetic posterior. Previous work fit this variational approximation using stochastic gradient descent, which is the canonical way of fitting general variational approximations. However, phylogenetic trees are special structures, giving opportunities for efficient computation. In this paper we describe a new algorithm that directly generalizes the Felsenstein pruning algorithm (a.k.a. sum-product algorithm) to compute a composite-like likelihood by marginalizing out ancestral states and subtrees simultaneously. We show the utility of this algorithm by rapidly making point estimates for branch lengths of a multi-tree phylogenetic model. These estimates accord with a long MCMC run and with estimates obtained using a variational method, but are much faster to obtain. Thus, although generalized pruning does not lead to a variational algorithm as such, we believe that it will form a useful starting point for variational inference.

19.
Genome Biol Evol ; 15(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37265233

ABSTRACT

Gradients of probabilistic model likelihoods with respect to their parameters are essential for modern computational statistics and machine learning. These calculations are readily available for arbitrary models via "automatic differentiation" implemented in general-purpose machine-learning libraries such as TensorFlow and PyTorch. Although these libraries are highly optimized, it is not clear if their general-purpose nature will limit their algorithmic complexity or implementation speed for the phylogenetic case compared to phylogenetics-specific code. In this paper, we compare six gradient implementations of the phylogenetic likelihood functions, in isolation and also as part of a variational inference procedure. We find that although automatic differentiation can scale approximately linearly in tree size, it is much slower than the carefully implemented gradient calculation for tree likelihood and ratio transformation operations. We conclude that a mixed approach combining phylogenetic libraries with machine learning libraries will provide the optimal combination of speed and model flexibility moving forward.


Subject(s)
Machine Learning , Statistical Models , Phylogeny , Likelihood Functions , Algorithms
20.
ArXiv ; 2023 Mar 08.
Article in English | MEDLINE | ID: mdl-36945693

ABSTRACT

The rapid growth in genomic pathogen data spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences $N$. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters that traditionally takes $\mathcal{O}(N^2)$ operations using the standard pruning algorithm. A recent study proposes an approach to calculate this gradient in $\mathcal{O}(N)$, enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the calculation of the gradient computationally tractable for nucleotide-based models but falls short in performance for larger state-space size models, such as codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and result in many-fold higher speedups over previous CPU implementations. We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples: carnivores, dengue and yeast, and observe a greater than 128-fold speedup over the CPU implementation for codon-based models and greater than 8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task previously intractable. We provide an implementation of our GPU algorithms in BEAGLE v4.0.0, an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs.
