Results 1 - 20 of 31
1.
Proc Natl Acad Sci U S A ; 121(3): e2318989121, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38215186

ABSTRACT

The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC's infinitesimal generator (rate) matrix. Motivated by the derivative's extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a "blessing of dimensionality" result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Algorithms , COVID-19/epidemiology , Markov Chains
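The exact derivative of the transition matrix is the integral d/dθ exp(tQ) = ∫_0^t exp(sQ) (dQ/dθ) exp((t-s)Q) ds. The naive first-order approximation ignores the ordering inside this integral and keeps only t (dQ/dθ) exp(tQ), which is exact whenever Q and dQ/dθ commute. The pure-Python sketch below compares the two on a toy problem. It rests on several assumptions: the approximation is written with dQ/dθ on the left (the paper may use the other ordering), a central finite difference stands in for the exact derivative, and the 3-state generator and all function names are hypothetical.

```python
import math


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, c):
    return [[c * x for x in row] for row in a]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def expm(a, terms=30):
    """exp(A) by scaling and squaring with a truncated Taylor series (fine for small toy matrices)."""
    norm = max(sum(abs(x) for x in row) for row in a)  # infinity norm
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = scale(a, 1.0 / 2 ** s)  # infinity norm is now at most 1/2
    result, term = identity(len(a)), identity(len(a))
    for k in range(1, terms):
        term = scale(matmul(term, scaled), 1.0 / k)  # scaled^k / k!
        result = add(result, term)
    for _ in range(s):
        result = matmul(result, result)  # exp(A) = exp(A / 2^s)^(2^s)
    return result


def fd_derivative(q, dq, t, h=1e-6):
    """Reference value: central difference of exp(t (Q + h dQ)) with respect to h at h = 0."""
    plus = expm(scale(add(q, scale(dq, h)), t))
    minus = expm(scale(add(q, scale(dq, -h)), t))
    return scale(add(plus, scale(minus, -1.0)), 1.0 / (2 * h))


def first_order_approx(q, dq, t):
    """Naive approximation t * dQ * exp(tQ); exact whenever Q and dQ commute."""
    return scale(matmul(dq, expm(scale(q, t))), t)


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical 3-state generator (rows sum to zero) and the direction dQ/dq_01:
# raising rate q_01 also lowers q_00 so that row 0 still sums to zero.
Q = [[-1.0, 0.6, 0.4], [0.3, -0.5, 0.2], [0.5, 0.5, -1.0]]
dQ = [[-1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

Expanding both sides in powers of t shows that the approximation minus the exact derivative starts at -(t²/2)(Q dQ - dQ Q), so on this toy generator the error shrinks roughly as t² for short branches.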
2.
Proc Natl Acad Sci U S A ; 120(7): e2208851120, 2023 02 14.
Article in English | MEDLINE | ID: mdl-36757894

ABSTRACT

The birth-death model is commonly used to infer speciation and extinction rates by fitting the model to phylogenetic trees with exclusively extant taxa. Recently, it was demonstrated that speciation and extinction rates are not identifiable if the rates are allowed to vary freely over time. The group of birth-death models that have the same likelihood is called a congruence class, and there is no statistical evidence to favor one model in the class over another. This issue has led researchers to question whether, and which, patterns can reliably be inferred from phylogenies of only extant taxa, and whether time-variable birth-death models should be fitted at all. We explore the congruence class in the context of several empirical phylogenies as well as hypothetical scenarios. For these empirical phylogenies, we assume that we inferred the true congruence class. Thus, our conclusions apply to any empirical phylogeny for which we robustly inferred the true congruence class. When we summarize shared patterns in the congruence class, we show that strong directional trends in speciation and extinction rates are shared among most models. Therefore, we conclude that the inference of strong directional trends is robust. Conversely, estimates of constant rates or gentle slopes are not robust and must be treated with caution. Interestingly, the space of valid speciation rates is narrower than that of extinction rates, which are less constrained. These results provide further evidence and insights that speciation rates can be estimated more reliably than extinction rates.


Subject(s)
Extinction, Biological , Parturition , Female , Pregnancy , Humans , Phylogeny , Probability , Genetic Speciation
3.
Syst Biol ; 2024 May 07.
Article in English | MEDLINE | ID: mdl-38712512

ABSTRACT

Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random-effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.
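The abstract does not spell out the parameterization, so the sketch below assumes one simple form of a random-effects substitution model: each off-diagonal rate of an HKY generator is multiplied by exp(ε_ij), with one hypothetical random effect ε_ij per rate. Because ε_ij need not equal ε_ji, the extended model can be nonreversible, and the cycle_gap helper checks this with Kolmogorov's cycle criterion. The base frequencies, κ, and all function names are illustrative assumptions, not the paper's code.

```python
import math
from itertools import permutations

BASES = "ACGT"
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def fill_diagonal(q):
    """Set each diagonal entry so that every row of the generator sums to zero."""
    for i in range(len(q)):
        q[i][i] = -sum(q[i][j] for j in range(len(q)) if j != i)
    return q


def hky_generator(pi, kappa):
    """Unnormalized HKY rates: kappa * pi_j for transitions, pi_j for transversions."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                q[i][j] = (kappa if frozenset((a, b)) in TRANSITIONS else 1.0) * pi[j]
    return fill_diagonal(q)


def with_random_effects(base, eps):
    """Hypothetical random-effects layer: multiply each off-diagonal rate by exp(eps_ij)."""
    n = len(base)
    q = [[base[i][j] * math.exp(eps[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    return fill_diagonal(q)


def cycle_gap(q):
    """Largest |log(q_ij q_jk q_ki) - log(q_ik q_kj q_ji)| over all 3-cycles.

    By Kolmogorov's criterion a reversible chain has equal forward and backward
    cycle products, so a clearly nonzero gap certifies nonreversibility.
    """
    return max(
        abs(math.log(q[i][j] * q[j][k] * q[k][i]) - math.log(q[i][k] * q[k][j] * q[j][i]))
        for i, j, k in permutations(range(len(q)), 3)
    )


pi = [0.3, 0.2, 0.2, 0.3]  # assumed base frequencies for A, C, G, T
base = hky_generator(pi, kappa=4.0)
zero = [[0.0] * 4 for _ in range(4)]
asymmetric = [[0.0] * 4 for _ in range(4)]
asymmetric[0][1] = 0.5  # only the A -> C rate is perturbed
```

Setting every ε_ij to zero recovers the base HKY model exactly, while a single asymmetric effect already makes the process nonreversible.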

4.
PLoS Comput Biol ; 20(3): e1011640, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38551979

ABSTRACT

Birth-death models play a key role in phylodynamic analysis for their interpretation in terms of key epidemiological parameters. In particular, models with piecewise-constant rates varying at different epochs in time, to which we refer as episodic birth-death-sampling (EBDS) models, are valuable for their reflection of changing transmission dynamics over time. A challenge, however, that persists with current time-varying model inference procedures is their lack of computational efficiency. This limitation hinders the full utilization of these models in large-scale phylodynamic analyses, especially when dealing with high-dimensional parameter vectors that exhibit strong correlations. We present here a linear-time algorithm to compute the gradient of the birth-death model sampling density with respect to all time-varying parameters, and we implement this algorithm within a gradient-based Hamiltonian Monte Carlo (HMC) sampler to alleviate the computational burden of conducting inference under a wide variety of structures of, as well as priors for, EBDS processes. We assess this approach using three real-world data examples: the HIV epidemic in Odesa, Ukraine; seasonal influenza A/H3N2 virus dynamics in New York State, USA; and the Ebola outbreak in West Africa. HMC sampling exhibits a substantial efficiency boost, delivering a 10- to 200-fold increase in minimum effective sample size per unit time, in comparison to a Metropolis-Hastings-based approach. Additionally, we show the robustness of our implementation both in allowing for flexible prior choices and in modeling the transmission dynamics of various pathogens by accurately capturing the changing trend of viral effective reproductive number.


Subject(s)
Epidemics , Hemorrhagic Fever, Ebola , Influenza, Human , Humans , Influenza A Virus, H3N2 Subtype , Algorithms , Influenza, Human/epidemiology , Hemorrhagic Fever, Ebola/epidemiology , Monte Carlo Method
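The linear-time gradient itself is not reproduced here. As a hedged illustration of how such a gradient is consumed, the sketch below shows a generic Hamiltonian Monte Carlo transition with a leapfrog integrator, run on a toy standard bivariate normal target that stands in for the EBDS posterior. The step size (0.1), the number of leapfrog steps (20), and the function names are arbitrary assumptions.

```python
import math
import random


def leapfrog(q, p, grad_log_density, step, n_steps):
    """Leapfrog integration for H(q, p) = -log pi(q) + |p|^2 / 2 (identity mass matrix)."""
    q, p = list(q), list(p)
    g = grad_log_density(q)
    p = [pj + 0.5 * step * gj for pj, gj in zip(p, g)]  # initial half step in momentum
    for i in range(n_steps):
        q = [qj + step * pj for qj, pj in zip(q, p)]  # full step in position
        g = grad_log_density(q)
        w = 1.0 if i < n_steps - 1 else 0.5  # the last momentum update is a half step
        p = [pj + w * step * gj for pj, gj in zip(p, g)]
    return q, p


def hamiltonian(q, p, log_density):
    return -log_density(q) + 0.5 * sum(pj * pj for pj in p)


def hmc_step(q, log_density, grad_log_density, step, n_steps, rng):
    """One HMC transition: fresh Gaussian momentum, leapfrog proposal, Metropolis correction."""
    p0 = [rng.gauss(0.0, 1.0) for _ in q]
    q1, p1 = leapfrog(q, p0, grad_log_density, step, n_steps)
    log_accept = hamiltonian(q, p0, log_density) - hamiltonian(q1, p1, log_density)
    return q1 if rng.random() < math.exp(min(0.0, log_accept)) else list(q)


# Toy target: standard bivariate normal (a stand-in for an EBDS posterior).
def log_density(q):
    return -0.5 * sum(x * x for x in q)


def grad_log_density(q):
    return [-x for x in q]
```

In a real EBDS analysis, log_density and grad_log_density would be replaced by the model's posterior density and the gradient described in the abstract; each HMC step then costs one gradient evaluation per leapfrog step.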
5.
Mol Biol Evol ; 38(10): 4603-4615, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34043795

ABSTRACT

Likelihood-based phylogenetic inference posits a probabilistic model of character state change along branches of a phylogenetic tree. These models typically assume statistical independence of sites in the sequence alignment. This is a restrictive assumption that facilitates computational tractability, but ignores how epistasis, the effect of genetic background on mutational effects, influences the evolution of functional sequences. We consider the effect of using a misspecified site-independent model on the accuracy of Bayesian phylogenetic inference in the setting of pairwise-site epistasis. Previous work has shown that as alignment length increases, tree reconstruction accuracy also increases. Here, we present a simulation study demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled. We introduce an alignment-based test statistic that is a diagnostic for pairwise epistasis and can be used in posterior predictive checks.


Subject(s)
Evolution, Molecular , Models, Genetic , Bayes Theorem , Computer Simulation , Epistasis, Genetic , Likelihood Functions , Phylogeny
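The abstract does not define its alignment-based test statistic. As a hypothetical stand-in, the sketch below uses the largest empirical mutual information between any two alignment columns, which is zero when the joint column frequencies factor exactly and grows when pairs of sites co-vary. It pairs this with the usual posterior predictive p-value, computed against statistics from alignments simulated under the fitted site-independent model. All names are illustrative.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Empirical mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    joint = Counter(zip(col_a, col_b))
    freq_a, freq_b = Counter(col_a), Counter(col_b)
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b)))
    return sum((c / n) * math.log(c * n / (freq_a[a] * freq_b[b])) for (a, b), c in joint.items())


def max_pairwise_mi(alignment):
    """Largest mutual information over all column pairs; alignment is a list of equal-length strings."""
    columns = list(zip(*alignment))
    return max(
        column_mutual_information(columns[i], columns[j])
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
    )


def posterior_predictive_p(observed, replicated):
    """Share of replicated statistics (from simulated alignments) at least as large as the observed one."""
    return sum(r >= observed for r in replicated) / len(replicated)
```

A small p-value would indicate more column-pair dependence in the real alignment than the site-independent model tends to produce.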
6.
Syst Biol ; 69(2): 280-293, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31504997

ABSTRACT

Bayesian Markov chain Monte Carlo explores tree space slowly, in part because it frequently returns to the same tree topology. An alternative strategy would be to explore tree space systematically, and never return to the same topology. In this article, we present an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, which we show to be a good approximation of the high posterior set of tree topologies on the data sets analyzed. Here, "likelihood" of a topology refers to the tree likelihood for the corresponding tree with optimized branch lengths. We call this method "phylogenetic topographer" (PT). The PT strategy is very simple: starting in a number of local topology maxima (obtained by hill-climbing from random starting points), explore out using local topology rearrangements, only continuing through topologies that are better than some likelihood threshold below the best observed topology. We show that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies. By using a nonblocking hash table keyed on unique representations of tree topologies, we avoid visiting topologies more than once across all concurrent threads exploring tree space. We demonstrate that PT can be used directly to approximate a Bayesian consensus tree topology. When combined with an accurate means of evaluating per-topology marginal likelihoods, PT gives an alternative procedure for obtaining Bayesian posterior distributions on phylogenetic tree topologies.


Subject(s)
Classification/methods , Phylogeny , Algorithms , Bayes Theorem , Likelihood Functions
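A minimal, single-threaded sketch of the exploration rule described in the abstract, written over an abstract state space. The search starts from local optima and expands neighbors. It continues only through states whose score is within delta of the best score seen so far, and it records every scored state in a hash table so that nothing is evaluated twice. A plain dict stands in for the paper's nonblocking concurrent hash table, and the toy integer states, scores, and function names are hypothetical. Real use would need canonical, hashable topology representations and a rearrangement-based neighbors function.

```python
import math
from collections import deque


def explore(starts, neighbors, score, delta):
    """Threshold-bounded search that scores every state at most once.

    starts: local optima (e.g. from hill climbing); neighbors(x): adjacent states
    (e.g. topologies one rearrangement away); score(x): e.g. log-likelihood with
    optimized branch lengths; delta: how far below the best score to keep going.
    """
    seen = {}  # state -> score; a sequential stand-in for the shared hash table
    queue = deque()
    for s in starts:
        if s not in seen:
            seen[s] = score(s)
            queue.append(s)
    best = max(seen.values())
    while queue:
        x = queue.popleft()
        if seen[x] < best - delta:
            continue  # the best score rose after x was queued, so x no longer qualifies
        for y in neighbors(x):
            if y in seen:
                continue  # never revisit a state
            seen[y] = score(y)
            best = max(best, seen[y])
            if seen[y] >= best - delta:
                queue.append(y)
    return {x: v for x, v in seen.items() if v >= best - delta}


def normalized_weights(scores):
    """exp(score) normalized over the kept states (a proxy for posterior probabilities)."""
    top = max(scores.values())
    total = sum(math.exp(v - top) for v in scores.values())
    return {x: math.exp(v - top) / total for x, v in scores.items()}
```

The normalized_weights helper mirrors the abstract's use of normalized topology likelihoods as a proxy for posterior probabilities.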
7.
Syst Biol ; 69(2): 209-220, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31504998

ABSTRACT

The marginal likelihood of a model is a key quantity for assessing the evidence provided by the data in support of a model. The marginal likelihood is the normalizing constant for the posterior density, obtained by integrating the product of the likelihood and the prior with respect to model parameters. Thus, the computational burden of computing the marginal likelihood scales with the dimension of the parameter space. In phylogenetics, where we work with tree topologies that are high-dimensional models, standard approaches to computing marginal likelihoods are very slow. Here, we study methods to quickly compute the marginal likelihood of a single fixed tree topology. We benchmark the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real data sets under the JC69 model. These methods include several new ones that we develop explicitly to solve this problem, as well as existing algorithms that we apply to phylogenetic models for the first time. Altogether, our results show that the accuracy of these methods varies widely, and that accuracy does not necessarily correlate with computational burden. Our newly developed methods are orders of magnitude faster than standard approaches, and in some cases, their accuracy rivals the best established estimators.


Subject(s)
Classification/methods , Phylogeny , Computational Biology/standards , Likelihood Functions
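To make the quantity concrete, the sketch below computes the marginal likelihood for the simplest possible fixed topology, two sequences joined by a single branch of length t, under JC69 with an assumed Exponential(rate 10) prior on t. It compares a trapezoid-rule reference value with naive Monte Carlo that averages the likelihood over prior draws. That estimator is one of the simplest baselines and is not claimed to reflect any specific method from the benchmark. The site counts (20 sites, 3 differences), the prior, and the function names are hypothetical.

```python
import math
import random


def jc69_likelihood(t, n_sites, n_diff):
    """Likelihood of two aligned sequences separated by branch length t under JC69.

    Each site contributes 1/4 (stationary base frequency) times the probability of
    staying the same (n_sites - n_diff sites) or of changing to a specific other base.
    """
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e
    p_diff = 0.25 - 0.25 * e
    return (0.25 * p_same) ** (n_sites - n_diff) * (0.25 * p_diff) ** n_diff


def marginal_by_quadrature(n_sites, n_diff, rate=10.0, t_max=5.0, n_grid=20000):
    """Reference value: trapezoid rule for the integral of likelihood x Exponential(rate) prior."""
    h = t_max / n_grid
    f = [jc69_likelihood(i * h, n_sites, n_diff) * rate * math.exp(-rate * i * h)
         for i in range(n_grid + 1)]
    return h * (sum(f) - 0.5 * (f[0] + f[-1]))


def marginal_by_prior_sampling(n_sites, n_diff, rate=10.0, n_samples=100_000, seed=1):
    """Naive Monte Carlo: average the likelihood over branch lengths drawn from the prior."""
    rng = random.Random(seed)
    draws = (rng.expovariate(rate) for _ in range(n_samples))
    return sum(jc69_likelihood(t, n_sites, n_diff) for t in draws) / n_samples
```

For a real tree the integral runs over every branch length at once, so one-dimensional quadrature no longer applies and dedicated estimators are needed.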
8.
PLoS Comput Biol ; 16(10): e1007999, 2020 10.
Article in English | MEDLINE | ID: mdl-33112848

ABSTRACT

Birth-death processes have given biologists a model-based framework to answer questions about changes in the birth and death rates of lineages in a phylogenetic tree. Therefore, birth-death models are central to macroevolutionary as well as phylodynamic analyses. Early approaches to studying temporal variation in birth and death rates using birth-death models faced difficulties due to the restrictive choices of birth and death rate curves through time. Sufficiently flexible time-varying birth-death models are still lacking. We use a piecewise-constant birth-death model, combined with both Gaussian Markov random field (GMRF) and horseshoe Markov random field (HSMRF) prior distributions, to approximate arbitrary changes in birth rate through time. We implement these models in the widely used statistical phylogenetic software platform RevBayes, allowing us to jointly estimate birth-death process parameters, phylogeny, and nuisance parameters in a Bayesian framework. We test both GMRF-based and HSMRF-based models on a variety of simulated diversification scenarios, and then apply them to both a macroevolutionary and an epidemiological dataset. We find that both models are capable of inferring variable birth rates and correctly rejecting variable models in favor of effectively constant models. In general, the HSMRF-based model has higher precision than its GMRF counterpart, with little to no loss of accuracy. Applied to a macroevolutionary dataset of the Australian gecko family Pygopodidae (where birth rates are interpretable as speciation rates), the GMRF-based model detects a slow decrease whereas the HSMRF-based model detects a rapid speciation-rate decrease in the last 12 million years. Applied to an infectious disease phylodynamic dataset of sequences from HIV subtype A in Russia and Ukraine (where birth rates are interpretable as the rate of accumulation of new infections), our models detect a strongly elevated rate of infection in the 1990s.


Subject(s)
Birth Rate , Models, Biological , Models, Statistical , Mortality , Algorithms , Animals , Bayes Theorem , Biological Evolution , Computational Biology , Computer Simulation , Lizards/physiology
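Both priors act on the increments of the log birth rate between adjacent epochs of the piecewise-constant model. The sketch below draws prior trajectories under one common formulation, which is an assumption and not necessarily the exact RevBayes parameterization. GMRF increments share a single global scale, while each HSMRF increment gets an extra half-Cauchy local scale, which permits occasional large jumps between flat stretches. The global scale value and function names are hypothetical.

```python
import math
import random


def half_cauchy(rng, scale):
    """Half-Cauchy(0, scale) draw via the inverse CDF: |scale * tan(pi * (U - 1/2))|."""
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))


def draw_log_rates(n_epochs, prior, rng, global_scale=0.1, start=0.0):
    """Prior draw of a log-rate trajectory under a first-order (random-walk) Markov random field.

    "gmrf":  increment_i ~ Normal(0, sigma)
    "hsmrf": increment_i ~ Normal(0, sigma * tau_i), tau_i ~ half-Cauchy(0, 1)
    In both cases sigma ~ half-Cauchy(0, global_scale) (an assumed hyperprior).
    """
    if prior not in ("gmrf", "hsmrf"):
        raise ValueError("prior must be 'gmrf' or 'hsmrf'")
    sigma = half_cauchy(rng, global_scale)
    log_rates = [start]
    for _ in range(n_epochs - 1):
        tau = half_cauchy(rng, 1.0) if prior == "hsmrf" else 1.0
        log_rates.append(log_rates[-1] + rng.gauss(0.0, sigma * tau))
    return log_rates
```

Exponentiating each entry of a trajectory gives the birth rate in the corresponding epoch.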
9.
Biometrics ; 76(3): 677-690, 2020 09.
Article in English | MEDLINE | ID: mdl-32277713

ABSTRACT

Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state-of-the-art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change-point models or Gaussian process priors. Change-point models suffer from computational issues when the number of change-points is unknown and needs to be estimated. Gaussian process-based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log-transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change-point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state-of-the-art methods.


Subject(s)
Genetics, Population , Models, Statistical , Bayes Theorem , Population Density , Population Dynamics
10.
Nature ; 472(7344): 471-5, 2011 Apr 28.
Article in English | MEDLINE | ID: mdl-21525931

ABSTRACT

Innate immune cells must be able to distinguish between direct binding to microbes and detection of components shed from the surface of microbes located at a distance. Dectin-1 (also known as CLEC7A) is a pattern-recognition receptor expressed by myeloid phagocytes (macrophages, dendritic cells and neutrophils) that detects β-glucans in fungal cell walls and triggers direct cellular antimicrobial activity, including phagocytosis and production of reactive oxygen species (ROS). In contrast to inflammatory responses stimulated upon detection of soluble ligands by other pattern-recognition receptors, such as Toll-like receptors (TLRs), these responses are only useful when a cell comes into direct contact with a microbe and must not be spuriously activated by soluble stimuli. In this study we show that, despite its ability to bind both soluble and particulate β-glucan polymers, Dectin-1 signalling is only activated by particulate β-glucans, which cluster the receptor in synapse-like structures from which regulatory tyrosine phosphatases CD45 and CD148 (also known as PTPRC and PTPRJ, respectively) are excluded (Supplementary Fig. 1). The 'phagocytic synapse' now provides a model mechanism by which innate immune receptors can distinguish direct microbial contact from detection of microbes at a distance, thereby initiating direct cellular antimicrobial responses only when they are required.


Subject(s)
Immunity, Innate/immunology , Immunological Synapses/immunology , Membrane Proteins/immunology , Models, Immunological , Nerve Tissue Proteins/immunology , Phagocytosis/immunology , Animals , Cell Wall/chemistry , Cell Wall/immunology , Cells, Cultured , Humans , Lectins, C-Type , Leukocyte Common Antigens/deficiency , Leukocyte Common Antigens/metabolism , Macrophages/immunology , Membrane Proteins/deficiency , Membrane Proteins/genetics , Mice , Nerve Tissue Proteins/deficiency , Nerve Tissue Proteins/genetics , Reactive Oxygen Species/metabolism , Receptor-Like Protein Tyrosine Phosphatases, Class 3/deficiency , Receptor-Like Protein Tyrosine Phosphatases, Class 3/metabolism , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae/immunology , Signal Transduction/immunology , Solubility , beta-Glucans/chemistry , beta-Glucans/immunology
11.
Biopolymers ; 103(12): 665-74, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26015027

ABSTRACT

Scleroglucan is a β-(1,3)-glucan which is highly branched at the 6-position with a single glucose residue. Acid hydrolysis of a high molecular weight scleroglucan gave a medium molecular weight, freely soluble material. Linkage analysis by the partially methylated alditol acetate method showed that the solubilized material had 30% branching. When the material was subjected to partial Smith degradations, the percent branching was reduced accordingly to 12% or 17%. After the percent branching was reduced, the average molecular weight of the samples increased considerably, indicating the assembly of higher ordered aggregate structures. An aggregate number distribution analysis was applied to confirm the higher aggregated structures. These aggregated structures gave the material significantly enhanced activity in an in vitro oxidative burst assay compared to the highly branched material.


Subject(s)
Biological Assay , Glucans/chemistry , Respiratory Burst , Cell Aggregation , Female , Humans , Leukocytes, Mononuclear/chemistry , Male , Molecular Structure , Oxidation-Reduction
13.
Bayesian Anal ; 19(2): 565-593, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38665694

ABSTRACT

Bayesian inference is a popular and widely-used approach to infer phylogenies (evolutionary trees). However, despite decades of widespread application, it remains difficult to judge how well a given Bayesian Markov chain Monte Carlo (MCMC) run explores the space of phylogenetic trees. In this paper, we investigate the Monte Carlo error of phylogenies, focusing on high-dimensional summaries of the posterior distribution, including variability in estimated edge/branch (known in phylogenetics as "split") probabilities and tree probabilities, and variability in the estimated summary tree. Specifically, we ask if there is any measure of effective sample size (ESS) applicable to phylogenetic trees which is capable of capturing the Monte Carlo error of these three summary measures. We find that there are some ESS measures capable of capturing the error inherent in using MCMC samples to approximate the posterior distributions on phylogenies. We term these tree ESS measures, and identify a set of three which are useful in practice for assessing the Monte Carlo error. Lastly, we present visualization tools that can improve comparisons between multiple independent MCMC runs by accounting for the Monte Carlo error present in each chain. Our results indicate that common post-MCMC workflows are insufficient to capture the inherent Monte Carlo error of the tree, and highlight the need for both within-chain mixing and between-chain convergence assessments.
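The paper's tree ESS measures are not reproduced here. For orientation, the sketch below applies the standard autocorrelation-based ESS, truncated at the first non-positive autocorrelation, to the 0/1 trace recording whether a given split appears in each sampled tree. This is a simple per-split diagnostic. The tree representation (a set of splits, each a frozenset of the taxon names on one side) and the function names are assumptions.

```python
import random


def autocorrelation(x, lag):
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var


def ess(x):
    """Autocorrelation-based ESS, n / (1 + 2 * sum rho_k), truncated at the first rho_k <= 0."""
    n = len(x)
    if len(set(x)) < 2:
        return float("nan")  # constant trace: autocorrelation (and hence ESS) is undefined
    total = 0.0
    for lag in range(1, n - 1):
        rho = autocorrelation(x, lag)
        if rho <= 0.0:
            break
        total += rho
    return n / (1.0 + 2.0 * total)


def split_indicator_trace(trees, split):
    """1.0 where a sampled tree (given as a set of splits) contains the split, else 0.0."""
    return [1.0 if split in tree else 0.0 for tree in trees]
```

A chain that flips between two topologies only in long blocks yields a split trace whose ESS is a small fraction of the number of samples, even though each topology is visited equally often.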

14.
Animals (Basel) ; 14(6)2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38539994

ABSTRACT

Creative or novel behaviors in bottlenose dolphins (Tursiops truncatus) can be indicators of flexible thinking and problem solving. Over 50 years ago, two rough-toothed dolphins demonstrated creative novel behaviors acquired through reinforcement training in human care. Since this novel training, a variety of species have been trained to respond to this conceptual cue. The current study assessed the creativity of 12 bottlenose dolphins (5 females, 7 males) housed at the Roatan Institute for Marine Sciences (RIMS) in Roatan, Honduras. Individual differences were found across four constructs measured for creativity: fluency, flexibility, elaboration, and originality. Variability in performance occurred across test sessions. Animals with less experience with this task performed fewer "innovative" behaviors as compared to more experienced animals. Despite errors, dolphins continued to attempt the task during test sessions, suggesting the concept of "innovate" was intrinsically rewarding and cognitively engaging. This task may be utilized across species to promote the comparative study of innovative or creative behavior as well as to promote cognitive welfare.

15.
medRxiv ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38947021

ABSTRACT

Nigeria and Cameroon reported their first mpox cases in over three decades in 2017 and 2018, respectively. The outbreak in Nigeria is recognised as an ongoing human epidemic. However, owing to sparse surveillance and genomic data, it is not known whether the increase in cases in Cameroon is driven by zoonotic or sustained human transmission. Notably, the frequency of zoonotic transmission remains unknown in both Cameroon and Nigeria. To address these uncertainties, we investigated the zoonotic transmission dynamics of the mpox virus (MPXV) in Cameroon and Nigeria, with a particular focus on the border regions. We show that in these regions mpox cases are still driven by zoonotic transmission of a newly identified Clade IIb.1. We identify two distinct zoonotic lineages that circulate across the Nigeria-Cameroon border, with evidence of recent and historic cross-border dissemination. Our findings suggest that the complex cross-border forest ecosystems likely host shared animal populations that drive cross-border viral spread, and that these ecosystems are likely where extant Clade IIb originated. We identify that the closest zoonotic outgroup to the human epidemic circulated in southern Nigeria in October 2013. We also show that the zoonotic precursor lineage circulated in an animal population in southern Nigeria for more than 45 years. This supports findings that southern Nigeria was the origin of the human epidemic. Our study highlights the ongoing MPXV zoonotic transmission in Cameroon and Nigeria, underscoring the continuous risk of MPXV (re)emergence.

16.
medRxiv ; 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38947052

ABSTRACT

Five years before the 2022-2023 global mpox outbreak, Nigeria reported its first cases in nearly 40 years, with the ongoing epidemic since driven by sustained human-to-human transmission. However, limited genomic data have left open questions about the timing and origin of the emergence of the mpox virus (MPXV). Here we generated 112 MPXV genomes from Nigeria from 2021-2023. We identify the closest zoonotic outgroup to the human epidemic in southern Nigeria, and estimate that the lineage transmitting from human-to-human emerged around July 2014, circulating cryptically until detected in September 2017. The epidemic originated in southern Nigeria, particularly Rivers State, which also acted as a persistent and dominant source of viral dissemination to other states. We show that APOBEC3 activity increased MPXV's evolutionary rate twenty-fold during human-to-human transmission. We also show how Delphy, a tool for near-real-time Bayesian phylogenetics, can aid rapid outbreak analytics. Our study sheds light on MPXV's establishment in West Africa before the 2022-2023 global outbreak and highlights the need for improved pathogen surveillance and response.

17.
bioRxiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37961423

ABSTRACT

Birth-death models play a key role in phylodynamic analysis for their interpretation in terms of key epidemiological parameters. In particular, models with piecewise-constant rates varying at different epochs in time, to which we refer as episodic birth-death-sampling (EBDS) models, are valuable for their reflection of changing transmission dynamics over time. A challenge, however, that persists with current time-varying model inference procedures is their lack of computational efficiency. This limitation hinders the full utilization of these models in large-scale phylodynamic analyses, especially when dealing with high-dimensional parameter vectors that exhibit strong correlations. We present here a linear-time algorithm to compute the gradient of the birth-death model sampling density with respect to all time-varying parameters, and we implement this algorithm within a gradient-based Hamiltonian Monte Carlo (HMC) sampler to alleviate the computational burden of conducting inference under a wide variety of structures of, as well as priors for, EBDS processes. We assess this approach using three real-world data examples: the HIV epidemic in Odesa, Ukraine; seasonal influenza A/H3N2 virus dynamics in New York State, USA; and the Ebola outbreak in West Africa. HMC sampling exhibits a substantial efficiency boost, delivering a 10- to 200-fold increase in minimum effective sample size per unit time, in comparison to a Metropolis-Hastings-based approach. Additionally, we show the robustness of our implementation both in allowing for flexible prior choices and in modeling the transmission dynamics of various pathogens by accurately capturing the changing trend of viral effective reproductive number.

18.
Annu Rev Stat Appl ; 10: 353-377, 2023.
Article in English | MEDLINE | ID: mdl-38774036

ABSTRACT

Researchers studying the evolution of viral pathogens and other organisms increasingly encounter and use large and complex data sets from multiple different sources. Statistical research in Bayesian phylogenetics has risen to this challenge. Researchers use phylogenetics not only to reconstruct the evolutionary history of a group of organisms, but also to understand the processes that guide its evolution and spread through space and time. To this end, it is now the norm to integrate numerous sources of data. For example, epidemiologists studying the spread of a virus through a region incorporate data including genetic sequences (e.g. DNA), time, location (both continuous and discrete) and environmental covariates (e.g. social connectivity between regions) into a coherent statistical model. Evolutionary biologists routinely do the same with genetic sequences, location, time, fossil and modern phenotypes, and ecological covariates. These complex, hierarchical models readily accommodate both discrete and continuous data and have enormous combined discrete/continuous parameter spaces including, at a minimum, phylogenetic tree topologies and branch lengths. The increased size and complexity of these statistical models have spurred advances in computational methods to make them tractable. We discuss both the modeling and computational advances below, as well as unsolved problems and areas of active research.

19.
ArXiv ; 2023 Sep 25.
Article in English | MEDLINE | ID: mdl-36994154

ABSTRACT

Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random-effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.

20.
bioRxiv ; 2023 Jul 12.
Article in English | MEDLINE | ID: mdl-37502985

ABSTRACT

The emergence of SARS-CoV in 2002 and SARS-CoV-2 in 2019 has led to increased sampling of related sarbecoviruses circulating primarily in horseshoe bats. These viruses undergo frequent recombination and exhibit spatial structuring across Asia. Employing recombination-aware phylogenetic inference on bat sarbecoviruses, we find that the closest-inferred bat virus ancestors of SARS-CoV and SARS-CoV-2 existed just ~1-3 years prior to their emergence in humans. Phylogeographic analyses examining the movement of related sarbecoviruses demonstrate that they traveled at similar rates to their horseshoe bat hosts and have been circulating for thousands of years in Asia. The closest-inferred bat virus ancestor of SARS-CoV likely circulated in western China, and that of SARS-CoV-2 likely circulated in a region comprising southwest China and northern Laos, both a substantial distance from where they emerged. This distance and recency indicate that the direct ancestors of SARS-CoV and SARS-CoV-2 could not have reached their respective sites of emergence via the bat reservoir alone. Our recombination-aware dating and phylogeographic analyses reveal a more accurate inference of evolutionary history than performing only whole-genome or single gene analyses. These results can guide future sampling efforts and demonstrate that viral genomic fragments extremely closely related to SARS-CoV and SARS-CoV-2 were circulating in horseshoe bats, confirming their importance as the reservoir species for SARS viruses.
