Results 1 - 13 of 13
1.
BMC Ecol Evol ; 24(1): 11, 2024 Jan 20.
Article in English | MEDLINE | ID: mdl-38245667

ABSTRACT

Abrupt environmental changes can lead to evolutionary shifts in trait evolution. Identifying these shifts is an important step in understanding the evolutionary history of phenotypes. The detection performances of different methods are influenced by many factors, including different numbers of shifts, shift sizes, where a shift occurs on a tree, and the types of phylogenetic structure. Furthermore, the model assumptions are oversimplified, so are likely to be violated in real data, which could cause the methods to fail. We perform simulations to assess the effect of these factors on the performance of shift detection methods. To make the comparisons more complete, we also propose an ensemble variable selection method (R package ELPASO) and compare it with existing methods (R packages ℓ1ou and PhylogeneticEM). The performances of methods are highly dependent on the selection criterion. ℓ1ou+pBIC is usually the most conservative method and it performs well when signal sizes are large. ℓ1ou+BIC is the least conservative method and it performs well when signal sizes are small. The ensemble method provides more balanced choices between those two methods. Moreover, the performances of all methods are heavily impacted by measurement error, tree reconstruction error and shifts in variance.


Subject(s)
Phylogeny , Phenotype
2.
Bull Math Biol ; 85(8): 71, 2023 06 19.
Article in English | MEDLINE | ID: mdl-37335437

ABSTRACT

Predicting the evolution of diseases is challenging, especially when the data availability is scarce and incomplete. The most popular tools for modelling and predicting infectious disease epidemics are compartmental models. They stratify the population into compartments according to health status and model the dynamics of these compartments using dynamical systems. However, these predefined systems may not capture the true dynamics of the epidemic due to the complexity of the disease transmission and human interactions. In order to overcome this drawback, we propose Sparsity and Delay Embedding based Forecasting (SPADE4) for predicting epidemics. SPADE4 predicts the future trajectory of an observable variable without the knowledge of the other variables or the underlying system. We use random features model with sparse regression to handle the data scarcity issue and employ Takens' delay embedding theorem to capture the nature of the underlying system from the observed variable. We show that our approach outperforms compartmental models when applied to both simulated and real data.


Subject(s)
Communicable Diseases , Epidemics , Humans , Models, Biological , Mathematical Concepts , Communicable Diseases/epidemiology , Forecasting
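
Illustrative note on entry 2: the abstract relies on Takens' delay embedding, which rebuilds a state vector from lagged values of a single observed variable. The sketch below shows only that delay-coordinate map. It is not the SPADE4 implementation, and the function name `delay_embed` and its parameters `dim` and `lag` are assumptions chosen for illustration.

```python
from typing import List, Sequence


def delay_embed(series: Sequence[float], dim: int, lag: int = 1) -> List[List[float]]:
    """Return delay vectors [x[t], x[t - lag], ..., x[t - (dim - 1) * lag]].

    Hypothetical helper: `dim` is the embedding dimension, `lag` the delay.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    start = (dim - 1) * lag  # first index with a full history available
    return [
        [series[t - k * lag] for k in range(dim)]
        for t in range(start, len(series))
    ]


# Embed a short toy series in 3 dimensions with lag 1.
vectors = delay_embed([1.0, 2.0, 3.0, 4.0, 5.0], dim=3, lag=1)
```

In a forecasting setting, each delay vector would typically serve as the input features for predicting the next value or derivative of the observed variable; the regression step itself is left out here.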
3.
J Math Biol ; 86(6): 88, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37142869

ABSTRACT

Reconstructing the ancestral state of a group of species helps answer many important questions in evolutionary biology. Therefore, it is crucial to understand when we can estimate the ancestral state accurately. Previous works provide a necessary and sufficient condition, called the big bang condition, for the existence of an accurate reconstruction method under discrete trait evolution models and the Brownian motion model. In this paper, we extend this result to a wide range of continuous trait evolution models. In particular, we consider a general setting where continuous traits evolve along the tree according to stochastic processes that satisfy some regularity conditions. We verify these conditions for popular continuous trait evolution models including Ornstein-Uhlenbeck, reflected Brownian Motion, bounded Brownian Motion, and Cox-Ingersoll-Ross.


Subject(s)
Phylogeny , Stochastic Processes , Phenotype
4.
J Am Stat Assoc ; 117(538): 678-692, 2022.
Article in English | MEDLINE | ID: mdl-36060555

ABSTRACT

Comparative biologists are often interested in inferring covariation between multiple biological traits sampled across numerous related taxa. To properly study these relationships, we must control for the shared evolutionary history of the taxa to avoid spurious inference. An additional challenge arises as obtaining a full suite of measurements becomes increasingly difficult with increasing taxa. This generally necessitates data imputation or integration, and existing control techniques typically scale poorly as the number of taxa increases. We propose an inference technique that integrates out missing measurements analytically and scales linearly with the number of taxa by using a post-order traversal algorithm under a multivariate Brownian diffusion (MBD) model to characterize trait evolution. We further exploit this technique to extend the MBD model to account for sampling error or non-heritable residual variance. We test these methods to examine mammalian life history traits, prokaryotic genomic and phenotypic traits, and HIV infection traits. We find computational efficiency increases that top two orders-of-magnitude over current best practices. While we focus on the utility of this algorithm in phylogenetic comparative methods, our approach generalizes to solve long-standing challenges in computing the likelihood for matrix-normal and multivariate normal distributions with missing data at scale.

5.
Theor Popul Biol ; 148: 22-27, 2022 12.
Article in English | MEDLINE | ID: mdl-36167107

ABSTRACT

Ancestral state reconstruction is one of the most important tasks in evolutionary biology. Conditions under which we can reliably reconstruct the ancestral state have been studied for both discrete and continuous traits. However, the connection between these results is unclear, and it seems that each model needs different conditions. In this work, we provide a unifying theory on the consistency of ancestral state reconstruction for various types of trait evolution models. Notably, we show that for a sequence of nested trees with bounded heights, the necessary and sufficient conditions for the existence of a consistent ancestral state reconstruction method under discrete models, the Brownian motion model, and the threshold model are equivalent. When tree heights are unbounded, we provide a simple counter-example to show that this equivalence is no longer valid.


Subject(s)
Evolution, Molecular , Phylogeny , Phenotype
6.
J Math Biol ; 84(4): 21, 2022 02 21.
Article in English | MEDLINE | ID: mdl-35188616

ABSTRACT

Likelihood-based methods are widely considered the best approaches for reconstructing ancestral states. Although much effort has been made to study properties of these methods, previous works often assume that both the tree topology and edge lengths are known. In some scenarios the tree topology might be reasonably well known for the taxa under study. When sequence length is much smaller than the number of species, however, edge lengths are not likely to be accurately estimated. We study the consistency of the maximum likelihood and empirical Bayes estimators of the ancestral state of discrete traits in such settings under a star tree. We prove that the likelihood-based reconstruction is consistent under symmetric models but can be inconsistent under non-symmetric models. We show, however, that a simple consistent estimator for the ancestral states is available under non-symmetric models. The results illustrate that likelihood methods can unexpectedly have undesirable properties as the number of sequences considered gets very large. Broader implications of the results are discussed.


Subject(s)
Evolution, Molecular , Bayes Theorem , Likelihood Functions , Phenotype , Phylogeny
7.
J Math Biol ; 80(4): 1119-1138, 2020 03.
Article in English | MEDLINE | ID: mdl-31754778

ABSTRACT

Maximum likelihood estimators are used extensively to estimate unknown parameters of stochastic trait evolution models on phylogenetic trees. Although the MLE has been proven to converge to the true value in the independent-sample case, we cannot appeal to this result because trait values of different species are correlated due to shared evolutionary history. In this paper, we consider a 2-state symmetric model for a single binary trait and investigate the theoretical properties of the MLE for the transition rate in the large-tree limit. Here, the large-tree limit is a theoretical scenario where the number of taxa increases to infinity and we can observe the trait values for all species. Specifically, we prove that the MLE converges to the true value under some regularity conditions. These conditions ensure that the tree shape is not too irregular, and hold for many practical scenarios such as trees with bounded edges, trees generated from the Yule (pure birth) process, and trees generated from the coalescent point process. Our result also provides an upper bound for the distance between the MLE and the true value.


Subject(s)
Models, Genetic , Phylogeny , Animals , Biological Evolution , Genetic Speciation , Likelihood Functions , Markov Chains , Mathematical Concepts , Stochastic Processes
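
Illustrative note on entry 7: under a 2-state symmetric model with transition rate q in each direction, the probability that the state differs after time t is (1 - exp(-2qt))/2. The sketch below covers only the independent-sample case that the abstract contrasts with the tree setting: n independent branches of equal length t, of which some number show a change. The function names are assumptions for illustration, and nothing here reproduces the paper's large-tree analysis.

```python
import math


def p_change(q: float, t: float) -> float:
    """Probability that the end state differs from the start state after time t."""
    return 0.5 * (1.0 - math.exp(-2.0 * q * t))


def mle_rate_independent(n_changed: int, n_total: int, t: float) -> float:
    """MLE of q from independent branches of common length t (toy setting).

    The observed fraction of changes estimates p_change(q, t); inverting
    that formula gives the estimate of q.
    """
    frac = n_changed / n_total
    if frac >= 0.5:
        # p_change is always below 1/2, so the likelihood keeps increasing in q.
        return math.inf
    return -math.log(1.0 - 2.0 * frac) / (2.0 * t)


# One change observed among four independent unit-length branches.
q_hat = mle_rate_independent(1, 4, 1.0)
```

On a tree the tip values share history, so the likelihood no longer factorises this way, which is the difficulty the abstract addresses.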
8.
Theor Popul Biol ; 126: 33-39, 2019 04.
Article in English | MEDLINE | ID: mdl-30641072

ABSTRACT

We consider the ancestral state reconstruction problem where we need to infer phenotypes of ancestors using observations from present-day species. For this problem, we propose a multi-task learning method that uses regularized maximum likelihood to estimate the ancestral states of various traits simultaneously. We then show both theoretically and by simulation that this method improves the estimates of the ancestral states compared to the maximum likelihood method. The result also indicates that for the problem of ancestral state reconstruction under the Brownian motion model, the maximum likelihood method can be improved.


Subject(s)
Likelihood Functions , Machine Learning , Models, Biological , Phenotype , Animals , Biological Evolution , Computer Simulation , Humans , Learning , Mammals , Phylogeny , Stochastic Processes
9.
Article in English | MEDLINE | ID: mdl-29942419

ABSTRACT

Many important stochastic counting models can be written as general birth-death processes (BDPs). BDPs are continuous-time Markov chains on the non-negative integers in which only jumps to adjacent states are allowed. BDPs can be used to easily parameterize a rich variety of probability distributions on the non-negative integers, and straightforward conditions guarantee that these distributions are proper. BDPs also provide a mechanistic interpretation - birth and death of actual particles or organisms - that has proven useful in evolution, ecology, physics, and chemistry. Although the theoretical properties of general BDPs are well understood, traditionally statistical work on BDPs has been limited to the simple linear (Kendall) process. Aside from a few simple cases, it remains impossible to find analytic expressions for the likelihood of a discretely-observed BDP, and computational difficulties have hindered development of tools for statistical inference. But the gap between BDP theory and practical methods for estimation has narrowed in recent years. There are now robust methods for evaluating likelihoods for realizations of BDPs: finite-time transition, first passage, equilibrium probabilities, and distributions of summary statistics that arise commonly in applications. Recent work has also exploited the connection between continuously- and discretely-observed BDPs to derive EM algorithms for maximum likelihood estimation. Likelihood-based inference for previously intractable BDPs is much easier than previously thought and regression approaches analogous to Poisson regression are straightforward to derive. In this review, we outline the basic mathematical theory for BDPs and demonstrate new tools for statistical inference using data from BDPs.
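
Illustrative note on entry 9: a general birth-death process jumps from state x to x + 1 at rate λ(x) and to x - 1 at rate μ(x). The sketch below simulates one continuously observed path with the standard Gillespie scheme. It is a simulation aid only, not one of the likelihood methods the review describes, and the function name `simulate_bdp` and the example rates are assumptions.

```python
import random
from typing import Callable, List, Tuple


def simulate_bdp(
    birth: Callable[[int], float],
    death: Callable[[int], float],
    x0: int,
    t_end: float,
    rng: random.Random,
) -> List[Tuple[float, int]]:
    """Simulate a birth-death path on [0, t_end] as a list of (time, state)."""
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        lam = birth(x)
        mu = death(x) if x > 0 else 0.0  # no deaths from state 0
        total = lam + mu
        if total <= 0.0:
            break  # absorbing state: no further jumps
        t += rng.expovariate(total)  # exponential waiting time
        if t > t_end:
            break
        x += 1 if rng.random() < lam / total else -1
        path.append((t, x))
    return path


# Linear (Kendall) process with hypothetical per-capita rates 0.5 and 0.3.
path = simulate_bdp(lambda x: 0.5 * x, lambda x: 0.3 * x, 10, 5.0, random.Random(1))
```

Passing other rate functions, for example a logistic birth rate, gives non-linear processes with the same code.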

10.
J Math Biol ; 76(4): 911-944, 2018 03.
Article in English | MEDLINE | ID: mdl-28741177

ABSTRACT

Birth-death processes track the size of a univariate population, but many biological systems involve interaction between populations, necessitating models for two or more populations simultaneously. A lack of efficient methods for evaluating finite-time transition probabilities of bivariate processes, however, has restricted statistical inference in these models. Researchers rely on computationally expensive methods such as matrix exponentiation or Monte Carlo approximation, restricting likelihood-based inference to small systems, or indirect methods such as approximate Bayesian computation. In this paper, we introduce the birth/birth-death process, a tractable bivariate extension of the birth-death process, where rates are allowed to be nonlinear. We develop an efficient algorithm to calculate its transition probabilities using a continued fraction representation of their Laplace transforms. Next, we identify several exemplary models arising in molecular epidemiology, macro-parasite evolution, and infectious disease modeling that fall within this class, and demonstrate advantages of our proposed method over existing approaches to inference in these models. Notably, the ubiquitous stochastic susceptible-infectious-removed (SIR) model falls within this class, and we emphasize that computable transition probabilities newly enable direct inference of parameters in the SIR model. We also propose a very fast method for approximating the transition probabilities under the SIR model via a novel branching process simplification, and compare it to the continued fraction representation method with application to the 17th century plague in Eyam. Although the two methods produce similar maximum a posteriori estimates, the branching process approximation fails to capture the correlation structure in the joint posterior distribution.


Subject(s)
Models, Biological , Algorithms , Animals , Bayes Theorem , Communicable Diseases/epidemiology , Computational Biology , Computer Simulation , England/epidemiology , Epidemics/statistics & numerical data , History, 17th Century , Host-Parasite Interactions , Humans , Likelihood Functions , Markov Chains , Mathematical Concepts , Monte Carlo Method , Plague/epidemiology , Plague/history , Probability , Stochastic Processes
11.
J Math Biol ; 74(1-2): 355-385, 2017 01.
Article in English | MEDLINE | ID: mdl-27241727

ABSTRACT

Diffusion processes on trees are commonly used in evolutionary biology to model the joint distribution of continuous traits, such as body mass, across species. Estimating the parameters of such processes from tip values presents challenges because of the intrinsic correlation between the observations produced by the shared evolutionary history, thus violating the standard independence assumption of large-sample theory. For instance (Ho and Ané, Ann Stat 41:957-981, 2013) recently proved that the mean (also known in this context as selection optimum) of an Ornstein-Uhlenbeck process on a tree cannot be estimated consistently from an increasing number of tip observations if the tree height is bounded. Here, using a fruitful connection to the so-called reconstruction problem in probability theory, we study the convergence rate of parameter estimation in the unbounded height case. For the mean of the process, we provide a necessary and sufficient condition for the consistency of the maximum likelihood estimator (MLE) and establish a phase transition on its convergence rate in terms of the growth of the tree. In particular we show that a loss of $\sqrt{n}$-consistency (i.e., the variance of the MLE becomes $\Omega(n^{-1})$, where n is the number of tips) occurs when the tree growth is larger than a threshold related to the phase transition of the reconstruction problem. For the covariance parameters, we give a novel, efficient estimation method which achieves $\sqrt{n}$-consistency under natural assumptions on the tree. Our theoretical results provide practical suggestions for the design of comparative data collection.


Subject(s)
Models, Biological , Phylogeny , Phenotype , Probability
12.
Evolution ; 70(6): 1354-63, 2016 06.
Article in English | MEDLINE | ID: mdl-27139421

ABSTRACT

Since Darwin, biologists have come to recognize that the theory of descent from common ancestry (CA) is very well supported by diverse lines of evidence. However, while the qualitative evidence is overwhelming, we also need formal methods for quantifying the evidential support for CA over the alternative hypothesis of separate ancestry (SA). In this article, we explore a diversity of statistical methods using data from the primates. We focus on two alternatives to CA, species SA (the separate origin of each named species) and family SA (the separate origin of each family). We implemented statistical tests based on morphological, molecular, and biogeographic data and developed two new methods: one that tests for phylogenetic autocorrelation while correcting for variation due to confounding ecological traits and a method for examining whether fossil taxa have fewer derived differences than living taxa. We overwhelmingly rejected both species and family SA with infinitesimal P values. We compare these results with those from two companion papers, which also found tremendously strong support for the CA of all primates, and discuss future directions and general philosophical issues that pertain to statistical testing of historical hypotheses such as CA.


Subject(s)
Biological Evolution , Classification/methods , Models, Genetic , Primates/classification , Animal Distribution , Animals , Fossils/anatomy & histology , Models, Statistical , Phylogeny , Primates/anatomy & histology , Primates/genetics , Primates/physiology
13.
Syst Biol ; 63(3): 397-408, 2014 May.
Article in English | MEDLINE | ID: mdl-24500037

ABSTRACT

We developed a linear-time algorithm applicable to a large class of trait evolution models, for efficient likelihood calculations and parameter inference on very large trees. Our algorithm solves the traditional computational burden associated with two key terms, namely the determinant of the phylogenetic covariance matrix V and quadratic products involving the inverse of V. Applications include Gaussian models such as Brownian motion-derived models like Pagel's lambda, kappa, delta, and the early-burst model; Ornstein-Uhlenbeck models to account for natural selection with possibly varying selection parameters along the tree; as well as non-Gaussian models such as phylogenetic logistic regression, phylogenetic Poisson regression, and phylogenetic generalized linear mixed models. Outside of phylogenetic regression, our algorithm also applies to phylogenetic principal component analysis, phylogenetic discriminant analysis or phylogenetic prediction. The computational gain opens up new avenues for complex models or extensive resampling procedures on very large trees. We identify the class of models that our algorithm can handle as all models whose covariance matrix has a 3-point structure. We further show that this structure uniquely identifies a rooted tree whose branch lengths parametrize the trait covariance matrix, which acts as a similarity matrix. The new algorithm is implemented in the R package phylolm, including functions for phylogenetic linear regression and phylogenetic logistic regression.


Subject(s)
Algorithms , Biological Evolution , Classification/methods , Software/standards , Computer Simulation
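
Illustrative note on entry 13: under Brownian motion, the covariance between two tip values is the rate σ² times the length of the path the two tips share from the root. The sketch below builds the phylogenetic covariance matrix V this naive way for a toy tree. It is the quadratic-cost construction that the linear-time algorithm in the abstract is designed to avoid, not that algorithm, and the tree encoding and function names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy tree: node -> (parent, length of the edge above the node).
TREE: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 0.0),
    "u": ("root", 1.0),
    "A": ("u", 2.0),
    "B": ("u", 2.0),
    "C": ("root", 3.0),
}


def path_to_root(tree: Dict[str, Tuple[Optional[str], float]], node: str) -> List[str]:
    """Nodes from `node` up to, but excluding, the root."""
    path = []
    while tree[node][0] is not None:
        path.append(node)
        node = tree[node][0]
    return path


def bm_covariance(
    tree: Dict[str, Tuple[Optional[str], float]],
    tips: List[str],
    sigma2: float = 1.0,
) -> List[List[float]]:
    """Brownian-motion covariance: sigma2 times the shared root-to-tip path length."""
    paths = {tip: set(path_to_root(tree, tip)) for tip in tips}
    return [
        [sigma2 * sum(tree[n][1] for n in paths[a] & paths[b]) for b in tips]
        for a in tips
    ]


V = bm_covariance(TREE, ["A", "B", "C"])
```

For this tree, tips A and B share the edge above node u, while C shares nothing with either, so V is block-structured with the tip heights on the diagonal.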