ABSTRACT
Birth-death models are stochastic processes describing speciation and extinction through time and across taxa, and are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under the constant-rate birth-death (crBD) model tend to differ from empirical trees, for example with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between the crBD model and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which the crBD model differs from empirically inferred phylogenies and to test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD model trees frequently differ topologically from empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis for a subset of the empirical studies. Comparing studies that used Bayesian methods with crBD priors against those that used non-crBD priors or non-Bayesian methods (i.e., maximum likelihood methods), we find no significant differences in inferred tree topologies. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 most imbalanced trees from our dataset, simulated sequence data for these topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and using the crBD model in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but this tendency is negligible when substitution rates are sufficiently high.
Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios but also highlight that empirically observed phylogenetic imbalance is highly improbable under the crBD model, leading to systematic bias in data sets with limited information content.
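The imbalance mismatch can be made concrete with a small simulation. Under the Yule (pure-birth) special case of the crBD model, every extant tip is equally likely to speciate next, so topologies can be grown without tracking branch lengths; the Colless index (the sum over internal nodes of the absolute difference in daughter-clade sizes) then quantifies imbalance. The sketch below is illustrative only and is not the analysis pipeline used in the study:

```python
import random

class Node:
    """Binary tree node; an empty child list marks a tip."""
    def __init__(self):
        self.children = []

def yule_topology(n_tips, rng):
    """Grow a Yule-process topology: at each step a uniformly chosen
    extant tip speciates into two daughter tips."""
    root = Node()
    tips = [root]
    while len(tips) < n_tips:
        tip = tips.pop(rng.randrange(len(tips)))
        tip.children = [Node(), Node()]
        tips.extend(tip.children)
    return root

def colless(node):
    """Return (tip count, Colless index) of the subtree rooted at node."""
    if not node.children:
        return 1, 0
    (nl, cl), (nr, cr) = (colless(c) for c in node.children)
    return nl + nr, cl + cr + abs(nl - nr)

rng = random.Random(1)
scores = [colless(yule_topology(50, rng))[1] for _ in range(200)]
print(sum(scores) / len(scores))  # mean Colless imbalance under the Yule model
```

Empirical trees typically show larger Colless values than this Yule average, which is exactly the sense in which observed imbalance is improbable under the crBD model.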
Subjects
Classification; Phylogeny; Classification/methods; Models, Biological; Models, Genetic; Bayes Theorem; Birth Rate
ABSTRACT
Binary phylogenetic trees inferred from biological data are central to understanding the shared history among evolutionary units. However, inferring the placement of latent nodes in a tree is computationally expensive. State-of-the-art methods rely on carefully designed heuristics for tree search, using different data structures for easy manipulation (e.g., classes in object-oriented programming languages) and readable representation of trees (e.g., Newick-format strings). Here, we present Phylo2Vec, a parsimonious encoding for phylogenetic trees that serves as a unified approach for both manipulating and representing phylogenetic trees. Phylo2Vec maps any binary tree with n leaves to a unique integer vector of length n - 1. The advantages of Phylo2Vec are fourfold: (i) fast tree sampling; (ii) a more compact representation than a Newick string; (iii) quick, unambiguous verification of whether two binary trees are topologically identical; and (iv) systematic ability to traverse tree space in very large or small jumps. As a proof of concept, we use Phylo2Vec for maximum likelihood inference on five real-world datasets and show that a simple hill-climbing-based optimisation scheme can efficiently traverse the vastness of tree space from a random to an optimal tree.
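To illustrate the vector idea, the toy encoding below grows a tree by attaching each new leaf next to an existing leaf named by the corresponding vector entry, so identical vectors always decode to identical topologies. This is a simplified scheme in the spirit of Phylo2Vec, not the published mapping (which allows attachment to any branch and covers all binary topologies):

```python
def attach(tree, leaf, new):
    """Replace tip `leaf` in the nested-tuple `tree` with the cherry (leaf, new)."""
    if tree == leaf:
        return (leaf, new)
    if isinstance(tree, tuple):
        return tuple(attach(child, leaf, new) for child in tree)
    return tree

def decode(v):
    """Decode an integer vector (with 0 <= v[i] <= i) into a tree on
    leaves 0..len(v): new leaf i+1 is attached next to existing leaf v[i]."""
    tree = 0
    for i, vi in enumerate(v):
        tree = attach(tree, vi, i + 1)
    return tree

print(decode([0, 0]))  # ((0, 2), 1)
print(decode([0, 1]))  # (0, (1, 2))
```

Because decoding is deterministic, comparing two vectors elementwise suffices to compare the topologies they encode, which is the source of the fast-identity-check property.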
ABSTRACT
Targeted vaccination policies can have a significant impact on the number of infections and deaths in an epidemic. However, optimising such policies is complicated, and the resultant solution may be difficult to explain to policy-makers and to the public. The key novelty of this paper is a derivation of the leading-order optimal vaccination policy under multi-group susceptible-infected-recovered dynamics in two different cases. Firstly, it considers the case of a small vulnerable subgroup in a population and shows that (in the asymptotic limit) it is optimal to vaccinate this group first, regardless of the properties of the other groups. Then, it considers the case of a small vaccine supply and transforms the optimal vaccination problem into a simple knapsack problem by linearising the final size equations. Both of these cases are then explored further through numerical examples, which show that these solutions are also directly useful for realistic parameter values. Moreover, the findings of this paper give some general principles for optimal vaccination policies which will help policy-makers and the public to understand the reasoning behind optimal vaccination programs in more generic cases.
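The small-supply case reduces, after linearising the final size equations, to a knapsack problem: each group contributes an approximately constant benefit per dose, capped by its population. With divisible doses this is the fractional knapsack, which a greedy fill solves exactly. The group names and benefit values below are hypothetical, purely for illustration:

```python
def allocate(supply, groups):
    """Greedy fractional-knapsack allocation: fill groups in decreasing
    order of benefit per dose until the supply is exhausted.

    groups: list of (name, benefit_per_dose, max_doses) tuples.
    """
    plan = {}
    for name, benefit, cap in sorted(groups, key=lambda g: -g[1]):
        doses = min(cap, supply)
        if doses > 0:
            plan[name] = doses
            supply -= doses
    return plan

# Illustrative groups (not taken from the paper)
groups = [("care homes", 9.0, 1e5), ("over-65", 5.0, 5e5), ("adults", 1.0, 3e6)]
print(allocate(4e5, groups))  # {'care homes': 100000.0, 'over-65': 300000.0}
```

The greedy order makes the policy easy to explain: protect the group with the highest marginal benefit first, then move down the list, which matches the paper's asymptotic result of vaccinating a small vulnerable subgroup first.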
Subjects
Epidemics; Models, Biological; Mathematical Concepts; Vaccination; Epidemics/prevention & control; Policies
ABSTRACT
It is widely acknowledged that vaccinating at maximal effort in the face of an ongoing epidemic is the best strategy to minimise infections and deaths from the disease. Despite this, no one has proved that this is guaranteed to be true if the disease follows multi-group SIR (Susceptible-Infected-Recovered) dynamics. This paper provides a novel proof of this principle for the existing SIR framework, showing that the total number of deaths or infections from an epidemic is decreasing in vaccination effort. Furthermore, it presents a novel model for vaccination which assumes that vaccines assigned to a subgroup are distributed randomly to the unvaccinated population of that subgroup. It suggests, using COVID-19 data, that this more accurately captures vaccination dynamics than the model commonly found in the literature. However, as the novel model provides a strictly larger set of possible vaccination policies, the results presented in this paper hold for both models.
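The monotonicity claim can be checked numerically in a minimal single-group sketch (not the multi-group model or the novel vaccination model of the paper), where vaccination moves susceptibles directly to the removed class at a fixed rate:

```python
def sir_vaccination(v_rate, days=300, dt=0.05):
    """Single-group SIR with vaccination rate v_rate (doses per day,
    as a fraction of the population), integrated by Euler's method.
    Returns the attack rate: the total fraction ever infected."""
    beta, gamma = 0.3, 0.1            # illustrative parameters (R0 = 3)
    S, I, total = 0.999, 0.001, 0.001
    for _ in range(int(days / dt)):
        new_inf = beta * S * I * dt
        vacc = min(v_rate * dt, S - new_inf)  # cannot exceed remaining susceptibles
        S -= new_inf + vacc
        I += new_inf - gamma * I * dt
        total += new_inf
    return total

for v in (0.0, 0.005, 0.01):
    print(v, round(sir_vaccination(v), 3))  # attack rate falls as effort rises
```

In this sketch the attack rate strictly decreases as the vaccination rate increases, consistent with the principle the paper proves for the general multi-group setting.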
Subjects
COVID-19; Epidemics; Humans; Models, Biological; Mathematical Concepts; COVID-19/epidemiology; COVID-19/prevention & control; Epidemics/prevention & control; Vaccination/methods
ABSTRACT
Renewal equations are a popular approach used in modelling the number of new infections, i.e., incidence, in an outbreak. We develop a stochastic model of an outbreak based on a time-varying variant of the Crump-Mode-Jagers branching process. This model accommodates a time-varying reproduction number and a time-varying distribution for the generation interval. We then derive renewal-like integral equations for incidence, cumulative incidence and prevalence under this model. We show that the equations for incidence and prevalence are consistent with the so-called back-calculation relationship. We analyse two particular cases of these integral equations, one that arises from a Bellman-Harris process and one that arises from an inhomogeneous Poisson process model of transmission. We also show that the incidence integral equations that arise from both of these specific models agree with the renewal equation used ubiquitously in infectious disease modelling. We present a numerical discretisation scheme to solve these equations, and use this scheme to estimate rates of transmission from serological prevalence of SARS-CoV-2 in the UK and historical incidence data on Influenza, Measles, SARS and Smallpox.
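The discrete form of the incidence equation is the familiar renewal relation I[t] = R[t] · Σ_s w[s] · I[t−s], where w is the generation-interval distribution. A minimal sketch with illustrative values for R and w (not the discretisation scheme or data of the paper):

```python
def renewal_incidence(R, w, I0, T):
    """Iterate the discrete renewal equation
    I[t] = R[t] * sum_{s=1}^{min(t, len(w))} w[s-1] * I[t-s]."""
    I = [I0]
    for t in range(1, T):
        I.append(R[t] * sum(w[s - 1] * I[t - s]
                            for s in range(1, min(t, len(w)) + 1)))
    return I

w = [0.25, 0.5, 0.25]          # illustrative generation-interval pmf (days 1-3)
R = [1.5] * 30 + [0.7] * 30    # reproduction number drops at day 30
series = renewal_incidence(R, w, I0=10.0, T=60)
print(series[29], series[59])  # incidence near the peak vs during the decline
```

A time-varying generation interval, as in the Crump-Mode-Jagers formulation, would simply make w depend on t inside the sum.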
Subjects
COVID-19; Communicable Diseases; Humans; Incidence; SARS-CoV-2; COVID-19/epidemiology; Prevalence; Communicable Diseases/epidemiology
ABSTRACT
Phylogenetics is now fundamental in life sciences, providing insights into the earliest branches of life and the origins and spread of epidemics. However, finding suitable phylogenies from the vast space of possible trees remains challenging. To address this problem, for the first time, we perform both tree exploration and inference in a continuous space where the computation of gradients is possible. This continuous relaxation allows for major leaps across tree space in both rooted and unrooted trees, and is less susceptible to convergence to local minima. Our approach outperforms the current best methods for inference on unrooted trees and, in simulation, accurately infers the tree and root in ultrametric cases. The approach is also effective for empirical data with very limited information content, which we demonstrate on the phylogeny of jawed vertebrates. Indeed, only a few genes with an ultrametric signal were generally sufficient for resolving the major lineages of vertebrates. Optimization is possible via automatic differentiation, and our method presents an effective way forward for exploring the most difficult, data-deficient phylogenetic questions.
Subjects
Algorithms; Models, Genetic; Phylogeny; Computer Simulation
ABSTRACT
Uncertainty can be classified as either aleatoric (intrinsic randomness) or epistemic (imperfect knowledge of parameters). The majority of frameworks assessing infectious disease risk consider only epistemic uncertainty. We only ever observe a single epidemic, and therefore cannot empirically determine aleatoric uncertainty. Here, we characterise both epistemic and aleatoric uncertainty using a time-varying general branching process. Our framework explicitly decomposes aleatoric variance into mechanistic components, quantifying the contribution to uncertainty produced by each factor in the epidemic process, and how these contributions vary over time. The aleatoric variance of an outbreak is itself a renewal equation in which past variance affects future variance. We find that superspreading is not necessary for substantial uncertainty, and profound variation in outbreak size can occur even without overdispersion in the offspring distribution (i.e., the distribution of the number of secondary infections an infected person produces). Aleatoric forecasting uncertainty grows dynamically and rapidly, so forecasting with epistemic uncertainty alone significantly underestimates the total uncertainty. Therefore, failing to account for aleatoric uncertainty misleads policymakers about the true, substantially higher extent of potential risk. We demonstrate our method, and the extent to which potential risk is underestimated, using two historical examples.
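The point that overdispersion is not needed for profound variation in outbreak size can be illustrated with a plain Galton-Watson process with Poisson offspring, a discrete-generation simplification of the time-varying branching process used here:

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplicative Poisson sampler (stdlib-only)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p < L:
            return k
        k += 1

def final_size(R, rng, cap):
    """Total size of a Galton-Watson outbreak with Poisson(R) offspring,
    started from one case and truncated once `cap` infections occur."""
    infected, total = 1, 1
    while infected and total < cap:
        infected = sum(poisson(R, rng) for _ in range(infected))
        total += infected
    return total

rng = random.Random(0)
sizes = [final_size(1.5, rng, cap=2_000) for _ in range(400)]
extinct = sum(s < 100 for s in sizes) / len(sizes)
print(extinct)  # a large share of outbreaks die out despite mean growth
```

With Poisson offspring there is no overdispersion at all, yet final sizes split between early extinction and large outbreaks (for R = 1.5 the branching-process extinction probability is about 0.42), a purely aleatoric source of variance.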
ABSTRACT
First developed in 1982, the double Poisson model, where goals scored by each team are assumed to be Poisson distributed with a mean depending on attacking and defensive strengths, remains a popular choice for predicting football scores, despite the multitude of newer methods that have been developed. This paper examines the pre-tournament predictions made using this model for the Euro 2020 football tournament. These predictions won the Royal Statistical Society's prediction competition, demonstrating that even this simple model can produce high-quality results. Moreover, the paper also presents a range of novel analytic results which exactly quantify the conditions for the existence and uniqueness of the solution to the equations for the model parameters. After deriving these results, it provides a novel examination of a potential problem with the model (the over-weighting of the results of weaker teams) and illustrates the effectiveness of ignoring results against the weakest opposition. It also compares the predictions with the actual results of Euro 2020, showing that they were extremely accurate in predicting the number of goals scored. Finally, it considers the choice of start date for the dataset, and illustrates that the choice made by the authors (which was to start the dataset just after the previous major international tournament) was close to optimal, at least in this case. The findings of this study give a better understanding of the mathematical behaviour of the double Poisson model and provide evidence for its effectiveness as a match prediction tool.
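In the double Poisson model, each team's goal count is Poisson with a mean given by the product of its attacking strength and the opponent's defensive strength, and the two counts are independent; match outcome probabilities follow by summing over scorelines. The strength values below are illustrative, not fitted parameters from the paper:

```python
import math

def pois(k, lam):
    """Poisson pmf P(X = k) with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def outcome_probs(attack_h, defence_a, attack_a, defence_h, max_goals=10):
    """Double Poisson outcome probabilities: home goals ~ Poisson(attack_h * defence_a),
    away goals ~ Poisson(attack_a * defence_h), independently.
    Returns (P(home win), P(draw), P(away win)), truncated at max_goals."""
    lam_h, lam_a = attack_h * defence_a, attack_a * defence_h
    home = draw = away = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = pois(i, lam_h) * pois(j, lam_a)
            if i > j:
                home += p
            elif i == j:
                draw += p
            else:
                away += p
    return home, draw, away

print(outcome_probs(1.4, 1.0, 1.1, 0.9))  # illustrative strengths
```

Fitting the model amounts to choosing the attack and defence parameters so that each team's expected goals match its observed totals, which is where the paper's existence and uniqueness conditions apply.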
Subjects
Athletic Performance; Football; Soccer
ABSTRACT
The infrared solar spectrum contains a wealth of physical data about our Sun, and is now being explored with modern detectors and new ground-based solar telescopes. The scientific motivation behind exploring these wavelengths is presented, along with a brief look at the rich history of observations at these wavelengths. Several avenues of solar physics research exploiting and benefiting from observations at infrared wavelengths from roughly 1000 nm to 12 400 nm are discussed, and the instrument and detector technology driving this research is briefly summarized. Finally, goals for future work at infrared wavelengths are presented in conjunction with ground- and space-based observations.