ABSTRACT
Bayesian coalescent skyline plot models are widely used to infer demographic histories. The first (non-Bayesian) coalescent skyline plot model assumed a known genealogy as data, while subsequent models and implementations jointly inferred the genealogy and demographic history from sequence data, including heterochronous samples. Overall, there exist multiple different Bayesian coalescent skyline plot models which mainly differ in two key aspects: (i) how changes in population size are modeled through independent or autocorrelated prior distributions, and (ii) how many change-points in the demographic history are used, where they occur and if the number is pre-specified or inferred. The specific impact of each of these choices on the inferred demographic history is not known because of two reasons: first, not all models are implemented in the same software, and second, each model implementation makes specific choices that the biologist cannot influence. To facilitate a detailed evaluation of Bayesian coalescent skyline plot models, we implemented all currently described models in a flexible design into the software RevBayes. Furthermore, we evaluated models and choices on an empirical dataset of horses supplemented by a small simulation study. We find that estimated demographic histories can be grouped broadly into two groups depending on how change-points in the demographic history are specified (either independent of or at coalescent events). Our simulations suggest that models using change-points at coalescent events produce spurious variation near the present, while most models using independent change-points tend to over-smooth the inferred demographic history.
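The first, non-Bayesian skyline plot mentioned above treats the genealogy as known data. A minimal sketch of that classic estimator follows, assuming isochronous samples and coalescent times measured backward from the present. For an interval of duration w during which k lineages exist, the estimate is w * k(k-1)/2. The function name and input format are hypothetical, and this is not the RevBayes implementation described in the abstract.

```python
def classic_skyline(coalescent_times):
    """Classic skyline estimates from a known, isochronous genealogy.

    coalescent_times: increasing times (before present) of the n-1
    coalescent events for n samples (hypothetical input format).
    Returns a list of (interval_start, interval_end, estimate) tuples.
    """
    n = len(coalescent_times) + 1          # number of sampled lineages
    estimates = []
    previous = 0.0
    for i, t in enumerate(coalescent_times):
        k = n - i                          # lineages present in this interval
        waiting = t - previous             # duration of the interval
        # Scaled population size estimate: w * k choose 2
        estimates.append((previous, t, waiting * k * (k - 1) / 2.0))
        previous = t
    return estimates


for start, end, size in classic_skyline([0.1, 0.3, 1.3]):
    print(f"{start:.1f} to {end:.1f}: {size:.2f}")
```

Because each interval gets its own estimate, change-points fall exactly at coalescent events, which is one of the two change-point strategies the abstract compares.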
Subjects
Bayes Theorem, Population Genetics, Genetic Models, Animals, Population Genetics/methods, Horses, Population Density, Computer Simulation, Software, Demography
ABSTRACT
Poor fit between models of sequence or trait evolution and empirical data is known to cause biases and lead to spurious conclusions about evolutionary patterns and processes. Bayesian posterior prediction is a flexible and intuitive approach for detecting such cases of poor fit. However, the expected behavior of posterior predictive tests has never been characterized for evolutionary models, which is critical for their proper interpretation. Here, we show that the expected distribution of posterior predictive P-values is generally not uniform, in contrast to frequentist P-values used for hypothesis testing, and extreme posterior predictive P-values often provide more evidence of poor fit than typically appreciated. Posterior prediction assesses model adequacy under highly favorable circumstances, because the model is fitted to the data, which leads to expected distributions that are often concentrated around intermediate values. Nonuniform expected distributions of P-values do not pose a problem for the application of these tests, however, and posterior predictive P-values can be interpreted as the posterior probability that the fitted model would predict a dataset with a test statistic value as extreme as the value calculated from the observed data.
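The interpretation given above can be made concrete with a toy calculation. The sketch below is an illustration only, not the evolutionary models studied in the abstract: it assumes a Normal model with known standard deviation and a flat prior on the mean, uses the sample maximum as a hypothetical test statistic, and counts how often a dataset predicted by the fitted model is at least as extreme as the observed one.

```python
import random
import statistics


def posterior_predictive_p_value(data, n_draws=2000, sigma=1.0, seed=1):
    """One-sided posterior predictive P-value for a toy Normal model.

    Assumptions (for illustration): known sigma, flat prior on the mean,
    test statistic = sample maximum.
    """
    rng = random.Random(seed)
    n = len(data)
    mean = statistics.fmean(data)
    observed = max(data)
    hits = 0
    for _ in range(n_draws):
        # Draw the mean from its posterior, Normal(mean, sigma^2 / n)
        mu = rng.gauss(mean, sigma / n ** 0.5)
        # Simulate a replicate dataset of the same size from the model
        replicate = [rng.gauss(mu, sigma) for _ in range(n)]
        if max(replicate) >= observed:
            hits += 1
    return hits / n_draws


print(posterior_predictive_p_value([0.3, -1.1, 0.8, 1.4, -0.5, 0.1]))
print(posterior_predictive_p_value([0.0, 0.1, -0.2, 0.3, 8.0]))
```

The second dataset contains a value the fitted model almost never predicts, so its P-value is close to zero, which is the kind of extreme value the abstract treats as evidence of poor fit.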
Subjects
Statistical Models, Bayes Theorem, Probability
ABSTRACT
Phylogenies are central to many research areas in biology and commonly estimated using likelihood-based methods. Unfortunately, any likelihood-based method, including Bayesian inference, can be restrictively slow for large datasets (with many taxa and/or many sites in the sequence alignment) or complex substitution models. The primary limiting factor when using large datasets and/or complex models in probabilistic phylogenetic analyses is the likelihood calculation, which dominates the total computation time. To address this bottleneck, we incorporated the high-performance phylogenetic library BEAGLE into RevBayes, which enables multi-threading on multi-core CPUs and GPUs, as well as hardware-specific vectorized instructions for faster likelihood calculations. Our new implementation of RevBayes+BEAGLE retains the flexibility and dynamic nature that users expect from vanilla RevBayes. In addition, we implemented native parallelization within RevBayes without an external library using the message passing interface (MPI); RevBayes+MPI. We evaluated our new implementation of RevBayes+BEAGLE using multi-threading on CPUs and 2 different powerful GPUs (NVIDIA Titan V and NVIDIA A100) against our native implementation of RevBayes+MPI. We found substantial speedups when multiple cores were used, with up to 20-fold speedup when using multiple CPU cores and over 90-fold speedup when using multiple GPU cores. The improvement depended on the data type used, DNA or amino acids, and the size of the alignment, but less on the size of the tree. We additionally investigated the cost of rescaling partial likelihoods to avoid numerical underflow and showed that unnecessarily frequent and inefficient rescaling can increase runtimes up to 4-fold. Finally, we presented and compared a new approach to store partial likelihoods on branches instead of nodes that can speed up computations up to 1.7 times but comes at twice the memory requirements.
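The rescaling step mentioned above can be illustrated with a toy calculation. The sketch below is not RevBayes or BEAGLE code: it assumes a Jukes-Cantor model, a single site, and a ladder-shaped (caterpillar) tree with equal branch lengths, and all names are hypothetical. Partial likelihoods are divided by their largest entry every `rescale_every` steps, and the logarithms of the divisors are accumulated so that the final log-likelihood is unchanged.

```python
import math


def jc_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t."""
    stay = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    move = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)
    return [[stay if i == j else move for j in range(4)] for i in range(4)]


def site_log_likelihood(tip_states, t=0.1, rescale_every=1):
    """Log-likelihood of one site on a caterpillar tree (toy example).

    tip_states: list of states in {0, 1, 2, 3}, one per tip.
    rescale_every: rescale partials every this many nodes; 0 disables it.
    """
    p = jc_matrix(t)
    # Partial likelihoods at the first tip: 1 for the observed state
    partial = [1.0 if s == tip_states[0] else 0.0 for s in range(4)]
    log_scaler = 0.0
    for step, tip in enumerate(tip_states[1:], start=1):
        # Pruning step: combine the subtree so far with the next tip
        partial = [
            sum(p[i][j] * partial[j] for j in range(4)) * p[i][tip]
            for i in range(4)
        ]
        if rescale_every and step % rescale_every == 0:
            largest = max(partial)
            if largest > 0.0:
                partial = [x / largest for x in partial]
                log_scaler += math.log(largest)   # remember what was divided out
    total = sum(0.25 * x for x in partial)        # uniform root frequencies
    return math.log(total) + log_scaler if total > 0.0 else float("-inf")


tips = [0, 1] * 5000
print(site_log_likelihood(tips, rescale_every=1))   # rescaled at every node
print(site_log_likelihood(tips, rescale_every=0))   # no rescaling: underflows
```

On short inputs the two settings agree; on long inputs the unrescaled product underflows and the result is no longer usable. Rescaling less often (a larger `rescale_every`) reduces the extra work, which is the trade-off the abstract quantifies.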
Subjects
Bayes Theorem, Phylogeny, Software, Classification/methods, Computational Biology/methods
ABSTRACT
The ideal approach to Bayesian phylogenetic inference is to estimate all parameters of interest jointly in a single hierarchical model. However, this is often not feasible in practice due to the high computational cost. Instead, phylogenetic pipelines generally consist of sequential analyses, whereby a single point estimate from a given analysis is used as input for the next analysis (e.g., a single multiple sequence alignment is used to estimate a gene tree). In this framework, uncertainty is not propagated from step to step, which can lead to inaccurate or spuriously confident results. Here, we formally develop and test a sequential inference approach for Bayesian phylogenetic inference, which uses importance sampling to generate observations for the next step of an analysis pipeline from the posterior distribution produced in the previous step. The sequential inference approach presented here not only accounts for uncertainty between analysis steps but also allows for greater flexibility in software choice (and hence model availability) and can be computationally more efficient than the traditional joint inference approach when multiple models are being tested. We show that our sequential inference approach is identical in practice to the joint inference approach only if sufficient information in the data is present (a narrow posterior distribution) and/or sufficiently many importance samples are used. Conversely, we show that the common practice of using a single point estimate can be biased, for example, a single phylogeny estimate can transform an unrooted phylogeny into a time-calibrated phylogeny. We demonstrate the theory of sequential Bayesian inference using both a toy example and an empirical case study of divergence-time estimation in insects using a relaxed clock model from transcriptome data. In the empirical example, we estimate 3 posterior distributions of branch lengths from the same data (DNA character matrix with a GTR+Γ+I substitution model, an amino acid data matrix with empirical substitution models, and an amino acid data matrix with the PhyloBayes CAT-GTR model). Finally, we apply 3 different node-calibration strategies and show that divergence time estimates are affected by both the data source and underlying substitution process to estimate branch lengths as well as the node-calibration strategies. Thus, our new sequential Bayesian phylogenetic inference provides the opportunity to efficiently test different approaches for divergence time estimation, including branch-length estimation from other software.
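The abstract does not spell out the algorithm, so the following is only a generic sketch of self-normalized importance sampling, not the authors' implementation. It assumes that posterior samples from a first analysis step act as proposals and are weighted by the likelihood of the data used in a second step. All function names and the toy Normal example are hypothetical.

```python
import math
import random


def reweight_posterior_samples(samples, log_likelihood):
    """Self-normalized importance weights for previous-step posterior samples.

    log_likelihood: function giving the log-likelihood of the next step's
    data for one sample (an assumption of this sketch).
    """
    log_w = [log_likelihood(s) for s in samples]
    shift = max(log_w)                      # subtract the max for stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def weighted_mean(samples, weights):
    """Importance-weighted estimate of the posterior mean."""
    return sum(s * w for s, w in zip(samples, weights))


def effective_sample_size(weights):
    """Number of equally weighted samples the weights are worth."""
    return 1.0 / sum(w * w for w in weights)


# Toy example: step 1 posterior is Normal(0, 1); step 2 observes y = 1
# with a Normal(theta, 1) likelihood, so the joint posterior mean is 0.5.
rng = random.Random(42)
step1 = [rng.gauss(0.0, 1.0) for _ in range(20000)]
weights = reweight_posterior_samples(step1, lambda th: -0.5 * (1.0 - th) ** 2)
print(weighted_mean(step1, weights), effective_sample_size(weights))
```

The effective sample size drops when the second step's data disagree with the first step's posterior, which is consistent with the abstract's remark that many importance samples or a narrow posterior are needed to match joint inference.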
Subjects
Bayes Theorem, Classification, Phylogeny, Classification/methods, Animals
ABSTRACT
Phylogenetic trees establish a historical context for the study of organismal form and function. Most phylogenetic trees are estimated using a model of evolution. For molecular data, modeling evolution is often based on biochemical observations about changes between character states. For example, there are 4 nucleotides, and we can make assumptions about the probability of transitions between them. By contrast, for morphological characters, we may not know a priori how many character states there are per character, as both extant sampling and the fossil record may be highly incomplete, which leads to an observer bias. For a given character, the state space may be larger than what has been observed in the sample of taxa collected by the researcher. In this case, how many evolutionary rates are needed to even describe transitions between morphological character states may not be clear, potentially leading to model misspecification. To explore the impact of this model misspecification, we simulated character data with varying numbers of character states per character. We then used the data to estimate phylogenetic trees using models of evolution with the correct number of character states and an incorrect number of character states. The results of this study indicate that this observer bias may lead to phylogenetic error, particularly in the branch lengths of trees. If the state space is wrongly assumed to be too large, then we underestimate the branch lengths, and the opposite occurs when the state space is wrongly assumed to be too small.
Subjects
Bayes Theorem, Classification, Phylogeny, Classification/methods, Computer Simulation, Biological Models
ABSTRACT
Diversity of feeding mechanisms is a hallmark of reef fishes, but the history of this variation is not fully understood. Here, we explore the emergence and proliferation of a biting mode of feeding, which enables fishes to feed on attached benthic prey. We find that feeding modes other than suction, including biting, ram biting, and an intermediate group that uses both biting and suction, were nearly absent among the lineages of teleost fishes inhabiting reefs prior to the end-Cretaceous mass extinction, but benthic biting has rapidly increased in frequency since then, accounting for about 40% of reef species today. Further, we measured the impact of feeding mode on body shape diversification in reef fishes. We fit a model of multivariate character evolution to a dataset comprising three-dimensional body shape of 1,530 species of teleost reef fishes across 111 families. Dedicated biters have accumulated over half of the body shape variation that suction feeders have in just 18% of the evolutionary time by evolving body shape ~1.7 times faster than suction feeders. As a possible response to the ecological and functional diversity of attached prey, biters have dynamically evolved both into shapes that resemble suction feeders as well as novel body forms characterized by lateral compression and small jaws. The ascendance of species that use biting mechanisms to feed on attached prey reshaped modern reef fish assemblages and has been a major contributor to their ecological and phenotypic diversification.
Subjects
Biological Evolution, Coral Reefs, Biological Extinction, Feeding Behavior, Fishes, Somatotypes, Animals, Fishes/anatomy & histology, Fishes/physiology, Male
ABSTRACT
Color is among the most striking features of organisms, varying not only in spectral properties like hue and brightness, but also in where and how it is produced on the body. Different combinations of colors on a bird's body are important in both environmental and social contexts. Previous comparative studies have treated plumage patches individually or derived plumage complexity scores from color measurements across a bird's body. However, these approaches do not consider the multivariate nature of plumages (allowing for plumage to evolve as a whole) or account for interpatch distances. Here, we leverage a rich toolkit used in historical biogeography to assess color pattern evolution in a cosmopolitan radiation of birds, kingfishers (Aves: Alcedinidae). We demonstrate the utility of this approach and test hypotheses about the tempo and mode of color evolution in kingfishers. Our results highlight the importance of considering interpatch distances in understanding macroevolutionary trends in color diversity and demonstrate how historical biogeography models are a useful way to model plumage color pattern evolution. Furthermore, they show that distinct color mechanisms (pigments or structural colors) spread across the body in different ways and at different rates. Specifically, net rates are higher for structural colors than pigment-based colors. Together, our study suggests a role for both development and selection in driving extraordinary color pattern diversity in kingfishers. We anticipate this approach will be useful for modeling other complex phenotypes besides color, such as parasite evolution across the body.
Subjects
Biological Evolution, Birds/anatomy & histology, Birds/classification, Classification/methods, Biological Models, Pigmentation/physiology, Animals
ABSTRACT
PREMISE OF THE STUDY: Inferring the evolution of characters in Isoëtes has been problematic, as these plants are morphologically conservative and yet highly variable and homoplasious within that conserved base morphology. However, molecular phylogenies have given us a valuable tool for testing hypotheses of character evolution within the genus, such as the hypothesis of ongoing morphological reductions. METHODS: We examined the reduction in lobe number on the underground trunk, or corm, by combining the most recent molecular phylogeny with morphological descriptions gathered from the literature and observations of living specimens. Ancestral character states were inferred using nonstationary evolutionary models, reversible-jump MCMC, and Bayesian model averaging. KEY RESULTS: Our results support the hypothesis of a directional reduction in lobe number in Isoëtes, with the best-supported model of character evolution being one of irreversible reduction. Furthermore, the most probable ancestral corm lobe number of extant Isoëtes is three, and a reduction to two lobes has occurred at least six times. CONCLUSIONS: From our results, we can infer that corm lobation, like many other traits in Isoëtes, shows a degree of homoplasy, and yet also shows ongoing evolutionary reduction.
Subjects
Plant Stems/anatomy & histology, Plants/anatomy & histology, Bayes Theorem, Biological Evolution, Phylogeny
ABSTRACT
BACKGROUND: Body size and echolocation call frequencies are related in bats. However, it is unclear if this allometry applies to the entire clade. Differences have been suggested between nasal and oral emitting bats, as well as between some taxonomic families. Additionally, the scaling of other echolocation parameters, such as bandwidth and call duration, needs further testing. Moreover, it would also be interesting to test whether changes in body size have been coupled with changes in these echolocation parameters throughout bat evolution. Here, we test the scaling of peak frequency, bandwidth, and call duration with body mass using phylogenetically informed analyses for 314 bat species. We specifically tested whether all these scaling patterns differ between nasal and oral emitting bats. Then, we applied recently developed Bayesian statistical techniques based on large-scale simulations to test for the existence of correlated evolution between body mass and echolocation. RESULTS: Our results showed that echolocation peak frequencies, bandwidth, and duration follow significant allometric patterns in both nasal and oral emitting bats. Changes in these traits seem to have been coupled across the diversification of laryngeal echolocating bats. Scaling and correlated evolution analyses revealed that body mass is more closely related to peak frequency and call duration than to bandwidth. We propose two non-exclusive kinds of mechanisms to explain the link between size and each of the echolocation parameters. CONCLUSIONS: The incorporation of Bayesian statistics based on large-scale simulations could be helpful for addressing macroevolutionary questions related to the coevolution of traits in bats and other taxonomic groups.
Subjects
Chiroptera, Echolocation, Humans, Animals, Bayes Theorem, Body Size
ABSTRACT
Identifying along which lineages shifts in diversification rates occur is a central goal of comparative phylogenetics; these shifts may coincide with key evolutionary events such as the development of novel morphological characters, the acquisition of adaptive traits, polyploidization or other structural genomic changes, or dispersal to a new habitat and subsequent increase in environmental niche space. However, while multiple methods now exist to estimate diversification rates and identify shifts using phylogenetic topologies, the appropriate use and accuracy of these methods are hotly debated. Here we test whether five Bayesian methods produce comparable results: Bayesian Analysis of Macroevolutionary Mixtures (BAMM), two implementations of the Lineage-Specific Birth-Death-Shift model (LSBDS and PESTO), the approximate Multi-Type Birth-Death model (MTBD; implemented in BEAST2), and the Cladogenetic Diversification Rate Shift model (ClaDS2). We apply each of these methods to a set of 65 empirical time-calibrated phylogenies and compare inferences of speciation rate, extinction rate, and net diversification rate. We find that the five methods often infer different speciation, extinction, and net-diversification rates. Consequently, these different estimates may lead to different interpretations of the macroevolutionary dynamics. The different estimates can be attributed to fundamental differences among the compared models. Therefore, the inference of shifts in diversification rates is strongly method dependent. We advise biologists to apply multiple methods to test the robustness of the conclusions or to carefully select the method based on the validity of the underlying model assumptions to their particular empirical system.
ABSTRACT
This chapter describes the usage of homologizer to phase gene copies into polyploid subgenomes. Allopolyploids contain multiple copies of each genetic locus, where each copy potentially belongs to a different subgenome with its own distinct evolutionary history. If gene copies across different loci are incorrectly phased (i.e., assigned to the wrong subgenome), then the bifurcating tree assumption underlying multilocus phylogenetic inference and related analyses will be violated, leading to unsound results. homologizer is a highly flexible Bayesian method that uses a phylogenetic framework to infer the posterior probabilities of the phasing of gene copies into subgenomes. We describe how to prepare input data and other considerations needed to perform homologizer analyses and demonstrate how to visualize and interpret the results. We first walk through a basic example using homologizer to phase gene copies into polyploid subgenomes and then demonstrate how homologizer can be used as a hypothesis-testing tool to detect non-homeologous sequences such as hidden paralogs or allelic variation through the tools of Bayesian model comparison.
Subjects
Biological Evolution, Polyploidy, Humans, Phylogeny, Bayes Theorem, Alleles
ABSTRACT
Chromosome number change is a driver of speciation in eukaryotic organisms. Carnivorous sundews in the plant genus Drosera L. exhibit single chromosome number variation both among and within species, especially in the Australian Drosera subg. Ergaleium D.C., potentially linked to atypical centromeres that span much of the length of the chromosomes. We critically reviewed the literature on chromosome counts in Drosera, verified the taxonomy and quality of the original counts, and reconstructed dated phylogenies. We used the BiChrom model to test whether rates of single chromosome number increase and decrease, and chromosome number doubling differed between D. subg. Ergaleium and the other subgenera and between self-compatible and self-incompatible lineages. The best model for chromosome evolution among subgenera had equal rates of chromosome number doubling but higher rates of single chromosome number change in D. subg. Ergaleium than in the other subgenera. Contrary to expectation, self-incompatible lineages had a significantly higher rate of single chromosome loss than self-compatible lineages. We found no evidence for an association between differences in single chromosome number changes and diploidization after polyploidy or centromere type. This study presents an exemplar for critically examining published cytological data and rigorously testing factors that may impact the rates of chromosome number evolution.
Subjects
Drosera, Droseraceae, Drosera/genetics, Droseraceae/genetics, Australia, Chromosomes, Phylogeny
ABSTRACT
Functional decoupling of oral and pharyngeal jaws is widely considered to have expanded the ecological repertoire of cichlid fishes. However, the degree to which the evolution of these jaw systems is decoupled and whether decoupling has impacted trophic diversification remain unknown. Focusing on the large Neotropical radiation of cichlids, we ask whether oral and pharyngeal jaw evolution is correlated and how their evolutionary rates respond to feeding ecology. In support of decoupling, we find relaxed evolutionary integration between the two jaw systems, resulting in novel trait combinations that potentially facilitate feeding mode diversification. These outcomes are made possible by escaping the mechanical trade-off between force transmission and mobility, which characterizes a single jaw system that functions in isolation. In spite of the structural independence of the two jaw systems, results using a Bayesian, state-dependent, relaxed-clock model of multivariate Brownian motion indicate strongly aligned evolutionary responses to feeding ecology. So, although decoupling of prey capture and processing functions released constraints on jaw evolution and promoted trophic diversity in cichlids, the natural diversity of consumed prey has also induced a moderate degree of evolutionary integration between the jaw systems, reminiscent of the original mechanical trade-off between force and mobility.
Subjects
Biological Evolution, Cichlids/physiology, Diet/veterinary, Feeding Behavior, Jaw/anatomy & histology, Animals, Cichlids/anatomy & histology, Jaw/physiology
ABSTRACT
Investigating gene expression evolution over micro- and macroevolutionary timescales will expand our understanding of the role of gene expression in adaptation and speciation. In this study, we characterized the evolutionary forces acting on gene expression levels in eye and brain tissue of five Heliconius butterflies with divergence times of ~5-12 MYA. We developed and applied Brownian motion (BM) and Ornstein-Uhlenbeck (OU) models to identify genes whose expression levels are evolving through drift, stabilizing selection, or a lineage-specific shift. We found that 81% of the genes evolve under genetic drift. When testing for branch-specific shifts in gene expression, we detected 368 (16%) shift events. Genes showing a shift toward upregulation have significantly lower gene expression variance than those genes showing a shift toward downregulation. We hypothesize that directional selection is acting in shifts causing upregulation, since transcription is costly. We further uncovered through simulations that parameter estimation of OU models is biased when using small phylogenies and only becomes reliable with phylogenies having ≥ 50 taxa. Therefore, we developed a new statistical test based on BM to identify highly conserved genes (i.e., evolving under strong stabilizing selection), which comprised 3% of the orthoclusters. In conclusion, we found that drift is the dominant evolutionary force driving gene expression evolution in eye and brain tissue in Heliconius. Nevertheless, the higher proportion of genes evolving under directional than under stabilizing selection might reflect species-specific selective pressures on vision and the brain that are necessary to fulfill species-specific requirements.