Results 1 - 20 of 36
1.
Nature ; 599(7886): 616-621, 2021 11.
Article in English | MEDLINE | ID: mdl-34759322

ABSTRACT

The origin and early dispersal of speakers of Transeurasian languages (that is, Japanese, Korean, Tungusic, Mongolic and Turkic) is among the most disputed issues of Eurasian population history [1-3]. A key problem is the relationship between linguistic dispersals, agricultural expansions and population movements [4,5]. Here we address this question by 'triangulating' genetics, archaeology and linguistics in a unified perspective. We report wide-ranging datasets from these disciplines, including a comprehensive Transeurasian agropastoral and basic vocabulary; an archaeological database of 255 Neolithic-Bronze Age sites from Northeast Asia; and a collection of ancient genomes from Korea, the Ryukyu islands and early cereal farmers in Japan, complementing previously published genomes from East Asia. Challenging the traditional 'pastoralist hypothesis' [6-8], we show that the common ancestry and primary dispersals of Transeurasian languages can be traced back to the first farmers moving across Northeast Asia from the Early Neolithic onwards, but that this shared heritage has been masked by extensive cultural interaction since the Bronze Age. As well as marking considerable progress in the three individual disciplines, by combining their converging evidence we show that the early spread of Transeurasian speakers was driven by agriculture.


Subject(s)
Agriculture/history , Archaeology , Population Genetics , Human Migration/history , Language/history , Linguistics , China , Datasets as Topic , Geographic Mapping , Ancient History , Humans , Japan , Korea (Geographic) , Mongolia
2.
Nucleic Acids Res ; 52(2): 558-571, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38048305

ABSTRACT

How genetic information gained its exquisite control over chemical processes needed to build living cells remains an enigma. Today, the aminoacyl-tRNA synthetases (AARS) execute the genetic codes in all living systems. But how did the AARS that emerged over three billion years ago as low-specificity, protozymic forms then spawn the full range of highly-specific enzymes that distinguish between 22 diverse amino acids? A phylogenetic reconstruction of extant AARS genes, enhanced by analysing modular acquisitions, reveals six AARS with distinct bacterial, archaeal, eukaryotic, or organellar clades, resulting in a total of 36 families of AARS catalytic domains. Small structural modules that differentiate one AARS family from another played pivotal roles in discriminating between amino acid side chains, thereby expanding the genetic code and refining its precision. The resulting model shows a tendency for less elaborate enzymes, with simpler catalytic domains, to activate amino acids that were not synthesised until later in the evolution of the code. The most probable evolutionary route for an emergent amino acid type to establish a place in the code was by recruiting older, less specific AARS, rather than adapting contemporary lineages. This process, retrofunctionalisation, differs from previously described mechanisms through which amino acids would enter the code.


Subject(s)
Amino Acyl-tRNA Synthetases , Molecular Evolution , Genetic Code , Amino Acids/genetics , Amino Acids/metabolism , Amino Acyl-tRNA Synthetases/chemistry , Amino Acyl-tRNA Synthetases/genetics , Amino Acyl-tRNA Synthetases/metabolism , Bacteria/enzymology , Bacteria/genetics , Phylogeny , Archaea/enzymology , Archaea/genetics , Eukaryota/enzymology , Eukaryota/genetics
3.
Proc Natl Acad Sci U S A ; 119(32): e2112853119, 2022 08 09.
Article in English | MEDLINE | ID: mdl-35914165

ABSTRACT

The Bantu expansion transformed the linguistic, economic, and cultural composition of sub-Saharan Africa. However, the exact dates and routes taken by the ancestors of the speakers of the more than 500 current Bantu languages remain uncertain. Here, we use the recently developed "break-away" geographical diffusion model, specially designed for modeling migrations, with "augmented" geographic information, to reconstruct the Bantu language family expansion. This Bayesian phylogeographic approach with augmented geographical data provides a powerful way of linking linguistic, archaeological, and genetic data to test hypotheses about large language family expansions. We compare four hypotheses: an early major split north of the rainforest; a migration through the Sangha River Interval corridor around 2,500 BP; a coastal migration around 4,000 BP; and a migration through the rainforest before the corridor opening, at 4,000 BP. Our results produce a topology and timeline for the Bantu language family, which supports the hypothesis of an expansion through Central African tropical forests at 4,420 BP (4,040 to 5,000 95% highest posterior density interval), well before the Sangha River Interval was open.


Subject(s)
Language , Rainforest , Central Africa , Bayes Theorem , Black People , Human Migration , Humans , Phylogeography , Rivers
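
The 95% highest posterior density (HPD) interval quoted in the abstract above is, by definition, the narrowest interval that contains 95% of the posterior mass. The following is a minimal sketch of how such an interval can be estimated from a set of MCMC samples; the function name and the sample values are illustrative assumptions and are not taken from the paper or its software.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval holding at least `mass` of the samples.

    Illustrative helper (hypothetical name); assumes a unimodal posterior.
    """
    xs = sorted(samples)
    n = len(xs)
    # Number of samples that must fall inside the interval.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Made-up sample values with one outlier, for illustration only.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
low, high = hpd_interval(ages, mass=0.9)
```

Unlike an equal-tailed interval, the HPD interval is not forced to discard the same probability from both tails, so it tends to be shorter for skewed posteriors such as node ages.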
4.
Syst Biol ; 71(6): 1549-1560, 2022 10 12.
Article in English | MEDLINE | ID: mdl-35212733

ABSTRACT

We present a two-headed approach called Bayesian Integrated Coalescent Epoch PlotS (BICEPS) for efficient inference of coalescent epoch models. Firstly, we integrate out population size parameters, and secondly, we introduce a set of more powerful Markov chain Monte Carlo (MCMC) proposals for flexing and stretching trees. Even though population sizes are integrated out and not explicitly sampled through MCMC, we are still able to generate samples from the population size posteriors. This allows demographic reconstruction through time and estimation of the timing and magnitude of population bottlenecks and full population histories. Altogether, BICEPS can be considered a more muscular version of the popular Bayesian skyline model. We demonstrate its power and correctness with a well-calibrated simulation study. Furthermore, we demonstrate with an application to SARS-CoV-2 genomic data that some analyses that have trouble converging with the traditional Bayesian skyline prior and standard MCMC proposals can do well with the BICEPS approach. BICEPS is available as an open-source package for BEAST 2 under the GPL license and has a user-friendly graphical user interface. [Bayesian phylogenetics; BEAST 2; BICEPS; coalescent model.]


Subject(s)
COVID-19 , Software , Algorithms , Bayes Theorem , Humans , Markov Chains , Genetic Models , Monte Carlo Method , Phylogeny , SARS-CoV-2
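
As an illustration of what "integrating out" a population size can look like, the derivation below works through one standard conjugate case. It is a sketch under stated assumptions, not a description of the BICEPS implementation: it assumes a single epoch with constant population size N, m coalescent events within the epoch, and an inverse-gamma prior on N with shape α and scale β. The prior family and all symbols are assumptions introduced here, since the abstract does not specify them.

```latex
% Assumed setup: one epoch, constant population size N, m coalescent events.
% The epoch is cut into intervals j with k_j lineages and duration t_j.
\begin{align}
p(\text{epoch} \mid N) &= N^{-m} \exp\!\left(-\frac{S}{N}\right),
  \qquad S = \sum_{j} \binom{k_j}{2}\, t_j, \\
p(N) &= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, N^{-\alpha-1} \exp\!\left(-\frac{\beta}{N}\right), \\
\int_0^{\infty} p(\text{epoch} \mid N)\, p(N)\, \mathrm{d}N
  &= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \cdot
     \frac{\Gamma(\alpha + m)}{(\beta + S)^{\alpha + m}}.
\end{align}
```

Under these assumptions the conditional posterior of N given the tree is again inverse-gamma, with shape α + m and scale β + S. This conjugacy is one way population sizes could still be sampled after the fact even though they are never proposed during MCMC.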
5.
Syst Biol ; 71(4): 901-916, 2022 06 16.
Article in English | MEDLINE | ID: mdl-35176772

ABSTRACT

As genomic sequence data become increasingly available, inferring the phylogeny of the species as that of concatenated genomic data can be enticing. However, this approach makes for a biased estimator of branch lengths and substitution rates and an inconsistent estimator of tree topology. Bayesian multispecies coalescent (MSC) methods address these issues. This is achieved by constraining a set of gene trees within a species tree and jointly inferring both under a Bayesian framework. However, this approach comes at the cost of increased computational demand. Here, we introduce StarBeast3, a software package for efficient Bayesian inference under the MSC model via Markov chain Monte Carlo. We gain efficiency by introducing cutting-edge proposal kernels and adaptive operators, and StarBeast3 is particularly efficient when a relaxed clock model is applied. Furthermore, gene-tree inference is parallelized, allowing the software to scale with the size of the problem. We validated our software and benchmarked its performance using three real and two synthetic data sets. Our results indicate that StarBeast3 is up to one-and-a-half orders of magnitude faster than StarBeast2, and therefore more than two orders of magnitude faster than *BEAST, depending on the data set and on the parameter, and can achieve convergence on large data sets with hundreds of genes. StarBeast3 is open-source and is easy to set up with a friendly graphical user interface. [Adaptive; Bayesian inference; BEAST 2; effective population sizes; high performance; multispecies coalescent; parallelization; phylogenetics.]


Subject(s)
Genetic Models , Software , Bayes Theorem , Markov Chains , Monte Carlo Method , Phylogeny
6.
Syst Biol ; 70(1): 145-161, 2021 01 01.
Article in English | MEDLINE | ID: mdl-33005955

ABSTRACT

We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy in recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 freshwater turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]


Subject(s)
Algorithms , Genetic Models , Bayes Theorem , Computer Simulation , Phylogeny , Probability
7.
PLoS Comput Biol ; 17(2): e1008322, 2021 02.
Article in English | MEDLINE | ID: mdl-33529184

ABSTRACT

Relaxed clock models enable estimation of molecular substitution rates across lineages and are widely used in phylogenetics for dating evolutionary divergence times. Under the (uncorrelated) relaxed clock model, tree branches are associated with molecular substitution rates which are independently and identically distributed. In this article we delved into the internal complexities of the relaxed clock model in order to develop efficient MCMC operators for Bayesian phylogenetic inference. We compared three substitution rate parameterisations, introduced an adaptive operator which learns the weights of other operators during MCMC, and we explored how relaxed clock model estimation can benefit from two cutting-edge proposal kernels: the AVMVN and Bactrian kernels. This work has produced an operator scheme that is up to 65 times more efficient at exploring continuous relaxed clock parameters compared with previous setups, depending on the dataset. Finally, we explored variants of the standard narrow exchange operator which are specifically designed for the relaxed clock model. In the most extreme case, this new operator traversed tree space 40% more efficiently than narrow exchange. The methodologies introduced are adaptive and highly effective on short as well as long alignments. The results are available via the open source optimised relaxed clock (ORC) package for BEAST 2 under a GNU licence (https://github.com/jordandouglas/ORC).


Subject(s)
Molecular Evolution , Genetic Models , Phylogeny , Algorithms , Animals , Bayes Theorem , Computational Biology , Computer Simulation , Genetic Databases/statistics & numerical data , Likelihood Functions , Markov Chains , Monte Carlo Method , Mutation Rate , Software , Time Factors
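
The Bactrian kernel mentioned in the abstract above is a two-humped proposal distribution: an equal mixture of two normal distributions centred at -m and +m, each with variance 1 - m^2, so that the mixture has mean 0 and variance 1. Compared with a plain Gaussian random walk, it rarely proposes values very close to the current state. The sketch below shows a random-walk proposal built on this distribution. The function names, the default m = 0.95 and the step size are illustrative assumptions, and the code is not taken from the ORC package.

```python
import math
import random


def bactrian_sample(rng, m=0.95):
    """Draw from a Bactrian distribution with mean 0 and variance 1.

    Equal mixture of N(-m, 1 - m^2) and N(+m, 1 - m^2).
    """
    sd = math.sqrt(1.0 - m * m)
    centre = m if rng.random() < 0.5 else -m
    return rng.gauss(centre, sd)


def bactrian_random_walk(x, step, rng, m=0.95):
    """Propose a new value for x; `step` scales the size of the move.

    The proposal is symmetric, so no Hastings correction is needed
    when it is used inside a Metropolis-Hastings sampler.
    """
    return x + step * bactrian_sample(rng, m)


rng = random.Random(42)
draws = [bactrian_sample(rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
```

With m close to 1 the two humps are well separated, which is the property that discourages tiny, wasteful moves.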
8.
Int J Mol Sci ; 23(3)2022 Jan 28.
Article in English | MEDLINE | ID: mdl-35163448

ABSTRACT

The role of aminoacyl-tRNA synthetases (aaRS) in the emergence and evolution of genetic coding poses challenging questions concerning their provenance. We seek evidence about their ancestry from curated structure-based multiple sequence alignments of a structurally invariant "scaffold" shared by all 10 canonical Class I aaRS. Three uncorrelated phylogenetic metrics (mutation frequency, its uniformity, and row-by-row cladistic congruence) imply that the Class I scaffold is a mosaic assembled from successive genetic sources. Metrics for different modules vary in accordance with their presumed functionality. Sequences derived from the ATP- and amino acid-binding sites exhibit specific two-way coupling to those derived from Connecting Peptide 1, a third module whose metrics suggest later acquisition. The data help validate: (i) experimental fragmentations of the canonical Class I structure into three partitions that retain catalytic activities in proportion to their length; and (ii) evidence that the ancestral Class I aaRS gene also encoded a Class II ancestor in frame on the opposite strand. A 46-residue Class I "protozyme" roots the Class I tree prior to the adaptive radiation of the Rossmann dinucleotide binding fold that refined substrate discrimination. Such rooting implies near simultaneous emergence of genetic coding and the origin of the proteome, resolving a conundrum posed by previous inferences that Class I aaRS evolved after the genetic code had been implemented in an RNA world. Further, pinpointing discontinuous enhancements of aaRS fidelity establishes a timeline for the growth of coding from a binary amino acid alphabet.


Subject(s)
Amino Acyl-tRNA Synthetases/chemistry , Amino Acyl-tRNA Synthetases/genetics , Mutation , Benchmarking , Binding Sites , Molecular Evolution , Genetic Code , Molecular Models , Phylogeny , Protein Conformation , Amino Acid Sequence Homology , Protein Structural Homology
9.
Emerg Infect Dis ; 27(9): 2361-2368, 2021 09.
Article in English | MEDLINE | ID: mdl-34424164

ABSTRACT

Since severe acute respiratory syndrome coronavirus 2 was first eliminated in New Zealand in May 2020, a total of 13 known coronavirus disease (COVID-19) community outbreaks have occurred, 2 of which led health officials to issue stay-at-home orders. These outbreaks originated at the border via isolating returnees, airline workers, and cargo vessels. Because a public health system was informed by real-time viral genomic sequencing and complete genomes typically were available within 12 hours of community-based positive COVID-19 test results, every outbreak was well-contained. A total of 225 community cases resulted in 3 deaths. Real-time genomics were essential for establishing links between cases when epidemiologic data could not do so and for identifying when concurrent outbreaks had different origins.


Subject(s)
COVID-19 , Viruses , Genomics , Humans , New Zealand/epidemiology , SARS-CoV-2
10.
Syst Biol ; 68(2): 358-364, 2019 03 01.
Article in English | MEDLINE | ID: mdl-29945220

ABSTRACT

Rapidly evolving pathogens, such as viruses and bacteria, accumulate genetic change on a timescale similar to that over which their epidemiological processes occur, such that it is possible to make inferences about their infectious spread using phylogenetic time-trees. For this purpose it is necessary to choose a phylodynamic model. However, the resulting inferences are contingent on whether the model adequately describes key features of the data. Model adequacy methods allow formal rejection of a model if it cannot generate the main features of the data. We present TreeModelAdequacy, a package for the popular BEAST2 software that allows assessing the adequacy of phylodynamic models. We illustrate its utility by analyzing phylogenetic trees from two viral outbreaks of Ebola and H1N1 influenza. The main features of the Ebola data were adequately described by the coalescent exponential-growth model, whereas the H1N1 influenza data were best described by the birth-death susceptible-infected-recovered model.


Subject(s)
Computer Simulation , Ebolavirus/classification , Ebolavirus/genetics , Viral Genome/genetics , Influenza A Virus H1N1 Subtype/classification , Influenza A Virus H1N1 Subtype/genetics , Phylogeny , Ebola Hemorrhagic Fever/epidemiology , Ebola Hemorrhagic Fever/virology , Humans , Human Influenza/epidemiology , Human Influenza/virology , Software
11.
Syst Biol ; 68(2): 219-233, 2019 03 01.
Article in English | MEDLINE | ID: mdl-29961836

ABSTRACT

Bayesian inference methods rely on numerical algorithms for both model selection and parameter inference. In general, these algorithms require a high computational effort to yield reliable estimates. One of the major challenges in phylogenetics is the estimation of the marginal likelihood. This quantity is commonly used for comparing different evolutionary models, but its calculation, even for simple models, incurs high computational cost. Another interesting challenge relates to the estimation of the posterior distribution. Often, long Markov chains are required to get sufficient samples to carry out parameter inference, especially for tree distributions. In general, these problems are addressed separately by using different procedures. Nested sampling (NS) is a Bayesian computation algorithm, which provides the means to estimate marginal likelihoods together with their uncertainties, and to sample from the posterior distribution at no extra cost. The methods currently used in phylogenetics for marginal likelihood estimation lack practicality because they depend on many tuning parameters and because, unlike NS, most implementations provide no direct way to calculate the uncertainties associated with the estimates. In this article, we introduce NS to phylogenetics. Its performance is analysed under different scenarios and compared to established methods. We conclude that NS is a competitive and attractive algorithm for phylogenetic inference. An implementation is available as a package for BEAST 2 under the LGPL licence, accessible at https://github.com/BEAST2-Dev/nested-sampling.


Subject(s)
Classification/methods , Genetic Models , Phylogeny , Algorithms
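
To make the idea of nested sampling concrete, the sketch below estimates a marginal likelihood for a toy one-dimensional problem. It is a minimal illustration of the general algorithm and not the BEAST 2 package: the uniform prior on [0, 1], the Gaussian-shaped likelihood, the function names and all settings are assumptions chosen so that the constrained prior can be sampled exactly and the true answer is known.

```python
import math
import random

SIGMA = 0.1  # width of the toy likelihood (assumed for illustration)


def log_likelihood(theta):
    """Unnormalised Gaussian-shaped likelihood centred at 0.5."""
    return -0.5 * ((theta - 0.5) / SIGMA) ** 2


def nested_sampling(n_live=100, n_iter=1000, seed=1):
    """Estimate Z = integral of L(theta) over a Uniform(0, 1) prior."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]  # draws from the prior
    z = 0.0
    x_prev = 1.0  # prior mass still enclosed by the likelihood contour
    for i in range(1, n_iter + 1):
        # Find the live point with the lowest likelihood.
        worst = min(range(n_live), key=lambda j: log_likelihood(live[j]))
        l_min = math.exp(log_likelihood(live[worst]))
        # The enclosed prior mass shrinks by about exp(-1/n_live) per step.
        x_i = math.exp(-i / n_live)
        z += l_min * (x_prev - x_i)
        x_prev = x_i
        # Replace it with a prior draw constrained to L > l_min. For this
        # symmetric toy likelihood that region is an interval around 0.5.
        r = abs(live[worst] - 0.5)
        live[worst] = 0.5 + rng.uniform(-r, r)
    # Add the contribution of the remaining live points.
    z += x_prev * sum(math.exp(log_likelihood(t)) for t in live) / n_live
    return z


estimate = nested_sampling()
exact = SIGMA * math.sqrt(2.0 * math.pi)  # tails beyond [0, 1] are negligible
```

In real phylogenetic problems the constrained draw is the hard part and is typically done with short MCMC runs rather than exact sampling; the toy setup sidesteps that step on purpose.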
12.
PLoS Comput Biol ; 15(8): e1007189, 2019 08.
Article in English | MEDLINE | ID: mdl-31386651

ABSTRACT

Model-based phylodynamic approaches recently employed generalized linear models (GLMs) to uncover potential predictors of viral spread. Very recently some of these models have allowed both the predictors and their coefficients to be time-dependent. However, these studies mainly focused on predictors that are assumed to be constant through time. Here we inferred the phylodynamics of avian influenza A virus H9N2 isolated in 12 Asian countries and regions under both discrete trait analysis (DTA) and structured coalescent (MASCOT) approaches. Using MASCOT we applied a new time-dependent GLM to uncover the underlying factors behind H9N2 spread. We curated a rich set of time-series predictors including annual international live poultry trade and national poultry production figures. This time-dependent phylodynamic prediction model was compared to commonly employed time-independent alternatives. Additionally, the time-dependent MASCOT model allowed for the estimation of viral effective sub-population sizes and their changes through time, and these effective population dynamics within each country were predicted by a GLM. International annual poultry trade is a strongly supported predictor of virus migration rates. There was also strong support for geographic proximity as a predictor of migration rate in all GLMs investigated. In time-dependent MASCOT models, national poultry production was also identified as a predictor of virus genetic diversity through time and this signal was obvious in mainland China. Our application of recently introduced time-dependent GLM predictors integrated rich time-series data in Bayesian phylodynamic prediction. We demonstrated the contribution of poultry trade and geographic proximity (potentially unheralded wild bird movements) to avian influenza spread in Asia. To gain a better understanding of the drivers of H9N2 spread, we suggest increased surveillance of the H9N2 virus in countries that are currently under-sampled as well as in wild bird populations in the most affected countries.


Subject(s)
Influenza A Virus H9N2 Subtype , Avian Influenza/transmission , Biological Models , Animal Migration , Animals , Wild Animals/virology , Asia/epidemiology , Bayes Theorem , Birds/virology , Commerce , Computational Biology , Environmental Monitoring , Influenza A Virus H9N2 Subtype/classification , Influenza A Virus H9N2 Subtype/genetics , Avian Influenza/epidemiology , Avian Influenza/virology , Linear Models , Phylogeography/statistics & numerical data , Population Dynamics , Poultry/virology , Spatio-Temporal Analysis
13.
PLoS Comput Biol ; 15(4): e1006650, 2019 04.
Article in English | MEDLINE | ID: mdl-30958812

ABSTRACT

Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.


Subject(s)
Bayes Theorem , Biological Evolution , Phylogeny , Software , Animals , Computational Biology , Computer Simulation , Molecular Evolution , Humans , Markov Chains , Genetic Models , Monte Carlo Method
14.
Mol Biol Evol ; 34(8): 2101-2114, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28431121

ABSTRACT

Fully Bayesian multispecies coalescent (MSC) methods like *BEAST estimate species trees from multiple sequence alignments. Today thousands of genes can be sequenced for a given study, but using that many genes with *BEAST is intractably slow. An alternative is to use heuristic methods which compromise accuracy or completeness in return for speed. A common heuristic is concatenation, which assumes that the evolutionary history of each gene tree is identical to the species tree. This is an inconsistent estimator of species tree topology, a worse estimator of divergence times, and induces spurious substitution rate variation when incomplete lineage sorting is present. Another class of heuristics directly motivated by the MSC avoids many of the pitfalls of concatenation but cannot be used to estimate divergence times. To enable fuller use of available data and more accurate inference of species tree topologies, divergence times, and substitution rates, we have developed a new version of *BEAST called StarBEAST2. To improve convergence rates we add analytical integration of population sizes, novel MCMC operators and other optimizations. Computational performance improved by 13.5× and 13.8× respectively when analyzing two empirical data sets, and an average of 33.1× across 30 simulated data sets. To enable accurate estimates of per-species substitution rates, we introduce species tree relaxed clocks, and show that StarBEAST2 is a more powerful and robust estimator of rate variation than concatenation. StarBEAST2 is available through the BEAUTi package manager in BEAST 2.4 and above.


Subject(s)
Sequence Alignment/methods , Base Sequence , Bayes Theorem , Biological Evolution , Computer Simulation , Genetic Speciation , Genetic Models , Mutation Rate , Phylogeny , Software
15.
Syst Biol ; 66(1): 3-22, 2017 Jan 01.
Article in English | MEDLINE | ID: mdl-28173588

ABSTRACT

Divergence-time estimation based on molecular phylogenies and the fossil record has provided insights into fundamental questions of evolutionary biology. In Bayesian node dating, phylogenies are commonly time calibrated through the specification of calibration densities on nodes representing clades with known fossil occurrences. Unfortunately, the optimal shape of these calibration densities is usually unknown and they are therefore often chosen arbitrarily, which directly impacts the reliability of the resulting age estimates. As possible solutions to this problem, two nonexclusive alternative approaches have recently been developed, the "fossilized birth-death" (FBD) model and "total-evidence dating." While these approaches have been shown to perform well under certain conditions, they require including all (or a random subset) of the fossils of each clade in the analysis, rather than just relying on the oldest fossils of clades. In addition, both approaches assume that fossil records of different clades in the phylogeny are all the product of the same underlying fossil sampling rate, even though this rate has been shown to differ strongly between higher level taxa. We here develop a flexible new approach to Bayesian age estimation that combines advantages of node dating and the FBD model. In our new approach, calibration densities are defined on the basis of first fossil occurrences and sampling rate estimates that can be specified separately for all clades. We verify our approach with a large number of simulated data sets, and compare its performance to that of the FBD model. We find that our approach produces reliable age estimates that are robust to model violation, on par with the FBD model. By applying our approach to a large data set including sequence data from over 1000 species of teleost fishes as well as 147 carefully selected fossil constraints, we recover a timeline of teleost diversification that is incompatible with previously assumed vicariant divergences of freshwater fishes. Our results instead provide strong evidence for transoceanic dispersal of cichlids and other groups of teleost fishes.


Subject(s)
Cichlids/classification , Biological Models , Phylogeny , Animals , Atlantic Ocean , Bayes Theorem , Biodiversity , Fossils , Genetic Speciation , Time
16.
BMC Evol Biol ; 17(1): 42, 2017 02 06.
Article in English | MEDLINE | ID: mdl-28166715

ABSTRACT

BACKGROUND: Reconstructing phylogenies through Bayesian methods has many benefits, which include providing a mathematically sound framework, providing realistic estimates of uncertainty and being able to incorporate different sources of information based on formal principles. Bayesian phylogenetic analyses are popular for interpreting nucleotide sequence data; however, for such studies one needs to specify a site model and associated substitution model. Often, the parameters of the site model are of no interest and an ad-hoc or additional likelihood-based analysis is used to select a single site model. RESULTS: bModelTest allows for a Bayesian approach to inferring and marginalizing site models in a phylogenetic analysis. It is based on trans-dimensional Markov chain Monte Carlo (MCMC) proposals that allow switching between substitution models as well as estimating the posterior probability for gamma-distributed rate heterogeneity, a proportion of invariable sites and unequal base frequencies. The model can be used with the full set of time-reversible models on nucleotides, but we also introduce and demonstrate the use of two subsets of time-reversible substitution models. CONCLUSION: With the new method the site model can be inferred (and marginalized) during the MCMC analysis and does not need to be pre-determined, as is now often the case in practice, by likelihood-based methods. The method is implemented in the bModelTest package of the popular BEAST 2 software, which is open source, licensed under the GNU Lesser General Public License and allows joint site model and tree inference under a wide range of models.


Subject(s)
Genetic Models , Phylogeny , Software , Algorithms , Base Sequence , Bayes Theorem , Likelihood Functions , Markov Chains , Monte Carlo Method , Uncertainty
17.
Syst Biol ; 63(4): 534-42, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24627183

ABSTRACT

The multispecies coalescent has provided important progress for evolutionary inferences, including increasing the statistical rigor and objectivity of comparisons among competing species delimitation models. However, Bayesian species delimitation methods typically require brute force integration over gene trees via Markov chain Monte Carlo (MCMC), which introduces a large computation burden and precludes their application to genomic-scale data. Here we combine a recently introduced dynamic programming algorithm for estimating species trees that bypasses MCMC integration over gene trees with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, to provide a rigorous and computationally tractable technique for genome-wide species delimitation. We provide a critical yet simple correction that brings the likelihoods of different species trees, and more importantly their corresponding marginal likelihoods, to the same common denominator, which enables direct and accurate comparisons of competing species delimitation models using Bayes factors. We test this approach, which we call Bayes factor delimitation (*with genomic data; BFD*), using common species delimitation scenarios with computer simulations. Varying the number of loci and the number of samples suggests that the approach can distinguish the true model even with few loci and limited samples per species. Misspecification of the prior for population size θ has little impact on support for the true model. We apply the approach to West African forest geckos (Hemidactylus fasciatus complex) using genome-wide SNP data. This new Bayesian method for species delimitation builds on a growing trend for objective species delimitation methods with explicit model assumptions that are easily tested. [Bayes factor; model testing; phylogeography; RADseq; simulation; speciation.]


Subject(s)
Genome/genetics , Phylogeny , Phylogeography/methods , Single Nucleotide Polymorphism/genetics , Algorithms , Animals , Bayes Theorem , Computer Simulation , Lizards/genetics
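
Once a log marginal likelihood has been estimated for each competing delimitation model, comparing the models with Bayes factors reduces to simple arithmetic on the log scale. The sketch below ranks models by 2 ln BF relative to the best one. The model names and the log marginal likelihood values are hypothetical, as are the function names; none of them come from the paper.

```python
def two_ln_bayes_factor(log_ml_a, log_ml_b):
    """Return 2 ln BF in favour of model A over model B."""
    return 2.0 * (log_ml_a - log_ml_b)


def rank_models(log_mls):
    """Rank models by log marginal likelihood, best first.

    Returns (name, 2 ln BF of the best model over this model) pairs.
    """
    best = max(log_mls.values())
    ranked = sorted(log_mls.items(), key=lambda item: item[1], reverse=True)
    return [(name, two_ln_bayes_factor(best, lml)) for name, lml in ranked]


# Hypothetical log marginal likelihoods for three delimitation models.
example = {
    "one species": -5120.4,
    "two species": -5101.9,
    "three species": -5098.3,
}
ranking = rank_models(example)
```

On the widely used Kass and Raftery scale, values of 2 ln BF above 10 are usually read as very strong support for the better model, and values between 6 and 10 as strong support.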
18.
PLoS Comput Biol ; 10(4): e1003537, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24722319

ABSTRACT

We present a new open source, extensible and flexible software platform for Bayesian evolutionary analysis called BEAST 2. This software platform is a re-design of the popular BEAST 1 platform to correct structural deficiencies that became evident as the BEAST 1 software evolved. Key among those deficiencies was the lack of post-deployment extensibility. BEAST 2 now has a fully developed package management system that allows third party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform. This package architecture is showcased with a number of recently published new models encompassing birth-death-sampling tree priors, phylodynamics and model averaging for substitution models and site partitioning. A second major improvement is the ability to read/write the entire state of the MCMC chain to/from disk allowing it to be easily shared between multiple instances of the BEAST software. This facilitates checkpointing and better support for multi-processor and high-end computing extensions. Finally, the functionality in new packages can be easily added to the user interface (BEAUti 2) by a simple XML template-based mechanism because BEAST 2 has been re-designed to provide greater integration between the analysis engine and the user interface so that, for example BEAST and BEAUti use exactly the same XML file format.


Subject(s)
Bayes Theorem , Biological Evolution , Software , Programming Languages
19.
PeerJ ; 12: e17276, 2024.
Article in English | MEDLINE | ID: mdl-38699195

ABSTRACT

In this article, we study the distance matrix as a representation of a phylogeny by way of hierarchical clustering. Defining a multivariate normal distribution on (a subset of) the entries in the matrix allows us to represent a distribution over rooted time trees. Here, we demonstrate that tree distributions can be represented accurately this way for a number of published tree distributions. Though such a representation does not map to unique trees, restriction to a subspace, in particular one we call a "cube", makes the representation bijective at the cost of not being able to represent all possible trees. We introduce an algorithm "cubeVB" specifically for cubes and show through a well-calibrated simulation study that it is possible to recover parameters of interest like tree height and length. Although a cube cannot represent all of tree space, it is a great improvement over a single summary tree, and it opens up exciting new opportunities for scaling up Bayesian phylogenetic inference. We also demonstrate how to use a matrix representation of a tree distribution to get better summary trees than commonly used maximum clade credibility trees. An open source implementation of the cubeVB algorithm is available from https://github.com/rbouckaert/cubevb as the cubevb package for BEAST 2.
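The idea of reading a rooted time tree off a distance matrix by hierarchical clustering can be sketched as follows. This is an illustrative sketch only: the choice of single-linkage clustering, the convention that a node's height is half the merge distance, and the function name `cluster_to_tree` are assumptions made here, not details confirmed by the abstract or taken from the cubevb package.

```python
def cluster_to_tree(names, dist):
    """Build a rooted time tree from a symmetric distance matrix.

    Uses single-linkage agglomerative clustering (an assumption of this
    sketch). Returns a Newick string and the root height.
    """
    # Each cluster is (newick_string, node_height, member_indices).
    clusters = [(name, 0.0, [i]) for i, name in enumerate(names)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist[i][j] for i in clusters[a][2] for j in clusters[b][2])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        left, right = clusters[a], clusters[b]
        height = d / 2.0  # node height is half the pairwise distance
        newick = "(%s:%g,%s:%g)" % (
            left[0], height - left[1], right[0], height - right[1])
        merged = (newick, height, left[2] + right[2])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0] + ";", clusters[0][1]


# Hypothetical three-taxon example: A and B are close, C is distant.
names = ["A", "B", "C"]
dist = [[0.0, 2.0, 6.0],
        [2.0, 0.0, 6.0],
        [6.0, 6.0, 0.0]]
newick, root_height = cluster_to_tree(names, dist)
print(newick, root_height)
```

Because many different matrices cluster to the same tree, a mapping of this kind is not one-to-one, which is the non-uniqueness the abstract addresses by restricting to a subspace.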


Subject(s)
Algorithms , Bayes Theorem , Phylogeny , Cluster Analysis , Computer Simulation
20.
bioRxiv ; 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38496513

ABSTRACT

The spread of infectious diseases is shaped by spatial and temporal aspects, such as host population structure or changes in the transmission rate or number of infected individuals over time. These spatiotemporal dynamics are imprinted in the genomes of pathogens and can be recovered from those genomes using phylodynamic methods. However, phylodynamic methods typically quantify either the temporal or the spatial transmission dynamics, which leads to unclear biases, as one potentially cannot be inferred without the other. Here, we address this challenge by introducing a structured coalescent skyline approach, MASCOT-Skyline, which allows us to jointly infer spatial and temporal transmission dynamics of infectious diseases using Markov chain Monte Carlo inference. To do so, we model the effective population size dynamics in different locations using a non-parametric function, allowing us to approximate a range of population size dynamics. We show, using a range of different viral outbreak datasets, potential issues with phylogeographic methods. We then use these viral datasets to motivate simulations of outbreaks that illuminate the nature of biases present in the different phylogeographic methods. We show that spatial and temporal dynamics should be modeled jointly even if one seeks to recover just one of the two. Further, we showcase conditions under which we can expect phylogeographic analyses to be biased, particularly under different subsampling approaches, and provide recommendations on when we can expect them to perform well. We implemented MASCOT-Skyline as part of the open-source software package MASCOT for the Bayesian phylodynamics platform BEAST2.
