Results 1 - 11 of 11
1.
PeerJ ; 12: e17276, 2024.
Article in English | MEDLINE | ID: mdl-38699195

ABSTRACT

In this article, we study the distance matrix as a representation of a phylogeny by way of hierarchical clustering. Defining a multivariate normal distribution on (a subset of) the entries in a matrix allows us to represent a distribution over rooted time trees. Here, we demonstrate that tree distributions can be represented accurately this way for a number of published tree distributions. Though such a representation does not map to unique trees, restriction to a subspace, in particular one we call a "cube", makes the representation bijective at the cost of not being able to represent all possible trees. We introduce an algorithm, "cubeVB", specifically for cubes and show through a well-calibrated simulation study that it is possible to recover parameters of interest like tree height and length. Although a cube cannot represent all of tree space, it is a great improvement over a single summary tree, and it opens up exciting new opportunities for scaling up Bayesian phylogenetic inference. We also demonstrate how to use a matrix representation of a tree distribution to get better summary trees than the commonly used maximum clade credibility trees. An open source implementation of the cubeVB algorithm is available from https://github.com/rbouckaert/cubevb as the cubevb package for BEAST 2.
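
To make the mapping from a distance matrix to a rooted time tree concrete, here is a minimal sketch. It is not the cubeVB implementation: it assumes single-linkage clustering and places each internal node at half the pairwise distance, and the function name `single_linkage_tree` is purely illustrative.

```python
def single_linkage_tree(names, dist):
    """Build a rooted tree from a symmetric distance matrix.

    Assumption (illustrative only): single-linkage clustering, with each
    internal node placed at height = (cluster distance) / 2.
    Returns (nested tuple of taxon names, root height).
    """
    # Each cluster is (subtree, height, list of member indices).
    clusters = [(name, 0.0, [i]) for i, name in enumerate(names)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist[i][j]
                        for i in clusters[a][2]
                        for j in clusters[b][2])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = ((clusters[a][0], clusters[b][0]),
                  d / 2.0,
                  clusters[a][2] + clusters[b][2])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0], clusters[0][1]


# Toy example: A and B are close, C is distant from both.
tree, height = single_linkage_tree(
    ["A", "B", "C"],
    [[0, 2, 6],
     [2, 0, 6],
     [6, 6, 0]],
)
print(tree, height)
```

In this toy matrix, A and B join at height 1 and C joins them at the root at height 3. Different matrices can yield the same tree under such a mapping, which is the non-uniqueness the abstract mentions.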


Subject(s)
Algorithms; Bayes Theorem; Phylogeny; Cluster Analysis; Computer Simulation
2.
BMC Genom Data ; 25(1): 4, 2024 01 02.
Article in English | MEDLINE | ID: mdl-38166646

ABSTRACT

BACKGROUND: We tackle the problem of estimating species TMRCAs (Time to Most Recent Common Ancestor), given a genome sequence from each species and a large known phylogenetic tree with a known structure (typically from one of the species). The number of transitions at each site from the first sequence to the other is assumed to be Poisson distributed, and only the parity of the number of transitions is observed. The detailed phylogenetic tree contains information about the transition rates in each site. We use this formulation to develop and analyze multiple estimators of the species' TMRCA. To test our methods, we use mtDNA substitution statistics from the well-established Phylotree as a baseline for data simulation such that the substitution rate per site mimics the real-world observed rates. RESULTS: We evaluate our methods using simulated data and compare them to the Bayesian optimizing software BEAST2, showing that our proposed estimators are accurate for a wide range of TMRCAs and significantly outperform BEAST2. We then apply the proposed estimators to Neanderthal, Denisovan, and Chimpanzee mtDNA genomes to better estimate their TMRCA with modern humans and find that their TMRCA is substantially later, compared to values cited recently in the literature. CONCLUSIONS: Our methods utilize the transition statistics from the entire known human mtDNA phylogenetic tree (Phylotree), eliminating the requirement to reconstruct a tree encompassing the specific sequences of interest. Moreover, they demonstrate notable improvement in both running speed and accuracy compared to BEAST2, particularly for earlier TMRCAs like the human-Chimpanzee split. Our results date the human-Neanderthal TMRCA to be [Formula: see text] years ago, considerably later than values cited in other recent studies.
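
The parity observation has a simple closed form: if the number of transitions at a site is Poisson with mean mu, the probability that the count is odd (so the two sequences differ at that site) is (1 - exp(-2*mu)) / 2. The sketch below inverts that relation under the simplifying assumption that every site shares the same mean. It is not one of the paper's estimators, and the function names are hypothetical.

```python
import math


def prob_odd(mean):
    """P(N is odd) for N ~ Poisson(mean)."""
    return 0.5 * (1.0 - math.exp(-2.0 * mean))


def mean_from_odd_fraction(frac_odd):
    """Invert prob_odd: recover the Poisson mean from the odd fraction.

    Assumption (illustrative only): all sites share one mean, so the
    observed fraction of differing sites estimates P(N is odd).
    """
    if not 0.0 <= frac_odd < 0.5:
        raise ValueError("fraction of differing sites must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * frac_odd)
```

As the fraction of differing sites approaches 0.5, the parity signal saturates and the inverted estimate grows without bound.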


Subject(s)
Hominidae; Neanderthals; Animals; Humans; Neanderthals/genetics; Phylogeny; Pan troglodytes/genetics; Bayes Theorem; Hominidae/genetics; DNA, Mitochondrial/genetics
3.
Syst Biol ; 72(5): 1180-1187, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37161619

ABSTRACT

Bayesian phylogenetic inference requires a tree prior, which models the underlying diversification process that gives rise to the phylogeny. Existing birth-death diversification models include a wide range of features, for instance, lineage-specific variations in speciation and extinction (SSE) rates. While across-lineage variation in SSE rates is widespread in empirical datasets, few heterogeneous rate models have been implemented as tree priors for Bayesian phylogenetic inference. As a consequence, rate heterogeneity is typically ignored when reconstructing phylogenies, and rate heterogeneity is usually investigated on fixed trees. In this paper, we present a new BEAST2 package implementing the cladogenetic diversification rate shift (ClaDS) model as a tree prior. ClaDS is a birth-death diversification model designed to capture small progressive variations in birth and death rates along a phylogeny. Unlike previous implementations of ClaDS, which were designed to be used with fixed, user-chosen phylogenies, our package is implemented in the BEAST2 framework and thus allows full phylogenetic inference, where the phylogeny and model parameters are co-estimated from a molecular alignment. Our package provides all necessary components of the inference, including a new tree object and operators to propose moves to the Monte-Carlo Markov chain. It also includes a graphical interface through BEAUti. We validate our implementation of the package by comparing the produced distributions to simulated data and show an empirical example of the full inference, using a dataset of cetaceans.


Subject(s)
Genetic Speciation; Phylogeny; Bayes Theorem; Monte Carlo Method; Markov Chains
4.
Open Res Eur ; 3: 204, 2023.
Article in English | MEDLINE | ID: mdl-38481771

ABSTRACT

Phylogenetic estimation is, and has always been, a complex endeavor. Estimating a phylogenetic tree involves evaluating many possible solutions and possible evolutionary histories that could explain a set of observed data, typically by using a model of evolution. Values for all model parameters need to be evaluated as well. Modern statistical methods involve not just the estimation of a tree, but also solutions to more complex models involving fossil record information and other data sources. Markov chain Monte Carlo (MCMC) is a leading method for approximating the posterior distribution of parameters in a mathematical model. It is deployed in all Bayesian phylogenetic tree estimation software. While many researchers use MCMC in phylogenetic analyses, interpreting results and diagnosing problems with MCMC remain vexing issues to many biologists. In this manuscript, we will offer an overview of how MCMC is used in Bayesian phylogenetic inference, with a particular emphasis on complex hierarchical models, such as the fossilized birth-death (FBD) model. We will discuss strategies to diagnose common MCMC problems and troubleshoot difficult analyses, in particular convergence issues. We will show how the study design, the choice of models and priors, but also technical features of the inference tools themselves can all be adjusted to obtain the best results. Finally, we will also discuss the unique challenges created by the incorporation of fossil information in phylogenetic inference, and present tips to address them.

5.
Front Plant Sci ; 13: 842842, 2022.
Article in English | MEDLINE | ID: mdl-35783934

ABSTRACT

Polyploidization is one of the most common speciation mechanisms in plants. This is particularly relevant in high mountain environments and/or in areas heavily affected by climatic oscillations. Although the role of polyploidy and the temporal and geographical frameworks of polyploidization have been intensively investigated in the alpine regions of the temperate and arctic biomes, fewer studies are available with a specific focus on the Mediterranean region. Leucanthemopsis (Asteraceae) consists of six to ten species with several infraspecific entities, mainly distributed in the western Mediterranean Basin. It is a polyploid complex including montane, subalpine, and strictly alpine lineages, which are locally distributed in different mountain ranges of Western Europe and North Africa. We used a mixed approach including Sanger sequencing and (Roche-454) high throughput sequencing of amplicons to gather information from single-copy nuclear markers and plastid regions. Nuclear regions were carefully tested for recombinants/PCR artifacts and for paralogy. Coalescent-based methods were used to infer the number of polyploidization events and the age of formation of polyploid lineages, and to reconstruct the reticulate evolution of the genus. Whereas the polyploids within the widespread Leucanthemopsis alpina are autopolyploids, the situation is more complex among the taxa endemic to the western Mediterranean. While the hexaploid, L. longipectinata, confined to the northern Moroccan mountain ranges (north-west Africa), is an autopolyploid, the Iberian polyploids are clearly of allopolyploid origins. At least two different polyploidization events gave rise to L. spathulifolia and to all other tetraploid Iberian taxa, respectively. The formation of the Iberian allopolyploids took place in the early Pleistocene and was probably caused by latitudinal and elevational range shifts that brought into contact previously isolated Leucanthemopsis lineages. Our study thus highlights the importance of the Pleistocene climatic oscillations and connected polyploidization events for the high plant diversity in the Mediterranean Basin.

6.
Int J Mol Sci ; 23(3), 2022 Jan 28.
Article in English | MEDLINE | ID: mdl-35163448

ABSTRACT

The role of aminoacyl-tRNA synthetases (aaRS) in the emergence and evolution of genetic coding poses challenging questions concerning their provenance. We seek evidence about their ancestry from curated structure-based multiple sequence alignments of a structurally invariant "scaffold" shared by all 10 canonical Class I aaRS. Three uncorrelated phylogenetic metrics (mutation frequency, its uniformity, and row-by-row cladistic congruence) imply that the Class I scaffold is a mosaic assembled from successive genetic sources. Metrics for different modules vary in accordance with their presumed functionality. Sequences derived from the ATP- and amino acid-binding sites exhibit specific two-way coupling to those derived from Connecting Peptide 1, a third module whose metrics suggest later acquisition. The data help validate: (i) experimental fragmentations of the canonical Class I structure into three partitions that retain catalytic activities in proportion to their length; and (ii) evidence that the ancestral Class I aaRS gene also encoded a Class II ancestor in frame on the opposite strand. A 46-residue Class I "protozyme" roots the Class I tree prior to the adaptive radiation of the Rossmann dinucleotide binding fold that refined substrate discrimination. Such rooting implies near simultaneous emergence of genetic coding and the origin of the proteome, resolving a conundrum posed by previous inferences that Class I aaRS evolved after the genetic code had been implemented in an RNA world. Further, pinpointing discontinuous enhancements of aaRS fidelity establishes a timeline for the growth of coding from a binary amino acid alphabet.


Subject(s)
Amino Acyl-tRNA Synthetases/chemistry; Amino Acyl-tRNA Synthetases/genetics; Mutation; Benchmarking; Binding Sites; Evolution, Molecular; Genetic Code; Models, Molecular; Phylogeny; Protein Conformation; Sequence Homology, Amino Acid; Structural Homology, Protein
7.
PeerJ ; 8: e9368, 2020.
Article in English | MEDLINE | ID: mdl-32617191

ABSTRACT

Tip dating, a method of phylogenetic analysis in which fossils are included as terminals and assigned an age, is becoming increasingly widely used in evolutionary studies. Current implementations of tip dating allow fossil ages to be assigned as a point estimate, or incorporate uncertainty through the use of uniform tip age priors. However, the use of tip age priors has the unwanted effect of decoupling the ages of fossils from the same fossil site. Here we introduce a new Markov Chain Monte Carlo (MCMC) proposal, which allows fossils from the same site to have linked ages, while still incorporating uncertainty in the age of the fossil site itself. We also include an extension, allowing fossil sites to be ordered in a stratigraphic column with age bounds applied only to the top and bottom of the sequence. These MCMC proposals are implemented in a new open-source BEAST2 package, palaeo. We test these new proposals on a dataset of early vertebrate fossils, concentrating on the effects on two sites with multiple acanthodian fossil taxa but wide age uncertainty, the Man On The Hill (MOTH) site from northern Canada, and the Turin Hill site from Scotland, both of Lochkovian (Early Devonian) age. The results show an increased precision of age estimates when fossils have linked tip ages compared to when ages are unlinked and, in this example, lead to support for a younger age for the MOTH site compared with the Turin Hill site. There is also a minor effect on the tree topology of acanthodians. These new MCMC proposals should be widely applicable to studies that employ tip dating, particularly when the terminals are coded as individual specimens.

8.
Mol Biol Evol ; 37(10): 3061-3075, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32492139

ABSTRACT

Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.
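
The split between unique and duplicate sequences can be illustrated with a short sketch. It shows only the bookkeeping step (collapsing a set of reads into unique sequences with copy counts), not PIQMEE's inference, and the function name and toy reads are made up for illustration.

```python
from collections import Counter


def collapse_duplicates(sequences):
    """Return (unique sequences in first-seen order, copy count of each)."""
    counts = Counter(sequences)
    # Counter keeps first-insertion order on Python 3.7 and later.
    unique = list(counts)
    return unique, [counts[s] for s in unique]


# Toy example: four reads, two distinct sequences.
unique, copies = collapse_duplicates(["ACGT", "ACGA", "ACGT", "ACGT"])
print(unique, copies)
```

Keeping the copy counts alongside the unique sequences retains the information that is lost when duplicates are simply discarded.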


Subject(s)
Bayes Theorem; Genetic Techniques; Models, Genetic; Phylogeny; Software; Datasets as Topic
9.
Front Genet ; 10: 1064, 2019.
Article in English | MEDLINE | ID: mdl-31737047

ABSTRACT

The fossilized birth-death (FBD) model allows the estimation of species divergence times from molecular and fossil information in a coherent framework of diversification and fossil sampling. Some assumptions of the FBD model, however, are difficult to meet in phylogenetic analyses of highly diverse groups. Here, I use simulations to assess the impact of extreme model violations, including diversified sampling of species and the exclusive use of the oldest fossils per clade, on divergence times estimated with the FBD model. My results demonstrate that selective sampling of fossils can produce dramatically overestimated divergence times when the FBD model is used for inference, due to an interplay of underestimates for the model parameters net diversification rate, turnover, and fossil-sampling proportion. In contrast, divergence times estimated with CladeAge, a method that uses information about the oldest fossils per clade together with estimates of sampling and diversification rates, are accurate under these conditions. Practitioners of Bayesian divergence-time estimation should therefore ensure that the dataset conforms to the expectations of the FBD model, or estimates of sampling and diversification rates should be obtained a priori so that CladeAge can be used for the inference.

10.
R Soc Open Sci ; 5(3): 171504, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29657761

ABSTRACT

The Dravidian language family consists of about 80 varieties (Hammarström H. 2016 Glottolog 2.7) spoken by 220 million people across southern and central India and surrounding countries (Steever SB. 1998 In The Dravidian languages (ed. SB Steever), pp. 1-39: 1). Neither the geographical origin of the Dravidian language homeland nor its exact dispersal through time is known. The history of these languages is crucial for understanding prehistory in Eurasia, because despite their current restricted range, these languages played a significant role in influencing other language groups including Indo-Aryan (Indo-European) and Munda (Austroasiatic) speakers. Here, we report the results of a Bayesian phylogenetic analysis of cognate-coded lexical data, elicited first hand from native speakers, to investigate the subgrouping of the Dravidian language family, and provide dates for the major points of diversification. Our results indicate that the Dravidian language family is approximately 4500 years old, a finding that corresponds well with earlier linguistic and archaeological studies. The main branches of the Dravidian language family (North, Central, South I, South II) are recovered, although the placement of languages within these main branches diverges from previous classifications. We find considerable uncertainty with regard to the relationships between the main branches.

11.
Mol Phylogenet Evol ; 125: 147-162, 2018 08.
Article in English | MEDLINE | ID: mdl-29535031

ABSTRACT

Members of the family Pleuronectidae are common representatives of the marine benthic fauna inhabiting northern regions of the Atlantic and Pacific oceans. The most recent comprehensive classification of the family, based entirely on morphological synapomorphies, recognized five subfamilies, 23 genera, and 61 extant species. However, several subsequent molecular studies have shown that many synapomorphic characters discovered in the morphological study might represent homoplasies, thereby questioning the reliance on these characters with the warning that they may provide misleading information for testing other morphology-based evolutionary hypotheses. In the present study, we propose a comprehensive taxonomic reassessment of the family Pleuronectidae based on the molecular phylogeny reconstructed from four nuclear and three mitochondrial loci and represented by complete taxon sampling of all but one valid species currently assigned to this family. To check for robustness of the phylogenetic hypothesis, we analyzed the effect of base compositional heterogeneity on phylogenetic signal for each locus and compared six different gene partitioning schemes. The final dataset, comprising 14 partitions and 154 individuals, was used to reconstruct phylogenetic trees in RAxML, MrBayes and BEAST2. Alternative topologies for several questionable nodes were compared using Bayes factors. The topology with the highest marginal likelihood was selected as the final phylogenetic tree for inferring pleuronectid relationships and character evolution. Based on our results, we recognize the Pleuronectidae comprising five subfamilies, 24 genera and 59 species. Our new phylogeny comprises five major monophyletic groups within the family, which we define as the subfamilies within the family: Atheresthinae, Pleuronichthyinae, Microstominae, Hippoglossinae and Pleuronectinae. Taxonomic composition of most of these subfamilies is different from that proposed in previous classifications. We also re-assess hypotheses proposed in earlier studies regarding intra-relationships of species of each lineage. Results of the current study contribute to better understanding of the evolutionary relationships of pleuronectid flatfishes based on molecular evidence, and they also provide the framework towards future comprehensive morphological revision of constituent lineages within the family Pleuronectidae.
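
Comparing alternative topologies by Bayes factors reduces to comparing log marginal likelihoods. The following is a minimal sketch with hypothetical topology labels and made-up values, since the abstract does not report the marginal likelihoods themselves.

```python
def log_bayes_factor(log_ml_a, log_ml_b):
    """ln Bayes factor of hypothesis A over B from log marginal likelihoods."""
    return log_ml_a - log_ml_b


def best_topology(log_mls):
    """Return the label with the highest log marginal likelihood."""
    return max(log_mls, key=log_mls.get)


# Made-up values for two hypothetical candidate topologies.
candidates = {"topology_1": -1000.0, "topology_2": -995.5}
print(best_topology(candidates))
print(log_bayes_factor(candidates["topology_2"], candidates["topology_1"]))
```

A positive log Bayes factor favors the first argument, so selecting the topology with the highest marginal likelihood is equivalent to choosing the one that every pairwise Bayes factor favors.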


Subject(s)
Flounder/classification; Flounder/genetics; Genetic Loci; Phylogeny; Animals; Bayes Theorem; DNA, Mitochondrial/genetics; Geography; Likelihood Functions; Sequence Alignment; Sequence Analysis, DNA; Species Specificity