Results 1 - 20 of 59
1.
Syst Biol ; 72(1): 92-105, 2023 05 19.
Article in English | MEDLINE | ID: mdl-36575813

ABSTRACT

In molecular phylogenetics, partition models and mixture models provide different approaches to accommodating heterogeneity in genomic sequencing data. Both types of models generally fit the data better than models that assume the process of sequence evolution is homogeneous across sites and lineages. The Akaike Information Criterion (AIC), an estimator of Kullback-Leibler divergence, and the Bayesian Information Criterion (BIC) are popular tools to select models in phylogenetics. Recent work suggests that AIC should not be used for comparing mixture and partition models. In this work, we clarify that this difficulty is not fully explained by AIC misestimating the Kullback-Leibler divergence. We also investigate the performance of the AIC and BIC at comparing amongst mixture models and amongst partition models. We find that under nonstandard conditions (i.e., when some edges have a small expected number of changes), AIC underestimates the expected Kullback-Leibler divergence. Under such conditions, AIC preferred the complex mixture models and BIC preferred the simpler mixture models. The mixture models selected by AIC had a better performance in estimating edge lengths, while the simpler models selected by BIC performed better in estimating the base frequencies and substitution rate parameters. In contrast, AIC and BIC both preferred simpler partition models over more complex partition models under nonstandard conditions, despite the fact that the more complex partition model was the generating model. We also investigated how mispartitioning (i.e., grouping sites that have not evolved under the same process) affects both the performance of partition models compared with mixture models and the model selection process. We found that as the level of mispartitioning increases, the bias of AIC in estimating the expected Kullback-Leibler divergence remains the same, and the branch lengths and evolutionary parameters estimated by partition models become less accurate. We recommend that researchers be cautious when using AIC and BIC to select among partition and mixture models; other alternatives, such as cross-validation and bootstrapping, should be explored, but may suffer similar limitations [AIC; BIC; mispartitioning; partitioning; partition model; mixture model].


Subjects
Genomics; Phylogeny; Bayes Theorem
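
The two criteria compared in the abstract above have standard closed forms, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters, ln L the maximized log-likelihood, and n the sample size. The following is a minimal sketch of how they can disagree. The log-likelihoods, parameter counts, and alignment length are invented for illustration and are not taken from the paper; treating the number of alignment sites as n is a common convention assumed here, not something the abstract states.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical fits of a simpler and a more complex model to one alignment.
n_sites = 5000  # assumed alignment length, used as the sample size n
simple_fit = {"lnL": -10250.0, "k": 20}
complex_fit = {"lnL": -10230.0, "k": 35}

for name, fit in (("simple", simple_fit), ("complex", complex_fit)):
    print(
        name,
        "AIC =", round(aic(fit["lnL"], fit["k"]), 2),
        "BIC =", round(bic(fit["lnL"], fit["k"], n_sites), 2),
    )
```

With these made-up numbers AIC ranks the complex model first and BIC ranks the simple model first, because the BIC penalty per parameter, ln n, exceeds the AIC penalty of 2 whenever n is larger than about 7.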
2.
Syst Biol ; 71(6): 1541-1548, 2022 10 12.
Article in English | MEDLINE | ID: mdl-35041002

ABSTRACT

The use of information criteria to distinguish between phylogenetic models has become ubiquitous within the field. However, the variety and complexity of available models are much greater now than when these practices were established. The literature shows an increasing trajectory of healthy skepticism with regard to the use of information theory-based model selection within phylogenetics. We add to this by analyzing the specific case of comparison between partition and mixture models. We argue from a theoretical basis that information criteria are inherently more likely to favor partition models over mixture models, and we then demonstrate this through simulation. Based on our findings, we suggest that partition and mixture models are not suitable for information-theory based model comparison. [AIC, BIC; information criteria; maximum likelihood; mixture models; partitioned model; phylogenetics.].


Subjects
Bayes Theorem; Computer Simulation; Phylogeny
3.
Bull Math Biol ; 84(10): 118, 2022 09 14.
Article in English | MEDLINE | ID: mdl-36103093

ABSTRACT

Phylogenetic trees describe relationships between extant species, but beyond that their shape and their relative branch lengths can provide information on broader evolutionary processes of speciation and extinction. However, currently many of the most widely used macroevolutionary models make predictions about the shapes of phylogenetic trees that differ considerably from what is observed in empirical phylogenies. Here, we propose a flexible and biologically plausible macroevolutionary model for phylogenetic trees where times to speciation or extinction events are drawn from a Coxian phase-type (PH) distribution. First, we show that different choices of parameters in our model lead to a range of tree balances as measured by Aldous' $\beta$ statistic. In particular, we demonstrate that it is possible to find parameters that correspond well to empirical tree balance. Next, we provide a natural extension of the $\beta$ statistic to sets of trees. This extension produces less biased estimates of $\beta$ compared to using the median $\beta$ values from individual trees. Furthermore, we derive a likelihood expression for the probability of observing an edge-weighted tree under a model with speciation but no extinction. Finally, we illustrate the application of our model by performing both absolute and relative goodness-of-fit tests for two large empirical phylogenies (squamates and angiosperms) that compare models with Coxian PH distributed times to speciation with models that assume exponential or Weibull distributed waiting times. In our numerical analysis, we found that, in most cases, models assuming a Coxian PH distribution provided the best fit.


Subjects
Mathematical Concepts; Models, Biological; Biological Evolution; Phylogeny; Probability
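
For readers unfamiliar with the Coxian phase-type distribution named in the abstract above, the sketch below draws waiting times from one using the textbook definition: the process passes through phases in order, spends an exponentially distributed time in each, and after each phase either continues to the next or stops. The function names, rates, and continuation probabilities are illustrative assumptions and do not reproduce the parameterization used in the paper.

```python
import random


def sample_coxian(rates, continue_probs, rng):
    """Draw one waiting time from a Coxian phase-type distribution.

    rates[i] is the exponential exit rate of phase i.
    continue_probs[i] is the probability of moving on to phase i + 1
    after leaving phase i, so it has one entry fewer than rates.
    """
    total_time = 0.0
    last = len(rates) - 1
    for i, rate in enumerate(rates):
        total_time += rng.expovariate(rate)  # time spent in phase i
        if i == last or rng.random() >= continue_probs[i]:
            break  # absorbed: the event (e.g. speciation) occurs now
    return total_time


def coxian_mean(rates, continue_probs):
    """Expected waiting time: sum over phases of P(reach phase) / rate."""
    mean, reach_prob = 0.0, 1.0
    for i, rate in enumerate(rates):
        mean += reach_prob / rate
        if i < len(continue_probs):
            reach_prob *= continue_probs[i]
    return mean


# Hypothetical two-phase example.
rng = random.Random(1)
draws = [sample_coxian([2.0, 1.0], [0.5], rng) for _ in range(20000)]
print("theoretical mean:", coxian_mean([2.0, 1.0], [0.5]))
print("empirical mean:  ", sum(draws) / len(draws))
```

With a single phase the sampler reduces to an ordinary exponential waiting time, which is the special case the abstract compares against.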
4.
Syst Biol ; 69(2): 249-264, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31364711

ABSTRACT

Molecular sequence data that have evolved under the influence of heterotachous evolutionary processes are known to mislead phylogenetic inference. We introduce the General Heterogeneous evolution On a Single Topology (GHOST) model of sequence evolution, implemented under a maximum-likelihood framework in the phylogenetic program IQ-TREE (http://www.iqtree.org). Simulations show that using the GHOST model, IQ-TREE can accurately recover the tree topology, branch lengths, and substitution model parameters from heterotachously evolved sequences. We investigate the performance of the GHOST model on empirical data by sampling phylogenomic alignments of varying lengths from a plastome alignment. We then carry out inference under the GHOST model on a phylogenomic data set composed of 248 genes from 16 taxa, where we find the GHOST model concurs with the currently accepted view, placing turtles as a sister lineage of archosaurs, in contrast to results obtained using traditional variable rates-across-sites models. Finally, we apply the model to a data set composed of a sodium channel gene of 11 fish taxa, finding that the GHOST model is able to elucidate a subtle component of the historical signal, linked to the previously established convergent evolution of the electric organ in two geographically distinct lineages of electric fish. We compare inference under the GHOST model to partitioning by codon position and show that, owing to the minimization of model constraints, the GHOST model offers unique biological insights when applied to empirical data.


Subjects
Classification/methods; Sequence Alignment/methods; Software; Animals; Evolution, Molecular; Fishes/classification; Fishes/genetics; Models, Genetic; Phylogeny
5.
J Mol Evol ; 88(7): 575-597, 2020 09.
Article in English | MEDLINE | ID: mdl-32725409

ABSTRACT

The function of a protein is primarily determined by its structure and amino acid sequence. Many biological questions of interest rely on being able to accurately determine the group of structures to which domains of a protein belong; this can be done through alignment and comparison of protein structures. Dozens of different methods for Protein Structure Alignment (PSA) have been proposed that use a wide range of techniques. The aim of this study is to determine the ability of PSA methods to identify pairs of protein domains known to share differing levels of structural similarity, and to assess their utility for clustering domains from several different folds into known groups. We present the results of a comprehensive investigation into eighteen PSA methods, to our knowledge the largest piece of independent research on this topic. Overall, SP-AlignNS (non-sequential) was found to be the best method for classification, and among the best performing methods for clustering. Methods (where possible) were split into the algorithm used to find the optimal alignment and the score used to assess similarity. This allowed us to largely separate the algorithm from the score it maximizes and thus, to assess their effectiveness independently of each other. Surprisingly, we found that some hybrids of mismatched scores and algorithms performed better than either of the native methods at classification and, in some cases, clustering as well. It is hoped that this investigation and the accompanying discussion will be useful for researchers selecting or designing methods to align protein structures.


Subjects
Algorithms; Protein Conformation; Sequence Analysis, Protein/methods; Cluster Analysis; Models, Molecular; Sequence Alignment/methods; Software
6.
J Mol Evol ; 88(2): 136-150, 2020 03.
Article in English | MEDLINE | ID: mdl-31781936

ABSTRACT

The underlying structure of the canonical amino acid substitution matrix (aaSM) is examined by considering stepwise improvements in the differential recognition of amino acids according to their chemical properties during the branching history of the two aminoacyl-tRNA synthetase (aaRS) superfamilies. The evolutionary expansion of the genetic code is described by a simple parameterization of the aaSM, in which (i) the number of distinguishable amino acid types, (ii) the matrix dimension and (iii) the number of parameters, each increases by one for each bifurcation in an aaRS phylogeny. Parameterized matrices corresponding to trees in which the size of an amino acid sidechain is the only discernible property behind its categorization as a substrate, exclusively for a Class I or II aaRS, provide a significantly better fit to empirically determined aaSM than trees with random bifurcation patterns. A second split between polar and nonpolar amino acids in each Class effects a vastly greater further improvement. The earliest Class-separated epochs in the phylogenies of the aaRS reflect these enzymes' capability to distinguish tRNAs through the recognition of acceptor stem identity elements via the minor (Class I) and major (Class II) helical grooves, which is how the ancient operational code functioned. The advent of tRNA recognition using the anticodon loop supports the evolution of the optimal map of amino acid chemistry found in the later genetic code, an essentially digital categorization, in which polarity is the major functional property, compensating for the unrefined, haphazard differentiation of amino acids achieved by the operational code.


Subjects
Amino Acid Substitution; Amino Acyl-tRNA Synthetases/genetics; Genetic Code; Phylogeny; Amino Acids/genetics; Anticodon; Evolution, Molecular; Models, Genetic
7.
Proc Biol Sci ; 287(1919): 20192876, 2020 01 29.
Article in English | MEDLINE | ID: mdl-31992170

ABSTRACT

The size of plant stomata (adjustable pores that determine the uptake of CO2 and loss of water from leaves) is considered to be evolutionarily important. This study uses fossils from the major Southern Hemisphere family Proteaceae to test whether stomatal cell size responded to Cenozoic climate change. We measured the length and abundance of guard cells (the cells forming stomata), the area of epidermal pavement cells, stomatal index and maximum stomatal conductance from a comprehensive sample of fossil cuticles of Proteaceae, and extracted published estimates of past temperature and atmospheric CO2. We developed a novel test based on stochastic modelling of trait evolution to test correlations among traits. Guard cell length increased, and stomatal density decreased significantly with decreasing palaeotemperature. However, contrary to expectations, stomata tended to be smaller and more densely packed at higher atmospheric CO2. Thus, associations between stomatal traits and palaeoclimate over the last 70 million years in Proteaceae suggest that stomatal size is significantly affected by environmental factors other than atmospheric CO2. Guard cell length, pavement cell area, stomatal density and stomatal index covaried in ways consistent with coordinated development of leaf tissues.


Subjects
Biological Evolution; Plant Stomata/physiology; Proteaceae/physiology; Fossils; Plant Leaves
8.
J Math Biol ; 81(2): 549-573, 2020 08.
Article in English | MEDLINE | ID: mdl-32710155

ABSTRACT

A matrix Lie algebra is a linear space of matrices closed under the operation $[A, B] = AB - BA$. The "Lie closure" of a set of matrices is the smallest matrix Lie algebra which contains the set. In the context of Markov chain theory, if a set of rate matrices forms a Lie algebra, their corresponding Markov matrices are closed under matrix multiplication; this has been found to be a useful property in phylogenetics. Inspired by previous research involving Lie closures of DNA models, it was hypothesised that finding the Lie closure of a codon model could help to solve the problem of mis-estimation of the non-synonymous/synonymous rate ratio, $\omega$. We propose two different methods of finding a linear space from a model: the first is the linear closure, which is the smallest linear space which contains the model, and the second is the linear version, which changes multiplicative constraints in the model to additive ones. We then find the Lie closure of each of these linear spaces. Under both methods, it was found that closed codon models would require thousands of parameters, and that any partial solution to this problem that was of a reasonable size violated stochasticity. Investigation of toy models indicated that finding the Lie closure of matrix linear spaces which deviated only slightly from a simple model resulted in a Lie closure that was close to having the maximum number of parameters possible. Given that Lie closures are not practical, we propose further consideration of the two variants of linearly closed models.


Subjects
Codon; DNA; Models, Biological; Markov Chains; Phylogeny
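
The closure property described in the abstract above can be checked by hand on a toy case. The sketch below computes the matrix commutator AB - BA for the two generators of the general two-state rate-matrix model and confirms that the result is again a linear combination of those generators, so their span is closed under the bracket. The two-state example and the helper names are illustrative assumptions; the codon models studied in the paper are far larger.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def commutator(a, b):
    """Lie bracket of two matrices: AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def combine(coeff_a, a, coeff_b, b):
    """Linear combination coeff_a * A + coeff_b * B."""
    n = len(a)
    return [
        [coeff_a * a[i][j] + coeff_b * b[i][j] for j in range(n)]
        for i in range(n)
    ]


# Generators of the two-state rate matrices (each row sums to zero).
Q1 = [[-1, 1], [0, 0]]  # substitutions from state 0 to state 1
Q2 = [[0, 0], [1, -1]]  # substitutions from state 1 to state 0

bracket = commutator(Q1, Q2)
print(bracket)
# The bracket stays inside span{Q1, Q2}: it equals -Q1 + Q2.
print(bracket == combine(-1, Q1, 1, Q2))
```

Finding a full Lie closure means repeating this step, adding any bracket that falls outside the current span and iterating until nothing new appears, which is where the parameter count can grow quickly.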
9.
Syst Biol ; 67(5): 905-915, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29788496

ABSTRACT

We give a non-technical introduction to convergence-divergence models, a new modeling approach for phylogenetic data that allows for the usual divergence of lineages after lineage-splitting but also allows for taxa to converge, i.e., become more similar over time. By examining the 3-taxon case in some detail, we illustrate that phylogeneticists have been "spoiled" in the sense of not having to think about the structural parameters in their models by virtue of the strong assumption that evolution is tree-like. We show that there are not always good statistical reasons to prefer the usual class of tree-like models over more general convergence-divergence models. Specifically, we show many 3-taxon data sets can be equally well explained by supposing violation of the molecular clock due to change in the rate of evolution along different edges, or by keeping the assumption of a constant rate of evolution but instead assuming that evolution is not a purely divergent process. Given the abundance of evidence that evolution is not strictly tree-like, our discussion is an illustration that as phylogeneticists we need to think clearly about the structural form of the models we use. For cases with four taxa, we show that there will be far greater ability to distinguish models with convergence from non-clock-like tree models. [Akaike information criterion; convergence-divergence models; distinguishability; identifiability; likelihood; molecular clock; phylogeny.].


Subjects
Evolution, Molecular; Models, Genetic; Phylogeny; Biological Evolution
10.
BMC Evol Biol ; 17(1): 38, 2017 01 31.
Article in English | MEDLINE | ID: mdl-28143390

ABSTRACT

BACKGROUND: Gene duplication has been identified as a key process driving functional change in many genomes. Several biological models exist for the evolution of a pair of duplicates after a duplication event, and it is believed that gene duplicates can evolve in different ways, according to one process, or a mix of processes. Subfunctionalization is one such process, under which the two duplicates can be preserved by dividing up the function of the original gene between them. Analysis of genomic data using subfunctionalization and related processes has thus far been relatively coarse-grained, with mathematical treatments usually focusing on the phenomenological features of gene duplicate evolution. RESULTS: Here, we develop and analyze a mathematical model using the mechanics of subfunctionalization and the assumption of Poisson rates of mutation. By making use of the results from the literature on the Phase-Type distribution, we are able to derive exact analytical results for the model. The main advantage of the mechanistic model is that it leads to testable predictions of the phenomenological behavior (instead of building this behavior into the model a priori), and allows for the estimation of biologically meaningful parameters. We fit the survival function implied by this model to real genome data (Homo sapiens, Mus musculus, Rattus norvegicus and Canis familiaris), and compare the fit against commonly used phenomenological survival functions. We estimate the number of regulatory regions, and rates of mutation (relative to silent site mutation) in the coding and regulatory regions. We find that for the four genomes tested the subfunctionalization model predicts that duplicates most likely have just a few regulatory regions, and the rate of mutation in the coding region is around 5-10 times greater than the rate in the regulatory regions. This is the first model-based estimate of the number of regulatory regions in duplicates. CONCLUSIONS: Strong agreement between empirical results and the predictions of our model suggests that subfunctionalization provides a consistent explanation for the evolution of many gene duplicates.


Subjects
Evolution, Molecular; Gene Duplication; Genes, Regulator/genetics; Models, Genetic; Mutation; Animals; Biological Evolution; Genes, Duplicate; Genome; Markov Chains
11.
J Theor Biol ; 423: 31-40, 2017 06 21.
Article in English | MEDLINE | ID: mdl-28435014

ABSTRACT

Accurate estimation of evolutionary distances between taxa is important for many phylogenetic reconstruction methods. Distances can be estimated using a range of different evolutionary models, from single nucleotide polymorphisms to large-scale genome rearrangements. Corresponding corrections for genome rearrangement distances fall into three categories: empirical computational studies, Bayesian/MCMC approaches, and combinatorial approaches. Here, we introduce a maximum likelihood estimator for the inversion distance between a pair of genomes, using a group-theoretic approach to modelling inversions introduced recently. This MLE functions as a corrected distance: in particular, we show that because of the way sequences of inversions interact with each other, it is quite possible for minimal distance and MLE distance to differently order the distances of two genomes from a third. A second aspect of this work tackles the problem of accounting for the symmetries of circular arrangements. While, generally, a frame of reference is locked, and all computation made accordingly, this work incorporates the action of the dihedral group so that distance estimates are free from any a priori frame of reference. The philosophy of accounting for symmetries can be applied to any existing correction method, for which examples are offered.


Subjects
Evolution, Molecular; Genome/genetics; Phylogeny; Likelihood Functions; Spatial Analysis
12.
J Math Biol ; 75(6-7): 1619-1654, 2017 12.
Article in English | MEDLINE | ID: mdl-28434023

ABSTRACT

Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as, in this case only, the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication of this is that, for models with more than two states (for example, DNA sequence alignments with four-state models), we find that methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent from the phylogenetic invariants.


Subjects
Phylogeny; Biostatistics/methods; Computer Simulation; DNA/genetics; Evolution, Molecular; Markov Chains; Mathematical Concepts; Models, Genetic; Sequence Alignment
13.
Curr Genet ; 62(1): 81-5, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26353943

ABSTRACT

The yeast Candida albicans, a commensal colonizer and occasional pathogen of humans, has a rudimentary mating ability. However, mating is a cumbersome process that has never been observed outside the laboratory, and the population structure of the species is predominantly clonal. Here we discuss recent findings that indicate that mating ability is under selection in C. albicans, i.e. that it is a biologically relevant process. C. albicans strains can only mate after they have sustained genetic damage. We propose that the rescue of such damaged strains by mating may be the primary reason why mating ability is under selection.


Subjects
Candida albicans/physiology; Genes, Mating Type, Fungal; Selection, Genetic
14.
Environ Sci Technol ; 50(9): 4760-8, 2016 05 03.
Article in English | MEDLINE | ID: mdl-27007609

ABSTRACT

Wastewater-based epidemiology is increasingly being used as a tool to monitor drug use trends. To minimize costs, studies have typically monitored a small number of days. However, cycles of drug use may display weekly and seasonal trends that affect the accuracy of monthly or annual drug use estimates based on a limited number of samples. This study aimed to rationalize sampling methods for minimizing the number of samples required while maximizing information about temporal trends. A range of sampling strategies were examined: (i) targeted days (e.g., weekends), (ii) completely random or stratified random sampling, and (iii) a number of sampling strategies informed by known weekly cycles in drug use data. Using a time-series approach, analysis was performed for four drugs (MDMA, methamphetamine, cocaine, methadone) collected through a continuous sampling program over 14 months. Results showed, for drugs with weekly cycles (MDMA, methamphetamine and cocaine in this sample), sampling strategies which made use of those weekly cycles required fewer samples to obtain similar information as sampling 5 days per week and had better accuracy than stratified random sampling techniques.


Subjects
Methamphetamine; Wastewater; Cocaine; Methadone; N-Methyl-3,4-methylenedioxyamphetamine; Substance Abuse Detection
15.
BMC Evol Biol ; 14: 236, 2014 Dec 04.
Article in English | MEDLINE | ID: mdl-25472897

ABSTRACT

BACKGROUND: Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods. For group-based models, the approach provides a one-to-one correspondence between the so-called "edge length" and "sequence" spectra on a phylogenetic tree. The Hadamard conjugation has been used in diverse phylogenetic applications not only for inference but also as an important conceptual tool for thinking about molecular data leading to generalizations beyond strictly tree-like evolutionary modelling. RESULTS: For general group-based models of phylogenetic branching processes, we reformulate the problem of constructing a one-to-one correspondence between pattern probabilities and edge parameters. This takes a classic result previously shown through use of Fourier analysis and presents it in the language of tensors and group representation theory. This derivation makes it clear why the inversion is possible, because, under their usual definition, group-based models are defined for abelian groups only. CONCLUSION: We provide an inversion of group-based phylogenetic models that can be implemented using matrix multiplication between rectangular matrices indexed by ordered partitions of varying sizes. Our approach provides additional context for the construction of phylogenetic probability distributions on network structures, and highlights the potential limitations of restricting to group-based models in this setting.


Subjects
Models, Genetic; Phylogeny; Biological Evolution; Markov Chains
16.
Syst Biol ; 62(1): 78-92, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-22914976

ABSTRACT

In their 2008 and 2009 articles, Sumner and colleagues introduced the "squangles", a small set of Markov invariants for phylogenetic quartets. The squangles are consistent with the general Markov (GM) model and can be used to infer quartets without the need to explicitly estimate all parameters. As the GM model is inhomogeneous and hence nonstationary, the squangles are expected to perform well compared with standard approaches when there are changes in base composition among species. However, the GM model assumes constant rates across sites, so the squangles should be confounded by data generated with invariant sites or other forms of rate-variation across sites. Here we implement the squangles in a least-squares setting that returns quartets weighted by either confidence or internal edge lengths, and we show how these weighted quartets can be used as input into a variety of supertree and supernetwork methods. For the first time, we quantitatively investigate the robustness of the squangles to breaking of the constant rates-across-sites assumption on both simulated and real data sets; and we suggest a modification that improves the performance of the squangles in the presence of invariant sites. Our conclusion is that the squangles provide a novel tool for phylogenetic estimation that is complementary to methods that explicitly account for rate-variation across sites, but rely on homogeneous (and hence stationary) models.


Subjects
Classification/methods; Models, Genetic; Phylogeny; Animals; Computer Simulation; Least-Squares Analysis; Mammals/classification; Mammals/genetics; Markov Chains; Reproducibility of Results
17.
Syst Biol ; 62(2): 250-63, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23179601

ABSTRACT

The relationships of the 3 major clades of winged insects (Ephemeroptera, Odonata, and Neoptera) are still unclear. Many morphologists favor a clade Metapterygota (Odonata + Neoptera), but Chiastomyaria (Ephemeroptera + Neoptera) or Palaeoptera (Ephemeroptera + Odonata) has also been supported in some older and more recent studies. A possible explanation for the difficulties in resolving these relationships is concerted convergence: the convergent evolution of entire character complexes under the same or similar selective pressures. In this study, we analyze possible instances of this phenomenon in the context of head structures of Ephemeroptera, Odonata, and Neoptera. We apply a recently introduced formal approach to detect the occurrence of concerted convergence. We found that characters of the tentorium and mandibles in particular, but also some other head structures, have apparently not evolved independently, and thus can cause artifacts in tree reconstruction. Our subsequent analyses, which exclude character sets that may be affected by concerted convergence, corroborate the Palaeoptera concept. We show that the analysis of homoplasy and its influence on tree inference can be formally improved with important consequences for the identification of incompatibilities between data sets. Our results suggest that modified weighting (or exclusion of characters) in cases of formally identified correlated cliques of characters may improve morphology-based tree reconstruction.


Subjects
Insects/anatomy & histology; Insects/classification; Phylogeny; Animals; Head/anatomy & histology; Insects/genetics; RNA, Ribosomal, 18S/genetics; RNA, Ribosomal, 28S/genetics
18.
Syst Biol ; 62(1): 62-77, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-22914977

ABSTRACT

We investigate distances on binary (presence/absence) data in the context of a Dollo process, where a trait can only arise once on a phylogenetic tree but may be lost many times. We introduce a novel distance, the Additive Dollo Distance (ADD), that applies to data generated under a Dollo model and show that it has some useful theoretical properties including an intriguing link to the LogDet/paralinear distance. Simulations of Dollo data are used to compare a number of binary distances including ADD, LogDet, a restriction-site-based distance, and some simple, but to our knowledge previously unstudied, variations on common binary distances. The simulations suggest that ADD outperforms other distances on Dollo data. Interestingly, we found that the LogDet distance performs poorly in the context of a Dollo process; this may have implications for its use in connection with conditioned genome reconstruction. We apply the ADD to two Diversity Arrays Technology data sets, one that broadly covers Eucalyptus species and one that focuses on the Eucalyptus series Adnataria. We also reanalyze gene family presence/absence data from bacterial genomes obtained from the COG database and compare the results with previous phylogenies estimated using the conditioned genome reconstruction approach. The results for these case studies are largely congruent with previous studies, in some cases giving more phylogenetic resolution.


Subjects
Models, Genetic; Phylogeny; Bacteria/classification; Bacteria/genetics; Computer Simulation; Eucalyptus/classification; Eucalyptus/genetics; Reproducibility of Results; Statistics as Topic
19.
Biol Lett ; 10(11): 20140619, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25376800

ABSTRACT

The Tasmanian devil (Sarcophilus harrisii) was widespread in Australia during the Late Pleistocene but is now endemic to the island of Tasmania. Low genetic diversity combined with the spread of devil facial tumour disease have raised concerns for the species' long-term survival. Here, we investigate the origin of low genetic diversity by inferring the species' demographic history using temporal sampling with summary statistics, full-likelihood and approximate Bayesian computation methods. Our results show extensive population declines across Tasmania correlating with environmental changes around the last glacial maximum and following unstable climate related to increased 'El Niño-Southern Oscillation' activity.


Subjects
Endangered Species; Genetic Variation; Marsupialia/physiology; Microsatellite Repeats; Animals; Australia; Bayes Theorem; Marsupialia/genetics; Population Dynamics; Time Factors
20.
Genome Biol Evol ; 16(3)2024 03 02.
Article in English | MEDLINE | ID: mdl-38412309

ABSTRACT

Microsatellites are widely used in population genetics, but their evolutionary dynamics remain poorly understood. It is unclear whether microsatellite loci drift in length over time. This is important because the mutation processes that underlie these important genetic markers are central to the evolutionary models that employ microsatellites. We identify more than 27 million microsatellites using a novel and unique dataset of modern and ancient Adélie penguin genomes along with data from 63 published chordate genomes. We investigate microsatellite evolutionary dynamics over 2 timescales: one based on Adélie penguin samples dating to ∼46.5 ka and the other dating to the diversification of chordates aged more than 500 Ma. We show that the process of microsatellite allele length evolution is at dynamic equilibrium; while there is length polymorphism among individuals, the length distribution for a given locus remains stable. Many microsatellites persist over very long timescales, particularly in exons and regulatory sequences. These often retain length variability, suggesting that they may play a role in maintaining phenotypic variation within populations.


Subjects
Genetics, Population; Genome; Humans; Mutation; Microsatellite Repeats; Polymorphism, Genetic