ABSTRACT
IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.
Subjects
Molecular Evolution, Genomics, Genetic Models, Phylogeny, Software
ABSTRACT
The size of plant stomata (adjustable pores that determine the uptake of CO2 and loss of water from leaves) is considered to be evolutionarily important. This study uses fossils from the major Southern Hemisphere family Proteaceae to test whether stomatal cell size responded to Cenozoic climate change. We measured the length and abundance of guard cells (the cells forming stomata), the area of epidermal pavement cells, stomatal index and maximum stomatal conductance from a comprehensive sample of fossil cuticles of Proteaceae, and extracted published estimates of past temperature and atmospheric CO2. We developed a novel test based on stochastic modelling of trait evolution to test correlations among traits. Guard cell length increased, and stomatal density decreased significantly with decreasing palaeotemperature. However, contrary to expectations, stomata tended to be smaller and more densely packed at higher atmospheric CO2. Thus, associations between stomatal traits and palaeoclimate over the last 70 million years in Proteaceae suggest that stomatal size is significantly affected by environmental factors other than atmospheric CO2. Guard cell length, pavement cell area, stomatal density and stomatal index covaried in ways consistent with coordinated development of leaf tissues.
Subjects
Biological Evolution, Plant Stomata/physiology, Proteaceae/physiology, Fossils, Plant Leaves
ABSTRACT
We present and explore a general method for deriving a Lie-Markov model from a finite semigroup. If the degree of the semigroup is k, the resulting model is a continuous-time Markov chain on k states and, as a consequence of the product rule in the semigroup, satisfies the property of multiplicative closure. This means that the product of any two substitution probability matrices taken from the model is another substitution matrix that also belongs to the model. We show that our construction is a natural generalization of the concept of group-based models.
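Multiplicative closure can be checked numerically on a small case. The sketch below uses the Kimura two-parameter (K2P) model as a stand-in; that choice, the state order (A, G, C, T) and the helper names `k2p`, `matmul` and `is_k2p` are assumptions made for illustration and are not taken from the abstract. It multiplies two K2P substitution matrices and tests whether the product still has the K2P pattern.

```python
def k2p(b, c):
    """Kimura two-parameter substitution matrix over states (A, G, C, T).

    b: probability of a transition (A<->G or C<->T)
    c: probability of each of the two possible transversions
    """
    a = 1.0 - b - 2.0 * c  # probability of no change; each row sums to 1
    return [
        [a, b, c, c],
        [b, a, c, c],
        [c, c, a, b],
        [c, c, b, a],
    ]


def matmul(x, y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_k2p(m, tol=1e-12):
    """True if m has the K2P pattern, reading b and c from its first row."""
    expected = k2p(m[0][1], m[0][2])
    return all(abs(m[i][j] - expected[i][j]) < tol
               for i in range(4) for j in range(4))


# Two K2P matrices with different (hypothetical) parameter values.
product = matmul(k2p(0.10, 0.02), k2p(0.05, 0.03))
print(is_k2p(product))
```

The check passes for any pair of K2P parameters, which is what membership of the product in the model means in practice.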
Subjects
Markov Chains, Phylogeny, Computational Biology, Molecular Evolution, Mathematical Concepts, Genetic Models, Statistical Models, Stochastic Processes
ABSTRACT
When the process underlying DNA substitutions varies across evolutionary history, some standard Markov models underlying phylogenetic methods are mathematically inconsistent. The most prominent example is the general time-reversible model (GTR) together with some, but not all, of its submodels. To rectify this deficiency, nonhomogeneous Lie Markov models have been identified as the class of models that are consistent in the face of a changing process of DNA substitutions regardless of taxon sampling. Some well-known models in popular use are within this class, but are either overly simplistic (e.g., the Kimura two-parameter model) or overly complex (the general Markov model). On a diverse set of biological data sets, we test a hierarchy of Lie Markov models spanning the full range of parameter richness. Compared against the benchmark of the ever-popular GTR model, we find that as a whole the Lie Markov models perform well, with the best performing models having 8-10 parameters and the ability to recognize the distinction between purines and pyrimidines.
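One way to see the inconsistency described above is to multiply two GTR substitution matrices with different parameters and test whether the product is still time-reversible. Every GTR substitution matrix is reversible, so a non-reversible product cannot belong to the model. The sketch below is an illustration under stated assumptions: the exchangeabilities, base frequencies, branch length and all function names are hypothetical, the matrix exponential is a truncated Taylor series, and reversibility is tested with Kolmogorov's criterion on 3-cycles.

```python
from itertools import permutations


def gtr_rate_matrix(exch, freqs):
    """Build Q with Q[i][j] = exch[i][j] * freqs[j] for i != j (exch symmetric)."""
    n = len(freqs)
    q = [[exch[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    return q


def matmul(x, y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, t, terms=40):
    """Truncated Taylor series for exp(q * t); adequate for small q * t."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, qt)]
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total


def reversibility_gap(p):
    """Largest violation of Kolmogorov's criterion over all 3-cycles.

    The gap is zero (up to rounding) for any time-reversible chain.
    """
    return max(abs(p[i][j] * p[j][k] * p[k][i] - p[i][k] * p[k][j] * p[j][i])
               for i, j, k in permutations(range(len(p)), 3))


# Hypothetical parameter values, chosen only to differ between the two epochs.
exch1 = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
exch2 = [[0, 6, 5, 4], [6, 0, 3, 2], [5, 3, 0, 1], [4, 2, 1, 0]]
p1 = expm(gtr_rate_matrix(exch1, [0.1, 0.2, 0.3, 0.4]), 0.1)
p2 = expm(gtr_rate_matrix(exch2, [0.4, 0.3, 0.2, 0.1]), 0.1)

gap_single = max(reversibility_gap(p1), reversibility_gap(p2))
gap_product = reversibility_gap(matmul(p1, p2))
print(gap_single, gap_product)
```

Each matrix on its own satisfies the criterion to rounding error, while their product does not, so a process that changes parameters along a lineage leaves the GTR family.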
Subjects
Classification/methods, Biological Models, Phylogeny, Animals, DNA/chemistry, DNA/genetics, Mitochondrial DNA/chemistry, Mitochondrial DNA/genetics, Humans, Nucleotides/genetics, Nucleotides/metabolism, Plants/genetics
ABSTRACT
Continuous-time Markov chains are a standard tool in phylogenetic inference. If homogeneity is assumed, the chain is formulated by specifying time-independent rates of substitutions between states in the chain. In applications, there are usually extra constraints on the rates, depending on the situation. If a model is formulated in this way, it is possible to generalise it and allow for an inhomogeneous process, with time-dependent rates satisfying the same constraints. It is then useful to require that, under some time restrictions, there exists a homogeneous average of this inhomogeneous process within the same model. This leads to the definition of "Lie Markov models" which, as we will show, are precisely the class of models where such an average exists. These models form Lie algebras and hence concepts from Lie group theory are central to their derivation. In this paper, we concentrate on applications to phylogenetics and nucleotide evolution, and derive the complete hierarchy of Lie Markov models that respect the grouping of nucleotides into purines and pyrimidines, that is, models with purine/pyrimidine symmetry. We also discuss how to handle the subtleties of applying Lie group methods, most naturally defined over the complex field, to the stochastic case of a Markov process, where parameter values are restricted to be real and positive. In particular, we explore the geometric embedding of the cone of stochastic rate matrices within the ambient space of the associated complex Lie algebra.
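For a set of rate matrices to form a Lie algebra, the commutator [A, B] = AB - BA of any two members must again lie in their linear span. The sketch below checks this for the equal-input model, in which every state moves to state j at a rate that depends only on j. The choice of model and the names `equal_input_generator`, `commutator` and `closes_under_commutator` are assumptions made for illustration; the abstract does not single out this model. For its generators E_j the commutator works out to [E_i, E_j] = E_j - E_i, which is clearly in the span.

```python
def equal_input_generator(j, n=4):
    """Rate matrix E_j that moves every other state to state j at unit rate."""
    return [[(1.0 if col == j else 0.0) - (1.0 if col == row else 0.0)
             for col in range(n)] for row in range(n)]


def matmul(x, y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(x, y):
    """Lie bracket [x, y] = xy - yx."""
    xy, yx = matmul(x, y), matmul(y, x)
    n = len(x)
    return [[xy[i][j] - yx[i][j] for j in range(n)] for i in range(n)]


def closes_under_commutator(n=4, tol=1e-12):
    """Check [E_i, E_j] == E_j - E_i for every pair of generators."""
    gens = [equal_input_generator(j, n) for j in range(n)]
    for i in range(n):
        for j in range(n):
            bracket = commutator(gens[i], gens[j])
            for r in range(n):
                for c in range(n):
                    target = gens[j][r][c] - gens[i][r][c]
                    if abs(bracket[r][c] - target) > tol:
                        return False
    return True


print(closes_under_commutator())
```

Because the bracket is bilinear, closure on the generators extends to every linear combination of them, which is why checking the basis is enough.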
Subjects
Genetic Models, Purine Nucleotides/genetics, Pyrimidine Nucleotides/genetics, Animals, DNA/genetics, Molecular Evolution, Humans, Markov Chains, Mathematical Concepts, Phylogeny, Stochastic Processes
ABSTRACT
Precise estimations of molecular rates are fundamental to our understanding of the processes of evolution. In principle, mutation and evolutionary rates for neutral regions of the same species are expected to be equal. However, a number of recent studies have shown that mutation rates estimated from pedigree material are much faster than evolutionary rates measured over longer time periods. To resolve this apparent contradiction, we have examined the hypervariable region (HVR I) of the mitochondrial genome using families of Adélie penguins (Pygoscelis adeliae) from the Antarctic. We sequenced 344 bp of the HVR I from penguins comprising 508 families with 915 chicks, together with both their parents. All of the 62 germline heteroplasmies that we detected in mothers were also detected in their offspring, consistent with maternal inheritance. These data give an estimated mutation rate (μ) of 0.55 mutations/site/Myrs (HPD 95% confidence interval of 0.29-0.88 mutations/site/Myrs) after accounting for the persistence of these heteroplasmies and the sensitivity of current detection methods. In comparison, the rate of evolution (k) of the same HVR I region, determined using DNA sequences from 162 known age sub-fossil bones spanning a 37,000-year period, was 0.86 substitutions/site/Myrs (HPD 95% confidence interval of 0.53-1.17). Importantly, the latter rate is not statistically different from our estimate of the mutation rate. These results are in contrast to the view that molecular rates are time dependent.
Subjects
Molecular Evolution, Mutation, Spheniscidae/genetics, Animals, Antarctic Regions, Mitochondrial DNA/genetics, Genetic Drift, Population Genetics, Haplotypes, Pedigree
ABSTRACT
The assumptions underpinning ancestral state reconstruction are violated in many evolutionary systems, especially for traits under directional selection. However, the accuracy of ancestral state reconstruction for non-neutral traits is poorly understood. To investigate the accuracy of ancestral state reconstruction methods, trees and binary characters were simulated under the BiSSE (Binary State Speciation and Extinction) model using a wide range of character-state-dependent rates of speciation, extinction and character-state transition. We used maximum parsimony (MP), BiSSE and two-state Markov (Mk2) models to reconstruct ancestral states. Under each method, error rates increased with node depth, true number of state transitions, and rates of state transition and extinction; exceeding 30% for the deepest 10% of nodes and highest rates of extinction and character-state transition. Where rates of character-state transition were asymmetrical, error rates were greater when the rate away from the ancestral state was largest. Preferential extinction of species with the ancestral character state also led to higher error rates. BiSSE outperformed Mk2 in all scenarios where either speciation or extinction was state dependent and outperformed MP under most conditions. MP outperformed Mk2 in most scenarios except when the rates of character-state transition and/or extinction were highly asymmetrical and the ancestral state was unfavoured.
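As a rough illustration of reconstruction under a two-state Markov model, the sketch below computes the posterior probability of each root state for a tree with only two tips. It is a minimal example under stated assumptions, not the simulation or inference pipeline used in the study: the function names, the rates, the branch lengths and the flat prior on the root are all hypothetical. The transition probabilities use the standard closed form for a two-state continuous-time chain.

```python
import math


def mk2_transition_probs(q01, q10, t):
    """P(t) for a two-state chain with rates q01 (0 -> 1) and q10 (1 -> 0)."""
    total = q01 + q10
    decay = math.exp(-total * t)
    return [
        [(q10 + q01 * decay) / total, q01 * (1.0 - decay) / total],
        [q10 * (1.0 - decay) / total, (q01 + q10 * decay) / total],
    ]


def root_posterior(tip_a, tip_b, t_a, t_b, q01, q10, prior=(0.5, 0.5)):
    """Posterior probability of each root state for a two-tip tree.

    tip_a, tip_b: observed states (0 or 1) at the two tips
    t_a, t_b:     branch lengths from the root to each tip
    """
    p_a = mk2_transition_probs(q01, q10, t_a)
    p_b = mk2_transition_probs(q01, q10, t_b)
    joint = [prior[s] * p_a[s][tip_a] * p_b[s][tip_b] for s in (0, 1)]
    norm = sum(joint)
    return [value / norm for value in joint]


# Both tips in state 1, short branches, symmetric rates (hypothetical values).
posterior = root_posterior(1, 1, 0.1, 0.1, q01=1.0, q10=1.0)
# Same tips but very long branches.
posterior_deep = root_posterior(1, 1, 50.0, 50.0, q01=1.0, q10=1.0)
print(posterior, posterior_deep)
```

With short branches the tips are informative and the root is reconstructed as state 1 with high probability; with very long branches the posterior falls back towards the prior. This is one intuition, offered tentatively, for why error rates grow with node depth and transition rate.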
ABSTRACT
BACKGROUND: Within eukaryotes there is a complex cascade of RNA-based macromolecules that process other RNA molecules, especially mRNA, tRNA and rRNA. An example is RNase MRP processing ribosomal RNA (rRNA) in ribosome biogenesis. One hypothesis is that this complexity was present early in eukaryotic evolution; an alternative is that an initial simpler network later gained complexity by gene duplication in lineages that led to animals, fungi and plants. Recently there has been a rapid increase in support for the complexity-early theory because the vast majority of these RNA-processing reactions are found throughout eukaryotes, and thus were likely to be present in the last common ancestor of living eukaryotes, herein called the Eukaryotic Ancestor. RESULTS: We present an overview of the RNA processing cascade in the Eukaryotic Ancestor and investigate, in particular, RNase MRP, which was previously thought to have evolved later in eukaryotes because of its apparently limited distribution in fungi, animals and plants. Recent publications, as well as our own genomic searches, find previously unknown RNase MRP RNAs, indicating that RNase MRP has a wide distribution in eukaryotes. Combining secondary structure and promoter region analysis of RNAs for RNase MRP, along with analysis of the target substrate (rRNA), allows us to discuss this distribution in the light of eukaryotic evolution. CONCLUSION: We conclude that RNase MRP can now be placed in the RNA-processing cascade of the Eukaryotic Ancestor, highlighting the complexity of RNA-processing in early eukaryotes. Promoter analyses of MRP-RNA suggest that regulation of the critical processes of rRNA cleavage can vary, showing that even these key cellular processes (for which we expect high conservation) show some species-specific variability. We present our consensus MRP-RNA secondary structure as a useful model for further searches.
Subjects
Endoribonucleases/genetics, Eukaryotic Cells, Molecular Evolution, Post-Transcriptional RNA Processing, Animals, Base Sequence, Eukaryotic Cells/metabolism, Humans, Biological Models, Molecular Sequence Data, Nucleic Acid Conformation, Phylogeny, Genetic Promoter Regions, RNA/metabolism, DNA Sequence Analysis, Signal Transduction
ABSTRACT
We introduce a gene tree simulator that is designed for use in conjunction with approximate Bayesian computation approaches. We show that it can be used to determine the relative importance of hybrid speciation and introgression compared with incomplete lineage sorting (ILS) in producing patterns of incongruence across gene trees. Important features of the new simulator are (1) a choice of models to capture the decreasing probability of successful hybrid species formation or introgression as a function of genetic distance between potential parent species; (2) the ability for hybrid speciation to result in asymmetrical contributions of genetic material from each parent species; (3) the ability to vary the rates of hybrid speciation, introgression, and divergence speciation in different epochs; and (4) incorporation of the coalescent, so that patterns of incongruence due to ILS can be compared with those due to hybrid evolution. Given a set of gene trees generated by the simulator, we calculate a set of statistics, each measuring in a different way the discordance between the gene trees. We show that these statistics can be used to differentiate whether the gene tree discordance was largely due to hybridization, or only due to lineage sorting.
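The abstract does not say which discordance statistics are computed, so the sketch below should be read only as one plausible example of such a statistic: the mean pairwise Robinson-Foulds distance between rooted gene trees, counted as the number of clades found in one tree but not the other. The nested-tuple tree encoding and the names `clades`, `rf_distance` and `mean_pairwise_rf` are assumptions made for illustration.

```python
from itertools import combinations


def clades(tree):
    """Set of clades (as frozensets of leaf names) in a rooted nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):  # a leaf is represented by its name
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)  # record the clade under this internal node
        return below

    leaves(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


def mean_pairwise_rf(trees):
    """Average Robinson-Foulds distance over all unordered pairs of trees."""
    pairs = list(combinations(trees, 2))
    return sum(rf_distance(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical gene trees on the same four taxa.
gene_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "B"), ("C", "D")),
]
print(mean_pairwise_rf(gene_trees))
```

A value of zero means every gene tree has the same topology; larger values indicate more disagreement. On its own this statistic does not separate hybridization from incomplete lineage sorting, which is presumably why the authors combine several statistics.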
Subjects
Computer Simulation, Gene Flow, Genetic Speciation, Genetic Hybridization, Genetic Models, Bayes Theorem, Molecular Evolution, Humans, Phylogeny
ABSTRACT
We present a mathematical model of mitochondrial inheritance evolving under neutral evolution to interpret the heteroplasmies observed at some sites. A comparison of the levels of heteroplasmies transmitted from a mother to her offspring allows us to estimate the number N(x) of inherited mitochondrial genomes (segregating units). The model demonstrates the necessity of accounting for both the multiplicity of an unknown number N(x), and the threshold, below which heteroplasmy cannot be detected reliably, in order to estimate the mitochondrial mutation rate mu(m) in the maternal line of descent. Our model is applicable to pedigree studies of any eukaryotic species where site heteroplasmies are observed in regions of the mitochondria, provided neutrality can be assumed. The model is illustrated with an analysis of site heteroplasmies in the first hypervariable region of mitochondrial sequence data sampled from Adélie penguin families, providing estimates of N(x) and mu(m). This estimate of mu(m) was found to be consistent with earlier estimates from ancient DNA analysis.