Results 1 - 20 of 75
1.
Mol Biol Evol ; 41(1)2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38142434

ABSTRACT

Tree tests like the Kishino-Hasegawa (KH) test and chi-square test suffer a selection bias that tests like the Shimodaira-Hasegawa (SH) test and approximately unbiased test were intended to correct. We investigate tree-testing performance in the presence of severe selection bias. The SH test is found to be very conservative and, surprisingly, its uncorrected analog, the KH test, has low Type I error even in the presence of extreme selection bias, leading to a recommendation that the SH test be abandoned. A chi-square test is found to usually behave well but to require correction in extreme cases. We show how topology testing procedures can be used to get support values for splits and compare the likelihood-based support values to the approximate likelihood ratio test (aLRT) support values. We find that the aLRT support values are reasonable even in settings with severe selection bias that they were not designed for. We also show how they can be used to construct tests of topologies and, in doing so, point out a multiple comparisons issue that should be considered when looking at support values for splits.


Subjects
Likelihood Functions, Phylogeny, Selection Bias
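
For readers unfamiliar with the KH test mentioned in this abstract, the following is a minimal sketch of its common normal-approximation form, assuming per-site log-likelihoods for two candidate trees are already available. The function name `kh_test` and the input layout are illustrative assumptions, not taken from the paper, and the sketch does not include any of the corrections for selection bias that the abstract discusses.

```python
import math

def kh_test(site_lnl_a, site_lnl_b):
    """Return (delta, z, two_sided_p) from per-site log-likelihoods of two trees.

    delta is the total log-likelihood difference (tree A minus tree B).
    Its variance is estimated from the spread of the per-site differences.
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both trees need one log-likelihood per site")
    n = len(site_lnl_a)
    if n < 2:
        raise ValueError("at least two sites are required")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    mean = delta / n
    # Variance of the summed difference: n times the sample variance of sites.
    var_delta = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_delta == 0.0:
        raise ValueError("site differences are constant; z is undefined")
    z = delta / math.sqrt(var_delta)
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return delta, z, p
```

The normal reference distribution is only appropriate when both trees were chosen before looking at the data; comparing against the maximum likelihood tree is the selection-bias setting the abstract is concerned with.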
2.
Mol Biol Evol ; 40(12)2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37987557

ABSTRACT

Marine algae are central to global carbon fixation, and their productivity is dictated largely by resource availability. Reduced nutrient availability is predicted for vast oceanic regions as an outcome of climate change; however, there is much to learn regarding response mechanisms of the tiny picoplankton that thrive in these environments, especially eukaryotic phytoplankton. Here, we investigate responses of the picoeukaryote Micromonas commoda, a green alga found throughout subtropical and tropical oceans. Under shifting phosphate availability scenarios, transcriptomic analyses revealed altered expression of transfer RNA modification enzymes and biased codon usage of transcripts more abundant during phosphate-limiting versus phosphate-replete conditions, consistent with the role of transfer RNA modifications in regulating codon recognition. To associate the observed shift in the expression of the transfer RNA modification enzyme complement with the transfer RNAs encoded by M. commoda, we also determined the transfer RNA repertoire of this alga revealing potential targets of the modification enzymes. Codon usage bias was particularly pronounced in transcripts encoding proteins with direct roles in managing phosphate limitation and photosystem-associated proteins that have ill-characterized putative functions in "light stress." The observed codon usage bias corresponds to a proposed stress response mechanism in which the interplay between stress-induced changes in transfer RNA modifications and skewed codon usage in certain essential response genes drives preferential translation of the encoded proteins. Collectively, we expose a potential underlying mechanism for achieving growth under enhanced nutrient limitation that extends beyond the catalog of up- or downregulated protein-encoding genes to the cell biological controls that underpin acclimation to changing environmental conditions.


Subjects
Chlorophyta, Codon Usage, Phosphates/metabolism, Transfer RNA/genetics, Transfer RNA/metabolism, Codon/genetics, Codon/metabolism, Chlorophyta/genetics, Chlorophyta/metabolism, Protein Biosynthesis
3.
Syst Biol ; 2023 Oct 16.
Article in English | MEDLINE | ID: mdl-37843172

ABSTRACT

Biochemical constraints on the admissible amino acids at specific sites in proteins lead to heterogeneity of the amino acid substitution process over sites in alignments. It is well known that phylogenetic models of protein sequence evolution that do not account for site heterogeneity are prone to long-branch attraction (LBA) artifacts. Profile mixture models were developed to model heterogeneity of preferred amino acids at sites via a finite distribution of site classes each with a distinct set of equilibrium amino acid frequencies. However, it is unknown whether the large number of parameters in such models associated with the many amino acid frequency vectors can adversely affect tree topology estimates because of over-parameterization. Here we demonstrate theoretically that for long sequences, over-parameterization does not create problems for estimation with profile mixture models. Under mild conditions, tree, amino acid frequencies, and other model parameters converge to true values as sequence length increases, even when there are large numbers of components in the frequency profile distributions. Because large sample theory does not necessarily imply good behavior for shorter alignments we explore the performance of these models with short alignments simulated with tree topologies that are prone to LBA artifacts. We find that over-parameterization is not a problem for complex profile mixture models even when there are many amino acid frequency vectors. In fact, simple models with few site classes behave poorly. Interestingly, we also found that misspecification of the amino acid frequency vectors does not lead to increased LBA artifacts as long as the estimated cumulative distribution function of the amino acid frequencies at sites adequately approximates the true one. In contrast, misspecification of the amino acid exchangeability rates can severely negatively affect parameter estimation. 
Finally, we explore the effects of including in the profile mixture model an additional 'F-class' representing the overall frequencies of amino acids in the data set. Surprisingly, the F-class does not help parameter estimation significantly and can decrease the probability of correct tree estimation, depending on the scenario, even though it tends to improve likelihood scores.

4.
Mol Biol Evol ; 39(3)2022 03 02.
Article in English | MEDLINE | ID: mdl-35134997

ABSTRACT

Site-specific amino acid preferences are influenced by the genetic background of the protein. The preferences for resident amino acids are expected to, on average, increase over time because of replacements at other sites, a nonadaptive phenomenon referred to as the "evolutionary Stokes shift." Alternatively, decreases in resident amino acid propensity have recently been viewed as evidence of adaptations to external environmental changes. Using population genetics theory and thermodynamic stability constraints, we show that nonadaptive evolution can lead to both positive and negative shifts in propensities following the fixation of an amino acid, emphasizing that the detection of negative shifts is not conclusive evidence of adaptation. By examining propensity shifts from when an amino acid is first accepted at a site until it is subsequently replaced, we find that ≈50% of sites show a decrease in the propensity for the newly resident amino acid while the remaining sites show an increase. Furthermore, the distributions of the magnitudes of positive and negative shifts were comparable. Preferences were often conserved via a significant negative autocorrelation in propensity changes, with increases in propensities often followed by decreases, and vice versa. Lastly, we explore the underlying mechanisms that lead propensities to fluctuate. We observe that stabilizing replacements increase the mutational tolerance at a site and in doing so decrease the propensity for the resident amino acid. In contrast, destabilizing substitutions result in more rugged fitness landscapes that tend to favor the resident amino acid. In summary, our results characterize propensity trajectories under nonadaptive stability-constrained evolution against which evidence of adaptations should be calibrated.


Subjects
Amino Acids, Molecular Evolution, Amino Acid Substitution, Amino Acids/chemistry, Amino Acids/genetics, Genetic Epistasis, Proteins/genetics, Thermodynamics
5.
Syst Biol ; 70(4): 838-843, 2021 06 16.
Article in English | MEDLINE | ID: mdl-33528562

ABSTRACT

Long branch attraction (LBA) is a prevalent form of bias in phylogenetic estimation but the reasons for it are only partially understood. We argue here that it is largely due to differences in the sizes of the model spaces corresponding to different trees. Trees with long branches together allow much more flexible internal branch length parameter estimation. Consequently, although each tree has the same number of parameters, trees with long branches together have larger effective model spaces. The problem of LBA becomes particularly pronounced with partitioned data. Formulation of tree estimation as model selection leads us to propose bootstrap bias corrections as cross-checks on estimation when long branches end up being estimated together. [Bootstrap; long branch attraction; maximum likelihood; model selection; partitioned model; phylogenetics.].


Subjects
Genetic Models, Bias, Computer Simulation, Likelihood Functions, Phylogeny
6.
J Math Biol ; 84(4): 21, 2022 02 21.
Article in English | MEDLINE | ID: mdl-35188616

ABSTRACT

Likelihood-based methods are widely considered the best approaches for reconstructing ancestral states. Although much effort has been made to study properties of these methods, previous works often assume that both the tree topology and edge lengths are known. In some scenarios the tree topology might be reasonably well known for the taxa under study. When sequence length is much smaller than the number of species, however, edge lengths are not likely to be accurately estimated. We study the consistency of the maximum likelihood and empirical Bayes estimators of the ancestral state of discrete traits in such settings under a star tree. We prove that the likelihood-based reconstruction is consistent under symmetric models but can be inconsistent under non-symmetric models. We show, however, that a simple consistent estimator for the ancestral states is available under non-symmetric models. The results illustrate that likelihood methods can unexpectedly have undesirable properties as the number of sequences considered gets very large. Broader implications of the results are discussed.


Subjects
Molecular Evolution, Bayes Theorem, Likelihood Functions, Phenotype, Phylogeny
7.
Mol Biol Evol ; 37(2): 549-562, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31688943

ABSTRACT

The information criteria Akaike information criterion (AIC), AICc, and Bayesian information criterion (BIC) are widely used for model selection in phylogenetics; however, their theoretical justification and performance have not been carefully examined in this setting. Here, we investigate these methods under simple and complex phylogenetic models. We show that AIC can give a biased estimate of its intended target, the expected predictive log likelihood (EPLnL) or, equivalently, expected Kullback-Leibler divergence between the estimated model and the true distribution for the data. Reasons for bias include commonly occurring issues such as small edge-lengths or, in mixture models, small weights. The use of partitioned models is another issue that can cause problems with information criteria. We show that for partitioned models, a different BIC correction is required for it to be a valid approximation to a Bayes factor. The commonly used AICc correction is not clearly defined in partitioned models and can actually create a substantial bias when the number of parameters gets large, as is the case with larger trees and partitioned models. Bias-corrected cross-validation corrections are shown to provide better approximations to EPLnL than AIC. We also illustrate how EPLnL, the estimation target of AIC, can sometimes favor an incorrect model and give reasons why selection of incorrectly under-partitioned models might be desirable in partitioned model settings.


Subjects
Computational Biology/methods, Phylogeny, Algorithms, Bayes Theorem, Likelihood Functions, Genetic Models, Genetic Selection
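
As a reference point for the criteria named in this abstract, here is a minimal sketch of the textbook definitions of AIC, AICc, and BIC. It assumes `n` is the sample size (in phylogenetics this is often taken to be the number of alignment sites, which is itself an assumption) and `k` the number of free parameters; the partition-specific BIC correction and the cross-validation corrections proposed in the paper are not reproduced here.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * k - 2 * log_lik

def aicc(log_lik, k, n):
    """Small-sample corrected AIC; undefined unless n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2 lnL."""
    if n <= 0:
        raise ValueError("n must be positive")
    return k * math.log(n) - 2 * log_lik
```

The AICc penalty grows without bound as `k` approaches `n - 1`, which gives some intuition for why the correction can misbehave when large trees or many partitions push the parameter count up.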
8.
Mol Biol Evol ; 37(11): 3131-3148, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32897316

ABSTRACT

Do interactions between residues in a protein (i.e., epistasis) significantly alter evolutionary dynamics? If so, what consequences might they have on inference from traditional codon substitution models which assume site-independence for the sake of computational tractability? To investigate the effects of epistasis on substitution rates, we employed a mechanistic mutation-selection model in conjunction with a fitness framework derived from protein stability. We refer to this as the stability-informed site-dependent (S-SD) model and developed a new stability-informed site-independent (S-SI) model that captures the average effect of stability constraints on individual sites of a protein. Comparison of S-SI and S-SD offers a novel and direct method for investigating the consequences of stability-induced epistasis on protein evolution. We developed S-SI and S-SD models for three natural proteins and showed that they generate sequences consistent with real alignments. Our analyses revealed that epistasis tends to increase substitution rates compared with the rates under site-independent evolution. We then assessed the epistatic sensitivity of individual sites and discovered a counterintuitive effect: Highly connected sites were less influenced by epistasis relative to exposed sites. Lastly, we show that, despite the unrealistic assumptions, traditional models perform comparably well in the presence and absence of epistasis and provide reasonable summaries of average selection intensities. We conclude that epistatic models are critical to understanding protein evolutionary dynamics, but epistasis might not be required for reasonable inference of selection pressure when averaging over time and sites.


Subjects
Genetic Epistasis, Molecular Evolution, Genetic Models, Mutation, Genetic Selection
9.
Syst Biol ; 69(4): 722-738, 2020 07 01.
Article in English | MEDLINE | ID: mdl-31730199

ABSTRACT

A central objective in biology is to link adaptive evolution in a gene to structural and/or functional phenotypic novelties. Yet most analytic methods make inferences mainly from either phenotypic data or genetic data alone. A small number of models have been developed to infer correlations between the rate of molecular evolution and changes in a discrete or continuous life history trait. But such correlations are not necessarily evidence of adaptation. Here, we present a novel approach called the phenotype-genotype branch-site model (PG-BSM) designed to detect evidence of adaptive codon evolution associated with discrete-state phenotype evolution. An episode of adaptation is inferred under standard codon substitution models when there is evidence of positive selection in the form of an elevation in the nonsynonymous-to-synonymous rate ratio $\omega$ to a value $\omega > 1$. As it is becoming increasingly clear that $\omega > 1$ can occur without adaptation, the PG-BSM was formulated to infer an instance of adaptive evolution without appealing to evidence of positive selection. The null model makes use of a covarion-like component to account for general heterotachy (i.e., random changes in the evolutionary rate at a site over time). The alternative model employs samples of the phenotypic evolutionary history to test for phenomenological patterns of heterotachy consistent with specific mechanisms of molecular adaptation. These include 1) a persistent increase/decrease in $\omega$ at a site following a change in phenotype (the pattern) consistent with an increase/decrease in the functional importance of the site (the mechanism); and 2) a transient increase in $\omega$ at a site along a branch over which the phenotype changed (the pattern) consistent with a change in the site's optimal amino acid (the mechanism). 
Rejection of the null is followed by post hoc analyses to identify sites with strongest evidence for adaptation in association with changes in the phenotype as well as the most likely evolutionary history of the phenotype. Simulation studies based on a novel method for generating mechanistically realistic signatures of molecular adaptation show that the PG-BSM has good statistical properties. Analyses of real alignments show that site patterns identified post hoc are consistent with the specific mechanisms of adaptation included in the alternate model. Further simulation studies show that the covarion-like component of the PG-BSM plays a crucial role in mitigating recently discovered statistical pathologies associated with confounding by accounting for heterotachy-by-any-cause. [Adaptive evolution; branch-site model; confounding; mutation-selection; phenotype-genotype.].


Subjects
Classification/methods, Codon/genetics, Genotype, Phenotype, Phylogeny, Physiological Adaptation/genetics, Computer Simulation
10.
J Theor Biol ; 526: 110788, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34097914

ABSTRACT

Two recent high profile studies have attempted to use edge (branch) length ratios from large sets of phylogenetic trees to determine the relative ages of genes of different origins in the evolution of eukaryotic cells. This approach can be straightforwardly justified if substitution rates are constant over the tree for a given protein. However, such strict molecular clock assumptions are not expected to hold on the billion-year timescale. Here we propose an alternative set of conditions under which comparisons of edge length distributions from multiple sets of phylogenies of proteins with different origins can be validly used to discern the order of their origins. We also point out scenarios where these conditions are not expected to hold and caution is warranted.


Subjects
Eukaryotic Cells, Molecular Evolution, Genetic Models, Phylogeny
11.
Bioinformatics ; 35(15): 2545-2554, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30541063

ABSTRACT

MOTIVATION: Likelihood ratio tests are commonly used to test for positive selection acting on proteins. They are usually applied with thresholds for declaring a protein under positive selection determined from a chi-square or mixture of chi-square distributions. Although it is known that such distributions are not strictly justified due to the statistical irregularity of the problem, the hope has been that the resulting tests are conservative and do not lose much power in comparison with the same test using the unknown, correct threshold. We show that commonly used thresholds need not yield conservative tests, but instead give larger than expected Type I error rates. Statistical regularity can be restored by using a modified likelihood ratio test. RESULTS: We give theoretical results to prove that, if the number of sites is not too small, the modified likelihood ratio test gives approximately correct Type I error probabilities regardless of the parameter settings of the underlying null hypothesis. Simulations show that modification gives Type I error rates closer to those stated without a loss of power. The simulations also show that parameter estimation for mixture models of codon evolution can be challenging in certain data-generation settings with very different mixing distributions giving nearly identical site pattern distributions unless the number of taxa and tree length are large. Because mixture models are widely used for a variety of problems in molecular evolution, the challenges and general approaches to solving them presented here are applicable in a broader context. AVAILABILITY AND IMPLEMENTATION: https://github.com/jehops/codeml_modl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software, Biometry, Chi-Square Distribution, Molecular Evolution, Likelihood Functions
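
To make the conventional thresholds mentioned in this abstract concrete, the sketch below computes a likelihood ratio statistic and its p-value under a chi-square distribution with one degree of freedom and under a 50:50 mixture of a point mass at zero and that chi-square. The function names are illustrative assumptions. This is the standard calculation that the abstract reports need not be conservative; it is not the modified likelihood ratio test the authors propose.

```python
import math

def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic: 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)

def chi2_1_sf(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    # If Z is standard normal, Z**2 is chi-square(1), so P(Z**2 > x) = erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(x / 2.0))

def mixture_p_value(stat):
    """p-value under a 50:50 mixture of a point mass at 0 and chi-square(1)."""
    if stat <= 0.0:
        return 1.0
    return 0.5 * chi2_1_sf(stat)
```

With these definitions the 5% critical value is about 3.84 for the plain chi-square and about 2.71 for the mixture, which is why the choice of reference distribution matters for borderline cases.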
12.
Syst Biol ; 68(6): 1003-1019, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31140564

ABSTRACT

Large taxa-rich genome-scale data sets are often necessary for resolving ancient phylogenetic relationships. But accurate phylogenetic inference requires that they are analyzed with realistic models that account for the heterogeneity in substitution patterns amongst the sites, genes and lineages. Two kinds of adjustments are frequently used: models that account for heterogeneity in amino acid frequencies at sites in proteins, and partitioned models that accommodate the heterogeneity in rates (branch lengths) among different proteins in different lineages (protein-wise heterotachy). Although partitioned and site-heterogeneous models are both widely used in isolation, their relative importance to the inference of correct phylogenies has not been carefully evaluated. We conducted several empirical analyses and a large set of simulations to compare the relative performances of partitioned models, site-heterogeneous models, and combined partitioned site heterogeneous models. In general, site-homogeneous models (partitioned or not) performed worse than site heterogeneous, except in simulations with extreme protein-wise heterotachy. Furthermore, simulations using empirically-derived realistic parameter settings showed a marked long-branch attraction (LBA) problem for analyses employing protein-wise partitioning even when the generating model included partitioning. This LBA problem results from a small sample bias compounded over many single protein alignments. In some cases, this problem was ameliorated by clustering similarly-evolving proteins together into larger partitions using the PartitionFinder method. Similar results were obtained under simulations with larger numbers of taxa or heterogeneity in simulating topologies over genes. 
For an empirical Microsporidia test data set, all but one of the tested site-heterogeneous models (with or without partitioning) obtained the correct Microsporidia+Fungi grouping, whereas site-homogeneous models (with or without partitioning) did not. The single exception was the fully partitioned site-heterogeneous analysis that succumbed to the compounded small sample LBA bias. In general, unless protein-wise heterotachy effects are extreme, it is more important to model site-heterogeneity than protein-wise heterotachy in phylogenomic analyses. Complete protein-wise partitioning should be avoided as it can lead to a serious LBA bias. In cases of extreme protein-wise heterotachy, approaches that cluster similarly-evolving proteins together, coupled with site-heterogeneous models, work well for phylogenetic estimation.


Subjects
Classification/methods, Theoretical Models, Phylogeny, Computer Simulation, Microsporidia/classification, Microsporidia/genetics
13.
Mol Biol Evol ; 35(5): 1266-1283, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29688541

ABSTRACT

As a consequence of structural and functional constraints, proteins tend to have site-specific preferences for particular amino acids. Failing to adjust for heterogeneity of frequencies over sites can lead to artifacts in phylogenetic estimation. Site-heterogeneous mixture-models have been developed to address this problem. However, due to prohibitive computational times, maximum likelihood implementations utilize fixed component frequency vectors inferred from sequences in a database that are external to the alignment under analysis. Here, we propose a composite likelihood approach to estimation of component frequencies for a mixture model that directly uses the data from the alignment of interest. In the common case that the number of taxa under study is not large, several adjustments to the default composite likelihood are shown to be necessary. In simulations, the approach is shown to provide large improvements over hierarchical clustering. For empirical data, substantial improvements in likelihoods are found over mixtures using fixed components.


Subjects
Amino Acid Substitution, Genetic Models, Computer Simulation, Likelihood Functions, Phylogeny
14.
Mol Biol Evol ; 35(6): 1473-1488, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29596684

ABSTRACT

When a substitution model is fitted to an alignment using maximum likelihood, its parameters are adjusted to account for as much site-pattern variation as possible. A parameter might therefore absorb a substantial quantity of the total variance in an alignment (or more formally, bring about a substantial reduction in the deviance of the fitted model) even if the process it represents played no role in the generation of the data. When this occurs, we say that the parameter estimate carries phenomenological load (PL). Large PL in a parameter estimate is a concern because it not only invalidates its mechanistic interpretation (if it has one) but also increases the likelihood that it will be found to be statistically significant. The problem of PL was not identified in the past because most off-the-shelf substitution models make simplifying assumptions that preclude the generation of realistic levels of variation. In this study, we use the more realistic mutation-selection framework as the basis of a generating model formulated to produce data that mimic an alignment of mammalian mitochondrial DNA. We show that a parameter estimate can carry PL when 1) the substitution model is underspecified and 2) the parameter represents a process that is confounded with other processes represented in the data-generating model. We then provide a method that can be used to identify signal for the process that a given parameter represents despite the existence of PL.


Subjects
Mammals/genetics, Genetic Models, Mutation, Genetic Selection, Silent Mutation, Animals, Mitochondrial DNA, Molecular Evolution, Likelihood Functions, Sequence Alignment
15.
Syst Biol ; 67(2): 216-235, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-28950365

ABSTRACT

Proteins have distinct structural and functional constraints at different sites that lead to site-specific preferences for particular amino acid residues as the sequences evolve. Heterogeneity in the amino acid substitution process between sites is not modeled by commonly used empirical amino acid exchange matrices. Such model misspecification can lead to artefacts in phylogenetic estimation such as long-branch attraction. Although sophisticated site-heterogeneous mixture models have been developed to address this problem in both Bayesian and maximum likelihood (ML) frameworks, their formidable computational time and memory usage severely limit their use in large phylogenomic analyses. Here we propose a posterior mean site frequency (PMSF) method as a rapid and efficient approximation to full empirical profile mixture models for ML analysis. The PMSF approach assigns a conditional mean amino acid frequency profile to each site calculated based on a mixture model fitted to the data using a preliminary guide tree. These PMSF profiles can then be used for in-depth tree-searching in place of the full mixture model. Compared with widely used empirical mixture models with $k$ classes, our implementation of PMSF in IQ-TREE (http://www.iqtree.org) speeds up the computation by approximately $k$/1.5-fold and requires a small fraction of the RAM. Furthermore, this speedup allows, for the first time, full nonparametric bootstrap analyses to be conducted under complex site-heterogeneous models on large concatenated data matrices. Our simulations and empirical data analyses demonstrate that PMSF can effectively ameliorate long-branch attraction artefacts. In some empirical and simulation settings PMSF provided more accurate estimates of phylogenies than the mixture models from which they derive.


Subjects
Classification/methods, Genetic Models, Phylogeny, Amino Acid Substitution, Computer Simulation, Molecular Evolution, Nonparametric Statistics
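
The "conditional mean amino acid frequency profile" described in this abstract can be illustrated with a small sketch. It assumes the per-site likelihood under each mixture class has already been computed on the guide tree and is passed in as plain numbers; the function name and argument layout are hypothetical and this is not IQ-TREE's implementation. Each site's profile is the average of the class frequency vectors, weighted by the posterior probability of each class at that site.

```python
def posterior_mean_site_frequencies(weights, class_freqs, site_class_liks):
    """Return one frequency profile per site.

    weights         : mixture weight of each class
    class_freqs     : one frequency vector per class (equal lengths)
    site_class_liks : for each site, its likelihood under each class
    """
    n_states = len(class_freqs[0])
    profiles = []
    for liks in site_class_liks:
        # Bayes' rule: posterior class probability is weight times likelihood,
        # normalized over classes.
        joint = [w * l for w, l in zip(weights, liks)]
        total = sum(joint)
        if total <= 0.0:
            raise ValueError("site has zero likelihood under every class")
        posterior = [j / total for j in joint]
        # Posterior-weighted mean of the class frequency vectors.
        profiles.append([
            sum(p * freqs[a] for p, freqs in zip(posterior, class_freqs))
            for a in range(n_states)
        ])
    return profiles
```

Because every site ends up with a single fixed profile, the later tree search evaluates one likelihood per site rather than one per class, which is consistent with the speedup and memory savings the abstract reports.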
16.
Mol Biol Evol ; 34(2): 391-407, 2017 02 01.
Article in English | MEDLINE | ID: mdl-28110273

ABSTRACT

A version of the mechanistic mutation-selection (MutSel) model that accounts for temporal dynamics at a site is presented. This is used to show that the rate ratio dN/dS at a site can be transiently >1 even when fitness coefficients are fixed or the fitness landscape is static. This occurs whenever a site drifts away from its fitness peak and is then forced back by selection, a process reminiscent of shifting balance. Shifting balance is strongest when the substitution process is not dominated by selection or drift, but admits interplay between the two. Under this condition, site-specific changes in dN/dS were inferred in 78-100% of trials, and positive selection (i.e., dN/dS>1) in 10-40% of trials, when sequence alignments generated under MutSel were fitted to two popular phenomenological branch-site models. These results demonstrate that positive selection can occur without a change in fitness regime, and that this is detectable by branch-site models. In addition, MutSel is used to show that a site can be occupied by a sub-optimal amino acid for long periods on a fixed landscape when selection is stringent. This has implications for the interpretation of constant-but-different site patterns typically attributed to changes in fitness. Furthermore, a version of MutSel with episodic changes in fitness coefficients is used to illustrate systematic differences between parameters used to generate data under MutSel and their counterparts estimated by a simple codon model. Motivated by a discrepancy in the literature, interpretation of dN/dS in the context of MutSel is also discussed.


Subjects
Codon, Population Genetics/methods, Genetic Models, Genetic Selection, Amino Acid Substitution, Amino Acids/genetics, Animals, Drosophila, Molecular Evolution, Genetic Variation, Humans, Mutation, Phylogeny, Sequence Alignment
17.
BMC Biol ; 15(1): 8, 2017 02 13.
Article in English | MEDLINE | ID: mdl-28193262

ABSTRACT

BACKGROUND: Departures from the standard genetic code in eukaryotic nuclear genomes are known for only a handful of lineages and only a few genetic code variants seem to exist outside the ciliates, the most creative group in this regard. Most frequent code modifications entail reassignment of the UAG and UAA codons, with evidence for at least 13 independent cases of a coordinated change in the meaning of both codons. However, no change affecting each of the two codons separately has been documented, suggesting the existence of underlying evolutionary or mechanistic constraints. RESULTS: Here, we present the discovery of two new variants of the nuclear genetic code, in which UAG is translated as an amino acid while UAA is kept as a termination codon (along with UGA). The first variant occurs in an organism noticed in a (meta)transcriptome from the heteropteran Lygus hesperus and demonstrated to be a novel insect-dwelling member of Rhizaria (specifically Sainouroidea). This first documented case of a rhizarian with a non-canonical genetic code employs UAG to encode leucine and represents an unprecedented change among nuclear codon reassignments. The second code variant was found in the recently described anaerobic flagellate Iotanema spirale (Metamonada: Fornicata). Analyses of transcriptomic data revealed that I. spirale uses UAG to encode glutamine, similarly to the most common variant of a non-canonical code known from several unrelated eukaryotic groups, including hexamitin diplomonads (also a lineage of fornicates). However, in these organisms, UAA also encodes glutamine, whereas it is the primary termination codon in I. spirale. Along with phylogenetic evidence for distant relationship of I. spirale and hexamitins, this indicates two independent genetic code changes in fornicates. 
CONCLUSIONS: Our study documents, for the first time, that evolutionary changes of the meaning of UAG and UAA codons in nuclear genomes can be decoupled and that the interpretation of the two codons by the cytoplasmic translation apparatus is mechanistically separable. The latter conclusion has interesting implications for possibilities of genetic code engineering in eukaryotes. We also present a newly developed generally applicable phylogeny-informed method for inferring the meaning of reassigned codons.


Subjects
Cell Nucleus/genetics , Codon/genetics , Genetic Code , Animals , Ciliophora/genetics , Molecular Evolution , Glutamine/genetics , Insects/parasitology , Leucine/genetics , Open Reading Frames/genetics , Phylogeny , Rhizaria/genetics
18.
Mol Biol Evol ; 33(11): 2976-2989, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27486222

ABSTRACT

To detect positive selection at individual amino acid sites, most methods use an empirical Bayes approach. After parameters of a Markov process of codon evolution are estimated via maximum likelihood, they are passed to Bayes' formula to compute the posterior probability that a site evolved under positive selection. A difficulty with this approach is that parameter estimates with large errors can negatively impact Bayesian classification. By assigning priors to some parameters, Bayes Empirical Bayes (BEB) mitigates this problem. However, as implemented, it imposes uniform priors, which causes it to be overly conservative in some cases. When standard regularity conditions are not met and parameter estimates are unstable, inference, even under BEB, can be negatively impacted. We present an alternative to BEB called smoothed bootstrap aggregation (SBA), which bootstraps site patterns from an alignment of protein-coding DNA sequences to accommodate the uncertainty in the parameter estimates. We show that deriving the correction for parameter uncertainty from the data at hand, in combination with kernel smoothing techniques, improves site-specific inference of positive selection. We compare BEB to SBA by simulation and real data analysis. Simulation results show that SBA balances accuracy and power at least as well as BEB, and when parameter estimates are unstable, the performance gap between BEB and SBA can widen in favor of SBA. SBA is applicable to a wide variety of other inference problems in molecular evolution.
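
Two ingredients named in this abstract can be sketched in a few lines: the Bayes' formula step that turns per-class site likelihoods into posterior probabilities, and the resampling of site patterns (alignment columns) with replacement. This is a minimal sketch under stated assumptions, not the SBA implementation: the kernel smoothing and aggregation steps are omitted, and the function names, the class weights and the toy alignment are hypothetical.

```python
import random


def posterior_class_probs(site_likelihoods, class_weights):
    """Bayes' formula for one site.

    site_likelihoods[k] is the likelihood of the site's data under site class k,
    class_weights[k] is the estimated proportion of sites in class k.
    """
    joint = [w * lik for w, lik in zip(class_weights, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def bootstrap_site_patterns(alignment, rng):
    """Resample codon columns with replacement.

    alignment is a list of sequences, each a list of codons of equal length.
    """
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return [[seq[i] for i in picks] for seq in alignment]


# Hypothetical values: three site classes, the last one under positive selection.
weights = [0.5, 0.3, 0.2]
likelihoods = [0.02, 0.01, 0.04]
print(posterior_class_probs(likelihoods, weights)[2])  # posterior for the last class

# Hypothetical toy alignment of two sequences and three codon sites.
toy = [["ATG", "GCT", "AAA"], ["ATG", "GCC", "AAG"]]
print(bootstrap_site_patterns(toy, random.Random(0)))
```

In an empirical Bayes analysis the weights and likelihoods would come from the maximum likelihood fit, and in a bootstrap scheme that fit would be repeated on each resampled alignment so that the spread of estimates reflects parameter uncertainty.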


Subjects
Amino Acids/genetics , Sequence Alignment/methods , DNA Sequence Analysis/methods , Bayes Theorem , Biological Evolution , Codon/genetics , Computer Simulation , Molecular Evolution , Likelihood Functions , Markov Chains , Genetic Models , Statistical Models , Probability , Genetic Selection , Uncertainty
19.
Mol Phylogenet Evol ; 105: 114-125, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27568211

ABSTRACT

Assessing the robustness of an inferred phylogeny is an important element of phylogenetics. This is typically done with measures of stability at the internal branches and of the variation in the positions of the leaf nodes. The bootstrap support for branches in maximum parsimony, distance and maximum likelihood estimation, or posterior probabilities in Bayesian inference, measures the uncertainty about a branch due to the sampling of sites from genes or the sampling of genes from genomes. However, these measures do not reveal how taxon sampling affects branch support or the estimated phylogeny. An internal branch in a phylogenetic tree can be viewed as a split that separates the taxa into two nonempty complementary subsets. We develop several split-specific measures of stability determined from bootstrap support for quartets. These include BPtaxon_split (average bootstrap percentage [BP] for all quartets involving a taxon within a split), BPsplit (BPtaxon_split averaged over taxa), BPtaxon (BPtaxon_split averaged over splits) and RBIC-taxon (average BP over all splits after removing a taxon). We also develop a pruned-tree distance metric. Application of our measures to empirical and simulated data illustrates that existing measures of overall stability can fail to detect taxa that are the primary source of a split-specific instability. Moreover, we show that the use of many reduced sets of quartets is important in being able to detect the influence of joint sets of taxa rather than individual taxa. These new measures are valuable diagnostic tools to guide taxon sampling in phylogenetic experimental design.
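
The averaging behind BPtaxon_split and BPsplit can be illustrated with a small sketch. It assumes that the quartets relevant to a split take two taxa from each side and that a bootstrap percentage is already available for every such quartet. The abstract does not spell out either detail, so this is a reading of the definitions rather than the authors' implementation, and the function names and BP values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def split_quartets(side_a, side_b):
    """All quartets formed by two taxa from each side of the split."""
    return [
        (pair_a, pair_b)
        for pair_a in combinations(sorted(side_a), 2)
        for pair_b in combinations(sorted(side_b), 2)
    ]


def bp_taxon_split(taxon, side_a, side_b, quartet_bp):
    """Average BP over the split's quartets that contain the given taxon."""
    values = [
        quartet_bp[q]
        for q in split_quartets(side_a, side_b)
        if taxon in q[0] or taxon in q[1]
    ]
    return mean(values)


def bp_split(side_a, side_b, quartet_bp):
    """BPtaxon_split averaged over all taxa in the split."""
    taxa = sorted(set(side_a) | set(side_b))
    return mean(bp_taxon_split(t, side_a, side_b, quartet_bp) for t in taxa)


# Hypothetical split AB|CDE with made-up quartet bootstrap percentages.
side_a, side_b = {"A", "B"}, {"C", "D", "E"}
quartet_bp = {
    (("A", "B"), ("C", "D")): 90,
    (("A", "B"), ("C", "E")): 60,
    (("A", "B"), ("D", "E")): 30,
}
for taxon in "ABCDE":
    print(taxon, bp_taxon_split(taxon, side_a, side_b, quartet_bp))
print("BPsplit", bp_split(side_a, side_b, quartet_bp))
```

In this toy case taxon E has the lowest BPtaxon_split, which is the kind of signal the abstract describes for locating the taxon that destabilizes a particular split. BPtaxon would follow by averaging the same per-taxon values over splits instead of over taxa.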


Subjects
Taxonomic DNA Barcoding/methods , Phylogeny , Base Sequence , Computer Simulation , Likelihood Functions
20.
Syst Biol ; 64(2): 243-55, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25432892

ABSTRACT

Previous work on the star-tree paradox has shown that Bayesian methods suffer from a long-branch attraction bias. That work is extended to settings involving more taxa and partially resolved trees. The long-branch attraction bias is confirmed to arise more broadly and an additional source of bias is found. A by-product of the analysis is a set of methods that correct for biases toward particular topologies. The corrections can be easily calculated using existing Bayesian software. Posterior support for a set of two or more trees can thus be supplemented with corrected versions to cross-check or replace results. Simulations show the corrections to be highly effective.


Subjects
Computer Simulation , Statistical Models , Phylogeny , Software/standards , Bayes Theorem , Bias