1.
Nat Microbiol ; 9(4): 964-975, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38519541

ABSTRACT

Extremely halophilic archaea (Haloarchaea, Nanohaloarchaeota, Methanonatronarchaeia and Halarchaeoplasmatales) thrive in saturating salt concentrations where they must maintain osmotic equilibrium with their environment. The evolutionary history of adaptations enabling salt tolerance remains poorly understood, in particular because the phylogeny of several lineages is conflicting. Here we present a resolved phylogeny of extremely halophilic archaea obtained using improved taxon sampling and state-of-the-art phylogenetic approaches designed to cope with the strong compositional biases of their proteomes. We describe two uncultured lineages, Afararchaeaceae and Asbonarchaeaceae, which break the long branches at the base of Haloarchaea and Nanohaloarchaeota, respectively. We obtained 13 metagenome-assembled genomes (MAGs) of these archaea from metagenomes of hypersaline aquatic systems of the Danakil Depression (Ethiopia). Our phylogenomic analyses including these taxa show that at least four independent adaptations to extreme halophily occurred during archaeal evolution. Gene-tree/species-tree reconciliation suggests that gene duplication and horizontal gene transfer played an important role in this process, for example, by spreading key genes (such as those encoding potassium transporters) across extremely halophilic lineages.


Subjects
Euryarchaeota , Salinity , Phylogeny , Archaea/genetics , Euryarchaeota/genetics , Metagenome
2.
Mol Biol Evol ; 41(1), 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38142434

ABSTRACT

Tree tests like the Kishino-Hasegawa (KH) test and chi-square test suffer a selection bias that tests like the Shimodaira-Hasegawa (SH) test and approximately unbiased test were intended to correct. We investigate tree-testing performance in the presence of severe selection bias. The SH test is found to be very conservative and, surprisingly, its uncorrected analog, the KH test, has low Type I error even in the presence of extreme selection bias, leading to a recommendation that the SH test be abandoned. A chi-square test is found to usually behave well but to require correction in extreme cases. We show how topology testing procedures can be used to get support values for splits and compare the likelihood-based support values to the approximate likelihood ratio test (aLRT) support values. We find that the aLRT support values are reasonable even in settings with severe selection bias that they were not designed for. We also show how they can be used to construct tests of topologies and, in doing so, point out a multiple comparisons issue that should be considered when looking at support values for splits.


Subjects
Likelihood Functions , Phylogeny , Selection Bias
3.
Mol Biol Evol ; 40(12), 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37987557

ABSTRACT

Marine algae are central to global carbon fixation, and their productivity is dictated largely by resource availability. Reduced nutrient availability is predicted for vast oceanic regions as an outcome of climate change; however, there is much to learn regarding response mechanisms of the tiny picoplankton that thrive in these environments, especially eukaryotic phytoplankton. Here, we investigate responses of the picoeukaryote Micromonas commoda, a green alga found throughout subtropical and tropical oceans. Under shifting phosphate availability scenarios, transcriptomic analyses revealed altered expression of transfer RNA modification enzymes and biased codon usage of transcripts more abundant during phosphate-limiting versus phosphate-replete conditions, consistent with the role of transfer RNA modifications in regulating codon recognition. To associate the observed shift in the expression of the transfer RNA modification enzyme complement with the transfer RNAs encoded by M. commoda, we also determined the transfer RNA repertoire of this alga revealing potential targets of the modification enzymes. Codon usage bias was particularly pronounced in transcripts encoding proteins with direct roles in managing phosphate limitation and photosystem-associated proteins that have ill-characterized putative functions in "light stress." The observed codon usage bias corresponds to a proposed stress response mechanism in which the interplay between stress-induced changes in transfer RNA modifications and skewed codon usage in certain essential response genes drives preferential translation of the encoded proteins. Collectively, we expose a potential underlying mechanism for achieving growth under enhanced nutrient limitation that extends beyond the catalog of up- or downregulated protein-encoding genes to the cell biological controls that underpin acclimation to changing environmental conditions.


Subjects
Chlorophyta , Codon Usage , Phosphates/metabolism , Transfer RNA/genetics , Transfer RNA/metabolism , Codon/genetics , Codon/metabolism , Chlorophyta/genetics , Chlorophyta/metabolism , Protein Biosynthesis
4.
Syst Biol ; 2023 Oct 16.
Article in English | MEDLINE | ID: mdl-37843172

ABSTRACT

Biochemical constraints on the admissible amino acids at specific sites in proteins lead to heterogeneity of the amino acid substitution process over sites in alignments. It is well known that phylogenetic models of protein sequence evolution that do not account for site heterogeneity are prone to long-branch attraction (LBA) artifacts. Profile mixture models were developed to model heterogeneity of preferred amino acids at sites via a finite distribution of site classes each with a distinct set of equilibrium amino acid frequencies. However, it is unknown whether the large number of parameters in such models associated with the many amino acid frequency vectors can adversely affect tree topology estimates because of over-parameterization. Here we demonstrate theoretically that for long sequences, over-parameterization does not create problems for estimation with profile mixture models. Under mild conditions, tree, amino acid frequencies, and other model parameters converge to true values as sequence length increases, even when there are large numbers of components in the frequency profile distributions. Because large sample theory does not necessarily imply good behavior for shorter alignments we explore the performance of these models with short alignments simulated with tree topologies that are prone to LBA artifacts. We find that over-parameterization is not a problem for complex profile mixture models even when there are many amino acid frequency vectors. In fact, simple models with few site classes behave poorly. Interestingly, we also found that misspecification of the amino acid frequency vectors does not lead to increased LBA artifacts as long as the estimated cumulative distribution function of the amino acid frequencies at sites adequately approximates the true one. In contrast, misspecification of the amino acid exchangeability rates can severely negatively affect parameter estimation. 
Finally, we explore the effects of including in the profile mixture model an additional 'F-class' representing the overall frequencies of amino acids in the data set. Surprisingly, the F-class does not help parameter estimation significantly and can decrease the probability of correct tree estimation, depending on the scenario, even though it tends to improve likelihood scores.

5.
J Math Biol ; 84(4): 21, 2022 02 21.
Article in English | MEDLINE | ID: mdl-35188616

ABSTRACT

Likelihood-based methods are widely considered the best approaches for reconstructing ancestral states. Although much effort has been made to study properties of these methods, previous works often assume that both the tree topology and edge lengths are known. In some scenarios the tree topology might be reasonably well known for the taxa under study. When sequence length is much smaller than the number of species, however, edge lengths are not likely to be accurately estimated. We study the consistency of the maximum likelihood and empirical Bayes estimators of the ancestral state of discrete traits in such settings under a star tree. We prove that the likelihood-based reconstruction is consistent under symmetric models but can be inconsistent under non-symmetric models. We show, however, that a simple consistent estimator for the ancestral states is available under non-symmetric models. The results illustrate that likelihood methods can unexpectedly have undesirable properties as the number of sequences considered gets very large. Broader implications of the results are discussed.


Subjects
Molecular Evolution , Bayes Theorem , Likelihood Functions , Phenotype , Phylogeny
6.
Mol Biol Evol ; 39(3), 2022 03 02.
Article in English | MEDLINE | ID: mdl-35134997

ABSTRACT

Site-specific amino acid preferences are influenced by the genetic background of the protein. The preferences for resident amino acids are expected to, on average, increase over time because of replacements at other sites, a nonadaptive phenomenon referred to as the "evolutionary Stokes shift." Alternatively, decreases in resident amino acid propensity have recently been viewed as evidence of adaptations to external environmental changes. Using population genetics theory and thermodynamic stability constraints, we show that nonadaptive evolution can lead to both positive and negative shifts in propensities following the fixation of an amino acid, emphasizing that the detection of negative shifts is not conclusive evidence of adaptation. By examining propensity shifts from when an amino acid is first accepted at a site until it is subsequently replaced, we find that ≈50% of sites show a decrease in the propensity for the newly resident amino acid while the remaining sites show an increase. Furthermore, the distributions of the magnitudes of positive and negative shifts were comparable. Preferences were often conserved via a significant negative autocorrelation in propensity changes: increases in propensities were often followed by decreases, and vice versa. Lastly, we explore the underlying mechanisms that lead propensities to fluctuate. We observe that stabilizing replacements increase the mutational tolerance at a site and in doing so decrease the propensity for the resident amino acid. In contrast, destabilizing substitutions result in more rugged fitness landscapes that tend to favor the resident amino acid. In summary, our results characterize propensity trajectories under nonadaptive stability-constrained evolution against which evidence of adaptations should be calibrated.


Subjects
Amino Acids , Molecular Evolution , Amino Acid Substitution , Amino Acids/chemistry , Amino Acids/genetics , Genetic Epistasis , Proteins/genetics , Thermodynamics
7.
Nat Ecol Evol ; 6(3): 253-262, 2022 03.
Article in English | MEDLINE | ID: mdl-35027725

ABSTRACT

Determining the phylogenetic origin of mitochondria is key to understanding the ancestral mitochondrial symbiosis and its role in eukaryogenesis. However, the precise evolutionary relationship between mitochondria and their closest bacterial relatives remains hotly debated. The reasons include pervasive phylogenetic artefacts as well as limited protein and taxon sampling. Here we developed a new model of protein evolution that accommodates both across-site and across-branch compositional heterogeneity. We applied this site-and-branch-heterogeneous model (MAM60 + GFmix) to a considerably expanded dataset that comprises 108 mitochondrial proteins of alphaproteobacterial origin, and novel metagenome-assembled genomes from microbial mats, microbialites and sediments. The MAM60 + GFmix model fits the data much better and agrees with analyses of compositionally homogenized datasets with conventional site-heterogeneous models. The consilience of evidence thus suggests that mitochondria are sister to the Alphaproteobacteria to the exclusion of MarineProteo1 and Magnetococcia. We also show that the ancestral presence of the crista-developing mitochondrial contact site and cristae organizing system (a mitofilin-domain-containing Mic60 protein) in mitochondria and the Alphaproteobacteria only supports their close relationship.


Subjects
Alphaproteobacteria , Alphaproteobacteria/genetics , Alphaproteobacteria/metabolism , Metagenome , Mitochondria/genetics , Mitochondria/metabolism , Mitochondrial Proteins , Phylogeny
8.
Sci Adv ; 7(32), 2021 Aug.
Article in English | MEDLINE | ID: mdl-34362734

ABSTRACT

Micronutrients control phytoplankton growth in the ocean, influencing carbon export and fisheries. It is currently unclear how micronutrient scarcity affects cellular processes and how interdependence across micronutrients arises. We show that proximate causes of micronutrient growth limitation and interdependence are governed by cumulative cellular costs of acquiring and using micronutrients. Using a mechanistic proteomic allocation model of a polar diatom focused on iron and manganese, we demonstrate how cellular processes fundamentally underpin micronutrient limitation, and how they interact and compensate for each other to shape cellular elemental stoichiometry and resource interdependence. We coupled our model with metaproteomic and environmental data, yielding an approach for estimating biogeochemical metrics, including taxon-specific growth rates. Our results show that cumulative cellular costs govern how environmental conditions modify phytoplankton growth.

9.
Protein Sci ; 30(10): 2009-2028, 2021 10.
Article in English | MEDLINE | ID: mdl-34322924

ABSTRACT

Amino acid preferences vary across sites and time. While variation across sites is widely accepted, the extent and frequency of temporal shifts are contentious. Our understanding of the drivers of amino acid preference change is incomplete: To what extent are temporal shifts driven by adaptive versus nonadaptive evolutionary processes? We review phenomena that cause preferences to vary (e.g., evolutionary Stokes shift, contingency, and entrenchment) and clarify how they differ. To determine the extent and prevalence of shifted preferences, we review experimental and theoretical studies. Analyses of natural sequence alignments often detect decreases in homoplasy (convergence and reversions) rates, and variation in replacement rates with time, signals that are consistent with temporally changing preferences. While approaches inferring shifts in preferences from patterns in natural alignments are valuable, they are indirect since multiple mechanisms (both adaptive and nonadaptive) could lead to the observed signal. Alternatively, site-directed mutagenesis experiments allow for a more direct assessment of shifted preferences. They corroborate evidence from multiple sequence alignments, revealing that the preference for an amino acid at a site varies depending on the background sequence. However, shifts in preferences are usually minor in magnitude and sites with significantly shifted preferences are low in frequency. The small yet consistent perturbations in preferences could, nevertheless, jeopardize the accuracy of inference procedures, which assume constant preferences. We conclude by discussing if and how such shifts in preferences might influence widely used time-homogeneous inference procedures and potential ways to mitigate such effects.


Subjects
Amino Acids , Molecular Evolution , Genetic Models , Phylogeny , Proteins , Amino Acids/chemistry , Amino Acids/genetics , Amino Acids/metabolism , Proteins/chemistry , Proteins/genetics , Proteins/metabolism
10.
J Theor Biol ; 526: 110788, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34097914

ABSTRACT

Two recent high-profile studies have attempted to use edge (branch) length ratios from large sets of phylogenetic trees to determine the relative ages of genes of different origins in the evolution of eukaryotic cells. This approach can be straightforwardly justified if substitution rates are constant over the tree for a given protein. However, such strict molecular clock assumptions are not expected to hold on the billion-year timescale. Here we propose an alternative set of conditions under which comparisons of edge length distributions from multiple sets of phylogenies of proteins with different origins can be validly used to discern the order of their origins. We also point out scenarios where these conditions are not expected to hold and caution is warranted.


Subjects
Eukaryotic Cells , Molecular Evolution , Genetic Models , Phylogeny
11.
Curr Biol ; 31(4): R193-R196, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33621507

ABSTRACT

Timing the events in the evolution of eukaryotic cells is crucial to understanding this major transition. A recent study reconstructs the origins of thousands of gene families ancestral to eukaryotes and, using a controversial approach, aims to order the events of eukaryogenesis.


Subjects
Eukaryota , Eukaryotic Cells , Molecular Evolution , Phylogeny , Eukaryota/genetics , Humans , Time Factors
12.
Syst Biol ; 70(4): 838-843, 2021 06 16.
Article in English | MEDLINE | ID: mdl-33528562

ABSTRACT

Long branch attraction (LBA) is a prevalent form of bias in phylogenetic estimation but the reasons for it are only partially understood. We argue here that it is largely due to differences in the sizes of the model spaces corresponding to different trees. Trees with long branches together allow much more flexible internal branch length parameter estimation. Consequently, although each tree has the same number of parameters, trees with long branches together have larger effective model spaces. The problem of LBA becomes particularly pronounced with partitioned data. Formulation of tree estimation as model selection leads us to propose bootstrap bias corrections as cross-checks on estimation when long branches end up being estimated together. [Bootstrap; long branch attraction; maximum likelihood; model selection; partitioned model; phylogenetics.].


Subjects
Genetic Models , Bias , Computer Simulation , Likelihood Functions , Phylogeny
13.
Mol Biol Evol ; 37(11): 3131-3148, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32897316

ABSTRACT

Do interactions between residues in a protein (i.e., epistasis) significantly alter evolutionary dynamics? If so, what consequences might they have on inference from traditional codon substitution models which assume site-independence for the sake of computational tractability? To investigate the effects of epistasis on substitution rates, we employed a mechanistic mutation-selection model in conjunction with a fitness framework derived from protein stability. We refer to this as the stability-informed site-dependent (S-SD) model and developed a new stability-informed site-independent (S-SI) model that captures the average effect of stability constraints on individual sites of a protein. Comparison of S-SI and S-SD offers a novel and direct method for investigating the consequences of stability-induced epistasis on protein evolution. We developed S-SI and S-SD models for three natural proteins and showed that they generate sequences consistent with real alignments. Our analyses revealed that epistasis tends to increase substitution rates compared with the rates under site-independent evolution. We then assessed the epistatic sensitivity of individual sites and discovered a counterintuitive effect: Highly connected sites were less influenced by epistasis relative to exposed sites. Lastly, we show that, despite the unrealistic assumptions, traditional models perform comparably well in the presence and absence of epistasis and provide reasonable summaries of average selection intensities. We conclude that epistatic models are critical to understanding protein evolutionary dynamics, but epistasis might not be required for reasonable inference of selection pressure when averaging over time and sites.


Subjects
Genetic Epistasis , Molecular Evolution , Genetic Models , Mutation , Genetic Selection
14.
Syst Biol ; 69(4): 722-738, 2020 07 01.
Article in English | MEDLINE | ID: mdl-31730199

ABSTRACT

A central objective in biology is to link adaptive evolution in a gene to structural and/or functional phenotypic novelties. Yet most analytic methods make inferences mainly from either phenotypic data or genetic data alone. A small number of models have been developed to infer correlations between the rate of molecular evolution and changes in a discrete or continuous life history trait. But such correlations are not necessarily evidence of adaptation. Here, we present a novel approach called the phenotype-genotype branch-site model (PG-BSM) designed to detect evidence of adaptive codon evolution associated with discrete-state phenotype evolution. An episode of adaptation is inferred under standard codon substitution models when there is evidence of positive selection in the form of an elevation in the nonsynonymous-to-synonymous rate ratio $\omega$ to a value $\omega > 1$. As it is becoming increasingly clear that $\omega > 1$ can occur without adaptation, the PG-BSM was formulated to infer an instance of adaptive evolution without appealing to evidence of positive selection. The null model makes use of a covarion-like component to account for general heterotachy (i.e., random changes in the evolutionary rate at a site over time). The alternative model employs samples of the phenotypic evolutionary history to test for phenomenological patterns of heterotachy consistent with specific mechanisms of molecular adaptation. These include 1) a persistent increase/decrease in $\omega$ at a site following a change in phenotype (the pattern) consistent with an increase/decrease in the functional importance of the site (the mechanism); and 2) a transient increase in $\omega$ at a site along a branch over which the phenotype changed (the pattern) consistent with a change in the site's optimal amino acid (the mechanism). 
Rejection of the null is followed by post hoc analyses to identify sites with strongest evidence for adaptation in association with changes in the phenotype as well as the most likely evolutionary history of the phenotype. Simulation studies based on a novel method for generating mechanistically realistic signatures of molecular adaptation show that the PG-BSM has good statistical properties. Analyses of real alignments show that site patterns identified post hoc are consistent with the specific mechanisms of adaptation included in the alternate model. Further simulation studies show that the covarion-like component of the PG-BSM plays a crucial role in mitigating recently discovered statistical pathologies associated with confounding by accounting for heterotachy-by-any-cause. [Adaptive evolution; branch-site model; confounding; mutation-selection; phenotype-genotype.].


Subjects
Classification/methods , Codon/genetics , Genotype , Phenotype , Phylogeny , Physiological Adaptation/genetics , Computer Simulation
15.
Mol Biol Evol ; 37(2): 549-562, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31688943

ABSTRACT

The information criteria Akaike information criterion (AIC), AICc, and Bayesian information criterion (BIC) are widely used for model selection in phylogenetics; however, their theoretical justification and performance have not been carefully examined in this setting. Here, we investigate these methods under simple and complex phylogenetic models. We show that AIC can give a biased estimate of its intended target, the expected predictive log likelihood (EPLnL) or, equivalently, expected Kullback-Leibler divergence between the estimated model and the true distribution for the data. Reasons for bias include commonly occurring issues such as small edge-lengths or, in mixture models, small weights. The use of partitioned models is another issue that can cause problems with information criteria. We show that for partitioned models, a different BIC correction is required for it to be a valid approximation to a Bayes factor. The commonly used AICc correction is not clearly defined in partitioned models and can actually create a substantial bias when the number of parameters gets large, as is the case with larger trees and partitioned models. Bias-corrected cross-validation corrections are shown to provide better approximations to EPLnL than AIC. We also illustrate how EPLnL, the estimation target of AIC, can sometimes favor an incorrect model and give reasons for why selection of incorrectly under-partitioned models might be desirable in partitioned model settings.


Subjects
Computational Biology/methods , Phylogeny , Algorithms , Bayes Theorem , Likelihood Functions , Genetic Models , Genetic Selection
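
Note on record 15: the three information criteria named in that abstract have standard textbook definitions, which the short sketch below computes for two hypothetical models. It is only an illustration of the uncorrected formulas, not the bias-corrected procedures the paper proposes. The log-likelihoods, parameter counts, and the choice of the number of alignment sites as the sample size `n` are all assumptions made for the example; the abstract itself points out that such choices are not clearly defined for partitioned models.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * log_lik + 2.0 * k


def aicc(log_lik: float, k: int, n: int) -> float:
    """Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_lik, k) + (2.0 * k * (k + 1)) / (n - k - 1)


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: -2 lnL + k ln(n)."""
    return -2.0 * log_lik + k * math.log(n)


# Hypothetical fits: (maximized log-likelihood, number of free parameters).
models = {"simple": (-10250.0, 20), "complex": (-10200.0, 60)}
n_sites = 500  # assumed sample size: number of alignment columns

for name, (lnl, k) in models.items():
    print(name, aic(lnl, k), aicc(lnl, k, n_sites), bic(lnl, k, n_sites))
```

Lower values are preferred under each criterion. With these made-up numbers AIC favors the complex model while BIC favors the simple one, which shows how strongly the penalty term can drive the choice.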
16.
Methods Mol Biol ; 1910: 399-426, 2019.
Article in English | MEDLINE | ID: mdl-31278672

ABSTRACT

Codon substitution models (CSMs) are commonly used to infer the history of natural selection for a set of protein-coding sequences, often with the explicit goal of detecting the signature of positive Darwinian selection. However, the validity and success of CSMs used in conjunction with the maximum likelihood (ML) framework is sometimes challenged with claims that the approach might too often support false conclusions. In this chapter, we use a case study approach to identify four legitimate statistical difficulties associated with inference of evolutionary events using CSMs. These include: (1) model misspecification, (2) low information content, (3) the confounding of processes, and (4) phenomenological load, or PL. While past criticisms of CSMs can be connected to these issues, the historical critiques were often misdirected, or overstated, because they failed to recognize that the success of any model-based approach depends on the relationship between model and data. Here, we explore this relationship and provide a candid assessment of the limitations of CSMs to extract historical information from extant sequences. To aid in this assessment, we provide a brief overview of: (1) a more realistic way of thinking about the process of codon evolution framed in terms of population genetic parameters, and (2) a novel presentation of the ML statistical framework. We then divide the development of CSMs into two broad phases of scientific activity and show that the latter phase is characterized by increases in model complexity that can sometimes negatively impact inference of evolutionary mechanisms. Such problems are not yet widely appreciated by the users of CSMs. These problems can be avoided by using a model that is appropriate for the data, but understanding the relationship between the data and a fitted model is a difficult task. 
We argue that the only way to properly understand that relationship is to perform in silico experiments using a generating process that can mimic the data as closely as possible. The mutation-selection modeling framework (MutSel) is presented as the basis of such a generating process. We contend that if complex CSMs continue to be developed for testing explicit mechanistic hypotheses, then additional analyses such as those described here (e.g., penalized LRTs and estimation of PL) will need to be applied alongside the more traditional inferential methods.


Subjects
Molecular Evolution , Genome , Genomics , Genetic Models , Algorithms , Codon , Computational Biology/methods , Genetic Variation , Population Genetics , Genomics/methods , Humans , Reproducibility of Results , Genetic Selection
17.
Syst Biol ; 68(6): 1003-1019, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31140564

ABSTRACT

Large taxa-rich genome-scale data sets are often necessary for resolving ancient phylogenetic relationships. But accurate phylogenetic inference requires that they are analyzed with realistic models that account for the heterogeneity in substitution patterns amongst the sites, genes and lineages. Two kinds of adjustments are frequently used: models that account for heterogeneity in amino acid frequencies at sites in proteins, and partitioned models that accommodate the heterogeneity in rates (branch lengths) among different proteins in different lineages (protein-wise heterotachy). Although partitioned and site-heterogeneous models are both widely used in isolation, their relative importance to the inference of correct phylogenies has not been carefully evaluated. We conducted several empirical analyses and a large set of simulations to compare the relative performances of partitioned models, site-heterogeneous models, and combined partitioned site-heterogeneous models. In general, site-homogeneous models (partitioned or not) performed worse than site-heterogeneous models, except in simulations with extreme protein-wise heterotachy. Furthermore, simulations using empirically derived realistic parameter settings showed a marked long-branch attraction (LBA) problem for analyses employing protein-wise partitioning even when the generating model included partitioning. This LBA problem results from a small sample bias compounded over many single protein alignments. In some cases, this problem was ameliorated by clustering similarly evolving proteins together into larger partitions using the PartitionFinder method. Similar results were obtained under simulations with larger numbers of taxa or heterogeneity in simulating topologies over genes. 
For an empirical Microsporidia test data set, all but one of the tested site-heterogeneous models (with or without partitioning) obtained the correct Microsporidia+Fungi grouping, whereas site-homogeneous models (with or without partitioning) did not. The single exception was the fully partitioned site-heterogeneous analysis that succumbed to the compounded small sample LBA bias. In general, unless protein-wise heterotachy effects are extreme, it is more important to model site-heterogeneity than protein-wise heterotachy in phylogenomic analyses. Complete protein-wise partitioning should be avoided as it can lead to a serious LBA bias. In cases of extreme protein-wise heterotachy, approaches that cluster similarly evolving proteins together, coupled with site-heterogeneous models, work well for phylogenetic estimation.


Subjects
Classification/methods , Theoretical Models , Phylogeny , Computer Simulation , Microsporidia/classification , Microsporidia/genetics
18.
Elife ; 8, 2019 02 25.
Article in English | MEDLINE | ID: mdl-30789345

ABSTRACT

The Alphaproteobacteria is an extraordinarily diverse and ancient group of bacteria. Previous attempts to infer its deep phylogeny have been plagued with methodological artefacts. To overcome this, we analyzed a dataset of 200 single-copy and conserved genes and employed diverse strategies to reduce compositional artefacts. Such strategies include using novel dataset-specific profile mixture models and recoding schemes, and removing sites, genes and taxa that are compositionally biased. We show that the Rickettsiales and Holosporales (both groups of intracellular parasites of eukaryotes) are not sisters to each other, but instead, the Holosporales has a derived position within the Rhodospirillales. A synthesis of our results also leads to an updated proposal for the higher-level taxonomy of the Alphaproteobacteria. Our robust consensus phylogeny will serve as a framework for future studies that aim to place mitochondria, and novel environmental diversity, within the Alphaproteobacteria.


Subjects
Alphaproteobacteria/classification , Alphaproteobacteria/genetics , Molecular Evolution , Phylogeny , Computational Biology , Bacterial Genes , Molecular Biology
19.
Bioinformatics ; 35(15): 2545-2554, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30541063

ABSTRACT

MOTIVATION: Likelihood ratio tests are commonly used to test for positive selection acting on proteins. They are usually applied with thresholds for declaring a protein under positive selection determined from a chi-square or mixture of chi-square distributions. Although it is known that such distributions are not strictly justified due to the statistical irregularity of the problem, the hope has been that the resulting tests are conservative and do not lose much power in comparison with the same test using the unknown, correct threshold. We show that commonly used thresholds need not yield conservative tests, but instead give larger than expected Type I error rates. Statistical regularity can be restored by using a modified likelihood ratio test. RESULTS: We give theoretical results to prove that, if the number of sites is not too small, the modified likelihood ratio test gives approximately correct Type I error probabilities regardless of the parameter settings of the underlying null hypothesis. Simulations show that modification gives Type I error rates closer to those stated without a loss of power. The simulations also show that parameter estimation for mixture models of codon evolution can be challenging in certain data-generation settings with very different mixing distributions giving nearly identical site pattern distributions unless the number of taxa and tree length are large. Because mixture models are widely used for a variety of problems in molecular evolution, the challenges and general approaches to solving them presented here are applicable in a broader context. AVAILABILITY AND IMPLEMENTATION: https://github.com/jehops/codeml_modl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Biometry , Chi-Square Distribution , Molecular Evolution , Likelihood Functions
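
Note on record 19: the conventional thresholds that abstract refers to can be sketched as follows. The code computes p-values for a likelihood ratio statistic under a chi-square distribution with 1 or 2 degrees of freedom and under a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom. This is an illustration of the commonly used reference distributions only; it is not the modified likelihood ratio test from the paper, and the function names and example log-likelihoods are hypothetical. Closed forms are used so that only the standard library is needed.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic 2(lnL_alt - lnL_null), floored at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def p_chi2_df1(stat: float) -> float:
    """P(X >= stat) for X ~ chi-square(1), via the complementary error function."""
    return 1.0 if stat <= 0 else math.erfc(math.sqrt(stat / 2.0))


def p_chi2_df2(stat: float) -> float:
    """P(X >= stat) for X ~ chi-square(2), which is exp(-stat/2)."""
    return 1.0 if stat <= 0 else math.exp(-stat / 2.0)


def p_mixture(stat: float) -> float:
    """P-value under a 50:50 mixture of a point mass at 0 and chi-square(1)."""
    return 1.0 if stat <= 0 else 0.5 * p_chi2_df1(stat)


# Hypothetical maximized log-likelihoods for a null and an alternative model.
stat = lrt_statistic(lnl_null=-5230.4, lnl_alt=-5227.1)
print(stat, p_chi2_df1(stat), p_chi2_df2(stat), p_mixture(stat))
```

The same statistic gives different p-values under each reference distribution, which is why the choice of threshold affects the Type I error rate discussed in the abstract.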