1.
Syst Biol ; 68(6): 1003-1019, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31140564

ABSTRACT

Large taxa-rich genome-scale data sets are often necessary for resolving ancient phylogenetic relationships. But accurate phylogenetic inference requires that they be analyzed with realistic models that account for heterogeneity in substitution patterns among sites, genes, and lineages. Two kinds of adjustments are frequently used: models that account for heterogeneity in amino acid frequencies at sites in proteins, and partitioned models that accommodate heterogeneity in rates (branch lengths) among different proteins in different lineages (protein-wise heterotachy). Although partitioned and site-heterogeneous models are both widely used in isolation, their relative importance to the inference of correct phylogenies has not been carefully evaluated. We conducted several empirical analyses and a large set of simulations to compare the relative performance of partitioned models, site-heterogeneous models, and combined partitioned site-heterogeneous models. In general, site-homogeneous models (partitioned or not) performed worse than site-heterogeneous models, except in simulations with extreme protein-wise heterotachy. Furthermore, simulations using empirically derived, realistic parameter settings showed a marked long-branch attraction (LBA) problem for analyses employing protein-wise partitioning, even when the generating model included partitioning. This LBA problem results from a small-sample bias compounded over many single-protein alignments. In some cases, this problem was ameliorated by clustering similarly evolving proteins into larger partitions using the PartitionFinder method. Similar results were obtained in simulations with larger numbers of taxa or with heterogeneity in the simulating topologies across genes.
For an empirical Microsporidia test data set, all but one of the tested site-heterogeneous models (with or without partitioning) obtained the correct Microsporidia+Fungi grouping, whereas site-homogeneous models (with or without partitioning) did not. The single exception was the fully partitioned site-heterogeneous analysis, which succumbed to the compounded small-sample LBA bias. In general, unless protein-wise heterotachy effects are extreme, it is more important to model site heterogeneity than protein-wise heterotachy in phylogenomic analyses. Complete protein-wise partitioning should be avoided, as it can lead to a serious LBA bias. In cases of extreme protein-wise heterotachy, approaches that cluster similarly evolving proteins together, coupled with site-heterogeneous models, work well for phylogenetic estimation.


Subjects
Classification/methods , Models, Theoretical , Phylogeny , Computer Simulation , Microsporidia/classification , Microsporidia/genetics
2.
Syst Biol ; 67(2): 216-235, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-28950365

ABSTRACT

Proteins have distinct structural and functional constraints at different sites that lead to site-specific preferences for particular amino acid residues as the sequences evolve. Heterogeneity in the amino acid substitution process between sites is not modeled by commonly used empirical amino acid exchange matrices. Such model misspecification can lead to artefacts in phylogenetic estimation such as long-branch attraction. Although sophisticated site-heterogeneous mixture models have been developed to address this problem in both Bayesian and maximum likelihood (ML) frameworks, their formidable computational time and memory usage severely limit their use in large phylogenomic analyses. Here we propose a posterior mean site frequency (PMSF) method as a rapid and efficient approximation to full empirical profile mixture models for ML analysis. The PMSF approach assigns to each site a conditional mean amino acid frequency profile, calculated from a mixture model fitted to the data using a preliminary guide tree. These PMSF profiles can then be used for in-depth tree searching in place of the full mixture model. Compared with widely used empirical mixture models with $k$ classes, our implementation of PMSF in IQ-TREE (http://www.iqtree.org) speeds up the computation by a factor of approximately $k/1.5$ and requires a small fraction of the RAM. Furthermore, this speedup allows, for the first time, full nonparametric bootstrap analyses to be conducted under complex site-heterogeneous models on large concatenated data matrices. Our simulations and empirical data analyses demonstrate that PMSF can effectively ameliorate long-branch attraction artefacts. In some empirical and simulation settings, PMSF provided more accurate estimates of phylogenies than the mixture models from which the profiles are derived.
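A minimal Python sketch of the posterior-mean step described above. It assumes the guide-tree fit has already produced the mixture weights, the class frequency profiles, and the log-likelihood of every site under every class. The function name `pmsf_profiles` and its arguments are hypothetical illustrations, not IQ-TREE's interface. Each site's profile is the posterior-weighted average of the class profiles.

```python
import math
from typing import List


def pmsf_profiles(weights: List[float],
                  profiles: List[List[float]],
                  site_class_loglik: List[List[float]]) -> List[List[float]]:
    """Posterior mean frequency profile for every site (illustrative sketch).

    weights[k]              : mixture weight of class k
    profiles[k][a]          : frequency of state a in class k
    site_class_loglik[s][k] : log-likelihood of site s under class k (guide tree)
    """
    n_states = len(profiles[0])
    result = []
    for logliks in site_class_loglik:
        # log(weight * likelihood), shifted by the maximum to avoid underflow
        logs = [math.log(w) + ll for w, ll in zip(weights, logliks)]
        top = max(logs)
        joint = [math.exp(x - top) for x in logs]
        total = sum(joint)
        posterior = [j / total for j in joint]
        # Conditional mean profile: sum over classes of P(class | site) * profile
        mean = [sum(p * prof[a] for p, prof in zip(posterior, profiles))
                for a in range(n_states)]
        result.append(mean)
    return result


# Toy example: two classes over a two-state alphabet
sites = pmsf_profiles([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
                      [[0.0, 0.0], [math.log(3.0), 0.0]])
```

In a full analysis these profiles would then be held fixed as site-specific frequencies during the tree search. That step is not shown.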


Subjects
Classification/methods , Models, Genetic , Phylogeny , Amino Acid Substitution , Computer Simulation , Evolution, Molecular , Statistics, Nonparametric
3.
Mol Phylogenet Evol ; 105: 114-125, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27568211

ABSTRACT

Assessing the robustness of an inferred phylogeny is an important element of phylogenetics. This is typically done with measures of stability at the internal branches and of variation in the positions of the leaf nodes. Bootstrap support for branches in maximum parsimony, distance, and maximum likelihood estimation, and posterior probabilities in Bayesian inference, measure the uncertainty about a branch due to the sampling of sites from genes or of genes from genomes. However, these measures do not reveal how taxon sampling affects branch support or the estimated phylogeny. An internal branch in a phylogenetic tree can be viewed as a split that separates the taxa into two nonempty complementary subsets. We develop several split-specific measures of stability determined from bootstrap support for quartets. These include BPtaxon_split (the average bootstrap percentage [BP] for all quartets involving a taxon within a split), BPsplit (BPtaxon_split averaged over taxa), BPtaxon (BPtaxon_split averaged over splits), and RBIC-taxon (the average BP over all splits after removing a taxon). We also develop a pruned-tree distance metric. Application of our measures to empirical and simulated data illustrates that existing measures of overall stability can fail to detect taxa that are the primary source of a split-specific instability. Moreover, we show that using many reduced sets of quartets is important for detecting the influence of joint sets of taxa rather than individual taxa. These new measures are valuable diagnostic tools to guide taxon sampling in phylogenetic experimental design.
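The three averaging steps can be sketched in Python. The sketch assumes that the bootstrap percentages of the relevant quartets have already been computed and grouped by (split, taxon). It does not show how the quartets for each split are chosen, nor RBIC-taxon or the pruned-tree distance. `split_measures` and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def split_measures(quartet_bp: Dict[Tuple[str, str], List[float]]):
    """Return (BPtaxon_split, BPsplit, BPtaxon) as dictionaries.

    quartet_bp[(split, taxon)] holds the bootstrap percentages of the
    quartets involving `taxon` that were used to assess `split`.
    """
    # BPtaxon_split: average BP over quartets involving a taxon within a split
    bp_taxon_split = {key: mean(vals) for key, vals in quartet_bp.items()}

    by_split: Dict[str, List[float]] = defaultdict(list)
    by_taxon: Dict[str, List[float]] = defaultdict(list)
    for (split, taxon), value in bp_taxon_split.items():
        by_split[split].append(value)
        by_taxon[taxon].append(value)

    # BPsplit: BPtaxon_split averaged over taxa; BPtaxon: averaged over splits
    bp_split = {s: mean(v) for s, v in by_split.items()}
    bp_taxon = {t: mean(v) for t, v in by_taxon.items()}
    return bp_taxon_split, bp_split, bp_taxon


example = {("s1", "A"): [100, 80], ("s1", "B"): [60], ("s2", "A"): [50, 70]}
ts, sp, tx = split_measures(example)
```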


Subjects
DNA Barcoding, Taxonomic/methods , Phylogeny , Base Sequence , Computer Simulation , Likelihood Functions
4.
Mol Biol Evol ; 31(4): 779-92, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24441033

ABSTRACT

Standard protein phylogenetic models use fixed rate matrices of amino acid interchange derived from analyses of large databases. Differences between the stationary amino acid frequencies of these rate matrices and those of a data set of interest are typically adjusted for by matrix multiplication that converts the empirical rate matrix to an exchangeability matrix, which is then postmultiplied by the amino acid frequencies in the alignment. The result is a time-reversible rate matrix with stationary amino acid frequencies equal to the data set frequencies. On the basis of population genetics principles, we develop an amino acid substitution-selection model that parameterizes the fitness of an amino acid as the logarithm of the ratio of the frequency of the amino acid to the frequency of the same amino acid under no selection. The model gives rise to a different sequence of matrix multiplications to convert an empirical rate matrix to one that has stationary amino acid frequencies equal to the data set frequencies. We combined the substitution-selection model with an improved amino acid class frequency mixture (cF) model to partially take into account site-specific amino acid frequencies in the phylogenetic models. We show that 1) the selection models fit the data significantly better than corresponding models without selection for most of the 21 test data sets; 2) both cF and cF selection models favored the phylogenetic trees that were inferred under current sophisticated models and methods for three difficult phylogenetic problems (the positions of microsporidia and breviates in eukaryote phylogeny and the position of the root of the angiosperm tree); and 3) for data simulated under site-specific residue frequencies, the cF selection models estimated trees closer to the generating trees than a standard Γ model or cF without selection.
We also explored several ways of estimating amino acid frequencies under neutral evolution that are required for these selection models. By better modeling the amino acid substitution process, the cF selection models will be valuable for phylogenetic inference and evolutionary studies.
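The conventional +F adjustment described at the start of this abstract can be sketched with plain Python lists. The sketch assumes the empirical matrix Q is time-reversible, so that its off-diagonal entries factor as Q_ij = S_ij π_j with symmetric exchangeabilities S. It shows only the standard adjustment, not the substitution-selection variant proposed here, and it skips the usual rescaling to one expected substitution per unit time. `adjust_to_data_freqs` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def adjust_to_data_freqs(q_emp: Matrix, pi_emp: List[float],
                         pi_data: List[float]) -> Matrix:
    """Convert a reversible empirical rate matrix to one whose stationary
    frequencies equal pi_data (standard +F-style adjustment, unnormalized)."""
    n = len(pi_emp)
    # Exchangeability matrix: S = Q * diag(pi_emp)^-1 (off-diagonal entries)
    s = [[q_emp[i][j] / pi_emp[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    # Postmultiply by the data frequencies: Q' = S * diag(pi_data)
    q = [[s[i][j] * pi_data[j] for j in range(n)] for i in range(n)]
    # Diagonal entries make every row sum to zero
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


# Toy 3-state example built from symmetric exchangeabilities
s0 = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
pi_emp = [0.2, 0.3, 0.5]
q_emp = [[s0[i][j] * pi_emp[j] for j in range(3)] for i in range(3)]
for i in range(3):
    q_emp[i][i] = -sum(q_emp[i][j] for j in range(3) if j != i)

pi_data = [0.5, 0.25, 0.25]
q_new = adjust_to_data_freqs(q_emp, pi_emp, pi_data)
```

Because π_i S_ij π_j is symmetric in i and j, the new matrix satisfies detailed balance with π_data. It is therefore time-reversible, and π_data is its stationary distribution.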


Subjects
Evolution, Molecular , Models, Genetic , Algorithms , Amino Acid Substitution , Chloroplasts/genetics , Computer Simulation , Genes, Plant , Likelihood Functions , Magnoliopsida/genetics , Phylogeny , Plant Proteins/genetics
5.
J Mol Evol ; 76(5): 280-94, 2013 May.
Article in English | MEDLINE | ID: mdl-23595859

ABSTRACT

The strength and direction of selection on the identity of an amino acid residue in a protein is typically measured by the ratio of the rate of non-synonymous substitutions to the rate of synonymous substitutions. In attempting to predict positively selected sites from amino acid alignments, we made the unexpected observation that the site likelihood of an alignment column for a given tree tends to be negatively correlated with the posterior probability that the site is in the positive selection class under widely used codon models. This is likely because positively selected sites tend to be more variable and display more "radical" amino acid changes; both of these features are expected to result in low site log-likelihoods. We explored the efficacy of using the site log-likelihood (SLL) score as a predictor of positive selection. Through simulation we show that an SLL-based test has a low false positive rate and power comparable to that of the codon models. In one case where the simulated data violated the assumption that synonymous substitution rates were constant across sites, the codon models were not able to detect positive selection in the data, while the SLL test did. We applied the new method to ten empirical data sets and found that it made predictions similar to those of the codon models in eight of them. For the tax gene data set, the SLL test seemed to produce more reasonable results. The SLL methods are a valuable complement to codon models, especially in cases where the assumptions of codon models are likely violated.


Subjects
Amino Acids/genetics , Codon , Models, Genetic , Selection, Genetic , Alcohol Dehydrogenase/genetics , Amino Acid Substitution , Computer Simulation , Drosophila Proteins/genetics , Encephalitis Virus, Japanese/genetics , Evolution, Molecular , Flavivirus/genetics , HIV/genetics , Likelihood Functions , Phylogeny , Viral Proteins/genetics , beta-Globins/genetics
6.
Mol Biol Evol ; 28(8): 2305-15, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21343603

ABSTRACT

The w statistic introduced by Lockhart et al. (1998. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol Biol Evol. 15:1183-1188) is a simple and easily calculated statistic intended to detect heterotachy by comparing amino acid substitution patterns between two monophyletic groups of protein sequences. It is defined as the difference between the fraction of varied sites in both groups and the fraction of varied sites in each group. The w test has been used to distinguish a covarion process from equal-rates and rates-across-sites processes. Using simulation, we show that the w test is effective for small data sets and for data sets that have low substitution rates within the groups, but can have difficulties when these conditions are not met. Using site entropy as a measure of the variability of a sequence site, we modify the w statistic to a w' statistic by assigning as varied in one group those sites that are actually varied in both groups but have a large entropy difference. We show that the w' test has more power to detect two kinds of heterotachy processes (covarion and bivariate rate shifts) in large and variable data sets. We also show that a test of Pearson's correlation of the site entropies between two monophyletic groups can be used to detect heterotachy and has more power than the w' test. Furthermore, we demonstrate that there are settings where neither the correlation test nor the w and w' tests detect heterotachy signals in data simulated under a branch length mixture model. In such cases, it is sometimes possible to detect heterotachy through subselection of appropriate taxa. Finally, we discuss the abilities of the three statistical tests to detect a fourth mode of heterotachy: lineage-specific changes in the proportion of variable sites.
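The entropy-correlation test can be sketched as follows. The code computes the Shannon entropy of each alignment column separately within each monophyletic group, then takes Pearson's correlation across sites. It assumes both groups are aligned to the same columns and treats gap characters as ordinary states. The significance assessment used in the paper is not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_entropies(seqs: List[str]) -> List[float]:
    """Shannon entropy (natural log) of every column of an aligned group."""
    entropies = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        n = sum(counts.values())
        entropies.append(-sum((c / n) * math.log(c / n) for c in counts.values()))
    return entropies


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("entropy is constant across sites in one group")
    return cov / (sx * sy)


def entropy_correlation(group1: List[str], group2: List[str]) -> float:
    """Correlation of per-site entropies between two aligned groups."""
    return pearson(column_entropies(group1), column_entropies(group2))


# Toy groups: same variable sites (r = 1) versus swapped variable sites (r = -1)
r_same = entropy_correlation(["AAAA", "ACAG", "ADAT"], ["MMMM", "MKMQ", "MLMR"])
r_opp = entropy_correlation(["AAAA", "ACAG", "ADAT"], ["AAAA", "CACA", "DADA"])
```

Under homotachous evolution the same sites tend to be variable in both groups, so the entropies correlate strongly. A weak correlation is the kind of signal the test looks for.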


Subjects
Evolution, Molecular , Models, Statistical , Proteins/chemistry , Amino Acid Substitution , Computer Simulation
7.
BMC Evol Biol ; 9: 225, 2009 Sep 08.
Article in English | MEDLINE | ID: mdl-19737395

ABSTRACT

BACKGROUND: The covarion hypothesis of molecular evolution holds that selective pressures on a given amino acid or nucleotide site depend on the identity of other sites in the molecule, which change through time, resulting in changes in the evolutionary rates of sites along the branches of a phylogenetic tree. At the sequence level, covarion-like evolution at a site manifests as conservation of nucleotide or amino acid states among some homologs where the states are not conserved in other homologs (or groups of homologs). Covarion-like evolution has been shown to relate to changes in function at sites in different clades and, if ignored, can adversely affect the accuracy of phylogenetic inference. RESULTS: PROCOV (protein covarion analysis) is a software tool that implements a number of previously proposed covarion models of protein evolution for phylogenetic inference in a maximum likelihood framework. Several algorithmic and implementation improvements in this tool over previous versions make computationally expensive tree searches with covarion models more efficient and make analyses of large phylogenomic data sets tractable. PROCOV can be used to identify covarion sites by comparing the site likelihoods under the covarion process to the corresponding site likelihoods under a rates-across-sites (RAS) process. The sites with the greatest log-likelihood difference between a 'covarion' and an RAS process were found to be of functional or structural significance in a data set of bacterial and eukaryotic elongation factors. CONCLUSION: Covarion models implemented in PROCOV may be especially useful for phylogenetic estimation when ancient divergences between sequences have occurred and rates of evolution at sites are likely to have changed over the tree. PROCOV can also be used to study lineage-specific functional shifts in protein families that result in changes in the patterns of site variability among subtrees.
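The site-ranking step can be sketched as below. The sketch assumes that per-site log-likelihoods under the covarion and RAS models have already been computed for the same tree and alignment; it makes no assumption about PROCOV's actual output format. `rank_covarion_sites` is a hypothetical helper that sorts sites by the log-likelihood gain of the covarion model.

```python
from typing import List, Tuple


def rank_covarion_sites(lnl_covarion: List[float], lnl_ras: List[float],
                        top: int = 10) -> List[Tuple[int, float]]:
    """Return (site index, lnL_covarion - lnL_RAS) for the `top` sites
    whose likelihood improves most under the covarion model."""
    if len(lnl_covarion) != len(lnl_ras):
        raise ValueError("site log-likelihood lists differ in length")
    diffs = [(i, c - r) for i, (c, r) in enumerate(zip(lnl_covarion, lnl_ras))]
    # Largest gain first; these are the candidate covarion sites
    diffs.sort(key=lambda item: item[1], reverse=True)
    return diffs[:top]


ranked = rank_covarion_sites([-10.0, -5.0, -7.5], [-10.5, -9.0, -7.0], top=2)
# [(1, 4.0), (0, 0.5)]
```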


Subjects
Evolution, Molecular , Models, Genetic , Phylogeny , Proteins/genetics , Algorithms , Amino Acid Sequence , Likelihood Functions , Molecular Sequence Data , Sequence Alignment , Software
8.
BMC Evol Biol ; 8: 331, 2008 Dec 16.
Article in English | MEDLINE | ID: mdl-19087270

ABSTRACT

BACKGROUND: Widely used substitution models for proteins, such as the Jones-Taylor-Thornton (JTT) or Whelan and Goldman (WAG) models, are based on empirical amino acid interchange matrices estimated from databases of protein alignments that incorporate the average amino acid frequencies of the data set under examination (e.g. JTT+F). Variation in the evolutionary process between sites is typically modelled by a rates-across-sites distribution such as the gamma (Γ) distribution. However, sites in proteins also vary in the kinds of amino acid interchanges that are favoured, a feature that is ignored by standard empirical substitution matrices. Here we examine the degree to which the pattern of evolution at sites differs from that expected under empirical amino acid substitution models and evaluate the impact of these deviations on phylogenetic estimation. RESULTS: We analyzed 21 large protein alignments with two statistical tests designed to detect deviation of site-specific amino acid distributions from data simulated under the standard empirical substitution model JTT+F+Γ. We found that the number of states at a given site is, on average, smaller, and the frequencies of these states are less uniform, than expected under a JTT+F+Γ substitution model. With a four-taxon example, we show that phylogenetic estimation under the JTT+F+Γ model is seriously biased by a long-branch attraction artefact if the data are simulated under a model using the observed site-specific amino acid frequencies from an alignment. Principal components analyses indicate the existence of at least four major site-specific frequency classes in these 21 protein alignments. Using a mixture model with these four separate classes of site-specific state frequencies plus a fifth class of global frequencies (the JTT+cF+Γ model), significant improvements in model fit for real data sets can be achieved.
This simple mixture model also reduces the long-branch attraction problem, as shown by simulations and analyses of a real phylogenomic data set. CONCLUSION: Protein families display site-specific evolutionary dynamics that are ignored by standard protein phylogenetic models. Accurate estimation of protein phylogenies requires models that accommodate the heterogeneity in the evolutionary process across sites. To this end, we have implemented a class frequency mixture model (cF) in a freely available program called QmmRAxML for phylogenetic estimation.


Subjects
Models, Genetic , Models, Statistical , Phylogeny , Sequence Alignment , Sequence Analysis, Protein , Algorithms , Amino Acid Substitution , Computer Simulation , Databases, Protein , Principal Component Analysis
9.
BMC Evol Biol ; 7 Suppl 1: S6, 2007 Feb 08.
Article in English | MEDLINE | ID: mdl-17288579

ABSTRACT

BACKGROUND: Synonymous codon usage varies widely between genomes, and also between genes within genomes. Although there is now a large body of data on variation in codon usage, it is still not clear whether the observed patterns reflect the effects of positive Darwinian selection acting at the level of translational efficiency or are due simply to the effects of mutational bias. In this study, we have included both intra-genomic and inter-genomic comparisons of codon usage. This allows us to distinguish more efficiently between the effects of nucleotide bias and translational selection. RESULTS: We show that there is an extreme degree of heterogeneity in codon usage patterns within the rice genome, and that this heterogeneity is highly correlated with differences in nucleotide content (particularly GC content) between the genes. In contrast to the situation observed within the rice genome, Arabidopsis genes show relatively little variation in either codon usage or nucleotide content. By exploiting a combination of intra-genomic and inter-genomic comparisons, we provide evidence that the differences in codon usage among the rice genes reflect a relatively rapid evolutionary increase in the GC content of some rice genes. We also noted that the degree of codon bias was negatively correlated with gene length. CONCLUSION: Our results show that mutational bias can cause a dramatic evolutionary divergence in codon usage patterns within a period of approximately two hundred million years. The heterogeneity of codon usage patterns within the rice genome can be explained by a balance between genome-wide mutational biases and negative selection against these biased mutations. The strength of the negative selection is proportional to the length of the coding sequences. Our results indicate that the large variations in synonymous codon usage are not related to selection acting on the translational efficiency of synonymous codons.
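Comparisons like these often use the GC content at third codon positions (GC3) as one common proxy for GC-driven codon bias, paired with gene length. The minimal sketch below assumes in-frame coding sequences and applies no further filtering. It does not exclude stop codons or single-codon amino acids such as Met and Trp, as many codon-usage analyses would. The function names are hypothetical.

```python
from typing import List, Tuple


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    third = cds[2::3]  # every third base, starting at the first codon's 3rd position
    return sum(base in "GC" for base in third) / len(third)


def gc3_and_length(genes: List[str]) -> List[Tuple[float, int]]:
    """(GC3, length in codons) for each gene, e.g. for correlating bias with length."""
    return [(gc3(g), len(g) // 3) for g in genes]


summary = gc3_and_length(["ATGGCCAAG", "ATGCCA", "AAAAAT"])
# [(1.0, 3), (0.5, 2), (0.0, 2)]
```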


Subjects
Codon , Genetic Code , Genome, Plant , Oryza/genetics , Arabidopsis/genetics , Base Composition , Genes, Plant , Genetic Variation , Sequence Homology, Nucleic Acid
10.
Nucleic Acids Res ; 30(11): 2501-7, 2002 Jun 01.
Article in English | MEDLINE | ID: mdl-12034839

ABSTRACT

Previous studies have shown that the guanine plus cytosine (G+C) content of ribosomal RNAs (rRNAs) is highly correlated with bacterial growth temperatures. This correlation is strongest in the double-stranded stem regions of the rRNA, a fact that can be explained by selection for increased structural stability at high growth temperatures. In this study, we examined the single-stranded regions of 16S rRNAs. We reasoned that, since these regions of the molecule are subject to less structural constraint than the stem regions, their nucleotide content might simply reflect the overall nucleotide content of the genome. Contrary to this expectation, however, we found that all of the single-stranded regions are characterized by very high adenine (A) and relatively low cytosine (C) contents. Moreover, the nucleotide content of these single-stranded regions is surprisingly constant between species, despite dramatic differences in optimal growth temperatures, and despite large differences in the overall genomic G+C content. This provides compelling evidence for strong stabilizing selection acting on 16S rRNA single-stranded regions. We found that selection favors purines (A+G), and especially adenine (A), in the single-stranded regions of these rRNAs.
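Separating stem composition from single-stranded composition requires a secondary-structure annotation. The minimal sketch below assumes the 16S structure is supplied in dot-bracket notation using only '(', ')' and '.'; pseudoknot bracket types are not handled. `paired_unpaired_composition` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Tuple


def paired_unpaired_composition(seq: str, structure: str
                                ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Base frequencies in paired (stem) and unpaired positions of an RNA."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    paired: Counter = Counter()
    unpaired: Counter = Counter()
    # Treat DNA input (T) as RNA (U)
    for base, symbol in zip(seq.upper().replace("T", "U"), structure):
        if symbol in "()":
            paired[base] += 1
        elif symbol == ".":
            unpaired[base] += 1
        else:
            raise ValueError(f"unsupported structure symbol: {symbol!r}")

    def frequencies(counts: Counter) -> Dict[str, float]:
        total = sum(counts.values())
        return {b: counts[b] / total for b in "ACGU"} if total else {}

    return frequencies(paired), frequencies(unpaired)


stems, loops = paired_unpaired_composition("GGGAAAACCC", "(((....)))")
```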


Subjects
Archaea/genetics , Bacteria/genetics , Genes, rRNA/genetics , RNA, Archaeal/genetics , RNA, Bacterial/genetics , RNA, Ribosomal, 16S/genetics , Selection, Genetic , Archaea/growth & development , Bacteria/growth & development , Base Composition , Cell Division , Databases, Nucleic Acid , Evolution, Molecular , Genes, Archaeal/genetics , Genes, Bacterial/genetics , Genome, Archaeal , Genome, Bacterial , RNA, Archaeal/chemistry , RNA, Bacterial/chemistry , RNA, Ribosomal, 16S/chemistry , Ribosomal Proteins/genetics , Temperature , Thermodynamics
11.
Article in English | MEDLINE | ID: mdl-12237685

ABSTRACT

This paper describes a method for identifying a protein from its amino acid composition. The basic idea is as follows: a database of the lengths, compositions, and molecular weights of all known proteins is easily derived from a protein sequence database. By comparing the composition, length, and molecular weight of the protein in question with each entry in this composition database, proteins with similar length, composition, and molecular weight may be found, so that a tentative identification of the protein can be made. In some cases, the correct protein(s) can be predicted from the composition database with complete accuracy. A computer program implementing this method was developed and used to search for the compositions of the human insulin precursor and other proteins. The results were good, showing that this is a fast and economical way to identify a protein based on its amino acid composition.
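The composition-database search can be sketched as follows. The sketch uses a plain sum of absolute differences between composition vectors as the similarity score and a relative length tolerance as a filter. Both are illustrative choices rather than the paper's criteria, and molecular weight is omitted for brevity. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> Dict[str, float]:
    """Fractional amino acid composition of a protein sequence."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq.upper())
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def build_db(sequences: Dict[str, str]) -> Dict[str, Tuple[int, Dict[str, float]]]:
    """Derive (length, composition) for every protein in a sequence database."""
    return {name: (len(s), composition(s)) for name, s in sequences.items()}


def search(db: Dict[str, Tuple[int, Dict[str, float]]],
           query_comp: Dict[str, float], query_len: int,
           len_tol: float = 0.1, top: int = 3) -> List[Tuple[str, float]]:
    """Rank database proteins of similar length by composition distance."""
    hits = []
    for name, (length, comp) in db.items():
        if abs(length - query_len) > len_tol * query_len:
            continue  # length too different to be a candidate
        dist = sum(abs(comp[aa] - query_comp[aa]) for aa in AMINO_ACIDS)
        hits.append((dist, name))
    hits.sort()
    return [(name, dist) for dist, name in hits[:top]]


db = build_db({"p1": "ACDEFGHIK", "p2": "AAAAAAAAA", "p3": "ACDEFGHIKLMNPQRSTVWY"})
hits = search(db, composition("ACDEFGHIK"), 9)
```

Here `p3` is filtered out by length, `p1` matches exactly (distance 0), and `p2` ranks second.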

12.
J Mol Evol ; 66(1): 50-60, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18080080

ABSTRACT

Covarion processes allow changes in evolutionary rates at sites along the branches of a phylogenetic tree. Covarion-like evolution is increasingly recognized as an important mode of protein evolution. Several recent reports suggest that maximum likelihood estimation employing covarion models may support different optimal topologies than estimation using standard rates-across-sites (RAS) models. However, it remains to be demonstrated that ignoring covarion evolution will generally result in topological misestimation. In this study, we performed analytical and theoretical studies of limiting distances under the covarion model, together with four-taxon tree simulations, to investigate the extent to which the covarion process affects phylogenetic estimation. In particular, we assessed the limits of an RAS-model-based maximum likelihood method in recovering the phylogenies when the sequence data were simulated under covarion processes. We find that, when ignored, covarion processes can induce systematic errors in phylogeny reconstruction. Surprisingly, when sequences are evolved under a covarion process but an RAS model is used for estimation, we find that a long-branch repel bias occurs.


Subjects
Models, Genetic , Phylogeny , Proteins/classification , Amino Acid Substitution , Likelihood Functions , Proteins/chemistry , Proteins/genetics
13.
Mol Biol Evol ; 24(1): 294-305, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17056642

ABSTRACT

The covarion hypothesis of molecular evolution proposes that selective pressures on an amino acid or nucleotide site change through time, thus causing changes of evolutionary rate along the edges of a phylogenetic tree. Several kinds of Markov models for the covarion process have been proposed. One model, proposed by Huelsenbeck (2002), has 2 substitution rate classes: the substitution process at a site can switch between a single variable rate, drawn from a discrete gamma distribution, and a zero invariable rate. A second model, suggested by Galtier (2001), assumes rate switches among an arbitrary number of rate classes but switching to and from the invariable rate class is not allowed. The latter model allows for some sites that do not participate in the rate-switching process. Here we propose a general covarion model that combines features of both models, allowing evolutionary rates not only to switch between variable and invariable classes but also to switch among different rates when they are in a variable state. We have implemented all 3 covarion models in a maximum likelihood framework for amino acid sequences and tested them on 23 protein data sets. We found significant likelihood increases for all data sets for the 3 models, compared with a model that does not allow site-specific rate switches along the tree. Furthermore, we found that the general model fit the data better than the simpler covarion models in the majority of the cases, highlighting the complexity in modeling the covarion process. The general covarion model can be used for comparing tree topologies, molecular dating studies, and the investigation of protein adaptation.


Subjects
Evolution, Molecular , Models, Genetic , Proteins/chemistry , Proteins/genetics , Algorithms , Amino Acid Sequence , Computer Simulation , Markov Chains
14.
J Mol Evol ; 63(1): 120-6, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16786438

ABSTRACT

We carried out a comprehensive survey of small subunit ribosomal RNA sequences from archaeal, bacterial, and eukaryotic lineages in order to understand the general patterns of thermal adaptation in the rRNA genes. Within each lineage, we compared sequences from mesophilic, moderately thermophilic, and hyperthermophilic species. We carried out a more detailed study of the archaea, because of the wide range of growth temperatures within this group. Our results confirmed that there is a clear correlation between the GC content of the paired stem regions of the 16S rRNA genes and the optimal growth temperature, and we show that this correlation cannot be explained simply by phylogenetic relatedness among the thermophilic archaeal species. In addition, we found a significant, positive relationship between rRNA stem length and growth temperature. These correlations are found in both bacterial and archaeal rRNA genes. Finally, we compared rRNA sequences from warm-blooded and cold-blooded vertebrates. We found that, while rRNA sequences from the warm-blooded vertebrates have a higher overall GC content than those from the cold-blooded vertebrates, this difference is not concentrated in the paired regions of the molecule, suggesting that thermal adaptation is not the cause of the nucleotide differences between the vertebrate lineages.


Subjects
Acclimatization/genetics , Genes, rRNA , Hot Temperature , RNA, Ribosomal, 16S/genetics , RNA, Ribosomal, 18S/genetics , Animals , Base Composition , Genes, Archaeal , Genes, Bacterial , Phylogeny , Sequence Alignment , Temperature , Vertebrates/genetics
15.
Biochem Biophys Res Commun ; 342(3): 681-4, 2006 Apr 14.
Article in English | MEDLINE | ID: mdl-16499870

ABSTRACT

The correlation between genomic G+C content and optimal growth temperature in prokaryotes has gained renewed interest since Musto et al. [H. Musto, H. Naya, A. Zavala, H. Romero, F. Alvarez-Valin, G. Bernardi, Correlations between genomic GC levels and optimal growth temperatures in prokaryotes, FEBS Lett. 573 (2004) 73-77] reported positive correlations in 15 of the families studied. We have reanalyzed their data and found that, when genome size and data quality were adjusted for, there was no significant evidence of a relationship between optimal temperature and GC content for two of the families that had previously shown strongly significant correlations. Using updated temperature optima for Halobacteriaceae species, we found that the correlation is not significant in this family. For the family Enterobacteriaceae, when genome size and optimal temperature are included in a multiple linear regression, only genome size is significant as a predictor of GC content. We show that more rigorous statistical methods than simple two-factor correlation analysis should be used for analyzing the complex intrinsic and extrinsic factors that affect genomic GC content. We further found that a positive correlation between temperature and genomic GC is evident only in free-living species with low optimal growth temperatures.
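The multiple-regression check for Enterobacteriaceae regresses GC content jointly on genome size and optimal temperature, which amounts to an ordinary least-squares fit with two predictors. Below is a dependency-free sketch that solves the normal equations by Gaussian elimination. It returns coefficients only; the significance testing the abstract relies on would also need standard errors, for which a statistics package is the practical choice. The data in the usage lines are synthetic, not from the study, and `ols_two_predictors` is a hypothetical name.

```python
from typing import List, Sequence


def ols_two_predictors(x1: Sequence[float], x2: Sequence[float],
                       y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, b2] for y = b0 + b1*x1 + b2*x2."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations (X^T X) beta = X^T y, stored as an augmented 3x4 matrix
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(3)]
    # Forward elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for k in range(col + 1, 3):
            factor = m[k][col] / m[col][col]
            for c in range(col, 4):
                m[k][c] -= factor * m[col][c]
    # Back substitution
    beta = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        beta[i] = (m[i][3] - sum(m[i][j] * beta[j] for j in range(i + 1, 3))) / m[i][i]
    return beta


# Synthetic, exactly linear data: y = 1 + 2*x1 + 3*x2
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
coef = ols_two_predictors(x1, x2, y)
```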


Subjects
Base Composition/genetics , Enterobacteriaceae/growth & development , Enterobacteriaceae/genetics , Genome, Bacterial/genetics , Prokaryotic Cells/metabolism , Temperature
16.
Mol Biol Evol ; 21(1): 90-6, 2004 Jan.
Article in English | MEDLINE | ID: mdl-14595101

ABSTRACT

Amino acid sequences from several thousand homologous gene pairs were compared for two plant genomes, Oryza sativa and Arabidopsis thaliana. The Arabidopsis genes all have similar G+C (guanine plus cytosine) contents, whereas their homologs in rice span a wide range of G+C levels. The results show that the rice genes that display increased divergence in nucleotide composition (specifically, increased G+C content) show a corresponding, predictable change in the amino acid composition of the encoded proteins relative to their Arabidopsis homologs. This trend was not seen in a "control" set of rice genes whose nucleotide contents were closer to those of their Arabidopsis homologs. In addition to showing an overall difference in the amino acid composition of the homologous proteins, we were also able to investigate the biased patterns of amino acid substitution since the divergence of these two species. We found that the amino acid exchange matrix was highly asymmetric when comparing the high-G+C rice genes with their Arabidopsis homologs. Finally, we investigated the possible causes of this biased pattern of sequence evolution. Our results indicate that the biased pattern of protein evolution is the consequence, rather than the cause, of the corresponding changes in nucleotide content. In fact, there is an even more marked asymmetry in the patterns of substitution at synonymous nucleotide sites. Surprisingly, there is a very strong negative correlation between the level of nucleotide bias and the length of the coding sequences within the rice genome. This difference in gene length may provide important clues about the underlying mechanisms.
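An exchange-count matrix of the kind compared here can be tabulated from aligned homologous protein pairs. The minimal sketch below assumes pre-aligned pairs with '-' for gaps, with the Arabidopsis residue as the row and the rice residue as the column. Without an outgroup, these counts are pairwise differences rather than polarized substitutions. `exchange_counts` and `asymmetry_index` are hypothetical helpers, and the index is only one simple way to quantify asymmetry.

```python
from collections import Counter
from typing import Iterable, Tuple


def exchange_counts(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count residue differences (row residue, column residue) over aligned pairs."""
    counts: Counter = Counter()
    for row_seq, col_seq in pairs:
        if len(row_seq) != len(col_seq):
            raise ValueError("aligned sequences must have equal length")
        for x, y in zip(row_seq, col_seq):
            # Skip identical residues and any position involving a gap
            if x != y and x != "-" and y != "-":
                counts[(x, y)] += 1
    return counts


def asymmetry_index(counts: Counter, x: str, y: str) -> float:
    """(N[x,y] - N[y,x]) / (N[x,y] + N[y,x]); 0.0 when neither exchange occurs."""
    forward, reverse = counts[(x, y)], counts[(y, x)]
    total = forward + reverse
    return 0.0 if total == 0 else (forward - reverse) / total


# Toy (Arabidopsis, rice) aligned pairs
counts = exchange_counts([("AKAS", "AGAG"), ("KKS-", "KRAA")])
```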


Subjects
Arabidopsis/genetics , Evolution, Molecular , Genome, Plant , Mutation/genetics , Oryza/genetics , Plant Proteins/genetics , Amino Acids/genetics , Base Composition , Databases, Genetic , Species Specificity