Results 1 - 8 of 8
1.
Genome Biol Evol; 7(8): 2102-16, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-26139831

ABSTRACT

Evolutionary studies usually use a two-step process to investigate sequence data. Step one estimates a multiple sequence alignment (MSA) and step two applies phylogenetic methods to ask evolutionary questions of that MSA. Modern phylogenetic methods infer evolutionary parameters using maximum likelihood or Bayesian inference, mediated by a probabilistic substitution model that describes sequence change over a tree. The statistical properties of these methods mean that more data directly translates to an increased confidence in downstream results, providing the substitution model is adequate and the MSA is correct. Many studies have investigated the robustness of phylogenetic methods in the presence of substitution model misspecification, but few have examined the statistical properties of those methods when the MSA is unknown. This simulation study examines the statistical properties of the complete two-step process when inferring sequence divergence and the phylogenetic tree topology. Both nucleotide and amino acid analyses are negatively affected by the alignment step, both through inaccurate guide tree estimates and through overfitting to that guide tree. For many alignment tools these effects become more pronounced when additional sequences are added to the analysis. Nucleotide sequences are particularly susceptible, with MSA errors leading to statistical support for long-branch attraction artifacts, which are usually associated with gross substitution model misspecification. Amino acid MSAs are more robust, but do tend to arbitrarily resolve multifurcations in favor of the guide tree. No inference strategies produce consistently accurate estimates of divergence between sequences, although amino acid MSAs are again more accurate than their nucleotide counterparts. We conclude with some practical suggestions about how to limit the effect of MSA uncertainty on evolutionary inference.


Subjects
Phylogeny, Sequence Alignment/methods, Artifacts, Statistical Models, Uncertainty
2.
Syst Biol; 64(1): 42-55, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25209223

ABSTRACT

Molecular phylogenetics is a powerful tool for inferring both the process and pattern of evolution from genomic sequence data. Statistical approaches, such as maximum likelihood and Bayesian inference, are now established as the preferred methods of inference. The choice of models that a researcher uses for inference is of critical importance, and there are established methods for model selection conditioned on a particular type of data, such as nucleotides, amino acids, or codons. A major limitation of existing model selection approaches is that they can only compare models acting upon a single type of data. Here, we extend model selection to allow comparisons between models describing different types of data by introducing the idea of adapter functions, which project aggregated models onto the originally observed sequence data. These projections are implemented in the program ModelOMatic and used to perform model selection on 3722 families from the PANDIT database, 68 genes from an arthropod phylogenomic data set, and 248 genes from a vertebrate phylogenomic data set. For the PANDIT and arthropod data, we find that amino acid models are selected for the overwhelming majority of alignments, with progressively smaller numbers of alignments selecting codon and nucleotide models, and no families selecting RY-based models. In contrast, nearly all alignments from the vertebrate data set select codon-based models. The sequence divergence, the number of sequences, and the degree of selection acting upon the protein sequences may contribute to explaining this variation in model selection. Our ModelOMatic program is fast, with most families from PANDIT taking fewer than 150 s to complete, and should therefore be easily incorporated into existing phylogenetic pipelines. ModelOMatic is available at https://code.google.com/p/modelomatic/.
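
As a rough illustration of the comparison step only, the sketch below ranks candidate models by the Akaike information criterion, AIC = 2k - 2 ln L. The abstract does not name a selection criterion, so the use of AIC is an assumption, and the model names, log-likelihoods, and parameter counts are made-up values. The sketch also assumes that every log-likelihood has already been projected onto the same observed data, which is the role the abstract assigns to adapter functions; without that step the scores would not be comparable. This is not ModelOMatic's code.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    """One fitted model; all field values used below are hypothetical."""
    name: str
    log_likelihood: float  # ln L, assumed already projected onto the same data
    n_params: int          # number of free parameters, k


def aic(fit: Fit) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * fit.n_params - 2 * fit.log_likelihood


def rank_models(fits: list[Fit]) -> list[tuple[str, float]]:
    """Return (name, AIC) pairs sorted from best (lowest AIC) to worst."""
    return sorted(((f.name, aic(f)) for f in fits), key=lambda pair: pair[1])


# Made-up fits for a single alignment.
fits = [
    Fit("nucleotide", -10500.0, 10),
    Fit("amino_acid", -10320.0, 1),
    Fit("codon", -10350.0, 12),
]
ranking = rank_models(fits)
best_name = ranking[0][0]  # model with the lowest AIC
```

With these illustrative numbers the amino acid model has the lowest AIC, but the ordering depends entirely on the invented inputs.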


Subjects
Classification/methods, Biological Models, Phylogeny, Amino Acids/genetics, Animals, Codon/genetics, Nucleotides/genetics, Software
3.
Mol Biol Evol; 30(3): 642-53, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23144040

ABSTRACT

Multiple sequence alignment (MSA) is the heart of comparative sequence analysis. Recent studies demonstrate that MSA algorithms can produce different outcomes when analyzing genomes, including phylogenetic tree inference and the detection of adaptive evolution. These studies also suggest that the difference between MSA algorithms is of a similar order to the uncertainty within an algorithm and suggest integrating across this uncertainty. In this study, we examine further the problem of disagreements between MSA algorithms and how they affect downstream analyses. We also investigate whether integrating across alignment uncertainty affects downstream analyses. We address these questions by analyzing 200 chordate gene families, with properties reflecting those used in large-scale genomic analyses. We find that newly developed distance metrics reveal two significantly different classes of MSA methods (MSAMs). The similarity-based class includes progressive aligners and consistency aligners, representing many methodological innovations for sequence alignment, whereas the evolution-based class includes phylogenetically aware alignment and statistical alignment. We proceed to show that the class of an MSAM has a substantial impact on downstream analyses. For phylogenetic inference, tree estimates and their branch lengths appear highly dependent on the class of aligner used. The number of families, and the sites within those families, inferred to have undergone adaptive evolution depend on the class of aligner used. Similarity-based aligners tend to identify more adaptive evolution. We also develop and test methods for incorporating MSA uncertainty when detecting adaptive evolution but find that although accounting for MSA uncertainty does affect downstream analyses, it appears less important than the class of aligner chosen. 
Our results demonstrate the critical role that MSA methodology has on downstream analysis, highlighting that the class of aligner chosen in an analysis has a demonstrable effect on its outcome.


Subjects
Algorithms, Genetic Models, Sequence Alignment/methods, Biological Adaptation/genetics, Bayes Theorem, Molecular Evolution, Human Genome, Humans, Likelihood Functions, Markov Chains, Monte Carlo Method, Phylogeny, Genetic Selection, DNA Sequence Analysis/methods
4.
Bioinformatics; 28(4): 495-502, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22199391

ABSTRACT

MOTIVATION: Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to the proliferation of MSA methods, with different objective functions and heuristics to search for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses. RESULTS: We introduce four metrics to compare MSAs, which include the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the different metrics in combination can yield more information about MSA methods and the differences between them. AVAILABILITY: MetAl is a free software implementation of these metrics in Haskell. Source and binaries for Windows, Linux and Mac OS X are available from http://kumiho.smith.man.ac.uk/whelan/software/metal/.
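
To give a feel for what a distance between two alignments of the same sequences can look like, here is a minimal sketch of a homology-based comparison. It is not MetAl's code and is not claimed to match any of the four metrics in the paper: it ignores gap positions and the phylogenetic tree entirely. For every residue it records which residues from other sequences share its column, then averages the per-residue Jaccard distance between those sets across the two alignments. The function names and the toy alignments are assumptions made for illustration.

```python
def homology_sets(alignment: list[str]) -> dict[tuple[int, int], frozenset]:
    """Map each residue (sequence index, residue index) to the residues in its column."""
    n_cols = len(alignment[0])
    assert all(len(row) == n_cols for row in alignment), "rows must have equal length"
    counters = [0] * len(alignment)  # residues seen so far in each sequence
    sets = {}
    for col in range(n_cols):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":  # "-" marks a gap
                present.append((s, counters[s]))
                counters[s] += 1
        for residue in present:
            sets[residue] = frozenset(r for r in present if r != residue)
    return sets


def msa_distance(a: list[str], b: list[str]) -> float:
    """Mean per-residue Jaccard distance between homology sets (0 = identical)."""
    ha, hb = homology_sets(a), homology_sets(b)
    assert ha.keys() == hb.keys(), "alignments must contain the same sequences"
    total = 0.0
    for residue, set_a in ha.items():
        union = set_a | hb[residue]
        if union:  # residues unaligned in both alignments contribute 0
            total += len(set_a ^ hb[residue]) / len(union)
    return total / len(ha)


# Two toy alignments of the same pair of sequences, differing in gap placement.
aln1 = ["AC-GT", "ACGGT"]
aln2 = ["ACG-T", "ACGGT"]
d_same = msa_distance(aln1, aln1)
d_diff = msa_distance(aln1, aln2)
```

Comparing an alignment with itself gives 0, and moving the single gap by one column changes the homology sets of three of the nine residues, so the distance is positive.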


Subjects
Phylogeny, Sequence Alignment/methods, Software, Computers, INDEL Mutation, Proteins/chemistry, Proteins/genetics
5.
Mol Biol Evol; 28(1): 449-58, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20724379

ABSTRACT

There is widespread evidence of lineage-specific rate variation, known as heterotachy, during protein evolution. Changes in the structural and functional constraints acting on a protein can lead to heterotachy, and it is plausible that such changes, known as covarion shifts, may affect many amino acids at once. Several previous attempts to model heterotachy have used covarion models, where the sequence undergoes covarion drift, whereby each site may switch independently among a set of discrete classes having different substitution rates. However, such independent switching may not capture biologically important events where the selective forces acting on a protein affect many sites at once. We describe a new class of models that allow the rates of substitution and switching to vary among branches of a phylogenetic tree. Such models are better able to handle covarion shifts. We apply these models to a set of genes occurring in nonphotosynthetic bacteria, cyanobacteria, and the plastids of green and red algae. We find that 4/5 genes show evidence of some form of rate switching and that 3/5 genes show evidence that the relative switching rate differs among taxonomic groups. We conclude that covarion shifts may be frequent during the deep evolution of plastid genes and that our methodology may provide a powerful new tool for investigating such shifts in other systems.


Subjects
Biological Evolution, Genetic Variation, Genetic Models, Phylogeny, Plastids/genetics, Algorithms, Base Sequence, Chlorophyta/cytology, Chlorophyta/genetics, Computer Simulation, Molecular Sequence Data, Proteins/chemistry, Proteins/genetics, Rhodophyta/cytology, Rhodophyta/genetics, Sequence Alignment
6.
PLoS Pathog; 4(5): e1000058, 2008 May 02.
Article in English | MEDLINE | ID: mdl-18451985

ABSTRACT

The rapid evolution of influenza viruses presents difficulties in maintaining the optimal efficiency of vaccines. Amino acid substitutions result in antigenic drift, a process whereby antisera raised in response to one virus have reduced effectiveness against future viruses. Interestingly, while amino acid substitutions occur at a relatively constant rate, the antigenic properties of H3 move in a discontinuous, step-wise manner. It is not clear why this punctuated evolution occurs, whether this represents simply the fact that some substitutions affect these properties more than others, or if this is indicative of a changing relationship between the virus and the host. In addition, the role of changing glycosylation of the haemagglutinin in these shifts in antigenic properties is unknown. We analysed the antigenic drift of HA1 from human influenza H3 using a model of sequence change that allows for variation in selective pressure at different locations in the sequence, as well as at different parts of the phylogenetic tree. We detect significant changes in selective pressure that occur preferentially during major changes in antigenic properties. Despite the large increase in glycosylation during the past 40 years, changes in glycosylation did not correlate either with changes in antigenic properties or with significantly more rapid changes in selective pressure. The locations that undergo changes in selective pressure are largely in places undergoing adaptive evolution, in antigenic locations, and in locations or near locations undergoing substitutions that characterise the change in antigenicity of the virus. Our results suggest that the relationship of the virus to the host changes with time, with the shifts in antigenic properties representing changes in this relationship. This suggests that the virus and host immune system are evolving different methods to counter each other. 
While we are able to characterise the rapid increase in glycosylation of the haemagglutinin during time in human influenza H3, an increase not present in influenza in birds, this increase seems unrelated to the observed changes in antigenic properties.


Subjects
Antigenic Variation/genetics, Molecular Evolution, Genetic Drift, Influenza A Virus/genetics, Human Influenza/virology, Genetic Selection, Animals, Antigenic Variation/immunology, Viral Antigens/immunology, COS Cells, Cell Fusion, Chlorocebus aethiops, Viral DNA/genetics, HeLa Cells, Influenza Virus Hemagglutinin Glycoproteins/genetics, Influenza Virus Hemagglutinin Glycoproteins/immunology, Humans, Influenza A Virus/immunology, Influenza A Virus/pathogenicity, Human Influenza/genetics, Human Influenza/immunology, Mononuclear Leukocytes/immunology, Mononuclear Leukocytes/virology, Macrophages/immunology, Macrophages/virology
7.
PLoS Comput Biol; 2(6): e69, 2006 Jun 23.
Article in English | MEDLINE | ID: mdl-16789817

ABSTRACT

The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of "ancestral sequences" inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a "best guess" amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
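
The contrast between a "best guess" reconstruction and sampling from the posterior can be shown with a toy example. The sketch below assumes that per-site posterior probabilities over residues are already available; the three sites and their probabilities are invented, and the code does not model thermostability or reproduce the study's simulations. It only shows that always taking the most probable residue removes the less frequent variants that posterior sampling retains.

```python
import random

# Hypothetical posterior probabilities for three ancestral sites.
posteriors = [
    {"L": 0.7, "A": 0.3},
    {"K": 0.6, "R": 0.4},
    {"V": 0.9, "I": 0.1},
]


def best_guess(sites):
    """Pick the most probable residue at every site (the 'best guess' strategy)."""
    return "".join(max(site, key=site.get) for site in sites)


def sample_sequence(sites, rng):
    """Draw one residue per site in proportion to its posterior probability."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()), k=1)[0]
        for site in sites
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable
consensus = best_guess(posteriors)
samples = [sample_sequence(posteriors, rng) for _ in range(1000)]
# Fraction of sampled ancestors carrying the minority residue at the first site.
minor_fraction = sum(s[0] == "A" for s in samples) / len(samples)
```

In this toy setup the best-guess sequence never carries the minority residue at the first site, whereas roughly three in ten sampled sequences do, which mirrors the abstract's explanation of why best-guess methods drop slightly detrimental but less frequent variants.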


Subjects
Algorithms, Molecular Evolution, Proteins/chemistry, Proteins/genetics, Sequence Alignment/methods, Protein Sequence Analysis/methods, Conserved Sequence, Genetic Variation/genetics, Proteins/classification, Reproducibility of Results, Sensitivity and Specificity, Amino Acid Sequence Homology
8.
J Chem Phys; 123(15): 154907, 2005 Oct 15.
Article in English | MEDLINE | ID: mdl-16252972

ABSTRACT

In order to probe the fundamental principles that govern protein evolution, we use a minimalist model of proteins to provide a mapping from genotype to phenotype. The model is based on physically realistic forces of protein folding and includes an explicit definition of protein function. Thus, we can find the fitness of a sequence from its ability to fold to a stable structure and perform a function. We study the fitness landscapes of these functional model proteins, that is, the set of all sequences mapped on to their corresponding fitnesses and connected to their one-mutant neighbors. Through population dynamics simulations we directly study the influence of the nature of the fitness landscape on evolution. Populations are observed to move to a steady state, the distribution of which can often be predicted prior to the population dynamics simulations from the nature of the fitness landscape and a quantity analogous to a partition function. In this paper, we develop a scheme for predicting the steady-state population on a fitness landscape, based on the nature of the fitness landscape, thereby obviating the need for explicit population dynamics simulations and providing some insight into the impact on molecular evolution of the nature of fitness landscapes. Poor predictions are indicative of fitness landscapes that consist of a series of weakly connected sublandscapes.


Subjects
Computer Simulation, Molecular Evolution, Theoretical Models, Proteins/chemistry, Thermodynamics