Results 1 - 19 of 19
1.
PLoS One ; 15(3): e0229493, 2020.
Article in English | MEDLINE | ID: mdl-32119689

ABSTRACT

It is standard practice to model site-to-site variability of substitution rates by discretizing a continuous distribution into a small number, K, of equiprobable rate categories. We demonstrate that the variance of this discretized distribution has an upper bound determined solely by the choice of K and the mean of the distribution. This bound can introduce biases into statistical inference, especially when estimating parameters governing site-to-site variability of substitution rates. Applications to two large collections of sequence alignments demonstrate that this upper bound is often reached in analyses of real data. When parameter estimation is of primary interest, additional rate categories or more flexible modeling methods should be considered.
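The bound described in this abstract can be illustrated with elementary arithmetic. The sketch below is not the authors' code. It assumes only that the K rate categories are non-negative and equally weighted, in which case the variance cannot exceed (K - 1) times the squared mean. That closed form is derived here from those assumptions rather than quoted from the paper, and the function names are hypothetical.

```python
def discrete_variance(rates):
    """Variance of a distribution that puts probability 1/K on each rate."""
    k = len(rates)
    mean = sum(rates) / k
    return sum((r - mean) ** 2 for r in rates) / k


def variance_upper_bound(k, mean=1.0):
    """Largest variance reachable with K equiprobable, non-negative rates.

    With sum(rates) = K * mean and every rate >= 0, the sum of squares is
    at most (K * mean) ** 2, so the variance is at most (K - 1) * mean ** 2.
    """
    return (k - 1) * mean ** 2


# Extreme case for K = 4 and mean 1: three categories at rate 0, one at rate 4.
extreme = [0.0, 0.0, 0.0, 4.0]
# A less extreme set of four rates with the same mean.
moderate = [0.1, 0.4, 1.0, 2.5]

if __name__ == "__main__":
    print(discrete_variance(extreme), variance_upper_bound(4))
    print(discrete_variance(moderate), variance_upper_bound(4))
```

Under these assumptions, raising K raises the ceiling, which is consistent with the abstract's recommendation to consider additional rate categories when the variance parameter itself is of interest.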


Subject(s)
Amino Acid Substitution , Models, Genetic , Sequence Analysis, DNA/methods , Algorithms , Evolution, Molecular , Likelihood Functions , Mutation Rate , Phylogeny , Sequence Alignment
2.
Mol Biol Evol ; 37(8): 2430-2439, 2020 08 01.
Article in English | MEDLINE | ID: mdl-32068869

ABSTRACT

Most molecular evolutionary studies of natural selection maintain the decades-old assumption that synonymous substitution rate variation (SRV) across sites within genes occurs at levels that are either nonexistent or negligible. However, numerous studies challenge this assumption from a biological perspective and show that SRV is comparable in magnitude to that of nonsynonymous substitution rate variation. We evaluated the impact of this assumption on methods for inferring selection at the molecular level by incorporating SRV into an existing method (BUSTED) for detecting signatures of episodic diversifying selection in genes. Using simulated data we found that failing to account for even moderate levels of SRV in selection testing is likely to produce intolerably high false positive rates. To evaluate the effect of the SRV assumption on actual inferences we compared results of tests with and without the assumption in an empirical analysis of over 13,000 Euteleostomi (bony vertebrate) gene alignments from the Selectome database. This exercise reveals that close to 50% of positive results (i.e., evidence for selection) in empirical analyses disappear when SRV is modeled as part of the statistical analysis and are thus candidates for being false positives. The results from this work add to a growing literature establishing that tests of selection are much more sensitive to certain model assumptions than previously believed.


Subject(s)
Models, Genetic , Selection, Genetic , Silent Mutation , Animals , Phylogeny , Rhodopsin/genetics , Vertebrates/genetics
3.
Mol Biol Evol ; 37(1): 295-299, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31504749

ABSTRACT

HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework. It has become a popular choice for characterizing various aspects of the evolutionary process: natural selection, evolutionary rates, recombination, and coevolution. The 2.5 release (available from www.hyphy.org) includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.


Subject(s)
Genetic Techniques , Phylogeny , Software
4.
Mol Biol Evol ; 35(3): 773-777, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29301006

ABSTRACT

Inference of how evolutionary forces have shaped extant genetic diversity is a cornerstone of modern comparative sequence analysis. Advances in sequence generation and increased statistical sophistication of relevant methods now allow researchers to extract ever more evolutionary signal from the data, albeit at an increased computational cost. Here, we announce the release of Datamonkey 2.0, a completely re-engineered version of the Datamonkey web-server for analyzing evolutionary signatures in sequence data. For this endeavor, we leveraged recent developments in open-source libraries that facilitate interactive, robust, and scalable web application development. Datamonkey 2.0 provides a carefully curated collection of methods for interrogating coding-sequence alignments for imprints of natural selection, packaged as a responsive (i.e. can be viewed on tablet and mobile devices), fully interactive, and API-enabled web application. To complement Datamonkey 2.0, we additionally release HyPhy Vision, an accompanying JavaScript application for visualizing analysis results. HyPhy Vision can also be used separately from Datamonkey 2.0 to visualize locally executed HyPhy analyses. Together, Datamonkey 2.0 and HyPhy Vision showcase how scientific software development can benefit from general-purpose open-source frameworks. Datamonkey 2.0 is freely and publicly available at http://www.datamonkey.org, and the underlying codebase is available from https://github.com/veg/datamonkey-js.

6.
J Mol Evol ; 73(5-6): 266-72, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22258433

ABSTRACT

While molecular analyses have provided insight into the phylogeny of ciliates, the few studies assessing intraspecific variation have largely relied on just a single locus [e.g., nuclear small subunit rDNA (nSSU-rDNA) or mitochondrial cytochrome oxidase I]. In this study, we characterize the diversity of several nuclear protein-coding genes plus both nSSU-rDNA and mitochondrial small subunit rDNA (mtSSU-rDNA) of five isolates of the ciliate morphospecies Chilodonella uncinata. Although these isolates have nearly identical nSSU-rDNA sequences, they differ by up to 8.0% in mtSSU-rDNA. Comparative analyses of all loci, including β-tubulin paralogs, indicate a lack of recombination between strains, demonstrating that the morphospecies C. uncinata consists of multiple cryptic species. Further, there is considerable variation in substitution rates among loci as some protein-coding domains are nearly identical between isolates, while others differ by up to 13.2% at the amino acid level. Combining insights on macronuclear variation among isolates, the focus of this study, with published data from the micronucleus of two of these isolates, indicates that C. uncinata lineages are able to maintain both highly divergent and highly conserved genes within a rapidly evolving germline genome.


Subject(s)
Ciliophora/genetics , DNA, Ribosomal/genetics , Evolution, Molecular , Ciliophora/classification , Genome , Mitochondrial Proteins/genetics , Nuclear Proteins/genetics , Phylogeny , Recombination, Genetic/genetics , Species Specificity , Tubulin/genetics
7.
PLoS Comput Biol ; 6(8)2010 Aug 19.
Article in English | MEDLINE | ID: mdl-20808876

ABSTRACT

Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene and organism specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.


Subject(s)
Algorithms , Amino Acid Substitution/genetics , Codon , Models, Genetic , Computer Simulation , DNA-Directed DNA Polymerase/genetics , Evolution, Molecular , HIV-1/genetics , Hemagglutinins/genetics , Humans , Markov Chains , Sequence Alignment
8.
PLoS One ; 5(7): e11230, 2010 Jul 30.
Article in English | MEDLINE | ID: mdl-20689581

ABSTRACT

Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators are biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a "corrected" empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the de facto standard approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the de facto standard estimators.


Subject(s)
Codon , Models, Statistical , Algorithms , Bias
9.
PLoS One ; 5(7): e11587, 2010 Jul 21.
Article in English | MEDLINE | ID: mdl-20657773

ABSTRACT

The single rate codon model of non-synonymous substitution is ubiquitous in phylogenetic modeling. Indeed, the use of a non-synonymous to synonymous substitution rate ratio parameter has facilitated the interpretation of selection pressure on genomes. Although the single rate model has achieved wide acceptance, we argue that the assumption of a single rate of non-synonymous substitution is biologically unreasonable, given observed differences in substitution rates evident from empirical amino acid models. Some have attempted to incorporate amino acid substitution biases into models of codon evolution and have shown improved model performance versus the single rate model. Here, we show that the single rate model of non-synonymous substitution is easily outperformed by a model with multiple non-synonymous rate classes, yet in which amino acid substitution pairs are assigned randomly to these classes. We argue that, since the single rate model is so easy to improve upon, new codon models should not be validated entirely on the basis of improved model fit over this model. Rather, we should strive to both improve on the single rate model and to approximate the general time-reversible model of codon substitution, with as few parameters as possible, so as to reduce model over-fitting. We hint at how this can be achieved with a Genetic Algorithm approach in which rate classes are assigned on the basis of sequence information content.


Subject(s)
Codon , Models, Genetic , Algorithms
10.
J Virol ; 82(10): 5099-103, 2008 May.
Article in English | MEDLINE | ID: mdl-18321976

ABSTRACT

To understand astrovirus biology, it is essential to understand factors associated with its evolution. The current study reports the genomic sequences of nine novel turkey astrovirus (TAstV) type 2-like clinical isolates. This represents, to our knowledge, the largest genomic-length data set available for any one astrovirus type. The comparison of these TAstV sequences suggests that the TAstV species contains multiple subtypes and that recombination events have occurred across the astrovirus genome. In addition, the analysis of the capsid gene demonstrated evidence for both site-specific positive selection and purifying selection.


Subject(s)
Avastrovirus/genetics , Animals , Astroviridae Infections/virology , Avastrovirus/isolation & purification , Genome, Viral , Phylogeny , Poultry Diseases/virology , RNA, Viral/genetics , Recombination, Genetic , Sequence Analysis, DNA , Sequence Homology , Turkeys , United States
11.
Mol Biol Evol ; 24(1): 159-70, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17038448

ABSTRACT

The choice of a probabilistic model to describe sequence evolution can and should be justified. Underfitting the data through the use of overly simplistic models may miss out on interesting phenomena and lead to incorrect inferences. Overfitting the data with models that are too complex may ascribe biological meaning to statistical artifacts and result in falsely significant findings. We describe a likelihood-based approach for evolutionary model selection. The procedure employs a genetic algorithm (GA) to quickly explore a combinatorially large set of all possible time-reversible Markov models with a fixed number of substitution rates. When applied to stem RNA data subject to well-understood evolutionary forces, the models found by the GA 1) capture the expected overall rate patterns a priori; 2) fit the data better than the best available models based on a priori assumptions, suggesting subtle substitution patterns not previously recognized; 3) cannot be rejected in favor of the general reversible model, implying that the evolution of stem RNA sequences can be explained well with only a few substitution rate parameters; and 4) perform well on simulated data, both in terms of goodness of fit and the ability to estimate evolutionary rates. We also investigate the utility of several distance measures for comparing and contrasting inferred evolutionary models. Using widely available small computer clusters, our approach makes it possible, for the first time, to evaluate the performance of existing RNA evolutionary models by comparing them with a large pool of candidate models and to validate common modeling assumptions. In addition, the new method provides the foundation for rigorous selection and comparison of substitution models for other types of sequence data.


Subject(s)
Algorithms , Evolution, Molecular , Nucleic Acid Conformation , RNA/chemistry , Animals , Computational Biology , HIV/genetics , Invertebrates/genetics , Likelihood Functions , Mammals/genetics , Models, Genetic , RNA/genetics , Response Elements , Sequence Alignment
12.
Mol Biol Evol ; 23(9): 1681-7, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16760419

ABSTRACT

Studies of microbial eukaryotes have been pivotal in the discovery of biological phenomena, including RNA editing, self-splicing RNA, and telomere addition. Here we extend this list by demonstrating that genome architecture, namely the extensive processing of somatic (macronuclear) genomes in some ciliate lineages, is associated with elevated rates of protein evolution. Using newly developed likelihood-based procedures for studying molecular evolution, we investigate 6 genes to compare 1) ciliate protein evolution to that of 3 other clades of eukaryotes (plants, animals, and fungi) and 2) protein evolution in ciliates with extensively processed macronuclear genomes to that of other ciliate lineages. In 5 of the 6 genes, ciliates are estimated to have a higher ratio of nonsynonymous/synonymous substitution rates, consistent with an increase in the rate of protein diversification in ciliates relative to other eukaryotes. Even more striking, there is a significant effect of genome architecture within ciliates as the most divergent proteins are consistently found in those lineages with the most highly processed macronuclear genomes. We propose a model whereby genome architecture (specifically chromosomal processing, amitosis within macronuclei, and epigenetics) allows ciliates to explore protein space in a novel manner. Further, we predict that examination of diverse eukaryotes will reveal additional evidence of the impact of genome architecture on molecular evolution.


Subject(s)
Ciliophora/genetics , Evolution, Molecular , Genetic Variation , Genome, Protozoan , Selection, Genetic , Animals , Species Specificity
13.
Mol Biol Evol ; 22(12): 2375-85, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16107593

ABSTRACT

We develop a new model for studying the molecular evolution of protein-coding DNA sequences. In contrast to existing models, we incorporate the potential for site-to-site heterogeneity of both synonymous and nonsynonymous substitution rates. We demonstrate that within-gene heterogeneity of synonymous substitution rates appears to be common. Using the new family of models, we investigate the utility of a variety of new statistical inference procedures, and we pay particular attention to issues surrounding the detection of sites undergoing positive selection. We discuss how failure to model synonymous rate variation in the model can lead to misidentification of sites as positively selected.


Subject(s)
Amino Acid Substitution , Evolution, Molecular , Models, Genetic , Selection, Genetic , Amino Acids/genetics , Codon , Genetic Variation , Humans , Mutation , Open Reading Frames , Phylogeny
14.
J Mol Evol ; 61(3): 325-32, 2005 Sep.
Article in English | MEDLINE | ID: mdl-16044247

ABSTRACT

We analyze members of the receptor-like kinase (RLK) gene family in Arabidopsis thaliana for positive selection. Likelihood analyses find evidence for positive selection in 12 of the 52 RLK family sequence groups. These 12 groups represent 97 of the 403 sequences analyzed. The majority of genes in groups subject to positive selection have not been functionally characterized, but sites under selection are predominantly located in the extracellular region. The pattern of selection in the extracellular leucine-rich repeat (LRR) motif of groups 14 and 51 is similar to previous studies where positively selected positions are located in a solvent exposed beta-strand that may determine disease specificity, raising the possibility that some RLK genes function in a similar role.


Subject(s)
Arabidopsis Proteins/classification , Arabidopsis Proteins/genetics , Arabidopsis/classification , Arabidopsis/genetics , Phosphotransferases/classification , Phosphotransferases/genetics , Arabidopsis/enzymology , Leucine/genetics , Phylogeny
15.
Bioinformatics ; 21(9): 2128-9, 2005 May 01.
Article in English | MEDLINE | ID: mdl-15705655

ABSTRACT

SUMMARY: PowerMarker delivers a data-driven, integrated analysis environment (IAE) for genetic data. The IAE integrates data management, analysis and visualization in a user-friendly graphical user interface. It accelerates the analysis lifecycle and enables users to maintain data integrity throughout the process. An ever-growing list of more than 50 different statistical analyses for genetic markers has been implemented in PowerMarker. AVAILABILITY: www.powermarker.net


Subject(s)
DNA Mutational Analysis/methods , Genetic Markers/genetics , Polymorphism, Single Nucleotide/genetics , Software , User-Computer Interface , Algorithms , Computer Graphics , Gene Frequency , Genetics, Population/methods , Linkage Disequilibrium/genetics , Oligonucleotide Array Sequence Analysis/methods
16.
Bioinformatics ; 21(5): 676-9, 2005 Mar 01.
Article in English | MEDLINE | ID: mdl-15509596

ABSTRACT

UNLABELLED: The HyPhy package is designed to provide a flexible and unified platform for carrying out likelihood-based analyses on multiple alignments of molecular sequence data, with the emphasis on studies of rates and patterns of sequence evolution. AVAILABILITY: http://www.hyphy.org CONTACT: muse@stat.ncsu.edu SUPPLEMENTARY INFORMATION: HyPhy documentation and tutorials are available at http://www.hyphy.org.


Subject(s)
Algorithms , Evolution, Molecular , Models, Genetic , Phylogeny , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , User-Computer Interface , Computer Simulation
17.
Syst Biol ; 53(5): 685-92, 2004 Oct.
Article in English | MEDLINE | ID: mdl-15545249

ABSTRACT

Likelihood applications have become a central approach for molecular evolutionary analyses since the first computationally tractable treatment two decades ago. Although Felsenstein's original pruning algorithm makes likelihood calculations feasible, it is usually possible to take advantage of repetitive structure present in the data to arrive at even greater computational reductions. In particular, alignment columns with certain similarities have components of the likelihood calculation that are identical and need not be recomputed if columns are evaluated in an optimal order. We develop an algorithm for exploiting this speed improvement via an application of graph theory. The reductions provided by the method depend on both the tree and the data, but typical savings range between 15% and 50%. Real-data examples with time reductions of 80% have been identified. The overhead costs associated with implementing the algorithm are minimal, and they are recovered in all but the smallest data sets. The modifications will provide faster likelihood algorithms, which will allow likelihood methods to be applied to larger sets of taxa and to include more thorough searches of the tree topology space.


Subject(s)
Algorithms , Classification/methods , Evolution, Molecular , Models, Genetic , Phylogeny , Base Sequence/genetics , Likelihood Functions
18.
Mol Biol Evol ; 21(3): 555-62, 2004 Mar.
Article in English | MEDLINE | ID: mdl-14694079

ABSTRACT

The accumulation of divergent histone H4 amino acid sequences within and between ciliate lineages challenges traditional views of the evolution of this essential eukaryotic protein. We analyzed histone H4 sequences from 13 species of ciliates and compared these data with sequences from well-sampled eukaryotic clades. Ciliate histone H4s differ from one another at as many as 46% of their amino acids, in contrast with the highly conserved character of this protein in most other eukaryotes. Equally striking, we find paralogs of histone H4 within ciliate genomes that differ by up to 25% of their amino acids, whereas paralogs in other eukaryotes share identical or nearly identical amino acid sequences. Moreover, the most divergent H4 proteins within ciliates are found in the lineages with highly processed macronuclear genomes. Our analyses demonstrate that the dual nature of ciliate genomes (the presence of a "germline" micronucleus and a "somatic" macronucleus within each cell) allowed the dramatic variation in ciliate histone genes by altering functional constraints or enabling adaptive evolution of the histone H4 protein, or both.


Subject(s)
Ciliophora/genetics , Evolution, Molecular , Genetic Variation , Histones/genetics , Amino Acid Substitution , Animals , Genealogy and Heraldry
19.
Evolution ; 56(6): 1110-22, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12144013

ABSTRACT

Ciliates provide a powerful system to analyze the evolution of duplicated alpha-tubulin genes in the context of single-celled organisms. Genealogical analyses of ciliate alpha-tubulin sequences reveal five apparently recent gene duplications. Comparisons of paralogs in different ciliates implicate differing patterns of substitutions (e.g., ratios of replacement/synonymous nucleotides and radical/conservative amino acids) following duplication. Most substitutions between paralogs in Euplotes crassus, Halteria grandinella and Paramecium tetraurelia are synonymous. In contrast, alpha-tubulin paralogs within Stylonychia lemnae and Chilodonella uncinata are evolving at significantly different rates and have higher ratios of both replacement substitutions to synonymous substitutions and radical amino acid changes to conservative amino acid changes. Moreover, the amino acid substitutions in C. uncinata and S. lemnae paralogs are limited to short stretches that correspond to functionally important regions of the alpha-tubulin protein. The topology of ciliate alpha-tubulin genealogies is inconsistent with taxonomy based on morphology and other molecular markers, which may be due to taxonomic sampling, gene conversion, unequal rates of evolution, or asymmetric patterns of gene duplication and loss.


Subject(s)
Ciliophora/genetics , Evolution, Molecular , Gene Duplication , Tubulin/genetics , Amino Acid Sequence , Animals , Base Sequence , Cloning, Molecular , Genetic Variation , Molecular Sequence Data , Phylogeny , Sequence Alignment , Sequence Homology, Amino Acid , Sequence Homology, Nucleic Acid