1.
Nat Rev Genet ; 17(2): 109-21, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26781812

ABSTRACT

It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific mutation rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.


Subjects
Molecular Evolution , Proteins/genetics , Computational Biology/methods , Proteins/chemistry , Proteins/metabolism
2.
Mol Biol Evol ; 37(7): 2110-2123, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32191313

ABSTRACT

It is regarded as best practice in phylogenetic reconstruction to perform relative model selection to determine an appropriate evolutionary model for the data. This procedure ranks a set of candidate models according to their goodness of fit to the data, commonly using an information theoretic criterion. Users then specify the best-ranking model for inference. Although it is often assumed that better-fitting models translate to increased accuracy, recent studies have shown that the specific model employed may not substantially affect inferences. We examine whether there is a systematic relationship between relative model fit and topological inference accuracy in protein phylogenetics, using simulations and real sequences. Simulations employed site-heterogeneous mechanistic codon models that are distinct from protein-level phylogenetic inference models, allowing us to investigate how protein models perform when they are misspecified to the data, as will be the case for any real sequence analysis. We broadly find that phylogenies inferred across models with vastly different fits to the data produce highly consistent topologies. We additionally find that all models infer similar proportions of false-positive splits, raising the possibility that all available models of protein evolution are similarly misspecified. Moreover, we find that the parameter-rich GTR (general time reversible) model, whose amino acid exchangeabilities are free parameters, performs similarly to models with fixed exchangeabilities, although the inference precision associated with GTR models was not examined. We conclude that, although relative model selection may not hinder phylogenetic analysis on protein data, it may not offer specific predictable improvements and is not a reliable proxy for accuracy.


Subjects
Genetic Models , Phylogeny , Computer Simulation
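
The ranking step described in the abstract above can be illustrated with a minimal sketch. It assumes each candidate model has already been fitted and reports a log-likelihood and a free-parameter count; the model names and numbers below are hypothetical and are not taken from the paper. Using the number of alignment sites as the BIC sample size is a common convention, also assumed here.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate better relative fit."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion, with alignment length as sample size (assumption)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def rank_models(fits, n_sites, criterion="AIC"):
    """Return model names ordered from best to worst score.

    fits maps a model name to a (log_likelihood, n_params) pair.
    """
    scores = {}
    for name, (lnl, k) in fits.items():
        if criterion == "AIC":
            scores[name] = aic(lnl, k)
        else:
            scores[name] = bic(lnl, k, n_sites)
    return sorted(scores, key=scores.get)


# Hypothetical fits for three candidate models on a 300-site alignment.
example_fits = {
    "model_A": (-10250.0, 5),
    "model_B": (-10100.0, 24),
    "model_C": (-10090.0, 194),
}

if __name__ == "__main__":
    print(rank_models(example_fits, n_sites=300, criterion="AIC"))
    print(rank_models(example_fits, n_sites=300, criterion="BIC"))
```

Note that the ranking only says which model fits best relative to the others in the set; as the abstract stresses, it does not by itself say anything about the accuracy of the inferred topology.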
3.
Mol Biol Evol ; 37(1): 295-299, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-31504749

ABSTRACT

HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework. It has become a popular choice for characterizing various aspects of the evolutionary process: natural selection, evolutionary rates, recombination, and coevolution. The 2.5 release (available from www.hyphy.org) includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.


Subjects
Genetic Techniques , Phylogeny , Software
4.
Mol Biol Evol ; 35(9): 2307-2317, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29924340

ABSTRACT

The relative evolutionary rates at individual sites in proteins are informative measures of conservation or adaptation. Often used as evolutionarily aware conservation scores, relative rates reveal key functional or strongly selected residues. Estimating rates in a phylogenetic context requires specifying a protein substitution model, which is typically a phenomenological model trained on a large empirical data set. A strong emphasis has traditionally been placed on selecting the "best-fit" model, with the implicit understanding that suboptimal or otherwise ill-fitting models might bias inferences. However, the pervasiveness and degree of such bias has not been systematically examined. We investigated how model choice impacts site-wise relative rates in a large set of empirical protein alignments. We compared models designed for use on any general protein, models designed for specific domains of life, and the simple equal-rates Jukes Cantor-style model (JC). As expected, information theoretic measures showed overwhelming evidence that some models fit the data decidedly better than others. By contrast, estimates of site-specific evolutionary rates were impressively insensitive to the substitution model used, revealing an unexpected degree of robustness to potential model misspecification. A deeper examination of the fewer than 5% of sites for which model inferences differed in a meaningful way showed that the JC model could uniquely identify rapidly evolving sites that models with empirically derived exchangeabilities failed to detect. We conclude that relative protein rates appear robust to the applied substitution model, and any sensible model of protein evolution, regardless of its fit to the data, should produce broadly consistent evolutionary rates.


Subjects
Molecular Evolution , Genetic Techniques , Genetic Models , Proteins/genetics
5.
Mol Biol Evol ; 35(3): 773-777, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29301006

ABSTRACT

Inference of how evolutionary forces have shaped extant genetic diversity is a cornerstone of modern comparative sequence analysis. Advances in sequence generation and increased statistical sophistication of relevant methods now allow researchers to extract ever more evolutionary signal from the data, albeit at an increased computational cost. Here, we announce the release of Datamonkey 2.0, a completely re-engineered version of the Datamonkey web-server for analyzing evolutionary signatures in sequence data. For this endeavor, we leveraged recent developments in open-source libraries that facilitate interactive, robust, and scalable web application development. Datamonkey 2.0 provides a carefully curated collection of methods for interrogating coding-sequence alignments for imprints of natural selection, packaged as a responsive (i.e. can be viewed on tablet and mobile devices), fully interactive, and API-enabled web application. To complement Datamonkey 2.0, we additionally release HyPhy Vision, an accompanying JavaScript application for visualizing analysis results. HyPhy Vision can also be used separately from Datamonkey 2.0 to visualize locally executed HyPhy analyses. Together, Datamonkey 2.0 and HyPhy Vision showcase how scientific software development can benefit from general-purpose open-source frameworks. Datamonkey 2.0 is freely and publicly available at http://www.datamonkey.org, and the underlying codebase is available from https://github.com/veg/datamonkey-js.

6.
Mol Biol Evol ; 33(11): 2990-3002, 2016 11.
Article in English | MEDLINE | ID: mdl-27512115

ABSTRACT

The mutation-selection model of coding sequence evolution has received renewed attention for its use in estimating site-specific amino acid propensities and selection coefficient distributions. Two computationally tractable mutation-selection inference frameworks have been introduced: One framework employs a fixed-effects, highly parameterized maximum likelihood approach, whereas the other employs a random-effects Bayesian Dirichlet Process approach. While both implementations follow the same model, they appear to make distinct predictions about the distribution of selection coefficients. The fixed-effects framework estimates a large proportion of highly deleterious substitutions, whereas the random-effects framework estimates that all substitutions are either nearly neutral or weakly deleterious. It remains unknown, however, how accurately each method infers evolutionary constraints at individual sites. Indeed, selection coefficient distributions pool all site-specific inferences, thereby obscuring a precise assessment of site-specific estimates. Therefore, in this study, we use a simulation-based strategy to determine how accurately each approach recapitulates the selective constraint at individual sites. We find that the fixed-effects approach, despite its extensive parameterization, consistently and accurately estimates site-specific evolutionary constraint. By contrast, the random-effects Bayesian approach systematically underestimates the strength of natural selection, particularly for slowly evolving sites. We also find that, despite the strong differences between their inferred selection coefficient distributions, the fixed- and random-effects approaches yield surprisingly similar inferences of site-specific selective constraint. We conclude that the fixed-effects mutation-selection framework provides the more reliable software platform for model application and future development.


Subjects
Genetic Models , Mutation , DNA Sequence Analysis/methods , Amino Acid Substitution , Amino Acids/genetics , Bayes Theorem , Molecular Evolution , Genetic Variation , Likelihood Functions , Open Reading Frames , Phylogeny , Genetic Selection
7.
Mol Biol Evol ; 32(4): 1097-108, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25576365

ABSTRACT

Numerous computational methods exist to assess the mode and strength of natural selection in protein-coding sequences, yet how distinct methods relate to one another remains largely unknown. Here, we elucidate the relationship between two widely used phylogenetic modeling frameworks: dN/dS models and mutation-selection (MutSel) models. We derive a mathematical relationship between dN/dS and scaled selection coefficients, the focal parameters of MutSel models, and use this relationship to gain deeper insight into the behaviors, limitations, and applicabilities of these two modeling frameworks. We prove that, if all synonymous changes are neutral, standard MutSel models correspond to dN/dS ≤ 1. However, if synonymous codons differ in fitness, dN/dS can take on arbitrarily high values even if all selection is purifying. Thus, the MutSel modeling framework cannot necessarily accommodate positive, diversifying selection, while dN/dS cannot distinguish between purifying selection on synonymous codons and positive selection on amino acids. We further propose a new benchmarking strategy of dN/dS inferences against MutSel simulations and demonstrate that the widely used Goldman-Yang-style dN/dS models yield substantially biased dN/dS estimates on realistic sequence data. In contrast, the less frequently used Muse-Gaut-style models display much less bias. Strikingly, the least-biased and most precise dN/dS estimates are never found in the models with the best fit to the data, measured through both AIC and BIC scores. Thus, selecting models based on goodness-of-fit criteria can yield poor parameter estimates if the models considered do not precisely correspond to the underlying mechanism that generated the data. In conclusion, establishing mathematical links among modeling frameworks represents a novel, powerful strategy to pinpoint previously unrecognized model limitations and strengths.


Subjects
Codon , Genomics , Phylogeny , Genetic Selection , Computer Simulation , Genetic Variation , Genetic Models
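
The claim in the abstract above, that standard MutSel models correspond to a rate ratio of at most 1 when synonymous changes are neutral, can be illustrated with a toy calculation. This is a simplified sketch and not the codon-level derivation from the paper: it assumes a small set of abstract states with equal, symmetric mutation rates, scaled fitnesses F_i, equilibrium frequencies proportional to exp(F_i), and a relative fixation rate S / (1 - exp(-S)) for a change with scaled selection coefficient S = F_j - F_i. All function names are hypothetical.

```python
import math


def fixation_factor(s):
    """Fixation rate relative to neutrality, S / (1 - exp(-S)); equals 1 at S = 0."""
    if abs(s) < 1e-12:
        return 1.0
    return s / -math.expm1(-s)


def rate_ratio(fitness):
    """Equilibrium substitution rate under selection divided by the neutral rate.

    Assumes at least two states, equal symmetric mutation rates between all
    pairs of states, and moderate fitness differences (toy setting).
    """
    top = max(fitness)
    weights = [math.exp(f - top) for f in fitness]
    total = sum(weights)
    freqs = [w / total for w in weights]

    selected = 0.0
    neutral = 0.0
    for i, f_i in enumerate(fitness):
        for j, f_j in enumerate(fitness):
            if i != j:
                selected += freqs[i] * fixation_factor(f_j - f_i)
                neutral += freqs[i]
    return selected / neutral


if __name__ == "__main__":
    print(rate_ratio([0.0, 0.0, 0.0, 0.0]))   # all states equally fit
    print(rate_ratio([0.0, 2.0, -1.0, 4.0]))  # unequal fitnesses
```

In this toy setting the ratio equals 1 only when all states are equally fit and drops below 1 otherwise, because each pair of states contributes 2·π_i·S/(1 - exp(-S)), which never exceeds the neutral contribution π_i·(1 + exp(S)).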
8.
Mol Biol Evol ; 31(9): 2496-500, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24899665

ABSTRACT

Errors in multiple sequence alignments (MSAs) can reduce accuracy in positive-selection inference. Therefore, it has been suggested to filter MSAs before conducting further analyses. One widely used filter, Guidance, allows users to remove MSA positions aligned with low confidence. However, Guidance's utility in positive-selection inference has been disputed in the literature. We have conducted an extensive simulation-based study to characterize fully how Guidance impacts positive-selection inference, specifically for protein-coding sequences of realistic divergence levels. We also investigated whether novel scoring algorithms, which phylogenetically corrected confidence scores, and a new gap-penalization score-normalization scheme improved Guidance's performance. We found that no filter, including original Guidance, consistently benefitted positive-selection inferences. Moreover, all improvements detected were exceedingly minimal, and in certain circumstances, Guidance-based filters worsened inferences.


Subjects
Computational Biology/methods , Sequence Alignment/methods , Computer Simulation , Proteins/genetics , Genetic Selection , Software
9.
J Mol Evol ; 79(3-4): 130-42, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25217382

ABSTRACT

Several recent works have shown that protein structure can predict site-specific evolutionary sequence variation. In particular, sites that are buried and/or have many contacts with other sites in a structure have been shown to evolve more slowly, on average, than surface sites with few contacts. Here, we present a comprehensive study of the extent to which numerous structural properties can predict sequence variation. The quantities we considered include buriedness (as measured by relative solvent accessibility), packing density (as measured by contact number), structural flexibility (as measured by B factors, root-mean-square fluctuations, and variation in dihedral angles), and variability in designed structures. We obtained structural flexibility measures both from molecular dynamics simulations performed on nine non-homologous viral protein structures and from variation in homologous variants of those proteins, where they were available. We obtained measures of variability in designed structures from flexible-backbone design in the Rosetta software. We found that most of the structural properties correlate with site variation in the majority of structures, though the correlations are generally weak (correlation coefficients of 0.1-0.4). Moreover, we found that buriedness and packing density were better predictors of evolutionary variation than structural flexibility. Finally, variability in designed structures was a weaker predictor of evolutionary variability than buriedness or packing density, but it was comparable in its predictive power to the best structural flexibility measures. We conclude that simple measures of buriedness and packing density are better predictors of evolutionary variation than the more complicated predictors obtained from dynamic simulations, ensembles of homologous structures, or computational protein design.


Subjects
Molecular Evolution , Viral Proteins/chemistry , Amino Acid Sequence , Entropy , Molecular Dynamics Simulation , Protein Conformation
10.
J Mol Evol ; 76(3): 172-82, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23355009

ABSTRACT

We have investigated the influence of the plasma membrane environment on the molecular evolution of G protein-coupled receptors (GPCRs), the largest receptor family in Metazoa. In particular, we have analyzed the site-specific rate variation across the two primary structural partitions, transmembrane (TM) and extramembrane (EM), of these membrane proteins. We find that TM domains evolve more slowly than do EM domains, though TM domains display increased rate heterogeneity relative to their EM counterparts. Although the majority of residues across GPCRs experience strong to weak purifying selection, many GPCRs experience positive selection at both TM and EM residues, albeit with a slight bias towards the EM. Further, a subset of GPCRs, chemosensory receptors (including olfactory and taste receptors), exhibit increased rates of evolution relative to other GPCRs, an effect which is more pronounced in their TM spans. Although it has been previously suggested that the TM's low evolutionary rate is caused by their high percentage of buried residues, we show that their attenuated rate seems to stem from the strong biophysical constraints of the membrane itself, or by functional requirements. In spite of the strong evolutionary constraints acting on the TM spans of GPCRs, positive selection and high levels of evolutionary rate variability are common. Thus, biophysical constraints should not be presumed to preclude a protein's ability to evolve.


Subjects
Cell Membrane/physiology , G-Protein-Coupled Receptors/chemistry , G-Protein-Coupled Receptors/genetics , Genetic Selection , Amino Acid Sequence/genetics , Animals , Cell Membrane/metabolism , Protein Databases , Molecular Evolution , Humans , Molecular Models , Mutation Rate , Secondary Protein Structure/genetics , Tertiary Protein Structure/genetics , G-Protein-Coupled Receptors/metabolism , Genetic Selection/genetics , Genetic Selection/physiology
11.
Cell Genom ; 3(7): 100340, 2023 Jul 12.
Article in English | MEDLINE | ID: mdl-37492101

ABSTRACT

Pediatric brain and spinal cancers are collectively the leading disease-related cause of death in children; thus, we urgently need curative therapeutic strategies for these tumors. To accelerate such discoveries, the Children's Brain Tumor Network (CBTN) and Pacific Pediatric Neuro-Oncology Consortium (PNOC) created a systematic process for tumor biobanking, model generation, and sequencing with immediate access to harmonized data. We leverage these data to establish OpenPBTA, an open collaborative project with over 40 scalable analysis modules that genomically characterize 1,074 pediatric brain tumors. Transcriptomic classification reveals universal TP53 dysregulation in mismatch repair-deficient hypermutant high-grade gliomas and TP53 loss as a significant marker for poor overall survival in ependymomas and H3 K28-mutant diffuse midline gliomas. Already being actively applied to other pediatric cancers and PNOC molecular tumor board decision-making, OpenPBTA is an invaluable resource to the pediatric oncology community.

12.
Life (Basel) ; 12(7)2022 Jun 24.
Article in English | MEDLINE | ID: mdl-35888041

ABSTRACT

The geosphere of primitive Earth was the source of life's essential building blocks, and the geochemical interactions among chemical elements can inform the origins of biological roles of each element. Minerals provide a record of the fundamental properties that each chemical element contributes to crustal composition, evolution, and subsequent biological utilization. In this study, we investigate correlations between the mineral species and bulk crustal composition of each chemical element. There are statistically significant correlations between the number of elements that each element forms minerals with (#-mineral-elements) and the log of the number of mineral species that each element occurs in, and between #-mineral-elements and the log of the number of mineral localities of that element. There is a lesser correlation between the log of the crustal percentage of each element and #-mineral-elements. In the crustal percentage vs. #-mineral-elements plot, positive outliers have either important biological roles (S, Cu) or toxic biological impacts (Pb, As), while negative outliers have no biological importance (Sc, Ga, Br, Yb). In particular, S is an important bridge element between organic (e.g., amino acids) and inorganic (metal cofactors) biological components. While C and N rarely form minerals together, the two elements commonly form minerals with H, which coincides with the role of H as an electron donor/carrier in biological nitrogen and carbon fixation. Both abundant crustal percentage vs. #-mineral-elements insiders (elements that follow the correlation) and less abundant outsiders (positive outliers from the correlation) have important biological functions as essential structural elements and catalytic cofactors.

13.
Sci Rep ; 12(1): 4956, 2022 Mar 23.
Article in English | MEDLINE | ID: mdl-35322071

ABSTRACT

Earth surface redox conditions are intimately linked to the co-evolution of the geosphere and biosphere. Minerals provide a record of Earth's evolving surface and interior chemistry in geologic time due to many different processes (e.g. tectonic, volcanic, sedimentary, oxidative, etc.). Here, we show how the bipartite network of minerals and their shared constituent elements expanded and evolved over geologic time. To further investigate network expansion over time, we derive and apply a novel metric (weighted mineral element electronegativity coefficient of variation; wMEECV) to quantify intra-mineral electronegativity variation with respect to redox. We find that element electronegativity and hard soft acid base (HSAB) properties are central factors in mineral redox chemistry under a wide range of conditions. Global shifts in mineral element electronegativity and HSAB associations represented by wMEECV changes at 1.8 and 0.6 billion years ago align with decreased continental elevation followed by the transition from the intermediate ocean and glaciation eras to post-glaciation, increased atmospheric oxygen in the Phanerozoic, and enhanced continental weathering. Consequently, network analysis of mineral element electronegativity and HSAB properties reveals that orogenic activity, evolving redox state of the mantle, planetary oxygenation, and climatic transitions directly impacted the evolving chemical complexity of Earth's crust.

14.
BMC Ecol Evol ; 21(1): 214, 2021 11 29.
Article in English | MEDLINE | ID: mdl-34844571

ABSTRACT

BACKGROUND: Multiple sequence alignments (MSAs) represent the fundamental unit of data inputted to most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction have the potential to induce further errors in downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence time estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to select a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored. RESULTS: We assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances, relative model selection identifies distinct best-fitting models from different MSAs created from the same set of sequences. We find that this issue is more pervasive for nucleotide data compared to amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA. CONCLUSIONS: We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines to a greater extent than has previously been recognized, including relative model selection.


Subjects
Sequence Alignment , Phylogeny , Uncertainty
15.
Geobiology ; 18(2): 127-138, 2020 03.
Article in English | MEDLINE | ID: mdl-32048807

ABSTRACT

The incorporation of metal cofactors into protein active sites and/or active regions expanded the network of microbial metabolism during the Archean eon. The bioavailability of crucial metal cofactors is largely influenced by earth surface redox state, which impacted the timing of metabolic evolution. Vanadium (V) is a unique element in geo-bio-coevolution due to its complex redox chemistry and specific biological functions. Thus, the extent of microbial V utilization potentially represents an important link between the geo- and biospheres in deep time. In this study, we used geochemical modeling and network analysis to investigate the availability and chemical speciation of V in the environment, and the emergence and changing chemistry of V-containing minerals throughout earth history. The redox state of V shifted from a more reduced V(III) state in Archean aqueous geochemistry and mineralogy to more oxidized V(IV) and V(V) states in the Proterozoic and Phanerozoic. The weathering of vanadium sulfides, vanadium alkali metal minerals, and vanadium alkaline earth metal minerals were potential sources of V to the environment and microbial utilization. Community detection analysis of the expanding V mineral network indicates tectonic and redox influence on the distribution of V mineral-forming elements. In reducing environments, energetic drivers existed for V to potentially be involved in early nitrogen fixation, while in oxidizing environments vanadate (VO₄³⁻) could have acted as a metabolic electron acceptor and phosphate-mimicking enzyme inhibitor. The coevolving chemical speciation and biological functions of V due to earth's changing surface redox conditions demonstrate the crucial links between the geosphere and biosphere in the evolution of metabolic electron transfer pathways and biogeochemical cycles from the Archean to Phanerozoic.


Subjects
Vanadium/chemistry , Biological Availability , Earth (Planet) , Oxidation-Reduction , Water
16.
Methods Mol Biol ; 1910: 427-468, 2019.
Article in English | MEDLINE | ID: mdl-31278673

ABSTRACT

Natural selection is a fundamental force shaping organismal evolution, as it both maintains function and enables adaptation and innovation. Viruses, with their typically short and largely coding genomes, experience strong and diverse selective forces, sometimes acting on timescales that can be directly measured. These selection pressures emerge from an antagonistic interplay between rapidly changing fitness requirements (immune and antiviral responses from hosts, transmission between hosts, or colonization of new host species) and functional imperatives (the ability to infect hosts or host cells and replicate within hosts). Indeed, computational methods to quantify these evolutionary forces using molecular sequence data were initially, dating back to the 1980s, applied to the study of viral pathogens. This preference largely emerged because the strong selective forces are easiest to detect in viruses, and, of course, viruses have clear biomedical relevance. Recent commoditization of affordable high-throughput sequencing has made it possible to generate truly massive genomic data sets, on which powerful and accurate methods can yield a very detailed depiction of when, where, and (sometimes) how viral pathogens respond to various selective forces. Here, we present recent statistical developments and state-of-the-art methods to identify and characterize these selection pressures from protein-coding sequence alignments and phylogenies. Methods described here can reveal critical information about various evolutionary regimes, including whole-gene selection, lineage-specific selection, and site-specific selection acting upon viral genomes, while accounting for confounding biological processes, such as recombination and variation in mutation rates.


Subjects
Molecular Evolution , Viral Genome , Genomics , Viruses/genetics , Codon , Computational Biology/methods , Genetic Variation , Genomics/methods , Phylogeny , Genetic Recombination , Genetic Selection , Software , Viruses/classification
17.
PeerJ ; 6: e4339, 2018.
Article in English | MEDLINE | ID: mdl-29423346

ABSTRACT

We introduce LEISR (Likelihood Estimation of Individual Site Rates, pronounced "laser"), a tool to infer relative evolutionary rates from protein and nucleotide data, implemented in HyPhy. LEISR is based on the popular Rate4Site (Pupko et al., 2002) approach for inferring relative site-wise evolutionary rates, primarily from protein data. We extend the original method for more general use in several key ways: (i) we increase the support for nucleotide data with additional models, (ii) we allow for datasets of arbitrary size, (iii) we support analysis of site-partitioned datasets to correct for the presence of recombination breakpoints, (iv) we produce rate estimates at all sites rather than at just a subset of sites, and (v) we implemented LEISR as MPI-enabled to support rapid, high-throughput analysis. LEISR is available in HyPhy starting with version 2.3.8, and it is accessible as an option in the HyPhy analysis menu ("Relative evolutionary rate inference"), which calls the HyPhy batchfile LEISR.bf.

18.
Evolution ; 72(10): 2234-2243, 2018 10.
Article in English | MEDLINE | ID: mdl-30152871

ABSTRACT

Viral gain-of-function mutations frequently evolve during laboratory experiments. Whether the specific mutations that evolve in the lab also evolve in nature and whether they have the same impact on evolution in the real world is unknown. We studied a model virus, bacteriophage λ, that repeatedly evolves to exploit a new host receptor under typical laboratory conditions. Here, we demonstrate that two residues of λ's J protein are required for the new function. In natural λ variants, these amino acid sites are highly diverse and evolve at high rates. Insertions and deletions at these locations are associated with phylogenetic patterns indicative of ecological diversification. Our results show that viral evolution in the laboratory mirrors that in nature and that laboratory experiments can be coupled with protein sequence analyses to identify the causes of viral evolution in the real world. Furthermore, our results provide evidence for widespread host-shift evolution in lambdoid viruses.


Subjects
Bacteriophage lambda/genetics , Molecular Evolution , Gain of Function Mutation/genetics , Genetic Selection , Phylogeny
19.
PLoS One ; 12(4): e0164905, 2017.
Article in English | MEDLINE | ID: mdl-28369116

ABSTRACT

Proteins evolve through two primary mechanisms: substitution, where mutations alter a protein's amino-acid sequence, and insertions and deletions (indels), where amino acids are either added to or removed from the sequence. Protein structure has been shown to influence the rate at which substitutions accumulate across sites in proteins, but whether structure similarly constrains the occurrence of indels has not been rigorously studied. Here, we investigate the extent to which structural properties known to covary with protein evolutionary rates might also predict protein tolerance to indels. Specifically, we analyze a publicly available dataset of single-amino-acid deletion mutations in enhanced green fluorescent protein (eGFP) to assess how well the functional effect of deletions can be predicted from protein structure. We find that weighted contact number (WCN), which measures how densely packed a residue is within the protein's three-dimensional structure, provides the best single predictor for whether eGFP will tolerate a given deletion. We additionally find that using protein design to explicitly model deletions results in improved predictions of functional status when combined with other structural predictors. Our work suggests that structure plays a fundamental role in constraining deletions at sites in proteins, and further that similar biophysical constraints influence both substitutions and deletions. This study therefore provides a solid foundation for future work to examine how protein structure influences tolerance of more complex indel events, such as insertions or large deletions.


Subjects
Green Fluorescent Proteins/chemistry , Green Fluorescent Proteins/genetics , Amino Acid Sequence , Directed Molecular Evolution , Fluorescence , INDEL Mutation , Logistic Models , Molecular Models , Secondary Protein Structure , Sequence Deletion , Support Vector Machine
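
The weighted contact number mentioned in the abstract above can be sketched in a few lines. The abstract does not spell out the formula; this sketch assumes the commonly used definition, in which the WCN of residue i is the sum over all other residues j of 1 / r_ij², where r_ij is the distance between one representative coordinate per residue (for example the C-alpha atom). The function name and the toy coordinates are hypothetical.

```python
import math


def weighted_contact_numbers(coords):
    """Return one WCN value per residue.

    coords is a list of (x, y, z) tuples, one representative atom per residue.
    WCN_i = sum over j != i of 1 / r_ij**2 (assumed, commonly used definition).
    """
    wcn = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i != j:
                total += 1.0 / math.dist(a, b) ** 2
        wcn.append(total)
    return wcn


if __name__ == "__main__":
    # Three residues on a line, 1 unit apart: the middle one is most densely packed.
    toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(weighted_contact_numbers(toy_coords))
```

Residues with many close neighbours get large values, which is why the quantity serves as a measure of packing density.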
20.
F1000Res ; 6: 1845, 2017.
Article in English | MEDLINE | ID: mdl-29167739

ABSTRACT

We describe how to measure site-specific rates of evolution in protein-coding genes and how to correlate these rates with structural features of the expressed protein, such as relative solvent accessibility, secondary structure, or weighted contact number. We present two alternative approaches to rate calculations: one based on relative amino-acid rates, and the other based on site-specific codon rates measured as dN/dS. We additionally provide a code repository containing scripts to facilitate the specific analysis protocols we recommend.
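
The correlation step described in the abstract above could look roughly like the following sketch. It is not taken from the authors' code repository. It assumes that per-site rates and a per-site structural feature (here, hypothetical relative solvent accessibility values) are already available as equal-length lists, and it uses a rank-based (Spearman) correlation, which is an assumption; the abstract does not name a specific correlation measure.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-site values, for illustration only.
    site_rates = [0.2, 0.5, 0.1, 1.8, 0.9]
    solvent_accessibility = [0.05, 0.30, 0.10, 0.85, 0.40]
    print(spearman(site_rates, solvent_accessibility))
```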
