ABSTRACT
The distinctive nature of cancer as a disease prompts an exploration of the special characteristics exhibited by the genes implicated in cancer. The identification of cancer-associated genes and their characteristics is crucial to furthering our understanding of this disease and to improving the likelihood that therapeutic drug targets succeed. However, the rate at which cancer genes are being identified experimentally is slow. Applying predictive analysis techniques, through the building of accurate machine learning models, is a potentially useful approach to enhancing the identification rate of these genes and their characteristics. Here, we investigated gene essentiality scores and found that they tend to be higher for cancer-associated genes than for other protein-coding human genes. We built a dataset of extended gene properties linked to essentiality and used it to train a machine-learning model; this model reached 89% accuracy and an area under the curve (AUC) above 0.85. The model showed that essentiality, evolutionary-related properties, and properties arising from protein-protein interaction networks are particularly effective in predicting cancer-associated genes. We were able to use the model to identify potential candidate genes that have not been previously linked to cancer. Prioritising genes that score highly by our methods could aid scientists in their cancer gene research.
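The two performance figures quoted in this abstract (accuracy and AUC) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive gene scores higher than a randomly chosen negative one).

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of items whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cancer-associated gene, 0 = other gene.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, scores), roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is presumably why the abstract reports both.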
Subjects
Genes, Essential , Machine Learning , Neoplasms , Humans , Neoplasms/genetics , Protein Interaction Maps/genetics , Evolution, Molecular , Computational Biology/methods
ABSTRACT
Small in-frame insertion-deletion (indel) variants are a common form of genomic variation whose impact on rare disease phenotypes has been understudied. The prediction of the pathogenicity of such variants remains challenging. X-linked incomplete congenital stationary night blindness type 2 (CSNB2) is a nonprogressive, inherited retinal disorder caused by variants in CACNA1F, encoding the Cav1.4α1 channel protein. Here, structural analysis was used through homology modeling to interpret 10 disease-correlated and 10 putatively benign CACNA1F in-frame indel variants. CSNB2-correlated changes were found to be more highly conserved compared with putative benign variants. Notably, all 10 disease-correlated variants but none of the benign changes were within modeled regions of the protein. Structural analysis revealed that disease-correlated variants are predicted to destabilize the structure and function of the Cav1.4α1 channel protein. Overall, the use of structural information to interpret the consequences of in-frame indel variants provides an important adjunct that can improve the diagnosis for individuals with CSNB2.
Subjects
Eye Diseases, Hereditary , Night Blindness , Humans , Virulence , Calcium Channels, L-Type/genetics , Night Blindness/genetics , Night Blindness/metabolism , Eye Diseases, Hereditary/genetics , Eye Diseases, Hereditary/metabolism , Mutation
ABSTRACT
BACKGROUND: Improving the clinical interpretation of missense variants can increase the diagnostic yield of genomic testing and lead to personalised management strategies. Currently, due to the imprecision of bioinformatic tools that aim to predict variant pathogenicity, their role in clinical guidelines remains limited. There is a clear need for more accurate prediction algorithms and this study aims to improve performance by harnessing structural biology insights. The focus of this work is missense variants in a subset of genes associated with X-linked disorders. METHODS: We have developed a protein-specific variant interpreter (ProSper) that combines genetic and protein structural data. This algorithm predicts missense variant pathogenicity by applying machine learning approaches to the sequence and structural characteristics of variants. RESULTS: ProSper outperformed seven previously described tools, including meta-predictors, in correctly evaluating whether or not variants are pathogenic; this was the case for 11 of the 21 genes associated with X-linked disorders that met the inclusion criteria for this study. We also determined gene-specific pathogenicity thresholds that improved the performance of VEST4, REVEL and ClinPred, the three best-performing tools out of the seven that were evaluated; this was the case in 11, 11 and 12 different genes, respectively. CONCLUSION: ProSper can form the basis of a molecule-specific prediction tool that can be implemented into diagnostic strategies. It can allow the accurate prioritisation of missense variants associated with X-linked disorders, aiding precise and timely diagnosis. In addition, we demonstrate that gene-specific pathogenicity thresholds for a range of missense prioritisation tools can lead to an increase in prediction accuracy.
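The idea of a gene-specific pathogenicity threshold can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the thresholds were chosen, so the sketch simply picks, for each gene, the cut-off that maximises accuracy on labelled variants. The gene names, scores and labels are hypothetical.

```python
from collections import defaultdict


def best_threshold(scores, labels):
    """Return (threshold, accuracy) for the cut-off that classifies best.

    A variant is called pathogenic when its score is >= threshold.
    Candidate thresholds are the observed scores themselves.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def per_gene_thresholds(records):
    """Group (gene, score, label) records by gene and tune each separately."""
    by_gene = defaultdict(lambda: ([], []))
    for gene, score, label in records:
        by_gene[gene][0].append(score)
        by_gene[gene][1].append(label)
    return {g: best_threshold(s, l) for g, (s, l) in by_gene.items()}


# Hypothetical predictor scores; label 1 = pathogenic, 0 = benign.
records = [
    ("GENE_A", 0.2, 0), ("GENE_A", 0.4, 0), ("GENE_A", 0.6, 1), ("GENE_A", 0.8, 1),
    ("GENE_B", 0.5, 0), ("GENE_B", 0.7, 0), ("GENE_B", 0.9, 1),
]
thresholds = per_gene_thresholds(records)
print(thresholds)
```

In this toy data a single global cut-off of 0.6 would misclassify one GENE_B variant, while the per-gene cut-offs separate both genes cleanly. In practice, thresholds tuned this way would need to be validated on held-out variants.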
Subjects
Genes, X-Linked , Mutation, Missense , Algorithms , Computational Biology , Humans , Mutation, Missense/genetics
ABSTRACT
The transcriptional regulator EVI1 has an essential role in early development and haematopoiesis. However, acute myeloid leukaemia (AML) driven by aberrantly high EVI1 expression has a very poor prognosis. To investigate the effects of post-translational modifications on EVI1 function, we carried out a mass spectrometry (MS) analysis of EVI1 in AML and detected dynamic phosphorylation at serine 436 (S436). Wild-type EVI1 (EVI1-WT) with S436 available for phosphorylation, but not non-phosphorylatable EVI1-S436A, conferred haematopoietic progenitor cell self-renewal and was associated with significantly higher organised transcriptional patterns. In silico modelling of EVI1-S436 phosphorylation showed reduced affinity to CtBP1, and CtBP1 showed reduced interaction with EVI1-WT compared with EVI1-S436A. The motif harbouring S436 is a target of CDK2 and CDK3 kinases, which interacted with EVI1-WT. The methyltransferase DNMT3A bound preferentially to EVI1-WT compared with EVI1-S436A, and a hypomethylated cell population associated with EVI1-WT expression in murine haematopoietic progenitors is not maintained with EVI1-S436A. These data point to EVI1-S436 phosphorylation directing functional protein interactions for haematopoietic self-renewal. Targeting EVI1-S436 phosphorylation may be of therapeutic benefit when treating EVI1-driven leukaemia.
Subjects
Alcohol Oxidoreductases/metabolism , Cell Self Renewal/physiology , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA-Binding Proteins/metabolism , Leukemia, Myeloid, Acute/metabolism , MDS1 and EVI1 Complex Locus Protein/metabolism , DNA Methylation/physiology , DNA Methyltransferase 3A , DNA Modification Methylases/metabolism , Humans , Phosphorylation , Prognosis , Serine/metabolism , Transcription Factors/metabolism
ABSTRACT
Advances in DNA sequencing technologies have revolutionised rare disease diagnostics and have led to a dramatic increase in the volume of available genomic data. A key challenge that needs to be overcome to realise the full potential of these technologies is that of precisely predicting the effect of genetic variants on molecular and organismal phenotypes. Notably, despite recent progress, there is still a lack of robust in silico tools that accurately assign clinical significance to variants. Genetic alterations in the CACNA1F gene are the commonest cause of X-linked incomplete Congenital Stationary Night Blindness (iCSNB), a condition associated with non-progressive visual impairment. We combined genetic and homology modelling data to produce CACNA1F-vp, an in silico model that differentiates disease-implicated from benign missense CACNA1F changes. CACNA1F-vp predicts variant effects on the structure of the CACNA1F encoded protein (a calcium channel) using parameters based upon changes in amino acid properties; these include size, charge, hydrophobicity, and position. The model produces an overall score for each variant that can be used to predict its pathogenicity. CACNA1F-vp outperformed four other tools in identifying disease-implicated variants (area under receiver operating characteristic and precision recall curves = 0.84; Matthews correlation coefficient = 0.52) using a tenfold cross-validation technique. We consider this protein-specific model to be a robust stand-alone diagnostic classifier that could be replicated in other proteins and could enable precise and timely diagnosis.
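The Matthews correlation coefficient (MCC) quoted in this abstract is computed from the four cells of a binary confusion matrix. A minimal Python sketch of the standard formula follows; the counts in the usage lines are invented for illustration and are not taken from the study.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / sqrt(denom)


# Hypothetical confusion-matrix counts.
print(matthews_corrcoef(tp=5, tn=5, fp=0, fn=0))  # perfect agreement
print(matthews_corrcoef(tp=0, tn=0, fp=5, fn=5))  # perfect disagreement
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when pathogenic and benign variants are present in unequal numbers.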
Subjects
Genetic Testing/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Structural Homology, Protein , Animals , Calcium Channels, L-Type/chemistry , Calcium Channels, L-Type/genetics , Humans , Machine Learning , Mutation
ABSTRACT
Inherited eye disorders (IED) are a heterogeneous group of Mendelian conditions that are associated with visual impairment. Although these disorders often exhibit incomplete penetrance and variable expressivity, the scale and mechanisms of these phenomena remain largely unknown. Here, we utilize publicly-available genomic and transcriptomic datasets to gain insights into variable penetrance in IED. Variants in a curated set of 340 IED-implicated genes were extracted from the Human Gene Mutation Database (HGMD) 2019.1 and cross-checked with the Genome Aggregation Database (gnomAD) 2.1 control-only dataset. Genes for which >1 variants were encountered in both HGMD and gnomAD were considered to be associated with variable penetrance (n = 56). Variability in gene expression levels was then estimated for the subset of these genes that was found to be adequately expressed in two relevant resources: the Genotype-Tissue Expression (GTEx) and Eye Genotype Expression (EyeGEx) datasets. We found that genes suspected to be associated with variable penetrance tended to have significantly more variability in gene expression levels in the general population (p = 0.0000015); this finding was consistent across tissue types. The results of this study point to the possible influence of cis and/or trans-acting elements on the expressivity of variants causing Mendelian disorders. They also highlight the potential utility of quantifying gene expression as part of the investigation of families showing evidence of variable penetrance.
Subjects
Eye Diseases/metabolism , Gene Expression Regulation/genetics , Genetic Predisposition to Disease , Penetrance , Retina/metabolism , Retinal Diseases/metabolism , Blood/metabolism , Brain/metabolism , Databases, Genetic , Eye Diseases/congenital , Eye Diseases/genetics , Fibroblasts/metabolism , Gene Expression , Gene Ontology , Humans , Organ Specificity , Retina/pathology , Retinal Diseases/congenital , Retinal Diseases/genetics , Skin/metabolism , Skin/radiation effects , Transcriptome/genetics
ABSTRACT
Excessive type I interferon (IFNα/β) activity is implicated in a spectrum of human disease, yet its direct role remains to be conclusively proven. We investigated two siblings with severe early-onset autoinflammatory disease and an elevated IFN signature. Whole-exome sequencing revealed a shared homozygous missense Arg148Trp variant in STAT2, a transcription factor that functions exclusively downstream of innate IFNs. Cells bearing STAT2R148W in homozygosity (but not heterozygosity) were hypersensitive to IFNα/β, which manifested as prolonged Janus kinase-signal transducers and activators of transcription (STAT) signaling and transcriptional activation. We show that this gain of IFN activity results from the failure of mutant STAT2R148W to interact with ubiquitin-specific protease 18, a key STAT2-dependent negative regulator of IFNα/β signaling. These observations reveal an essential in vivo function of STAT2 in the regulation of human IFNα/β signaling, providing concrete evidence of the serious pathological consequences of unrestrained IFNα/β activity and supporting efforts to target this pathway therapeutically in IFN-associated disease.
Subjects
Immune System Diseases/genetics , Interferon Type I/immunology , STAT2 Transcription Factor/genetics , Germ-Line Mutation , Humans , Immune System Diseases/immunology , Infant , Male , Signal Transduction
ABSTRACT
Our previous work with fragment-assembly methods has demonstrated specific deficiencies in conformational sampling behaviour that, when addressed through improved sampling algorithms, can lead to more reliable prediction of tertiary protein structure when good fragments are available, and when score values can be relied upon to guide the search to the native basin. In this paper, we present preliminary investigations into two important questions arising from more difficult prediction problems. First, we investigated the extent to which native-like conformational states are generated during multiple runs of our search protocols. We determined that, in cases of difficult prediction, native-like decoys are rarely or never generated. Second, we developed a scheme for decoy retention that balances the objectives of retaining low-scoring structures and retaining conformationally diverse structures sampled during the course of the search. Our method succeeds at retaining more diverse sets of structures, and, for a few targets, more native-like solutions are retained as compared to our original, energy-based retention scheme. However, in general, we found that the rate at which native-like structural states are generated has a much stronger effect on eventual distributions of predictive accuracy in the decoy sets, as compared to the specific decoy retention strategy used. We found that our protocols show differences in their ability to access native-like states for some targets, and this may explain some of the differences in predictive performance seen between these methods. There appears to be an interaction between fragment sets and move operators, which influences the accessibility of native-like structures for given targets. Our results point to clear directions for further improvements in fragment-based methods, which are likely to enable higher accuracy predictions.
Subjects
Proteins/chemistry , Algorithms , Protein Conformation , Thermodynamics
ABSTRACT
Protein sequences of members of the plasminogen activation system are present throughout the entire vertebrate phylum. This important and well-described proteolytic cascade is governed by numerous protease-substrate and protease-inhibitor interactions whose conservation is crucial to maintaining unchanged protein function throughout evolution. The pressure to preserve protein-protein interactions may lead to either co-conservation or covariation of binding interfaces. Here, we combined covariation analysis and structure-based prediction to analyze the binding interfaces of urokinase (uPA):plasminogen activator inhibitor-1 (PAI-1) and uPA:plasminogen complexes. We detected correlated variation between the S3-pocket-lining residues of uPA and the P3 residue of both PAI-1 and plasminogen. These residues are known to form numerous polar interactions in the human uPA:PAI-1 Michaelis complex. To test the effect of mutations that correlate with each other and have occurred during mammalian diversification on protein-protein interactions, we produced uPA, PAI-1, and plasminogen from human and zebrafish to represent mammalian and nonmammalian orthologs. Using single amino acid point substitutions in these proteins, we found that the binding interfaces of uPA:plasminogen and uPA:PAI-1 may have coevolved to maintain tight interactions. Moreover, we conclude that although the interaction areas between protease-substrate and protease-inhibitor are shared, the two interactions are mechanistically different. Compared with a protease cleaving its natural substrate, the interaction between a protease and its inhibitor is more complex and involves a more fine-tuned mechanism. Understanding the effects of evolution on specific protein interactions may help further pharmacological interventions of the plasminogen activation system and other proteolytic systems.
Subjects
Evolution, Molecular , Plasminogen Activator Inhibitor 1/metabolism , Plasminogen Activators/metabolism , Amino Acid Sequence , Animals , Humans , Models, Molecular , Plasminogen Activators/antagonists & inhibitors , Plasminogen Activators/chemistry , Protein Binding , Protein Conformation , Urokinase-Type Plasminogen Activator/metabolism
ABSTRACT
Difficulty in sampling large and complex conformational spaces remains a key limitation in fragment-based de novo prediction of protein structure. Our previous work has shown that even for small-to-medium-sized proteins, some current methods inadequately sample alternative structures. We have developed two new conformational sampling techniques, one employing a bilevel optimisation framework and the other employing iterated local search. We combine strategies of forced structural perturbation (where some fragment insertions are accepted regardless of their impact on scores) and greedy local optimisation, allowing greater exploration of the available conformational space. Comparisons against the Rosetta Abinitio method indicate that our protocols more frequently generate native-like predictions for many targets, even following the low-resolution phase, using a given set of fragment libraries. By contrasting results across two different fragment sets, we show that our methods are able to better take advantage of high-quality fragments. These improvements can also translate into more reliable identification of near-native structures in a simple clustering-based model selection procedure. We show that when fragment libraries are sufficiently well-constructed, improved breadth of exploration within runs improves prediction accuracy. Our results also suggest that in benchmarking scenarios, a total exclusion of fragments drawn from homologous templates can make performance differences between methods appear less pronounced.
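The combination of forced perturbation and greedy local optimisation described in this abstract follows the general pattern of iterated local search. The Python sketch below shows that pattern on a toy problem only: the integer "conformation", the scoring function and all parameter names are hypothetical stand-ins, not the authors' protocol or any Rosetta interface.

```python
import random


def greedy_local_search(state, score, choices):
    """Accept single-position changes only when they lower the score."""
    state = list(state)
    improved = True
    while improved:
        improved = False
        for i in range(len(state)):
            for c in choices:
                trial = state[:i] + [c] + state[i + 1:]
                if score(trial) < score(state):
                    state, improved = trial, True
    return state


def iterated_local_search(start, score, choices, rounds=20, kick=2, seed=0):
    """Alternate forced random perturbations with greedy descent."""
    rng = random.Random(seed)
    current = greedy_local_search(start, score, choices)
    best = current
    for _ in range(rounds):
        perturbed = list(current)
        # Forced move: overwrite `kick` positions regardless of the score.
        for i in rng.sample(range(len(perturbed)), kick):
            perturbed[i] = rng.choice(choices)
        current = greedy_local_search(perturbed, score, choices)
        if score(current) < score(best):
            best = current
    return best


# Toy landscape: distance to a hidden target assignment (hypothetical).
target = [1, 2, 0, 2]


def toy_score(state):
    return sum(abs(a - b) for a, b in zip(state, target))


best = iterated_local_search([0, 0, 0, 0], toy_score, choices=[0, 1, 2])
print(best, toy_score(best))
```

On this separable toy landscape the greedy step alone reaches the optimum. The perturbation step matters on rugged landscapes, where it lets the search leave a local minimum that greedy descent cannot escape.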
Subjects
Peptide Fragments/chemistry , Proteins/chemistry , Benchmarking/methods , Cluster Analysis , Computer Simulation , Heuristics , Models, Molecular , Protein Conformation
ABSTRACT
This paper describes the current update on macromolecular model validation services that are provided at the MolProbity website, emphasizing changes and additions since the previous review in 2010. There have been many infrastructure improvements, including rewrite of previous Java utilities to now use existing or newly written Python utilities in the open-source CCTBX portion of the Phenix software system. This improves long-term maintainability and enhances the thorough integration of MolProbity-style validation within Phenix. There is now a complete MolProbity mirror site at http://molprobity.manchester.ac.uk. GitHub serves our open-source code, reference datasets, and the resulting multi-dimensional distributions that define most validation criteria. Coordinate output after Asn/Gln/His "flip" correction is now more idealized, since the post-refinement step has apparently often been skipped in the past. Two distinct sets of heavy-atom-to-hydrogen distances and accompanying van der Waals radii have been researched and improved in accuracy, one for the electron-cloud-center positions suitable for X-ray crystallography and one for nuclear positions. New validations include messages at input about problem-causing format irregularities, updates of Ramachandran and rotamer criteria from the million quality-filtered residues in a new reference dataset, the CaBLAM Cα-CO virtual-angle analysis of backbone and secondary structure for cryoEM or low-resolution X-ray, and flagging of the very rare cis-nonProline and twisted peptides which have recently been greatly overused. Due to wide application of MolProbity validation and corrections by the research community, in Phenix, and at the worldwide Protein Data Bank, newly deposited structures have continued to improve greatly as measured by MolProbity's unique all-atom clashscore.
Subjects
Databases, Protein , Models, Molecular , Programming Languages , Proteins/chemistry , Proteins/genetics
ABSTRACT
The coronary vasculature is an essential vessel network providing the blood supply to the heart. Disruptions in coronary blood flow contribute to cardiac disease, a major cause of premature death worldwide. The generation of treatments for cardiovascular disease will be aided by a deeper understanding of the developmental processes that underpin coronary vessel formation. From an ENU mutagenesis screen, we have isolated a mouse mutant displaying embryonic hydrocephalus and cardiac defects (EHC). Positional cloning and candidate gene analysis revealed that the EHC phenotype results from a point mutation in a splice donor site of the Myh10 gene, which encodes NMHC IIB. Complementation testing confirmed that the Myh10 mutation causes the EHC phenotype. Characterisation of the EHC cardiac defects revealed abnormalities in myocardial development, consistent with observations from previously generated NMHC IIB null mouse lines. Analysis of the EHC mutant hearts also identified defects in the formation of the coronary vasculature. We attribute the coronary vessel abnormalities to defective epicardial cell function, as the EHC epicardium displays an abnormal cell morphology, reduced capacity to undergo epithelial-mesenchymal transition (EMT), and impaired migration of epicardial-derived cells (EPDCs) into the myocardium. Our studies on the EHC mutant demonstrate a requirement for NMHC IIB in epicardial function and coronary vessel formation, highlighting the importance of this protein in cardiac development and ultimately, embryonic survival.
Subjects
Coronary Vessels/growth & development , Embryonic Development/genetics , Myosin Heavy Chains/genetics , Nonmuscle Myosin Type IIB/genetics , Pericardium/growth & development , Animals , Cell Differentiation/genetics , Coronary Vessels/metabolism , Embryo, Mammalian , Epithelial-Mesenchymal Transition/genetics , Humans , Hydrocephalus/genetics , Hydrocephalus/metabolism , Hydrocephalus/pathology , Mice , Mice, Knockout , Mutation , Myocardium/metabolism , Pericardium/metabolism
ABSTRACT
Despite the use of combination antiretroviral drugs for the treatment of HIV-1 infection, the emergence of drug resistance remains a problem. Resistance may be conferred either by a single mutation or a concerted set of mutations. The involvement of multiple mutations can arise due to interactions between sites in the amino acid sequence as a consequence of the need to maintain protein structure. To better understand the nature of such epistatic interactions, we reconstructed the ancestral sequences of HIV-1's Pol protein, and traced the evolutionary trajectories leading to mutations associated with drug resistance. Using contemporary and ancestral sequences we modelled the effects of mutations (i.e. amino acid replacements) on protein structure to understand the functional effects of residue changes. Although the majority of resistance-associated sequences tend to destabilise the protein structure, we find there is a general tendency for protein stability to decrease across HIV-1's evolutionary history. That a similar pattern is observed in the non-drug resistance lineages indicates that non-resistant mutations, for example those associated with escape from the immune response, also impact protein stability. Maintenance of optimal protein structure therefore represents a major constraining factor to the evolution of HIV-1.
ABSTRACT
Duplication of genes or genomes provides the raw material for evolutionary innovation. After duplication a gene may be lost, recombine with another gene, have its function modified or be retained in an unaltered state. The fate of duplication is usually studied by comparing extant genomes and reconstructing the most likely ancestral states. Valuable as this approach is, it may miss the most rapid evolutionary events. Here, we engineered strains of Saccharomyces cerevisiae carrying tandem and non-tandem duplications of the singleton gene IFA38 to monitor (i) the fate of the duplicates in different conditions, including time scale and asymmetry of gene loss, and (ii) the changes in fitness and transcriptome of the strains immediately after duplication and after experimental evolution. We found that the duplication brings widespread transcriptional changes, but a fitness advantage is only present in fermentable media. In respiratory conditions, the yeast strains consistently lose the non-tandem IFA38 gene copy in a surprisingly short time, within only a few generations. This gene loss appears to be asymmetric and dependent on genome location, since the original IFA38 copy and the tandem duplicate are retained. Overall, this work shows for the first time that gene loss can be extremely rapid and context dependent.
Subjects
Evolution, Molecular , Gene Duplication , Saccharomyces cerevisiae/genetics , Genetic Fitness , Genome, Fungal , Microorganisms, Genetically-Modified/genetics , Transcriptome
ABSTRACT
BACKGROUND: Although the majority of small in-frame insertions/deletions (indels) have little or no effect on protein function, a small subset of these changes has been causally associated with genetic disorders. Notably, the molecular mechanisms and frequency by which they give rise to disease phenotypes remain largely unknown. The aim of this study is to provide insights into the role of in-frame indels (≤21 nucleotides) in two genetically heterogeneous eye disorders. RESULTS: One hundred eighty-one probands with childhood cataracts and 486 probands with retinal dystrophy underwent multigene panel testing in a clinical diagnostic laboratory. In-frame indels were collected and evaluated both clinically and in silico. Variants that could be modeled in the context of protein structure were identified and analysed using integrative structural modeling. Overall, 55 small in-frame indels were detected in 112 of 667 probands (16.8%); 17 of these changes were novel to this study and 18 variants were reported clinically. A reliable model of the corresponding protein sequence could be generated for 8 variants. Structural modeling indicated a diverse range of molecular mechanisms of disease including disruption of secondary and tertiary protein structure and alteration of protein-DNA binding sites. CONCLUSIONS: In childhood cataract and retinal dystrophy subjects, one small in-frame indel is clinically reported in every ~37 individuals tested. The clinical utility of computational tools evaluating these changes increases when the full complexity of the involved molecular mechanisms is embraced.
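The headline figures in this abstract follow directly from the counts it reports. The short Python check below reproduces them, on the assumption that the "one in every ~37" figure is the cohort size divided by the 18 clinically reported variants.

```python
# Counts as stated in the abstract.
cataract_probands = 181
retinal_dystrophy_probands = 486
probands_with_indel = 112
clinically_reported_variants = 18

total_probands = cataract_probands + retinal_dystrophy_probands

# Share of probands carrying at least one small in-frame indel.
percent_with_indel = 100 * probands_with_indel / total_probands

# Individuals tested per clinically reported indel (assumed derivation).
tested_per_report = total_probands / clinically_reported_variants

print(total_probands, round(percent_with_indel, 1), round(tested_per_report))
```

The check confirms a cohort of 667 probands, 16.8% of whom carry such a variant, and roughly one clinically reported indel per 37 individuals tested.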
Subjects
Eye Diseases/genetics , INDEL Mutation/genetics , Reading Frames/genetics , Cataract/genetics , Computational Biology , Humans , Retinal Dystrophies/genetics
ABSTRACT
BACKGROUND: Physical interactions between proteins are essential for almost all biological functions and systems. To understand the evolution of function it is therefore important to understand the evolution of molecular interactions. Of key importance is the evolution of binding specificity, the set of interactions made by a protein, since change in specificity can lead to "rewiring" of interaction networks. Unfortunately, the interfaces through which proteins interact are complex, typically containing many amino-acid residues that collectively must contribute to binding specificity as well as binding affinity, structural integrity of the interface and solubility in the unbound state. RESULTS: In order to study the relationship between interface composition and binding specificity, we make use of paralogous pairs of yeast proteins. Immediately after duplication these paralogues will have identical sequences and protein products that make an identical set of interactions. As the sequences diverge, we can correlate amino-acid change in the interface with any change in the specificity of binding. We show that change in interface regions correlates only weakly with change in specificity, and many variants in interfaces are functionally equivalent. We show that many of the residue replacements within interfaces are silent with respect to their contribution to binding specificity. CONCLUSIONS: We conclude that such functionally-equivalent change has the potential to contribute to evolutionary plasticity in interfaces by creating cryptic variation, which in turn may provide the raw material for functional innovation and coevolution.
Subjects
Evolution, Molecular , Saccharomyces cerevisiae Proteins/chemistry , Amino Acids/genetics , Binding Sites , Biological Evolution , Gene Duplication , Genome, Fungal , Protein Interaction Domains and Motifs , Protein Interaction Mapping , Saccharomyces cerevisiae Proteins/genetics
ABSTRACT
Computational approaches to de novo protein tertiary structure prediction, including those based on the preeminent "fragment-assembly" technique, have failed to scale up fully to larger proteins (on the order of 100 residues and above). A number of limiting factors are thought to contribute to the scaling problem over and above the simple combinatorial explosion, but the key ones relate to the lack of exploration of properly diverse protein folds, and to an acute form of "deception" in the energy function, whereby low-energy conformations do not reliably equate with native structures. In this article, solutions to both of these problems are investigated through a multistage memetic algorithm incorporating the successful Rosetta method as a local search routine. We found that specialised genetic operators significantly add to structural diversity and that this translates well to reaching low energies. The use of a generalised stochastic ranking procedure for selection enables the memetic algorithm to handle and traverse deep energy wells that can be considered deceptive, which further adds to the ability of the algorithm to obtain a much-improved diversity of folds. The results should translate to a tangible improvement in the performance of protein structure prediction algorithms in blind experiments such as CASP, and potentially to a further step towards the more challenging problem of predicting the three-dimensional shape of large proteins.
Subjects
Algorithms , Proteins/chemistry , Computational Biology , Evolution, Molecular , Molecular Dynamics Simulation , Peptide Fragments/chemistry , Peptide Fragments/genetics , Protein Conformation , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/genetics , Stochastic Processes
ABSTRACT
Energy functions, fragment libraries, and search methods constitute three key components of fragment-assembly methods for protein structure prediction, which are all crucial for their ability to generate high-accuracy predictions. All of these components are tightly coupled; efficient searching becomes more important as the quality of fragment libraries decreases. Given these relationships, there is currently a poor understanding of the strengths and weaknesses of the sampling approaches currently used in fragment-assembly techniques. Here, we determine how the performance of search techniques can be assessed in a meaningful manner, given the above problems. We describe a set of techniques that aim to reduce the impact of the energy function, and assess exploration in view of the search space defined by a given fragment library. We illustrate our approach using Rosetta and EdaFold, and show how certain features of these methods encourage or limit conformational exploration. We demonstrate that individual trajectories of Rosetta are susceptible to local minima in the energy landscape, and that this can be linked to non-uniform sampling across the protein chain. We show that EdaFold's novel approach can help balance broad exploration with locating good low-energy conformations. This occurs through two mechanisms which cannot be readily differentiated using standard performance measures: exclusion of false minima, followed by an increasingly focused search in low-energy regions of conformational space. Measures such as ours can be helpful in characterizing new fragment-based methods in terms of the quality of conformational exploration realized.
Subjects
Algorithms , Gene Library , Peptide Fragments/chemistry , Computer Simulation , Models, Molecular , Peptide Fragments/genetics , Protein Conformation , Protein Folding , Thermodynamics
ABSTRACT
Recent developments in the analysis of amino acid covariation are leading to breakthroughs in protein structure prediction, protein design, and prediction of the interactome. It is assumed that observed patterns of covariation are caused by molecular coevolution, where substitutions at one site affect the evolutionary forces acting at neighboring sites. Our theoretical and empirical results cast doubt on this assumption. We demonstrate that the strongest coevolutionary signal is a decrease in evolutionary rate and that unfeasibly long times are required to produce coordinated substitutions. We find that covarying substitutions are mostly found on different branches of the phylogenetic tree, indicating that they are independent events that may or may not be attributable to coevolution. These observations undermine the hypothesis that molecular coevolution is the primary cause of the covariation signal. In contrast, we find that the pairs of residues with the strongest covariation signal tend to have low evolutionary rates, and that it is this low rate that gives rise to the covariation signal. Slowly evolving residue pairs are disproportionately located in the protein's core, which explains covariation methods' ability to detect pairs of residues that are close in three dimensions. These observations lead us to propose the "coevolution paradox": The strength of coevolution required to cause coordinated changes means the evolutionary rate is so low that such changes are highly unlikely to occur. As modern covariation methods may lead to breakthroughs in structural genomics, it is critical to recognize their biases and limitations.
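For readers unfamiliar with covariation analysis, one of the simplest covariation statistics is the mutual information between two columns of a multiple sequence alignment. The Python sketch below illustrates that statistic only. The abstract does not state which covariation measures were analysed, and the alignment columns are invented toy data.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Each column is a sequence of residues, one per aligned sequence,
    and both columns must list the sequences in the same order.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Hypothetical columns from a four-sequence alignment.
print(mutual_information("AAVV", "LLII"))  # residues change together
print(mutual_information("AVAV", "LLII"))  # residues change independently
```

A high value shows only that two positions vary together across the aligned sequences. As the abstract argues, this statistical association need not reflect coordinated substitutions driven by coevolution.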
Subjects
Evolution, Molecular , Markov Chains , Models, Genetic , Mutation Rate , Phylogeny , Protein Folding , Proteins/genetics
ABSTRACT
The 2014 epidemic of Ebola virus disease (EVD) has had a devastating impact in West Africa. Sequencing of ebolavirus (EBOV) from infected individuals has revealed extensive genetic variation, leading to speculation that the virus may be adapting to humans, accounting for the scale of the 2014 outbreak. We computationally analyze the variation associated with all EVD outbreaks, and find none of the amino acid replacements lead to identifiable functional changes. These changes have minimal effect on protein structure, being neither stabilizing nor destabilizing, are not found in regions of the proteins associated with known functions and tend to cluster in poorly constrained regions of proteins, specifically intrinsically disordered regions. We find no evidence that the difference between the current and previous outbreaks is due to evolutionary changes associated with transmission to humans. Instead, epidemiological factors are likely to be responsible for the unprecedented spread of EVD.