ABSTRACT
Accurate in silico prediction of conformational B-cell epitopes would lead to major improvements in disease diagnostics, drug design and vaccine development. A variety of computational methods, mainly based on machine learning, have been developed over the last decades to tackle this challenging problem. Here, we rigorously benchmarked nine state-of-the-art conformational B-cell epitope prediction webservers, including generic and antibody-specific methods, on a dataset of over 250 antibody-antigen structures. Our assessment and statistical analyses show that all the methods perform poorly, and some do not perform better than randomly generated patches of surface residues. We also found that commonly used consensus strategies that combine the results of multiple webservers are at best only marginally better than random. Finally, we applied all the predictors to the SARS-CoV-2 spike protein as an independent case study and showed that they generally perform poorly, which largely recapitulates our benchmarking conclusions. We hope that these results will lead to greater caution in the use of these tools until the biases and issues that limit current methods have been addressed, promote the use of state-of-the-art evaluation methodologies in future publications, and suggest new strategies to improve the performance of conformational B-cell epitope prediction methods.
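The random baseline mentioned above can be pictured with a short sketch. It assumes that surface residues are given as 3-D coordinates (for example, Cα positions), that a random patch is the set of surface residues nearest to a randomly chosen centre with the same size as the true epitope, and that agreement is scored with the Matthews correlation coefficient (MCC). The patch definition and scoring used in the benchmark itself may differ, and all function names are hypothetical.

```python
import math
import random


def random_surface_patch(coords, patch_size, rng):
    """Pick a random surface residue and return the indices of the
    patch_size surface residues closest to it (itself included).
    Hypothetical patch definition used for illustration only."""
    center_xyz = coords[rng.randrange(len(coords))]
    # Sort all surface residues by Euclidean distance to the chosen centre.
    order = sorted(range(len(coords)), key=lambda i: math.dist(coords[i], center_xyz))
    return set(order[:patch_size])


def mcc(predicted, true, n_total):
    """Matthews correlation coefficient for residue-level epitope predictions."""
    tp = len(predicted & true)
    fp = len(predicted - true)
    fn = len(true - predicted)
    tn = n_total - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def random_baseline(coords, true_epitope, n_trials=1000, seed=0):
    """Average MCC of random surface patches sized like the true epitope."""
    rng = random.Random(seed)
    scores = [
        mcc(random_surface_patch(coords, len(true_epitope), rng), true_epitope, len(coords))
        for _ in range(n_trials)
    ]
    return sum(scores) / len(scores)
```

Comparing a predictor's MCC on each antigen with this random-patch average is one simple way to check whether it does better than chance.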
Subjects
B-Lymphocyte Epitopes, Coronavirus Spike Glycoprotein, Humans, Computational Biology/methods, B-Lymphocyte Epitopes/immunology, SARS-CoV-2, Coronavirus Spike Glycoprotein/immunology
ABSTRACT
Understanding the impact of mutations on protein-protein binding affinity is a key objective for a wide range of biotechnological applications and for shedding light on disease-causing mutations, which are often located at protein-protein interfaces. Over the past decade, many computational methods using physics-based and/or machine learning approaches have been developed to predict how protein binding affinity changes upon mutations. They all claim astonishing accuracy on both training and test sets, with performances on standard benchmarks such as SKEMPI 2.0 that seem overly optimistic. Here we benchmarked eight well-known and widely used predictors and identified their biases and dataset dependencies, using as test sets not only SKEMPI 2.0 but also deep mutagenesis data on the severe acute respiratory syndrome coronavirus 2 spike protein in complex with human angiotensin-converting enzyme 2. We showed that, although most of the tested methods reach a significant degree of robustness and accuracy, they generalize poorly and struggle to predict unseen mutations. Interestingly, the generalizability problems are more severe for pure machine learning approaches, whereas physics-based methods are less affected. Moreover, undesirable prediction biases toward specific mutation properties, most markedly toward destabilizing mutations, are also observed and should be carefully considered by method developers. We conclude from our analyses that the prediction models leave room for improvement, and we suggest ways to check, assess and improve their generalizability and robustness.
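One common way to probe the generalizability problem described above is to evaluate a predictor only on complexes it never saw during training. The minimal sketch below groups mutation records by complex before splitting, so that no complex contributes to both sets. The `complex` field name, the example identifiers and the 20% test fraction are illustrative assumptions, not the protocol of the benchmark.

```python
import random
from collections import defaultdict


def complex_level_split(records, test_fraction=0.2, seed=0):
    """Split mutation records so that no protein complex appears in both sets.

    records: list of dicts with at least a 'complex' key (assumed field name,
    e.g. a PDB identifier of the complex).
    """
    by_complex = defaultdict(list)
    for rec in records:
        by_complex[rec["complex"]].append(rec)
    # Shuffle complexes, not individual mutations, to avoid leakage between sets.
    complexes = sorted(by_complex)
    random.Random(seed).shuffle(complexes)
    n_test = max(1, round(test_fraction * len(complexes)))
    test_ids = set(complexes[:n_test])
    train = [r for c in complexes if c not in test_ids for r in by_complex[c]]
    test = [r for c in complexes if c in test_ids for r in by_complex[c]]
    return train, test
```

A random mutation-level split would let closely related mutations on the same interface leak between training and test sets, which tends to inflate apparent accuracy.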
Subjects
Coronavirus Spike Glycoprotein, Humans, Protein Binding, Mutation, Bias
ABSTRACT
MOTIVATION: The accurate prediction of how mutations change the biophysical properties of proteins or RNA is a major goal in computational biology, with tremendous impact on protein design and genetic variant interpretation. Evolutionary approaches such as coevolution analysis can help address this issue. RESULTS: We present pycofitness, a standalone Python-based software package for the in silico mutagenesis of protein and RNA sequences. It is based on coevolution and, more specifically, on a popular inverse statistical approach, namely direct coupling analysis by pseudo-likelihood maximization. Its efficient implementation and user-friendly command-line interface make it easy to use even for researchers with no bioinformatics background. To illustrate its strengths, we present three applications in which pycofitness efficiently predicts the deleteriousness of genetic variants and the effect of mutations on protein fitness and thermodynamic stability. AVAILABILITY AND IMPLEMENTATION: https://github.com/KIT-MBS/pycofitness.
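Coevolution-based mutation scoring with a direct coupling analysis (DCA) model usually compares the statistical energy of the mutant and wild-type sequences. The sketch assumes that the fields `h` and couplings `J` have already been inferred (for instance by pseudo-likelihood maximization, which is not shown) and uses the common sign convention E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j). It is a generic NumPy illustration, not pycofitness's code or API.

```python
import numpy as np


def dca_energy(seq, h, J):
    """Statistical energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    seq: 1-D integer array of length L (residue indices 0..q-1)
    h:   array of shape (L, q)
    J:   array of shape (L, L, q, q), assumed symmetric: J[i, j, a, b] == J[j, i, b, a]
    """
    L = len(seq)
    energy = -h[np.arange(L), seq].sum()
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, seq[i], seq[j]]
    return energy


def single_mutation_delta(seq, pos, new_aa, h, J):
    """Delta E = E(mutant) - E(wild type) for one substitution at pos.

    Only the field at pos and the couplings between pos and all other
    sites change, so the full energy does not need to be recomputed.
    """
    old_aa = seq[pos]
    others = np.delete(np.arange(len(seq)), pos)
    delta_h = -(h[pos, new_aa] - h[pos, old_aa])
    delta_j = -(J[pos, others, new_aa, seq[others]] - J[pos, others, old_aa, seq[others]]).sum()
    return delta_h + delta_j
```

Under this convention, a positive ΔE means the mutant is less probable under the model, which is often read as a sign of deleteriousness; the opposite sign convention is also common, so scores should be compared only within one convention.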
Subjects
RNA, Software, RNA/genetics, Amino Acid Sequence, Computational Biology, Proteins
ABSTRACT
Systematically predicting the effects of mutations on protein fitness is essential for understanding genetic diseases. Indeed, predictions complement experimental efforts to analyze how variants lead to dysfunctional proteins that can in turn cause disease. Here we present our new fitness predictor, FiTMuSiC, which leverages structural, evolutionary and coevolutionary information. We show that FiTMuSiC predicts fitness with high accuracy despite the simplicity of its underlying model: it was among the top predictors on the hydroxymethylbilane synthase (HMBS) target of the sixth round of the Critical Assessment of Genome Interpretation challenge (CAGI6), and it performs on par with much more complex deep learning models such as AlphaMissense. To further demonstrate FiTMuSiC's robustness, we compared its predictions with in vitro activity data on HMBS, variant fitness data on human glucokinase (GCK), and variant deleteriousness data on HMBS and GCK. These analyses further confirm FiTMuSiC's accuracy, which compares favorably with that of other predictors. Additionally, FiTMuSiC returns two scores that separately describe the functional and structural effects of a variant, thus providing mechanistic insight into why the variant leads to fitness loss or gain. We also provide an easy-to-use webserver at https://babylone.ulb.ac.be/FiTMuSiC, which is freely available for academic use and requires no bioinformatics expertise, making the tool accessible to the entire scientific community.
Subjects
Proteins, Humans, Mutation
ABSTRACT
This paper presents an evaluation of the predictions submitted for the "HMBS" challenge, a component of the sixth round of the Critical Assessment of Genome Interpretation (CAGI), held in 2021. The challenge required participants to predict the effects of missense variants of the human HMBS gene on yeast growth. The HMBS enzyme, critical for heme biosynthesis in eukaryotic cells, is highly conserved among eukaryotes. Despite the variety of algorithms and methods applied, the performance of the predictors was relatively similar, with Kendall's tau correlation coefficients between predictions and experimental scores of around 0.3 for a majority of submissions. Notably, the median correlation (≥ 0.34) among the predictors themselves, especially among the top predictions from different groups, was greater than the correlation between their predictions and the actual experimental results. Most predictors were moderately successful in distinguishing deleterious from benign variants, with an area under the receiver operating characteristic (ROC) curve (AUC) of approximately 0.7. Compared with the two previous rounds of CAGI, more predictors outperformed the baseline predictor, which is based solely on amino acid frequencies. Nevertheless, the overall accuracy of the predictions still falls far short of the positive control, which is derived from experimental scores, indicating the need for considerable improvements in the field. The most inaccurately predicted variants in this round were associated with the insertion loop, which is absent in many orthologs, suggesting that the predictors still rely heavily on information from multiple sequence alignments.
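The two headline metrics of the assessment, Kendall's tau between predicted and experimental scores and the ROC AUC for separating deleterious from benign variants, can be computed in a few lines. This plain-Python sketch uses the tie-corrected tau-b variant and the pairwise (Mann-Whitney) definition of the AUC; whether the challenge used exactly these variants, and the choice of deleterious variants as the positive class, are assumptions.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties in either variable (O(n^2))."""
    concordant = discordant = ties_x = ties_y = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:
            continue  # pairs tied in both variables are ignored by tau-b
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def roc_auc(scores, labels):
    """AUC as the probability that a random positive (e.g. deleterious variant)
    receives a higher score than a random negative; ties count as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported value of about 0.7 in context.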
ABSTRACT
Antibodies play a central role in the adaptive immune response of vertebrates through the specific recognition of exogenous or endogenous antigens. The rational design of antibodies has a wide range of biotechnological and medical applications, such as disease diagnosis and treatment. However, there are currently no reliable methods for predicting the antibodies that recognize a specific antigen region (or epitope) and, conversely, the epitopes recognized by the binding region (or paratope) of a given antibody. To fill this gap, we developed ImaPEp, a machine learning-based tool for predicting the binding probability of paratope-epitope pairs, in which the epitope and paratope were simplified into interacting two-dimensional patches that were colored according to the values of selected features and then pixelated. The specific recognition of an epitope image by a paratope image was learned with a convolutional neural network-based model trained on a set of two-dimensional paratope-epitope images derived from experimental structures of antibody-antigen complexes. Our method performs well in cross-validation, with a balanced accuracy of 0.8. Finally, we showcase applications of ImaPEp, including the extensive screening of large libraries to identify paratope candidates that bind a selected epitope, and the rescoring and refinement of antibody-antigen docking poses.
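The image-based representation described above can be sketched as a projection followed by pixelation. Here, projecting the patch onto its two principal axes, averaging feature values per pixel and using a 16×16 grid are all assumptions made for illustration; ImaPEp's actual projection, features and resolution are not given in this abstract, and the convolutional network itself is not shown.

```python
import numpy as np


def patch_to_image(coords, features, grid_size=16):
    """Project a 3-D surface patch onto a 2-D plane and pixelate it.

    coords:   (n, 3) array of residue (or atom) coordinates in the patch
    features: (n, c) array of per-point feature values (one channel per feature)
    Returns a (grid_size, grid_size, c) image with the mean feature value per
    pixel (empty pixels stay at 0). Projection choice is an assumption.
    """
    centered = coords - coords.mean(axis=0)
    # Principal axes via SVD; the first two span the best-fitting plane.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    plane = centered @ vt[:2].T
    # Map plane coordinates to integer pixel indices in [0, grid_size - 1].
    lo, hi = plane.min(axis=0), plane.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    pix = np.floor((plane - lo) / span * (grid_size - 1e-9)).astype(int)
    image = np.zeros((grid_size, grid_size, features.shape[1]))
    counts = np.zeros((grid_size, grid_size, 1))
    # np.add.at accumulates correctly when several points fall in one pixel.
    np.add.at(image, (pix[:, 0], pix[:, 1]), features)
    np.add.at(counts, (pix[:, 0], pix[:, 1]), 1)
    return image / np.maximum(counts, 1)
```

Stacking the paratope and epitope images as channels of a single input is one plausible way to feed such pairs to a CNN, but that design choice is likewise an assumption here.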
Subjects
Epitopes, Neural Networks (Computer), Epitopes/immunology, Epitopes/chemistry, Machine Learning, Antigen-Antibody Complex/chemistry, Antigen-Antibody Complex/immunology, Humans, Molecular Docking Simulation, Antibodies/immunology, Antibodies/chemistry, Antigens/immunology, Antibody Binding Sites
ABSTRACT
MOTIVATION: The SARS-CoV-2 virus has shown a remarkable ability to evolve and spread across the globe through successive waves of variants since the original Wuhan lineage. Despite all the efforts of the last 2 years, the early and accurate prediction of variant severity remains a challenging issue that needs to be addressed to support, for example, decisions to activate COVID-19 plans well before the peak of new waves. Such upstream preparation would make it possible to avoid overwhelming health systems and to limit the most severe cases. RESULTS: We recently developed SpikePro, a structure-based computational model capable of quickly and accurately predicting the viral fitness of a variant from its spike protein sequence. It is based on the impact of mutations on the stability of the spike protein and on its binding affinity for angiotensin-converting enzyme 2 (ACE2) and for a set of neutralizing antibodies. It yields a precise indication of the virus's transmissibility, infectivity, immune escape and basic reproduction rate. Here we present an updated version of the model, now available as an easy-to-use webserver, and illustrate its power in a retrospective study of the fitness evolution and reproduction rate of the main viral lineages. SpikePro is thus expected to be of great help in assessing the fitness of newly emerging SARS-CoV-2 variants in genomic surveillance and viral evolution programs. AVAILABILITY AND IMPLEMENTATION: SpikePro webserver: http://babylone.ulb.ac.be/SpikePro/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
COVID-19, SARS-CoV-2, Humans, Coronavirus Spike Glycoprotein/genetics, Retrospective Studies, Peptidyl-Dipeptidase A, Mutation
ABSTRACT
The electronic properties of DNA molecules, defined by the sequence-dependent ionization potentials of nucleobases, enable long-range charge transport along DNA stacks. This has been linked to a range of key physiological processes in cells and to the triggering of nucleobase substitutions, some of which may cause diseases. To gain a molecular-level understanding of the sequence dependence of these phenomena, we estimated the vertical ionization potential (vIP) of all possible nucleobase stacks in B-conformation containing one to four Gua, Ade, Thy, Cyt, or methylated Cyt. Specifically, we used quantum chemistry calculations, namely second-order Møller-Plesset perturbation theory (MP2) and three double-hybrid density functional theory methods, combined with several basis sets describing atomic orbitals. The calculated vIPs of single nucleobases were compared with experimental data, and those of nucleobase pairs, triplets, and quadruplets with observed mutability frequencies in the human genome, which have been reported to correlate with vIP values. This comparison selected MP2 with the 6-31G* basis set as the best of the tested levels of theory. These results were exploited to set up a recursive model, called vIPer, which estimates the vIP of any single-stranded DNA sequence of any length from the calculated vIPs of overlapping quadruplets. vIPer's vIP values correlate well with oxidation potentials measured by cyclic voltammetry and with activities obtained in photoinduced DNA cleavage experiments, further validating our approach. vIPer is freely available in the github.com/3BioCompBio/vIPer repository.
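The enumeration of nucleobase stacks and the decomposition of a longer sequence into overlapping quadruplets, on which vIPer builds its estimate, are easy to reproduce. The sketch assumes sequences written 5'→3' as strings and uses the hypothetical symbol `M` for methylated cytosine; it does not reproduce vIPer's recursive combination rule or its vIP values. Counted as ordered strings, stacks of one to four bases over five symbols give 5 + 25 + 125 + 625 = 780 sequences.

```python
from itertools import product

# Hypothetical one-letter symbols: G, A, T, C, and "M" for methylated cytosine.
BASES = "GATCM"


def all_stacks(max_len=4, alphabet=BASES):
    """Enumerate every single-stranded stack of 1..max_len nucleobases (5'->3')."""
    return ["".join(p) for n in range(1, max_len + 1) for p in product(alphabet, repeat=n)]


def overlapping_quadruplets(seq):
    """Decompose a sequence (length >= 4) into its overlapping 4-base windows."""
    return [seq[i:i + 4] for i in range(len(seq) - 3)]
```

For example, `overlapping_quadruplets("GATTCA")` yields the three windows `GATT`, `ATTC` and `TTCA`, each of which would be looked up among the precomputed quadruplet vIPs.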
Subjects
Single-Stranded DNA, DNA, Humans, DNA/chemistry, Molecular Conformation
ABSTRACT
With more than 40 causative genes identified so far, autosomal dominant cerebellar ataxias exhibit remarkable genetic heterogeneity. Yet half of the patients lack a molecular diagnosis. In a large family with nine sampled affected members, we performed exome sequencing combined with whole-genome linkage analysis. We identified a missense variant in NPTX1, NM_002522.3:c.1165G>A: p.G389R, segregating with the phenotype. Further investigations with whole-exome sequencing and an amplicon-based panel identified four additional unrelated families segregating the same variant, for whom a common founder effect could be excluded. A second missense variant, NM_002522.3:c.980A>G: p.E327G, was identified in a fifth familial case. The NPTX1-associated phenotype consists of a late-onset, slowly progressive cerebellar ataxia with downbeat nystagmus, cognitive impairment reminiscent of cerebellar cognitive affective syndrome, myoclonic tremor, and mild cerebellar vermian atrophy on brain imaging. NPTX1 encodes neuronal pentraxin 1 (NP1), a secreted protein with various cellular and synaptic functions. Both variants affect conserved amino acid residues and are extremely rare or absent from public databases. In COS7 cells, overexpression of either neuronal pentraxin 1 variant altered endoplasmic reticulum morphology and induced ATF6-mediated endoplasmic reticulum stress, associated with cytotoxicity. In addition, the p.E327G variant abolished neuronal pentraxin 1 secretion, as well as its capacity to form a high molecular weight complex with the wild-type protein. Co-immunoprecipitation experiments coupled with mass spectrometry analysis demonstrated abnormal interactions of this variant with the cytoskeleton. Consistent with these observations, in silico modelling of the neuronal pentraxin 1 complex revealed a destabilizing effect of the p.E327G substitution, located at the interface between monomers. By contrast, the p.G389 residue, located at the protein surface, had no predictable effect on complex stability. Our results establish NPTX1 as a new causative gene in autosomal dominant cerebellar ataxias. We suggest that NPTX1 variants can lead to cerebellar ataxia through ATF6-mediated endoplasmic reticulum stress, associated, for one of the variants, with a dominant-negative destabilization of NP1 polymers.
Subjects
C-Reactive Protein, Cerebellar Ataxia, Endoplasmic Reticulum Stress, Nerve Tissue Proteins, Humans, C-Reactive Protein/genetics, Cerebellar Ataxia/genetics, Endoplasmic Reticulum Stress/genetics, Exome Sequencing, Mutation, Nerve Tissue Proteins/genetics, Pedigree
ABSTRACT
The design of allosteric modulators to control protein function is a key objective in drug discovery programs. Altering functionally essential allosteric residue networks provides unique protein family subtype specificity, minimizes unwanted off-target effects, and helps avert the resistance acquisition that typically plagues drugs targeting orthosteric sites. In this work, we used protein engineering and dimer interface mutations to positively and negatively modulate the immunosuppressive activity of the proapoptotic human galectin-7 (GAL-7). Using the PoPMuSiC and BeAtMuSiC algorithms, mutational sites and residue identities were computationally probed and predicted to either alter or stabilize the GAL-7 dimer interface. By designing a covalent disulfide bridge between protomers to control homodimer strength and stability, we demonstrate the importance of dimer interface perturbations for the allosteric network bridging the two opposite glycan-binding sites on GAL-7, resulting in control of induced apoptosis in Jurkat T cells. Molecular investigation of G16X GAL-7 variants using X-ray crystallography and biophysical and computational characterization illuminates the residues involved in dimer stability and allosteric communication, along with discrete long-range dynamic behaviors involving loops 1, 3, and 5. We show that perturbing the protein-protein interface between GAL-7 protomers can modulate its biological function, even when the overall structure and ligand-binding affinity remain unaltered. This study highlights new avenues for the design of galectin-specific modulators influencing both glycan-dependent and glycan-independent interactions.
Subjects
Apoptosis, Galectins, Immune Tolerance, Protein Multimerization, T-Lymphocytes/immunology, Allosteric Regulation, Apoptosis/genetics, Apoptosis/immunology, Galectins/chemistry, Galectins/genetics, Galectins/immunology, Humans, Jurkat Cells, Protein Multimerization/genetics, Protein Multimerization/immunology
ABSTRACT
MOTIVATION: High-throughput experiments are generating ever-increasing amounts of various -omics data, thereby shedding new light on the links between human disorders, their genetic causes and the related impact on protein behavior and structure. While numerous bioinformatics tools now predict which variants in the human exome cause diseases, few predict why they might do so. Yet understanding the impact of variants at the molecular level is a prerequisite for the rational development of targeted drugs or personalized therapies. RESULTS: We present the updated MutaFrame webserver, which aims to meet this need. It offers two deleteriousness predictors, DEOGEN2 and SNPMuSiC, and is designed for bioinformaticians and medical researchers who want to gain insight into the origins of monogenic diseases. It contains information at two levels for each human protein: its amino acid sequence and its three-dimensional structure, for which we used experimental structures whenever available and modeled structures otherwise. MutaFrame also includes higher-level information, such as protein essentiality and protein-protein interactions. It has a user-friendly interface for interpreting results and a convenient visualization system for protein structures, in which the variant positions introduced by the user and other structural information are shown. In this way, MutaFrame aids our understanding of the pathogenic processes caused by single-site mutations and their molecular and contextual interpretation. AVAILABILITY AND IMPLEMENTATION: MutaFrame webserver at http://mutaframe.com/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Computational Biology, Exome, Humans, Software, Proteins, Missense Mutation
ABSTRACT
MOTIVATION: Although structured proteins adopt their lowest free energy conformation under physiological conditions, the individual residues are generally not in their lowest free energy conformation. Residues that are stability weaknesses are often located in functional regions, whereas stability strengths ensure local structural stability. The detection of strengths and weaknesses provides key information to guide protein engineering experiments that aim to modulate folding and various functional processes. RESULTS: We developed the SWOTein predictor, which identifies strong and weak residues in proteins on the basis of three types of statistical energy functions describing local interactions along the chain, hydrophobic forces and tertiary interactions. A large-scale analysis of the different types of strengths and weaknesses demonstrated their complementarity and showed that combining them enhances the information they provide. Moreover, a good average correlation was observed between predicted strengths and weaknesses and experimental ones obtained from native hydrogen exchange data. The application of SWOTein to three test cases further showed its suitability for predicting and interpreting strong and weak residues in the context of folding, conformational changes and protein-protein binding. In summary, SWOTein is both fast and accurate and can be applied at small and large scale to analyze and modulate folding and molecular recognition processes. AVAILABILITY: The SWOTein webserver provides the list of predicted strengths and weaknesses and a protein structure visualization tool that facilitates the interpretation of the predictions. It is freely available for academic use at http://babylone.ulb.ac.be/SWOTein/.
ABSTRACT
SARS-CoV-2 infection elicits a polyclonal neutralizing antibody (nAb) response that primarily targets the spike protein, but it is still unclear which nAbs are immunodominant and what distinguishes them from subdominant nAbs. Yet this information is crucial for predicting the evolutionary trajectory of the virus and designing future vaccines. To shed light on this issue, we gathered 83 structures of nAbs in complex with spike protein domains. We analyzed in silico the ability of these nAbs to bind the full spike protein trimer in its open and closed conformations, and predicted the binding affinity changes caused by the spike protein variants most frequently observed in circulating strains. This led us to define four nAb classes with distinct variant escape fractions. By comparing these fractions with those measured from the plasma of infected patients, we showed that the class of nAbs that contributes most to the immune response is able to bind the spike protein in its closed conformation. Although this class of nAbs only partially inhibits spike protein binding to the host's angiotensin-converting enzyme 2 (ACE2), it has been suggested to lock the spike protein in its closed pre-fusion conformation and thereby prevent its transition to an open state. Furthermore, comparison of our predictions with plasma measurements from mRNA-1273-vaccinated patients suggests that the spike proteins contained in vaccines elicit a different nAb class from the one elicited by natural SARS-CoV-2 infection, and points to highly stable closed-form spike proteins as next-generation vaccine immunogens.
Subjects
Neutralizing Antibodies/immunology, SARS-CoV-2/metabolism, Coronavirus Spike Glycoprotein/immunology, Angiotensin-Converting Enzyme 2/chemistry, Angiotensin-Converting Enzyme 2/metabolism, Monoclonal Antibodies/immunology, Antigen-Antibody Reactions, COVID-19/pathology, COVID-19/virology, Epitopes/immunology, Humans, Mutagenesis, Protein Binding, Protein Conformation, SARS-CoV-2/isolation & purification, Coronavirus Spike Glycoprotein/chemistry, Coronavirus Spike Glycoprotein/genetics, Coronavirus Spike Glycoprotein/metabolism
ABSTRACT
MOTIVATION: The solubility of a protein is often decisive for its proper functioning. Lack of solubility is a major bottleneck in high-throughput structural genomics studies and in high-concentration protein production, and the formation of protein aggregates causes a wide variety of diseases. Since solubility measurements are time-consuming and expensive, there is a strong need for solubility prediction tools. RESULTS: We recently introduced solubility-dependent distance potentials that are able to unravel the role of residue-residue interactions in promoting or decreasing protein solubility. Here, we extended their construction by defining solubility-dependent potentials based on backbone torsion angles and solvent accessibility, and integrated them, together with other structure- and sequence-based features, into a random forest model trained on a set of Escherichia coli proteins with experimental structures and solubility values. We thus obtained the SOLart protein solubility predictor, whose most informative features turned out to be folding free energy differences computed from our solubility-dependent statistical potentials. SOLart performs very well, with a Pearson correlation coefficient between experimental and predicted solubility values of almost 0.7, both in cross-validation on the training dataset and on an independent set of Saccharomyces cerevisiae proteins. On test sets of modeled structures, only a limited drop in performance is observed. SOLart can thus be used with both high-resolution and low-resolution structures, and it clearly outperforms state-of-the-art solubility predictors. It is available through a user-friendly webserver that is easy to use for non-expert scientists. AVAILABILITY AND IMPLEMENTATION: The SOLart webserver is freely available at http://babylone.ulb.ac.be/SOLART/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Computational Biology, Escherichia coli Proteins, Solubility, Solvents
ABSTRACT
BACKGROUND: How, and to what extent, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in molecular evolution. We addressed this issue through the first structurome-scale computational investigation, in which we estimated the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, and through available experimental stability and fitness data. RESULTS: At the amino acid level, we found the protein surface to be more robust against random mutations than the core, this difference being stronger for small proteins. Destabilizing mutations are more numerous in the core and neutral mutations on the surface, whereas stabilizing mutations account for about 4% in both regions. At the genetic code level, we observed the smallest destabilization for mutations due to substitutions of base III in the codon, followed by base I, bases I+III, base II, and other multiple base substitutions. This ranking is strongly anticorrelated with the codon-anticodon mispairing frequency in the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, and even more so to limit translation errors. At the codon level, both codon usage and usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues. CONCLUSION: Our results highlight the non-universality of mutational robustness and its multiscale dependence on protein features, the structure of the genetic code, and codon usage. Our analyses and approach are strongly supported by available experimental mutagenesis data.
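The genetic-code-level analysis groups single-base substitutions by the codon position they affect. As a simple, fully checkable stand-in for the ΔΔG-based destabilization scores used in the study, the sketch below counts how many single-base substitutions of sense codons are synonymous at each position under the standard genetic code (NCBI translation table 1). The resulting order (base III, then base I, then base II) matches the order of increasing destabilization reported above, but this proxy is an illustration, not the study's method.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_counts_by_position():
    """Count synonymous single-base substitutions of sense codons per codon position."""
    counts = [0, 0, 0]
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # stop codons are not used as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    counts[pos] += 1
    return counts
```

Calling `synonymous_counts_by_position()` returns `[8, 0, 126]` out of 183 possible substitutions per position (61 sense codons × 3 alternative bases): the synonymous changes at base I come only from leucine and arginine codons, and none occur at base II.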
Subjects
Genetic Code, Mutagenesis, Mutation, Protein Biosynthesis, Codon Usage, Computer Simulation
ABSTRACT
Sphingomyelin phosphodiesterase (SMPD1) is a key enzyme in sphingolipid metabolism. Genetic SMPD1 variants have been linked to Niemann-Pick disease, a lysosomal storage disorder with different degrees of phenotypic severity, ranging from severe symptoms involving the central nervous system (type A) to milder ones (type B). They have also been linked to neurodegenerative disorders such as Parkinson's and Alzheimer's diseases. In this paper, we leveraged structural, evolutionary and stability information on SMPD1 to predict and analyze the impact of variants at the molecular level. We developed the SMPD1-ZooM algorithm, which predicts with good accuracy whether variants cause Niemann-Pick disease, as well as its phenotypic severity; the predictor is freely available for download. We performed a large-scale analysis of all possible SMPD1 variants, which led us to identify protein regions that are either robust or fragile with respect to amino acid variations, and to show the importance of aromatic-involving interactions in SMPD1 function and stability. Our study also revealed a good correlation between SMPD1-ZooM scores and in vitro loss of SMPD1 activity. Understanding the molecular effects of SMPD1 variants is crucial to improve genetic screening for SMPD1-related disorders and to develop personalized treatments that restore SMPD1 functionality.
Subjects
Niemann-Pick Diseases/genetics, Sphingomyelin Phosphodiesterase/genetics, Computer Simulation, Genetic Databases, Exons/genetics, Genetic Variation/genetics, Humans, Mutation/genetics, Niemann-Pick Diseases/metabolism, Phenotype, Severity of Illness Index, Sphingolipids/genetics, Sphingolipids/metabolism, Sphingomyelin Phosphodiesterase/metabolism
ABSTRACT
Primary microcephaly (PM) is characterized by a small head from birth and is highly heterogeneous, both genetically and phenotypically. While most cases are monogenic, genetic interactions between Aspm and Wdr62 have recently been described in a mouse model of PM. Here, we used two complementary, holistic in vivo approaches: high-throughput DNA sequencing of multiple PM genes in human patients with PM, and genome-edited zebrafish modeling of the digenic inheritance of PM. Exomes of patients with PM showed a significant burden of variants in 75 PM genes, which persisted after removing monogenic causes of PM (e.g., biallelic pathogenic variants in CEP152). This observation was replicated in an independent cohort of patients with PM, in which a PM gene panel additionally showed that the burden was carried by six centrosomal genes. Allelic frequencies were consistent with digenic inheritance. In zebrafish, homozygous mutants (casc5 -/-) of the non-centrosomal gene casc5 showed a severe PM phenotype that was not modified by invalidation of the centrosomal genes aspm or wdr62. A digenic, quadriallelic PM phenotype was produced by aspm and wdr62. Our observations provide strong evidence for digenic inheritance of human PM involving centrosomal genes. The absence of genetic interaction between casc5 and aspm or wdr62 further delineates centrosomal and non-centrosomal pathways in PM.
Subjects
Centrosome/metabolism, Genetic Association Studies, Genetic Predisposition to Disease, Inheritance Patterns, Microcephaly/diagnosis, Microcephaly/genetics, Animals, Genetic Databases, Genetic Association Studies/methods, Humans, Mutation, Open Reading Frames, Phenotype, Signal Transduction, Exome Sequencing, Zebrafish
ABSTRACT
BACKGROUND: It is now clear that single base substitutions occurring in the human genome, some of which lead to pathogenic conditions, are non-random and influenced by their flanking nucleobase sequences. However, despite recent progress, these "non-local" effects are still far from understood. RESULTS: To advance this problem, we analyzed the relationship between base mutability in specific gene regions and electron hole transport along DNA base stacks, one of the mechanisms that have been suggested to contribute to these effects. More precisely, we studied the connection between the normalized frequency of single base substitutions and the vertical ionization potential (vIP) of the base and its flanking sequence, estimated using MP2/6-31G* ab initio quantum chemistry calculations. We found a statistically significant overall anticorrelation between these two quantities: the lower the vIP value, the more probable the substitution. Moreover, the slope of the regression lines varies: it is larger for introns than for exons and untranslated regions, and for synonymous than for missense substitutions. Interestingly, the correlation appears more pronounced when considering the flanking sequence of the substituted base in the 3' rather than the 5' direction, which corresponds to the preferred direction of charge migration. A weaker but still statistically significant correlation is found between the ionization potentials and the pathogenicity of the base substitutions. Furthermore, pathogenicity is preferentially associated with larger changes in ionization potential upon base substitution. CONCLUSIONS: With this analysis, we gained new insights into the complex biophysical mechanisms underlying mutagenesis and pathogenicity, and supported the role of electron hole transport in these processes.
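The analysis relates each substitution to the ionization potential of the substituted base together with its flanking sequence, and reads the strength of the relationship from a regression slope. The sketch below shows the two ingredients: extracting the 3' or 5' flanking context of a position, and an ordinary least-squares slope. The vIP values themselves come from MP2/6-31G* calculations and are not computed here; the function names, the single-strand 5'→3' string convention and the fixed window length `k` are assumptions.

```python
def flanking_context(seq, pos, k, direction):
    """Return the substituted base plus up to k flanking bases on the 3' or 5' side.

    seq is assumed to be one strand written 5'->3'.
    """
    if direction == "3'":
        return seq[pos:pos + k + 1]
    if direction == "5'":
        return seq[max(0, pos - k):pos + 1]
    raise ValueError("direction must be \"3'\" or \"5'\"")


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (negative = anticorrelation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx
```

With x holding the vIP of each context and y the normalized substitution frequency, a negative slope corresponds to the reported trend that lower vIP values go with more frequent substitutions; comparing slopes computed from 3' and 5' contexts mirrors the directional comparison described above.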
Subjects
Computational Biology/methods , DNA/chemistry , DNA/genetics , Disease/genetics , Electrons , Polymorphism, Single Nucleotide , Databases, Genetic , Mutation , Nucleotide Motifs
ABSTRACT
Motivation: Bioinformatics tools that predict protein stability changes upon point mutations have made considerable progress over the last decades and have become accurate and fast enough to make computational mutagenesis experiments feasible, even on a proteome scale. Despite these achievements, they still suffer from important issues that must be solved to further improve their performance and to use them to deepen our insight into protein folding and stability mechanisms. One of these problems is their bias toward the learning datasets: because these datasets are dominated by destabilizing mutations, predictions are better for destabilizing than for stabilizing mutations. Results: We thoroughly analyzed the biases in the prediction of folding free energy changes upon point mutations (ΔΔG0) and proposed some unbiased solutions. We started by constructing a dataset, Ssym, of experimentally measured ΔΔG0 values with an equal number of stabilizing and destabilizing mutations, by collecting mutations for which the structures of both the wild-type and the mutant protein are available. On this balanced dataset, we assessed the performance of 15 widely used ΔΔG0 predictors. Having made the striking observation that almost all these methods are strongly biased toward destabilizing mutations, especially those that use black-box machine learning, we proposed an elegant way to solve the bias issue by imposing physical symmetries under inverse mutations on the model structure, which we implemented in PoPMuSiCsym. This new predictor offers an efficient trade-off between accuracy and absence of bias. Some final considerations and suggestions for further improvement of the predictors are discussed. Supplementary information: Supplementary data are available at Bioinformatics online. Note: The article 10.1093/bioinformatics/bty340/, published alongside this paper, also addresses the problem of biases in protein stability change predictions.
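The symmetry argument can be made concrete with a small Python sketch, assuming the convention that ΔΔG > 0 denotes a destabilizing mutation. It computes two simple diagnostics on paired direct and inverse predictions (their correlation `r_dr` and the mean of their sum, `mean_delta`) and shows one generic way to impose antisymmetry on an arbitrary predictor. The diagnostics, the `antisymmetrize` wrapper, the toy residue scores and the offset are illustrative assumptions; this is not PoPMuSiCsym's actual formulation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bias_diagnostics(direct, reverse):
    """Compare predictions for mutations (direct) and their inverses (reverse).

    A physically consistent predictor obeys ddG(wt->mut) = -ddG(mut->wt),
    so r_dr should be -1 and mean_delta should be 0. A positive mean_delta
    means the predictor leans toward destabilizing values (ddG > 0 convention).
    """
    r_dr = pearson_r(direct, reverse)
    mean_delta = sum(d + r for d, r in zip(direct, reverse)) / (2 * len(direct))
    return r_dr, mean_delta


def antisymmetrize(raw):
    """Wrap a raw predictor g into f(wt, mut) = (g(wt, mut) - g(mut, wt)) / 2.

    f(wt, mut) == -f(mut, wt) holds by construction.
    """
    def f(wt, mut):
        return 0.5 * (raw(wt, mut) - raw(mut, wt))
    return f


# Hypothetical per-residue scores and offset (illustration only).
TOY_SCORE = {"A": 1.0, "G": 0.0, "I": 2.5, "L": 2.2, "S": 0.2, "V": 2.0}
DESTABILIZING_OFFSET = 0.8  # mimics training data dominated by destabilizing mutations


def biased_raw(wt, mut):
    """Toy predictor (ddG > 0 = destabilizing) with a systematic positive offset."""
    return 0.5 * (TOY_SCORE[wt] - TOY_SCORE[mut]) + DESTABILIZING_OFFSET


# A tiny balanced set in the spirit of Ssym: each mutation is scored together with its inverse.
mutations = [("L", "A"), ("I", "V"), ("V", "G"), ("A", "S")]

for name, predictor in [("raw", biased_raw), ("antisymmetrized", antisymmetrize(biased_raw))]:
    direct = [predictor(wt, mut) for wt, mut in mutations]
    reverse = [predictor(mut, wt) for wt, mut in mutations]
    r_dr, mean_delta = bias_diagnostics(direct, reverse)
    print(f"{name}: r_dr = {r_dr:.2f}, mean_delta = {mean_delta:.2f}")
```

The biased toy predictor still yields r_dr close to -1, because a constant offset does not change the correlation; only mean_delta exposes it. After antisymmetrization, mean_delta drops to zero by construction.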
Subjects
Protein Folding , Proteins/genetics , Bias , Mutation , Protein Stability
ABSTRACT
High-throughput sequencing methods are generating enormous amounts of genomic data, giving unprecedented insights into human genetic variation and its relation to disease. An individual human genome contains millions of single nucleotide variants; to discriminate the deleterious from the benign ones, a variety of methods have been developed that predict whether a protein-coding variant is likely to affect the carrier's health. We present such a method, DEOGEN2, which incorporates heterogeneous information about the molecular effects of the variants, the domains involved, the relevance of the gene and the interactions in which it participates. This extensive contextual information is non-linearly mapped into a single deleteriousness score for each variant. Because non-expert users may still find it difficult to assess what this score means, how it relates to the encoded protein and where it originates, we developed an interactive online framework (http://deogen2.mutaframe.com/) to better present the DEOGEN2 deleteriousness predictions for all possible variants in all human proteins. The predictions are visualized so that both expert and non-expert users can gain insight into the meaning, protein context and origin of each prediction.
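To illustrate what mapping heterogeneous features non-linearly into a single score can mean, here is a minimal Python sketch that averages the outputs of a few hand-written decision trees. The feature names, thresholds and tree structure are all hypothetical, and a tree ensemble is used here purely as a stand-in: it is not DEOGEN2's actual model or feature set.

```python
from dataclasses import dataclass


@dataclass
class VariantFeatures:
    # Hypothetical stand-ins for the kinds of information listed above;
    # these are NOT DEOGEN2's actual features.
    conservation: float          # 0 (variable) to 1 (fully conserved)
    in_functional_domain: bool   # variant lies in an annotated domain
    gene_essentiality: float     # 0 to 1, relevance of the gene
    n_interactions: int          # number of known interaction partners


def tree_a(v: VariantFeatures) -> float:
    # Conservation matters most when the variant sits inside a domain.
    if v.conservation > 0.8:
        return 1.0 if v.in_functional_domain else 0.6
    return 0.2


def tree_b(v: VariantFeatures) -> float:
    # Gene-level context: essential, highly connected genes score higher.
    if v.gene_essentiality > 0.5:
        return 0.9 if v.n_interactions >= 10 else 0.5
    return 0.1


def tree_c(v: VariantFeatures) -> float:
    # Interaction between domain membership and network connectivity.
    if v.in_functional_domain and v.n_interactions >= 5:
        return 0.8
    return 0.3 if v.conservation > 0.5 else 0.0


TREES = (tree_a, tree_b, tree_c)


def deleteriousness_score(v: VariantFeatures) -> float:
    """Average the tree outputs into one score in [0, 1]."""
    return sum(tree(v) for tree in TREES) / len(TREES)


# Usage with two hypothetical variants.
likely_harmful = VariantFeatures(0.95, True, 0.9, 25)
likely_benign = VariantFeatures(0.2, False, 0.1, 1)
print(f"{deleteriousness_score(likely_harmful):.2f}")
print(f"{deleteriousness_score(likely_benign):.2f}")
```

The thresholds and feature interactions (for example, in `tree_a` conservation counts most inside a functional domain) are what make the mapping non-linear; a linear model would instead add up weighted feature values.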