Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 104
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Nucleic Acids Res ; 2021 Nov 18.
Artigo em Inglês | MEDLINE | ID: mdl-34792168

RESUMO

In many cases, the unprecedented availability of data provided by high-throughput sequencing has shifted the bottleneck from a data availability issue to a data interpretation issue, thus delaying the promised breakthroughs in genetics and precision medicine, for what concerns Human genetics, and phenotype prediction to improve plant adaptation to climate change and resistance to bioagressors, for what concerns plant sciences. In this paper, we propose a novel Genome Interpretation paradigm, which aims at directly modeling the genotype-to-phenotype relationship, and we focus on A. thaliana since it is the best studied model organism in plant genetics. Our model, called Galiana, is the first end-to-end Neural Network (NN) approach following the genomes in/phenotypes out paradigm and it is trained to predict 288 real-valued Arabidopsis thaliana phenotypes from Whole Genome sequencing data. We show that 75 of these phenotypes are predicted with a Pearson correlation ≥0.4, and are mostly related to flowering traits. We show that our end-to-end NN approach achieves better performances and larger phenotype coverage than models predicting single phenotypes from the GWAS-derived known associated genes. Galiana is also fully interpretable, thanks to the Saliency Maps gradient-based approaches. We followed this interpretation approach to identify 36 novel genes that are likely to be associated with flowering traits, finding evidence for 6 of them in the existing literature.

2.
Adv Protein Chem Struct Biol ; 127: 217-248, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-34340768

RESUMO

Protein structure characterization is fundamental to understand protein properties, such as folding process and protein resistance to thermal stress, up to unveiling organism pathologies (e.g., prion disease). In this chapter, we provide an overview on how the spectral properties of the networks reconstructed from the Protein Contact Map (PCM) can be used to generate informative observables. As a specific case study, we apply two different network approaches to an example protein dataset, for the aim of discriminating protein folding state, and for the reconstruction of protein 3D structure.


Assuntos
Bases de Dados de Proteínas , Dobramento de Proteína , Mapas de Interação de Proteínas , Proteínas/química , Proteínas/metabolismo , Animais , Humanos , Domínios Proteicos , Estabilidade Proteica
3.
Genes (Basel) ; 12(6)2021 06 12.
Artigo em Inglês | MEDLINE | ID: mdl-34204764

RESUMO

Several studies have linked disruptions of protein stability and its normal functions to disease. Therefore, during the last few decades, many tools have been developed to predict the free energy changes upon protein residue variations. Most of these methods require both sequence and structure information to obtain reliable predictions. However, the lower number of protein structures available with respect to their sequences, due to experimental issues, drastically limits the application of these tools. In addition, current methodologies ignore the antisymmetric property characterizing the thermodynamics of the protein stability: a variation from wild-type to a mutated form of the protein structure (XW→XM) and its reverse process (XM→XW) must have opposite values of the free energy difference (ΔΔGWM=-ΔΔGMW). Here we propose ACDC-NN-Seq, a deep neural network system that exploits the sequence information and is able to incorporate into its architecture the antisymmetry property. To our knowledge, this is the first convolutional neural network to predict protein stability changes relying solely on the protein sequence. We show that ACDC-NN-Seq compares favorably with the existing sequence-based methods.


Assuntos
Aprendizado Profundo , Variação Genética , Estabilidade Proteica , Análise de Sequência de Proteína/métodos , Substituição de Aminoácidos , Humanos , Simulação de Dinâmica Molecular
4.
Methods Mol Biol ; 2275: 433-452, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-34118055

RESUMO

Protein sequences, directly translated from genomic data, need functional and structural annotation. Together with molecular function and biological process, subcellular localization is an important feature necessary for understanding the protein role and the compartment where the mature protein is active. In the case of mitochondrial proteins, their precursor sequences translated by the ribosome machinery include specific patterns from which it is possible not only to recognize their final destination within the organelle but also which of the mitochondrial subcompartments the protein is intended for. Four compartments are routinely discriminated, including the inner and the outer membranes, the intermembrane space, and the matrix. Here we discuss to which extent it is feasible to develop computational methods for detecting mitochondrial targeting peptides in the precursor sequence and to discriminate their final destination in the organelle. We benchmark two of our methods on the general task of recognizing human mitochondrial proteins endowed with an experimentally characterized targeting peptide (TPpred3) and predicting which submitochondrial compartment is the final destination (DeepMito). We describe how to adopt our web servers in order to discriminate which human proteins are endowed with mitochondrial targeting peptides, the position of cleavage sites, and which submitochondrial compartment are intended for. By this, we add some other 1788 human proteins to the 450 ones already manually annotated in UniProt with a mitochondrial targeting peptide, providing for each of them also the characterization of the suborganellar localization.


Assuntos
Biologia Computacional/métodos , Mitocôndrias/metabolismo , Proteínas Mitocondriais/metabolismo , Aprendizado Profundo , Humanos , Proteínas Mitocondriais/química , Transporte Proteico , Navegador
5.
J Hepatol ; 75(4): 786-794, 2021 10.
Artigo em Inglês | MEDLINE | ID: mdl-34090928

RESUMO

BACKGROUND & AIMS: Non-invasive scoring systems (NSS) are used to identify patients with non-alcoholic fatty liver disease (NAFLD) who are at risk of advanced fibrosis, but their reliability in predicting long-term outcomes for hepatic/extrahepatic complications or death and their concordance in cross-sectional and longitudinal risk stratification remain uncertain. METHODS: The most common NSS (NFS, FIB-4, BARD, APRI) and the Hepamet fibrosis score (HFS) were assessed in 1,173 European patients with NAFLD from tertiary centres. Performance for fibrosis risk stratification and for the prediction of long-term hepatic/extrahepatic events, hepatocarcinoma (HCC) and overall mortality were evaluated in terms of AUC and Harrell's c-index. For longitudinal data, NSS-based Cox proportional hazard models were trained on the whole cohort with repeated 5-fold cross-validation, sampling for testing from the 607 patients with all NSS available. RESULTS: Cross-sectional analysis revealed HFS as the best performer for the identification of significant (F0-1 vs. F2-4, AUC = 0.758) and advanced (F0-2 vs. F3-4, AUC = 0.805) fibrosis, while NFS and FIB-4 showed the best performance for detecting histological cirrhosis (range AUCs 0.85-0.88). Considering longitudinal data (follow-up between 62 and 110 months), NFS and FIB-4 were the best at predicting liver-related events (c-indices>0.7), NFS for HCC (c-index = 0.9 on average), and FIB-4 and HFS for overall mortality (c-indices >0.8). All NSS showed limited performance (c-indices <0.7) for extrahepatic events. CONCLUSIONS: Overall, NFS, HFS and FIB-4 outperformed APRI and BARD for both cross-sectional identification of fibrosis and prediction of long-term outcomes, confirming that they are useful tools for the clinical management of patients with NAFLD at increased risk of fibrosis and liver-related complications or death. LAY SUMMARY: Non-invasive scoring systems are increasingly being used in patients with non-alcoholic fatty liver disease to identify those at risk of advanced fibrosis and hence clinical complications. Herein, we compared various non-invasive scoring systems and identified those that were best at identifying risk, as well as those that were best for the prediction of long-term outcomes, such as liver-related events, liver cancer and death.

6.
Bioengineering (Basel) ; 8(5)2021 May 07.
Artigo em Inglês | MEDLINE | ID: mdl-34066939

RESUMO

Sequence files formats (FASTA and FASTQ) are commonly used in bioinformatics, molecular biology and biochemistry. With the advent of next-generation sequencing (NGS) technologies, the number of FASTQ datasets produced and analyzed has grown exponentially, urging the development of dedicated software to handle, parse, and manipulate such files efficiently. Several bioinformatics packages are available to filter and manipulate FASTA and FASTQ files, yet some essential tasks remain poorly supported, leaving gaps that any workflow analysis of NGS datasets must fill with custom scripts. This can introduce harmful variability and performance bottlenecks in pivotal steps. Here we present a suite of tools, called SeqFu (Sequence Fastx utilities), that provides a broad range of commands to perform both common and specialist operations with ease and is designed to be easily implemented in high-performance analytical pipelines. SeqFu includes high-performance implementation of algorithms to interleave and deinterleave FASTQ files, merge Illumina lanes, and perform various quality controls (identification of degenerate primers, analysis of length statistics, extraction of portions of the datasets). SeqFu dereplicates sequences from multiple files keeping track of their provenance. SeqFu is developed in Nim for high-performance processing, is freely available, and can be installed with the popular package manager Miniconda.

7.
ESC Heart Fail ; 8(4): 2907-2919, 2021 08.
Artigo em Inglês | MEDLINE | ID: mdl-33934544

RESUMO

AIMS: Risk stratification in patients with advanced chronic heart failure (HF) is an unmet need. Circulating microRNA (miRNA) levels have been proposed as diagnostic and prognostic biomarkers in several diseases including HF. The aims of the present study were to characterize HF-specific miRNA expression profiles and to identify miRNAs with prognostic value in HF patients. METHODS AND RESULTS: We performed a global miRNome analysis using next-generation sequencing in the plasma of 30 advanced chronic HF patients and of matched healthy controls. A small subset of miRNAs was validated by real-time PCR (P < 0.0008). Pearson's correlation analysis was computed between miRNA expression levels and common HF markers. Multivariate prediction models were exploited to evaluate miRNA profiles' prognostic role. Thirty-two miRNAs were found to be dysregulated between the two groups. Six miRNAs (miR-210-3p, miR-22-5p, miR-22-3p, miR-21-3p, miR-339-3p, and miR-125a-5p) significantly correlated with HF biomarkers, among which N-terminal prohormone of brain natriuretic peptide. Inside the cohort of advanced HF population, we identified three miRNAs (miR-125a-5p, miR-10b-5p, and miR-9-5p) altered in HF patients experiencing the primary endpoint of cardiac death, heart transplantation, or mechanical circulatory support implantation when compared with those without clinical events. The three miRNAs added substantial prognostic power to Barcelona Bio-HF score, a multiparametric and validated risk stratification tool for HF (from area under the curve = 0.72 to area under the curve = 0.82). CONCLUSIONS: This discovery study has characterized, for the first time, the advanced chronic HF-specific miRNA expression pattern. We identified a few miRNAs able to improve the prognostic stratification of HF patients based on common clinical and laboratory values. Further studies are needed to validate our results in larger populations.


Assuntos
MicroRNA Circulante , Insuficiência Cardíaca , MicroRNAs , Biomarcadores , Insuficiência Cardíaca/diagnóstico , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , MicroRNAs/genética
8.
Front Mol Biosci ; 8: 620475, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-33842537

RESUMO

During the last years, the increasing number of DNA sequencing and protein mutagenesis studies has generated a large amount of variation data published in the biomedical literature. The collection of such data has been essential for the development and assessment of tools predicting the impact of protein variants at functional and structural levels. Nevertheless, the collection of manually curated data from literature is a highly time consuming and costly process that requires domain experts. In particular, the development of methods for predicting the effect of amino acid variants on protein stability relies on the thermodynamic data extracted from literature. In the past, such data were deposited in the ProTherm database, which however is no longer maintained since 2013. For facilitating the collection of protein thermodynamic data from literature, we developed the semi-automatic tool ThermoScan. ThermoScan is a text mining approach for the identification of relevant thermodynamic data on protein stability from full-text articles. The method relies on a regular expression searching for groups of words, including the most common conceptual words appearing in experimental studies on protein stability, several thermodynamic variables, and their units of measure. ThermoScan analyzes full-text articles from the PubMed Central Open Access subset and calculates an empiric score that allows the identification of manuscripts reporting thermodynamic data on protein stability. The method was optimized on a set of publications included in the ProTherm database, and tested on a new curated set of articles, manually selected for presence of thermodynamic data. The results show that ThermoScan returns accurate predictions and outperforms recently developed text-mining algorithms based on the analysis of publication abstracts. Availability: The ThermoScan server is freely accessible online at https://folding.biofold.org/thermoscan. The ThermoScan python code and the Google Chrome extension for submitting visualized PMC web pages to the ThermoScan server are available at https://github.com/biofold/ThermoScan.

9.
Gut ; 2021 Feb 04.
Artigo em Inglês | MEDLINE | ID: mdl-33541866

RESUMO

OBJECTIVE: The full phenotypic expression of non-alcoholic fatty liver disease (NAFLD) in lean subjects is incompletely characterised. We aimed to investigate prevalence, characteristics and long-term prognosis of Caucasian lean subjects with NAFLD. DESIGN: The study cohort comprises 1339 biopsy-proven NAFLD subjects from four countries (Italy, UK, Spain and Australia), stratified into lean and non-lean (body mass index (BMI) 10 483 person-years), 4.7% of lean vs 7.7% of non-lean patients reported liver-related events (p=0.37). No difference in survival was observed compared with non-lean NAFLD (p=0.069). CONCLUSIONS: Caucasian lean subjects with NAFLD may progress to advanced liver disease, develop metabolic comorbidities and experience cardiovascular disease (CVD) as well as liver-related mortality, independent of longitudinal progression to obesity and PNPLA3 genotype. These patients represent one end of a wide spectrum of phenotypic expression of NAFLD where the disease manifests at lower overall BMI thresholds. LAY SUMMARY: NAFLD may affect and progress in both obese and lean individuals. Lean subjects are predominantly males, have a younger age at diagnosis and are more prevalent in some geographic areas. During the follow-up, lean subjects can develop hepatic and extrahepatic disease, including metabolic comorbidities, in the absence of weight gain. These patients represent one end of a wide spectrum of phenotypic expression of NAFLD.

10.
Front Mol Biosci ; 8: 620793, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-33598480

RESUMO

Missense variants are among the most studied genome modifications as disease biomarkers. It has been shown that the "perturbation" of the protein stability upon a missense variant (in terms of absolute ΔΔG value, i.e., |ΔΔG|) has a significant, but not predictive, correlation with the pathogenicity of that variant. However, here we show that this correlation becomes significantly amplified in haploinsufficient genes. Moreover, the enrichment of pathogenic variants increases at the increasing protein stability perturbation value. These findings suggest that protein stability perturbation might be considered as a potential cofactor in diseases associated with haploinsufficient genes reporting missense variants.

11.
BMC Biol ; 19(1): 3, 2021 01 13.
Artigo em Inglês | MEDLINE | ID: mdl-33441128

RESUMO

BACKGROUND: Identifying variants that drive tumor progression (driver variants) and distinguishing these from variants that are a byproduct of the uncontrolled cell growth in cancer (passenger variants) is a crucial step for understanding tumorigenesis and precision oncology. Various bioinformatics methods have attempted to solve this complex task. RESULTS: In this study, we investigate the assumptions on which these methods are based, showing that the different definitions of driver and passenger variants influence the difficulty of the prediction task. More importantly, we prove that the data sets have a construction bias which prevents the machine learning (ML) methods to actually learn variant-level functional effects, despite their excellent performance. This effect results from the fact that in these data sets, the driver variants map to a few driver genes, while the passenger variants spread across thousands of genes, and thus just learning to recognize driver genes provides almost perfect predictions. CONCLUSIONS: To mitigate this issue, we propose a novel data set that minimizes this bias by ensuring that all genes covered by the data contain both driver and passenger variants. As a result, we show that the tested predictors experience a significant drop in performance, which should not be considered as poorer modeling, but rather as correcting unwarranted optimism. Finally, we propose a weighting procedure to completely eliminate the gene effects on such predictions, thus precisely evaluating the ability of predictors to model the functional effects of single variants, and we show that indeed this task is still open.


Assuntos
Carcinogênese/genética , Progressão da Doença , Aprendizado de Máquina , Oncologia/instrumentação , Neoplasias/genética , Medicina de Precisão/instrumentação , Neoplasias/patologia
12.
Bioinformatics ; 2021 Jan 25.
Artigo em Inglês | MEDLINE | ID: mdl-33492342

RESUMO

Identifying pathogenic variants and annotating them is a major challenge in human genetics, especially for the non-coding ones. Several tools have been developed and used to predict the functional effect of genetic variants. However, the calibration assessment of the predictions has received little attention. Calibration refers to the idea that if a model predicts a group of variants to be pathogenic with a probability P, it is expected that the same fraction P of true positive is found in the observed set. For instance, a well-calibrated classifier should label the variants such that among the ones to which it gave a probability value close to 0.7, approximately 70% actually belong to the pathogenic class. Poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. Supplementary information Supplementary data are available at Bioinformatics online.

13.
Brief Bioinform ; 22(1): 601-603, 2021 Jan 18.
Artigo em Inglês | MEDLINE | ID: mdl-31885042

RESUMO

A review, recently published in this journal by Fang (2019), showed that methods trained for the prediction of protein stability changes upon mutation have a very critical bias: they neglect that a protein variation (A- > B) and its reverse (B- > A) must have the opposite value of the free energy difference (ΔΔGAB = - ΔΔGBA). In this letter, we complement the Fang's paper presenting a more general view of the problem. In particular, a machine learning-based method, published in 2015 (INPS), addressed the bias issue directly. We include the analysis of the missing method, showing that INPS is nearly insensitive to the addressed problem.

14.
Brief Bioinform ; 22(2): 2172-2181, 2021 03 22.
Artigo em Inglês | MEDLINE | ID: mdl-32266404

RESUMO

Most living organisms rely on double-stranded DNA (dsDNA) to store their genetic information and perpetuate themselves. This biological information has been considered as the main target of evolution. However, here we show that symmetries and patterns in the dsDNA sequence can emerge from the physical peculiarities of the dsDNA molecule itself and the maximum entropy principle alone, rather than from biological or environmental evolutionary pressure. The randomness justifies the human codon biases and context-dependent mutation patterns in human populations. Thus, the DNA 'exceptional symmetries,' emerged from the randomness, have to be taken into account when looking for the DNA encoded information. Our results suggest that the double helix energy constraints and, more generally, the physical properties of the dsDNA are the hard drivers of the overall DNA sequence architecture, whereas the selective biological processes act as soft drivers, which only under extraordinary circumstances overtake the overall entropy content of the genome.

15.
Comput Struct Biotechnol J ; 18: 1968-1979, 2020.
Artigo em Inglês | MEDLINE | ID: mdl-32774791

RESUMO

Protein stability predictions are becoming essential in medicine to develop novel immunotherapeutic agents and for drug discovery. Despite the large number of computational approaches for predicting the protein stability upon mutation, there are still critical unsolved problems: 1) the limited number of thermodynamic measurements for proteins provided by current databases; 2) the large intrinsic variability of ΔΔG values due to different experimental conditions; 3) biases in the development of predictive methods caused by ignoring the anti-symmetry of ΔΔG values between mutant and native protein forms; 4) over-optimistic prediction performance, due to sequence similarity between proteins used in training and test datasets. Here, we review these issues, highlighting new challenges required to improve current tools and to achieve more reliable predictions. In addition, we provide a perspective of how these methods will be beneficial for designing novel precision medicine approaches for several genetic disorders caused by mutations, such as cancer and neurodegenerative diseases.

16.
PLoS Comput Biol ; 16(4): e1007722, 2020 04.
Artigo em Inglês | MEDLINE | ID: mdl-32352965

RESUMO

Protein solubility is a key aspect for many biotechnological, biomedical and industrial processes, such as the production of active proteins and antibodies. In addition, understanding the molecular determinants of the solubility of proteins may be crucial to shed light on the molecular mechanisms of diseases caused by aggregation processes such as amyloidosis. Here we present SKADE, a novel Neural Network protein solubility predictor and we show how it can provide novel insight into the protein solubility mechanisms, thanks to its neural attention architecture. First, we show that SKADE positively compares with state of the art tools while using just the protein sequence as input. Then, thanks to the neural attention mechanism, we use SKADE to investigate the patterns learned during training and we analyse its decision process. We use this peculiarity to show that, while the attention profiles do not correlate with obvious sequence aspects such as biophysical properties of the aminoacids, they suggest that N- and C-termini are the most relevant regions for solubility prediction and are predictive for complex emergent properties such as aggregation-prone regions involved in beta-amyloidosis and contact density. Moreover, SKADE is able to identify mutations that increase or decrease the overall solubility of the protein, allowing it to be used to perform large scale in-silico mutagenesis of proteins in order to maximize their solubility.


Assuntos
Biologia Computacional/métodos , Rede Nervosa/fisiologia , Solubilidade , Algoritmos , Sequência de Aminoácidos/fisiologia , Aminoácidos , Animais , Simulação por Computador , Humanos , Modelos Moleculares , Conformação Proteica , Proteínas/química , Proteínas/metabolismo , Software
17.
NAR Genom Bioinform ; 2(1): lqaa011, 2020 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-33575557

RESUMO

Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solve the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and technical problems need to be addressed. In particular, very few attempts at the in-silico diagnosis of oligo-to-polygenic disorders have been made so far, due to the complexity of the challenge, the relative scarcity of the data and issues such as batch effects and data heterogeneity, which are confounder factors for machine learning (ML) methods. Here, we propose a method for the exome-based in-silico diagnosis of Crohn's disease (CD) patients which addresses many of the current methodological issues. First, we devise a rational ML-friendly feature representation for WES data based on the gene mutational burden concept, which is suitable for small sample sizes datasets. Second, we propose a Neural Network (NN) with parameter tying and heavy regularization, in order to limit its complexity and thus the risk of over-fitting. We trained and tested our NN on 3 CD case-controls datasets, comparing the performance with the participants of previous CAGI challenges. We show that, notwithstanding the limited NN complexity, it outperforms the previous approaches. Moreover, we interpret the NN predictions by analyzing the learned patterns at the variant and gene level and investigating the decision process leading to each prediction.

18.
Bioinform Biol Insights ; 13: 1177932219871263, 2019.
Artigo em Inglês | MEDLINE | ID: mdl-31488948

RESUMO

Predictions are fundamental in science as they allow to test and falsify theories. Predictions are ubiquitous in bioinformatics and also help when no first principles are available. Predictions can be distinguished between classifications (when we associate a label to a given input) or regression (when a real value is assigned). Different scores are used to assess the performance of regression predictors; the most widely adopted include the mean square error, the Pearson correlation (ρ), and the coefficient of determination (or R 2 ). The common conception related to the last 2 indices is that the theoretical upper bound is 1; however, their upper bounds depend both on the experimental uncertainty and the distribution of target variables. A narrow distribution of the target variable may induce a low upper bound. The knowledge of the theoretical upper bounds also has 2 practical applications: (1) comparing different predictors tested on different data sets may lead to wrong ranking and (2) performances higher than the theoretical upper bounds indicate overtraining and improper usage of the learning data sets. Here, we derive the upper bound for the coefficient of determination showing that it is lower than that of the square of the Pearson correlation. We provide analytical equations for both indices that can be used to evaluate the upper bound of the predictions when the experimental uncertainty and the target distribution are available. Our considerations are general and applicable to all regression predictors.

19.
BMC Bioinformatics ; 20(Suppl 14): 335, 2019 Jul 03.
Artigo em Inglês | MEDLINE | ID: mdl-31266447

RESUMO

BACKGROUND: Predicting the effect of single point variations on protein stability constitutes a crucial step toward understanding the relationship between protein structure and function. To this end, several methods have been developed to predict changes in the Gibbs free energy of unfolding (∆∆G) between wild type and variant proteins, using sequence and structure information. Most of the available methods however do not exhibit the anti-symmetric prediction property, which guarantees that the predicted ∆∆G value for a variation is the exact opposite of that predicted for the reverse variation, i.e., ∆∆G(A → B) = -∆∆G(B → A), where A and B are amino acids. RESULTS: Here we introduce simple anti-symmetric features, based on evolutionary information, which are combined to define an untrained method, DDGun (DDG untrained). DDGun is a simple approach based on evolutionary information that predicts the ∆∆G for single and multiple variations from sequence and structure information (DDGun3D). Our method achieves remarkable performance without any training on the experimental datasets, reaching Pearson correlation coefficients between predicted and measured ∆∆G values of ~ 0.5 and ~ 0.4 for single and multiple site variations, respectively. Surprisingly, DDGun performances are comparable with those of state of the art methods. DDGun also naturally predicts multiple site variations, thereby defining a benchmark method for both single site and multiple site predictors. DDGun is anti-symmetric by construction predicting the value of the ∆∆G of a reciprocal variation as almost equal (depending on the sequence profile) to -∆∆G of the direct variation. This is a valuable property that is missing in the majority of the methods. CONCLUSIONS: Evolutionary information alone combined in an untrained method can achieve remarkably high performances in the prediction of ∆∆G upon protein mutation. Non-trained approaches like DDGun represent a valid benchmark both for scoring the predictive power of the individual features and for assessing the learning capability of supervised methods.


Assuntos
Algoritmos , Estabilidade Proteica , Proteínas/química , Sequência de Aminoácidos , Evolução Molecular , Humanos , Mutação Puntual , Proteínas/genética , Termodinâmica
20.
Hum Mutat ; 40(9): 1392-1399, 2019 09.
Artigo em Inglês | MEDLINE | ID: mdl-31209948

RESUMO

Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions of the FXN to Friedreich Ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant ( Δ Δ G H 2 O ) . The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate the performance of the current computational methods for predicting the Δ Δ G H 2 O value associated with the variants and to classify them as destabilizing and not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.


Assuntos
Substituição de Aminoácidos , Proteínas de Ligação ao Ferro/química , Proteínas de Ligação ao Ferro/genética , Algoritmos , Dicroísmo Circular , Humanos , Modelos Moleculares , Conformação Proteica , Dobramento de Proteína , Estabilidade Proteica
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...