1.
Adv Protein Chem Struct Biol ; 127: 217-248, 2021.
Article in English | MEDLINE | ID: mdl-34340768

ABSTRACT

Protein structure characterization is fundamental to understanding protein properties, from the folding process and resistance to thermal stress to the mechanisms underlying pathologies (e.g., prion disease). In this chapter, we provide an overview of how the spectral properties of networks reconstructed from the Protein Contact Map (PCM) can be used to generate informative observables. As a case study, we apply two different network approaches to an example protein dataset, with the aims of discriminating the protein folding state and reconstructing the protein 3D structure.


Subjects
Databases, Protein , Protein Folding , Protein Interaction Maps , Proteins/chemistry , Proteins/metabolism , Animals , Humans , Protein Domains , Protein Stability
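
The chapter's exact network construction is not given in this abstract. This minimal sketch rests on a common, assumed approach: residues become nodes, two residues are joined when their Cα atoms lie within a distance cutoff (8 Å is assumed here), and spectral observables are read from the eigenvalues of the graph Laplacian. The function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def contact_map(coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Binary contact map from an (N, 3) array of C-alpha coordinates.

    Residues i != j are in contact when their distance is below `cutoff`
    (angstroms). The 8 A default is an assumption, not taken from the text.
    """
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise displacement vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # (N, N) distance matrix
    contacts = (dist < cutoff).astype(float)
    np.fill_diagonal(contacts, 0.0)                   # drop self-contacts
    return contacts


def laplacian_spectrum(adjacency: np.ndarray) -> np.ndarray:
    """Eigenvalues, in ascending order, of the graph Laplacian L = D - A."""
    degree = np.diag(adjacency.sum(axis=1))
    return np.linalg.eigvalsh(degree - adjacency)


# Four residues on a line, 3.8 A apart (a toy chain, not real data).
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]])
spectrum = laplacian_spectrum(contact_map(coords))
print(np.round(spectrum, 6))
```

The number of (near-)zero Laplacian eigenvalues equals the number of connected components of the contact network. The second-smallest eigenvalue is therefore positive only when the network is connected.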
2.
Genes (Basel) ; 12(6)2021 06 12.
Article in English | MEDLINE | ID: mdl-34204764

ABSTRACT

Several studies have linked disruptions of protein stability and normal protein function to disease. Therefore, over the last few decades, many tools have been developed to predict the free energy changes upon protein residue variations. Most of these methods require both sequence and structure information to obtain reliable predictions. However, because of experimental difficulties, far fewer protein structures are available than protein sequences, which drastically limits the application of these tools. In addition, current methodologies ignore the antisymmetric property of the thermodynamics of protein stability: a variation from the wild-type to a mutated form of the protein structure ($X_W \rightarrow X_M$) and its reverse process ($X_M \rightarrow X_W$) must have opposite values of the free energy difference ($\Delta\Delta G_{WM} = -\Delta\Delta G_{MW}$). Here we propose ACDC-NN-Seq, a deep neural network system that exploits sequence information and incorporates the antisymmetry property into its architecture. To our knowledge, this is the first convolutional neural network to predict protein stability changes relying solely on the protein sequence. We show that ACDC-NN-Seq compares favorably with existing sequence-based methods.


Subjects
Deep Learning , Genetic Variation , Protein Stability , Sequence Analysis, Protein/methods , Amino Acid Substitution , Humans , Molecular Dynamics Simulation
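
The abstract states that ACDC-NN-Seq builds antisymmetry into its architecture but does not say how. One generic construction is to wrap any scoring function g so that f(w, m) = ½[g(w, m) − g(m, w)], which satisfies f(w, m) = −f(m, w) exactly. It is shown here as a hedged sketch, not as the ACDC-NN-Seq design. `toy_score` and its weights are hypothetical placeholders for a learned model.

```python
from typing import Callable, Sequence

Features = Sequence[float]
Scorer = Callable[[Features, Features], float]


def make_antisymmetric(g: Scorer) -> Scorer:
    """Wrap a scorer g(wild_type, mutant) so that f(w, m) == -f(m, w) exactly."""
    def f(wild: Features, mutant: Features) -> float:
        return 0.5 * (g(wild, mutant) - g(mutant, wild))
    return f


def toy_score(wild: Features, mutant: Features) -> float:
    """Hypothetical, deliberately NOT antisymmetric scorer: a weighted
    feature difference plus a constant bias (stands in for a trained model)."""
    weights = [0.8, -0.3, 1.2]
    return sum(w * (b - a) for w, a, b in zip(weights, wild, mutant)) + 0.4


ddg = make_antisymmetric(toy_score)
wt, mut = [1.0, 2.0, 0.5], [0.2, 1.5, 1.0]
print(toy_score(wt, mut) + toy_score(mut, wt))  # nonzero: the bias breaks antisymmetry
print(ddg(wt, mut) + ddg(mut, wt))              # 0.0 by construction
```

The wrapper cancels any term that is symmetric in its two arguments, such as the constant bias. This is why the property holds regardless of what g learns.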
3.
Int J Mol Sci ; 22(11)2021 May 21.
Article in English | MEDLINE | ID: mdl-34063805

ABSTRACT

Large-scale genome sequencing has enabled the identification of a massive number of genetic variations whose impact on human health is still unknown. In this review, we use an in silico strategy to analyze the impact of missense variants in cancer-related genes for which the effects on protein stability and function had been experimentally determined. We collected a set of 164 variants from 11 proteins to analyze the impact of missense mutations at the structural and functional levels, and to assess the performance of state-of-the-art methods (FoldX and Meta-SNP) for predicting protein stability change and pathogenicity. Our analysis shows that combining experimental data on protein stability with in silico pathogenicity predictions identified a subset of variants with a high probability of having a deleterious phenotypic effect. This was confirmed by the significant enrichment of the subset in variants annotated in the COSMIC database as putative cancer-driving variants. Our analysis suggests that integrating experimental and computational approaches may help evaluate the risk of complex disorders and develop more effective treatment strategies.


Subjects
Mutation, Missense/genetics , Neoplasms/genetics , Computational Biology/methods , Computer Simulation , Humans , Protein Stability , Proteins/genetics
4.
Bioinformatics ; 2021 Jan 25.
Article in English | MEDLINE | ID: mdl-33492342

ABSTRACT

Identifying and annotating pathogenic variants is a major challenge in human genetics, especially for non-coding variants. Several tools have been developed and used to predict the functional effect of genetic variants. However, the calibration of their predictions has received little attention. Calibration means that if a model predicts a group of variants to be pathogenic with probability P, then a fraction P of that group should actually be pathogenic. For instance, among the variants to which a well-calibrated classifier assigns a probability close to 0.7, approximately 70% should actually belong to the pathogenic class. Poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. Supplementary information: Supplementary data are available at Bioinformatics online.
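
The calibration idea above can be checked empirically. Bin predictions by probability, then compare each bin's mean predicted probability with the observed fraction of pathogenic variants. This minimal sketch assumes equal-width bins and summarizes the gaps as a bin-size-weighted average, often called the expected calibration error. It is not necessarily the procedure used in the paper, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def reliability_bins(probs: Sequence[float], labels: Sequence[int],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width probability bins.

    Returns, for each non-empty bin in ascending order,
    (mean predicted probability, observed fraction of positives, bin size).
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    summary = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            frac_pos = sum(y for _, y in members) / len(members)
            summary.append((mean_p, frac_pos, len(members)))
    return summary


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Bin-size-weighted mean of |mean predicted probability - observed fraction|."""
    n = len(probs)
    return sum(size / n * abs(mean_p - frac_pos)
               for mean_p, frac_pos, size in reliability_bins(probs, labels, n_bins))


# Toy data: ten variants scored 0.7 (7 truly pathogenic), five scored 0.2 (none pathogenic).
probs = [0.7] * 10 + [0.2] * 5
labels = [1] * 7 + [0] * 3 + [0] * 5
print(reliability_bins(probs, labels))
print(expected_calibration_error(probs, labels))
```

In this toy example the 0.7 bin is well calibrated. The 0.2 bin is not, since none of its variants are pathogenic, so the whole error comes from that bin.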

5.
Comput Struct Biotechnol J ; 18: 1968-1979, 2020.
Article in English | MEDLINE | ID: mdl-32774791

ABSTRACT

Protein stability predictions are becoming essential in medicine for developing novel immunotherapeutic agents and for drug discovery. Despite the large number of computational approaches for predicting protein stability changes upon mutation, critical problems remain unsolved: 1) the limited number of thermodynamic measurements for proteins provided by current databases; 2) the large intrinsic variability of ΔΔG values due to different experimental conditions; 3) biases in the development of predictive methods caused by ignoring the anti-symmetry of ΔΔG values between mutant and native protein forms; 4) over-optimistic prediction performance, due to sequence similarity between proteins used in training and test datasets. Here, we review these issues and highlight the challenges that must be addressed to improve current tools and achieve more reliable predictions. In addition, we provide a perspective on how these methods will benefit the design of novel precision medicine approaches for genetic disorders caused by mutations, such as cancer and neurodegenerative diseases.

6.
BMC Bioinformatics ; 20(Suppl 14): 335, 2019 Jul 03.
Article in English | MEDLINE | ID: mdl-31266447

ABSTRACT

BACKGROUND: Predicting the effect of single point variations on protein stability is a crucial step toward understanding the relationship between protein structure and function. To this end, several methods have been developed to predict changes in the Gibbs free energy of unfolding (∆∆G) between wild-type and variant proteins, using sequence and structure information. However, most available methods do not exhibit the anti-symmetric prediction property. This property guarantees that the predicted ∆∆G value for a variation is the exact opposite of that predicted for the reverse variation, i.e., ∆∆G(A → B) = -∆∆G(B → A), where A and B are amino acids. RESULTS: Here we introduce simple anti-symmetric features, based on evolutionary information, which are combined to define an untrained method, DDGun (DDG untrained). DDGun is a simple approach based on evolutionary information that predicts the ∆∆G for single and multiple variations from sequence information, and additionally from structure information in its DDGun3D version. Our method achieves remarkable performance without any training on experimental datasets. It reaches Pearson correlation coefficients between predicted and measured ∆∆G values of ~0.5 and ~0.4 for single and multiple site variations, respectively. Surprisingly, DDGun's performance is comparable with that of state-of-the-art methods. DDGun also naturally predicts multiple site variations, thereby defining a benchmark method for both single-site and multiple-site predictors. DDGun is anti-symmetric by construction. It predicts the ∆∆G of a reciprocal variation as almost equal (depending on the sequence profile) to -∆∆G of the direct variation, a valuable property that most methods lack. CONCLUSIONS: Evolutionary information alone, combined in an untrained method, can achieve remarkably high performance in predicting ∆∆G upon protein mutation.
Non-trained approaches like DDGun represent a valid benchmark both for scoring the predictive power of individual features and for assessing the learning capability of supervised methods.


Subjects
Algorithms , Protein Stability , Proteins/chemistry , Amino Acid Sequence , Evolution, Molecular , Humans , Point Mutation , Proteins/genetics , Thermodynamics
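
DDGun is evaluated both by its Pearson correlation with measured ∆∆G values and by its anti-symmetry. Here is a minimal sketch of both checks, assuming paired lists of direct and reverse predictions. A perfectly anti-symmetric predictor gives a direct-versus-reverse correlation of −1 and a mean of (direct + reverse)/2 equal to 0. The function names and example values are hypothetical, and this is not the paper's evaluation code.

```python
from math import sqrt
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def antisymmetry_check(direct: Sequence[float], reverse: Sequence[float]) -> Tuple[float, float]:
    """direct[i] is the predicted ddG(A -> B), reverse[i] the predicted ddG(B -> A).

    Returns (correlation between direct and reverse predictions,
    mean of (direct + reverse) / 2). A perfectly anti-symmetric
    predictor gives (-1.0, 0.0).
    """
    r_dr = pearson(direct, reverse)
    bias = sum(d + r for d, r in zip(direct, reverse)) / (2 * len(direct))
    return r_dr, bias


# Hypothetical predictions and measurements, for illustration only.
measured = [1.2, -0.4, 2.5, 0.3]
predicted = [0.9, -0.1, 1.8, 0.6]
print(pearson(predicted, measured))
print(antisymmetry_check(predicted, [-v for v in predicted]))  # ideal anti-symmetric case
```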
7.
Hum Mutat ; 40(9): 1474-1485, 2019 09.
Article in English | MEDLINE | ID: mdl-31260570

ABSTRACT

The CAGI-5 pericentriolar material 1 (PCM1) challenge aimed to predict the effect of 38 transgenic human missense mutations in the PCM1 protein, which is implicated in schizophrenia. Participants were provided with 16 benign variants (negative controls), 10 hypomorphic variants, and 12 loss-of-function variants. Six groups participated and were asked to predict the probability of effect and the standard deviation associated with each mutation. Here, we present the challenge assessment. Prediction performance was evaluated using different measures, combined into a final ranking that highlights the strengths and weaknesses of each group. The results show a great variety of predictions, with some methods performing significantly better than others. Benign variants played an important role as negative controls, revealing predictors biased toward identifying disease phenotypes. The best predictor, from the Bromberg lab, used a neural-network-based method able to discriminate between neutral and non-neutral single nucleotide polymorphisms. The CAGI-5 PCM1 challenge allowed us to evaluate state-of-the-art techniques for interpreting the effect of novel variants in a difficult target protein.


Subjects
Autoantigens/genetics , Cell Cycle Proteins/genetics , Computational Biology/methods , Mutation, Missense , Schizophrenia/genetics , Databases, Genetic , Genetic Predisposition to Disease , Humans , Neural Networks, Computer , Phenotype , Polymorphism, Single Nucleotide
8.
Hum Mutat ; 40(9): 1463-1473, 2019 09.
Article in English | MEDLINE | ID: mdl-31283071

ABSTRACT

This paper reports the evaluation of predictions for the "CALM1" challenge in the fifth round of the Critical Assessment of Genome Interpretation, held in 2018. Participants were asked to predict the effects on yeast growth of missense variants of human calmodulin, a highly conserved protein that senses calcium concentration in eukaryotic cells. Predictors implementing different algorithms and methods performed similarly. Most predictors identify deleterious or tolerated variants with modest accuracy, and a baseline predictor based purely on sequence conservation slightly outperformed the submitted predictions. Nevertheless, we believe that prediction accuracy remains far from satisfactory and that the field awaits substantial improvements. The most poorly predicted variants in this round surround functional CALM1 sites that bind calcium or peptides. This suggests that better incorporation of structural analysis may help improve predictions.


Subjects
Calmodulin/chemistry , Calmodulin/genetics , Computational Biology/methods , Mutation, Missense , Yeasts/growth & development , Algorithms , Binding Sites , Calcium/metabolism , Calmodulin/metabolism , Evolution, Molecular , Fungal Proteins/chemistry , Fungal Proteins/genetics , Fungal Proteins/metabolism , Genetic Fitness , Humans , Models, Genetic , Models, Molecular , Protein Conformation , Protein Engineering , Yeasts/genetics
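
The CALM1 assessment mentions a baseline based purely on sequence conservation without describing it. Here is a minimal sketch of one such baseline, assuming a multiple sequence alignment given as equal-length strings. Each alignment column is scored by its Shannon entropy, and substitutions at low-entropy (conserved) positions are flagged as likely deleterious. The 1-bit threshold and the function names are hypothetical, and this is not necessarily the baseline used in the assessment.

```python
from collections import Counter
from math import log2
from typing import Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy, in bits, of one alignment column (gaps '-' ignored).
    0 bits means the position is fully conserved."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    n = len(residues)
    return -sum((c / n) * log2(c / n) for c in Counter(residues).values())


def conservation_baseline(alignment: Sequence[str], position: int,
                          threshold: float = 1.0) -> bool:
    """Flag a substitution at `position` (0-based) as likely deleterious
    when its alignment column has entropy below `threshold` bits.
    The 1-bit threshold is an arbitrary, hypothetical choice."""
    column = [seq[position] for seq in alignment]
    return column_entropy(column) < threshold


# Toy alignment of four homologous sequences (not real calmodulin data).
alignment = ["MKV", "MRV", "MKL", "MEI"]
print([conservation_baseline(alignment, i) for i in range(3)])  # [True, False, False]
```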
10.
Hum Mutat ; 40(9): 1530-1545, 2019 09.
Article in English | MEDLINE | ID: mdl-31301157

ABSTRACT

Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.


Subjects
Amino Acid Substitution , Computational Biology/methods , Cystathionine beta-Synthase/genetics , Cystathionine/metabolism , Cystathionine beta-Synthase/metabolism , Homocysteine/metabolism , Humans , Phenotype , Precision Medicine
11.
Hum Mutat ; 40(9): 1612-1622, 2019 09.
Article in English | MEDLINE | ID: mdl-31241222

ABSTRACT

The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment for Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.


Subjects
Breast Neoplasms/genetics , Checkpoint Kinase 2/genetics , Computational Biology/methods , Hispanic Americans/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Breast Neoplasms/ethnology , Case-Control Studies , Computer Simulation , Female , Genetic Predisposition to Disease , Humans , Linear Models , Middle Aged , United States/ethnology , Whole Exome Sequencing
12.
Hum Mutat ; 40(9): 1392-1399, 2019 09.
Article in English | MEDLINE | ID: mdl-31209948

ABSTRACT

Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions in FXN with Friedreich ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact on protein stability of somatic variations identified in cancer tissues. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (the FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values were used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero denaturant concentration ($\Delta\Delta G_{\mathrm{H_2O}}$). The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate how well current computational methods predict the $\Delta\Delta G_{\mathrm{H_2O}}$ value associated with each variant and classify variants as destabilizing or not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.


Subjects
Amino Acid Substitution , Iron-Binding Proteins/chemistry , Iron-Binding Proteins/genetics , Algorithms , Circular Dichroism , Humans , Models, Molecular , Protein Conformation , Protein Folding , Protein Stability
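
The abstract does not state how $\Delta\Delta G_{\mathrm{H_2O}}$ was derived from the denaturation data. For chemical denaturation, a standard choice is the linear extrapolation model, $\Delta G(D) = \Delta G_{\mathrm{H_2O}} - m[D]$. Because $\Delta G = 0$ at the midpoint concentration $C_m$, it follows that $\Delta G_{\mathrm{H_2O}} = m\,C_m$. The sketch below assumes that model, a variant-minus-wild-type sign convention (negative means destabilizing), and a hypothetical −1 kcal/mol cutoff. The example numbers are illustrative, not challenge data.

```python
def dg_h2o(m_value: float, midpoint: float) -> float:
    """Unfolding free energy in water under the linear extrapolation model.

    dG(D) = dG_H2O - m * [D]; at the midpoint Cm, dG = 0, so dG_H2O = m * Cm.
    Units: m in kcal/(mol*M) and Cm in M give kcal/mol.
    """
    return m_value * midpoint


def ddg_h2o(m_wt: float, cm_wt: float, m_var: float, cm_var: float) -> float:
    """Variant minus wild type (assumed convention: negative = destabilizing)."""
    return dg_h2o(m_var, cm_var) - dg_h2o(m_wt, cm_wt)


def is_destabilizing(ddg: float, cutoff: float = -1.0) -> bool:
    """Hypothetical classification rule: destabilizing if ddG <= cutoff (kcal/mol)."""
    return ddg <= cutoff


# Illustrative numbers only (not FXN challenge data).
ddg = ddg_h2o(m_wt=2.0, cm_wt=3.0, m_var=2.0, cm_var=2.25)
print(ddg, is_destabilizing(ddg))  # -1.5 True
```

Published work uses both sign conventions for ΔΔG, so the convention should be checked against the data set before applying any cutoff.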
13.
Nucleic Acids Res ; 47(W1): W136-W141, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31114899

ABSTRACT

As the amount of genomic variation data increases, tools that can score the functional impact of single nucleotide variants become increasingly necessary. Several prediction servers are available for interpreting the effects of variants in the human genome. Only a few have been developed for other species, however, and none were specifically designed for species of veterinary interest such as the dog. Here, we present Fido-SNP, the first predictor able to discriminate between pathogenic and benign single-nucleotide variants in the dog genome. Fido-SNP is a binary classifier based on the gradient boosting algorithm. Using sequence features, it classifies and scores the impact of variants in both coding and non-coding regions within seconds. When validated on a previously unseen set of annotated variants from the OMIA database, Fido-SNP reaches 88% overall accuracy, a Matthews correlation coefficient of 0.77, and an area under the ROC curve of 0.91.


Subjects
Genome/genetics , Genomics , Polymorphism, Single Nucleotide/genetics , Software , Algorithms , Animals , Dogs , Genetic Variation , Genome-Wide Association Study , Genotype , Internet
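
Fido-SNP's validation is reported as overall accuracy, Matthews correlation coefficient (MCC), and area under the ROC curve (AUC). Here is a minimal sketch of these standard metrics, assuming binary labels (1 = pathogenic) and real-valued scores. AUC is computed as the probability that a randomly chosen pathogenic variant outscores a randomly chosen benign one. The 0.5 decision threshold and the function names are assumptions, and this does not reproduce Fido-SNP's evaluation pipeline.

```python
from math import sqrt
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Counts (TP, TN, FP, FN) for binary labels, 1 = pathogenic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example; the 0.5 decision threshold is an assumption.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), mcc(y_true, y_pred), roc_auc(y_true, scores))  # 0.5 0.0 0.75
```

The pairwise AUC loop is quadratic in the number of variants. It is fine for illustration, but a rank-based computation scales better on large validation sets.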
14.
Hum Mutat ; 40(9): 1314-1320, 2019 09.
Article in English | MEDLINE | ID: mdl-31140652

ABSTRACT

Genetics plays a key role in venous thromboembolism (VTE) risk. However, risk factors established in European populations do not translate to individuals of African descent, because allele frequencies differ between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status from exome data of African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin for either VTE or non-VTE causes and asked to predict which condition each subject had been treated for. Given the lack of training data, many participants opted for unsupervised machine learning methods, clustering the exomes by variation in genes known to be associated with VTE. The best-performing method using only VTE-related genes achieved an area under the ROC curve of 0.65. Here, we discuss the range of methods used to predict VTE from sequence data and explore some of the difficulties of conducting a challenge with known confounders. In addition, we show that an existing genetic risk score for VTE, developed in European subjects, works well in African Americans.


Subjects
Venous Thromboembolism/genetics , Warfarin/administration & dosage , Whole Exome Sequencing/methods , Cluster Analysis , Computational Biology/methods , Congresses as Topic , Female , Genetic Predisposition to Disease , Humans , Male , ROC Curve , Unsupervised Machine Learning , Venous Thromboembolism/drug therapy , Warfarin/therapeutic use
15.
Hum Mutat ; 40(9): 1455-1462, 2019 09.
Article in English | MEDLINE | ID: mdl-31066146

ABSTRACT

In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The Critical Assessment of Genome Interpretation (CAGI) has established a common framework for assessing predictors of variant effects on specific problems, and our group has been an active participant in CAGI since its first edition. In this paper, we summarize our experience and the lessons learned from the last edition of the experiment (CAGI-5). In particular, we analyze the prediction performance of our tools on five selected CAGI-5 challenges, grouped into three categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze the performance of our tools in detail, highlighting their strengths and drawbacks. The aim is to better define the application boundaries of each tool.


Subjects
Computational Biology/methods , Genetic Variation , Proteins/chemistry , Proteins/genetics , Algorithms , Computer Simulation , Databases, Genetic , Genetic Predisposition to Disease , Humans , Machine Learning , Phenotype , Protein Stability
16.
Hum Mutat ; 40(9): 1400-1413, 2019 09.
Article in English | MEDLINE | ID: mdl-31074541

ABSTRACT

Human frataxin is an iron-binding protein involved in mitochondrial iron-sulfur (Fe-S) cluster assembly, a process fundamental for the functional activity of mitochondrial proteins. Decreased frataxin expression is associated with the neurodegenerative disease Friedreich ataxia. Defective frataxin function may cause mitochondrial defects, leading to increased tumorigenesis. Tumor-initiating cells show higher iron uptake, decreased iron storage, and reduced Fe-S cluster synthesis and utilization. In this study, we selected from the COSMIC database the somatic human frataxin missense variants found in cancer tissues (p.D104G, p.A107V, p.F109L, p.Y123S, p.S161I, p.W173C, p.S181F, and p.S202F). We then analyzed the effect of these single amino acid substitutions on frataxin structure, function, and stability. The spectral properties, thermodynamic and kinetic stability, and molecular dynamics of these variants point to local changes confined to the environment of the mutated residues. The amino acid substitutions do not alter the global fold of the variants; however, some variants show decreased stability and decreased functional activity compared with the wild-type protein.


Subjects
Iron-Binding Proteins/chemistry , Iron-Binding Proteins/genetics , Mutation, Missense , Neoplasms/genetics , Amino Acid Substitution , Databases, Genetic , Humans , Models, Molecular , Molecular Dynamics Simulation , Mutagenesis, Site-Directed , Protein Conformation , Protein Stability
17.
Wiley Interdiscip Rev Syst Biol Med ; 11(3): e1443, 2019 05.
Article in English | MEDLINE | ID: mdl-30548534

ABSTRACT

More reliable and cheaper sequencing technologies have revealed the vast mutational landscapes characteristic of many phenotypes. The analysis of such genetic variants has led to the successful identification of altered proteins underlying many Mendelian disorders. Nevertheless, the simple one-variant, one-phenotype model valid for many monogenic diseases does not capture the complexity of polygenic traits and disorders. Experimental and computational approaches have improved the detection of functionally deleterious variants and of important interactions between gene products. Even so, developing comprehensive models relating genotypes and phenotypes remains a challenge in genomic medicine. In this context, viewing the pathologic state as a significant perturbation of the network of interactions between biomolecules is crucial for identifying biochemical pathways associated with complex phenotypes. Seminal studies in systems biology combined the analysis of genetic variation with protein-protein interaction networks. They showed that even as biological systems evolve to be robust to genetic variation, their topologies create disease vulnerabilities. More recent analyses model the impact of genetic variants as changes to the "wiring" of the interactome, to better capture heterogeneity in genotype-phenotype relationships. These studies lay the foundation for using networks to predict variant effects at scale with machine-learning or algorithmic approaches. A wealth of databases and resources for annotating genotype-phenotype relationships has been developed to support work in this area. This overview describes how the study of the molecular interactome has generated insights linking the organization of biological systems to disease mechanisms, and how this information can enable precision medicine.
This article is categorized under: Translational, Genomic, and Systems Medicine > Translational Medicine; Biological Mechanisms > Cell Signaling; Models of Systems Properties and Processes > Mechanistic Models; Analytical and Computational Methods > Computational Methods.


Subjects
Genetic Variation , Precision Medicine , Databases, Genetic , Disease/genetics , Genetic Association Studies , Humans , Machine Learning , Protein Interaction Maps/genetics , Receptor, trkB/genetics , Receptor, trkB/metabolism
18.
Nucleic Acids Res ; 45(W1): W247-W252, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28482034

ABSTRACT

One of the major challenges in human genetics is to identify the functional effects of coding and non-coding single nucleotide variants (SNVs). Several methods have been developed to identify disease-related single amino acid changes, but only a few tools can score the impact of non-coding variants. Among the most popular algorithms, CADD and FATHMM predict the effect of SNVs in non-coding regions by combining sequence conservation with several functional features derived from ENCODE project data. As a result, running CADD or FATHMM locally requires downloading a large set of pre-calculated information. To facilitate variant annotation, we developed PhD-SNPg, a new, easy-to-install, and lightweight machine learning method that depends only on sequence-based features. Despite this simplicity, PhD-SNPg performs similarly to or better than more complex methods. This makes PhD-SNPg ideal for quick SNV interpretation and as a benchmark for tool development. AVAILABILITY: PhD-SNPg is accessible at http://snps.biofold.org/phd-snpg.


Subjects
Genetic Variation , Software , Algorithms , Humans , Internet , Machine Learning , Sequence Analysis , User-Computer Interface
19.
Hum Mutat ; 38(9): 1042-1050, 2017 09.
Article in English | MEDLINE | ID: mdl-28440912

ABSTRACT

Correct phenotypic interpretation of variants of unknown significance for cancer-associated genes is a diagnostic challenge as genetic screenings gain in popularity in the next-generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of 10 variants for the p16INK4a tumor suppressor, a cyclin-dependent kinase inhibitor encoded by the CDKN2A gene. Twenty-two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite a limited test-set size, our findings show a variety of results, with some methods performing significantly better. Methods combining different strategies frequently outperform simpler approaches. The best predictor, Yang&Zhou lab, uses a machine learning method combining an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants.


Subjects
Computational Biology/methods , Cyclin-Dependent Kinase Inhibitor p18/genetics , Genetic Variation , Cell Line, Tumor , Cell Proliferation , Computer Simulation , Cyclin-Dependent Kinase Inhibitor p16 , Cyclin-Dependent Kinase Inhibitor p18/chemistry , Databases, Genetic , Genetic Predisposition to Disease , Humans , Machine Learning , Protein Stability
20.
Hum Mutat ; 38(9): 1064-1071, 2017 09.
Article in English | MEDLINE | ID: mdl-28102005

ABSTRACT

SNPs&GO is a machine learning method for predicting the association of single amino acid variations (SAVs) with disease, taking protein functional annotation into account. The method is a binary classifier that uses a support vector machine to discriminate between disease-related and neutral SAVs. SNPs&GO combines protein sequence information with functional annotation encoded by Gene Ontology (GO) terms. Tested in sequence mode on more than 38,000 SAVs from the SwissVar dataset, our method reached 81% overall accuracy and an area under the receiver operating characteristic curve of 0.88, with a low false-positive rate. In almost all editions of the Critical Assessment of Genome Interpretation (CAGI) experiment, SNPs&GO ranked among the most accurate algorithms for predicting the effect of SAVs. In this paper, we summarize the best results obtained by SNPs&GO on disease-related variations in four CAGI challenges involving the following genes: CHEK2 (CAGI 2010), RAD50 (CAGI 2011), p16-INK (CAGI 2013), and NAGLU (CAGI 2016). The evaluation of these results provides insights into the accuracy of our algorithm and the relevance of GO terms in annotating the effect of variants. It also helps define good practices for detecting deleterious SAVs.


Subjects
Amino Acid Substitution , Checkpoint Kinase 2/genetics , Computational Biology/methods , Cyclin-Dependent Kinase Inhibitor p16/genetics , DNA Repair Enzymes/genetics , DNA-Binding Proteins/genetics , alpha-N-Acetylgalactosaminidase/genetics , Acid Anhydride Hydrolases , Algorithms , Gene Ontology , Genetic Predisposition to Disease , Humans , Molecular Sequence Annotation , ROC Curve , Support Vector Machine