Results 1 - 20 of 72
1.
Nucleic Acids Res ; 51(W1): W451-W458, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37246737

ABSTRACT

One of the primary challenges in human genetics is determining the functional impact of single nucleotide variants (SNVs) and insertions and deletions (InDels), whether coding or noncoding. In the past, methods have been created to detect disease-related single amino acid changes, but only some can assess the influence of noncoding variations. CADD is the most commonly used and advanced algorithm for predicting the diverse effects of genome variations. It employs a combination of sequence conservation and functional features derived from the ENCODE project data. To use CADD, a large set of pre-calculated information must be downloaded during the installation process. To streamline the variant annotation process, we developed PhD-SNPg, a machine-learning tool that is easy to install and lightweight, relying solely on sequence-based features. Here we present an updated version, trained on a larger dataset, that can also predict the impact of InDel variations. Despite its simplicity, PhD-SNPg performs similarly to CADD, making it ideal for rapid genome interpretation and as a benchmark for tool development.


Subject(s)
Algorithms, Human Genome, Humans, INDEL Mutation, Machine Learning, Single Nucleotide Polymorphism
2.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35021190

ABSTRACT

Predicting the difference in thermodynamic stability between protein variants is crucial for protein design and understanding the genotype-phenotype relationships. So far, several computational tools have been created to address this task. Nevertheless, most of them have been trained or optimized on the same and 'all' available data, making a fair comparison unfeasible. Here, we introduce a novel dataset, collected and manually cleaned from the latest version of the ThermoMutDB database, consisting of 669 variants not included in the most widely used training datasets. The prediction performance and the ability to satisfy the antisymmetry property by considering both direct and reverse variants were evaluated across 21 different tools. The Pearson correlations of the tested tools were in the ranges of 0.21-0.5 and 0-0.45 for the direct and reverse variants, respectively. When both direct and reverse variants are considered, the antisymmetric methods perform better, achieving a Pearson correlation in the range of 0.51-0.62. The tested methods seem relatively insensitive to the physiological conditions, performing well also on the variants measured with more extreme pH and temperature values. A common issue with all the tested methods is the compression of the ΔΔG predictions toward zero. Furthermore, the thermodynamic stability of the most significantly stabilizing variants was found to be more challenging to predict. This study is the most extensive comparison of prediction methods using an entirely novel set of variants never tested before.


Subject(s)
Point Mutation, Proteins, Mutation, Protein Stability, Proteins/chemistry, Thermodynamics
3.
Hum Genomics ; 17(1): 95, 2023 10 27.
Article in English | MEDLINE | ID: mdl-37891694

ABSTRACT

Mitogen-activated protein kinases 1 and 3 (MAPK1 and MAPK3), also called extracellular regulated kinases (ERK2 and ERK1), are serine/threonine kinases activated downstream by the Ras/Raf/MEK/ERK signal transduction cascade that regulates a variety of cellular processes. A dysregulation of the MAPK cascade is frequently associated with missense mutations on its protein components and may be related to many pathologies, including cancer. In this study we selected from the COSMIC database a set of MAPK1 and MAPK3 somatic variants found in cancer tissues carrying missense mutations distributed all over the MAPK1 and MAPK3 sequences. The proteins were expressed as pure recombinant proteins, and their biochemical and biophysical properties have been studied in comparison with the wild type. The missense mutations lead to changes in the tertiary arrangements of all the variants. The thermodynamic stability of the wild type and variants has been investigated in the non-phosphorylated and in the phosphorylated form. Significant differences in the thermal stabilities of most of the variants have been observed, as well as changes in the catalytic efficiencies. The energetics of the catalytic reaction is affected for all the variants for both the MAPK proteins. The stability changes and the variation in the enzyme catalysis observed for most of the MAPK1/3 variants suggest that a local change in a residue, distant from the catalytic site, may have long-distance effects that reflect globally on enzyme stability and functions.


Subject(s)
Missense Mutation, Neoplasms, Humans, Mitogen-Activated Protein Kinase 1/metabolism, Missense Mutation/genetics, Neoplasms/genetics, Neoplasms/metabolism, Phosphorylation, Protein Serine-Threonine Kinases/metabolism, Signal Transduction
4.
Nucleic Acids Res ; 50(W1): W222-W227, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35524565

ABSTRACT

Estimating the functional effect of single amino acid variants in proteins is fundamental for predicting the change in the thermodynamic stability, measured as the difference in the Gibbs free energy of unfolding, between the wild-type and the variant protein (ΔΔG). Here, we present the web-server of the DDGun method, which was previously developed for the ΔΔG prediction upon amino acid variants. DDGun is an untrained method based on basic features derived from evolutionary information. It is antisymmetric, as it predicts opposite ΔΔG values for direct (A → B) and reverse (B → A) single and multiple site variants. DDGun is available in two versions, one based on only sequence information and the other one based on sequence and structure information. Despite being untrained, DDGun reaches prediction performances comparable to those of trained methods. Here we make DDGun available as a web server. For the web server version, we updated the protein sequence database used for the computation of the evolutionary features, and we compiled two new data sets of protein variants to do a blind test of its performances. On these blind data sets of single and multiple site variants, DDGun confirms its prediction performance, reaching an average correlation coefficient between experimental and predicted ΔΔG of 0.45 and 0.49 for the sequence-based and structure-based versions, respectively. Besides being used for the prediction of ΔΔG, we suggest that DDGun should be adopted as a benchmark method to assess the predictive capabilities of newly developed methods. Releasing DDGun as a web-server, stand-alone program and docker image will facilitate the necessary process of method comparison to improve ΔΔG prediction.


Subject(s)
Amino Acids, Protein Stability, Proteins, Amino Acids/genetics, Computers, Protein Databases, Proteins/genetics, Proteins/chemistry
5.
Int J Mol Sci ; 24(11)2023 May 26.
Article in English | MEDLINE | ID: mdl-37298272

ABSTRACT

Cancer arises from the complex interplay of various factors. Traditionally, the identification of driver genes focuses primarily on the analysis of somatic mutations. We describe a new method for the detection of driver gene pairs based on an epistasis analysis that considers both germline and somatic variations. Specifically, the identification of significantly mutated gene pairs entails the calculation of a contingency table, wherein one of the co-mutated genes can exhibit a germline variant. By adopting this approach, it is possible to select gene pairs in which the individual genes do not exhibit significant associations with cancer. Finally, a survival analysis is used to select clinically relevant gene pairs. To test the efficacy of the new algorithm, we analyzed the colon adenocarcinoma (COAD) and lung adenocarcinoma (LUAD) samples available at The Cancer Genome Atlas (TCGA). In the analysis of the COAD and LUAD samples, we identify epistatic gene pairs significantly mutated in tumor tissue with respect to normal tissue. We believe that further analysis of the gene pairs detected by our method will unveil new biological insights, enhancing a better description of the cancer mechanism.
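
The contingency-table step mentioned in this abstract can be illustrated with a small example. The following is only a sketch under stated assumptions: the abstract does not name the statistical test or the table layout, so the one-sided Fisher's exact test, the 2x2 layout (tumor vs. normal samples, gene pair co-mutated vs. not), and the function name `fisher_exact_greater` are all hypothetical choices made for illustration, not the authors' method.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Assumed (hypothetical) layout:
        a = tumor samples with the gene pair co-mutated
        b = tumor samples without the co-mutation
        c = normal samples with the gene pair co-mutated
        d = normal samples without the co-mutation

    Returns the probability, under the hypergeometric null with fixed
    margins, of seeing `a` or more co-mutated tumor samples.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1 = a + b          # total tumor samples
    col1 = a + c          # total co-mutated samples
    n = a + b + c + d     # all samples
    denom = comb(n, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for top-left counts k = a .. upper.
    numer = sum(comb(row1, k) * comb(n - row1, col1 - k)
                for k in range(a, upper + 1))
    return numer / denom


if __name__ == "__main__":
    # Toy counts, not taken from the COAD or LUAD data.
    print(fisher_exact_greater(3, 0, 0, 3))
```

A small p-value would flag the pair as co-mutated more often in tumor than in normal tissue; in practice the p-values for many gene pairs would also need a multiple-testing correction, which this sketch omits.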


Subject(s)
Adenocarcinoma of Lung, Adenocarcinoma, Colonic Neoplasms, Lung Neoplasms, Humans, Adenocarcinoma/genetics, Genetic Epistasis, Mutation, Colonic Neoplasms/genetics, Adenocarcinoma of Lung/genetics, Lung Neoplasms/genetics, Germ Cells
6.
Hum Genet ; 141(10): 1649-1658, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35098354

ABSTRACT

Evolutionary information is the primary tool for detecting functional conservation in nucleic acids and proteins. This information has been extensively used to predict structure, interactions and functions in macromolecules. Pathogenicity prediction models rely on multiple sequence alignment information at different levels. However, the most accurate genome-wide variant deleteriousness ranking algorithms consider different features to assess the impact of variants. Here, we analyze three different ways of extracting evolutionary information from sequence alignments in the context of pathogenicity predictions at DNA and protein levels. We showed that protein sequence-based information is slightly more informative in the annotation of ClinVar missense variants than that obtained at the DNA level. Furthermore, to achieve the performance of state-of-the-art methods, such as CADD and REVEL, the conservation of reference and variant, encoded as frequencies of reference/alternate alleles or wild-type/mutant residues, should be included. Our results on a large set of missense variants show that a basic method based on three input features derived from the protein sequence profile performs similarly to the CADD algorithm, which uses hundreds of genomic features. As expected, our method results in a ~3% lower area under the receiver operating characteristic curve (AUC) when compared with an ensemble-based algorithm (REVEL). Nevertheless, the combination of predictions of multiple methods can help to identify more reliable predictions. These observations indicate that for missense variants, evolutionary information, when properly encoded, plays the primary role in ranking pathogenicity.


Subject(s)
Computational Biology, Nucleic Acids, Algorithms, Amino Acid Sequence, Computational Biology/methods, Humans, Missense Mutation, Sequence Alignment
7.
Bioinformatics ; 36(24): 5709-5711, 2021 Apr 05.
Article in English | MEDLINE | ID: mdl-33492342

ABSTRACT

SUMMARY: Identifying pathogenic variants and annotating them is a major challenge in human genetics, especially for the non-coding ones. Several tools have been developed and used to predict the functional effect of genetic variants. However, the calibration assessment of the predictions has received little attention. Calibration refers to the idea that if a model predicts a group of variants to be pathogenic with a probability P, it is expected that the same fraction P of true positives is found in the observed set. For instance, a well-calibrated classifier should label the variants such that among the ones to which it gave a probability value close to 0.7, approximately 70% actually belong to the pathogenic class. Poorly calibrated algorithms can be misleading and potentially harmful for clinical decision making. AVAILABILITY AND IMPLEMENTATION: The dataset used for testing the methods is available through the DOI:10.5281/zenodo.4448197. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
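
The notion of calibration described in this abstract can be made concrete with a small reliability table. What follows is a minimal sketch, not the procedure used in the paper: the function names `reliability_table` and `expected_calibration_error`, the equal-width binning, and the toy data are assumptions introduced for illustration.

```python
from typing import List, Sequence, Tuple


def reliability_table(probs: Sequence[float], labels: Sequence[int],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width probability bins.

    Returns one (mean predicted probability, observed pathogenic fraction,
    count) tuple per non-empty bin. For a well-calibrated classifier the
    first two values are close in every bin.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        table.append((mean_p, frac_pos, len(members)))
    return table


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 5) -> float:
    """Count-weighted mean gap between predicted and observed fractions."""
    total = len(probs)
    return sum(n / total * abs(mean_p - frac_pos)
               for mean_p, frac_pos, n in reliability_table(probs, labels, n_bins))


if __name__ == "__main__":
    # Toy data: ten variants scored 0.7, seven of them truly pathogenic (label 1).
    probs = [0.7] * 10
    labels = [1] * 7 + [0] * 3
    print(reliability_table(probs, labels))
    print(expected_calibration_error(probs, labels))
```

In the toy data the predicted probability (0.7) matches the observed pathogenic fraction (7 of 10), so the gap is essentially zero; a classifier that scored the same variants 0.9 would show a gap of about 0.2 in that bin.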

8.
Nucleic Acids Res ; 47(W1): W136-W141, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31114899

ABSTRACT

As the amount of genomic variation data increases, tools that are able to score the functional impact of single nucleotide variants become more and more necessary. While there are several prediction servers available for interpreting the effects of variants in the human genome, only a few have been developed for other species, and none were specifically designed for species of veterinary interest such as the dog. Here, we present Fido-SNP, the first predictor able to discriminate between Pathogenic and Benign single-nucleotide variants in the dog genome. Fido-SNP is a binary classifier based on the Gradient Boosting algorithm. It is able to classify and score the impact of variants in both coding and non-coding regions based on sequence features within seconds. When validated on a previously unseen set of annotated variants from the OMIA database, Fido-SNP reaches 88% overall accuracy, 0.77 Matthews correlation coefficient and 0.91 Area Under the ROC Curve.
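
Two of the measures quoted in this abstract, overall accuracy and the Matthews correlation coefficient (MCC), are computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the example counts are made up for illustration and are not the Fido-SNP validation data.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all variants that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when any marginal total is zero,
    a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts: pathogenic = positive class, benign = negative class.
    tp, tn, fp, fn = 45, 43, 7, 5
    print(accuracy(tp, tn, fp, fn))
    print(matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, MCC uses all four cells symmetrically, so it stays informative when the pathogenic and benign classes are unbalanced.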


Subject(s)
Genome/genetics, Genomics, Single Nucleotide Polymorphism/genetics, Software, Algorithms, Animals, Dogs, Genetic Variation, Genome-Wide Association Study, Genotype, Internet
9.
Int J Mol Sci ; 22(11)2021 May 21.
Article in English | MEDLINE | ID: mdl-34063805

ABSTRACT

Large scale genome sequencing allowed the identification of a massive number of genetic variations, whose impact on human health is still unknown. In this review we analyze, by an in silico-based strategy, the impact of missense variants on cancer-related genes, whose effect on protein stability and function was experimentally determined. We collected a set of 164 variants from 11 proteins to analyze the impact of missense mutations at structural and functional levels, and to assess the performance of state-of-the-art methods (FoldX and Meta-SNP) for predicting protein stability change and pathogenicity. The result of our analysis shows that a combination of experimental data on protein stability and in silico pathogenicity predictions allowed the identification of a subset of variants with a high probability of having a deleterious phenotypic effect, as confirmed by the significant enrichment of the subset in variants annotated in the COSMIC database as putative cancer-driving variants. Our analysis suggests that the integration of experimental and computational approaches may contribute to evaluate the risk for complex disorders and develop more effective treatment strategies.


Subject(s)
Missense Mutation/genetics, Neoplasms/genetics, Computational Biology/methods, Computer Simulation, Humans, Protein Stability, Proteins/genetics
10.
BMC Bioinformatics ; 20(Suppl 14): 335, 2019 Jul 03.
Article in English | MEDLINE | ID: mdl-31266447

ABSTRACT

BACKGROUND: Predicting the effect of single point variations on protein stability constitutes a crucial step toward understanding the relationship between protein structure and function. To this end, several methods have been developed to predict changes in the Gibbs free energy of unfolding (∆∆G) between wild type and variant proteins, using sequence and structure information. Most of the available methods however do not exhibit the anti-symmetric prediction property, which guarantees that the predicted ∆∆G value for a variation is the exact opposite of that predicted for the reverse variation, i.e., ∆∆G(A → B) = -∆∆G(B → A), where A and B are amino acids. RESULTS: Here we introduce simple anti-symmetric features, based on evolutionary information, which are combined to define an untrained method, DDGun (DDG untrained). DDGun is a simple approach based on evolutionary information that predicts the ∆∆G for single and multiple variations from sequence and structure information (DDGun3D). Our method achieves remarkable performance without any training on the experimental datasets, reaching Pearson correlation coefficients between predicted and measured ∆∆G values of ~ 0.5 and ~ 0.4 for single and multiple site variations, respectively. Surprisingly, DDGun performances are comparable with those of state of the art methods. DDGun also naturally predicts multiple site variations, thereby defining a benchmark method for both single site and multiple site predictors. DDGun is anti-symmetric by construction predicting the value of the ∆∆G of a reciprocal variation as almost equal (depending on the sequence profile) to -∆∆G of the direct variation. This is a valuable property that is missing in the majority of the methods. CONCLUSIONS: Evolutionary information alone combined in an untrained method can achieve remarkably high performances in the prediction of ∆∆G upon protein mutation. 
Non-trained approaches like DDGun represent a valid benchmark both for scoring the predictive power of the individual features and for assessing the learning capability of supervised methods.
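
The anti-symmetry property stated in this abstract, ∆∆G(A → B) = -∆∆G(B → A), can be checked numerically for any predictor that returns values for paired direct and reverse variations. The sketch below is an illustration under assumptions: the abstract does not prescribe these summary measures, and the function names and toy values are hypothetical. It reports the Pearson correlation between direct and reverse predictions (ideally -1) and the mean bias (ideally 0).

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def antisymmetry_summary(direct: Sequence[float],
                         reverse: Sequence[float]) -> Tuple[float, float]:
    """Summarize how closely predictions obey ddG(A->B) = -ddG(B->A).

    direct[i] and reverse[i] are the predicted ddG values for a variation
    and for its reverse. Returns (correlation, bias): a perfectly
    anti-symmetric predictor gives correlation -1 and bias 0.
    """
    r = pearson(direct, reverse)
    bias = sum(d + v for d, v in zip(direct, reverse)) / (2 * len(direct))
    return r, bias


if __name__ == "__main__":
    # Toy predictions in kcal/mol, not produced by DDGun.
    direct = [1.0, -0.5, 2.0]
    reverse = [-1.0, 0.5, -2.0]
    print(antisymmetry_summary(direct, reverse))
```

A nonzero bias indicates a systematic tendency to predict, for example, that both a variation and its reverse are destabilizing, which is physically inconsistent.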


Subject(s)
Algorithms, Protein Stability, Proteins/chemistry, Amino Acid Sequence, Molecular Evolution, Humans, Point Mutation, Proteins/genetics, Thermodynamics
11.
Hum Mutat ; 40(9): 1455-1462, 2019 09.
Article in English | MEDLINE | ID: mdl-31066146

ABSTRACT

In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The critical assessment of genome interpretation (CAGI) has established a common framework for the assessment of available predictors of variant effects on specific problems and our group has been an active participant of CAGI since its first edition. In this paper, we summarize our experience and lessons learned from the last edition of the experiment (CAGI-5). In particular, we analyze prediction performances of our tools on five CAGI-5 selected challenges grouped into three different categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze in detail the performance of our tools, highlighting their potentialities and drawbacks. The aim is to better define the application boundaries of each tool.


Subject(s)
Computational Biology/methods, Genetic Variation, Proteins/chemistry, Proteins/genetics, Algorithms, Computer Simulation, Genetic Databases, Genetic Predisposition to Disease, Humans, Machine Learning, Phenotype, Protein Stability
12.
Hum Mutat ; 40(9): 1400-1413, 2019 09.
Article in English | MEDLINE | ID: mdl-31074541

ABSTRACT

Human frataxin is an iron-binding protein involved in the mitochondrial iron-sulfur (Fe-S) clusters assembly, a process fundamental for the functional activity of mitochondrial proteins. Decreased level of frataxin expression is associated with the neurodegenerative disease Friedreich ataxia. Defective function of frataxin may cause defects in mitochondria, leading to increased tumorigenesis. Tumor-initiating cells show higher iron uptake, a decrease in iron storage and a reduced Fe-S clusters synthesis and utilization. In this study, we selected, from COSMIC database, the somatic human frataxin missense variants found in cancer tissues p.D104G, p.A107V, p.F109L, p.Y123S, p.S161I, p.W173C, p.S181F, and p.S202F to analyze the effect of the single amino acid substitutions on frataxin structure, function, and stability. The spectral properties, the thermodynamic and the kinetic stability, as well as the molecular dynamics of the frataxin missense variants found in cancer tissues point to local changes confined to the environment of the mutated residues. The global fold of the variants is not altered by the amino acid substitutions; however, some of the variants show a decreased stability and a decreased functional activity in comparison with that of the wild-type protein.


Subject(s)
Iron-Binding Proteins/chemistry, Iron-Binding Proteins/genetics, Missense Mutation, Neoplasms/genetics, Amino Acid Substitution, Genetic Databases, Humans, Molecular Models, Molecular Dynamics Simulation, Site-Directed Mutagenesis, Protein Conformation, Protein Stability, Frataxin
13.
Hum Mutat ; 40(9): 1463-1473, 2019 09.
Article in English | MEDLINE | ID: mdl-31283071

ABSTRACT

This paper reports the evaluation of predictions for the "CALM1" challenge in the fifth round of the Critical Assessment of Genome Interpretation held in 2018. In the challenge, the participants were asked to predict effects on yeast growth caused by missense variants of human calmodulin, a highly conserved protein in eukaryotic cells sensing calcium concentration. The performance of predictors implementing different algorithms and methods is similar. Most predictors are able to identify the deleterious or tolerated variants with modest accuracy, with a baseline predictor based purely on sequence conservation slightly outperforming the submitted predictions. Nevertheless, we think that the accuracy of predictions remains far from satisfactory, and the field awaits substantial improvements. The most poorly predicted variants in this round surround functional CALM1 sites that bind calcium or peptide, which suggests that better incorporation of structural analysis may help improve predictions.


Subject(s)
Calmodulin/chemistry, Calmodulin/genetics, Computational Biology/methods, Missense Mutation, Yeasts/growth & development, Algorithms, Binding Sites, Calcium/metabolism, Calmodulin/metabolism, Molecular Evolution, Fungal Proteins/chemistry, Fungal Proteins/genetics, Fungal Proteins/metabolism, Genetic Fitness, Humans, Genetic Models, Molecular Models, Protein Conformation, Protein Engineering, Yeasts/genetics
14.
Hum Mutat ; 40(9): 1474-1485, 2019 09.
Article in English | MEDLINE | ID: mdl-31260570

ABSTRACT

The CAGI-5 pericentriolar material 1 (PCM1) challenge aimed to predict the effect of 38 transgenic human missense mutations in the PCM1 protein implicated in schizophrenia. Participants were provided with 16 benign variants (negative controls), 10 hypomorphic, and 12 loss of function variants. Six groups participated and were asked to predict the probability of effect and standard deviation associated with each mutation. Here, we present the challenge assessment. Prediction performance was evaluated using different measures to conclude in a final ranking which highlights the strengths and weaknesses of each group. The results show a great variety of predictions where some methods performed significantly better than others. Benign variants played an important role as negative controls, highlighting predictors biased to identify disease phenotypes. The best predictor, Bromberg lab, used a neural-network-based method able to discriminate between neutral and non-neutral single nucleotide polymorphisms. The CAGI-5 PCM1 challenge allowed us to evaluate the state-of-the-art techniques for interpreting the effect of novel variants for a difficult target protein.


Subject(s)
Autoantigens/genetics, Cell Cycle Proteins/genetics, Computational Biology/methods, Missense Mutation, Schizophrenia/genetics, Genetic Databases, Genetic Predisposition to Disease, Humans, Neural Networks (Computer), Phenotype, Single Nucleotide Polymorphism
15.
Hum Mutat ; 40(9): 1314-1320, 2019 09.
Article in English | MEDLINE | ID: mdl-31140652

ABSTRACT

Genetics plays a key role in venous thromboembolism (VTE) risk; however, established risk factors in European populations do not translate to individuals of African descent because of the differences in allele frequencies between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status in exome data from African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin for non-VTE causes or VTE and asked to predict which disease each subject had been treated for. Given the lack of training data, many participants opted to use unsupervised machine learning methods, clustering the exomes by variation in genes known to be associated with VTE. The best performing method using only VTE-related genes achieved an area under the ROC curve of 0.65. Here, we discuss the range of methods used in the prediction of VTE from sequence data and explore some of the difficulties of conducting a challenge with known confounders. In addition, we show that an existing genetic risk score for VTE that was developed in European subjects works well in African Americans.


Subject(s)
Exome Sequencing/methods, Venous Thromboembolism/genetics, Warfarin/administration & dosage, Cluster Analysis, Computational Biology/methods, Congresses as Topic, Female, Genetic Predisposition to Disease, Humans, Male, ROC Curve, Unsupervised Machine Learning, Venous Thromboembolism/drug therapy, Warfarin/therapeutic use
16.
Hum Mutat ; 40(9): 1392-1399, 2019 09.
Article in English | MEDLINE | ID: mdl-31209948

ABSTRACT

Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions of FXN with Friedreich ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant (ΔΔG(H2O)). The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate the performance of the current computational methods for predicting the ΔΔG(H2O) value associated with the variants and to classify them as destabilizing and not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.


Subject(s)
Amino Acid Substitution, Iron-Binding Proteins/chemistry, Iron-Binding Proteins/genetics, Algorithms, Circular Dichroism, Humans, Molecular Models, Protein Conformation, Protein Folding, Protein Stability, Frataxin
17.
Hum Mutat ; 40(9): 1530-1545, 2019 09.
Article in English | MEDLINE | ID: mdl-31301157

ABSTRACT

Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.


Subject(s)
Amino Acid Substitution, Computational Biology/methods, Cystathionine beta-Synthase/genetics, Cystathionine/metabolism, Cystathionine beta-Synthase/metabolism, Homocysteine/metabolism, Humans, Phenotype, Precision Medicine
18.
Hum Mutat ; 40(9): 1612-1622, 2019 09.
Article in English | MEDLINE | ID: mdl-31241222

ABSTRACT

The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment for Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.


Subject(s)
Breast Neoplasms/genetics, Checkpoint Kinase 2/genetics, Computational Biology/methods, Hispanic or Latino/genetics, Single Nucleotide Polymorphism, Adult, Aged, Breast Neoplasms/ethnology, Case-Control Studies, Computer Simulation, Female, Genetic Predisposition to Disease, Humans, Linear Models, Middle Aged, United States/ethnology, Exome Sequencing
19.
Nucleic Acids Res ; 45(W1): W247-W252, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28482034

ABSTRACT

One of the major challenges in human genetics is to identify functional effects of coding and non-coding single nucleotide variants (SNVs). In the past, several methods have been developed to identify disease-related single amino acid changes, but only a few tools are able to score the impact of non-coding variants. Among the most popular algorithms, CADD and FATHMM predict the effect of SNVs in non-coding regions combining sequence conservation with several functional features derived from the ENCODE project data. Thus, to run CADD or FATHMM locally, the installation process requires downloading a large set of pre-calculated information. To facilitate the process of variant annotation we developed PhD-SNPg, a new easy-to-install and lightweight machine learning method that depends only on sequence-based features. Despite this, PhD-SNPg performs similarly to or better than more complex methods. This makes PhD-SNPg ideal for quick SNV interpretation, and as a benchmark for tool development. AVAILABILITY: PhD-SNPg is accessible at http://snps.biofold.org/phd-snpg.


Subject(s)
Genetic Variation, Software, Algorithms, Humans, Internet, Machine Learning, Sequence Analysis, User-Computer Interface
20.
Hum Mutat ; 38(9): 1064-1071, 2017 09.
Article in English | MEDLINE | ID: mdl-28102005

ABSTRACT

SNPs&GO is a machine learning method for predicting the association of single amino acid variations (SAVs) to disease, considering protein functional annotation. The method is a binary classifier that implements a support vector machine algorithm to discriminate between disease-related and neutral SAVs. SNPs&GO combines information from protein sequence with functional annotation encoded by gene ontology (GO) terms. Tested in sequence mode on more than 38,000 SAVs from the SwissVar dataset, our method reached 81% overall accuracy and an area under the receiver operating characteristic curve of 0.88 with low false-positive rate. In almost all the editions of the Critical Assessment of Genome Interpretation (CAGI) experiments, SNPs&GO ranked among the most accurate algorithms for predicting the effect of SAVs. In this paper, we summarize the best results obtained by SNPs&GO on disease-related variations of four CAGI challenges relative to the following genes: CHEK2 (CAGI 2010), RAD50 (CAGI 2011), p16-INK (CAGI 2013), and NAGLU (CAGI 2016). Result evaluation provides insights about the accuracy of our algorithm and the relevance of GO terms in annotating the effect of the variants. It also helps to define good practices for the detection of deleterious SAVs.


Subject(s)
Amino Acid Substitution, Checkpoint Kinase 2/genetics, Computational Biology/methods, Cyclin-Dependent Kinase Inhibitor p16/genetics, DNA Repair Enzymes/genetics, DNA-Binding Proteins/genetics, alpha-N-Acetylgalactosaminidase/genetics, Acid Anhydride Hydrolases, Algorithms, Gene Ontology, Genetic Predisposition to Disease, Humans, Molecular Sequence Annotation, ROC Curve, Support Vector Machine