1.
Nucleic Acids Res ; 52(D1): D494-D501, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37791887

ABSTRACT

MultifacetedProtDB is a database of multifunctional human proteins that draws information from other databases, including UniProt, GeneCards, Human Protein Atlas (HPA), Human Phenotype Ontology (HPO) and MONDO. Under the label 'multifaceted', it collects multitasking proteins that the literature describes as pleiotropic, multidomain, promiscuous (enzymes that catalyse reactions on multiple substrates) and moonlighting (proteins with two or more molecular functions), and that are difficult to retrieve with a direct search in existing non-specific databases. The study of multifunctional proteins is an expanding research area that aims to elucidate the complexity of biological processes, particularly in humans, where multifunctional proteins act in processes including signal transduction, metabolism, gene regulation and cellular communication, and are often involved in disease onset and progression. The web server allows searching by gene, protein and any associated structural and functional information, such as available PDB structures, structural models and interactors, using multiple filters. Protein entries are supplemented with comprehensive annotations, including EC numbers, GO terms (biological processes, molecular functions and cellular components), Reactome pathways, subcellular localization from UniProt, tissue and cell type expression from HPA, and associated diseases following the MONDO, Orphanet and OMIM classifications. MultifacetedProtDB is freely available as a web server at: https://multifacetedprotdb.biocomp.unibo.it/.


Subject(s)
Databases, Protein; Proteins; Humans; Proteins/chemistry; Proteins/genetics; Proteins/metabolism; Databases as Topic
2.
Hum Genomics ; 18(1): 44, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38685113

ABSTRACT

BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50% of cases, even when variants are captured genome-wide. Computational methods to aid the interpretation and prioritization of the vast number of detected variants are proliferating, but which tools are most effective remains unclear. To evaluate the performance of computational methods and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge that placed variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We used genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). Teams were tasked with identifying causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure, based on the precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some incorporating manual review. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new ones. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria are needed.
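The maximum F-measure metric can be illustrated as a sweep over EPCR cutoffs: at each cutoff, the variants at or above it count as "called", and precision and recall are computed against the known causal variants. The sketch below is a minimal illustration under stated assumptions, not the challenge's official scoring code: predictions are a hypothetical mapping from variant identifier to EPCR, and causal variants never submitted count as false negatives.

```python
def max_f_measure(predictions: dict[str, float], causal: set[str]) -> tuple[float, float]:
    """Return (best F-measure, EPCR cutoff achieving it).

    predictions: hypothetical variant identifier -> EPCR value.
    causal: identifiers of the true causal variants.
    Every distinct EPCR value is tried as a cutoff (called if EPCR >= cutoff).
    """
    best_f, best_t = 0.0, float("nan")
    for t in sorted(set(predictions.values()), reverse=True):
        called = {v for v, p in predictions.items() if p >= t}
        tp = len(called & causal)
        if tp == 0:
            continue  # F-measure is 0 when nothing causal is called
        precision = tp / len(called)
        recall = tp / len(causal)  # unsubmitted causal variants lower recall
        f = 2 * precision * recall / (precision + recall)
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t
```

For example, with EPCR values {v1: 0.9, v2: 0.8, v3: 0.4, v4: 0.1} and causal set {v1, v3}, the best F-measure is 0.8, reached at cutoff 0.4 (precision 2/3, recall 1).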


Subject(s)
Rare Diseases; Humans; Rare Diseases/genetics; Rare Diseases/diagnosis; Genome, Human/genetics; Genetic Variation/genetics; Computational Biology/methods; Phenotype
3.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37540220

ABSTRACT

MOTIVATION: Coiled-coil domains (CCDs) are widespread in all organisms and perform several crucial functions. Given their relevance, the computational detection of CCDs is very important for protein functional annotation. State-of-the-art prediction methods address the precise identification of CCD boundaries, the annotation of the typical heptad repeat pattern along the coiled-coil helices, and the prediction of the oligomerization state. RESULTS: In this article, we describe CoCoNat, a novel method for predicting coiled-coil helix boundaries, residue-level register annotation and oligomerization state. Our method encodes sequences with a combination of two state-of-the-art protein language models and implements a three-step deep learning procedure concatenated with a Grammatical-Restrained Hidden Conditional Random Field for CCD identification and refinement. A final neural network predicts the oligomerization state. When tested on a routinely adopted blind test set, CoCoNat outperforms the current state of the art for both residue-level and segment-level CCD prediction. CoCoNat significantly outperforms the most recent state-of-the-art methods on register annotation and prediction of oligomerization states. AVAILABILITY AND IMPLEMENTATION: CoCoNat web server is available at https://coconat.biocomp.unibo.it. Standalone version is available on GitHub at https://github.com/BolognaBiocomp/coconat.
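Register annotation assigns each coiled-coil residue one of the seven heptad positions a to g, where positions a and d typically form the hydrophobic core. The sketch below is not CoCoNat's procedure: it assumes the segment boundaries and the register of the first residue are already known (hypothetical inputs) and only propagates the periodic labels, ignoring the stutters and stammers that interrupt real heptad repeats.

```python
HEPTAD = "abcdefg"

def annotate_register(seq: str, start: int, end: int, first_pos: str) -> str:
    """Label residues in the half-open segment [start, end) with heptad positions.

    first_pos is the (assumed known) register of seq[start]; residues outside
    the segment are labelled '-'. Assumes an uninterrupted heptad repeat.
    """
    if first_pos not in HEPTAD or len(first_pos) != 1:
        raise ValueError("first_pos must be one of a, b, c, d, e, f, g")
    offset = HEPTAD.index(first_pos)
    labels = []
    for i in range(len(seq)):
        if start <= i < end:
            # Positions cycle every 7 residues starting from first_pos.
            labels.append(HEPTAD[(i - start + offset) % 7])
        else:
            labels.append("-")
    return "".join(labels)
```

For instance, labelling residues 1 to 9 of an 11-residue sequence starting at register g yields `-gabcdefga-`.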


Subject(s)
Deep Learning; Proteins/chemistry; Protein Domains; Neural Networks, Computer; Molecular Sequence Annotation
4.
Proteomics ; 23(17): e2200323, 2023 09.
Article in English | MEDLINE | ID: mdl-37365936

ABSTRACT

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
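The ROC AUC values reported above can be read as the probability that a randomly chosen physiological dimer scores higher than a randomly chosen non-physiological one (the Mann-Whitney formulation). The sketch below computes that quantity and a simple consensus; the consensus rule (averaging rank-normalized scores across groups) is an assumption for illustration, and the study's exact consensus procedure may differ. It also assumes that higher scores mean "more likely physiological".

```python
def roc_auc(pos: list[float], neg: list[float]) -> float:
    """AUC as P(score_pos > score_neg), counting ties as 1/2 (Mann-Whitney U / (n_pos * n_neg))."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def rank_normalize(scores: list[float]) -> list[float]:
    """Map scores to ranks scaled to [0, 1]; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2
        i = j + 1
    denom = max(len(scores) - 1, 1)
    return [r / denom for r in ranks]

def consensus(score_table: list[list[float]]) -> list[float]:
    """Average rank-normalized scores; one inner list per group, same complex order."""
    normalized = [rank_normalize(s) for s in score_table]
    n = len(score_table[0])
    return [sum(col[i] for col in normalized) / len(normalized) for i in range(n)]
```

Rank normalization is used here so that scoring functions with different units and ranges contribute equally to the average.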


Subject(s)
Proteins; Reproducibility of Results; Proteins/metabolism; Protein Binding
5.
Brief Bioinform ; 22(1): 601-603, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31885042

ABSTRACT

A review recently published in this journal by Fang (2019) showed that methods trained to predict protein stability changes upon mutation have a critical bias: they neglect that a protein variation (A→B) and its reverse (B→A) must have free energy differences of equal magnitude and opposite sign (ΔΔG_AB = −ΔΔG_BA). In this letter, we complement Fang's paper by presenting a more general view of the problem. In particular, a machine learning-based method published in 2015 (INPS) addressed the bias directly. We add the analysis of this method, which was missing from the review, showing that INPS is nearly insensitive to the problem.
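The antisymmetry requirement can be checked on any predictor by scoring each variation in both directions. The minimal sketch below assumes paired lists of direct (A→B) and reverse (B→A) predictions and computes two diagnostics used in the literature on this bias: the Pearson correlation between direct and reverse predictions (ideally −1) and the mean bias ⟨(ΔΔG_dir + ΔΔG_rev)/2⟩ (ideally 0). Whether these match the letter's exact analysis is an assumption.

```python
import math

def antisymmetry_diagnostics(direct: list[float], reverse: list[float]) -> tuple[float, float]:
    """Return (pearson_r, mean_bias) for paired ddG predictions.

    direct[i] is the predicted ddG for A->B, reverse[i] for B->A.
    A perfectly antisymmetric predictor gives r = -1 and bias = 0.
    """
    if len(direct) != len(reverse) or len(direct) < 2:
        raise ValueError("need at least two paired predictions")
    n = len(direct)
    mx = sum(direct) / n
    my = sum(reverse) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(direct, reverse))
    sx = math.sqrt(sum((x - mx) ** 2 for x in direct))
    sy = math.sqrt(sum((y - my) ** 2 for y in reverse))
    r = cov / (sx * sy)
    # Mean of (ddG_dir + ddG_rev) / 2: systematic offset away from antisymmetry.
    bias = sum((x + y) / 2 for x, y in zip(direct, reverse)) / n
    return r, bias
```

A predictor that returns the same value for both directions scores r = +1 and a nonzero bias, which is exactly the failure mode the review describes.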


Subject(s)
Algorithms; Machine Learning; Mutation; Protein Stability
6.
Bioinformatics ; 38(23): 5168-5174, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36227117

ABSTRACT

MOTIVATION: The advent of massively parallel DNA sequencing technologies is producing a huge number of human single-nucleotide polymorphisms that occur in protein-coding regions and may change the encoded protein sequences. Discriminating harmful protein variations from neutral ones is one of the crucial challenges in precision medicine. Computational tools based on artificial intelligence provide models for protein sequence encoding that bypass database searches for evolutionary information. We leverage these new encoding schemes for an efficient annotation of protein variants. RESULTS: E-SNPs&GO is a novel method that, given an input protein sequence and a single amino acid variation, predicts whether or not the variation is related to disease. The method adopts an input encoding based entirely on protein language models and embedding techniques specifically devised to encode protein sequences and GO functional annotations. We trained our model on a newly generated dataset of 101 146 human protein single amino acid variants in 13 661 proteins, derived from public resources. When tested on a blind set comprising 10 266 variants, our method compares well with recent approaches for the same task in the literature, reaching a Matthews correlation coefficient (MCC) of 0.72. We propose E-SNPs&GO as a suitable, efficient and accurate large-scale annotator of protein variant datasets. AVAILABILITY AND IMPLEMENTATION: The method is available as a webserver at https://esnpsandgo.biocomp.unibo.it. Datasets and predictions are available at https://esnpsandgo.biocomp.unibo.it/datasets. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
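For reference, the Matthews correlation coefficient reported above summarizes the binary confusion matrix (disease-related vs. neutral variants) with the standard formula (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch; returning 0 when any marginal is empty is a common convention, assumed here.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    returns 0.0 when a row or column of the matrix is empty (assumed convention).
    """
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / math.sqrt(denom)
```

Unlike accuracy, MCC stays informative when disease-related and neutral variants are unbalanced in the test set.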


Subject(s)
Artificial Intelligence; Polymorphism, Single Nucleotide; Humans; Amino Acid Sequence; Proteins/genetics; Proteins/chemistry; Amino Acids; Computational Biology/methods; Molecular Sequence Annotation
7.
Insect Mol Biol ; 32(2): 118-131, 2023 04.
Article in English | MEDLINE | ID: mdl-36366787

ABSTRACT

Termites (Insecta, Blattodea, Termitoidae) are a widespread and diverse group of eusocial insects known for their ability to digest wood. Herein, we report the draft genome of the subterranean termite Reticulitermes lucifugus, an economically important species and among the most studied taxa with respect to eusocial organization and mating system. The final assembly (~813 Mb) covered up to 88% of the estimated genome size and, consistent with the asexual queen succession mating system, was found to be completely homozygous. We predicted 16,349 highly supported gene models and a repetitive DNA content of 42%. Transposable elements of R. lucifugus show evolutionary dynamics similar to those of other termites, with two main peaks of activity at 25% and 8% Kimura divergence, driven by DNA, LINE and SINE elements. Gene family turnover analyses identified multiple instances of gene duplication associated with R. lucifugus diversification, with significant lineage-specific gene family expansions related to development, perception and nutrient metabolism pathways. Finally, we analysed the P450 and odourant receptor gene repertoires in detail, highlighting the large diversity and dynamic evolutionary history of these proteins in the R. lucifugus genome. This newly assembled genome will provide a valuable resource for further understanding the molecular basis of termite biology as well as for pest control.
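Kimura divergence, used above to date transposable-element activity, is based on the Kimura two-parameter (K2P) distance between each element copy and its consensus: with P the fraction of transitions and Q the fraction of transversions, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The sketch below implements only this basic formula for two aligned sequences; repeat-annotation pipelines typically add adjustments (for example at CpG sites), so it is an illustration, not the study's pipeline.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or non-ACGT characters are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

Multiplying the returned distance by 100 gives the percentage scale on which divergence peaks such as those reported above are usually plotted.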


Subject(s)
Cockroaches; Isoptera; Animals; Isoptera/genetics; Wood; Biological Evolution; Reproduction
8.
Nucleic Acids Res ; 49(W1): W60-W66, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33963861

ABSTRACT

The Bologna ENZyme Web Server (BENZ WS) annotates four-level Enzyme Commission numbers (EC numbers) as defined by the International Union of Biochemistry and Molecular Biology (IUBMB). BENZ WS filters a target sequence with a combined system of hidden Markov models (HMMs), each modelling protein sequences annotated with the same molecular function, and Pfam models of conserved protein domains. When successful, BENZ returns an associated four-level EC number for any enzyme target sequence. Our system can annotate both monofunctional and polyfunctional enzymes, and it can be a valuable resource for sequence functional annotation.
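A four-level EC number such as EC 1.1.1.1 encodes a hierarchy from enzyme class down to serial number, and partially specified numbers use "-" for unknown levels (e.g. 1.1.1.-). The hypothetical helpers below, which are not part of BENZ WS, parse an EC string into its levels and count how many leading levels a predicted number shares with a reference, one common way to score partial agreement. They assume purely numeric levels or "-".

```python
def parse_ec(ec: str) -> tuple[str, ...]:
    """Split 'EC 1.1.1.1' or '1.1.1.-' into its four levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {ec!r}")
    for p in parts:
        if p != "-" and not p.isdigit():
            raise ValueError(f"unexpected level {p!r} in {ec!r}")
    return tuple(parts)

def agreement_depth(predicted: str, reference: str) -> int:
    """Number of leading levels (0-4) on which two EC numbers agree; '-' never matches."""
    depth = 0
    for a, b in zip(parse_ec(predicted), parse_ec(reference)):
        if a == "-" or b == "-" or a != b:
            break
        depth += 1
    return depth
```

For example, a prediction of 1.1.1.2 against a reference of 1.1.1.1 agrees to depth 3: correct class, subclass and sub-subclass, but the wrong serial number.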


Subject(s)
Enzymes/chemistry; Molecular Sequence Annotation/methods; Sequence Analysis, Protein/methods; Software; Internet; Markov Chains; Protein Domains; Sequence Alignment
9.
Genomics ; 113(6): 4163-4172, 2021 11.
Article in English | MEDLINE | ID: mdl-34748900

ABSTRACT

This analysis presents five genome assemblies of four Notostraca taxa. The origin of Notostraca dates to the Permian/Upper Devonian, and extant forms show a striking morphological similarity to fossil taxa. Comparison of the sequenced genomes with other Branchiopoda genomes shows that, despite their morphological stasis, Notostraca share a dynamic genome evolution, with high turnover in gene family expansion/contraction and a transposable element content comparable to that of other branchiopods. While the Notostraca substitution rate appears similar to or lower than that of other branchiopods, a subset of genes shows a faster evolutionary pace, highlighting the difficulty of generalizing about genomic stasis versus dynamism. Moreover, we found that variation in Triops cancriformis transposable element content appears linked to reproductive strategies, in line with theoretical expectations. Overall, besides providing new genomic resources for the study of these organisms, which appear relevant to their ecology and evolution, we also confirmed the decoupling of morphological and molecular evolution.


Subject(s)
Crustacea; Evolution, Molecular; Animals; Crustacea/genetics; Genomics; Larva; Phylogeny
10.
Int J Mol Sci ; 23(12)2022 Jun 08.
Article in English | MEDLINE | ID: mdl-35742853

ABSTRACT

Next-generation sequencing (NGS) has enormously improved the identification of disease-candidate genetic variants [...].


Subject(s)
High-Throughput Nucleotide Sequencing; Humans; Mutation
11.
Bioinformatics ; 36(1): 56-64, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31218353

ABSTRACT

MOTIVATION: The correct localization of proteins in cell compartments is key to their function. In particular, mitochondrial proteins are physiologically active in different compartments, and their aberrant localization contributes to the pathogenesis of human mitochondrial pathologies. Many computational methods exist to assign protein sequences to subcellular compartments such as the nucleus, cytoplasm and organelles. However, a substantial lack of experimental evidence in public sequence databases has so far hampered finer-grained discrimination, including discrimination among intra-organelle compartments. RESULTS: We describe DeepMito, a novel method for predicting protein sub-mitochondrial localization. Taking advantage of powerful deep-learning approaches such as convolutional neural networks, our method achieves very high prediction performance when discriminating among four mitochondrial compartments (matrix, outer membrane, inner membrane and intermembrane space). The method is trained and tested in cross-validation on a newly generated, high-quality dataset comprising 424 mitochondrial proteins with experimental evidence of sub-organelle localization. We benchmark DeepMito against the only other recent approach developed for the same task, and the results indicate that DeepMito performs better. Finally, genome-scale prediction on a highly curated dataset of human mitochondrial proteins further confirms the effectiveness of our approach and suggests that DeepMito is a good candidate for genome-scale annotation of mitochondrial protein subcellular localization. AVAILABILITY AND IMPLEMENTATION: The DeepMito web server, as well as all datasets used in this study, is available at http://busca.biocomp.unibo.it/deepmito. A standalone version of DeepMito is available on DockerHub at https://hub.docker.com/r/bolognabiocomp/deepmito. DeepMito source code is available on GitHub at https://github.com/BolognaBiocomp/deepmito.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology; Mitochondrial Proteins; Neural Networks, Computer; Software; Computational Biology/methods; Humans; Mitochondrial Proteins/genetics; Mitochondrial Proteins/metabolism; Protein Transport
12.
Int J Mol Sci ; 22(6)2021 Mar 12.
Article in English | MEDLINE | ID: mdl-33809039

ABSTRACT

Taking advantage of the latest cryogenic electron microscopy structure of human huntingtin, we used computational methods to explore its physicochemical properties, focusing on the solvent-accessible surface of the protein and highlighting an interesting mix of hydrophobic and hydrophilic patterns, with the latter prevailing. We then evaluated the probability that exposed residues are in contact with other proteins, finding that these residues tend to cluster in specific regions of the protein. We also found that the remaining portions of the protein surface can contain calcium-binding sites, which we propose here as putative mediators of the protein's interaction with membranes. We discuss our findings in relation to the current knowledge of huntingtin functional annotation.


Subject(s)
Calcium/metabolism; Computational Biology; Huntingtin Protein/chemistry; Proteins/genetics; Binding Sites/genetics; Humans; Huntingtin Protein/genetics; Huntingtin Protein/ultrastructure; Hydrophobic and Hydrophilic Interactions; Models, Molecular; Protein Binding/genetics; Solvents/chemistry; Surface Properties
13.
Int J Mol Sci ; 23(1)2021 Dec 23.
Article in English | MEDLINE | ID: mdl-35008593

ABSTRACT

The association between MTHFR deficiency phenotypes and variations in protein structure still deserves investigation. To this end, starting from the wild-type MTHFR protein structure, which comprises a catalytic and a regulatory domain, and taking advantage of state-of-the-art computational tools, we explore the properties of 72 missense variations known to be disease-associated. By computing the thermodynamic ΔΔG change with a consensus method that we recently introduced, we find that 61% of the disease-related variations destabilize the protein; these variations occur in both the catalytic and regulatory domains and correspond to known biochemical deficiencies. The propensity of solvent-accessible residues to be involved in protein-protein interaction sites indicates that most of the interacting residues are located in the regulatory domain, and that only three of them, located at the interface of the functional protein homodimer, are both disease-related and destabilizing. Finally, we model the protein architecture with two hidden Markov models, one from Pfam for the catalytic domain and one computed in-house for the regulatory domain. We show that, when mapped onto the protein architecture, the patterns of disease-associated physicochemical variation types in both the catalytic and regulatory domains are unique to MTHFR deficiency.


Subject(s)
Homocystinuria/genetics; Methylenetetrahydrofolate Reductase (NADPH2)/deficiency; Muscle Spasticity/genetics; Catalytic Domain/genetics; Humans; Methylenetetrahydrofolate Reductase (NADPH2)/genetics; Protein Interaction Maps/genetics; Psychotic Disorders/genetics
14.
BMC Bioinformatics ; 21(Suppl 8): 266, 2020 Sep 16.
Article in English | MEDLINE | ID: mdl-32938368

ABSTRACT

BACKGROUND: The prediction of protein subcellular localization is a key step in the broader effort toward protein functional annotation. Many computational methods exist to identify high-level protein subcellular compartments such as the nucleus, cytoplasm or organelles. However, many organelles, like mitochondria, have their own internal compartmentalization. Knowing the precise location of a protein inside mitochondria is crucial for its accurate functional characterization. We recently developed DeepMito, a new method based on a one-dimensional convolutional neural network (1D-CNN) architecture that outperforms other similar approaches available in the literature. RESULTS: Here, we apply DeepMito to the large-scale annotation of four sub-mitochondrial localizations on the mitochondrial proteomes of five species: human, mouse, fly, yeast and Arabidopsis thaliana. A significant fraction of the proteins from these organisms lacked experimental information about sub-mitochondrial localization. We used DeepMito to fill this gap, providing a complete characterization of localization at the sub-mitochondrial level for every protein of the five proteomes. Moreover, by applying available state-of-the-art subcellular localization predictors to the set of proteins lacking any subcellular localization annotation, we identified novel mitochondrial proteins. Finally, we performed additional functional characterization of the proteins predicted by DeepMito to localize in each of the four sub-mitochondrial compartments, using both experimental and predicted GO terms. All data generated in this study were collected into a database called DeepMitoDB (available at http://busca.biocomp.unibo.it/deepmitodb), providing complete functional characterization of 4307 mitochondrial proteins from the five species.
CONCLUSIONS: DeepMitoDB offers a comprehensive view of mitochondrial proteins, including experimental and predicted fine-grained subcellular localization as well as annotated and predicted functional annotations. The database complements other similar resources by providing characterization of new proteins. Furthermore, it is unique in including localization information at the sub-mitochondrial level. For this reason, we believe that DeepMitoDB can be a valuable resource for mitochondrial research.


Subject(s)
Computational Biology/methods; Mitochondrial Proteins/genetics; Protein Transport/genetics; Animals; Humans
15.
Nucleic Acids Res ; 46(W1): W459-W466, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29718411

ABSTRACT

Here, we present BUSCA (http://busca.biocomp.unibo.it), a novel web server that integrates different computational tools for predicting protein subcellular localization. BUSCA combines methods for identifying signal and transit peptides (DeepSig and TPpred3), GPI-anchors (PredGPI) and transmembrane domains (ENSEMBLE3.0 and BetAware) with tools for discriminating the subcellular localization of both globular and membrane proteins (BaCelLo, MemLoci and SChloro). Outcomes from the different tools are processed and integrated to annotate the subcellular localization of both eukaryotic and bacterial protein sequences. We benchmark BUSCA on protein targets derived from recent CAFA experiments and other specific datasets, reporting state-of-the-art performance. BUSCA scores better than all other evaluated methods on 2732 targets from CAFA2, with an F1 value of 0.49, and is among the best methods when predicting targets from CAFA3. We propose BUSCA as an integrated and accurate resource for the annotation of protein subcellular localization.


Subject(s)
Eukaryotic Cells/chemistry; Membrane Proteins/genetics; Mitochondrial Proteins/genetics; Prokaryotic Cells/chemistry; Software; Bacteria/chemistry; Bacteria/ultrastructure; Benchmarking; Cell Membrane/chemistry; Cell Membrane/ultrastructure; Cell Nucleus/chemistry; Cell Nucleus/ultrastructure; Chloroplasts/chemistry; Chloroplasts/ultrastructure; Eukaryota/chemistry; Eukaryota/ultrastructure; Eukaryotic Cells/ultrastructure; Gene Expression; Gene Ontology; Internet; Membrane Proteins/metabolism; Mitochondria/chemistry; Mitochondria/ultrastructure; Mitochondrial Proteins/metabolism; Molecular Sequence Annotation; Prokaryotic Cells/ultrastructure; Protein Sorting Signals/genetics
16.
Hum Mutat ; 40(9): 1455-1462, 2019 09.
Article in English | MEDLINE | ID: mdl-31066146

ABSTRACT

In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The Critical Assessment of Genome Interpretation (CAGI) has established a common framework for the assessment of available predictors of variant effects on specific problems, and our group has been an active participant in CAGI since its first edition. In this paper, we summarize our experience and lessons learned from the latest edition of the experiment (CAGI-5). In particular, we analyze the prediction performance of our tools on five selected CAGI-5 challenges grouped into three categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze the performance of our tools in detail, highlighting their strengths and drawbacks. The aim is to better define the application boundaries of each tool.


Subject(s)
Computational Biology/methods; Genetic Variation; Proteins/chemistry; Proteins/genetics; Algorithms; Computer Simulation; Databases, Genetic; Genetic Predisposition to Disease; Humans; Machine Learning; Phenotype; Protein Stability
17.
Hum Mutat ; 40(9): 1463-1473, 2019 09.
Article in English | MEDLINE | ID: mdl-31283071

ABSTRACT

This paper reports the evaluation of predictions for the "CALM1" challenge in the fifth round of the Critical Assessment of Genome Interpretation, held in 2018. In the challenge, participants were asked to predict the effects on yeast growth caused by missense variants of human calmodulin, a highly conserved protein in eukaryotic cells that senses calcium concentration. The performance of predictors implementing different algorithms and methods is similar. Most predictors identify deleterious or tolerated variants with modest accuracy, and a baseline predictor based purely on sequence conservation slightly outperforms the submitted predictions. Overall, we think that the accuracy of predictions remains far from satisfactory, and the field awaits substantial improvements. The most poorly predicted variants in this round surround functional CALM1 sites that bind calcium or peptide, which suggests that better incorporation of structural analysis may help improve predictions.


Subject(s)
Calmodulin/chemistry; Calmodulin/genetics; Computational Biology/methods; Mutation, Missense; Yeasts/growth & development; Algorithms; Binding Sites; Calcium/metabolism; Calmodulin/metabolism; Evolution, Molecular; Fungal Proteins/chemistry; Fungal Proteins/genetics; Fungal Proteins/metabolism; Genetic Fitness; Humans; Models, Genetic; Models, Molecular; Protein Conformation; Protein Engineering; Yeasts/genetics
18.
Hum Mutat ; 40(9): 1495-1506, 2019 09.
Article in English | MEDLINE | ID: mdl-31184403

ABSTRACT

Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against more than 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. Because different methods emerged as top performers depending on the metric, it is nontrivial to draw conclusions about their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.


Subject(s)
Computational Biology/methods; Methyltransferases/chemistry; Mutation; PTEN Phosphohydrolase/chemistry; High-Throughput Nucleotide Sequencing; Humans; Methyltransferases/genetics; PTEN Phosphohydrolase/genetics; Protein Stability
19.
Hum Mutat ; 40(9): 1215-1224, 2019 09.
Article in English | MEDLINE | ID: mdl-31301154

ABSTRACT

Precision medicine and sequence-based clinical diagnostics seek to predict disease risk or to identify causative variants from sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. In the past, few CAGI challenges have addressed the impact of sequence variants on splicing. In CAGI5, two challenges (Vex-seq and MaPSY) involved prediction of the effect of variants, primarily single-nucleotide changes, on splicing. Although there are significant differences between these two challenges, both involved prediction of results from high-throughput exon inclusion assays. Here, we discuss the methods used to predict the impact of these variants on splicing, their performance, strengths, and weaknesses, and prospects for predicting the impact of sequence variation on splicing and disease phenotypes.


Subject(s)
Alternative Splicing; Computational Biology/methods; Mutation; Proteins/genetics; Animals; Congresses as Topic; Genetic Fitness; Humans; Models, Genetic; Sequence Homology, Nucleic Acid
20.
Hum Mutat ; 40(9): 1474-1485, 2019 09.
Article in English | MEDLINE | ID: mdl-31260570

ABSTRACT

The CAGI-5 pericentriolar material 1 (PCM1) challenge aimed to predict the effect of 38 transgenic human missense mutations in the PCM1 protein, which is implicated in schizophrenia. Participants were provided with 16 benign variants (negative controls), 10 hypomorphic variants and 12 loss-of-function variants. Six groups participated and were asked to predict the probability of effect and the standard deviation associated with each mutation. Here, we present the challenge assessment. Prediction performance was evaluated using different measures, leading to a final ranking that highlights the strengths and weaknesses of each group. The results show a great variety of predictions, with some methods performing significantly better than others. Benign variants played an important role as negative controls, revealing predictors biased toward identifying disease phenotypes. The best predictor, from the Bromberg lab, used a neural-network-based method able to discriminate between neutral and non-neutral single nucleotide polymorphisms. The CAGI-5 PCM1 challenge allowed us to evaluate state-of-the-art techniques for interpreting the effect of novel variants on a difficult target protein.


Subject(s)
Autoantigens/genetics; Cell Cycle Proteins/genetics; Computational Biology/methods; Mutation, Missense; Schizophrenia/genetics; Databases, Genetic; Genetic Predisposition to Disease; Humans; Neural Networks, Computer; Phenotype; Polymorphism, Single Nucleotide