Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 124
Filtrar
Mais filtros

Base de dados
País/Região como assunto
Tipo de documento
Intervalo de ano de publicação
1.
Nucleic Acids Res ; 52(D1): D494-D501, 2024 Jan 05.
Artigo em Inglês | MEDLINE | ID: mdl-37791887

RESUMO

MultifacetedProtDB is a database of multifunctional human proteins deriving information from other databases, including UniProt, GeneCards, Human Protein Atlas (HPA), Human Phenotype Ontology (HPO) and MONDO. It collects under the label 'multifaceted' multitasking proteins addressed in literature as pleiotropic, multidomain, promiscuous (in relation to enzymes catalysing multiple substrates) and moonlighting (with two or more molecular functions), and difficult to be retrieved with a direct search in existing non-specific databases. The study of multifunctional proteins is an expanding research area aiming to elucidate the complexities of biological processes, particularly in humans, where multifunctional proteins play roles in various processes, including signal transduction, metabolism, gene regulation and cellular communication, and are often involved in disease insurgence and progression. The webserver allows searching by gene, protein and any associated structural and functional information, like available structures from PDB, structural models and interactors, using multiple filters. Protein entries are supplemented with comprehensive annotations including EC number, GO terms (biological pathways, molecular functions, and cellular components), pathways from Reactome, subcellular localization from UniProt, tissue and cell type expression from HPA, and associated diseases following MONDO, Orphanet and OMIM classification. MultiFacetedProtDB is freely available as a web server at: https://multifacetedprotdb.biocomp.unibo.it/.


Assuntos
Bases de Dados de Proteínas , Proteínas , Humanos , Proteínas/química , Proteínas/genética , Proteínas/metabolismo , Bases de Dados como Assunto
2.
Hum Genomics ; 18(1): 44, 2024 Apr 29.
Artigo em Inglês | MEDLINE | ID: mdl-38685113

RESUMO

BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50%, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Knowing which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance was highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.


Assuntos
Doenças Raras , Humanos , Doenças Raras/genética , Doenças Raras/diagnóstico , Genoma Humano/genética , Variação Genética/genética , Biologia Computacional/métodos , Fenótipo
3.
Brief Bioinform ; 23(6)2022 11 19.
Artigo em Inglês | MEDLINE | ID: mdl-36220772

RESUMO

The recent biotechnological progress has allowed life scientists and physicians to access an unprecedented, massive amount of data at all levels (molecular, supramolecular, cellular and so on) of biological complexity. So far, mostly classical computational efforts have been dedicated to the simulation, prediction or de novo design of biomolecules, in order to improve the understanding of their function or to develop novel therapeutics. At a higher level of complexity, the progress of omics disciplines (genomics, transcriptomics, proteomics and metabolomics) has prompted researchers to develop informatics means to describe and annotate new biomolecules identified with a resolution down to the single cell, but also with a high-throughput speed. Machine learning approaches have been implemented to both the modelling studies and the handling of biomedical data. Quantum computing (QC) approaches hold the promise to resolve, speed up or refine the analysis of a wide range of these computational problems. Here, we review and comment on recently developed QC algorithms for biocomputing, with a particular focus on multi-scale modelling and genomic analyses. Indeed, differently from other computational approaches such as protein structure prediction, these problems have been shown to be adequately mapped onto quantum architectures, the main limit for their immediate use being the number of qubits and decoherence effects in the available quantum machines. Possible advantages over the classical counterparts are highlighted, along with a description of some hybrid classical/quantum approaches, which could be the closest to be realistically applied in biocomputation.


Assuntos
Biologia Computacional , Metodologias Computacionais , Teoria Quântica , Genômica , Algoritmos
4.
Bioinformatics ; 39(8)2023 08 01.
Artigo em Inglês | MEDLINE | ID: mdl-37540220

RESUMO

MOTIVATION: Coiled-coil domains (CCD) are widespread in all organisms and perform several crucial functions. Given their relevance, the computational detection of CCD is very important for protein functional annotation. State-of-the-art prediction methods include the precise identification of CCD boundaries, the annotation of the typical heptad repeat pattern along the coiled-coil helices as well as the prediction of the oligomerization state. RESULTS: In this article, we describe CoCoNat, a novel method for predicting coiled-coil helix boundaries, residue-level register annotation, and oligomerization state. Our method encodes sequences with the combination of two state-of-the-art protein language models and implements a three-step deep learning procedure concatenated with a Grammatical-Restrained Hidden Conditional Random Field for CCD identification and refinement. A final neural network predicts the oligomerization state. When tested on a blind test set routinely adopted, CoCoNat obtains a performance superior to the current state-of-the-art both for residue-level and segment-level CCD. CoCoNat significantly outperforms the most recent state-of-the-art methods on register annotation and prediction of oligomerization states. AVAILABILITY AND IMPLEMENTATION: CoCoNat web server is available at https://coconat.biocomp.unibo.it. Standalone version is available on GitHub at https://github.com/BolognaBiocomp/coconat.


Assuntos
Aprendizado Profundo , Proteínas/química , Domínios Proteicos , Redes Neurais de Computação , Anotação de Sequência Molecular
5.
Proteomics ; 23(17): e2200323, 2023 09.
Artigo em Inglês | MEDLINE | ID: mdl-37365936

RESUMO

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.


Assuntos
Proteínas , Reprodutibilidade dos Testes , Proteínas/metabolismo , Ligação Proteica
6.
Brief Bioinform ; 22(1): 601-603, 2021 01 18.
Artigo em Inglês | MEDLINE | ID: mdl-31885042

RESUMO

A review, recently published in this journal by Fang (2019), showed that methods trained for the prediction of protein stability changes upon mutation have a very critical bias: they neglect that a protein variation (A- > B) and its reverse (B- > A) must have the opposite value of the free energy difference (ΔΔGAB = - ΔΔGBA). In this letter, we complement the Fang's paper presenting a more general view of the problem. In particular, a machine learning-based method, published in 2015 (INPS), addressed the bias issue directly. We include the analysis of the missing method, showing that INPS is nearly insensitive to the addressed problem.


Assuntos
Algoritmos , Aprendizado de Máquina , Mutação , Estabilidade Proteica
7.
Bioinformatics ; 38(23): 5168-5174, 2022 11 30.
Artigo em Inglês | MEDLINE | ID: mdl-36227117

RESUMO

MOTIVATION: The advent of massive DNA sequencing technologies is producing a huge number of human single-nucleotide polymorphisms occurring in protein-coding regions and possibly changing their sequences. Discriminating harmful protein variations from neutral ones is one of the crucial challenges in precision medicine. Computational tools based on artificial intelligence provide models for protein sequence encoding, bypassing database searches for evolutionary information. We leverage the new encoding schemes for an efficient annotation of protein variants. RESULTS: E-SNPs&GO is a novel method that, given an input protein sequence and a single amino acid variation, can predict whether the variation is related to diseases or not. The proposed method adopts an input encoding completely based on protein language models and embedding techniques, specifically devised to encode protein sequences and GO functional annotations. We trained our model on a newly generated dataset of 101 146 human protein single amino acid variants in 13 661 proteins, derived from public resources. When tested on a blind set comprising 10 266 variants, our method well compares to recent approaches released in literature for the same task, reaching a Matthews Correlation Coefficient score of 0.72. We propose E-SNPs&GO as a suitable, efficient and accurate large-scale annotator of protein variant datasets. AVAILABILITY AND IMPLEMENTATION: The method is available as a webserver at https://esnpsandgo.biocomp.unibo.it. Datasets and predictions are available at https://esnpsandgo.biocomp.unibo.it/datasets. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Inteligência Artificial , Polimorfismo de Nucleotídeo Único , Humanos , Sequência de Aminoácidos , Proteínas/genética , Proteínas/química , Aminoácidos , Biologia Computacional/métodos , Anotação de Sequência Molecular
8.
Insect Mol Biol ; 32(2): 118-131, 2023 04.
Artigo em Inglês | MEDLINE | ID: mdl-36366787

RESUMO

Termites (Insecta, Blattodea, Termitoidae) are a widespread and diverse group of eusocial insects known for their ability to digest wood matter. Herein, we report the draft genome of the subterranean termite Reticulitermes lucifugus, an economically important species and among the most studied taxa with respect to eusocial organization and mating system. The final assembly (~813 Mb) covered up to 88% of the estimated genome size and, in agreement with the Asexual Queen Succession Mating System, it was found completely homozygous. We predicted 16,349 highly supported gene models and 42% of repetitive DNA content. Transposable elements of R. lucifugus show similar evolutionary dynamics compared to that of other termites, with two main peaks of activity localized at 25% and 8% of Kimura divergence driven by DNA, LINE and SINE elements. Gene family turnover analyses identified multiple instances of gene duplication associated with R. lucifugus diversification, with significant lineage-specific gene family expansions related to development, perception and nutrient metabolism pathways. Finally, we analysed P450 and odourant receptor gene repertoires in detail, highlighting the large diversity and dynamical evolutionary history of these proteins in the R. lucifugus genome. This newly assembled genome will provide a valuable resource for further understanding the molecular basis of termites biology as well as for pest control.


Assuntos
Baratas , Isópteros , Animais , Isópteros/genética , Madeira , Evolução Biológica , Reprodução
9.
Nucleic Acids Res ; 49(W1): W60-W66, 2021 07 02.
Artigo em Inglês | MEDLINE | ID: mdl-33963861

RESUMO

The Bologna ENZyme Web Server (BENZ WS) annotates four-level Enzyme Commission numbers (EC numbers) as defined by the International Union of Biochemistry and Molecular Biology (IUBMB). BENZ WS filters a target sequence with a combined system of Hidden Markov Models, modelling protein sequences annotated with the same molecular function, and Pfams, carrying along conserved protein domains. BENZ returns, when successful, for any enzyme target sequence an associated four-level EC number. Our system can annotate both monofunctional and polyfunctional enzymes, and it can be a valuable resource for sequence functional annotation.


Assuntos
Enzimas/química , Anotação de Sequência Molecular/métodos , Análise de Sequência de Proteína/métodos , Software , Internet , Cadeias de Markov , Domínios Proteicos , Alinhamento de Sequência
10.
Genomics ; 113(6): 4163-4172, 2021 11.
Artigo em Inglês | MEDLINE | ID: mdl-34748900

RESUMO

This analysis presents five genome assemblies of four Notostraca taxa. Notostraca origin dates to the Permian/Upper Devonian and the extant forms show a striking morphological similarity to fossil taxa. The comparison of sequenced genomes with other Branchiopoda genomes shows that, despite the morphological stasis, Notostraca share a dynamic genome evolution with high turnover for gene families' expansion/contraction and a transposable elements content comparable to other branchiopods. While Notostraca substitutions rate appears similar or lower in comparison to other branchiopods, a subset of genes shows a faster evolutionary pace, highlighting the difficulty of generalizing about genomic stasis versus dynamism. Moreover, we found that the variation of Triops cancriformis transposable elements content appeared linked to reproductive strategies, in line with theoretical expectations. Overall, besides providing new genomic resources for the study of these organisms, which appear relevant for their ecology and evolution, we also confirmed the decoupling of morphological and molecular evolution.


Assuntos
Crustáceos , Evolução Molecular , Animais , Crustáceos/genética , Genômica , Larva , Filogenia
11.
Bioinformatics ; 36(1): 56-64, 2020 01 01.
Artigo em Inglês | MEDLINE | ID: mdl-31218353

RESUMO

MOTIVATION: The correct localization of proteins in cell compartments is a key issue for their function. Particularly, mitochondrial proteins are physiologically active in different compartments and their aberrant localization contributes to the pathogenesis of human mitochondrial pathologies. Many computational methods exist to assign protein sequences to subcellular compartments such as nucleus, cytoplasm and organelles. However, a substantial lack of experimental evidence in public sequence databases hampered so far a finer grain discrimination, including also intra-organelle compartments. RESULTS: We describe DeepMito, a novel method for predicting protein sub-mitochondrial cellular localization. Taking advantage of powerful deep-learning approaches, such as convolutional neural networks, our method is able to achieve very high prediction performances when discriminating among four different mitochondrial compartments (matrix, outer, inner and intermembrane regions). The method is trained and tested in cross-validation on a newly generated, high-quality dataset comprising 424 mitochondrial proteins with experimental evidence for sub-organelle localizations. We benchmark DeepMito towards the only one recent approach developed for the same task. Results indicate that DeepMito performances are superior. Finally, genomic-scale prediction on a highly-curated dataset of human mitochondrial proteins further confirms the effectiveness of our approach and suggests that DeepMito is a good candidate for genome-scale annotation of mitochondrial protein subcellular localization. AVAILABILITY AND IMPLEMENTATION: The DeepMito web server as well as all datasets used in this study are available at http://busca.biocomp.unibo.it/deepmito. A standalone version of DeepMito is available on DockerHub at https://hub.docker.com/r/bolognabiocomp/deepmito. DeepMito source code is available on GitHub at https://github.com/BolognaBiocomp/deepmito. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Biologia Computacional , Proteínas Mitocondriais , Redes Neurais de Computação , Software , Biologia Computacional/métodos , Humanos , Proteínas Mitocondriais/genética , Proteínas Mitocondriais/metabolismo , Transporte Proteico
12.
Int J Mol Sci ; 22(6)2021 Mar 12.
Artigo em Inglês | MEDLINE | ID: mdl-33809039

RESUMO

Taking advantage of the last cryogenic electron microscopy structure of human huntingtin, we explored with computational methods its physicochemical properties, focusing on the solvent accessible surface of the protein and highlighting a quite interesting mix of hydrophobic and hydrophilic patterns, with the prevalence of the latter ones. We then evaluated the probability of exposed residues to be in contact with other proteins, discovering that they tend to cluster in specific regions of the protein. We then found that the remaining portions of the protein surface can contain calcium-binding sites that we propose here as putative mediators for the protein to interact with membranes. Our findings are justified in relation to the present knowledge of huntingtin functional annotation.


Assuntos
Cálcio/metabolismo , Biologia Computacional , Proteína Huntingtina/química , Proteínas/genética , Sítios de Ligação/genética , Humanos , Proteína Huntingtina/genética , Proteína Huntingtina/ultraestrutura , Interações Hidrofóbicas e Hidrofílicas , Modelos Moleculares , Ligação Proteica/genética , Solventes/química , Propriedades de Superfície
13.
Int J Mol Sci ; 23(1)2021 Dec 23.
Artigo em Inglês | MEDLINE | ID: mdl-35008593

RESUMO

MTHFR deficiency still deserves an investigation to associate the phenotype to protein structure variations. To this aim, considering the MTHFR wild type protein structure, with a catalytic and a regulatory domain and taking advantage of state-of-the-art computational tools, we explore the properties of 72 missense variations known to be disease associated. By computing the thermodynamic ΔΔG change according to a consensus method that we recently introduced, we find that 61% of the disease-related variations destabilize the protein, are present both in the catalytic and regulatory domain and correspond to known biochemical deficiencies. The propensity of solvent accessible residues to be involved in protein-protein interaction sites indicates that most of the interacting residues are located in the regulatory domain, and that only three of them, located at the interface of the functional protein homodimer, are both disease-related and destabilizing. Finally, we compute the protein architecture with Hidden Markov Models, one from Pfam for the catalytic domain and the second computed in house for the regulatory domain. We show that patterns of disease-associated, physicochemical variation types, both in the catalytic and regulatory domains, are unique for the MTHFR deficiency when mapped into the protein architecture.


Assuntos
Homocistinúria/genética , Metilenotetra-Hidrofolato Redutase (NADPH2)/deficiência , Espasticidade Muscular/genética , Domínio Catalítico/genética , Humanos , Metilenotetra-Hidrofolato Redutase (NADPH2)/genética , Mapas de Interação de Proteínas/genética , Transtornos Psicóticos/genética
14.
BMC Bioinformatics ; 21(Suppl 8): 266, 2020 Sep 16.
Artigo em Inglês | MEDLINE | ID: mdl-32938368

RESUMO

BACKGROUND: The prediction of protein subcellular localization is a key step of the big effort towards protein functional annotation. Many computational methods exist to identify high-level protein subcellular compartments such as nucleus, cytoplasm or organelles. However, many organelles, like mitochondria, have their own internal compartmentalization. Knowing the precise location of a protein inside mitochondria is crucial for its accurate functional characterization. We recently developed DeepMito, a new method based on a 1-Dimensional Convolutional Neural Network (1D-CNN) architecture outperforming other similar approaches available in literature. RESULTS: Here, we explore the adoption of DeepMito for the large-scale annotation of four sub-mitochondrial localizations on mitochondrial proteomes of five different species, including human, mouse, fly, yeast and Arabidopsis thaliana. A significant fraction of the proteins from these organisms lacked experimental information about sub-mitochondrial localization. We adopted DeepMito to fill the gap, providing complete characterization of protein localization at sub-mitochondrial level for each protein of the five proteomes. Moreover, we identified novel mitochondrial proteins fishing on the set of proteins lacking any subcellular localization annotation using available state-of-the-art subcellular localization predictors. We finally performed additional functional characterization of proteins predicted by DeepMito as localized into the four different sub-mitochondrial compartments using both available experimental and predicted GO terms. All data generated in this study were collected into a database called DeepMitoDB (available at http://busca.biocomp.unibo.it/deepmitodb ), providing complete functional characterization of 4307 mitochondrial proteins from the five species. CONCLUSIONS: DeepMitoDB offers a comprehensive view of mitochondrial proteins, including experimental and predicted fine-grain sub-cellular localization and annotated and predicted functional annotations. The database complements other similar resources providing characterization of new proteins. Furthermore, it is also unique in including localization information at the sub-mitochondrial level. For this reason, we believe that DeepMitoDB can be a valuable resource for mitochondrial research.


Assuntos
Biologia Computacional/métodos , Proteínas Mitocondriais/genética , Transporte Proteico/genética , Animais , Humanos
15.
J Proteome Res ; 19(7): 2873-2878, 2020 07 02.
Artigo em Inglês | MEDLINE | ID: mdl-31971806

RESUMO

Omics techniques provide a spectrum of information at the genomic level, whose analysis can characterize complex traits at a molecular level. The relationship among genotype and phenotype implies that from genome information the molecular pathways and biological processes underlying a given phenotype are discovered. In dealing with this problem, gene enrichment analysis has become the most widely adopted strategy. Here we present NETGE-PLUS, a Web server for standard and network-based functional interpretation of gene sets of human and of model organisms, including Sus scrofa, Saccharomyces cerevisiae, Escherichia coli, and Arabidopsis thaliana. NETGE-PLUS enables the functional enrichment of both simple and ranked lists of genes, introducing also the possibility of exploring relationships among KEGG pathways. A Web interface makes data retrieval complete and user-friendly. NETGE-PLUS is publicly available at http://net-ge2.biocomp.unibo.it.


Assuntos
Arabidopsis , Software , Arabidopsis/genética , Bases de Dados Genéticas , Genômica , Humanos , Armazenamento e Recuperação da Informação , Internet , Probabilidade
16.
Bioinformatics ; 35(9): 1513-1517, 2019 05 01.
Artigo em Inglês | MEDLINE | ID: mdl-30329016

RESUMO

MOTIVATION: Accurate prediction of protein stability changes upon single-site variations (ΔΔG) is important for protein design, as well as for our understanding of the mechanisms of genetic diseases. The performance of high-throughput computational methods to this end is evaluated mostly based on the Pearson correlation coefficient between predicted and observed data, assuming that the upper bound would be 1 (perfect correlation). However, the performance of these predictors can be limited by the distribution and noise of the experimental data. Here we estimate, for the first time, a theoretical upper-bound to the ΔΔG prediction performances imposed by the intrinsic structure of currently available ΔΔG data. RESULTS: Given a set of measured ΔΔG protein variations, the theoretically "best predictor" is estimated based on its similarity to another set of experimentally determined ΔΔG values. We investigate the correlation between pairs of measured ΔΔG variations, where one is used as a predictor for the other. We analytically derive an upper bound to the Pearson correlation as a function of the noise and distribution of the ΔΔG data. We also evaluate the available datasets to highlight the effect of the noise in conjunction with ΔΔG distribution. We conclude that the upper bound is a function of both uncertainty and spread of the ΔΔG values, and that with current data the best performance should be between 0.7 and 0.8, depending on the dataset used; higher Pearson correlations might be indicative of overtraining. It also follows that comparisons of predictors using different datasets are inherently misleading. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Proteínas/genética , Mutação , Estabilidade Proteica
17.
Nucleic Acids Res ; 46(W1): W459-W466, 2018 07 02.
Artigo em Inglês | MEDLINE | ID: mdl-29718411

RESUMO

Here, we present BUSCA (http://busca.biocomp.unibo.it), a novel web server that integrates different computational tools for predicting protein subcellular localization. BUSCA combines methods for identifying signal and transit peptides (DeepSig and TPpred3), GPI-anchors (PredGPI) and transmembrane domains (ENSEMBLE3.0 and BetAware) with tools for discriminating subcellular localization of both globular and membrane proteins (BaCelLo, MemLoci and SChloro). Outcomes from the different tools are processed and integrated for annotating subcellular localization of both eukaryotic and bacterial protein sequences. We benchmark BUSCA against protein targets derived from recent CAFA experiments and other specific data sets, reporting performance at the state-of-the-art. BUSCA scores better than all other evaluated methods on 2732 targets from CAFA2, with a F1 value equal to 0.49 and among the best methods when predicting targets from CAFA3. We propose BUSCA as an integrated and accurate resource for the annotation of protein subcellular localization.


Assuntos
Células Eucarióticas/química , Proteínas de Membrana/genética , Proteínas Mitocondriais/genética , Células Procarióticas/química , Software , Bactérias/química , Bactérias/ultraestrutura , Benchmarking , Membrana Celular/química , Membrana Celular/ultraestrutura , Núcleo Celular/química , Núcleo Celular/ultraestrutura , Cloroplastos/química , Cloroplastos/ultraestrutura , Eucariotos/química , Eucariotos/ultraestrutura , Células Eucarióticas/ultraestrutura , Expressão Gênica , Ontologia Genética , Internet , Proteínas de Membrana/metabolismo , Mitocôndrias/química , Mitocôndrias/ultraestrutura , Proteínas Mitocondriais/metabolismo , Anotação de Sequência Molecular , Células Procarióticas/ultraestrutura , Sinais Direcionadores de Proteínas/genética
18.
Hum Mutat ; 40(9): 1455-1462, 2019 09.
Artigo em Inglês | MEDLINE | ID: mdl-31066146

RESUMO

In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The critical assessment of genome interpretation (CAGI) has established a common framework for the assessment of available predictors of variant effects on specific problems and our group has been an active participant of CAGI since its first edition. In this paper, we summarize our experience and lessons learned from the last edition of the experiment (CAGI-5). In particular, we analyze prediction performances of our tools on five CAGI-5 selected challenges grouped into three different categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze in detail the performance of our tools, highlighting their potentialities and drawbacks. The aim is to better define the application boundaries of each tool.


Assuntos
Biologia Computacional/métodos , Variação Genética , Proteínas/química , Proteínas/genética , Algoritmos , Simulação por Computador , Bases de Dados Genéticas , Predisposição Genética para Doença , Humanos , Aprendizado de Máquina , Fenótipo , Estabilidade Proteica
19.
Hum Mutat ; 40(9): 1463-1473, 2019 09.
Artigo em Inglês | MEDLINE | ID: mdl-31283071

RESUMO

This paper reports the evaluation of predictions for the "CALM1" challenge in the fifth round of the Critical Assessment of Genome Interpretation held in 2018. In the challenge, the participants were asked to predict effects on yeast growth caused by missense variants of human calmodulin, a highly conserved protein in eukaryotic cells sensing calcium concentration. The performance of predictors implementing different algorithms and methods is similar. Most predictors are able to identify the deleterious or tolerated variants with modest accuracy, with a baseline predictor based purely on sequence conservation slightly outperforming the submitted predictions. Nevertheless, we think that the accuracy of predictions remains far from satisfactory, and the field awaits substantial improvements. The most poorly predicted variants in this round surround functional CALM1 sites that bind calcium or peptide, which suggests that better incorporation of structural analysis may help improve predictions.


Assuntos
Calmodulina/química , Calmodulina/genética , Biologia Computacional/métodos , Mutação de Sentido Incorreto , Leveduras/crescimento & desenvolvimento , Algoritmos , Sítios de Ligação , Cálcio/metabolismo , Calmodulina/metabolismo , Evolução Molecular , Proteínas Fúngicas/química , Proteínas Fúngicas/genética , Proteínas Fúngicas/metabolismo , Aptidão Genética , Humanos , Modelos Genéticos , Modelos Moleculares , Conformação Proteica , Engenharia de Proteínas , Leveduras/genética
20.
Hum Mutat ; 40(9): 1495-1506, 2019 09.
Artigo em Inglês | MEDLINE | ID: mdl-31184403

RESUMO

Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerging as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear as to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.


Assuntos
Biologia Computacional/métodos , Metiltransferases/química , Mutação , PTEN Fosfo-Hidrolase/química , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Metiltransferases/genética , PTEN Fosfo-Hidrolase/genética , Estabilidade Proteica
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA