Results 1 - 20 of 29
1.
Entropy (Basel) ; 25(12), 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38136454

ABSTRACT

Research on Explainable Artificial Intelligence has recently started exploring the idea of producing explanations that, rather than being expressed in terms of low-level features, are encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in human-interpretable representation learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine's representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglement. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.

2.
Ethics Inf Technol ; 23(Suppl 1): 1-6, 2021.
Article in English | MEDLINE | ID: mdl-33551673

ABSTRACT

The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in "phase 2" of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large-scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy-preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion, and in turn this enables both contact tracing and the early detection of outbreak hotspots on a more finely granulated geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First, to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates (if and when they want, and for specific aims) with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.

3.
BMC Bioinformatics ; 20(1): 338, 2019 Jun 17.
Article in English | MEDLINE | ID: mdl-31208327

ABSTRACT

BACKGROUND: The advent of high-throughput experimental techniques paved the way for genome-wide computational analysis and predictive annotation studies. When considering the joint annotation of a large set of related entities, like all proteins of a certain genome, many candidate annotations could be inconsistent, or very unlikely, given the existing knowledge. A sound predictive framework capable of accounting for this type of constraint in making predictions could substantially contribute to the quality of machine-generated annotations at a genomic scale. RESULTS: We present OCELOT, a predictive pipeline which simultaneously addresses functional and interaction annotation of all proteins of a given genome. The system combines sequence-based predictors for functional and protein-protein interaction (PPI) prediction with a consistency layer enforcing (soft) constraints as fuzzy logic rules. The enforced rules represent the available prior knowledge about the classification task, including taxonomic constraints over each GO hierarchy (e.g. a protein labeled with a GO term should also be labeled with all ancestor terms) as well as rules combining interaction and function prediction. An extensive experimental evaluation on the Yeast genome shows that the integration of prior knowledge via rules substantially improves the quality of the predictions. The system largely outperforms GoFDR, the only high-ranking system at the last CAFA challenge with a readily available implementation, when GoFDR is given access to intra-genome information only (as OCELOT is), and has comparable or better results (depending on the hierarchy and performance measure) when GoFDR is allowed to use information from other genomes. Our system also compares favorably to recent methods based on deep learning.


Subjects
Fungal Genome , Genomics/methods , Molecular Sequence Annotation , Proteins/genetics , Saccharomyces cerevisiae/genetics , Algorithms , Decision Making , Gene Ontology
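
The taxonomic rule mentioned in this abstract (a protein labeled with a GO term should also be labeled with its ancestor terms) can be illustrated with a small soft-constraint penalty. This is only a sketch, not OCELOT's implementation: it assumes a Lukasiewicz-style fuzzy implication, which the abstract does not specify, and the term names and scores are hypothetical.

```python
from typing import Dict, List


def hierarchy_violation(scores: Dict[str, float],
                        parents: Dict[str, List[str]]) -> float:
    """Total soft violation of the rules 'term implies parent'.

    Under a Lukasiewicz-style implication (an assumption of this sketch),
    the rule term -> parent is violated by max(0, score[term] - score[parent]).
    Checking every direct parent link covers all ancestors transitively.
    """
    penalty = 0.0
    for term, term_parents in parents.items():
        for parent in term_parents:
            penalty += max(0.0, scores[term] - scores[parent])
    return penalty


# Hypothetical predicted scores for one protein (placeholder term names).
example_scores = {"child_term": 0.9, "parent_term": 0.6, "root_term": 1.0}
example_parents = {"child_term": ["parent_term"], "parent_term": ["root_term"]}

# Only the child -> parent rule is violated (0.9 > 0.6).
example_penalty = hierarchy_violation(example_scores, example_parents)
```

A consistency layer in this spirit would adjust the raw scores so that such a penalty becomes small, while staying close to the original predictions.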
4.
Bioinformatics ; 32(23): 3627-3634, 2016 12 01.
Article in English | MEDLINE | ID: mdl-27503225

ABSTRACT

MOTIVATION: Information about RNA-protein interactions is a vital prerequisite to tackle the dissection of RNA regulatory processes. Despite recent advances in experimental techniques, the currently available RNA interactome involves a small portion of the known RNA binding proteins. The importance of determining RNA-protein interactions, coupled with the scarcity of the available information, calls for in silico prediction of such interactions. RESULTS: We present RNAcommender, a recommender system capable of suggesting RNA targets to unexplored RNA binding proteins, by propagating the available interaction information taking into account the protein domain composition and the RNA predicted secondary structure. Our results show that RNAcommender is able to successfully suggest RNA interactors for RNA binding proteins using little or no interaction evidence. RNAcommender was tested on a large dataset of human RBP-RNA interactions, showing a good ranking performance (average AUC ROC of 0.75) and significant enrichment of correct recommendations for 75% of the tested RBPs. RNAcommender can be a valid tool to assist researchers in identifying potential interacting candidates for the majority of RBPs with uncharacterized binding preferences. AVAILABILITY AND IMPLEMENTATION: The software is freely available at http://rnacommender.disi.unitn.it CONTACT: gianluca.corrado@unitn.it or andrea.passerini@unitn.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA-Binding Proteins/chemistry , RNA/chemistry , Software , Humans , Protein Binding
5.
BMC Bioinformatics ; 15: 16, 2014 Jan 15.
Article in English | MEDLINE | ID: mdl-24428894

ABSTRACT

BACKGROUND: Computational methods for the prediction of protein features from sequence are a long-standing focus of bioinformatics. A key observation is that several protein features are closely inter-related, that is, they are conditioned on each other. Researchers invested a lot of effort into designing predictors that exploit this fact. Most existing methods leverage inter-feature constraints by including known (or predicted) correlated features as inputs to the predictor, thus conditioning the result. RESULTS: By including correlated features as inputs, existing methods only rely on one side of the relation: the output feature is conditioned on the known input features. Here we show how to jointly improve the outputs of multiple correlated predictors by means of a probabilistic-logical consistency layer. The logical layer enforces a set of weighted first-order rules encoding biological constraints between the features, and improves the raw predictions so that they least violate the constraints. In particular, we show how to integrate three stand-alone predictors of correlated features: subcellular localization (Loctree [J Mol Biol 348:85-100, 2005]), disulfide bonding state (Disulfind [Nucleic Acids Res 34:W177-W181, 2006]), and metal bonding state (MetalDetector [Bioinformatics 24:2094-2095, 2008]), in a way that takes into account the respective strengths and weaknesses, and does not require any change to the predictors themselves. We also compare our methodology against two alternative refinement pipelines based on state-of-the-art sequential prediction methods. CONCLUSIONS: The proposed framework is able to improve the performance of the underlying predictors by removing rule violations. We show that different predictors offer complementary advantages, and our method is able to integrate them using non-trivial constraints, generating more consistent predictions. In addition, our framework is fully general, and could in principle be applied to a vast array of heterogeneous predictions without requiring any change to the underlying software. On the other hand, the alternative strategies are more specific and tend to favor one task at the expense of the others, as shown by our experimental evaluation. The ultimate goal of our framework is to seamlessly integrate full prediction suites, such as Distill [BMC Bioinformatics 7:402, 2006] and PredictProtein [Nucleic Acids Res 32:W321-W326, 2004].


Subjects
Algorithms , Computational Biology/methods , Proteins/chemistry , Protein Sequence Analysis/methods , Protein Tertiary Structure , Software
6.
BMC Bioinformatics ; 15: 309, 2014 Sep 19.
Article in English | MEDLINE | ID: mdl-25238967

ABSTRACT

BACKGROUND: Viruses are typically characterized by high mutation rates, which allow them to quickly develop drug-resistant mutations. Mining relevant rules from mutation data can be extremely useful to understand the virus adaptation mechanism and to design drugs that effectively counter potentially resistant mutants. RESULTS: We propose a simple statistical relational learning approach for mutant prediction where the input consists of mutation data with drug-resistance information, either as sets of mutations conferring resistance to a certain drug, or as sets of mutants with information on their susceptibility to the drug. The algorithm learns a set of relational rules characterizing drug resistance and uses them to generate a set of potentially resistant mutants. Learning a weighted combination of rules makes it possible to attach to each generated mutant a resistance score, as predicted by the statistical relational model, and to select only the highest-scoring ones. CONCLUSIONS: Promising results were obtained in generating resistant mutations for both nucleoside and non-nucleoside HIV reverse transcriptase inhibitors. The approach can be generalized quite easily to learning mutants characterized by more complex rules correlating multiple mutations.


Subjects
Algorithms , Viral Drug Resistance , HIV Infections/virology , HIV/genetics , Genetic Models , Mutation , Reverse Transcriptase Inhibitors/pharmacology , Amino Acid Sequence , Artificial Intelligence , HIV/drug effects , HIV/enzymology , HIV Infections/drug therapy , HIV Reverse Transcriptase/chemistry , HIV Reverse Transcriptase/metabolism , Humans , Biological Models , Statistical Models , Molecular Sequence Data , Nucleosides/chemistry , Nucleosides/pharmacology , Reverse Transcriptase Inhibitors/chemistry
7.
BMC Bioinformatics ; 15: 103, 2014 Apr 12.
Article in English | MEDLINE | ID: mdl-24725682

ABSTRACT

BACKGROUND: Protein-protein interactions can be seen as a hierarchical process occurring at three related levels: proteins bind by means of specific domains, which in turn form interfaces through patches of residues. Detailed knowledge about which domains and residues are involved in a given interaction has extensive applications to biology, including better understanding of the binding process and more efficient drug/enzyme design. Alas, most current interaction prediction methods do not identify which parts of a protein actually instantiate an interaction. Furthermore, they also fail to leverage the hierarchical nature of the problem, ignoring otherwise useful information available at the lower levels; when they do, they do not generate predictions that are guaranteed to be consistent between levels. RESULTS: Inspired by earlier ideas of Yip et al. (BMC Bioinformatics 10:241, 2009), in the present paper we view the problem as a multi-level learning task, with one task per level (proteins, domains and residues), and propose a machine learning method that collectively infers the binding state of all object pairs. Our method is based on Semantic Based Regularization (SBR), a flexible and theoretically sound machine learning framework that uses First Order Logic constraints to tie the learning tasks together. We introduce a set of biologically motivated rules that enforce consistent predictions between the hierarchy levels. CONCLUSIONS: We study the empirical performance of our method using a standard validation procedure, and compare its performance against the only other existing multi-level prediction technique. We present results showing that our method substantially outperforms the competitor in several experimental settings, indicating that exploiting the hierarchical nature of the problem can lead to better predictions. In addition, our method is also guaranteed to produce interactions that are consistent with respect to the protein-domain-residue hierarchy.


Subjects
Artificial Intelligence , Protein Interaction Domains and Motifs , Proteins/chemistry , Semantics , Molecular Models , Protein Binding , Proteins/metabolism , Software
8.
BMC Genomics ; 15: 304, 2014 Apr 23.
Article in English | MEDLINE | ID: mdl-24758252

ABSTRACT

BACKGROUND: The progress in mapping RNA-protein and RNA-RNA interactions at the transcriptome-wide level paves the way to deciphering possible combinatorial patterns embedded in post-transcriptional regulation of gene expression. RESULTS: Here we propose an innovative computational tool to extract clusters of mRNA trans-acting co-regulators (RNA binding proteins and non-coding RNAs) from pairwise interaction annotations. In addition, the tool makes it possible to analyze the binding site similarity of co-regulators belonging to the same cluster, given their positional binding information. The tool has been tested on experimental collections of human and yeast interactions, identifying modules that coordinate functionally related messages. CONCLUSIONS: This tool is an original attempt to uncover combinatorial patterns using all the post-transcriptional interaction data available so far. PTRcombiner is available at http://disi.unitn.it/~passerini/software/PTRcombiner/.


Subjects
Gene Expression Regulation , Post-Transcriptional RNA Processing , Binding Sites
9.
Genome Res ; 21(6): 898-907, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21482623

ABSTRACT

High-throughput X-ray absorption spectroscopy was used to measure transition metal content based on quantitative detection of X-ray fluorescence signals for 3879 purified proteins from several hundred different protein families generated by the New York SGX Research Center for Structural Genomics. Approximately 9% of the proteins analyzed showed the presence of transition metal atoms (Zn, Cu, Ni, Co, Fe, or Mn) in stoichiometric amounts. The method is highly automated and highly reliable based on comparison of the results to crystal structure data derived from the same protein set. To leverage the experimental metalloprotein annotations, we used a sequence-based de novo prediction method, MetalDetector, to identify Cys and His residues that bind to transition metals for the redundancy-reduced subset of 2411 sequences sharing <70% sequence identity and having at least one His or Cys. As the HT-XAS identifies metal type and protein binding, while the bioinformatics analysis identifies metal-binding residues, the results were combined to identify putative metal-binding sites in the proteins and their associated families. We explored the combination of this data with homology models to generate detailed structure models of metal-binding sites for representative proteins. Finally, we used extended X-ray absorption fine structure data from two of the purified Zn metalloproteins to validate predicted metalloprotein binding site structures. This combination of experimental and bioinformatics approaches provides comprehensive active site analysis on the genome scale for metalloproteins as a class, revealing new insights into metalloprotein structure and function.


Subjects
Metalloproteins/chemistry , Software , X-Ray Absorption Spectroscopy/methods , Binding Sites/genetics , Computational Biology/methods , Fluorescence , Genomics/methods , Heavy Metals/analysis , Synchrotrons
10.
Front Artif Intell ; 7: 1346684, 2024.
Article in English | MEDLINE | ID: mdl-38419732

ABSTRACT

Bundle recommendation aims to generate bundles of associated products that users tend to consume as a whole under certain circumstances. Modeling the bundle utility for users is a non-trivial task, as it requires accounting for the potential interdependencies between bundle attributes. To address this challenge, we introduce a new preference-based approach for bundle recommendation exploiting the Choquet integral. This allows us to formalize preferences for coalitions of environment-related attributes, thus recommending product bundles accounting for synergies among product attributes. An experimental evaluation on a dataset of local food products in Northern Italy shows how the Choquet integral allows the natural formalization of a sensible notion of environmental friendliness and that standard approaches based on weighted sums of attributes end up recommending bundles with lower environmental friendliness even if weights are explicitly learned to maximize it. We further show how preference elicitation strategies can be leveraged to acquire weights of the Choquet integral from user feedback in terms of preferences over candidate bundles, and show how a handful of queries suffice to recommend optimal bundles for a diverse set of user prototypes.
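
To make the contrast with weighted sums concrete, the following is a minimal sketch of the discrete Choquet integral, not the authors' code. The attribute names and capacity values are hypothetical; the capacity is assumed to be a monotone set function assigning a weight to every coalition of attributes, with the empty set at 0 and the full set at 1.

```python
from typing import Dict, FrozenSet


def choquet_integral(scores: Dict[str, float],
                     capacity: Dict[FrozenSet[str], float]) -> float:
    """Discrete Choquet integral of non-negative attribute scores.

    capacity maps each coalition (a frozenset of attribute names) to a
    weight in [0, 1]; it must be defined for every coalition that appears
    when attributes are removed in increasing order of score.
    """
    order = sorted(scores, key=scores.get)  # attributes by increasing score
    total = 0.0
    previous = 0.0
    for i, name in enumerate(order):
        # Coalition of attributes scoring at least as much as `name`.
        coalition = frozenset(order[i:])
        total += (scores[name] - previous) * capacity[coalition]
        previous = scores[name]
    return total


# Hypothetical attributes and scores for one bundle.
bundle = {"organic": 0.8, "local": 0.4}

# Capacity with synergy: each attribute alone counts little, both together fully.
synergy = {
    frozenset(): 0.0,
    frozenset({"organic"}): 0.3,
    frozenset({"local"}): 0.3,
    frozenset({"organic", "local"}): 1.0,
}

# Additive capacity: the Choquet integral reduces to a plain weighted sum.
additive = {
    frozenset(): 0.0,
    frozenset({"organic"}): 0.5,
    frozenset({"local"}): 0.5,
    frozenset({"organic", "local"}): 1.0,
}

synergy_value = choquet_integral(bundle, synergy)
additive_value = choquet_integral(bundle, additive)
```

With the synergistic capacity the unbalanced bundle scores lower than under the additive one, which is the kind of interaction between attributes that a weighted sum cannot express.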

11.
Nat Rev Microbiol ; 22(4): 191-205, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37968359

ABSTRACT

Machine learning is increasingly important in microbiology where it is used for tasks such as predicting antibiotic resistance and associating human microbiome features with complex host diseases. The applications in microbiology are quickly expanding and the machine learning tools frequently used in basic and clinical research range from classification and regression to clustering and dimensionality reduction. In this Review, we examine the main machine learning concepts, tasks and applications that are relevant for experimental and clinical microbiologists. We provide the minimal toolbox for a microbiologist to be able to understand, interpret and use machine learning in their experimental and translational activities.


Subjects
Machine Learning , Microbiota , Humans
12.
Nucleic Acids Res ; 39(Web Server issue): W288-92, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21576237

ABSTRACT

MetalDetector identifies CYS and HIS involved in transition metal protein binding sites, starting from sequence alone. A major new feature of release 2.0 is the ability to predict which residues are jointly involved in the coordination of the same metal ion. The server is available at http://metaldetector.dsi.unifi.it/v2.0/.


Subjects
Metalloproteins/chemistry , Metals/chemistry , Software , Binding Sites , Cysteine/chemistry , Histidine/chemistry , Internet , Protein Sequence Analysis
13.
Cognition ; 234: 105355, 2023 05.
Article in English | MEDLINE | ID: mdl-36791607

ABSTRACT

Bayesianism assumes that probabilistic updating does not depend on the sensory modality by which information is processed. In this study, we investigate whether probability judgments based on visual and auditory information conform to this assumption. In a series of five experiments, we found that this is indeed the case when information is acquired through a single modality (i.e., only auditory or only visual) but not necessarily so when it comes from multiple modalities (i.e., audio-visual). In the latter case, judgments prove more accurate when both visual and auditory information individually support (i.e., increase the probability of) the hypothesis they also jointly support (synergy condition) than when either visual or auditory information supports one hypothesis that is not the one they jointly support (contrast condition). In the extreme case in which both visual and auditory information individually support an alternative hypothesis to the one they jointly support (i.e., double-contrast condition), participants' accuracy is not only lower than in the synergy condition but near chance. This synergy-contrast effect represents a violation of the assumption that information modality is irrelevant for Bayesian updating and indicates an important limitation of multisensory integration, one which has not been previously documented.


Subjects
Auditory Perception , Visual Perception , Humans , Bayes Theorem , Problem Solving , Judgment , Acoustic Stimulation , Photic Stimulation
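
The double-contrast condition described in this abstract can be reproduced with a small worked example of Bayes' theorem. The likelihood values below are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
def posterior_h1(prior_h1: float, likelihood_h1: float, likelihood_h2: float) -> float:
    """P(H1 | evidence) for two exhaustive hypotheses H1 and H2."""
    joint_h1 = prior_h1 * likelihood_h1
    joint_h2 = (1.0 - prior_h1) * likelihood_h2
    return joint_h1 / (joint_h1 + joint_h2)


# Hypothetical likelihoods P(evidence | hypothesis), not data from the study.
VISUAL_ALONE = {"H1": 0.5, "H2": 0.3}    # visual cue alone favours H1
AUDITORY_ALONE = {"H1": 0.5, "H2": 0.3}  # auditory cue alone favours H1
BOTH_TOGETHER = {"H1": 0.1, "H2": 0.2}   # the two cues jointly favour H2

PRIOR_H1 = 0.5  # both hypotheses equally likely before any evidence

p_visual = posterior_h1(PRIOR_H1, VISUAL_ALONE["H1"], VISUAL_ALONE["H2"])
p_auditory = posterior_h1(PRIOR_H1, AUDITORY_ALONE["H1"], AUDITORY_ALONE["H2"])
p_both = posterior_h1(PRIOR_H1, BOTH_TOGETHER["H1"], BOTH_TOGETHER["H2"])
```

Each cue on its own raises the probability of H1 above the prior, while the two cues together lower it below the prior, so a normative Bayesian updater should favour H2 once both cues are observed.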
14.
BMC Genomics ; 13: 220, 2012 Jun 06.
Article in English | MEDLINE | ID: mdl-22672192

ABSTRACT

BACKGROUND: The classical view on eukaryotic gene expression proposes the scheme of a forward flow for which fluctuations in mRNA levels upon a stimulus contribute to determine variations in mRNA availability for translation. Here we address this issue by simultaneously profiling with microarrays the total mRNAs (the transcriptome) and the polysome-associated mRNAs (the translatome) after EGF treatment of human cells, and extending the analysis to 19 other transcriptome/translatome comparisons in mammalian cells following different stimuli or undergoing cell programs. RESULTS: Triggering of the EGF pathway results in an early induction of transcriptome and translatome changes, but 90% of the significant variation is limited to the translatome and the degree of concordant changes is less than 5%. The survey of the 19 other transcriptome/translatome comparisons shows that extensive uncoupling is a general rule, in terms of both RNA movements and inferred cell activities, with a strong tendency of translation-related genes to be controlled purely at the translational level. By different statistical approaches, we finally provide evidence of the lack of dependence between changes at the transcriptome and translatome levels. CONCLUSIONS: We propose a model of diffused independency between variation in transcript abundances and variation in their engagement on polysomes, which implies the existence of specific mechanisms to couple these two ways of regulating gene expression.


Subjects
Epidermal Growth Factor/pharmacology , Protein Biosynthesis/drug effects , Transcriptome/drug effects , ErbB Receptors/metabolism , Gene Expression Regulation/drug effects , HeLa Cells , Humans , RNA/metabolism , Signal Transduction
15.
BMC Bioinformatics ; 11: 115, 2010 Mar 03.
Article in English | MEDLINE | ID: mdl-20199672

ABSTRACT

BACKGROUND: Prediction of catalytic residues is a major step in characterizing the function of enzymes. In its simpler formulation, the problem can be cast into a binary classification task at the residue level, by predicting whether the residue is directly involved in the catalytic process. The task is quite hard even when structural information is available, due to the rather wide range of roles a functional residue can play and to the large imbalance between the number of catalytic and non-catalytic residues. RESULTS: We developed an effective representation of structural information by modeling spherical regions around candidate residues, and extracting statistics on the properties of their content such as physico-chemical properties, atomic density, flexibility, and presence of water molecules. We trained an SVM classifier combining our features with sequence-based information and previously developed 3D features, and compared its performance with the most recent state-of-the-art approaches on different benchmark datasets. We further analyzed the discriminant power of the information provided by the presence of heterogens in the residue neighborhood. CONCLUSIONS: Our structure-based method achieves consistent improvements on all tested datasets over both sequence-based and structure-based state-of-the-art approaches. Structural neighborhood information is shown to be responsible for such results, and predicting the presence of nearby heterogens seems to be a promising direction for further improvements.


Subjects
Catalytic Domain , Protein Conformation , Proteins/chemistry , Catalysis , Protein Databases , Theoretical Models , Protein Folding
16.
IEEE Trans Med Imaging ; 39(8): 2676-2687, 2020 08.
Article in English | MEDLINE | ID: mdl-32406829

ABSTRACT

Deep learning (DL) has proved successful in medical imaging and, in the wake of the recent COVID-19 pandemic, some works have started to investigate DL-based solutions for the assisted diagnosis of lung diseases. While existing works focus on CT scans, this paper studies the application of DL techniques for the analysis of lung ultrasonography (LUS) images. Specifically, we present a novel fully-annotated dataset of LUS images collected from several Italian hospitals, with labels indicating the degree of disease severity at a frame-level, video-level, and pixel-level (segmentation masks). Leveraging these data, we introduce several deep models that address relevant tasks for the automatic analysis of LUS images. In particular, we present a novel deep network, derived from Spatial Transformer Networks, which simultaneously predicts the disease severity score associated with an input frame and provides localization of pathological artefacts in a weakly-supervised way. Furthermore, we introduce a new method based on uninorms for effective frame score aggregation at a video-level. Finally, we benchmark state-of-the-art deep models for estimating pixel-level segmentations of COVID-19 imaging biomarkers. Experiments on the proposed dataset demonstrate satisfactory results on all the considered tasks, paving the way to future research on DL for the assisted diagnosis of COVID-19 from LUS data.


Subjects
Coronavirus Infections/diagnostic imaging , Deep Learning , Computer-Assisted Image Interpretation/methods , Viral Pneumonia/diagnostic imaging , Ultrasonography/methods , Betacoronavirus , COVID-19 , Humans , Lung/diagnostic imaging , Pandemics , Point-of-Care Systems , SARS-CoV-2
17.
Bioinformatics ; 24(18): 2094-5, 2008 Sep 15.
Article in English | MEDLINE | ID: mdl-18635571

ABSTRACT

The web server MetalDetector classifies histidine residues in proteins into one of two states (free or metal bound) and cysteines into one of three states (free, metal bound or disulfide bridged). A decision tree integrates predictions from two previously developed methods (DISULFIND and Metal Ligand Predictor). Cross-validated performance assessment indicates that our server predicts disulfide bonding state at 88.6% precision and 85.1% recall, while it identifies cysteines and histidines in transition metal-binding sites at 79.9% precision and 76.8% recall, and at 60.8% precision and 40.7% recall, respectively. AVAILABILITY: Freely available at http://metaldetector.dsi.unifi.it. SUPPLEMENTARY INFORMATION: Details and data can be found at http://metaldetector.dsi.unifi.it/help.php.


Subjects
Computational Biology/methods , Cysteine/chemistry , Disulfides/chemistry , Histidine/chemistry , Metalloproteins/chemistry , Protein Sequence Analysis , Amino Acid Sequence , Binding Sites , Computer Simulation , Protein Databases , Disulfides/metabolism , Internet , Metalloproteins/metabolism , Molecular Sequence Data , Sequence Alignment
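
For readers less familiar with the two measures reported in this abstract, precision and recall can be computed from confusion counts as sketched below. The counts in the example are hypothetical and do not come from the paper's evaluation.

```python
from typing import Tuple


def precision_recall(true_positives: int,
                     false_positives: int,
                     false_negatives: int) -> Tuple[float, float]:
    """Return (precision, recall) for one class from confusion counts.

    precision = TP / (TP + FP): fraction of predicted positives that are correct.
    recall    = TP / (TP + FN): fraction of actual positives that are found.
    A measure is reported as 0.0 when its denominator is zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for a 'metal bound' class.
example_precision, example_recall = precision_recall(80, 20, 40)
```

Here 80 of the 100 residues predicted as metal bound are correct (precision 0.8), while 80 of the 120 truly metal-bound residues are recovered (recall of about 0.67).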
18.
BMC Bioinformatics ; 9: 20, 2008 Jan 14.
Article in English | MEDLINE | ID: mdl-18194539

ABSTRACT

BACKGROUND: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to solve this problem and public domain prediction services exist. These methods are however still potentially subject to significant improvements both in terms of prediction accuracy and overall architectural complexity. RESULTS: We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels for measuring the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets that have been previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors. CONCLUSION: We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy reached by our connectivity prediction method compares favorably with respect to all but the most complex other approaches. On the other hand, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and prediction accuracy overestimation.


Subjects
Disulfides/chemistry , Chemical Models , Molecular Models , Proteins/chemistry , Protein Sequence Analysis/methods , Algorithms , Amino Acid Sequence , Binding Sites , Computer Simulation , Molecular Sequence Data , Protein Binding
19.
Nucleic Acids Res ; 34(Web Server issue): W177-81, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16844986

ABSTRACT

DISULFIND is a server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Optionally, disulfide connectivity can be predicted from sequence and a bonding state assignment given as input. The output is a simple visualization of the assigned bonding state (with confidence degrees) and the most likely connectivity patterns. The server is available at http://disulfind.dsi.unifi.it/.


Subjects
Cysteine/chemistry , Proteins/chemistry , Protein Sequence Analysis/methods , Software , Disulfides/chemistry , Internet , User-Computer Interface
20.
BMC Bioinformatics ; 8: 39, 2007 Feb 05.
Article in English | MEDLINE | ID: mdl-17280606

ABSTRACT

BACKGROUND: Metalloproteins are proteins capable of binding one or more metal ions, which may be required for their biological function, for regulation of their activities or for structural purposes. Metal-binding properties remain difficult to predict as well as to investigate experimentally at the whole-proteome level. Consequently, the current knowledge about metalloproteins is only partial. RESULTS: The present work reports on the development of a machine learning method for the prediction of the zinc-binding state of pairs of nearby amino acids, using predictors based on support vector machines. The predictor was trained using chains containing zinc-binding sites and non-metalloproteins in order to provide positive and negative examples. Results based on strong non-redundancy tests prove that (1) zinc-binding residues can be predicted and (2) modelling the correlation between the binding state of nearby residues significantly improves performance. The trained predictor was then applied to the human proteome. The present results were in good agreement with the outcomes of previous, highly manually curated, efforts for the identification of human zinc-binding proteins. Some unprecedented zinc-binding sites could be identified, and were further validated through structural modelling. The software implementing the predictor is freely available at: http://zincfinder.dsi.unifi.it CONCLUSION: The proposed approach constitutes a highly automated tool for the identification of metalloproteins, which provides results of comparable quality with respect to highly manually refined predictions. The ability to model correlations between pairwise residues allows it to obtain a significant improvement over standard 1D-based approaches. In addition, the method permits the identification of unprecedented metal sites, providing important hints for the work of experimentalists.


Subjects
Algorithms , Metalloproteins/chemistry , Chemical Models , Molecular Models , Proteome/chemistry , Protein Sequence Analysis/methods , Zinc/chemistry , Amino Acid Sequence , Binding Sites , Metalloproteins/ultrastructure , Molecular Sequence Data , Protein Binding , Protein Interaction Mapping/methods , Sequence Alignment/methods