Search | VHL Regional Portal

1.

Machine learning approach for prediction of outcomes in anticoagulated patients with atrial fibrillation.

Bernardini, Andrea; Bindini, Luca; Antonucci, Emilia; Berteotti, Martina; Giusti, Betti; Testa, Sophie; Palareti, Gualtiero; Poli, Daniela; Frasconi, Paolo; Marcucci, Rossella.

Int J Cardiol ; 407: 132088, 2024 Jul 15.

Article in English | MEDLINE | ID: mdl-38657869

ABSTRACT

BACKGROUND: The accuracy of available prediction tools for clinical outcomes in patients with atrial fibrillation (AF) remains modest. Machine Learning (ML) has been used to predict outcomes in the AF population, but not in a population entirely on anticoagulant therapy. METHODS AND AIMS: Different supervised ML models were applied to predict all-cause death, cardiovascular (CV) death, major bleeding and stroke in anticoagulated patients with AF, processing data from the multicenter START-2 Register. RESULTS: 11078 AF patients (male n = 6029, 54.3%) were enrolled with a median follow-up period of 1.5 years [IQR 1.0-2.6]. 5135 patients (46.4%) were on Vitamin K Antagonists (VKA) and 5943 (53.6%) were on Direct Oral Anticoagulants (DOAC). Using Multi-Gate Mixture of Experts, cross-validated AUCs of 0.779 ± 0.016 and 0.745 ± 0.022 were obtained, respectively, for the prediction of all-cause death and CV-death in the overall population. The best ML model outperformed CHA2DS2-VASc and HAS-BLED for all-cause death prediction (p < 0.001 for both). When compared to HAS-BLED, Gradient Boosting improved major bleeding prediction in DOAC patients (0.711 vs. 0.586, p < 0.001). A very low number of events during follow-up (52) resulted in a suboptimal ischemic stroke prediction (best AUC of 0.606 ± 0.117 in the overall population). Body mass index, age, renal function, platelet count and hemoglobin levels were the most important variables for ML prediction. CONCLUSIONS: In AF patients, ML models showed good discriminative ability to predict all-cause death, regardless of the type of anticoagulation strategy, and major bleeding on DOAC therapy, outperforming the CHA2DS2-VASc and HAS-BLED scores for risk prediction in these populations.
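The abstract reports discrimination as a cross-validated AUC. As a minimal sketch of what that metric measures (this is not the study's code, and the labels and scores below are made-up toy values), the AUC can be read as the probability that a randomly chosen patient with the event receives a higher risk score than a randomly chosen patient without it, with ties counted as one half:

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for patients with the event, 0 otherwise.
    scores: predicted risk, higher meaning more likely to have the event.
    Tied pairs contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the START-2 Register.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In a cross-validated setting this quantity would be computed on each held-out fold and then summarized, which is presumably what the "mean ± deviation" figures in the abstract express.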

Subject(s)

Anticoagulants , Atrial Fibrillation , Machine Learning , Humans , Atrial Fibrillation/drug therapy , Atrial Fibrillation/complications , Male , Female , Aged , Anticoagulants/therapeutic use , Stroke/prevention & control , Stroke/epidemiology , Stroke/etiology , Aged, 80 and over , Registries , Middle Aged , Follow-Up Studies , Predictive Value of Tests , Hemorrhage/chemically induced , Hemorrhage/epidemiology , Treatment Outcome , Risk Assessment/methods

2.

Two-Dimensional Aortic Size Normalcy: A Novelty Detection Approach.

Frasconi, Paolo; Baracchi, Daniele; Giusti, Betti; Kura, Ada; Spaziani, Gaia; Cherubini, Antonella; Favilli, Silvia; Di Lenarda, Andrea; Pepe, Guglielmina; Nistri, Stefano.

Diagnostics (Basel) ; 11(2), 2021 Feb 02.

Article in English | MEDLINE | ID: mdl-33540834

ABSTRACT

Background: To develop a tool for assessing normalcy of the thoracic aorta (TA) by echocardiography, based on either a linear regression model (Z-score), or a machine learning technique, namely one-class support vector machine (OC-SVM) (Q-score). Methods: TA diameters were measured in 1112 prospectively enrolled healthy subjects, aged 5 to 89 years. Considering sex, age and body surface area we developed two calculators based on the traditional Z-score and the novel Q-score. The calculators were compared in 198 adults with TA > 40 mm, and in 466 patients affected by either Marfan syndrome or bicuspid aortic valve (BAV). Results: Q-score attained a better Area Under the Curve (0.989; 95% CI 0.984-0.993, sensitivity = 97.5%, specificity = 95.4%) than Z-score (0.955; 95% CI 0.942-0.967, sensitivity = 81.3%, specificity = 93.3%; p < 0.0001) in patients with TA > 40 mm. The prevalence of TA dilatation in Marfan and BAV patients was higher when defined as Z-score > 2 than as Q-score < 4% (73.4% vs. 50.09%, p < 0.00001). Conclusions: Q-score is a novel tool for assessing TA normalcy based on a model requiring fewer assumptions about the distribution of the relevant variables. Notably, diameters do not need to depend linearly on anthropometric measurements. Additionally, Q-score can capture the joint distribution of these variables with all four diameters simultaneously, thus accounting for the overall aortic shape. This approach results in a lower rate of predicted TA abnormalcy in patients at risk of TA aneurysm. Further prognostic studies will be necessary for assessing the relative effectiveness of Q-score versus Z-score.
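To make the regression-based Z-score concrete, here is a minimal sketch under simplifying assumptions: a single predictor (body surface area) rather than the sex, age and body surface area used in the study, ordinary least squares, and invented toy measurements. The Z-score is the observed diameter minus the diameter predicted for a healthy subject, divided by the residual standard deviation. The OC-SVM-based Q-score is not sketched here.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x.

    Returns (a, b, residual_sd), where residual_sd uses n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(rss / (n - 2))


def z_score(x_new, y_new, a, b, residual_sd):
    """Standardized deviation of an observed value from the regression prediction."""
    return (y_new - (a + b * x_new)) / residual_sd


# Hypothetical healthy-reference data: body surface area (m^2) and diameter (mm).
bsa = [1.0, 1.5, 2.0, 2.5]
diameter = [20.0, 26.0, 29.0, 35.0]
a, b, sd = fit_line(bsa, diameter)

# A hypothetical patient whose diameter is well above the prediction for BSA 2.0.
patient_z = z_score(2.0, 33.0, a, b, sd)
dilated = patient_z > 2  # the Z-score > 2 cut-off mentioned in the abstract
```

The linear form of the prediction is exactly the assumption that the abstract says the Q-score relaxes.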

3.

Classification of Cancer Pathology Reports: A Large-Scale Comparative Study.

Martina, Stefano; Ventura, Leonardo; Frasconi, Paolo.

IEEE J Biomed Health Inform ; 24(11): 3085-3094, 2020 Nov.

Article in English | MEDLINE | ID: mdl-32749978

ABSTRACT

We report on the application of state-of-the-art deep learning techniques to the automatic and interpretable assignment of ICD-O3 topography and morphology codes to free-text cancer reports. We present results on a large dataset (more than 80 000 labeled and 1 500 000 unlabeled anonymized reports written in Italian and collected from hospitals in Tuscany over more than a decade) and with a large number of classes (134 morphological classes and 61 topographical classes). We compare alternative architectures in terms of prediction accuracy and interpretability and show that our best model achieves a multiclass accuracy of 90.3% on topography site assignment and 84.8% on morphology type assignment. We found that in this context hierarchical models are not better than flat models and that an element-wise maximum aggregator is slightly better than attentive models on site classification. Moreover, the maximum aggregator offers a way to interpret the classification process.
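The element-wise maximum aggregator can be illustrated with a small sketch. The abstract does not describe the architecture in detail, so the token vectors below are hypothetical and the code shows only the aggregation step: each output dimension takes the largest value found among the token vectors, and recording which token supplied that value gives one simple way to trace a prediction back to words in the report.

```python
def max_aggregate(vectors):
    """Element-wise maximum over a non-empty list of equal-length vectors.

    Returns (pooled, winners): pooled[j] is the maximum of dimension j and
    winners[j] is the index of the vector that supplied it (first one on ties).
    """
    if not vectors:
        raise ValueError("need at least one vector")
    pooled, winners = [], []
    for j in range(len(vectors[0])):
        best = max(range(len(vectors)), key=lambda i: vectors[i][j])
        winners.append(best)
        pooled.append(vectors[best][j])
    return pooled, winners


# Hypothetical tokens and 2-dimensional token representations.
tokens = ["invasive", "ductal", "carcinoma"]
token_vectors = [[0.1, 0.7], [0.9, 0.2], [0.3, 0.8]]
pooled, winners = max_aggregate(token_vectors)
influential_tokens = [tokens[i] for i in winners]
```

Unlike an attention layer, this aggregator has no learned weights of its own, so the winning indices are a direct by-product of the forward pass.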

Subject(s)

Neoplasms , Humans

4.

Publisher Correction: Whole-Brain Vasculature Reconstruction at the Single Capillary Level.

Di Giovanna, Antonino Paolo; Tibo, Alessandro; Silvestri, Ludovico; Müllenbroich, Marie Caroline; Costantini, Irene; Allegra Mascaro, Anna Letizia; Sacconi, Leonardo; Frasconi, Paolo; Pavone, Francesco Saverio.

Sci Rep ; 9(1): 8765, 2019 Jun 14.

Article in English | MEDLINE | ID: mdl-31201354

ABSTRACT

A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.

5.

Whole-Brain Vasculature Reconstruction at the Single Capillary Level.

Di Giovanna, Antonino Paolo; Tibo, Alessandro; Silvestri, Ludovico; Müllenbroich, Marie Caroline; Costantini, Irene; Allegra Mascaro, Anna Letizia; Sacconi, Leonardo; Frasconi, Paolo; Pavone, Francesco Saverio.

Sci Rep ; 8(1): 12573, 2018 Aug 22.

Article in English | MEDLINE | ID: mdl-30135559

ABSTRACT

The distinct organization of the brain's vascular network ensures that it is adequately supplied with oxygen and nutrients. However, despite this fundamental role, a detailed reconstruction of the brain-wide vasculature at the capillary level remains elusive, due to insufficient image quality using the best available techniques. Here, we demonstrate a novel approach that improves vascular demarcation by combining CLARITY with a vascular staining approach that can fill the entire blood vessel lumen and imaging with light-sheet fluorescence microscopy. This method significantly improves image contrast, particularly in depth, thereby allowing reliable application of automatic segmentation algorithms, which play an increasingly important role in high-throughput imaging of the terabyte-sized datasets now routinely produced. Furthermore, our novel method is compatible with endogenous fluorescence, thus allowing simultaneous investigations of vasculature and genetically targeted neurons. We believe our new method will be valuable for future brain-wide investigations of the capillary network.

Subject(s)

Brain/blood supply , Capillaries/diagnostic imaging , Image Processing, Computer-Assisted , Microscopy, Fluorescence , Animals , Brain/cytology , Capillaries/physiology , Male , Mice , Mice, Inbred C57BL , Neovascularization, Physiologic , Neurons/cytology , Signal-To-Noise Ratio , Tomography

6.

Shift Aggregate Extract Networks.

Orsini, Francesco; Baracchi, Daniele; Frasconi, Paolo.

Front Robot AI ; 5: 42, 2018.

Article in English | MEDLINE | ID: mdl-33500928

ABSTRACT

We introduce an architecture based on deep hierarchical decompositions to learn effective representations of large graphs. Our framework extends classic R-decompositions used in kernel methods, enabling nested part-of-part relations. Unlike recursive neural networks, which unroll a template on input graphs directly, we unroll a neural network template over the decomposition hierarchy, allowing us to deal with the high degree variability that typically characterizes social network graphs. Deep hierarchical decompositions are also amenable to domain compression, a technique that reduces both space and time complexity by exploiting symmetries. We show empirically that our approach is able to outperform current state-of-the-art graph classification methods on large social network datasets, while at the same time being competitive on small chemobiological benchmark datasets.

7.

RNAcommender: genome-wide recommendation of RNA-protein interactions.

Corrado, Gianluca; Tebaldi, Toma; Costa, Fabrizio; Frasconi, Paolo; Passerini, Andrea.

Bioinformatics ; 32(23): 3627-3634, 2016 Dec 01.

Article in English | MEDLINE | ID: mdl-27503225

ABSTRACT

MOTIVATION: Information about RNA-protein interactions is a vital pre-requisite to tackle the dissection of RNA regulatory processes. Despite the recent advances of the experimental techniques, the currently available RNA interactome involves a small portion of the known RNA binding proteins. The importance of determining RNA-protein interactions, coupled with the scarcity of the available information, calls for in silico prediction of such interactions. RESULTS: We present RNAcommender, a recommender system capable of suggesting RNA targets to unexplored RNA binding proteins, by propagating the available interaction information taking into account the protein domain composition and the RNA predicted secondary structure. Our results show that RNAcommender is able to successfully suggest RNA interactors for RNA binding proteins using little or no interaction evidence. RNAcommender was tested on a large dataset of human RBP-RNA interactions, showing a good ranking performance (average AUC ROC of 0.75) and significant enrichment of correct recommendations for 75% of the tested RBPs. RNAcommender can be a valid tool to assist researchers in identifying potential interacting candidates for the majority of RBPs with uncharacterized binding preferences. AVAILABILITY AND IMPLEMENTATION: The software is freely available at http://rnacommender.disi.unitn.it CONTACT: gianluca.corrado@unitn.it or andrea.passerini@unitn.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

RNA-Binding Proteins/chemistry , RNA/chemistry , Software , Humans , Protein Binding

8.

Quantitative neuroanatomy of all Purkinje cells with light sheet microscopy and high-throughput image analysis.

Silvestri, Ludovico; Paciscopi, Marco; Soda, Paolo; Biamonte, Filippo; Iannello, Giulio; Frasconi, Paolo; Pavone, Francesco S.

Front Neuroanat ; 9: 68, 2015.

Article in English | MEDLINE | ID: mdl-26074783

ABSTRACT

Characterizing the cytoarchitecture of the mammalian central nervous system on a brain-wide scale is becoming a compelling need in neuroscience. For example, realistic modeling of brain activity requires the definition of quantitative features of large neuronal populations in the whole brain. Quantitative anatomical maps will also be crucial to classify the cytoarchitectonic abnormalities associated with neuronal pathologies in a highly reproducible and reliable manner. In this paper, we apply recent advances in optical microscopy and image analysis to characterize the spatial distribution of Purkinje cells (PCs) across the whole cerebellum. Light sheet microscopy was used to image with micron-scale resolution a fixed and cleared cerebellum of an L7-GFP transgenic mouse, in which all PCs are fluorescently labeled. A fast and scalable algorithm for fully automated cell identification was applied on the image to extract the position of all the fluorescent PCs. This vectorized representation of the cell population allows a thorough characterization of the complex three-dimensional distribution of the neurons, highlighting the presence of gaps inside the lamellar organization of PCs, whose density is believed to play a significant role in autism spectrum disorders. Furthermore, clustering analysis of the localized somata permits dividing the whole cerebellum into groups of PCs with high spatial correlation, suggesting new possibilities of anatomical partition. The quantitative approach presented here can be extended to study the distribution of different types of cell in many brain regions and across the whole encephalon, providing a robust base for building realistic computational models of the brain, and for unbiased morphological tissue screening in the presence of pathologies and/or drug treatments.

9.

Computer-based automatic identification of neurons in gigavoxel-sized 3D human brain images.

Soda, Paolo; Acciai, Ludovica; Cordelli, Ermanno; Costantini, Irene; Sacconi, Leonardo; Pavone, Francesco Saverio; Conti, Valerio; Guerrini, Renzo; Frasconi, Paolo; Iannello, Giulio.

Annu Int Conf IEEE Eng Med Biol Soc ; 2015: 7724-7, 2015.

Article in English | MEDLINE | ID: mdl-26738082

ABSTRACT

Achieving a comprehensive knowledge of the human brain cytoarchitecture is a fundamental step to understand how the nervous system works, i.e., one of the greatest challenges of 21st-century science. The recent development of biological tissue labeling and automated microscopic imaging systems has made it possible to acquire images at micro-resolution, which produce a huge quantity of data that cannot be manually analyzed. In the case of mammalian brains, automatic methods to extract objective information at the microscale have been applied until now to mouse, macaque and cat 3D volume images. Here we report a method to automatically localize neurons in a sample of human brain removed during a surgical procedure for the treatment of drug-resistant epilepsy in a child with hemimegalencephaly, whose neurons and neurites were fluorescence labelled and finally imaged using a two-photon fluorescence microscope. The method provides the map of both parvalbuminergic neurons and all other cell nuclei with a satisfactory F-score measured using more than two thousand human-labelled somata.

Subject(s)

Brain/cytology , Imaging, Three-Dimensional/methods , Neuroimaging/methods , Neurons/cytology , Humans

10.

Large-scale automated identification of mouse brain cells in confocal light sheet microscopy images.

Frasconi, Paolo; Silvestri, Ludovico; Soda, Paolo; Cortini, Roberto; Pavone, Francesco S; Iannello, Giulio.

Bioinformatics ; 30(17): i587-93, 2014 Sep 01.

Article in English | MEDLINE | ID: mdl-25161251

ABSTRACT

MOTIVATION: Recently, confocal light sheet microscopy has enabled high-throughput acquisition of whole mouse brain 3D images at the micron scale resolution. This poses the unprecedented challenge of creating accurate digital maps of the whole set of cells in a brain. RESULTS: We introduce a fast and scalable algorithm for fully automated cell identification. We obtained the whole digital map of Purkinje cells in mouse cerebellum consisting of a set of 3D cell center coordinates. The method is accurate and we estimated an F1 measure of 0.96 using 56 representative volumes, totaling 1.09 GVoxel and containing 4138 manually annotated soma centers. AVAILABILITY AND IMPLEMENTATION: Source code and its documentation are available at http://bcfind.dinfo.unifi.it/. The whole pipeline of methods is implemented in Python and makes use of Pylearn2 and modified parts of Scikit-learn. Brain images are available on request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Brain/cytology , Imaging, Three-Dimensional/methods , Microscopy, Confocal/methods , Neurons/cytology , Algorithms , Animals , Mice

11.

Markov logic networks for optical chemical structure recognition.

Frasconi, Paolo; Gabbrielli, Francesco; Lippi, Marco; Marinai, Simone.

J Chem Inf Model ; 54(8): 2380-90, 2014 Aug 25.

Article in English | MEDLINE | ID: mdl-25068386

ABSTRACT

Optical chemical structure recognition is the problem of converting a bitmap image containing a chemical structure formula into a standard structured representation of the molecule. We introduce a novel approach to this problem based on the pipelined integration of pattern recognition techniques with probabilistic knowledge representation and reasoning. Basic entities and relations (such as textual elements, points, lines, etc.) are first extracted by a low-level processing module. A probabilistic reasoning engine based on Markov logic, embodying chemical and graphical knowledge, is subsequently used to refine these pieces of information. An annotated connection table of atoms and bonds is finally assembled and converted into a standard chemical exchange format. We report a successful evaluation on two large image data sets, showing that the method compares favorably with the current state-of-the-art, especially on degraded low-resolution images. The system is available as a web server at http://mlocsr.dinfo.unifi.it.

Subject(s)

Markov Chains , Pattern Recognition, Automated/statistics & numerical data , Small Molecule Libraries/chemistry , Software , Computer Graphics , Databases, Chemical , Image Processing, Computer-Assisted

12.

Predicting metal-binding sites from protein sequence.

Passerini, Andrea; Lippi, Marco; Frasconi, Paolo.

IEEE/ACM Trans Comput Biol Bioinform ; 9(1): 203-13, 2012.

Article in English | MEDLINE | ID: mdl-21606549

ABSTRACT

Prediction of binding sites from sequence can significantly help toward determining the function of uncharacterized proteins on a genomic scale. The task is highly challenging due to the enormous amount of alternative candidate configurations. Previous research has only considered this prediction problem starting from 3D information. When starting from sequence alone, only methods that predict the bonding state of selected residues are available. The sole exception consists of pattern-based approaches, which rely on very specific motifs and cannot be applied to discover truly novel sites. We develop new algorithmic ideas based on structured-output learning for determining transition-metal-binding sites coordinated by cysteines and histidines. The inference step (retrieving the best scoring output) is intractable for general output types (i.e., general graphs). However, under the assumption that no residue can coordinate more than one metal ion, we prove that metal binding has the algebraic structure of a matroid, allowing us to employ a very efficient greedy algorithm. We test our predictor in a highly stringent setting where the training set consists of protein chains belonging to SCOP folds different from the ones used for accuracy estimation. In this setting, our predictor achieves 56 percent precision and 60 percent recall in the identification of ligand-ion bonds.
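The role of the matroid property can be sketched with a generic greedy routine. This is an illustration, not the paper's algorithm: the ground set here is a hypothetical list of scored (residue, ion) bonds, and the independence test is a simple stand-in that only encodes the stated assumption that no residue coordinates more than one metal ion (a partition matroid). The well-known fact being used is that, when the independent sets form a matroid, scanning elements by decreasing weight and keeping each one that preserves independence yields a maximum-weight independent set.

```python
def greedy_max_weight(elements, weight, is_independent):
    """Greedy selection of a maximum-weight independent set.

    The result is optimal when is_independent defines a matroid.
    Elements with non-positive weight are never selected.
    """
    chosen = []
    for element in sorted(elements, key=weight, reverse=True):
        if weight(element) <= 0:
            break
        if is_independent(chosen + [element]):
            chosen.append(element)
    return chosen


def one_ion_per_residue(bonds):
    """Independence test (assumption): each residue appears in at most one bond."""
    residues = [residue for residue, _ion in bonds]
    return len(residues) == len(set(residues))


# Hypothetical bond scores; residue names and values are invented.
bond_scores = {
    ("C12", "ion1"): 0.9,
    ("C12", "ion2"): 0.8,
    ("H45", "ion1"): 0.7,
    ("C60", "ion2"): 0.4,
    ("H45", "ion2"): -0.2,
}
selected = greedy_max_weight(list(bond_scores), bond_scores.get, one_ion_per_residue)
total_score = sum(bond_scores[bond] for bond in selected)
```

The cost is dominated by the sort, which is what makes inference tractable compared with searching over general output graphs.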

Subject(s)

Binding Sites , Computational Biology/methods , Metals , Proteins , Sequence Analysis, Protein/methods , Amino Acid Sequence , Databases, Protein , Metals/chemistry , Metals/metabolism , Molecular Sequence Data , Proteins/chemistry , Proteins/metabolism

13.

MetalDetector v2.0: predicting the geometry of metal binding sites from protein sequence.

Passerini, Andrea; Lippi, Marco; Frasconi, Paolo.

Nucleic Acids Res ; 39(Web Server issue): W288-92, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21576237

ABSTRACT

MetalDetector identifies CYS and HIS involved in transition metal protein binding sites, starting from sequence alone. A major new feature of release 2.0 is the ability to predict which residues are jointly involved in the coordination of the same metal ion. The server is available at http://metaldetector.dsi.unifi.it/v2.0/.

Subject(s)

Metalloproteins/chemistry , Metals/chemistry , Software , Binding Sites , Cysteine/chemistry , Histidine/chemistry , Internet , Sequence Analysis, Protein

14.

Characterization of metalloproteins by high-throughput X-ray absorption spectroscopy.

Shi, Wuxian; Punta, Marco; Bohon, Jen; Sauder, J Michael; D'Mello, Rhijuta; Sullivan, Mike; Toomey, John; Abel, Don; Lippi, Marco; Passerini, Andrea; Frasconi, Paolo; Burley, Stephen K; Rost, Burkhard; Chance, Mark R.

Genome Res ; 21(6): 898-907, 2011 Jun.

Article in English | MEDLINE | ID: mdl-21482623

ABSTRACT

High-throughput X-ray absorption spectroscopy was used to measure transition metal content based on quantitative detection of X-ray fluorescence signals for 3879 purified proteins from several hundred different protein families generated by the New York SGX Research Center for Structural Genomics. Approximately 9% of the proteins analyzed showed the presence of transition metal atoms (Zn, Cu, Ni, Co, Fe, or Mn) in stoichiometric amounts. The method is highly automated and highly reliable based on comparison of the results to crystal structure data derived from the same protein set. To leverage the experimental metalloprotein annotations, we used a sequence-based de novo prediction method, MetalDetector, to identify Cys and His residues that bind to transition metals for the redundancy-reduced subset of 2411 sequences sharing <70% sequence identity and having at least one His or Cys. As the HT-XAS identifies metal type and protein binding, while the bioinformatics analysis identifies metal-binding residues, the results were combined to identify putative metal-binding sites in the proteins and their associated families. We explored the combination of this data with homology models to generate detailed structure models of metal-binding sites for representative proteins. Finally, we used extended X-ray absorption fine structure data from two of the purified Zn metalloproteins to validate predicted metalloprotein binding site structures. This combination of experimental and bioinformatics approaches provides comprehensive active site analysis on the genome scale for metalloproteins as a class, revealing new insights into metalloprotein structure and function.

Subject(s)

Metalloproteins/chemistry , Software , X-Ray Absorption Spectroscopy/methods , Binding Sites/genetics , Computational Biology/methods , Fluorescence , Genomics/methods , Metals, Heavy/analysis , Synchrotrons

15.

Prediction of protein beta-residue contacts by Markov logic networks with grounding-specific weights.

Lippi, Marco; Frasconi, Paolo.

Bioinformatics ; 25(18): 2326-33, 2009 Sep 15.

Article in English | MEDLINE | ID: mdl-19592394

ABSTRACT

MOTIVATION: Accurate prediction of contacts between beta-strand residues can significantly contribute towards ab initio prediction of the 3D structure of many proteins. Contacts in the same protein are highly interdependent. Therefore, significant improvements can be expected by applying statistical relational learners that overcome the usual machine learning assumption that examples are independent and identically distributed. Furthermore, the dependencies among beta-residue contacts are subject to strong regularities, many of which are known a priori. In this article, we take advantage of Markov logic, a statistical relational learning framework that is able to capture dependencies between contacts, and constrain the solution according to domain knowledge expressed by means of weighted rules in a logical language. RESULTS: We introduce a novel hybrid architecture based on neural and Markov logic networks with grounding-specific weights. On a non-redundant dataset, our method achieves 44.9% F1 measure, with 47.3% precision and 42.7% recall, which is significantly better (P < 0.01) than previously reported performance obtained by 2D recursive neural networks. Our approach also significantly improves the number of chains for which beta-strands are nearly perfectly paired (36% of the chains are predicted with F1 ≥ 70% on coarse map). It also outperforms more general contact predictors on recent CASP 2008 targets.
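The three reported figures are tied together by the standard definition of the F1 measure as the harmonic mean of precision and recall. A short check, using only the precision and recall stated in the abstract, reproduces the reported value:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (47.3% and 42.7%).
reported_f1 = f1_score(0.473, 0.427)
reported_f1_percent = round(100 * reported_f1, 1)
```

Rounded to one decimal place, the result agrees with the 44.9% quoted above.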

Subject(s)

Markov Chains , Neural Networks, Computer , Proteins/chemistry , Computational Biology/methods , Databases, Protein , Protein Conformation

16.

MetalDetector: a web server for predicting metal-binding sites and disulfide bridges in proteins from sequence.

Lippi, Marco; Passerini, Andrea; Punta, Marco; Rost, Burkhard; Frasconi, Paolo.

Bioinformatics ; 24(18): 2094-5, 2008 Sep 15.

Article in English | MEDLINE | ID: mdl-18635571

ABSTRACT

The web server MetalDetector classifies histidine residues in proteins into one of two states (free or metal bound) and cysteines into one of three states (free, metal bound or disulfide bridged). A decision tree integrates predictions from two previously developed methods (DISULFIND and Metal Ligand Predictor). Cross-validated performance assessment indicates that our server predicts disulfide bonding state at 88.6% precision and 85.1% recall, while it identifies cysteines and histidines in transition metal-binding sites at 79.9% precision and 76.8% recall, and at 60.8% precision and 40.7% recall, respectively. AVAILABILITY: Freely available at http://metaldetector.dsi.unifi.it. SUPPLEMENTARY INFORMATION: Details and data can be found at http://metaldetector.dsi.unifi.it/help.php.

Subject(s)

Computational Biology/methods , Cysteine/chemistry , Disulfides/chemistry , Histidine/chemistry , Metalloproteins/chemistry , Sequence Analysis, Protein , Amino Acid Sequence , Binding Sites , Computer Simulation , Databases, Protein , Disulfides/metabolism , Internet , Metalloproteins/metabolism , Molecular Sequence Data , Sequence Alignment

17.

A simplified approach to disulfide connectivity prediction from protein sequences.

Vincent, Marc; Passerini, Andrea; Labbé, Matthieu; Frasconi, Paolo.

BMC Bioinformatics ; 9: 20, 2008 Jan 14.

Article in English | MEDLINE | ID: mdl-18194539

ABSTRACT

BACKGROUND: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to solve this problem and public domain prediction services exist. These methods are however still potentially subject to significant improvements both in terms of prediction accuracy and overall architectural complexity. RESULTS: We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels for measuring the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets that have been previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors. CONCLUSION: We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy reached by our connectivity prediction method compares favorably with respect to all but the most complex other approaches. On the other hand, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and prediction accuracy overestimation.
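The second pass, a 1-nearest-neighbor classifier driven by a kernel, can be sketched as follows. The abstract does not define the two decomposition kernels, so a simple k-mer counting (spectrum) kernel is swapped in here purely as a stand-in; the training sequences and connectivity labels are invented. The only general fact used is that any kernel k induces a squared distance d²(x, y) = k(x, x) − 2k(x, y) + k(y, y), so the nearest training chain can be found without explicit feature vectors, and with no hyperparameters to tune apart from the kernel itself.

```python
from collections import Counter


def spectrum_kernel(a, b, k=2):
    """Stand-in kernel (assumption): dot product of k-mer count vectors."""
    counts_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    counts_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum(counts_a[kmer] * counts_b[kmer] for kmer in counts_a)


def kernel_distance_sq(a, b, kernel):
    """Squared distance induced by a kernel in its feature space."""
    return kernel(a, a) - 2 * kernel(a, b) + kernel(b, b)


def one_nn_predict(query, training, kernel):
    """Return the label of the training example closest to the query.

    training: list of (sequence, label) pairs.
    """
    nearest = min(training, key=lambda item: kernel_distance_sq(query, item[0], kernel))
    return nearest[1]


# Hypothetical toy chains with invented connectivity-pattern labels.
training_set = [("ACCA", "pattern 1-2"), ("GTTG", "pattern 1-3")]
predicted_pattern = one_nn_predict("ACCG", training_set, spectrum_kernel)
```

In a cascade, this predictor would only be invoked for chains that the first-pass binary classifier flags as containing at least one intra-chain bridge.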

Subject(s)

Disulfides/chemistry , Models, Chemical , Models, Molecular , Proteins/chemistry , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Binding Sites , Computer Simulation , Molecular Sequence Data , Protein Binding

18.

Classification of small molecules by two- and three-dimensional decomposition kernels.

Ceroni, Alessio; Costa, Fabrizio; Frasconi, Paolo.

Bioinformatics ; 23(16): 2038-45, 2007 Aug 15.

Article in English | MEDLINE | ID: mdl-17550912

ABSTRACT

MOTIVATION: Several kernel-based methods have been recently introduced for the classification of small molecules. Most available kernels on molecules are based on 2D representations obtained from chemical structures, but far less work has focused so far on the definition of effective kernels that can also exploit 3D information. RESULTS: We introduce new ideas for building kernels on small molecules that can effectively use and combine 2D and 3D information. We tested these kernels in conjunction with support vector machines for binary classification on the 60 NCI cancer screening datasets as well as on the NCI HIV data set. Our results show that 3D information leveraged by these kernels can consistently improve prediction accuracy in all datasets. AVAILABILITY: An implementation of the small molecule classifier is available from http://www.dsi.unifi.it/neural/src/3DDK.

Subject(s)

Biomarkers, Tumor/chemistry , Models, Chemical , Models, Molecular , Neoplasm Proteins/chemistry , Neoplasm Proteins/ultrastructure , Neoplasms/metabolism , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Computer Simulation , Molecular Sequence Data , Neoplasm Proteins/classification , Pattern Recognition, Automated/methods , Protein Conformation

19.

Predicting zinc binding at the proteome level.

Passerini, Andrea; Andreini, Claudia; Menchetti, Sauro; Rosato, Antonio; Frasconi, Paolo.

BMC Bioinformatics ; 8: 39, 2007 Feb 05.

Article in English | MEDLINE | ID: mdl-17280606

ABSTRACT

BACKGROUND: Metalloproteins are proteins capable of binding one or more metal ions, which may be required for their biological function, for regulation of their activities or for structural purposes. Metal-binding properties remain difficult to predict as well as to investigate experimentally at the whole-proteome level. Consequently, the current knowledge about metalloproteins is only partial. RESULTS: The present work reports on the development of a machine learning method for the prediction of the zinc-binding state of pairs of nearby amino-acids, using predictors based on support vector machines. The predictor was trained using chains containing zinc-binding sites and non-metalloproteins in order to provide positive and negative examples. Results based on strong non-redundancy tests prove that (1) zinc-binding residues can be predicted and (2) modelling the correlation between the binding state of nearby residues significantly improves performance. The trained predictor was then applied to the human proteome. The present results were in good agreement with the outcomes of previous, highly manually curated, efforts for the identification of human zinc-binding proteins. Some unprecedented zinc-binding sites could be identified, and were further validated through structural modelling. The software implementing the predictor is freely available at: http://zincfinder.dsi.unifi.it CONCLUSION: The proposed approach constitutes a highly automated tool for the identification of metalloproteins, which provides results of comparable quality with respect to highly manually refined predictions. The ability to model correlations between pairwise residues allows it to obtain a significant improvement over standard 1D based approaches. In addition, the method permits the identification of unprecedented metal sites, providing important hints for the work of experimentalists.

Subject(s)

Algorithms , Metalloproteins/chemistry , Models, Chemical , Models, Molecular , Proteome/chemistry , Sequence Analysis, Protein/methods , Zinc/chemistry , Amino Acid Sequence , Binding Sites , Metalloproteins/ultrastructure , Molecular Sequence Data , Protein Binding , Protein Interaction Mapping/methods , Sequence Alignment/methods

20.

Identifying cysteines and histidines in transition-metal-binding sites using support vector machines and neural networks.

Passerini, Andrea; Punta, Marco; Ceroni, Alessio; Rost, Burkhard; Frasconi, Paolo.

Proteins ; 65(2): 305-16, 2006 Nov 01.

Article in English | MEDLINE | ID: mdl-16927295

ABSTRACT

Accurate predictions of metal-binding sites in proteins by using sequence as the only source of information can significantly help in the prediction of protein structure and function, genome annotation, and in the experimental determination of protein structure. Here, we introduce a method for identifying histidines and cysteines that participate in binding of several transition metals and iron complexes. The method predicts histidines as being in either of two states (free or metal bound) and cysteines in one of three states (free, metal bound, or in disulfide bridges). The method uses only sequence information by utilizing position-specific evolutionary profiles as well as more global descriptors such as protein length and amino acid composition. Our solution is based on a two-stage machine-learning approach. The first stage consists of a support vector machine trained to locally classify the binding state of single histidines and cysteines. The second stage consists of a bidirectional recurrent neural network trained to refine local predictions by taking into account dependencies among residues within the same protein. A simple finite state automaton is employed as a postprocessing step in the second stage in order to enforce an even number of disulfide-bonded cysteines. We predict histidines and cysteines in transition-metal-binding sites at 73% precision and 61% recall. We observe significant differences in performance depending on the ligand (histidine or cysteine) and on the metal bound. We also predict cysteines participating in disulfide bridges at 86% precision and 87% recall. Results are compared to those that would be obtained by using expert information as represented by PROSITE motifs and, for disulfide bonds, to state-of-the-art methods.
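The even-parity constraint lends itself to a two-state automaton that tracks whether the number of disulfide-bonded cysteines seen so far is even or odd. The abstract does not say how the automaton is combined with the network outputs, so the following is only one plausible sketch under stated assumptions: each cysteine has a hypothetical predicted probability of being bonded, residues are treated as independent given those probabilities, and a Viterbi-style pass over the two parity states returns the most probable labeling that ends in the "even" state.

```python
import math


def most_likely_even_labeling(p_bonded):
    """Most probable 0/1 labeling containing an even number of 1s.

    p_bonded: per-cysteine probabilities of being disulfide bonded
    (hypothetical inputs; independence between residues is an assumption).
    """
    # best[parity] = (log-probability, labels) of the best prefix with that parity.
    best = {0: (0.0, []), 1: (-math.inf, [])}
    for p in p_bonded:
        log_yes = math.log(p) if p > 0 else -math.inf
        log_no = math.log(1 - p) if p < 1 else -math.inf
        updated = {}
        for parity in (0, 1):
            # Label 0 keeps the parity; label 1 arrives from the opposite parity.
            stay = (best[parity][0] + log_no, best[parity][1] + [0])
            flip = (best[1 - parity][0] + log_yes, best[1 - parity][1] + [1])
            updated[parity] = max(stay, flip, key=lambda t: t[0])
        best = updated
    return best[0][1]


# Hypothetical probabilities: thresholding at 0.5 would give three bonded
# cysteines, which cannot all be paired.
labels = most_likely_even_labeling([0.9, 0.8, 0.6])
```

The constraint reflects the chemistry: every disulfide bridge joins exactly two cysteines, so a chain with intra-chain bridges only must contain an even number of bonded cysteines.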

Subject(s)

Cysteine/chemistry , Cysteine/metabolism , Histidine/chemistry , Histidine/metabolism , Metals, Heavy/chemistry , Metals, Heavy/metabolism , Neural Networks, Computer , Amino Acid Sequence , Binding Sites , Computational Biology , Cysteine/genetics , Histidine/genetics , Metalloproteins/chemistry , Metalloproteins/genetics , Metalloproteins/metabolism , Molecular Sequence Data
