Search | BVS España Search Portal

Hierarchical representation for PPI sites prediction.

Quadrini, Michela; Daberdaku, Sebastian; Ferrari, Carlo.

BMC Bioinformatics ; 23(1): 96, 2022 Mar 20.

Article in English | MEDLINE | ID: mdl-35307006

ABSTRACT

BACKGROUND: Protein-protein interactions have pivotal roles in life processes, and aberrant interactions are associated with various disorders. Interaction site identification is key for understanding disease mechanisms and designing new drugs. Effective and efficient computational methods for PPI prediction are of great value due to the overall cost of experimental methods. Promising results have been obtained using machine learning methods and deep learning techniques, but their effectiveness depends on protein representation and feature selection. RESULTS: We define a new abstraction of the protein structure, called hierarchical representations, considering and quantifying spatial and sequential neighboring among amino acids. We also investigate the effect of molecular abstractions using the Graph Convolutional Networks technique to classify amino acids as interface or non-interface residues. Our study takes into account three abstractions (hierarchical representations, the contact map, and the residue sequence) and considers the eight functional classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0. The performance of our method, evaluated using standard metrics, is compared with that of some state-of-the-art protein interface predictors. The analysis of the performance values shows that our method outperforms these competitors when the molecules under consideration are structurally similar. CONCLUSIONS: The hierarchical representation can capture the structural properties that promote the interactions and can be used to represent proteins with unknown structures by codifying only their sequential neighboring. Analyzing the results, we conclude that classes should be arranged according to their architectures rather than functions.
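As a rough illustration of the contact-map abstraction and the graph-convolution step mentioned in this abstract, the following Python sketch builds a residue contact graph from C-alpha coordinates and applies one propagation step. The 8 angstrom cutoff, the toy coordinates, the function names, and the mean-aggregation (row-normalised) propagation rule are assumptions made for illustration; none of them is taken from the paper, and the weights are untrained.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Return a symmetric 0/1 adjacency matrix: residues i and j are in
    contact when their distance is below `cutoff` (assumed, in angstroms)."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                adj[i][j] = adj[j][i] = 1
    return adj

def gcn_layer(adj, features, weights):
    """One simplified graph-convolution step: add self-loops, average each
    residue's neighbourhood features, apply a linear map, then ReLU."""
    n = len(adj)
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j] or j == i]
        mean = [sum(features[j][k] for j in neigh) / len(neigh)
                for k in range(len(features[0]))]
        row = [max(0.0, sum(mean[k] * weights[k][c] for k in range(len(mean))))
               for c in range(len(weights[0]))]
        out.append(row)
    return out

# Toy example: four residues on a line, 5 angstroms apart.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
adj = contact_map(coords)
features = [[1.0], [0.0], [0.0], [1.0]]   # one hypothetical feature per residue
weights = [[1.0]]                          # identity map, untrained
hidden = gcn_layer(adj, features, weights)
```

In a real classifier the layer output would feed further layers and a per-residue interface/non-interface output; restricting the adjacency matrix to sequence neighbours would correspond to the sequence-only abstraction.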

Subject(s)

Machine Learning, Proteins, Amino Acids, Proteins/chemistry

Antibody interface prediction with 3D Zernike descriptors and SVM.

Daberdaku, Sebastian; Ferrari, Carlo.

Bioinformatics ; 35(11): 1870-1876, 2019 06 01.

Article in English | MEDLINE | ID: mdl-30395191

ABSTRACT

MOTIVATION: Antibodies are a class of proteins capable of specifically recognizing and binding to a virtually infinite number of antigens. This binding malleability makes them the most valuable category of biopharmaceuticals for both diagnostic and therapeutic applications. The correct identification of the antigen-binding residues in the antibody is crucial for all antibody design and engineering techniques and could also help to understand the complex antigen binding mechanisms. However, the antibody-binding interface prediction field still appears rather underdeveloped. RESULTS: We present a novel method, based on 3D Zernike Descriptors, for predicting antibody interfaces from their experimentally solved structures. Roto-translationally invariant descriptors are computed from circular patches of the antibody surface enriched with a chosen subset of physico-chemical properties from the AAindex1 amino acid index set, and are used as samples for a binary classification problem. An SVM classifier is used to distinguish interface surface patches from non-interface ones. The proposed method was shown to outperform other antigen-binding interface prediction software. AVAILABILITY AND IMPLEMENTATION: Linux binaries and Python scripts are available at https://github.com/sebastiandaberdaku/AntibodyInterfacePrediction. The datasets generated and/or analyzed during the current study are available at https://doi.org/10.6084/m9.figshare.5442229. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Software, Support Vector Machine, Amino Acids, Antibodies, Proteins

Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach.

Tavazzi, Erica; Daberdaku, Sebastian; Vasta, Rosario; Calvo, Andrea; Chiò, Adriano; Di Camillo, Barbara.

BMC Med Inform Decis Mak ; 20(Suppl 5): 174, 2020 08 20.

Article in English | MEDLINE | ID: mdl-32819346

ABSTRACT

BACKGROUND: Clinical registers constitute an invaluable resource in the context of data-driven medical decision making. Accurate machine learning and data mining approaches on these data can lead to faster diagnosis, definition of tailored interventions, and improved outcome prediction. A typical issue when implementing such approaches is the almost unavoidable presence of missing values in the collected data. In this work, we propose an imputation algorithm based on a mutual information-weighted k-nearest neighbours approach, able to handle the simultaneous presence of missing information in different types of variables. We developed and validated the method on a clinical register consisting of the information collected over subsequent screening visits of a cohort of patients affected by amyotrophic lateral sclerosis. METHODS: For each subject with missing data to be imputed, we create a feature vector from the information collected over his/her first three months of visits. This vector is used as a sample in a k-nearest neighbours procedure, in order to select, among the other patients, the ones with the most similar temporal evolution of the disease. An ad hoc similarity metric was implemented for the sample comparison, capable of handling the different nature of the data and the presence of multiple missing values, and of including the cross-information among features captured by the mutual information statistic. RESULTS: We validated the proposed imputation method on an independent test set, comparing its performance with that of three state-of-the-art competitors; our method performed better. We further assessed the validity of our algorithm by comparing the performance of a survival classifier built on the data imputed with our method versus the one built on the data imputed with the best-performing competitor. CONCLUSIONS: Imputation of missing data is a crucial, and often mandatory, step when working with real-world datasets. The algorithm proposed in this work could effectively impute an amyotrophic lateral sclerosis clinical dataset, by handling the temporal and the mixed-type nature of the data and by exploiting the cross-information among features. We also showed how the imputation quality can affect a machine learning task.
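The general idea of a weighted k-nearest neighbours imputation over mixed-type data with missing entries can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the distance function, the per-feature `weights` vector (standing in for mutual-information scores, which are hard-coded here rather than estimated), the toy columns, and `k=2` are all hypothetical, and numeric features are assumed to be pre-scaled to [0, 1].

```python
def weighted_distance(a, b, weights, categorical):
    """Weighted distance over features observed in both samples.
    Numeric features use absolute difference, categorical ones use mismatch.
    Returns None when the two samples share no usable observed feature."""
    total, weight_sum = 0.0, 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # skip features missing in either sample
        d = float(x != y) if i in categorical else abs(x - y)
        total += weights[i] * d
        weight_sum += weights[i]
    return total / weight_sum if weight_sum > 0 else None

def knn_impute(target, donors, col, weights, categorical, k=2):
    """Impute target[col] from the k closest donors that observe `col`."""
    scored = []
    for donor in donors:
        if donor[col] is None:
            continue
        d = weighted_distance(target, donor, weights, categorical)
        if d is not None:
            scored.append((d, donor[col]))
    scored.sort(key=lambda pair: pair[0])
    values = [v for _, v in scored[:k]]
    if not values:
        return None
    if col in categorical:
        return max(set(values), key=values.count)  # majority vote
    return sum(values) / len(values)               # mean of neighbours

# Hypothetical columns: age (scaled), sex (categorical), functional score (scaled)
donors = [
    [0.50, "F", 0.80],
    [0.52, "F", 0.70],
    [0.90, "M", 0.20],
]
target = [0.51, "F", None]
weights = [0.6, 0.3, 0.0]   # assumed relevance of each feature to column 2
imputed = knn_impute(target, donors, col=2, weights=weights, categorical={1})
```

Normalising by the sum of the weights actually used keeps distances comparable between pairs of patients that share different numbers of observed features.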

Subject(s)

Algorithms, Computational Biology/methods, Data Mining, Datasets as Topic, Amyotrophic Lateral Sclerosis, Bayes Theorem, Disease/classification, Humans, Information Storage and Retrieval

Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.

Daberdaku, Sebastian; Ferrari, Carlo.

BMC Bioinformatics ; 19(1): 35, 2018 02 06.

Article in English | MEDLINE | ID: mdl-29409446

ABSTRACT

BACKGROUND: The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. RESULTS: In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). CONCLUSIONS: The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.

Subject(s)

Proteins/chemistry, Support Vector Machine, Amino Acids/chemistry, Area Under Curve, Binding Sites, Protein Interaction Domains and Motifs, Proteins/metabolism, ROC Curve

Predicting functional impairment trajectories in amyotrophic lateral sclerosis: a probabilistic, multifactorial model of disease progression.

Tavazzi, Erica; Daberdaku, Sebastian; Zandonà, Alessandro; Vasta, Rosario; Nefussy, Beatrice; Lunetta, Christian; Mora, Gabriele; Mandrioli, Jessica; Grisan, Enrico; Tarlarini, Claudia; Calvo, Andrea; Moglia, Cristina; Drory, Vivian; Gotkine, Marc; Chiò, Adriano; Di Camillo, Barbara.

J Neurol ; 269(7): 3858-3878, 2022 Jul.

Article in English | MEDLINE | ID: mdl-35266043

ABSTRACT

OBJECTIVE: To employ Artificial Intelligence to model, predict and simulate the amyotrophic lateral sclerosis (ALS) progression over time in terms of variable interactions, functional impairments, and survival. METHODS: We employed demographic and clinical variables, including functional scores and the utilisation of support interventions, of 3940 ALS patients from four Italian and two Israeli registers to develop a new approach based on Dynamic Bayesian Networks (DBNs) that models the ALS evolution over time, in two distinct scenarios of variable availability. The method allows simulating patients' disease trajectories and predicting the probability of functional impairment and survival at different time points. RESULTS: DBNs explicitly represent the relationships between the variables and the pathways along which they influence the disease progression. Several notable inter-dependencies were identified and validated by comparison with the literature. Moreover, the implemented tool allows the assessment of the effect of different markers on the disease course, reproducing the probabilistically expected clinical progressions. The tool shows high concordance in terms of predicted and real prognosis, assessed as time to functional impairments and survival (integral of the AU-ROC in the first 36 months between 0.80-0.93 and 0.84-0.89 for the two scenarios, respectively). CONCLUSIONS: Provided only with measurements commonly collected during the first visit, our models can predict time to the loss of independence in walking, breathing, swallowing, and communicating, as well as survival, and can be used to generate in silico patient cohorts with specific characteristics. Our tool provides a comprehensive framework to support physicians in treatment planning and clinical decision-making.

Subject(s)

Amyotrophic Lateral Sclerosis, Amyotrophic Lateral Sclerosis/diagnosis, Artificial Intelligence, Bayes Theorem, Disease Progression, Humans, Models, Statistical

A Combined Interpolation and Weighted K-Nearest Neighbours Approach for the Imputation of Longitudinal ICU Laboratory Data.

Daberdaku, Sebastian; Tavazzi, Erica; Di Camillo, Barbara.

J Healthc Inform Res ; 4(2): 174-188, 2020 Jun.

Article in English | MEDLINE | ID: mdl-35415441

ABSTRACT

The presence of missing data is a common problem that affects almost all clinical datasets. Since most available data mining and machine learning algorithms require complete datasets, accurately imputing (i.e. "filling in") the missing data is an essential step. This paper presents a methodology for the missing data imputation of longitudinal clinical data based on the integration of linear interpolation and a weighted K-Nearest Neighbours (KNN) algorithm. The Maximal Information Coefficient (MIC) values among features are employed as weights for the distance computation in the KNN algorithm in order to integrate intra- and inter-patient information. An interpolation-based imputation approach was also employed and tested both independently and in combination with the KNN algorithm. The final imputation is carried out by applying the best performing method for each feature. The methodology was validated on a dataset of clinical laboratory test results of 13 commonly measured analytes of patients in an intensive care unit (ICU) setting. The performance results are compared with those of 3D-MICE, a state-of-the-art imputation method for cross-sectional and longitudinal patient data. This work was presented in the context of the 2019 ICHI Data Analytics Challenge on Missing data Imputation (DACMI).
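The interpolation component and the per-feature choice of the best performing method described in this abstract can be illustrated with a short Python sketch. It is a simplified example under assumptions, not the published implementation: the function names, the handling of leading and trailing gaps (copying the nearest observation), the analyte names, and the validation error values are all hypothetical.

```python
def interpolate_series(times, values):
    """Fill None entries by linear interpolation between the nearest observed
    neighbours in time; leading/trailing gaps copy the nearest observation."""
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        return list(values)  # nothing to interpolate from
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = [j for j in observed if j < i]
        after = [j for j in observed if j > i]
        if before and after:
            lo, hi = before[-1], after[0]
            frac = (times[i] - times[lo]) / (times[hi] - times[lo])
            filled[i] = values[lo] + frac * (values[hi] - values[lo])
        else:
            filled[i] = values[before[-1]] if before else values[after[0]]
    return filled

def pick_best_method(errors_by_method):
    """Per feature, pick the method with the lowest validation error."""
    return {feature: min(errs, key=errs.get)
            for feature, errs in errors_by_method.items()}

# Toy longitudinal series for one analyte (times in hours, values arbitrary).
times = [0.0, 1.0, 3.0, 4.0]
values = [1.0, None, 4.0, None]
filled = interpolate_series(times, values)

# Hypothetical validation errors for two imputation strategies.
errors = {
    "sodium": {"interpolation": 0.2, "knn": 0.3},
    "glucose": {"interpolation": 0.5, "knn": 0.4},
}
best = pick_best_method(errors)
```

Interpolation uses only the patient's own time series, whereas a KNN step can borrow information from other patients; selecting per feature lets each analyte use whichever source of information validates better.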
