ABSTRACT
Protein-protein interactions (PPIs) are fundamental processes governing cellular functions, and are crucial for understanding biological systems at the molecular level. Compared to experimental methods for PPI prediction and interaction-site identification, computational deep learning approaches offer an affordable and efficient alternative. Since protein structure can be represented as a graph, graph neural networks (GNNs) are a natural deep learning architecture for the task. In this work, PPI prediction is modeled as a node-focused binary classification task, using a GNN to determine whether a given residue is part of the interface. Biological data were obtained from the Protein Data Bank in Europe (PDBe), leveraging the Protein Interfaces, Surfaces, and Assemblies (PISA) service. To gain a deeper understanding of how proteins interact, the data obtained from PISA were assembled into three datasets: Whole, Interface, and Chain, consisting of data on the whole protein, pairs of interacting chains, and single chains, respectively. These three datasets correspond to three different variants of the problem: identifying interfaces between protein complexes, between chains of the same protein, and interface regions in general. The results indicate that GNNs can solve each of the three tasks with very good performance.
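As a rough illustration of what "node-focused binary classification" means here, the sketch below runs a single round of mean-neighbor message passing over a toy residue graph and labels each residue as interface (1) or non-interface (0). It is not the model described in the abstract: the function name `predict_interface`, the two residue features, the weights, and the threshold are all hypothetical values chosen for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_interface(features, edges, w_self, w_neigh, bias, threshold=0.5):
    """One round of mean-neighbor message passing, then a per-residue sigmoid."""
    n = len(features)
    neighbors = {i: [] for i in range(n)}
    for a, b in edges:  # contacts are treated as undirected
        neighbors[a].append(b)
        neighbors[b].append(a)

    labels = []
    for i in range(n):
        dim = len(features[i])
        if neighbors[i]:
            mean = [
                sum(features[j][k] for j in neighbors[i]) / len(neighbors[i])
                for k in range(dim)
            ]
        else:
            mean = [0.0] * dim
        score = bias
        score += sum(w * x for w, x in zip(w_self, features[i]))
        score += sum(w * x for w, x in zip(w_neigh, mean))
        labels.append(1 if sigmoid(score) >= threshold else 0)
    return labels


# Toy chain of four residues; each feature vector is a hypothetical
# [solvent exposure, hydrophobicity] pair, not a feature set from the paper.
features = [[0.9, 0.8], [0.8, 0.7], [0.1, 0.2], [0.0, 0.1]]
edges = [(0, 1), (1, 2), (2, 3)]
labels = predict_interface(
    features, edges, w_self=[2.0, 2.0], w_neigh=[1.0, 1.0], bias=-3.0
)
print(labels)
```

In a trained GNN the weights would be learned and several message-passing rounds would typically be stacked; the point of the sketch is only that the output is one label per node rather than one label per graph.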
Subjects
Databases, Protein , Neural Networks, Computer , Protein Interaction Mapping , Proteins , Proteins/chemistry , Proteins/metabolism , Protein Interaction Mapping/methods , Computational Biology/methods , Deep Learning , Protein Interaction Maps , Algorithms , Protein Binding
ABSTRACT
Graph Neural Networks have proven to be very valuable models for solving a wide variety of problems on molecular graphs, as well as in many other research fields involving graph-structured data. Molecules are heterogeneous graphs composed of atoms of different species. Composite graph neural networks process heterogeneous graphs with multiple state-updating networks, each one dedicated to a particular node type. This approach allows information to be extracted from a graph more efficiently than with standard graph neural networks, which distinguish node types through a one-hot encoded type vector. We carried out extensive experimentation on eight molecular graph datasets and on a large number of both classification and regression tasks. The results we obtained clearly show that composite graph neural networks are far more efficient in this setting than standard graph neural networks.
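The contrast between the two ways of handling node types can be sketched as follows. In the standard variant a single update function receives the atom type as a one-hot vector appended to its input; in the composite variant the atom type selects which update function is applied. The function names, the single linear layer, and all weights below are assumptions made for the example, not the architecture used in the paper.

```python
def one_hot(atom_type, type_order):
    """Encode the atom type as a one-hot vector following type_order."""
    return [1.0 if atom_type == t else 0.0 for t in type_order]


def shared_update(state, message, type_vec, weights):
    """Standard variant: one network for all nodes; the type is an input."""
    inputs = state + message + type_vec
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def composite_update(state, message, atom_type, weights_by_type):
    """Composite variant: the node type selects the network that is applied."""
    inputs = state + message
    rows = weights_by_type[atom_type]
    return [sum(w * x for w, x in zip(row, inputs)) for row in rows]


# Hypothetical one-dimensional state and message for a single atom.
type_order = ["C", "O"]
state, message = [1.0], [2.0]

shared_weights = [[1.0, 0.5, 0.1, -0.1]]  # columns: state, message, is_C, is_O
weights_by_type = {"C": [[1.0, 0.5]], "O": [[0.2, 2.0]]}

shared_c = shared_update(state, message, one_hot("C", type_order), shared_weights)
shared_o = shared_update(state, message, one_hot("O", type_order), shared_weights)
composite_c = composite_update(state, message, "C", weights_by_type)
composite_o = composite_update(state, message, "O", weights_by_type)
```

With a single linear layer, the shared variant can only shift its output by a type-dependent constant, whereas the composite variant can weight the state and the message differently for each atom type.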
Subjects
Neural Networks, Computer , Algorithms
ABSTRACT
The recent release of the COVID-19 spike glycoprotein structure allows detailed analysis of the structural features that are required for stabilizing the infective form of its quaternary assembly. With the aim of disassembling the trimeric structure of the COVID-19 spike glycoprotein, we analyzed single protomer surfaces, searching for concave moieties located at the three protomer-protomer interfaces. The presence of some druggable pockets at these interfaces suggested that some of the available drugs in Drug Bank could destabilize the quaternary spike glycoprotein formation by binding to these pockets, thereby interfering with the COVID-19 life cycle. The approach we propose here can be an additional strategy to fight against the deadly virus. The ligands of the COVID-19 spike glycoprotein that we have predicted in the present computational investigation might be the basis for new experimental studies in vitro and in vivo.
Subjects
Betacoronavirus/drug effects , Coronavirus Infections/drug therapy , Drug Evaluation, Preclinical , Pneumonia, Viral/drug therapy , Protein Multimerization/drug effects , Small Molecule Libraries/pharmacology , Spike Glycoprotein, Coronavirus/antagonists & inhibitors , Spike Glycoprotein, Coronavirus/chemistry , Amino Acid Sequence , Antiviral Agents/chemistry , Antiviral Agents/pharmacology , Antiviral Agents/therapeutic use , Betacoronavirus/chemistry , Betacoronavirus/physiology , Binding Sites , COVID-19 , Coronavirus Infections/epidemiology , Ligands , Models, Molecular , Pandemics , Pneumonia, Viral/epidemiology , SARS-CoV-2 , Small Molecule Libraries/chemistry , Small Molecule Libraries/therapeutic use
ABSTRACT
Predicting drug side-effects before they occur is a critical task for keeping the number of drug-related hospitalizations low and for improving drug discovery processes. Automatic predictors of side-effects are generally unable to process the structure of the drug, resulting in a loss of information. Graph neural networks have seen great success in recent years, thanks to their ability to exploit the information conveyed by the graph structure and labels. These models have been used in a wide variety of biological applications, including the prediction of drug side-effects on a large knowledge graph. Exploiting the molecular graph encoding the structure of the drug represents a novel approach, in which the problem is formulated as a multi-class multi-label graph-focused classification. We developed a methodology to carry out this task, using recurrent Graph Neural Networks, and built a dataset from freely accessible and well-established data sources. The results show that our method has an improved classification capability, under many parameters and metrics, with respect to previously available predictors. The method is not yet ready for clinical tests, as the specificity is still below the preliminary 25% threshold. Future efforts will aim at improving this aspect.
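A graph-focused, multi-label formulation can be sketched as a readout over the node states of a molecular graph followed by one independent sigmoid per side-effect, so that a drug can be assigned several side-effects at once. The code below is a minimal sketch under assumed values: the node states stand in for the output of a GNN, and the sum readout, weights, and threshold are hypothetical choices rather than the method of the paper.

```python
import math


def graph_readout(node_states):
    """Sum the node states into a single graph-level vector."""
    dim = len(node_states[0])
    return [sum(state[k] for state in node_states) for k in range(dim)]


def predict_side_effects(node_states, weights, biases, threshold=0.5):
    """Return (labels, probabilities), one entry per side-effect class."""
    graph_vec = graph_readout(node_states)
    probs = []
    for row, b in zip(weights, biases):
        z = b + sum(w * x for w, x in zip(row, graph_vec))
        probs.append(1.0 / (1.0 + math.exp(-z)))  # independent sigmoid per class
    labels = [1 if p >= threshold else 0 for p in probs]
    return labels, probs


# Three atoms with two-dimensional states (assumed to come from a GNN).
node_states = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights = [[1.0, 1.0], [-1.0, -1.0], [2.0, -1.0]]  # one row per side-effect
biases = [0.0, 0.0, 0.0]

labels, probs = predict_side_effects(node_states, weights, biases)
print(labels)
```

Using independent sigmoids rather than a softmax is what makes the task multi-label: the predicted probabilities do not have to sum to one, and any number of classes can be active for the same molecule.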
Subjects
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Humans , Benchmarking , Drug Discovery , Neural Networks, Computer
ABSTRACT
Drug Side-Effects (DSEs) have a high impact on public health, care system costs, and drug discovery processes. Predicting the probability of side-effects before their occurrence is fundamental to reducing this impact, in particular on drug discovery. Candidate molecules could be screened before undergoing clinical trials, reducing the costs in time, money, and health of the participants. Drug side-effects are triggered by complex biological processes involving many different entities, from drug structures to protein-protein interactions. To predict their occurrence, it is necessary to integrate data from heterogeneous sources. In this work, such heterogeneous data are integrated into a graph dataset, which expressively represents the relational information between different entities, such as drug molecules and genes. The relational nature of the dataset represents an important novelty for drug side-effect predictors. Graph Neural Networks (GNNs) are exploited to predict DSEs on our dataset, with very promising results. GNNs are deep learning models that can process graph-structured data with minimal information loss, and have been applied to a wide variety of biological tasks. Our experimental results confirm the advantage of using relationships between data entities, suggesting interesting future developments in this area. The experimentation also shows the importance of specific subsets of data in determining associations between drugs and side-effects.
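One way to picture a relational dataset of this kind is as a graph with typed nodes and typed edges. The sketch below is a hypothetical container written for illustration only: the class `HeteroGraph`, the node identifiers, and the relation names `targets` and `interacts_with` are assumptions, not the schema of the dataset described in the abstract.

```python
from collections import defaultdict


class HeteroGraph:
    """Minimal typed graph: every node has a type, every edge a relation."""

    def __init__(self):
        self.node_type = {}
        self.edges = defaultdict(list)  # relation name -> list of (src, dst)

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, src, relation, dst):
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[relation].append((src, dst))

    def neighbors(self, node_id, relation):
        """Targets reachable from node_id through edges of one relation."""
        return [dst for src, dst in self.edges[relation] if src == node_id]


graph = HeteroGraph()
for drug in ("drug_A", "drug_B"):
    graph.add_node(drug, "drug")
for gene in ("gene_1", "gene_2"):
    graph.add_node(gene, "gene")

graph.add_edge("drug_A", "targets", "gene_1")
graph.add_edge("drug_A", "targets", "gene_2")
graph.add_edge("drug_B", "targets", "gene_2")
graph.add_edge("gene_1", "interacts_with", "gene_2")

print(graph.neighbors("drug_A", "targets"))
```

Keeping relations separate lets a GNN aggregate messages per relation type, which is one common way to exploit the relational information that a flat feature table would lose.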
Subjects
Drug-Related Side Effects and Adverse Reactions , Humans , Drug Discovery , Neural Networks, Computer , Probability , Research Design
ABSTRACT
Recent studies confirmed that people unexposed to SARS-CoV-2 have preexisting reactivity, probably due to previous exposure to widely circulating common cold coronaviruses. Such preexisting reactivity against SARS-CoV-2 comes from memory T cells that can specifically recognize a SARS-CoV-2 epitope of structural and non-structural proteins and the homologous epitopes from common cold coronaviruses. Therefore, it is important to understand SARS-CoV-2 cross-reactivity by investigating the similarity of these protein sequences to those of different circulating coronaviruses. In addition, the emerging SARS-CoV-2 variants have led to intense interest in whether mutations in proteins (especially in the spike) could compromise vaccine effectiveness. Since it is not clear whether the differences in clinical outcomes are caused by common cold coronaviruses, a deeper investigation of cross-reactive T-cell immunity to SARS-CoV-2 is crucial to examine the differential COVID-19 symptoms and vaccine performance. Therefore, the present study can be a starting point for further research on cross-reactive T cell recognition between circulating common cold coronaviruses and SARS-CoV-2, including the most recent variants Delta and Omicron. Finally, a deep learning approach, based on Siamese networks, is proposed to accurately and efficiently calculate a BLAST-like similarity score between protein sequences.
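The defining feature of a Siamese network is that both inputs pass through the same encoder with the same parameters, and the similarity is computed between the two resulting embeddings. The sketch below shows only that pattern. The encoder (amino acid composition followed by one fixed linear layer with tanh) and the cosine comparison are stand-ins chosen for the example; they are not the trained model from the paper, and the value returned is a cosine similarity, not a BLAST-like score. The two sequences are made-up strings.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts) or 1
    return [c / total for c in counts]


def encode(seq, weights):
    """Shared branch: composition features, then one linear layer with tanh."""
    x = composition(seq)
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def siamese_similarity(seq_a, seq_b, weights):
    """Encode both sequences with the same weights and compare embeddings."""
    emb_a, emb_b = encode(seq_a, weights), encode(seq_b, weights)
    dot = sum(p * q for p, q in zip(emb_a, emb_b))
    norm = math.sqrt(sum(p * p for p in emb_a)) * math.sqrt(sum(q * q for q in emb_b))
    return dot / norm if norm else 0.0


# Fixed, hypothetical weights (4 x 20); a real model would learn these.
weights = [[1.0 if (i + j) % 3 == 0 else -0.5 for j in range(20)] for i in range(4)]

seq_a = "MKTAYIAK"
seq_b = "MKTAYLAK"
score_ab = siamese_similarity(seq_a, seq_b, weights)
score_aa = siamese_similarity(seq_a, seq_a, weights)
```

Because the two branches share their weights, the score is symmetric in its arguments, and a sequence compared with itself scores 1 whenever its embedding is non-zero.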
ABSTRACT
With a structural bioinformatic approach, we have explored amino acid compositions at PISA-defined interfaces between small molecules and proteins that are contained in an optimized subset of 11,351 PDB files. The use of a series of restrictions, to prevent redundancy and biases from interactions between amino acids with charged side chains and ions, yielded a final data set of 45,230 protein-small molecule interfaces. We have compared occurrences of natural amino acids in surface-exposed regions and binding sites for all the proteins of our data set. From our structural bioinformatic survey, the most relevant signal arose from the unexpected Gly abundance at enzyme catalytic sites. This finding suggested that Gly must have a fundamental role in stabilizing concave protein surface moieties. Subsequently, we have tried to predict the effect of in silico Gly mutations in hen egg white lysozyme, to optimize those conditions that can reshape the protein surface with the appearance of new pockets. Replacing amino acids having bulky side chains with Gly in specific protein regions seems a feasible way of designing proteins with additional surface pockets, which can alter protein surface dynamics and therefore act as controllable switches for protein activity.
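A comparison of amino acid occurrences between binding sites and surface-exposed regions can be expressed as a per-residue enrichment. The abstract does not state which statistic was used, so the log2 frequency ratio below is an assumption, and the two residue strings are toy data rather than values from the 45,230 interfaces.

```python
import math
from collections import Counter


def frequencies(residues):
    """Relative frequency of each one-letter amino acid code."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def log2_enrichment(binding_site, surface):
    """log2(frequency in binding sites / frequency on the surface)."""
    site, surf = frequencies(binding_site), frequencies(surface)
    # Residues absent from either set are skipped to avoid division by zero.
    return {aa: math.log2(site[aa] / surf[aa]) for aa in site if aa in surf}


surface_residues = "GAKEDSGAKEDS"  # toy surface-exposed residues
binding_residues = "GGAS"          # toy binding-site residues

enrichment = log2_enrichment(binding_residues, surface_residues)
for aa, value in sorted(enrichment.items(), key=lambda kv: -kv[1]):
    print(aa, round(value, 3))
```

A positive value means the residue is more frequent in binding sites than on the surface as a whole; in this toy input Gly is the most enriched residue by construction.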
Subjects
Computational Biology , Glycine , Amino Acids/chemistry , Amino Acids/genetics , Binding Sites/genetics , Glycine/chemistry , Glycine/genetics , Protein Conformation , Proteins/chemistry
ABSTRACT
Understanding the molecular mechanisms that correlate pathologies with missense mutations is of critical importance for disease risk estimations and for devising personalized therapies. Thus, we have performed a bioinformatic survey of ClinVar, a database of human genomic variations, to find signals that can account for missense mutation pathogenicity. Arginine emerged as the most frequently replaced amino acid in both benign and pathogenic mutations. By adding the structural dimension to this investigation to increase its resolution, we found that arginine mutations occurring at the protein-DNA interface increase pathogenicity 6.5 times with respect to benign variants. Glycine is the second most frequently replaced amino acid among all the pathogenic missense mutations. Because glycine is necessarily replaced by a larger amino acid, and because these substitutions are mostly located in buried protein moieties, they perturb the structural stability of proteins and, therefore, their functions. Arginine and glycine thus appear to be representative of missense mutations that cause changes in interaction processes and in protein structural features, respectively, the two main molecular mechanisms of genome-induced pathologies.
Subjects
Computational Biology , Mutation, Missense , Humans , Mutation , Proteins
ABSTRACT
It is now well established that most human diseases not related to pathogen infections originate from DNA disorders. Thus, until CRISPR-like remedies become available, DNA mutations will propagate into the proteome, offering the possibility of selecting natural or synthetic molecules to fight against the effects of malfunctioning proteins. Drug discovery, indeed, is a flourishing field of biotechnological research to improve human health, even though the development of a new drug is increasingly expensive in spite of the massive use of informatics in Medicinal Chemistry. CRISPR technology adds new alternatives for curing diseases by removing the DNA defects responsible for genome-related pathologies. In principle, however, the same technology could also be exploited to induce protein mutations whose effects are controlled by the presence of suitable ligands. In this paper, a new idea is proposed for the realization of mutated proteins, on the surface of which more spacious transient pockets are formed that are, therefore, more suitable for hosting drugs. In particular, new allosteric sites are obtained by replacing amino acids that have bulky side chains with glycine (Gly), the smallest natural amino acid. We also present a machine learning approach to evaluate the druggability score of new (or enlarged) pockets. Preliminary experimental results are very promising, showing that 10% of the sites created by the Gly-pipe software are druggable.
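At the sequence level, the substitution step described above amounts to enumerating single-point mutants in which a bulky residue is replaced by Gly. The sketch below illustrates only that enumeration and is not the Gly-pipe software: the set of residues treated as bulky and the example sequence are hypothetical, and the structural modelling and druggability scoring of the resulting pockets are not covered.

```python
# Hypothetical choice of residues with bulky side chains (one-letter codes).
BULKY = set("FWYLIMRK")


def gly_mutants(sequence, bulky=BULKY):
    """Yield (label, mutated_sequence) for each single bulky-to-Gly change.

    Labels follow the usual convention: original residue, 1-based position,
    new residue (for example "F3G").
    """
    for i, aa in enumerate(sequence):
        if aa in bulky:
            yield f"{aa}{i + 1}G", sequence[:i] + "G" + sequence[i + 1:]


mutants = list(gly_mutants("ACFGW"))
for label, mutated in mutants:
    print(label, mutated)
```

Each candidate would then need to be assessed structurally, since whether a substitution actually opens a pocket depends on the three-dimensional environment of the residue rather than on the sequence alone.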