ABSTRACT
Computational prediction of molecule-protein interactions has been key to developing new molecules that interact with a target protein for therapeutics development. Previous work falls into two independent streams: (1) predicting protein-protein interactions (PPIs) between naturally occurring proteins and (2) predicting binding affinities between proteins and small-molecule ligands [also known as drug-target interaction (DTI)]. Studying the two problems in isolation has limited the ability of these computational models to generalize across the PPI and DTI tasks, both of which ultimately involve noncovalent interactions with a protein target. In this work, we developed the Equivariant Graph of Graphs Neural Network (EGGNet), a geometric deep learning (GDL) framework for molecule-protein binding prediction that can handle three types of molecules interacting with a target protein: (1) small molecules, (2) synthetic peptides, and (3) natural proteins. EGGNet leverages a graph of graphs (GoG) representation constructed from molecular structures at atomic resolution and uses a multiresolution equivariant graph neural network to learn from such representations. In addition, EGGNet leverages the underlying biophysics by using both atom- and residue-level interactions, which improves its ability to rank candidate poses from blind docking. EGGNet achieves competitive performance on both a public protein-small-molecule binding affinity prediction task (80.2% top-1 success rate on CASF-2016) and a synthetic protein interface prediction task (88.4% area under the precision-recall curve). We envision that the proposed GDL framework can generalize to many other protein interaction prediction problems, such as binding site prediction and molecular docking, helping accelerate protein engineering and structure-based drug development.
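A top-1 success rate counts how often the single best-scored candidate pose is close to the native pose. The sketch below assumes the usual CASF-style criterion of a ligand RMSD of at most 2 Å and assumes lower scores are better unless `higher_is_better` is set; the function name and data layout are hypothetical, not EGGNet's evaluation code.

```python
from typing import Dict, List, Tuple

# Each complex maps to its candidate poses as (score, rmsd_to_native_in_angstrom) pairs.
Poses = List[Tuple[float, float]]


def top1_success_rate(complexes: Dict[str, Poses], rmsd_cutoff: float = 2.0,
                      higher_is_better: bool = False) -> float:
    """Fraction of complexes whose best-scored pose lies within rmsd_cutoff of the native pose."""
    if not complexes:
        raise ValueError("no complexes given")
    pick = max if higher_is_better else min  # score direction is an assumption
    hits = 0
    for poses in complexes.values():
        _, best_rmsd = pick(poses, key=lambda p: p[0])  # pose ranked first by the scorer
        if best_rmsd <= rmsd_cutoff:
            hits += 1
    return hits / len(complexes)


if __name__ == "__main__":
    toy = {
        "cplx_a": [(-9.1, 1.2), (-8.7, 4.5)],  # best score is near-native: success
        "cplx_b": [(-7.0, 6.3), (-6.2, 0.9)],  # best score is a decoy: failure
    }
    print(top1_success_rate(toy))  # 0.5
```

Note that `cplx_b` fails even though a near-native pose exists: the metric rewards ranking, not just sampling.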
ABSTRACT
Proteins perform many essential functions in biological systems and can be successfully developed as biotherapeutics. It is invaluable to be able to predict their properties from a proposed sequence and structure. In this study, we developed a novel generalizable deep learning framework, LM-GVP, composed of a protein language model (LM) and a graph neural network (GNN), to leverage information from both the 1D amino acid sequences and the 3D structures of proteins. Our approach outperformed state-of-the-art protein LMs on a variety of property prediction tasks, including fluorescence, protease stability, and protein functions from Gene Ontology (GO). We also provide insights into how a GNN prediction head can inform the fine-tuning of protein LMs to better leverage structural information. We envision that our deep learning framework will be generalizable to many protein property prediction problems and will greatly accelerate protein engineering and drug development.
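The idea of feeding per-residue LM embeddings into a structure graph can be sketched as follows. This is a simplified stand-in: LM-GVP uses geometric vector perceptron (GVP) layers, whereas this sketch swaps in a plain mean-aggregation message-passing layer, and the 8 Å Cα contact cutoff, toy embeddings, and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]
Coord = Tuple[float, float, float]


def contact_edges(ca_coords: Sequence[Coord], cutoff: float = 8.0) -> List[List[int]]:
    """Neighbour lists from Cα-Cα distances below `cutoff` Å (cutoff is an assumption)."""
    n = len(ca_coords)
    nbrs: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def mean_message_pass(lm_emb: Sequence[Vec], nbrs: List[List[int]]) -> List[Vec]:
    """One layer: h_i' = relu(h_i + mean of neighbour embeddings h_j)."""
    out: List[Vec] = []
    for i, h_i in enumerate(lm_emb):
        if nbrs[i]:
            agg = [sum(lm_emb[j][k] for j in nbrs[i]) / len(nbrs[i]) for k in range(len(h_i))]
        else:
            agg = [0.0] * len(h_i)  # isolated residue: no structural context
        out.append([max(0.0, a + b) for a, b in zip(h_i, agg)])
    return out


def graph_readout(node_feats: List[Vec]) -> Vec:
    """Mean-pool residue features into one protein-level vector for a property head."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
    emb = [[1.0, -1.0], [3.0, 1.0], [-2.0, 0.5]]  # stand-in for LM per-residue embeddings
    h = mean_message_pass(emb, contact_edges(coords))
    print(h)  # [[4.0, 0.0], [4.0, 0.0], [0.0, 0.5]]
    print(graph_readout(h))
```

In a real model the layer would have learned weights and the embeddings would come from a pretrained protein LM; the point here is only how sequence-derived features are mixed along 3D contacts.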
Subjects
Deep Learning, Amino Acid Sequence, Language, Computer Neural Networks, Proteins/chemistry
ABSTRACT
Deep learning has drawn significant attention in many areas, including drug discovery. It has been proposed that deep learning can outperform other machine learning algorithms, especially on large data sets. In the pharmaceutical industry, machine learning models are built to understand quantitative structure-activity relationships (QSARs) and to predict molecular activities, including absorption, distribution, metabolism, and excretion (ADME) properties, from molecular structures alone. Previous reports have demonstrated the advantages of using deep neural networks (DNNs) for QSAR modeling. One challenge in building DNN models is identifying the hyperparameters that lead to better generalization. In this study, we investigated several tunable hyperparameters of DNN models on 24 industrial ADME data sets. We analyzed the sensitivity and influence of five hyperparameters: the learning rate, weight decay for L2 regularization, dropout rate, activation function, and the use of batch normalization. This paper focuses on strategies and practices for DNN model building. Further, the optimized model for each data set was built and compared with the benchmark models used in production. Based on our benchmarking results, we propose several practices for building DNN QSAR models.
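The five hyperparameters studied can be organized as a search space, either enumerated in full or varied one at a time around a baseline to measure sensitivity. The candidate values, the baseline, and the function names below are hypothetical; the abstract does not state the grid or search protocol actually used.

```python
import itertools
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical candidate values for the five hyperparameters named in the abstract.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "weight_decay": [0.0, 1e-5, 1e-4],  # L2 regularization strength
    "dropout": [0.0, 0.25, 0.5],
    "activation": ["relu", "tanh"],
    "batch_norm": [False, True],
}


def full_grid(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of hyperparameter values as one config dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def one_at_a_time(space: Dict[str, List[Any]],
                  baseline: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Sensitivity design: change one hyperparameter, keep the others at baseline."""
    for name, values in space.items():
        for value in values:
            if value != baseline[name]:
                yield name, {**baseline, name: value}


if __name__ == "__main__":
    baseline = {"learning_rate": 1e-3, "weight_decay": 1e-5, "dropout": 0.25,
                "activation": "relu", "batch_norm": False}
    print(sum(1 for _ in full_grid(SEARCH_SPACE)))  # 108 configurations
    print(len(list(one_at_a_time(SEARCH_SPACE, baseline))))  # 8 single-factor variants
```

Each configuration would then be trained on one ADME data set and scored on held-out data. The one-at-a-time design is far cheaper than the full grid but cannot reveal interactions between hyperparameters.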
Subjects
Deep Learning, Drug Discovery/methods, Physicochemical Absorption, Pharmaceutical Preparations/chemistry, Pharmaceutical Preparations/metabolism, Quantitative Structure-Activity Relationship
ABSTRACT
Partial covalent interactions (PCIs) in proteins, which include hydrogen bonds, salt bridges, cation-π, and π-π interactions, contribute to thermodynamic stability and facilitate interactions with other biomolecules. Several score functions have been developed within the Rosetta protein modeling framework that identify and evaluate these PCIs by analyzing the geometry between participating atoms. However, we hypothesize that PCIs can be unified through a simplified electron orbital representation. To test this hypothesis, we introduced orbital-based chemical descriptors for PCIs into Rosetta, called the PCI score function. Optimal geometries for the PCIs are derived from a statistical analysis of high-quality protein structures obtained from the Protein Data Bank (PDB), and the relative orientations of electron-deficient hydrogen atoms and electron-rich lone pair or π orbitals are evaluated. We demonstrate that native-like geometries of hydrogen bonds, salt bridges, cation-π, and π-π interactions are recapitulated during minimization of protein conformation. The packing density of the tested protein structures increased from 0.62 with the standard score function to 0.64 with the PCI score function, closer to the native value of 0.70. Overall, rotamer recovery improved when using the PCI score function (75%) compared with the standard Rosetta score function (74%). The PCI score function represents an improvement over the standard Rosetta score function for protein model scoring; in addition, it provides a platform for future work on the analysis of small molecule-protein interactions, which depend on partial covalent interactions.
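Rotamer recovery, as reported above, is the fraction of residues whose predicted side-chain conformation matches the native one. A minimal sketch, assuming a residue counts as recovered when every chi angle lies within 40° of the native value; the tolerance and function names are assumptions, and real benchmarks may instead compare rotamer bins and handle symmetric side chains (e.g. Phe chi2) specially, which this sketch ignores.

```python
from typing import List, Sequence


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two dihedral angles in degrees (wraps at 360°)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def rotamer_recovered(native_chis: Sequence[float], model_chis: Sequence[float],
                      tol: float = 40.0) -> bool:
    """True if every chi angle of the model is within `tol` degrees of the native."""
    return all(angle_diff(n, m) <= tol for n, m in zip(native_chis, model_chis))


def rotamer_recovery(native: List[Sequence[float]], model: List[Sequence[float]],
                     tol: float = 40.0) -> float:
    """Fraction of residues whose side-chain rotamer is recovered."""
    if len(native) != len(model) or not native:
        raise ValueError("need equal, non-empty residue lists")
    hits = sum(rotamer_recovered(n, m, tol) for n, m in zip(native, model))
    return hits / len(native)


if __name__ == "__main__":
    native = [[60.0, 180.0], [-60.0]]
    model = [[70.0, -175.0], [60.0]]  # first residue matches (180 vs -175 is only 5°)
    print(rotamer_recovery(native, model))  # 0.5
```

The wrap-around in `angle_diff` matters: a naive subtraction would treat 180° and -175° as 355° apart.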
Subjects
Molecular Models, Proteins/chemistry, X-Ray Crystallography, Protein Databases, Electrons, Hydrogen Bonding, Protein Conformation, Rotation
ABSTRACT
The computational design of proteins that bind small-molecule ligands is one of the unsolved challenges in protein engineering. It is complicated by the relatively small size of the ligand, which limits the number of intermolecular interactions. Furthermore, near-perfect geometries between interacting partners are required to achieve high binding affinities. For apolar, rigid small molecules, the interactions are dominated by short-range van der Waals forces. As the number of polar groups in the ligand increases, hydrogen bonds, salt bridges, cation-π, and π-π interactions gain importance. These partial covalent interactions are longer-ranged and, additionally, their strength depends on the environment (e.g., solvent exposure). To assess the current state of protein-small molecule interface design, we benchmark the popular Rosetta software on a diverse set of 43 protein-ligand complexes. On average, we achieve binding-site sequence recoveries of 59% when the ligand is allowed limited reorientation and 48% when it is allowed full reorientation. When simulating the redesign of a protein binding site, sequence recovery among the residues that contribute most to binding was 52% when slight ligand reorientation was allowed and 27% when full ligand reorientation was allowed. As expected, sequence recovery correlates with ligand displacement.
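Sequence recovery is the fraction of designed positions at which the design reproduces the native amino acid. The sketch below restricts the count to a chosen set of positions, as when only binding-site residues are scored; the sequences and site positions are invented for illustration.

```python
from typing import Iterable, Optional


def sequence_recovery(native: str, designed: str,
                      positions: Optional[Iterable[int]] = None) -> float:
    """Fraction of positions (0-based) where the designed residue equals the native one.

    `positions` restricts the calculation, e.g. to binding-site residues; by default
    every position of the equal-length, ungapped sequences is used.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    idx = range(len(native)) if positions is None else list(positions)
    if len(idx) == 0:
        raise ValueError("no positions to evaluate")
    matches = sum(native[i] == designed[i] for i in idx)
    return matches / len(idx)


if __name__ == "__main__":
    native, designed = "MKTAYIAKQR", "MKSAYLAKQR"
    site = [2, 4, 5, 7]  # hypothetical binding-site positions
    print(sequence_recovery(native, designed, site))  # 0.5
    print(sequence_recovery(native, designed))        # 0.8
```

The two calls show why the binding-site figure is usually the harder one: mutations concentrate at the positions being redesigned, so whole-sequence recovery looks better than site recovery.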
Subjects
Molecular Docking Simulation, Proteins/chemistry, Algorithms, Amino Acid Sequence, Binding Sites, Ligands, Protein Binding, Protein Engineering, Software
ABSTRACT
Structure-based drug design is frequently used to accelerate the development of small-molecule therapeutics. Although substantial progress has been made in X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, the availability of high-resolution structures is limited owing to the frequent inability to crystallize or obtain sufficient NMR restraints for large or flexible proteins. Computational methods can be used to both predict unknown protein structures and model ligand interactions when experimental data are unavailable. This paper describes a comprehensive and detailed protocol using the Rosetta modeling suite to dock small-molecule ligands into comparative models. In the protocol presented here, we review the comparative modeling process, including sequence alignment, threading and loop building. Next, we cover docking a small-molecule ligand into the protein comparative model. In addition, we discuss criteria that can improve ligand docking into comparative models. Finally, and importantly, we present a strategy for assessing model quality. The entire protocol is presented on a single example selected solely for didactic purposes. The results are therefore not representative and do not replace benchmarks published elsewhere. We also provide an additional tutorial so that the user can gain hands-on experience in using Rosetta. The protocol should take 5-7 h, with additional time allocated for computer generation of models.
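The threading step of comparative modeling copies coordinates of aligned residues from the template onto the target, leaving unaligned target residues to be built as loops. A generic sketch of that bookkeeping from a gapped pairwise alignment, assuming `-` as the gap character; this does not reproduce Rosetta's own threading application or its input format.

```python
from typing import Dict, List, Tuple


def thread_alignment(target_aln: str, template_aln: str) -> Tuple[Dict[int, int], List[int]]:
    """Map target residue indices to template residue indices from a gapped alignment.

    Returns (mapping, unaligned): mapping[target_idx] = template_idx for columns where
    both sequences have a residue (coordinates would be copied from the template);
    `unaligned` lists target residues without a template partner (need loop building).
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    mapping: Dict[int, int] = {}
    unaligned: List[int] = []
    t_idx = m_idx = 0  # running residue counters for target and template
    for t_char, m_char in zip(target_aln, template_aln):
        if t_char != "-" and m_char != "-":
            mapping[t_idx] = m_idx
        elif t_char != "-":
            unaligned.append(t_idx)
        if t_char != "-":
            t_idx += 1
        if m_char != "-":
            m_idx += 1
    return mapping, unaligned


if __name__ == "__main__":
    mapping, loops = thread_alignment("ACD-EFG", "A-DKEF-")
    print(mapping)  # {0: 0, 2: 1, 3: 3, 4: 4}
    print(loops)    # [1, 5]
```

Template-only columns (here the `K`) simply advance the template counter; their coordinates are not used for the target.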
Subjects
Molecular Models, Molecular Docking Simulation, Protein Conformation, Drug Design, Ligands, Sequence Alignment/methods, Software, User-Computer Interface
ABSTRACT
It has been demonstrated previously that symmetric homodimeric proteins are energetically favored, which explains their abundance in nature. It has been proposed that such symmetric homodimers underwent gene duplication and fusion to evolve into protein topologies with a symmetric arrangement of secondary structure elements, known as "symmetric superfolds". Here, the ROSETTA protein design software was used to computationally engineer a perfectly symmetric variant of imidazole glycerol phosphate synthase and its corresponding symmetric homodimer. The new protein, termed FLR, adopts the symmetric (βα)₈ TIM-barrel superfold. The protein is soluble and monomeric and, as verified by crystallography, exhibits twofold symmetry not only in the arrangement of secondary structure elements but also in sequence and at atomic detail. When cut in half, FLR readily dimerizes to form the symmetric homodimer. The successful computational design of FLR demonstrates progress in our understanding of the underlying principles of protein stability and presents an attractive strategy for the in silico construction of larger protein domains from smaller pieces.
Subjects
Aminohydrolases/chemistry, Computational Biology, Computer Simulation, Aminohydrolases/metabolism, X-Ray Crystallography, Molecular Models, Tertiary Protein Structure, Software
ABSTRACT
Na⁺- and Cl⁻-dependent uptake of neurotransmitters by transporters of the SLC6 family, including the human serotonin transporter (SLC6A4), is critical for efficient synaptic transmission. Although residues in the human serotonin transporter that directly coordinate Cl⁻ have been identified, the role of Cl⁻ in the transport mechanism remains unclear. Through a combination of mutagenesis, chemical modification, substrate and charge flux measurements, and molecular modeling, we reveal an unexpected role for the highly conserved transmembrane segment 1 residue Asn-101 in coupling Cl⁻ binding to concentrative neurotransmitter uptake.
Subjects
Asparagine/chemistry, Chlorides/chemistry, Neurotransmitters/metabolism, Serotonin Plasma Membrane Transport Proteins/chemistry, Animals, Cysteine/chemistry, Electrophysiology/methods, HeLa Cells, Humans, Ions, Site-Directed Mutagenesis, Norepinephrine/metabolism, Oocytes/metabolism, Patch-Clamp Techniques, Plasmids/metabolism, Rats, Serotonin/metabolism, Xenopus laevis
ABSTRACT
The human serotonin (5-hydroxytryptamine, 5-HT) transporter (hSERT) is responsible for the reuptake of 5-HT following synaptic release, as well as for import of the biogenic amine into several non-5-HT-synthesizing cells, including platelets. The antidepressant citalopram blocks SERT and thereby inhibits the transport of 5-HT. To identify key residues that establish high-affinity citalopram binding, we built comparative models of hSERT and Drosophila melanogaster SERT (dSERT) based on the crystal structure of the Aquifex aeolicus leucine transporter (LeuTAa). In this study, citalopram was docked into the homology models of hSERT and dSERT using RosettaLigand. Our models reproduce the differential binding affinities of the R- and S-isomers of citalopram in hSERT and the impact of several hSERT mutants. Species-selective binding affinities for hSERT and dSERT can also be reproduced. Interestingly, the model predicts a hydrogen bond between E444 in transmembrane domain 8 (TM8) and Y95 in TM1 that places Y95 in a downward position, thereby removing Y95 from direct interaction with S-citalopram. The E444D mutation results in a 10-fold reduction in binding affinity for S-citalopram, supporting the hypothesis that Y95 and E444 form a stabilizing interaction in the S-citalopram/hSERT complex.
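To put the 10-fold affinity loss in energetic terms, a worked conversion is shown below. It assumes affinity is expressed as a dissociation or inhibition constant (K_d or K_i, so a 10-fold loss means the constant increases 10-fold), standard-state thermodynamics, and T = 298 K; the abstract itself does not report a free-energy value.

```latex
\Delta\Delta G_{\text{bind}}
  = \Delta G_{\text{E444D}} - \Delta G_{\text{WT}}
  = RT \ln\frac{K_d^{\text{E444D}}}{K_d^{\text{WT}}}
  = RT \ln 10
  \approx (0.593~\text{kcal mol}^{-1})(2.303)
  \approx 1.4~\text{kcal mol}^{-1}
```

A loss of roughly 1.4 kcal/mol is on the scale commonly attributed to a single hydrogen bond or a modest change in side-chain packing, which is consistent in magnitude with the proposed Y95-E444 interaction.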