Results 1 - 20 of 61
1.
bioRxiv ; 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38712044

ABSTRACT

Embeddings from protein language models (PLMs) capture intricate patterns in protein sequences, enabling more accurate and efficient prediction of protein properties. Incorporating protein structure information as direct input into PLMs improves the predictive ability of protein embeddings on downstream tasks. In this work, we demonstrate that indirectly infusing structure information into PLMs also leads to performance gains on structure-related tasks. The key difference between this framework and others is that, at inference time, the model does not require access to structure to produce its embeddings.

2.
ArXiv ; 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38745704

ABSTRACT

Deep generative models that produce novel molecular structures have the potential to facilitate chemical discovery. Diffusion models currently achieve state-of-the-art performance for 3D molecule generation. In this work, we explore the use of flow matching, a recently proposed generative modeling framework that generalizes diffusion models, for the task of de novo molecule generation. Flow matching provides flexibility in model design; however, the framework is predicated on the assumption of continuously-valued data. 3D de novo molecule generation requires jointly sampling continuous and categorical variables such as atom position and atom type. We extend the flow matching framework to categorical data by constructing flows that are constrained to exist on a continuous representation of categorical data known as the probability simplex. We call this extension SimplexFlow. We explore the use of SimplexFlow for de novo molecule generation. However, we find that, in practice, a simpler approach that makes no accommodations for the categorical nature of the data yields equivalent or superior performance. As a result of these experiments, we present FlowMol, a flow matching model for 3D de novo molecule generation that achieves improved performance over prior flow matching methods, and we raise important questions about the design of prior distributions for achieving strong performance in flow matching models. Code and trained models for reproducing this work are available at https://github.com/dunni3/FlowMol.
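
The idea of a flow that stays on the probability simplex can be illustrated with a minimal sketch. It assumes a linear conditional path between a prior sample and a one-hot atom type, which is a generic flow matching construction and not necessarily the one used in SimplexFlow or FlowMol; the function names are hypothetical. Because the path is a convex combination of two simplex points, every intermediate point remains on the simplex.

```python
import random


def sample_simplex(k, rng):
    """Draw a point uniformly from the (k-1)-simplex by normalizing exponential draws."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def one_hot(index, k):
    """One-hot vector: a vertex of the simplex standing in for a categorical atom type."""
    return [1.0 if i == index else 0.0 for i in range(k)]


def interpolate(x0, x1, t):
    """Linear conditional path x_t = (1 - t) * x0 + t * x1 (an assumed, generic choice)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Time derivative of the linear path, x1 - x0; a model would regress onto this."""
    return [b - a for a, b in zip(x0, x1)]


rng = random.Random(0)
x0 = sample_simplex(5, rng)      # prior sample over 5 hypothetical atom types
x1 = one_hot(2, 5)               # data point: atom type with index 2
xt = interpolate(x0, x1, 0.3)    # intermediate point, still on the simplex
```

The velocity components sum to zero, which is what keeps the coordinates summing to one along the path.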

3.
bioRxiv ; 2024 May 04.
Article in English | MEDLINE | ID: mdl-38746274

ABSTRACT

The explosion of sequence data has allowed the rapid growth of protein language models (pLMs). pLMs have now been employed in many frameworks including variant-effect and peptide-specificity prediction. Traditionally, for protein-protein or peptide-protein interactions (PPIs), corresponding sequences are either co-embedded followed by post-hoc integration or the sequences are concatenated prior to embedding. Interestingly, no method utilizes a language representation of the interaction itself. We developed an interaction LM (iLM), which uses a novel language to represent interactions between protein/peptide sequences. Sliding Window Interaction Grammar (SWING) leverages differences in amino acid properties to generate an interaction vocabulary. This vocabulary is the input into an LM followed by a supervised prediction step where the LM's representations are used as features. SWING was first applied to predicting peptide:MHC (pMHC) interactions. SWING was not only successful at generating Class I and Class II models that have comparable prediction to state-of-the-art approaches, but the unique Mixed Class model was also successful at jointly predicting both classes. Further, the SWING model trained only on Class I alleles was predictive for Class II, a complex prediction task not attempted by any existing approach. For de novo data, using only Class I or Class II data, SWING also accurately predicted Class II pMHC interactions in murine models of SLE (MRL/lpr model) and T1D (NOD model) that were validated experimentally. To further evaluate SWING's generalizability, we tested its ability to predict the disruption of specific protein-protein interactions by missense mutations. Although modern methods like AlphaMissense and ESM1b can predict interfaces and variant effects/pathogenicity per mutation, they are unable to predict interaction-specific disruptions. SWING was successful at accurately predicting the impact of both Mendelian mutations and population variants on PPIs. This is the first generalizable approach that can accurately predict interaction-specific disruptions by missense mutations with only sequence information. Overall, SWING is a first-in-class generalizable zero-shot iLM that learns the language of PPIs.

4.
ArXiv ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38764591

ABSTRACT

Diffusion generative models have emerged as a powerful framework for addressing problems in structural biology and structure-based drug design. These models operate directly on 3D molecular structures. Due to the unfavorable scaling of graph neural networks (GNNs) with graph size as well as the relatively slow inference speeds inherent to diffusion models, many existing molecular diffusion models rely on coarse-grained representations of protein structure to make training and inference feasible. However, such coarse-grained representations discard essential information for modeling molecular interactions and impair the quality of generated structures. In this work, we present a novel GNN-based architecture for learning latent representations of molecular structure. When trained end-to-end with a diffusion model for de novo ligand design, our model achieves comparable performance to one with an all-atom protein representation while exhibiting a 3-fold reduction in inference time.

5.
Drug Metab Dispos ; 52(2): 69-79, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-37973374

ABSTRACT

Lung cancer is the leading cause of cancer deaths worldwide. We found that the cytochrome P450 isoform CYP4F11 is significantly overexpressed in patients with lung squamous cell carcinoma. CYP4F11 is a fatty acid ω-hydroxylase and catalyzes the production of the lipid mediator 20-hydroxyeicosatetraenoic acid (20-HETE) from arachidonic acid. 20-HETE promotes cell proliferation and migration in cancer. Inhibition of 20-HETE-generating cytochrome P450 enzymes has been implicated as a novel cancer therapy for more than a decade. However, the exact role of CYP4F11 and its potential as a drug target for lung cancer therapy have not yet been established. Thus, we performed a transient knockdown of CYP4F11 in the lung cancer cell line NCI-H460. Knockdown of CYP4F11 significantly inhibits lung cancer cell proliferation and migration while the 20-HETE production is significantly reduced. For biochemical characterization of CYP4F11-inhibitor interactions, we generated recombinant human CYP4F11. Spectroscopic ligand binding assays were conducted to evaluate CYP4F11 binding to the unselective CYP4A/F inhibitor HET0016. HET0016 shows high affinity to recombinant CYP4F11 and inhibits CYP4F11-mediated 20-HETE production in vitro with a nanomolar IC50. Cross evaluation of HET0016 in NCI-H460 cells shows that lung cancer cell proliferation is significantly reduced together with 20-HETE production. However, HET0016 also displays antiproliferative effects that are not 20-HETE mediated. Future studies aim to establish the role of CYP4F11 in lung cancer and the underlying mechanism and investigate the potential of CYP4F11 as a therapeutic target for lung cancer. SIGNIFICANCE STATEMENT: Lung cancer is a deadly cancer with limited treatment options. Cytochrome P450 4F11 (CYP4F11) is significantly upregulated in lung squamous cell carcinoma. Knockdown of CYP4F11 in a lung cancer cell line significantly attenuates cell proliferation and migration with reduced production of the lipid mediator 20-hydroxyeicosatetraenoic acid (20-HETE). Studies with the unselective inhibitor HET0016 show a high inhibitory potency of CYP4F11-mediated 20-HETE production using recombinant enzyme. Overall, our studies demonstrate the potential of targeting CYP4F11 for new transformative lung cancer treatment.


Subjects
Non-Small-Cell Lung Carcinoma, Squamous Cell Carcinoma, Lung Neoplasms, Humans, Lung Neoplasms/drug therapy, Fatty Acids, Cytochrome P-450 Enzyme System/metabolism, Cytochrome P-450 CYP4A, Eicosanoids, Hydroxyeicosatetraenoic Acids/metabolism, Cytochrome P450 Family 4/genetics
6.
J Biol Chem ; 300(1): 105583, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38141770

ABSTRACT

Membrane polyphosphoinositides (PPIs) are lipid-signaling molecules that undergo metabolic turnover and influence a diverse range of cellular functions. PPIs regulate the activity and/or spatial localization of a number of actin-binding proteins (ABPs) through direct interactions; however, it is much less clear whether ABPs could also be an integral part in regulating PPI signaling. In this study, we show that ABP profilin1 (Pfn1) is an important molecular determinant of the cellular content of PI(4,5)P2 (the most abundant PPI in cells). In a growth factor (EGF) stimulation setting, Pfn1 depletion does not impact PI(4,5)P2 hydrolysis but enhances plasma membrane (PM) enrichment of PPIs that are produced downstream of activated PI3-kinase, including PI(3,4,5)P3 and PI(3,4)P2, the latter consistent with increased PM recruitment of SH2-containing inositol 5' phosphatase (SHIP2) (a key enzyme for PI(3,4)P2 biosynthesis). Although Pfn1 binds to PPIs in vitro, our data suggest that Pfn1's affinity to PPIs and PM presence in actual cells, if at all, is negligible, suggesting that Pfn1 is unlikely to directly compete with SHIP2 for binding to PM PPIs. Additionally, we provide evidence for Pfn1's interaction with SHIP2 in cells and modulation of this interaction upon EGF stimulation, raising an alternative possibility of Pfn1 binding as a potential restrictive mechanism for PM recruitment of SHIP2. In conclusion, our findings challenge the dogma of Pfn1's binding to PM by PPI interaction, uncover a previously unrecognized role of Pfn1 in PI(4,5)P2 homeostasis and provide a new mechanistic avenue of how an ABP could potentially impact PI3K signaling byproducts in cells through lipid phosphatase control.


Subjects
Phosphatidylinositols, Profilins, Epidermal Growth Factor/metabolism, Phosphatidylinositol 3-Kinases/metabolism, Phosphatidylinositol-3,4,5-Trisphosphate 5-Phosphatases/metabolism, Phosphatidylinositols/metabolism, Humans, HEK293 Cells, Profilins/metabolism
7.
J Chem Inf Model ; 64(7): 2488-2495, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38113513

ABSTRACT

Deep learning methods that predict protein-ligand binding have recently been used for structure-based virtual screening. Many such models have been trained using protein-ligand complexes with known crystal structures and activities from the PDBbind data set. However, because PDBbind only includes 20K complexes, models typically fail to generalize to new targets, and model performance is on par with models trained with only ligand information. Conversely, the ChEMBL database contains a wealth of chemical activity information but includes no information about binding poses. We introduce BigBind, a data set that maps ChEMBL activity data to proteins from the CrossDocked data set. BigBind comprises 583K ligand activities and includes 3D structures of the protein binding pockets. Additionally, we augmented the data by adding an equal number of putative inactives for each target. Using this data, we developed Banana (basic neural network for binding affinity), a neural network-based model to classify active from inactive compounds, defined by a 10 µM cutoff. Our model achieved an AUC of 0.72 on BigBind's test set, while a ligand-only model achieved an AUC of 0.59. Furthermore, Banana achieved competitive performance on the LIT-PCBA benchmark (median EF1% 1.81) while running 16,000 times faster than molecular docking with Gnina. We suggest that Banana, as well as other models trained on this data set, will significantly improve the outcomes of prospective virtual screening tasks.
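
The labeling and AUC evaluation described above can be sketched as follows. This is an illustration only, not the BigBind or Banana code: the function names are hypothetical, the example values are invented, and treating an activity exactly at 10 µM as active is an assumption the abstract does not settle.

```python
def is_active(activity_um, cutoff_um=10.0):
    """Label a compound active when its measured activity (in µM) is at or below the cutoff.

    Whether the boundary value counts as active is an assumption.
    """
    return activity_um <= cutoff_um


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random active outscores a random inactive.

    Ties between an active and an inactive count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented measurements (µM) and model scores for four compounds.
activities = [0.5, 3.0, 25.0, 80.0]
scores = [0.9, 0.4, 0.6, 0.1]
labels = [is_active(a) for a in activities]
auc = roc_auc(scores, labels)  # 3 of the 4 active/inactive pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking, which gives a reference point for the reported values of 0.72 and 0.59.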


Subjects
Proteins, Ubiquitin-Protein Ligases, Molecular Docking Simulation, Ligands, Prospective Studies, Proteins/chemistry, Protein Binding, Ubiquitin-Protein Ligases/metabolism
8.
J Comput Aided Mol Des ; 38(1): 3, 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38062207

ABSTRACT

Determination of the bound pose of a ligand is a critical first step in many in silico drug discovery tasks. Molecular docking is the main tool for the prediction of non-covalent binding of a protein and ligand system. Molecular docking pipelines often only utilize the information of one ligand binding to the protein despite the commonly held hypothesis that different ligands share binding interactions when bound to the same receptor. Here we describe Open-ComBind, an easy-to-use, open-source version of the ComBind molecular docking pipeline that leverages information from multiple ligands without known bound structures to enhance pose selection. We first create distributions of feature similarities between ligand pose pairs, comparing near-native poses with all sampled docked poses. These distributions capture the likelihood of observing similar features, such as hydrogen bonds or hydrophobic contacts, in different pose configurations. These similarity distributions are then combined with a per-ligand docking score to enhance overall pose selection by 5% and 4.5% for high-affinity and congeneric series helper ligands, respectively. Open-ComBind reduces the average RMSD of ligands in our benchmark dataset by 9.0%. We provide Open-ComBind as an easy-to-use command line and Python API to increase pose prediction performance at www.github.com/drewnutt/open_combind.


Subjects
Drug Design, Proteins, Molecular Docking Simulation, Protein Binding, Ligands, Proteins/chemistry, Binding Sites
9.
Biophys J ; 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38104241

ABSTRACT

Protein structure predictions from deep learning models like AlphaFold2, despite their remarkable accuracy, are likely insufficient for direct use in downstream tasks like molecular docking. The functionality of such models could be improved with a combination of increased accuracy and physical intuition. We propose a new method to train deep learning protein structure prediction models using molecular dynamics force fields to work toward these goals. Our custom PyTorch loss function, OpenMM-Loss, represents the potential energy of a predicted structure. OpenMM-Loss can be applied to any all-atom representation of a protein structure capable of mapping into our software package, SidechainNet. We demonstrate our method's efficacy by finetuning OpenFold. We show that subsequently predicted protein structures, both before and after a relaxation procedure, exhibit comparable accuracy while displaying lower potential energy and improved structural quality as assessed by MolProbity metrics.

10.
ACS Omega ; 8(44): 41680-41688, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37970017

ABSTRACT

The success of machine learning is, in part, due to a large volume of data available to train models. However, the amount of training data for structure-based molecular property prediction remains limited. The previously described CrossDocked2020 data set expanded the available training data for binding pose classification in a molecular docking setting but did not address expanding the amount of receptor-ligand binding affinity data. We present experiments demonstrating that imputing binding affinity labels for complexes without experimentally determined binding affinities is a viable approach to expanding training data for structure-based models of receptor-ligand binding affinity. In particular, we demonstrate that utilizing imputed labels from a convolutional neural network trained only on the affinity data present in CrossDocked2020 results in a small improvement in the binding affinity regression performance, despite the additional sources of noise that such imputed labels add to the training data. The code, data splits, and imputation labels utilized in this paper are freely available at https://github.com/francoep/ImputationPaper.

11.
J Chem Inf Model ; 63(23): 7401-7411, 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-38000780

ABSTRACT

We performed exhaustive torsion sampling on more than 3 million compounds using the GFN2-xTB method and performed a comparison of experimental crystallographic and gas-phase conformers. Many conformer sampling methods derive torsional angle distributions from experimental crystallographic data, limiting the torsion preferences to molecules that must be stable, synthetically accessible, and able to be crystallized. In this work, we evaluate the differences in torsional preferences of experimental crystallographic geometries and gas-phase computed conformers from a broad selection of compounds to determine whether torsional angle distributions obtained from semiempirical methods are suitable priors for conformer sampling. We find that differences in torsion preferences can be mostly attributed to a lack of available experimental crystallographic data with small deviations derived from gas-phase geometry differences. GFN2 demonstrates the ability to provide accurate and reliable torsional preferences that can provide a basis for new methods free from the limitations of experimental data collection. We provide Gaussian-based fits and sampling distributions suitable for torsion sampling and propose an alternative to the widely used "experimental torsion and knowledge distance geometry" (ETKDG) method using quantum torsion-derived distance geometry (QTDG) methods.

12.
J Chem Inf Model ; 63(21): 6598-6607, 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-37903507

ABSTRACT

Conformer generation, the assignment of realistic 3D coordinates to a small molecule, is fundamental to structure-based drug design. Conformational ensembles are required for rigid-body matching algorithms, such as shape-based or pharmacophore approaches, and even methods that treat the ligand flexibly, such as docking, are dependent on the quality of the provided conformations due to not sampling all degrees of freedom (e.g., only sampling torsions). Here, we empirically elucidate some general principles about the size, diversity, and quality of the conformational ensembles needed to get the best performance in common structure-based drug discovery tasks. In many cases, our findings may parallel "common knowledge" well-known to practitioners of the field. Nonetheless, we feel that it is valuable to quantify these conformational effects while reproducing and expanding upon previous studies. Specifically, we investigate the performance of a state-of-the-art generative deep learning approach versus a more classical geometry-based approach, the effect of energy minimization as a postprocessing step, the effect of ensemble size (maximum number of conformers), and construction (filtering by root-mean-square deviation for diversity) and how these choices influence the ability to recapitulate bioactive conformations and perform pharmacophore screening and molecular docking.
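
The ensemble construction step mentioned above (capping the ensemble size and filtering by root-mean-square deviation for diversity) can be sketched as a greedy filter. This is a minimal illustration, not the study's protocol: the function names are hypothetical, the conformers are assumed to be pre-aligned with matching atom order, and the threshold is arbitrary.

```python
import math


def rmsd(a, b):
    """RMSD between two conformers given as equally long lists of (x, y, z) atoms.

    No superposition is performed; the conformers are assumed to be pre-aligned.
    """
    if len(a) != len(b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def filter_diverse(conformers, threshold, max_size):
    """Greedily keep conformers at least `threshold` RMSD away from all kept ones.

    Conformers are visited in input order (for example, sorted by energy),
    and the ensemble is capped at `max_size` members.
    """
    kept = []
    for conf in conformers:
        if len(kept) >= max_size:
            break
        if all(rmsd(conf, other) >= threshold for other in kept):
            kept.append(conf)
    return kept


# Three toy two-atom conformers; c1 is nearly identical to c0 and is dropped.
c0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
c1 = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)]
c2 = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
ensemble = filter_diverse([c0, c1, c2], threshold=0.5, max_size=10)
```

Because the filter is greedy, the result depends on input order, so sorting conformers by energy first is a common choice.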


Subjects
Algorithms, Drug Design, Molecular Models, Molecular Docking Simulation, Molecular Conformation, Ligands
13.
bioRxiv ; 2023 Oct 21.
Article in English | MEDLINE | ID: mdl-37904961

ABSTRACT

We present a novel and interpretable approach for predicting small-molecule binding affinities using context explanation networks (CENs). Given the specific structure of a protein/ligand complex, our CENsible scoring function uses a deep convolutional neural network to predict the contributions of pre-calculated terms to the overall binding affinity. We show that CENsible can effectively distinguish active vs. inactive compounds for many systems. Its primary benefit over related machine-learning scoring functions, however, is that it retains interpretability, allowing researchers to identify the contribution of each pre-calculated term to the final affinity prediction, with implications for subsequent lead optimization.

14.
ArXiv ; 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37547658

ABSTRACT

Molecular docking aims to predict the 3D pose of a small molecule in a protein binding site. Traditional docking methods predict ligand poses by minimizing a physics-inspired scoring function. Recently, a diffusion model has been proposed that iteratively refines a ligand pose. We combine these two approaches by training a pose scoring function in a diffusion-inspired manner. In our method, PLANTAIN, a neural network is used to develop a very fast pose scoring function. We parameterize a simple scoring function on the fly and use L-BFGS minimization to optimize an initially random ligand pose. Using rigorous benchmarking practices, we demonstrate that our method achieves state-of-the-art performance while running ten times faster than the next-best method. We release PLANTAIN publicly and hope that it improves the utility of virtual screening workflows.

15.
J Chem Inf Model ; 62(8): 1819-1829, 2022 Apr 25.
Article in English | MEDLINE | ID: mdl-35380443

ABSTRACT

The lead optimization phase of drug discovery refines an initial hit molecule for desired properties, especially potency. Synthesis and experimental testing of the small perturbations during this refinement can be quite costly and time-consuming. Relative binding free energy (RBFE, also referred to as ΔΔG) methods allow the estimation of binding free energy changes after small changes to a ligand scaffold. Here, we propose and evaluate a Siamese convolutional neural network (CNN) for the prediction of RBFE between two bound ligands. We show that our multitask loss is able to improve on a previous state-of-the-art Siamese network for RBFE prediction via increased regularization of the latent space. The Siamese network architecture is well suited to the prediction of RBFE in comparison to a standard CNN trained on the same data (Pearson's R of 0.553 and 0.5, respectively). When evaluated on a left-out protein family, our Siamese CNN shows variability in its RBFE predictive performance depending on the protein family being evaluated (Pearson's R ranging from -0.44 to 0.97). RBFE prediction performance can be improved during generalization by injecting only a few examples (few-shot learning) from the evaluation data set during model training.
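
The quantities in this abstract can be made concrete with a short sketch. It assumes the common sign convention ΔΔG = ΔG_B - ΔG_A for a change from ligand A to ligand B, which the abstract does not state, and it computes Pearson's R the standard way; the function names and example values are hypothetical and unrelated to the paper's data.

```python
import math


def relative_binding_free_energy(dg_a, dg_b):
    """ΔΔG for changing ligand A into ligand B, assuming ΔΔG = ΔG_B - ΔG_A."""
    return dg_b - dg_a


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented binding free energies (kcal/mol) for a reference ligand and two analogs.
dg_ref, dg_1, dg_2 = -8.0, -9.5, -7.0
experimental = [
    relative_binding_free_energy(dg_ref, dg_1),
    relative_binding_free_energy(dg_ref, dg_2),
    relative_binding_free_energy(dg_1, dg_2),
]
predicted = [-1.0, 0.5, 2.0]  # invented model outputs for the same three pairs
r = pearson_r(predicted, experimental)
```

Swapping the two ligands flips the sign of ΔΔG, an antisymmetry that a paired (Siamese) architecture is naturally suited to respect.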


Subjects
Computer Neural Networks, Proteins, Drug Discovery, Entropy, Ligands, Proteins/chemistry
16.
Chem Sci ; 13(9): 2701-2713, 2022 Mar 02.
Article in English | MEDLINE | ID: mdl-35356675

ABSTRACT

The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic properties, but has not yet been applied to generating 3D molecules predicted to bind to proteins by sampling the conditional distribution of protein-ligand binding interactions. In this work, we describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site. We approach the problem using a conditional variational autoencoder trained on an atomic density grid representation of cross-docked protein-ligand structures. We apply atom fitting and bond inference procedures to construct valid molecular conformations from generated atomic densities. We evaluate the properties of the generated molecules and demonstrate that they change significantly when conditioned on mutated receptors. We also explore the latent space learned by our generative model using sampling and interpolation techniques. This work opens the door for end-to-end prediction of stable bioactive molecules from protein structures with deep learning.

17.
Molecules ; 26(23), 2021 Dec 04.
Article in English | MEDLINE | ID: mdl-34885952

ABSTRACT

Virtual screening, that is, predicting which compounds within a specified compound library bind to a target molecule (typically a protein), is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when not used directly to train models, and this bias obfuscates to what extent machine learning models are achieving their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative simplistic property distributions.
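
The 1% early enrichment factor cited above can be sketched as follows. This uses one common definition (the active rate in the top-scoring fraction divided by the active rate in the whole library); the exact definition and tie handling used in the paper are not given in the abstract, and the function name and toy library are hypothetical.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction of a score-ranked library.

    EF = (actives in top fraction / size of top fraction) / (all actives / library size).
    Higher scores are assumed to mean stronger predicted binding.
    """
    n = len(scores)
    n_actives = sum(1 for y in labels if y)
    if n == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, y in ranked[:n_top] if y)
    return (hits / n_top) / (n_actives / n)


# Toy library: 200 compounds, 10 actives, with the actives given the highest scores.
labels = [True] * 10 + [False] * 190
scores = [200.0 - i for i in range(200)]
ef1 = enrichment_factor(scores, labels, fraction=0.01)
```

A value of 1 corresponds to random selection, and the maximum is bounded by the active rate in the library, which is worth keeping in mind when comparing benchmarks with different compositions.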


Subjects
Drug Design, Drug Discovery, Software, Deep Learning, Drug Design/methods, Drug Discovery/methods, Humans, Molecular Docking Simulation
18.
Exp Eye Res ; 213: 108861, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34822853

ABSTRACT

Aberrant angiogenesis lies at the heart of a wide range of ocular pathologies such as proliferative diabetic retinopathy, wet age-related macular degeneration and retinopathy of prematurity. This study explores the anti-angiogenic activity of a novel small molecule investigative compound capable of inhibiting profilin1-actin interaction recently identified by our group. We demonstrate that our compound is capable of inhibiting migration, proliferation and angiogenic activity of microvascular endothelial cells in vitro as well as choroidal neovascularization (CNV) ex vivo. In a mouse model of laser-injury-induced CNV, intravitreal administration of this compound diminishes sub-retinal neovascularization. Finally, our preliminary structure-activity relationship (SAR) study demonstrates that this small molecule compound is amenable to improvement in biological activity through structural modifications.


Subjects
Angiogenesis Inhibitors/therapeutic use, Choroidal Neovascularization/drug therapy, Retinal Neovascularization/drug therapy, Actins/antagonists & inhibitors, Animals, Cell Line, Cell Movement/drug effects, Cell Proliferation/drug effects, Choroidal Neovascularization/metabolism, Animal Disease Models, Endothelial Cells/drug effects, Humans, Intravitreal Injections, Mice, Inbred C57BL Mice, Profilins/antagonists & inhibitors, Retinal Neovascularization/metabolism, Retinal Vessels/cytology, Vascular Endothelial Growth Factor A/antagonists & inhibitors, Wet Macular Degeneration/drug therapy, Wet Macular Degeneration/metabolism
19.
Chem Sci ; 12(23): 8036-8047, 2021 May 08.
Article in English | MEDLINE | ID: mdl-34194693

ABSTRACT

Machine learning has been increasingly applied to the field of computer-aided drug discovery in recent years, leading to notable advances in binding-affinity prediction, virtual screening, and QSAR. Surprisingly, it is less often applied to lead optimization, the process of identifying chemical fragments that might be added to a known ligand to improve its binding affinity. We here describe a deep convolutional neural network that predicts appropriate fragments given the structure of a receptor/ligand complex. In an independent benchmark of known ligands with missing (deleted) fragments, our DeepFrag model selected the known (correct) fragment from a set of over 6500 about 58% of the time. Even when the known/correct fragment was not selected, the top fragment was often chemically similar and may well represent a valid substitution. We release our trained DeepFrag model and associated software under the terms of the Apache License, Version 2.0.

20.
Proteins ; 89(11): 1489-1496, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34213059

ABSTRACT

Despite recent advancements in deep learning methods for protein structure prediction and representation, little focus has been directed at the simultaneous inclusion and prediction of protein backbone and sidechain structure information. We present SidechainNet, a new dataset that directly extends the ProteinNet dataset. SidechainNet includes angle and atomic coordinate information capable of describing all heavy atoms of each protein structure and can be extended by users to include new protein structures as they are released. In this article, we provide background information on the availability of protein structure data and the significance of ProteinNet. Thereafter, we argue for the potentially beneficial inclusion of sidechain information through SidechainNet, describe the process by which we organize SidechainNet, and provide a software package (https://github.com/jonathanking/sidechainnet) for data manipulation and training with machine learning models.


Subjects
Amino Acids/chemistry, Machine Learning, Proteins/chemistry, Software, Amino Acid Sequence, Datasets as Topic, Computer Neural Networks, Protein Conformation