ABSTRACT
Protein-protein interactions (PPIs) are a major component of the cellular biochemical reaction network. Rich sequence information and machine learning techniques reduce the dependence of PPI discovery on wet-lab experiments, which are costly and time-consuming. This paper proposes a PPI prediction model, multi-scale architecture residual network for PPIs (MARPPI), built on a dual-channel architecture and multiple feature types. The multi-feature component uses Res2vec to capture association information between residues, and uses pseudo amino acid composition, autocorrelation descriptors and multivariate mutual information to capture amino acid composition and order information, physicochemical properties and information entropy, respectively. The dual-channel component uses a ResNet improved with a multi-scale architecture to extract protein sequence features while reducing protein feature loss. Compared with other advanced methods, MARPPI achieves 96.03%, 99.01% and 91.80% accuracy on the intraspecific datasets of Saccharomyces cerevisiae, Human and Helicobacter pylori, respectively. The accuracy on the two interspecific datasets of Human-Bacillus anthracis and Human-Yersinia pestis is 97.29% and 95.30%, respectively. In addition, results on disease-specific datasets (neurodegenerative and metabolic disorders) demonstrate the ability to detect hidden interactions. Evaluations on independent datasets and PPI networks further suggest that MARPPI can be used to predict cross-species interactions. These results indicate that MARPPI is a concise, efficient and accurate tool for PPI prediction.
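To make the idea of an autocorrelation descriptor concrete, the sketch below computes a normalized Moreau-Broto autocorrelation over one physicochemical scale. It is an illustration only: the abstract does not say which autocorrelation descriptors or property scales MARPPI uses, so the choice of the Kyte-Doolittle hydropathy index and the function names (`standardize`, `moreau_broto`) are assumptions.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index, chosen here only as an example property scale.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Center and scale a property table over the 20 amino acids."""
    mu = mean(scale.values())
    sigma = pstdev(scale.values())
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def moreau_broto(seq, scale, max_lag):
    """Return one autocorrelation value per lag d = 1..max_lag.

    Each value is the average product of the standardized property at
    positions i and i + d along the sequence.
    """
    if max_lag >= len(seq):
        raise ValueError("max_lag must be smaller than the sequence length")
    z = standardize(scale)
    p = [z[aa] for aa in seq]
    n = len(p)
    return [
        sum(p[i] * p[i + d] for i in range(n - d)) / (n - d)
        for d in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    print(moreau_broto("MKVLAAGIVG", KD_HYDROPATHY, max_lag=3))
```

The output has a fixed length (`max_lag`) regardless of sequence length, which is what lets descriptors of this kind feed a fixed-size network input.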
Subjects
Computational Biology, Protein Interaction Mapping, Humans, Protein Interaction Mapping/methods, Computational Biology/methods, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Protein Interaction Maps, Amino Acids/metabolism
ABSTRACT
MOTIVATION: Protein-protein interaction sites (PPIS) are crucial for deciphering protein action mechanisms and for related medical research, and their identification is a key issue in protein action research. Recent studies have shown that graph neural networks achieve outstanding performance in predicting PPIS. However, these studies often neglect the modeling of information at different scales in the graph and the symmetry of protein molecules in three-dimensional space. RESULTS: To address this gap, this article proposes MEG-PPIS, a PPIS prediction method based on multi-scale graph information and an E(n) equivariant graph neural network (EGNN). MEG-PPIS has two channels: the original graph and the subgraph obtained by graph pooling. The model iteratively updates the features of the original graph and the subgraph through a weight-sharing EGNN. A max-pooling operation then aggregates the updated features of the original graph and the subgraph. Finally, the model feeds the node features into the prediction layer to obtain the prediction results. Comparative assessments against other methods on benchmark datasets show that MEG-PPIS achieves the best performance across all evaluation metrics and has the fastest runtime. Furthermore, specific case studies demonstrate that our method predicts more true positive and true negative sites than the current best method, indicating that our model achieves better performance in the PPIS prediction task. AVAILABILITY AND IMPLEMENTATION: The data and code are available at https://github.com/dhz234/MEG-PPIS.git.
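The following is a minimal NumPy sketch of a single E(n)-equivariant message-passing layer, to show what "equivariant" means in practice: rotating or translating the input coordinates rotates or translates the output coordinates and leaves the node features unchanged. It is not the MEG-PPIS implementation. The class name `EGNNLayer`, the tiny fixed-weight perceptrons, the fully connected graph and the `1 / (n - 1)` scaling are all assumptions made for illustration.

```python
import numpy as np


def mlp(rng, n_in, n_out):
    """Return a one-layer perceptron with fixed random weights (illustrative)."""
    w = rng.normal(scale=0.1, size=(n_in, n_out))
    return lambda x: np.tanh(x @ w)


class EGNNLayer:
    """One equivariant update over a fully connected graph (assumed topology)."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.phi_e = mlp(rng, 2 * dim + 1, dim)  # edge messages
        self.phi_x = mlp(rng, dim, 1)            # scalar weight per edge
        self.phi_h = mlp(rng, 2 * dim, dim)      # node feature update

    def __call__(self, h, x):
        n, dim = h.shape
        diff = x[:, None, :] - x[None, :, :]          # (n, n, 3): x_i - x_j
        dist2 = (diff ** 2).sum(-1, keepdims=True)    # invariant squared distances
        h_i = np.broadcast_to(h[:, None, :], (n, n, dim))
        h_j = np.broadcast_to(h[None, :, :], (n, n, dim))
        mask = 1.0 - np.eye(n)[:, :, None]            # drop self-edges
        m = self.phi_e(np.concatenate([h_i, h_j, dist2], axis=-1)) * mask
        # Coordinates move along relative vectors, scaled by invariant weights.
        x_new = x + (diff * self.phi_x(m) * mask).sum(axis=1) / (n - 1)
        # Features are updated from aggregated messages (residual form).
        h_new = h + self.phi_h(np.concatenate([h, m.sum(axis=1)], axis=-1))
        return h_new, x_new


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h, x = rng.normal(size=(5, 8)), rng.normal(size=(5, 3))
    layer = EGNNLayer(dim=8)
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
    t = rng.normal(size=3)
    h1, x1 = layer(h, x)
    h2, x2 = layer(h, x @ q.T + t)
    print(np.allclose(h1, h2), np.allclose(x2, x1 @ q.T + t))
```

Because the weights live in the layer object, applying the same `EGNNLayer` instance to both an original graph and a pooled subgraph is one simple way to realize weight sharing between two channels.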
Subjects
Neural Networks, Computer, Protein Interaction Mapping, Protein Interaction Mapping/methods, Proteins/metabolism, Proteins/chemistry, Algorithms, Databases, Protein, Computational Biology/methods, Protein Interaction Maps
ABSTRACT
Deep learning is rapidly improving and changing the process of de novo molecular design. In recent years, great progress has been made in drug discovery and development by using deep generative models for de novo molecular design. However, most existing methods are string-based or graph-based and are limited by the lack of some very important properties, such as the three-dimensional information of molecules. We propose DNMG, a deep generative adversarial network (GAN) combined with transfer learning. Specifically, we use a network architecture based on a Wasserstein GAN variant that considers the 3D grid spatial information of the ligand, together with atomic physicochemical properties, to generate a representation of the molecule, which is then parsed into SMILES strings using an improved captioning network. Comprehensive experiments demonstrate the ability of DNMG to generate valid and novel drug-like ligands. The DNMG model is used to design inhibitors for three targets: MK14, FNTA, and CDK2. The computational results show that the molecules generated by DNMG have better binding ability to the target proteins and better physicochemical properties. Overall, our deep generative model has excellent potential to generate molecules with high binding affinity for targets and to explore the space of drug-like chemistry.
Subjects
Drug Design, Drug Discovery, Models, Molecular, Drug Discovery/methods, Ligands, Proteins
ABSTRACT
Recent years have seen tremendous success in the design of novel drug molecules through deep generative models. Nevertheless, existing methods only generate drug-like molecules, which require additional structural optimization to be developed into actual drugs. In this study, a deep learning method for generating target-specific ligands was proposed. This method is useful when the dataset for target-specific ligands is limited. Deep learning methods can extract and learn features (representations) in a data-driven way with little or no human participation. Generative pretraining (GPT) was used to extract the contextual features of the molecule. Three different protein-encoding methods were used to extract the physicochemical properties and amino acid information of the target protein. Protein-encoding and molecular sequence information are combined to guide molecule generation. Transfer learning was used to fine-tune the pretrained model to generate molecules with better binding ability to the target protein. The model was validated using three different targets. The docking results show that our model is capable of generating new molecules with higher docking scores for the target proteins.
Subjects
Drug Design, Proteins, Molecular Structure, Proteins/chemistry, Amino Acids, Ligands, Machine Learning
ABSTRACT
BACKGROUND: Protein-protein interactions (PPIs) govern how intracellular molecules perform a series of tasks such as transcriptional regulation, information transduction, and drug signalling. The traditional wet-lab approach to obtaining PPI information is costly and time-consuming. RESULT: In this paper, SDNN-PPI, a PPI prediction method based on self-attention and deep learning, is proposed. The method adopts amino acid composition (AAC), conjoint triad (CT), and auto covariance (AC) to extract global and local features of protein sequences, and leverages self-attention to enhance DNN feature extraction so that PPIs are predicted more effectively. To verify the generalization ability of SDNN-PPI, 5-fold cross-validation is performed on the intraspecific interaction datasets of Saccharomyces cerevisiae (core subset) and human, where the accuracy reaches 95.48% and 98.94%, respectively. Accuracies of 93.15% and 88.33% are obtained on the interspecific interaction datasets of Human-Bacillus anthracis and Human-Yersinia pestis, respectively. On the independent datasets of Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the prediction accuracy is 100% in every case, which is higher than that of previous PPI prediction methods. To further evaluate the strengths and weaknesses of the model, PPIs are predicted on a one-core network and a crossover network, and the data show that the model correctly predicts the interaction pairs in these networks. CONCLUSION: In this paper, the AAC, CT and AC methods are used to encode the sequence, and the SDNN-PPI method, based on a self-attention deep learning neural network, is proposed to predict PPIs. Satisfactory results are obtained on interspecific and intraspecific datasets, and good performance is also achieved in cross-species prediction. It also correctly predicts the protein interactions related to cell and tumor information contained in the one-core and crossover networks. The SDNN-PPI proposed in this paper not only explores the mechanism of protein-protein interaction, but also provides new ideas for drug design and disease prevention.
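Two of the encodings named above can be sketched in a few lines of Python. The block below shows amino acid composition (a 20-dimensional frequency vector) and a conjoint triad descriptor (a 343-dimensional vector over triples of seven residue classes). The seven-class grouping is the one commonly used for conjoint triad features; the plain frequency normalization, the function names and the way the two proteins are concatenated in `encode_pair` are assumptions, not details taken from SDNN-PPI.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Seven residue classes commonly used for conjoint triad features.
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: k for k, group in enumerate(CT_CLASSES) for aa in group}


def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def conjoint_triad(seq):
    """Frequency of each class triple over all overlapping windows of length 3."""
    classes = [AA_TO_CLASS[aa] for aa in seq]
    counts = {triad: 0 for triad in product(range(7), repeat=3)}
    for i in range(len(classes) - 2):
        counts[tuple(classes[i:i + 3])] += 1
    total = max(len(classes) - 2, 1)  # avoid division by zero on short input
    return [counts[triad] / total for triad in sorted(counts)]


def encode_pair(seq_a, seq_b):
    """Concatenate the features of both proteins into one input vector."""
    return aac(seq_a) + conjoint_triad(seq_a) + aac(seq_b) + conjoint_triad(seq_b)


if __name__ == "__main__":
    vec = encode_pair("MKVLAAGIVGLC", "MSTNPKPQRKTK")
    print(len(vec))  # 2 * (20 + 343)
```

AAC captures global composition while the triad counts capture short-range order, which is one way to read the abstract's "global and local features".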
Subjects
Neural Networks, Computer, Protein Interaction Mapping, Amino Acid Sequence, Animals, Computational Biology/methods, Humans, Mice, Protein Interaction Mapping/methods, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism
ABSTRACT
BACKGROUND: Protein-protein interaction (PPI) is very important for many biochemical processes. Accurate prediction of PPI can therefore help us better understand the role of proteins in these processes. Although there are many methods to predict PPI in biology, they are time-consuming and lack accuracy, so it is necessary to build an efficient and accurate computational model for PPI prediction. RESULTS: We present a novel sequence-based computational approach called DCSE (Double-Channel-Siamese-Ensemble) to predict potential PPI. In the encoding layer, we treat each amino acid as a word and map it into an N-dimensional vector. In the feature extraction layer, we extract features from local and global perspectives using a Multilayer Convolutional Neural Network (MCN) and a Multilayer Bidirectional Gated Recurrent Unit with Convolutional Neural Networks (MBC). Finally, the output of the feature extraction layer is fed into the prediction layer, which outputs whether the input protein pair will interact with each other. MCN and MBC are siamese- and ensemble-based networks, which can effectively improve the performance of the model. To demonstrate our model's performance, we compare it with four machine learning based and three deep learning based models. The results show that our method outperforms the other models on all evaluation criteria. The Accuracy, Precision, [Formula: see text], Recall and MCC of our model are 0.9303, 0.9091, 0.9268, 0.9452 and 0.8609, respectively. For the other seven models, the highest Accuracy, Precision, [Formula: see text], Recall and MCC are 0.9288, 0.9243, 0.9246, 0.9250 and 0.8572, respectively. We also test our model on an imbalanced dataset and transfer it to another species. The results show that our model also performs well in these settings. CONCLUSION: Our model achieves the best performance in a comparison with seven other models. The NLP-based coding method has a good effect on the PPI prediction task. MCN and MBC extract protein sequence features from local and global perspectives, and these two feature extraction layers are based on siamese and ensemble network structures. The siamese-based network structure keeps the features consistent, and the ensemble-based network structure effectively improves the accuracy of the model.
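For readers less familiar with the evaluation criteria listed above, the sketch below computes accuracy, precision, recall, F1 and the Matthews correlation coefficient (MCC) from the four cells of a binary confusion matrix. The function name `binary_metrics` and the example counts are illustrative assumptions; they are not data from the DCSE study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in binary_metrics(tp=90, fp=10, tn=80, fn=20).items():
        print(f"{name}: {value:.4f}")
```

As a consistency check on the reported figures, a precision of 0.9091 and a recall of 0.9452 give 2PR/(P+R) ≈ 0.9268, which matches the third value reported for the model.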
Subjects
Neural Networks, Computer, Proteins, Amino Acid Sequence, Machine Learning, Proteins/metabolism
ABSTRACT
The prediction of protein-protein interaction sites (PPI sites) is very important for the study of biochemical processes, and many computational methods have been proposed in the past. However, the majority of these methods are time-consuming and lack accuracy. Hence, an effective computational method is needed. In this article, we present a novel computational model called RGN (residue-based graph attention and convolutional network) to predict PPI sites. In our paper, the protein is treated as a graph in which each amino acid is a node. The position-specific scoring matrix, hidden Markov model, hydrogen bond estimation algorithm, and ProtBert are applied as node features. The edges are determined by the spatial distance between the amino acids. We then use a residue-based graph convolutional network and a graph attention network to extract deeper features. Finally, the processed node features are fed into the prediction layer. We show the superiority of our model by comparing it with four other protein structure-based methods and five protein sequence-based methods. Our model obtains the best performance on all the evaluation metrics (accuracy, precision, recall, F1 score, Matthews correlation coefficient, area under the receiver operating characteristic curve, and area under the precision-recall curve). We also conduct a case study which demonstrates that extracting protein information from the structural perspective is effective and which points out the difficult aspects of PPI site prediction.
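The graph construction described above (residues as nodes, edges decided by spatial distance) can be sketched with NumPy as follows. The abstract does not state the distance cutoff or which atom represents each residue, so the 10 Å default, the one-point-per-residue input and the function names are assumptions. The second function applies the symmetric normalization commonly used before a graph convolution step.

```python
import numpy as np


def residue_graph(coords, cutoff=10.0):
    """Boolean adjacency matrix: residues closer than `cutoff` are connected.

    coords: (n, 3) array with one representative point per residue.
    cutoff: distance threshold in the same unit as coords (assumed value).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = dist <= cutoff
    np.fill_diagonal(adj, False)  # no self-edges at this stage
    return adj


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by graph convolutions."""
    a = adj.astype(float) + np.eye(len(adj))
    d = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d[:, None] * d[None, :]


if __name__ == "__main__":
    coords = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
    adj = residue_graph(coords)
    print(adj.astype(int))
    print(normalized_adjacency(adj))
```

A single graph convolution step would then multiply this normalized matrix by the node feature matrix and a learned weight matrix.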
Subjects
Algorithms, Neural Networks, Computer, Proteins/chemistry, Amino Acids/chemistry, ROC Curve
ABSTRACT
Molecular toxicity prediction plays an important role in drug discovery and is directly related to human health and drug fate. Accurately determining the toxicity of molecules can help weed out low-quality molecules in the early stage of the drug discovery process and avoid depletion later in the drug development process. More and more researchers are using machine learning methods to predict the toxicity of molecules, but these models do not fully exploit the 3D information of molecules. Quantum chemical information, which provides stereo structural information of molecules, can influence their toxicity. To this end, we propose QuantumTox, which, compared with existing work, is the first application of quantum chemistry to drug molecule toxicity prediction. We extract the quantum chemical information of molecules as their 3D features. In the downstream prediction phase, we use Gradient Boosting Decision Tree and Bagging ensemble learning methods together to improve the accuracy and generalization of the model. A series of experiments on various tasks shows that our model consistently outperforms the baseline model and still performs well on small datasets of fewer than 300.
Subjects
Algorithms, Machine Learning, Humans, Drug Discovery/methods
ABSTRACT
Background: Biomedical named entity recognition (BioNER) is one of the important tasks of biomedical literature mining. With the development of natural language processing technology, many deep learning models are used to extract valuable information from the biomedical literature, which promotes the development of effective BioNER models. However, for specialized domains with diverse and complex contexts and a richer set of semantically related entity types (e.g., drug molecules, targets, and pathways in the biomedical domain), whether the dependencies among these drugs, diseases, and targets can be helpful still needs to be explored. Method: To provide additional dependency information beyond context, a method named MKGAT, based on the graph attention network and the BERT pre-training model, is proposed to improve BioNER performance in the biomedical domain. To enhance BioNER with external dependency knowledge, we integrate BERT-processed text embeddings and entity dependencies to construct better entity embedding representations for biomedical named entity recognition. Results: The proposed method obtains competitive accuracy and higher efficiency than the state-of-the-art method on three datasets, namely, NCBI-disease corpus, BC2GM, and BC5CDR-chem, with precision of 90.71%, 88.19%, and 95.71%, recall of 92.52%, 88.05%, and 95.62%, and F1-scores of 91.61%, 88.12%, and 95.66%, respectively. Conclusion: Drug, disease, and protein dependencies allow entities to be better represented in neural networks, thereby improving the performance of BioNER.
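The three reported F1-scores can be checked against the reported precision and recall, since F1 is their harmonic mean. The worked arithmetic below uses only the figures stated in the abstract; the symbols P and R for precision and recall are notation introduced here.

```latex
F_1 = \frac{2PR}{P + R}

\text{NCBI-disease:}\quad \frac{2 \times 90.71 \times 92.52}{90.71 + 92.52} = \frac{16784.98}{183.23} \approx 91.61

\text{BC2GM:}\quad \frac{2 \times 88.19 \times 88.05}{88.19 + 88.05} = \frac{15530.26}{176.24} \approx 88.12

\text{BC5CDR-chem:}\quad \frac{2 \times 95.71 \times 95.62}{95.71 + 95.62} = \frac{18303.58}{191.33} \approx 95.66
```

All three values agree with the reported F1-scores to two decimal places.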