1.
Methods ; 223: 75-82, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38286333

ABSTRACT

The accurate identification of drug-protein interactions (DPIs) is crucial in drug development, especially concerning G protein-coupled receptors (GPCRs), which are vital targets in drug discovery. However, experimental validation of GPCR-drug pairings is costly, prompting the need for accurate predictive methods. To address this, we propose MFD-GDrug, a multimodal deep learning model. Leveraging the ESM pretrained model, we extract protein features and employ a CNN for protein feature representation. For drugs, we integrated multimodal features of drug molecular structures, including three-dimensional features derived from Mol2vec and the topological information of drug graph structures extracted through Graph Convolutional Neural Networks (GCN). By combining structural characterizations and pretrained embeddings, our model effectively captures GPCR-drug interactions. Our tests on leading GPCR-drug interaction datasets show that MFD-GDrug outperforms other methods, demonstrating superior predictive accuracy.


Subjects
Deep Learning , Drug Interactions , Drug Development , Drug Discovery , Neural Networks, Computer
2.
Comput Biol Med ; 167: 107618, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37925912

ABSTRACT

Protein sequence classification is a crucial research field in bioinformatics: it facilitates functional annotation and structure prediction and deepens our understanding of protein function and interactions. With the rapid development of high-throughput sequencing technologies, a vast amount of unknown protein sequence data is being generated and accumulated, increasing the demand for protein classification and annotation. Existing machine learning methods still have limitations in protein sequence classification, such as low accuracy and precision, which make them less valuable in practical applications. These models also often lack strong generalization capabilities and cannot be widely applied to various types of proteins. Accurately classifying and predicting proteins therefore remains a challenging task. In this study, we propose a protein sequence classifier called Multi-Laplacian Regularized Random Vector Functional Link (MLapRVFL). By incorporating Multi-Laplacian and L2,1-norm regularization terms into the basic Random Vector Functional Link (RVFL) method, we improve the model's generalization performance and enhance its robustness and accuracy. Experimental results on two commonly used datasets show that MLapRVFL outperforms popular machine learning methods and achieves better predictive performance than previous studies. In conclusion, the proposed MLapRVFL method makes significant contributions to protein sequence prediction.


Subjects
Machine Learning , Proteins , Amino Acid Sequence , Proteins/genetics , Algorithms
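
For orientation, the building blocks named in the abstract of entry 2 can be written out as follows. This is only a sketch of the standard definitions. The abstract does not give the exact MLapRVFL objective, so the way the Laplacians are combined (the weights \mu_k) and all symbol names are assumptions made here for illustration.

```latex
% Basic RVFL with ridge regularization (standard form).
% X: N x d inputs, Y: N x c targets; W and b are random and kept fixed.
H = [\, X \;\; g(XW + \mathbf{1}b^{\top}) \,], \qquad
\hat{\beta} = \arg\min_{\beta} \; \|H\beta - Y\|_F^2 + \lambda \|\beta\|_F^2
            = (H^{\top}H + \lambda I)^{-1} H^{\top} Y

% Generic definitions of the two regularizers mentioned in the abstract.
% The weighted sum of graph Laplacians L_k is an assumption, not the paper's formula.
\|\beta\|_{2,1} = \sum_{i} \Big( \sum_{j} \beta_{ij}^{2} \Big)^{1/2}, \qquad
\Omega_{\mathrm{Lap}}(\beta) = \operatorname{tr}\!\big( (H\beta)^{\top} L \,(H\beta) \big), \quad
L = \sum_{k} \mu_k L_k
```

The direct link (the X block inside H) is what distinguishes RVFL from a plain random-feature network. The L2,1 norm sums the Euclidean norms of the rows of \beta, which tends to drive entire rows to zero.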
3.
Comb Chem High Throughput Screen ; 26(8): 1609-1617, 2023.
Article in English | MEDLINE | ID: mdl-36654466

ABSTRACT

BACKGROUND: The cost of synthetic DNA has limited applications in frontier science and technology fields such as synthetic biology, DNA storage, and DNA chips. OBJECTIVE: The objective of this study is to find an algorithm-optimized scheme for the in situ synthesis of DNA microarrays that can reduce the cost of DNA synthesis. METHODS: Based on the characteristics of in situ chemical synthesis of DNA microarrays, an optimization algorithm was proposed. Through data grading, sequences sharing the same base at as many different features as possible were synthesized in parallel to reduce the number of synthetic cycles. RESULTS AND DISCUSSION: Simulations with 10 and 100 randomly selected sequences showed that the reduction in the number of synthetic cycles was largest when level=2, at 40% and 32.5%, respectively. The algorithm-optimized scheme was then applied to the electrochemical synthesis of 12,000 sequences required for DNA storage. Compared with the 508 cycles required by the conventional synthesis scheme, the algorithm-optimized scheme required only 342 cycles, a reduction of 32.7%. Eliminating these 166 cycles shortened the total synthesis time by approximately 11 hours. CONCLUSIONS: The algorithm-optimized synthesis scheme not only reduces the synthesis time of DNA microarrays and improves synthesis efficiency but, more importantly, can also reduce the cost of DNA synthesis by nearly 1/3. In addition, it is compatible with various in situ synthesis methods for DNA microarrays, including soft lithography, photolithography, a photoresist layer, electrochemistry, and photoelectrochemistry. Therefore, it has very important application value.


Subjects
DNA , Oligonucleotide Array Sequence Analysis/methods , DNA/genetics
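
The cycle counts reported in entry 3 can be checked with a few lines of Python, and the general idea of sharing a synthesis cycle among all sequences that need the same base can be illustrated with a simple greedy heuristic. The sketch below is not the paper's data-grading algorithm. The function names are hypothetical, and the conventional baseline of four cycles (A, C, G, T) per sequence position is an assumption made here.

```python
from collections import Counter


def reduction_percent(before, after):
    """Percentage reduction in the number of synthesis cycles."""
    return 100.0 * (before - after) / before


def conventional_cycles(seqs):
    """Assumed baseline: one cycle per base (A, C, G, T) for every position."""
    return 4 * max(len(s) for s in seqs)


def greedy_cycles(seqs):
    """Illustrative greedy heuristic (not the paper's method).

    In each cycle, deliver the base that the largest number of unfinished
    sequences need next; every sequence waiting for that base advances.
    """
    pos = [0] * len(seqs)  # next position to synthesize in each sequence
    cycles = 0
    while True:
        needed = Counter(s[p] for s, p in zip(seqs, pos) if p < len(s))
        if not needed:
            return cycles
        # sorted() makes tie-breaking deterministic (alphabetical order)
        base = max(sorted(needed), key=needed.get)
        pos = [p + 1 if p < len(s) and s[p] == base else p
               for s, p in zip(seqs, pos)]
        cycles += 1


if __name__ == "__main__":
    # Figures from the abstract: 508 conventional cycles vs. 342 optimized.
    print(508 - 342)                             # 166
    print(round(reduction_percent(508, 342), 1))  # 32.7
```

The greedy choice is only one way to share cycles. It makes no claim to reproduce the 40% and 32.5% reductions reported for the simulated sequence sets.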
4.
Front Genet ; 13: 935717, 2022.
Article in English | MEDLINE | ID: mdl-36506312

ABSTRACT

SNARE proteins are of great importance, and their loss of function can lead to a variety of diseases. SNAREs are membrane fusion proteins that are crucial for mediating vesicle fusion, so an accurate method for identifying them is needed. Through extensive experiments, we have developed a binary classification model based on the graph-regularized k-local hyperplane distance nearest neighbor (GHKNN) algorithm. The model uses a physicochemical property extraction method to extract protein sequence features and the SMOTE method to upsample them. This combination achieves the most accurate performance for identifying all protein sequences. Finally, we compare the GHKNN-based model with other classifiers using four metrics: SN, SP, ACC, and MCC. In experiments, the model performs significantly better than the other classifiers.
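
The four metrics named in entry 4 are conventionally read as sensitivity (SN), specificity (SP), accuracy (ACC), and the Matthews correlation coefficient (MCC). As a minimal sketch, they can be computed from the counts of a binary confusion matrix as shown below. The function name and the example counts are illustrative and do not come from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute SN, SP, ACC and MCC from binary confusion-matrix counts."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0   # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
```

MCC is often reported alongside ACC for imbalanced data because it uses all four cells of the confusion matrix. That is relevant here because the abstract mentions SMOTE upsampling.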

5.
Article in English | MEDLINE | ID: mdl-32671038

ABSTRACT

The G protein-coupled receptor (GPCR) family consists of more than 800 different members. In this article, we use the physicochemical Composition, Transition, Distribution (CTD) descriptors to represent GPCRs. The MRMD2.0 dimensionality reduction method filters out redundant physicochemical features. The resulting coordinates are plotted with Matplotlib to distinguish GPCRs from other protein sequences. The plots show a clear separation, with a well-defined boundary between the two classes. The experimental results show that our method can predict GPCRs.
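
To make the CTD representation in entry 5 concrete, the sketch below computes the Composition and Transition parts for a single property. It assumes the commonly used three-class hydrophobicity grouping. The abstract does not list which properties or groupings were used, so the grouping, the function names, and the omission of the Distribution part are all choices made here for illustration.

```python
# Assumed three-class hydrophobicity grouping (a common CTD convention).
GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}
# Map each amino acid to its class index: 1 = polar, 2 = neutral, 3 = hydrophobic.
CLASS_OF = {aa: idx
            for idx, members in enumerate(GROUPS.values(), start=1)
            for aa in members}


def encode(seq):
    """Replace each residue by its class index; unknown letters are skipped."""
    return [CLASS_OF[aa] for aa in seq.upper() if aa in CLASS_OF]


def composition(seq):
    """Fraction of residues falling in each of the three classes."""
    classes = encode(seq)
    n = len(classes)
    return [classes.count(c) / n for c in (1, 2, 3)] if n else [0.0] * 3


def transition(seq):
    """Fraction of adjacent pairs that switch between each pair of classes."""
    classes = encode(seq)
    pairs = list(zip(classes, classes[1:]))
    if not pairs:
        return [0.0] * 3
    return [sum(1 for p in pairs if p in ((a, b), (b, a))) / len(pairs)
            for a, b in ((1, 2), (1, 3), (2, 3))]


if __name__ == "__main__":
    toy = "RGCR"  # toy sequence, not a real GPCR
    print(composition(toy))
    print(transition(toy))
```

Repeating this for several physicochemical properties and concatenating the results gives a fixed-length feature vector per sequence. A feature selection step such as the MRMD2.0 method named in the abstract could then prune it.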
