Results 1 - 9 of 9
1.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37812388

ABSTRACT

MOTIVATION: Numerous high-accuracy drug-target affinity (DTA) prediction models, whose performance is heavily reliant on the drug and target feature information, have been developed at the cost of greater complexity and reduced interpretability. Feature extraction and optimization constitute a critical step that strongly influences model performance, robustness, and interpretability. Many existing studies aim to comprehensively characterize drugs and targets by extracting features from multiple perspectives; however, this approach has drawbacks: (i) an abundance of redundant or noisy features; and (ii) feature sets that often suffer from high dimensionality. RESULTS: In this study, to obtain a model with high accuracy and strong interpretability, we utilize various traditional and cutting-edge feature selection and dimensionality reduction techniques to process self-associated features and adjacent associated features. These optimized features are then fed into learning to rank to achieve efficient DTA prediction. Extensive experimental results on two commonly used datasets indicate that, among various feature optimization methods, the regression tree-based feature selection method is most beneficial for constructing models with good performance and strong robustness. Then, by utilizing Shapley Additive Explanations values and the incremental feature selection approach, we find that the high-quality feature subset consists of the top 150D features, and that the top 20D features have a breakthrough impact on DTA prediction. In conclusion, our study thoroughly validates the importance of feature optimization in DTA prediction and serves as inspiration for constructing high-performance, highly interpretable models. AVAILABILITY AND IMPLEMENTATION: https://github.com/RUXIAOQING964914140/FS_DTA.
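
The incremental feature selection step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `incremental_feature_selection`, the `evaluate` callback, and the toy scores are all hypothetical. In a real pipeline the ranking would come from an importance measure such as SHAP values, and `evaluate` would train and validate a model on the given subset.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[List[str], float]:
    """Grow the feature subset in ranked order and keep the best-scoring prefix.

    ranked_features: feature names sorted from most to least important
                     (hypothetical input, e.g. ordered by mean |SHAP| value).
    evaluate:        hypothetical callback returning a score (higher is better)
                     for a given feature subset.
    step:            number of features added per iteration.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    best_subset: List[str] = []
    best_score = float("-inf")
    # The final slice may be shorter than `step`; slicing clamps it to the full list.
    for end in range(step, len(ranked_features) + step, step):
        subset = list(ranked_features[:end])
        score = evaluate(subset)
        # Strict '>' keeps the smallest subset when scores tie.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy example (made-up scores): the score peaks at three features.
toy_scores = {1: 0.70, 2: 0.78, 3: 0.81, 4: 0.80, 5: 0.79}
ranked = ["f1", "f2", "f3", "f4", "f5"]
subset, score = incremental_feature_selection(ranked, lambda s: toy_scores[len(s)])
print(subset, score)
```

Keeping the smallest prefix on ties favors compact feature subsets, which matches the abstract's emphasis on interpretability, though the paper may use a different stopping rule.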


Subject(s)
Chemical Models , Pharmaceutical Preparations , Regression Analysis , Pharmaceutical Preparations/chemistry
2.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33454758

ABSTRACT

Over the past decades, learning to rank (LTR) algorithms have been gradually applied to bioinformatics. Such methods have shown significant advantages in multiple research tasks in this field. Therefore, it is necessary to summarize and discuss the application of these algorithms so that they can be used more conveniently and contribute further to bioinformatics. In this paper, the characteristics of LTR algorithms and their strengths over other types of algorithms are analyzed based on their applications in bioinformatics from multiple perspectives. Finally, the paper further discusses the shortcomings of the LTR algorithms, the methods and means to better use the algorithms, and some open problems that currently exist.


Subject(s)
Algorithms , Computational Biology/methods , DNA/chemistry , Investigational Drugs/pharmacology , Proteins/chemistry , Software , Amino Acid Sequence , DNA/genetics , DNA/metabolism , Drug Discovery , Investigational Drugs/chemical synthesis , Humans , Protein Domains , Protein Secondary Structure , Proteins/genetics , Proteins/metabolism , Amino Acid Sequence Homology
3.
Bioinformatics ; 38(7): 1964-1971, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35134828

ABSTRACT

MOTIVATION: Drug-target interaction prediction plays an important role in new drug discovery and drug repurposing. Binding affinity indicates the strength of drug-target interactions. Predicting drug-target binding affinity is expected to provide promising candidates for biologists, which can effectively reduce the workload of wet laboratory experiments and speed up the entire process of drug research. Given that numerous new proteins are being sequenced and compounds synthesized, several improved computational methods have been proposed for such predictions, but some challenges remain. (i) Many methods discuss and implement only one application scenario: they focus on drug repurposing and ignore the discovery of new drugs and targets. (ii) Many methods do not consider the priority order of proteins (or drugs) related to each target drug (or protein). Therefore, it is necessary to develop a comprehensive method that can be used in multiple scenarios and focuses on candidate order. RESULTS: In this study, we propose a method called NerLTR-DTA that uses the neighbor relationship of similarity and sharing to extract features, and applies a ranking framework with regression attributes to predict affinity values and priority order of query drug (or query target) and its related proteins (or compounds). It is worth noting that using the characteristics of learning to rank to set different queries can smartly realize the multi-scenario application of the method, including the discovery of new drugs and new targets. Experimental results on two commonly used datasets show that NerLTR-DTA outperforms some state-of-the-art competing methods. NerLTR-DTA achieves excellent performance in all application scenarios mentioned in this study, and the r_m(test)^2 values indicate that this performance is not obtained by chance. Moreover, statistics on the associations of each query drug (or query protein) show that NerLTR-DTA provides accurate ranking lists for the relevant results of most queries. In general, NerLTR-DTA is a powerful tool for predicting drug-target associations and can contribute to new drug discovery and drug repurposing. AVAILABILITY AND IMPLEMENTATION: The proposed method is implemented in Python and Java. Source code and datasets are available at https://github.com/RUXIAOQING964914140/NerLTR-DTA.
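
The idea of setting different queries can be illustrated with a short sketch: the same predicted drug-target affinities are grouped once by drug and once by target, which yields two ranking scenarios. This is only an illustration and not the NerLTR-DTA code; the function `rank_by_query`, the tuple layout, and the toy affinity values are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (drug_id, target_id, predicted_affinity); all values below are made up.
Prediction = Tuple[str, str, float]


def rank_by_query(
    predictions: Iterable[Prediction], query: str = "drug"
) -> Dict[str, List[Tuple[str, float]]]:
    """Group predictions by query entity and sort candidates by affinity.

    query="drug":   each drug is a query and its targets are ranked.
    query="target": each target is a query and its drugs are ranked.
    """
    if query not in ("drug", "target"):
        raise ValueError("query must be 'drug' or 'target'")
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for drug, target, affinity in predictions:
        key, candidate = (drug, target) if query == "drug" else (target, drug)
        groups[key].append((candidate, affinity))
    # Higher predicted affinity first (assumes larger values mean stronger binding).
    return {
        key: sorted(candidates, key=lambda item: item[1], reverse=True)
        for key, candidates in groups.items()
    }


toy_predictions = [
    ("D1", "T1", 7.2),
    ("D1", "T2", 5.1),
    ("D2", "T1", 6.4),
    ("D2", "T2", 8.0),
]
by_drug = rank_by_query(toy_predictions, query="drug")
by_target = rank_by_query(toy_predictions, query="target")
print(by_drug)
print(by_target)
```

Switching the `query` argument is all that changes between ranking candidate targets for a drug and ranking candidate compounds for a target, which is the multi-scenario property the abstract attributes to the learning to rank formulation.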


Subject(s)
Algorithms , Software , Drug Development/methods , Drug Discovery/methods , Drug Repositioning , Proteins/chemistry
4.
J Proteome Res ; 18(7): 2931-2939, 2019 07 05.
Article in English | MEDLINE | ID: mdl-31136183

ABSTRACT

Cellular respiration provides direct energy substances for living organisms. Electron storage and transportation should be completed through electron transport chains during the cellular respiration process. Thus, identifying electron transport proteins is an important research task. In protein identification, selection of the feature extraction method and classification algorithm has a direct bearing on classification. The distance-based Top-n-gram method, which was proposed based on the frequency profile and considered evolutionary information, was used in this study for feature extraction. The Max-Relevance-Max-Distance algorithm was adopted for feature selection. The first 4D features that greatly influenced the classification result were selected to form the feature data set. Finally, the random forest algorithm was used to identify electron transport proteins. Under the 10-fold cross-validation of the model constructed in this study, sensitivity, specificity, and accuracy rates surpassed 85%, 80%, and 82%, respectively. In the testing set, F-measure, AUC value, and accuracy exceeded 74%, 95%, and 86%, respectively. These experimental results indicated that the classification model built in this study is an effective tool in identifying electron transport proteins.
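
The evaluation measures reported in this abstract (sensitivity, specificity, accuracy, and F-measure) follow from the confusion-matrix counts, as the sketch below shows. The function name `binary_metrics` and the toy labels are assumptions for illustration; they are not the study's data or code.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy, and F-measure for 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, len(pairs))
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Toy labels (made up): 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```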


Subject(s)
Algorithms , Carrier Proteins/analysis , Electron Transport Chain Complex Proteins/analysis , Electron Transport , Classification , Chemical Models , Sensitivity and Specificity
5.
Front Genet ; 12: 680117, 2021.
Article in English | MEDLINE | ID: mdl-34234813

ABSTRACT

Exploring drug-target interactions by biomedical experiments requires a lot of human, financial, and material resources. To save time and cost to meet the needs of the present generation, machine learning methods have been introduced into the prediction of drug-target interactions. The large amount of available drug and target data in existing databases, the evolving and innovative computer technologies, and the inherent characteristics of various types of machine learning have made machine learning techniques the mainstream method for drug-target interaction prediction research. In this review, details of the specific applications of machine learning in drug-target interaction prediction are summarized, the characteristics of each algorithm are analyzed, and the issues that need to be further addressed and explored for future research are discussed. The aim of this review is to provide a sound basis for the construction of high-performance models.

6.
Brief Funct Genomics ; 20(5): 312-322, 2021 09 11.
Article in English | MEDLINE | ID: mdl-34189559

ABSTRACT

Drug-target interaction prediction is important for drug development and drug repurposing. Many computational methods have been proposed for drug-target interaction prediction due to their potential to reduce time and cost. In this review, we introduce the molecular docking and machine learning-based methods, which have been widely applied to drug-target interaction prediction. In particular, machine learning-based methods are divided into different types according to the data processing form and task type. For each type of method, we provide a specific description and propose some solutions to improve its capability. Knowledge of heterogeneous networks and learning to rank is also summarized in this review. As far as we know, this is the first comprehensive review that summarizes the knowledge of heterogeneous networks and learning to rank in drug-target interaction prediction. Moreover, we propose three aspects that can be explored in depth in future research.


Subject(s)
Drug Discovery , Pharmaceutical Preparations , Drug Development , Machine Learning , Molecular Docking Simulation
7.
Comput Biol Med ; 119: 103660, 2020 04.
Article in English | MEDLINE | ID: mdl-32090901

ABSTRACT

Exploring the protein-drug correlation can not only solve the problem of selecting candidate compounds but also solve related problems such as drug redirection and finding potential drug targets. Therefore, many researchers have proposed different machine learning methods for prediction of protein-drug correlations. However, many existing models simply divide the protein-drug relationship into related or irrelevant categories and do not deeply explore the most relevant target (or drug) for a given drug (or target). To solve this problem, this paper applies the ranking concept to the prediction of the GPCR (G Protein-Coupled Receptor)-drug correlation. This study uses two different types of data sets to explore candidate compound and potential target problems, and good results were achieved on both. In addition, this study also found that the family to which a protein belongs is not an inherent factor that affects the ranking of GPCR-drug correlations; however, if the drug affects other family members of the protein, then the protein is likely to be a potential target of the drug. This study showed that the learning to rank algorithm is a good tool for exploring protein-drug correlations.


Subject(s)
Algorithms , Pharmaceutical Preparations , Machine Learning , G Protein-Coupled Receptors
8.
Front Microbiol ; 10: 507, 2019.
Article in English | MEDLINE | ID: mdl-30972038

ABSTRACT

The uniqueness of bacteriophages plays an important role in bioinformatics research. In real applications, the function of the bacteriophage virion proteins is the main area of interest. Therefore, it is very important to classify bacteriophage virion proteins and non-phage virion proteins accurately. Extracting comprehensive and effective sequence features from proteins plays a vital role in protein classification. To represent protein information more fully, this paper combines the features extracted by the feature information representation algorithm based on sequence information (CCPA) with those extracted by the feature representation algorithm based on sequence and structure information. After extracting features, the Max-Relevance-Max-Distance (MRMD) algorithm is used to select the optimal feature set with the strongest correlation between features and class labels and low redundancy between features. Given the randomness of the samples selected by the random forest classification algorithm and of the features used to produce each node variable, a random forest method is employed to perform 10-fold cross-validation on the bacteriophage protein classification. The accuracy of this model is as high as 93.5% in the classification of phage proteins in this study. This study also found that, among the eight physicochemical properties considered, the charge property has the greatest impact on the classification of bacteriophage proteins. These results indicate that the model discussed in this paper is an important tool in bacteriophage protein research.
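
The 10-fold cross-validation protocol mentioned in this abstract can be sketched as an index split. This is a simplified, unstratified illustration; the function `k_fold_indices`, the seed, and the sample count are assumptions, and the classifier itself is omitted. In practice a random forest would be trained on each `train` part and scored on the matching `test` part.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical data set of 25 samples split into 10 folds.
splits = list(k_fold_indices(25, k=10))
for fold_no, (train, test) in enumerate(splits, start=1):
    print(fold_no, len(train), len(test))
```

Every sample appears in exactly one test fold, so the averaged score uses each sample once for evaluation.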

9.
Mol Ther Nucleic Acids ; 18: 16-23, 2019 Dec 06.
Article in English | MEDLINE | ID: mdl-31479921

ABSTRACT

Among the large number of known microRNAs (miRNAs), some miRNAs play negligible roles in cell regulation. Therefore, selecting essential miRNAs is an important initial step for a deeper understanding of miRNAs and their functions. In this study, we generated 60 classification models by combining 12 representative feature extraction methods and 5 commonly used classification algorithms. The optimal model for essential miRNA classification that we obtained is based on the Mismatch feature extraction method combined with the random forest algorithm. The F-Measure, area under the curve, and accuracy values of this model were 93.2%, 96.7%, and 93.0%, respectively. We also found that the distribution of the positive and negative examples of the first few features greatly influenced the classification results. The feature extraction methods performed best when the differences between the positive and negative examples were obvious, and this led to better classification of essential miRNAs. Because each classifier's predictions for the same sample may be different, we employed a novel voting method to improve the accuracy of the classification of essential miRNAs. The performance results showed that the best classification results were obtained when five classification models were used in the voting. The five classification models were constructed based on the Mismatch, pseudo-distance structure status pair composition, Subsequence, Kmer, and Triplet feature extraction methods. The voting result was 95.3%. Our results suggest that the voting method can be an important tool for selecting essential miRNAs.
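
Combining the predictions of five classifiers by voting can be sketched as below. The abstract does not specify how its novel voting method works, so this sketch substitutes plain majority voting as a stand-in; the function `majority_vote`, the abbreviated key `PseDSSPC`, and the toy predictions are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(predictions: Dict[str, Sequence[int]]) -> List[int]:
    """Combine per-model label predictions by simple majority, sample by sample."""
    if not predictions:
        raise ValueError("at least one model is required")
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of samples")
    voted: List[int] = []
    for sample_votes in zip(*predictions.values()):
        # The most frequent label wins; with an odd number of binary
        # classifiers a tie cannot occur.
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


# Made-up predictions (1 = essential miRNA, 0 = non-essential) from five
# models, keyed by the feature extraction methods named in the abstract.
toy_votes = {
    "Mismatch": [1, 0, 1, 1],
    "PseDSSPC": [1, 0, 0, 1],
    "Subsequence": [0, 0, 1, 1],
    "Kmer": [1, 1, 1, 0],
    "Triplet": [0, 0, 0, 1],
}
final_labels = majority_vote(toy_votes)
print(final_labels)
```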
