Results 1 - 11 of 11
1.
IEEE J Biomed Health Inform ; 27(11): 5675-5684, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37672364

ABSTRACT

Many powerful computational methods based on graph neural networks (GNNs) have been proposed to predict drug-protein interactions (DPIs). Such methods can effectively reduce laboratory workload and the cost of drug discovery and drug repurposing. However, many clinical functions of drugs and proteins are unknown due to their unobserved indications. Therefore, it is difficult to establish a reliable drug-protein heterogeneous network that can describe the relationships between drugs and proteins based on the available information. To solve this problem, we propose a DPI prediction method, named SATS, that can self-adaptively adjust the topological structure of the heterogeneous network. SATS establishes a representation learning module based on a graph attention network to process the drug-protein heterogeneous network. It can self-adaptively learn the relationships among the nodes based on their attributes and adjust the topological structure of the network according to the training loss of the model. Finally, SATS predicts the interaction propensity between drugs and proteins based on their embeddings. The experimental results show that SATS can effectively improve the topological structure of the network. SATS outperforms several state-of-the-art DPI prediction methods under various evaluation metrics. These results show that SATS is useful for dealing with incomplete data and unreliable networks. Case studies on the top-ranked predictions further demonstrate that SATS is powerful for discovering novel DPIs.


Subject(s)
Benchmarking , Drug Discovery , Humans , Drug Interactions , Drug Repositioning , Neural Networks, Computer
2.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2200-2209, 2023.
Article in English | MEDLINE | ID: mdl-37021862

ABSTRACT

Exploring drug-protein interactions (DPIs) through computational methods can effectively reduce the workload and cost of DPI identification. Previous works try to predict DPIs by integrating and analyzing the unique features of drugs and proteins. They cannot adequately analyze the consistency between drug features and protein features because the two have different semantics. However, the consistency of their features, such as the correlation arising from the diseases they share, may reveal some potential DPIs. Here we propose a deep neural network-based co-coding method (DNNCC for short) to predict novel DPIs. DNNCC projects the original features of drugs and proteins to a common embedding space through a co-coding strategy. In this way, the embedding features of drugs and proteins have the same semantics. Therefore, the prediction module can discover unknown DPIs by exploring the feature consistency between drugs and proteins. The experimental results indicate that DNNCC performs significantly better than five state-of-the-art DPI prediction methods under several evaluation metrics. The ablation experiments confirm the advantage of integrating and analyzing the common features of drugs and proteins. The novel DPIs predicted by DNNCC verify that it is a powerful preliminary tool that can effectively discover potential DPIs.
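
The projection into a common embedding space can be illustrated with a minimal sketch. It assumes plain linear projections in place of the deep networks used by DNNCC and cosine similarity as the consistency score; neither choice is stated in the abstract, and the names `project`, `cosine`, `W_drug` and `W_protein`, as well as all numbers, are hypothetical.

```python
import math

def project(features, weights):
    """Linear map: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy inputs: the drug has 3 raw features, the protein has 2,
# so the raw vectors cannot be compared directly.
drug = [1.0, 0.0, 2.0]
protein = [0.5, 1.5]

# Hypothetical projection weights into a shared 2-dimensional space.
W_drug = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
W_protein = [[2.0, 0.0], [0.0, 1.0]]

drug_emb = project(drug, W_drug)           # lives in the shared space
protein_emb = project(protein, W_protein)  # lives in the same space
score = cosine(drug_emb, protein_emb)      # higher means more consistent
```

Once both entities live in the same space, any vector similarity can serve as a candidate interaction score; in a trained model the projection weights would be learned rather than fixed by hand.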


Subject(s)
Neural Networks, Computer , Proteins , Proteins/genetics
3.
BMC Bioinformatics ; 23(1): 438, 2022 Oct 20.
Article in English | MEDLINE | ID: mdl-36266626

ABSTRACT

Recently, deep learning-based automatic generation of treatment recommendations has been attracting much attention. However, medical datasets are usually small, which may lead to over-fitting and inferior performance of deep learning models. In this paper, we propose a multi-objective data enhancement method to indirectly scale up the medical data to avoid over-fitting and generate high quantity treatment recommendations. Specifically, we define a main task and several auxiliary tasks on the same dataset and train a specific model for each of these tasks to learn different aspects of knowledge at a limited data scale. Meanwhile, a Soft Parameter Sharing method is exploited to share learned knowledge among models. By sharing the knowledge learned by the auxiliary tasks with the main task, the proposed method can take different semantic distributions into account during the training process of the main task. We collected an ultrasound dataset of thyroid nodules that contains Findings, Impressions and Treatment Recommendations labeled by professional doctors. We conducted various experiments on the dataset to validate the proposed method and showed that it performs better than existing methods.
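
Soft parameter sharing is often implemented as a penalty that pulls the parameters of separate models toward each other without forcing them to be identical. The sketch below assumes a squared-distance penalty, which is one common formulation; the abstract does not say which form the authors use, and the function names, the `strength` parameter and the toy numbers are hypothetical.

```python
def squared_distance(params_a, params_b):
    """Sum of squared differences between two flat parameter lists."""
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b))

def soft_sharing_loss(main_loss, main_params, aux_params_list, strength):
    """Main-task loss plus a penalty for drifting away from each auxiliary model."""
    penalty = sum(squared_distance(main_params, aux) for aux in aux_params_list)
    return main_loss + strength * penalty

# Hypothetical toy values: one main model and two auxiliary models.
main_params = [1.0, 2.0]
aux_params_list = [[1.0, 1.0], [0.0, 2.0]]
total = soft_sharing_loss(main_loss=0.5, main_params=main_params,
                          aux_params_list=aux_params_list, strength=0.1)
```

A larger `strength` ties the models more tightly together; at zero each model trains independently.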


Subject(s)
Deep Learning , Neural Networks, Computer , Research Design , Knowledge
4.
Bioinformatics ; 38(22): 5073-5080, 2022 Nov 15.
Article in English | MEDLINE | ID: mdl-36111859

ABSTRACT

MOTIVATION: Large-scale heterogeneous data provide diverse perspectives for predicting drug-protein interactions (DPIs). However, the available information on molecular interactions and clinical associations related to drugs or proteins is incomplete because there may be unproven interactions and associations. This incomplete information appears in the available data as non-interaction and non-correlation, which may mislead the prediction model. Existing methods fuse incomplete and complete information without considering their integrity, so the negative effects of incomplete information persist. RESULTS: We develop a network-based DPI prediction method named BRWCP, which uses the complete information network to correct the prediction results obtained from the incomplete information network. By integrating relevant heterogeneous information that may be incomplete, the feature similarities of drugs and proteins are obtained. Combining the feature similarities and known DPIs, an incomplete information-based drug-protein heterogeneous network is constructed. Then, a bidirectional random walk with pruning algorithm is applied to this heterogeneous network to predict potential DPIs. Next, the predicted DPIs are combined with the chemical fingerprint similarity of drugs and the amino acid sequence similarity of proteins to construct the complete information network. The bidirectional random walk with pruning algorithm is applied to the new network until convergence to obtain the final prediction results. Experimental results show that BRWCP is superior to several state-of-the-art DPI prediction methods, and case studies further confirm its ability to uncover potential DPIs. AVAILABILITY AND IMPLEMENTATION: The code and data used in BRWCP are available at https://github.com/lyfdomain/BRWCP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
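
A generic bidirectional random walk on a drug-protein network can be sketched as follows. This is not the BRWCP implementation: the pruning step and the correction by the complete information network are omitted because the abstract does not describe them, and the normalisation, the `alpha` decay factor, the fixed number of steps and all function names are assumptions.

```python
def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def row_normalise(M):
    """Scale each row to sum to 1 (rows of zeros stay zero)."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in M]

def col_normalise(M):
    """Scale each column to sum to 1 (columns of zeros stay zero)."""
    sums = [sum(col) for col in zip(*M)]
    return [[v / s if s else 0.0 for v, s in zip(row, sums)] for row in M]

def bi_random_walk(sim_drug, sim_prot, known, alpha=0.5, steps=20):
    """Propagate known interactions over both similarity networks at once."""
    Sd = row_normalise(sim_drug)   # used for left multiplication
    Sp = col_normalise(sim_prot)   # used for right multiplication
    R = [row[:] for row in known]
    for _ in range(steps):
        left = matmul(Sd, R)       # one step on the drug similarity network
        right = matmul(R, Sp)      # one step on the protein similarity network
        R = [[alpha * (l + r) / 2 + (1 - alpha) * k
              for l, r, k in zip(lrow, rrow, krow)]
             for lrow, rrow, krow in zip(left, right, known)]
    return R

# Hypothetical toy data: 2 similar drugs, 2 unrelated proteins,
# and a single known interaction (drug 0, protein 0).
sim_drug = [[1.0, 0.8], [0.8, 1.0]]
sim_prot = [[1.0, 0.0], [0.0, 1.0]]
known = [[1.0, 0.0], [0.0, 0.0]]
scores = bi_random_walk(sim_drug, sim_prot, known)
```

In this toy case drug 1 receives a positive score for protein 0 only because it is similar to drug 0, which is the basic mechanism by which such walks propose unobserved interactions.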


Subject(s)
Algorithms , Computational Biology , Computational Biology/methods , Proteins , Drug Interactions
5.
Database (Oxford) ; 2022, 2022 Aug 25.
Article in English | MEDLINE | ID: mdl-36006844

ABSTRACT

Although several traditional Chinese medicine (TCM)-related databases have emerged, they focus on single medicinal materials, which is far from sufficient for clinical research and application. In comparison, compound prescriptions are more informative and meaningful in TCM, for they embody information on the compatibility of TCM in addition to the relatively isolated information about single medicinal materials. The compatibility information is essential in TCM because it conveys not only which components are involved in treating specific diseases but also how to combine these single medicinal materials. We established a database of Chinese patent medicine and compound prescription (CPMCP). It presents the prescription information of Chinese patent medicines (CPMs) and ancient Chinese medicine prescriptions (CMPs). CPMCP reports comprehensive and standardized information on them, such as components, indications and contraindications. Notably, we organized relevant experts and spent considerable time manually mapping the functions of compound prescriptions written in ancient Chinese to standardized TCM symptom vocabularies, obtaining a total of 71 414 associations between compound prescriptions and TCM symptoms. In this way, CPMCP established associations between TCM and modern medicine (MM) according to the associations between TCM symptoms and MM symptoms. In addition, to further exhibit the compatibility mechanism of compound prescriptions, CPMCP summarizes a set of common drug combination principles by analyzing the existing prescriptions. We believe that CPMCP can promote the modernization of TCM and make greater contributions to MM. Database URL http://cpmcp.top.


Subject(s)
Drugs, Chinese Herbal , China , Drugs, Chinese Herbal/therapeutic use , Medicine, Chinese Traditional , Nonprescription Drugs/therapeutic use , Prescriptions
6.
Brief Bioinform ; 23(2), 2022 Mar 10.
Article in English | MEDLINE | ID: mdl-35212712

ABSTRACT

Although sifting functional genes has been discussed for years, traditional selection methods tend to be ineffective in capturing potential specific genes. First, typical methods focus on finding features (genes) that are relevant to the class while irrelevant to each other. However, the features that can offer rich discriminative information are more likely to be the complementary ones. Second, almost all existing methods assess feature relations in pairs, yielding an inaccurate local estimation and lacking a global exploration. In this paper, we introduce the multi-variable Area Under the receiver operating characteristic Curve (AUC) to globally evaluate the complementarity among features by employing the Area Above the receiver operating characteristic Curve (AAC). With AAC, the class-relevant information newly provided by a candidate feature and that preserved by the selected features can be obtained beyond pairwise computation. Furthermore, we propose an AAC-based feature selection algorithm, named Multi-variable AUC-based Combined Features Complementarity, to screen discriminative complementary feature combinations. Extensive experiments on public datasets demonstrate the effectiveness of the proposed approach. In addition, we provide a gene set related to prostate cancer and discuss its potential biological significance from the machine learning perspective and based on existing biomedical findings on some individual genes.
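
For orientation, the ordinary single-feature AUC and its complement can be computed as below. This is a sketch of the standard rank-based AUC only; the multi-variable extension proposed in the paper is not reproduced, and taking AAC as 1 minus AUC follows from the geometry of the unit square rather than from the abstract. The function names and the toy data are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def aac(scores, labels):
    """Area above the ROC curve inside the unit square."""
    return 1.0 - auc(scores, labels)

# Hypothetical toy feature: higher values should indicate class 1.
feature_values = [0.9, 0.8, 0.4, 0.3]
class_labels = [1, 0, 1, 0]
feature_auc = auc(feature_values, class_labels)
feature_aac = aac(feature_values, class_labels)
```

Here three of the four positive-negative pairs are ordered correctly, so the AAC reflects the one pair the feature gets wrong, which is the part a complementary feature would need to cover.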


Subject(s)
Algorithms , Machine Learning , Area Under Curve , ROC Curve
7.
Bioinformatics ; 37(20): 3618-3625, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34019069

ABSTRACT

MOTIVATION: Exploring potential drug-target interactions (DTIs) is a key step in drug discovery and repurposing. In recent years, predicting probable DTIs through computational methods has gradually become a research hot spot. However, most previous studies failed to judiciously take into account the consistency between the chemical properties of a drug and its functions. Changes in these relationships may have a severely negative effect on the prediction of DTIs. RESULTS: We propose an autoencoder-based method, AEFS, under spatial consistency constraints to predict DTIs. A heterogeneous network is established to integrate the information of drugs, proteins and diseases. The original drug features are projected to an embedding (protein) space by a multi-layer encoder, and further projected into the label (disease) space by a decoder. In this process, the clinical information of drugs is introduced to assist the DTI prediction. By maintaining the distribution of drug correlation in the original feature, embedding and label spaces, AEFS keeps the consistency between the chemical properties and functions of drugs. Experimental comparisons indicate that AEFS is more robust for imbalanced data and achieves significantly superior performance in DTI prediction. Case studies further confirm its ability to mine latent DTIs. AVAILABILITY AND IMPLEMENTATION: The code of AEFS is available at https://github.com/JackieSun818/AEFS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

8.
IEEE/ACM Trans Comput Biol Bioinform ; 15(5): 1538-1548, 2018.
Article in English | MEDLINE | ID: mdl-28600259

ABSTRACT

Selecting functional genes is essential for analyzing microarray data. Among the many available feature (gene) selection approaches, those based on the large margin nearest neighbor receive more attention due to their low computational cost and high accuracy in analyzing high-dimensional data. Yet some problems still hamper the existing approaches in sifting real target genes, including the selection of erroneous nearest neighbors, high sensitivity to irrelevant genes, and inappropriate evaluation criteria. Previous pioneering works have partly addressed some of these problems, but none of them is capable of solving them simultaneously. In this paper, we propose a new local-nearest-neighbors-based feature weighting approach to alleviate the above problems. The proposed approach is based on the trick of locally minimizing the within-class distances and maximizing the between-class distances with the nearest neighbors rule. We further define a feature weight vector and construct it by minimizing the cost function with a regularization term. The proposed approach can be applied naturally to multi-class problems and does not require extra modification. Experimental results on the UCI and open microarray data sets validate the effectiveness and efficiency of the new approach.
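
The idea of rewarding features that keep same-class neighbors close and other-class neighbors far apart can be illustrated with the classic Relief update. This is a swapped-in, simpler technique, not the regularized cost function of the paper; the function name `relief_weights`, the Manhattan distance and the toy data are assumptions.

```python
def relief_weights(X, y):
    """Classic Relief: reward distance to the nearest miss, penalise distance to the nearest hit."""
    n_features = len(X[0])
    weights = [0.0] * n_features

    def dist(a, b):
        # Manhattan distance over all features.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i, (x, label) in enumerate(zip(X, y)):
        hits = [X[j] for j in range(len(X)) if j != i and y[j] == label]
        misses = [X[j] for j in range(len(X)) if y[j] != label]
        if not hits or not misses:
            continue  # no same-class or other-class neighbor available
        hit = min(hits, key=lambda h: dist(x, h))
        miss = min(misses, key=lambda m: dist(x, m))
        for f in range(n_features):
            weights[f] += abs(x[f] - miss[f]) - abs(x[f] - hit[f])
    return [w / len(X) for w in weights]

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.5], [0.1, 0.9], [1.0, 0.4], [0.9, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

Features with large positive weights are the ones along which between-class distances exceed within-class distances, so ranking by weight gives a simple gene ordering.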


Subject(s)
Algorithms , Computational Biology/methods , Genes/genetics , Oligonucleotide Array Sequence Analysis/methods , Gene Expression Profiling , Pattern Recognition, Automated , Sequence Analysis, DNA
9.
Genes (Basel) ; 8(10), 2017 Sep 28.
Article in English | MEDLINE | ID: mdl-28956817

ABSTRACT

Detecting associations between an input gene set and annotated gene sets (e.g., pathways) is an important problem in modern molecular biology. In this paper, we propose two algorithms, termed NetPEA and NetPEA', for conducting network-based pathway enrichment analysis. Our algorithms consider not only shared genes but also gene-gene interactions. Both algorithms utilize a protein-protein interaction network and a random walk with restart procedure to identify hidden relationships between an input gene set and pathways, but they use different randomization strategies to evaluate statistical significance and, as a result, emphasize different pathway properties. Compared to an over-representation-based method, our algorithms can identify more statistically significant pathways. Compared to an existing network-based algorithm, EnrichNet, our algorithms have a higher sensitivity in revealing the true causal pathways while at the same time achieving a higher specificity. A literature review of selected results indicates that some of the novel pathways reported by our algorithms are biologically relevant and important. While the evaluations are performed only with KEGG pathways, we believe the algorithms can be valuable for general functional discovery from high-throughput experiments.
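
A random walk with restart on an interaction network can be sketched in a few lines. This shows only the generic propagation step; how NetPEA turns the resulting probabilities into pathway scores and significance values is not described in the abstract and is not attempted here. The function name, the restart probability of 0.3 and the toy network are assumptions.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p = (1 - r) * W p + r * p0 until the vector stops changing."""
    n = len(adjacency)
    # Column-normalise so each column holds transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
          for j in range(n)] for i in range(n)]
    # Start (and restart) distribution: uniform over the seed genes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Hypothetical toy network: a path 0 - 1 - 2 - 3, with gene 0 as the input set.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
proximity = random_walk_with_restart(adjacency, seeds={0})
```

The steady-state vector ranks every gene by its network proximity to the seed set, so genes that share no membership with the input set can still receive a nonzero score.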

10.
BMC Bioinformatics ; 18(Suppl 3): 50, 2017 Mar 14.
Article in English | MEDLINE | ID: mdl-28361689

ABSTRACT

BACKGROUND: The Receiver Operating Characteristic (ROC) curve is well known for evaluating classification performance in the biomedical field. Owing to its superiority in dealing with imbalanced and cost-sensitive data, the ROC curve has been exploited as a popular metric to evaluate and identify disease-related genes (features). The existing ROC-based feature selection approaches are simple and effective in evaluating individual features. However, these approaches may fail to find the real target feature subset because they lack effective means to reduce the redundancy between features, which is essential in machine learning. RESULTS: In this paper, we propose to assess feature complementarity by measuring the distances between the misclassified instances and their nearest misses on the dimensions of pairwise features. If a misclassified instance and its nearest miss on one feature dimension are far apart on another feature dimension, the two features are regarded as complementary to each other. Subsequently, we propose a novel filter feature selection approach on the basis of ROC analysis. The new approach employs an efficient heuristic search strategy to select optimal features with the highest complementarities. The experimental results on a broad range of microarray data sets validate that classifiers built on the feature subset selected by our approach achieve the minimal balanced error rate with a small number of significant features. CONCLUSIONS: Compared with other ROC-based feature selection approaches, our new approach can select fewer features and effectively improve the classification performance.


Subject(s)
Area Under Curve , Computational Biology/methods , Algorithms , Empirical Research , Models, Theoretical , Oligonucleotide Array Sequence Analysis , ROC Curve
11.
Int J Data Min Bioinform ; 12(4): 434-50, 2015.
Article in English | MEDLINE | ID: mdl-26510296

ABSTRACT

Sifting functional genes is crucial to new strategies for drug discovery and prospective patient-tailored therapy. Generally, simply generating a gene subset by selecting the top k individually superior genes may yield an inferior gene combination, for some selected genes may be redundant with respect to others. In this paper, we propose to select a gene subset based on the criterion of minimum Bayesian error probability. The method dynamically evaluates all available genes and sifts only one gene at a time. A gene is selected if its combination with the other selected genes provides better classification information. Within the generated gene subset, each individual gene is the most discriminative one among those that classify cancers in the same way as it does, and different genes are more discriminative in combination than individually. The genes selected in this way are likely to be functional ones from the systems biology perspective, for genes tend to co-regulate rather than regulate individually. Experimental results show that the classifiers induced by this method are capable of classifying cancers with high accuracy while involving only a small number of genes.
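
The one-gene-at-a-time procedure can be sketched as a greedy forward selection driven by an error estimate. The sketch assumes the caller supplies `error_of`, a hypothetical function returning an estimated error for a gene subset; the Bayesian error estimator of the paper is not reproduced, and the gene names and error values below are invented for illustration.

```python
def greedy_forward_selection(candidates, error_of, max_genes=None):
    """Add the gene that lowers the estimated error most; stop when nothing improves."""
    selected = []
    best_error = float("inf")
    remaining = list(candidates)
    while remaining and (max_genes is None or len(selected) < max_genes):
        # Re-evaluate every remaining gene together with the current subset.
        trial_errors = {g: error_of(selected + [g]) for g in remaining}
        gene = min(trial_errors, key=trial_errors.get)
        if trial_errors[gene] >= best_error:
            break  # no candidate improves the combination
        selected.append(gene)
        best_error = trial_errors[gene]
        remaining.remove(gene)
    return selected, best_error

# Hypothetical error table: g1 and g2 are individually best but redundant,
# while g3 is weak alone and complements g1.
toy_errors = {
    frozenset(["g1"]): 0.20,
    frozenset(["g2"]): 0.22,
    frozenset(["g3"]): 0.35,
    frozenset(["g1", "g2"]): 0.20,
    frozenset(["g1", "g3"]): 0.05,
    frozenset(["g2", "g3"]): 0.10,
    frozenset(["g1", "g2", "g3"]): 0.05,
}
subset, error = greedy_forward_selection(
    ["g1", "g2", "g3"], lambda genes: toy_errors[frozenset(genes)])
```

In this toy table the top two genes by individual error would be g1 and g2, yet the greedy search picks g1 and g3, which mirrors the point that individually superior genes can form an inferior combination.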


Subject(s)
Databases, Nucleic Acid , Genes , Sequence Analysis, DNA/methods , Bayes Theorem