ABSTRACT
Synergistic drug combinations can improve the therapeutic effect and reduce the drug dosage to avoid toxicity. In previous years, an in vitro approach was utilized to screen synergistic drug combinations. However, the in vitro method is time-consuming and expensive. With the rapid growth of high-throughput data, computational methods are becoming efficient tools to predict potential synergistic drug combinations. Considering the limitations of the previous computational methods, we developed a new model named Siamese Network and Random Matrix Projection for AntiCancer Drug Combination prediction (SNRMPACDC). Firstly, the Siamese convolutional network and random matrix projection were used to process the features of the two drugs into drug combination features. Then, the features of the cancer cell line were processed through the convolutional network. Finally, the processed features were integrated and input into the multi-layer perceptron network to get the predicted score. Compared with the traditional method of splicing drug features into drug combination features, SNRMPACDC improved the interpretability of drug combination features to a certain extent. In addition, the introduction of convolutional networks can better extract the potential information in the features. SNRMPACDC achieved a root mean-squared error of 15.01 and a Pearson correlation coefficient of 0.75 in 5-fold cross-validation of regression prediction for response data. In addition, SNRMPACDC achieved an AUC of 0.91 ± 0.03 and an AUPR of 0.62 ± 0.05 in 5-fold cross-validation of the classification of combinations as synergistic or not. These results are better than those of almost all previous models. SNRMPACDC would be an effective approach to infer potential anticancer synergistic drug combinations.
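The two regression metrics quoted above, root mean-squared error and the Pearson correlation coefficient, can be computed as in the following minimal sketch. The function names and the sample values are illustrative assumptions, not taken from the SNRMPACDC code or data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean-squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Made-up response values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(rmse(observed, predicted), pearson(observed, predicted))
```

In a 5-fold setting, these functions would typically be applied to the held-out fold of each split and the five values averaged.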
Subjects
Antineoplastic Combined Chemotherapy Protocols , Computational Biology , Antineoplastic Combined Chemotherapy Protocols/pharmacology , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Drug Synergism , Computational Biology/methods , Drug Combinations , Computer Simulation
ABSTRACT
Adverse drug-drug interactions (DDIs) have become an increasingly serious problem in the medical and health system. Recently, the effective application of deep learning and biomedical knowledge graphs (KGs) has improved the DDI prediction performance of computational models. However, the problems of feature redundancy and KG noise also arise, bringing new challenges for researchers. To overcome these challenges, we proposed a Multi-Channel Feature Fusion model for multi-typed DDI prediction (MCFF-MTDDI). Specifically, we first extracted drug chemical structure features, drug pairs' extra label features, and KG features of drugs. Then, these different features were fused by a multi-channel feature fusion module. Finally, multi-typed DDIs were predicted through the fully connected neural network. To our knowledge, we are the first to integrate the extra label information into KG-based multi-typed DDI prediction. In addition, we proposed a novel KG feature learning method and a State Encoder to obtain target drug pairs' KG-based features, which contain more abundant and more relevant drug-related KG information with less noise. Furthermore, a Gated Recurrent Unit-based multi-channel feature fusion module was proposed to yield more comprehensive feature information about drug pairs, alleviating the problem of feature redundancy. We experimented with four datasets in the multi-class and the multi-label prediction tasks to comprehensively evaluate the performance of MCFF-MTDDI for predicting interactions of known-known drugs, known-new drugs and new-new drugs. In addition, we conducted ablation studies and case studies. All the results demonstrated the effectiveness of MCFF-MTDDI.
Subjects
Drug Delivery Systems , Neural Networks, Computer , Humans , Drug Interactions , Research Personnel
ABSTRACT
Existing computational models for drug-target binding affinity prediction have much room for improvement in prediction accuracy, robustness and generalization ability. Most deep learning models lack interpretability analysis and few studies provide application examples. Based on these observations, we presented a novel model named Molecule Representation Block-based Drug-Target binding Affinity prediction (MRBDTA). MRBDTA is composed of embedding and positional encoding, molecule representation block and interaction learning module. The advantages of MRBDTA are reflected in three aspects: (i) developing Trans block to extract molecule features through improving the encoder of transformer, (ii) introducing skip connection at encoder level in Trans block and (iii) enhancing the ability to capture interaction sites between proteins and drugs. The test results on two benchmark datasets show that MRBDTA achieves the best performance compared with 11 state-of-the-art models. Besides, through replacing Trans block with single Trans encoder and removing skip connection in Trans block, we verified that Trans block and skip connection could effectively improve the prediction accuracy and reliability of MRBDTA. Then, relying on multi-head attention mechanism, we performed interpretability analysis to illustrate that MRBDTA can correctly capture part of interaction sites between proteins and drugs. In case studies, we firstly employed MRBDTA to predict binding affinities between Food and Drug Administration-approved drugs and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) replication-related proteins. Secondly, we compared true binding affinities between 3C-like proteinase and 185 drugs with those predicted by MRBDTA. The final results of case studies reveal reliable performance of MRBDTA in drug design for SARS-CoV-2.
Subjects
COVID-19 , SARS-CoV-2 , United States , Humans , Reproducibility of Results , Drug Delivery Systems , Proteins
ABSTRACT
MicroRNAs (miRNAs) play crucial roles in human disease and can be targeted by small molecule (SM) drugs according to numerous studies, which shows that identifying SM-miRNA associations in human disease is important for drug development and disease treatment. We proposed the method of Ensemble of Kernel Ridge Regression-based Small Molecule-MiRNA Association prediction (EKRRSMMA) to uncover potential SM-miRNA associations by combining feature dimensionality reduction and ensemble learning. First, we constructed different feature subsets for both SMs and miRNAs. Then, we trained homogeneous base learners based on distinct feature subsets and took the average of scores obtained from these base learners as SM-miRNA association score. In EKRRSMMA, feature dimensionality reduction technology was employed in the process of construction of feature subsets to reduce the influence of noisy data. Besides, the base learner, namely KRR_avg, was the combination of two classifiers constructed under SM space and miRNA space, which could make full use of the information of SM and miRNA. To assess the prediction performance of EKRRSMMA, we conducted Leave-One-Out Cross-Validation (LOOCV), SM-fixed local LOOCV, miRNA-fixed local LOOCV and 5-fold CV based on two datasets. For Dataset 1 (Dataset 2), EKRRSMMA got the Area Under receiver operating characteristic Curves (AUCs) of 0.9793 (0.8871), 0.8071 (0.7705), 0.9732 (0.8586) and 0.9767 ± 0.0014 (0.8560 ± 0.0027). Besides, we conducted four case studies. As a result, 32 (5-Fluorouracil), 19 (17β-Estradiol), 26 (5-Aza-2'-deoxycytidine) and 11 (cyclophosphamide) out of top 50 predicted potentially associated miRNAs were confirmed by database or experimental literature. The above evaluation results demonstrate that EKRRSMMA is reliable for predicting SM-miRNA associations.
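Two steps mentioned above lend themselves to a short illustration: averaging the scores of several base learners, and scoring a ranking with the area under the ROC curve. The sketch below uses the pairwise (Mann-Whitney) formulation of AUC; the function names and the toy scores are assumptions made for illustration and do not come from the EKRRSMMA implementation.

```python
def average_scores(score_lists):
    """Element-wise mean of the score lists produced by several base learners."""
    return [sum(column) / len(column) for column in zip(*score_lists)]


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores contribute 0.5, which matches the usual ROC convention.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: two hypothetical base learners scoring four candidate pairs.
learner_a = [0.9, 0.2, 0.6, 0.1]
learner_b = [0.7, 0.4, 0.8, 0.3]
labels = [1, 0, 1, 0]
print(auc(labels, average_scores([learner_a, learner_b])))
```

The quadratic pair loop is chosen for readability; for large candidate lists a rank-based computation gives the same value more efficiently.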
Subjects
MicroRNAs , Algorithms , Area Under Curve , Computational Biology/methods , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , ROC Curve
ABSTRACT
In recent years, increasing biological experiments and scientific studies have demonstrated that microRNA (miRNA) plays an important role in the development of human complex diseases. Therefore, discovering miRNA-disease associations can contribute to accurate diagnosis and effective treatment of diseases. Identifying miRNA-disease associations through computational methods based on biological data has been proven to be low-cost and high-efficiency. In this study, we proposed a computational model named Stacked Autoencoder for potential MiRNA-Disease Association prediction (SAEMDA). In SAEMDA, all the miRNA-disease samples were used to pretrain a Stacked Autoencoder (SAE) in an unsupervised manner. Then, the positive samples and the same number of selected negative samples were utilized to fine-tune SAE in a supervised manner after adding an output layer with softmax classifier to the SAE. SAEMDA can make full use of the feature information of all unlabeled miRNA-disease pairs. Therefore, SAEMDA is suitable for our dataset containing small labeled samples and large unlabeled samples. As a result, SAEMDA achieved AUCs of 0.9210 and 0.8343 in global and local leave-one-out cross validation. Besides, SAEMDA obtained an average AUC and standard deviation of 0.9102 ± 0.0029 in 100 times of 5-fold cross validation. These results were better than those of previous models. Moreover, we carried out three case studies to further demonstrate the predictive accuracy of SAEMDA. As a result, 82% (breast neoplasms), 100% (lung neoplasms) and 90% (esophageal neoplasms) of the top 50 predicted miRNAs were verified by databases. Thus, SAEMDA could be a useful and reliable model to predict potential miRNA-disease associations.
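The fine-tuning set described above pairs the positive samples with an equal number of negative samples. The abstract does not say how the negatives are selected, so the sketch below assumes uniform random sampling from the unlabeled pairs; the function name and the toy pairs are likewise hypothetical.

```python
import random


def balanced_training_set(pairs, positives, seed=0):
    """Return labeled samples: every positive pair plus as many sampled negatives.

    Assumption: negatives are drawn uniformly at random from unlabeled pairs.
    Requires at least as many unlabeled pairs as positive pairs.
    """
    pos = [p for p in pairs if p in positives]
    unlabeled = [p for p in pairs if p not in positives]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    neg = rng.sample(unlabeled, len(pos))
    return [(p, 1) for p in pos] + [(p, 0) for p in neg]


# Toy universe of 4 miRNAs x 3 diseases with three known associations.
all_pairs = [(m, d) for m in range(4) for d in range(3)]
known = {(0, 0), (1, 2), (3, 1)}
for sample in balanced_training_set(all_pairs, known):
    print(sample)
```

Because unlabeled pairs may hide true associations, randomly drawn negatives are only presumed negatives, which is one reason the unsupervised pretraining step uses all pairs.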
Subjects
Breast Neoplasms , Lung Neoplasms , MicroRNAs , Algorithms , Computational Biology/methods , Female , Genetic Predisposition to Disease , Humans , Lung Neoplasms/genetics , MicroRNAs/genetics
ABSTRACT
MicroRNAs (miRNAs) play crucial roles in multiple biological processes and human diseases and can be considered as therapeutic targets of small molecules (SMs). Because biological experiments used to verify SM-miRNA associations are time-consuming and expensive, it is urgent to propose new computational models to predict new SM-miRNA associations. Here, we proposed a novel method called Dual-network Collaborative Matrix Factorization (DCMF) for predicting the potential SM-miRNA associations. Firstly, we utilized the Weighted K Nearest Known Neighbors (WKNKN) method to preprocess SM-miRNA association matrix. Then, we constructed matrix factorization model to obtain two feature matrices containing latent features of SM and miRNA, respectively. Finally, the predicted SM-miRNA association score matrix was obtained by calculating the inner product of two feature matrices. The main innovations of this method were that the use of WKNKN method can preprocess the missing values of association matrix and the introduction of dual network can integrate more diverse similarity information into DCMF. For evaluating the validity of DCMF, we implemented four different cross validations (CVs) based on two distinct datasets and two different case studies. Finally, based on dataset 1 (dataset 2), DCMF achieved Area Under receiver operating characteristic Curves (AUC) of 0.9868 (0.8770), 0.9833 (0.8836), 0.8377 (0.7591) and 0.9836 ± 0.0030 (0.8632 ± 0.0042) in global Leave-One-Out Cross Validation (LOOCV), miRNA-fixed local LOOCV, SM-fixed local LOOCV and 5-fold CV, respectively. For case studies, plenty of predicted associations have been confirmed by published experimental literature. Therefore, DCMF is an effective tool to predict potential SM-miRNA associations.
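The last two steps above, learning two latent feature matrices and taking their inner product as the score matrix, can be illustrated with a plain regularized matrix factorization. This is a generic sketch, not DCMF: it omits the WKNKN preprocessing and the dual-network similarity terms, and the rank, learning rate, regularization weight and toy matrix are all assumed values.

```python
import random


def factorize(Y, rank=2, lr=0.05, reg=0.01, epochs=300, seed=0):
    """Fit Y ~ A * B^T by stochastic gradient descent on squared error.

    A holds one latent row per SM, B one latent row per miRNA.
    """
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    A = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n)]
    B = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(m)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                pred = sum(A[i][k] * B[j][k] for k in range(rank))
                err = Y[i][j] - pred
                for k in range(rank):
                    a, b = A[i][k], B[j][k]  # use pre-update values for both
                    A[i][k] += lr * (err * b - reg * a)
                    B[j][k] += lr * (err * a - reg * b)
    return A, B


def score_matrix(A, B):
    """Predicted association scores: inner product of each pair of latent rows."""
    return [[sum(a * b for a, b in zip(row_a, row_b)) for row_b in B] for row_a in A]


# Toy 3 x 3 association matrix (rows: SMs, columns: miRNAs).
Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
A, B = factorize(Y)
for row in score_matrix(A, B):
    print([round(v, 2) for v in row])
```

In an actual prediction setting, the entries of interest are the zeros of the input matrix, whose reconstructed scores are ranked to propose candidate associations.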
Subjects
MicroRNAs , Algorithms , Computational Biology/methods , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics , ROC Curve
ABSTRACT
Seeds are important microbial vectors, and seed-associated pathogens can be introduced into a country through trade, resulting in yield and quality losses in agriculture. The aim of this study was to characterize the microbial communities associated with barley seeds, and based on which, to develop technical approaches to trace their geographical origins, and to inspect and identify quarantine pathogens. Our analysis defined the core microbiota of barley seed and revealed significant differences in the barley seed-associated microbial communities among different continents, suggesting a strong geographic specificity of the barley seed microbiota. By implementing a machine learning model, we achieved over 95% accuracy in tracing the origin of barley seeds. Furthermore, the analysis of co-occurrence and exclusion patterns provided important insights into the identification of candidate biocontrol agents or microbial inoculants that could be useful in improving barley yield and quality. A core pathogen database was developed, and a procedure for inspecting potential quarantine species associated with barley seed was established. These approaches proved effective in detecting four fungal and three bacterial quarantine species for the first time in the port of China. This study not only characterized the core microbiota of barley seeds but also provided practical approaches for tracing the regional origin of barley and identifying potential quarantine pathogens.
Subjects
Bacteria , Fungi , Hordeum , Microbiota , Plant Diseases , Seeds , Hordeum/microbiology , Seeds/microbiology , Bacteria/isolation & purification , Bacteria/classification , Bacteria/genetics , Plant Diseases/microbiology , Plant Diseases/prevention & control , Fungi/isolation & purification , Fungi/classification , Fungi/genetics , China , Quarantine
ABSTRACT
Studies have shown that the number of microbes in humans is almost 10 times that of cells. These microbes have been proven to play an important role in a variety of physiological processes, such as enhancing immunity, improving the digestion of gastrointestinal tract and strengthening metabolic function. In addition, in recent years, more and more research results have indicated that there are close relationships between the emergence of the human noncommunicable diseases and microbes, which provides a novel insight for us to further understand the pathogenesis of the diseases. An in-depth study about the relationships between diseases and microbes will not only contribute to exploring new strategies for the diagnosis and treatment of diseases but also significantly heighten the efficiency of new drugs development. However, applying the methods of biological experimentation to reveal the microbe-disease associations is costly and inefficient. In recent years, more and more researchers have constructed multiple computational models to predict microbes that are potentially associated with diseases. Here, we start with a brief introduction of microbes and databases as well as web servers related to them. Then, we mainly introduce four kinds of computational models, including score function-based models, network algorithm-based models, machine learning-based models and experimental analysis-based models. Finally, we summarize the advantages as well as disadvantages of them and set the direction for the future work of revealing microbe-disease associations based on computational models. We firmly believe that computational models are expected to be important tools in large-scale predictions of disease-related microbes.
Subjects
Databases, Factual , Disease , Machine Learning , Microbiota , Models, Biological , Humans
ABSTRACT
Effective drugs are urgently needed to overcome human complex diseases. However, the research and development of novel drugs takes a long time and costs a great deal of money. Traditional drug discovery follows the rule of one drug-one target, while some studies have demonstrated that drugs generally perform their task by affecting related pathways rather than targeting a single target. Thus, a new strategy of drug discovery, namely pathway-based drug discovery, has been proposed. Identifying associations between drugs and pathways plays a key role in the development of pathway-based drug discovery. Revealing drug-pathway associations by experimental methods is time-consuming and costly. Therefore, some computational models were established to predict potential drug-pathway associations. In this review, we first introduced the background of drugs and the concept of drug-pathway associations. Then, some publicly accessible databases and web servers about drug-pathway associations were listed. Next, we summarized some state-of-the-art computational methods from past years for inferring drug-pathway associations and divided these methods into three classes, namely Bayesian sparse factor-based, matrix decomposition-based and other machine learning methods. In addition, we introduced several evaluation strategies to estimate the predictive performance of various computational models. In the end, we discussed the advantages and limitations of existing computational methods and provided some suggestions about the future directions of data collection and the development of calculation models.
Subjects
Computational Biology/methods , Drug Delivery Systems , Algorithms , Bayes Theorem , Drug Discovery/methods , Humans
ABSTRACT
In recent years, an increasing number of microRNA (miRNA)-disease associations have been identified through traditional biological experiments. These associations contribute to revealing the molecular mechanisms of diseases and to preventing and curing diseases. To improve the efficiency of miRNA-disease association discovery, some calculation methods were developed as auxiliary tools for researchers. In the current study, we proposed a novel model named Bayesian Ranking for MiRNA-Disease Association prediction (BRMDA) by improving Bayesian Personalized Ranking from three aspects: (i) taking advantage of similarity of diseases and miRNAs; (ii) incorporating miRNA bias for miRNAs associated with different number of diseases; and (iii) implementing neighborhood-based approach for new miRNAs and diseases. For each investigated disease, BRMDA used the set of triples (i.e. disease, labeled miRNA, unlabeled miRNA) that reflected association preference of the disease to miRNAs as training set, which made full use of unknown samples rather than simply considering them as negative samples. To investigate the predictive performance of BRMDA, we employed leave-one-out cross-validation and obtained Area Under the Curve of 0.8697, which outperformed many classical methods. Besides, we further implemented three distinct classes of case studies for three common Neoplasms. As a result, 44 (Colon Neoplasms), 49 (Esophageal Neoplasms) and 49 (Lung Neoplasms) of the top 50 predicted miRNAs were validated through experiments. In short, BRMDA would be a trustworthy tool for inferring valuable associations.
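The training triples described above can be enumerated as in the sketch below, which also shows the standard Bayesian Personalized Ranking loss for a single triple (the negative log-sigmoid of the score difference). The three BRMDA extensions (similarity, miRNA bias, neighborhood handling) are not modeled here, and the identifiers and miRNA names are placeholders.

```python
import math


def build_triples(disease, labeled, all_mirnas):
    """Enumerate (disease, labeled miRNA, unlabeled miRNA) training triples."""
    unlabeled = [m for m in all_mirnas if m not in labeled]
    return [(disease, i, j) for i in sorted(labeled) for j in unlabeled]


def bpr_loss(score_labeled, score_unlabeled):
    """Standard BPR loss for one triple: -ln(sigmoid(x_i - x_j))."""
    diff = score_labeled - score_unlabeled
    return -math.log(1.0 / (1.0 + math.exp(-diff)))


# Placeholder identifiers, for illustration only.
mirnas = ["m1", "m2", "m3", "m4", "m5"]
triples = build_triples("D", {"m1", "m2"}, mirnas)
print(len(triples), triples[0])
print(bpr_loss(2.0, 0.0), bpr_loss(0.0, 2.0))
```

The loss is small when the labeled miRNA is scored above the unlabeled one and grows when the order is reversed, which expresses a preference ordering rather than treating unlabeled pairs as confirmed negatives.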
Subjects
Bayes Theorem , Genetic Predisposition to Disease , MicroRNAs/genetics , Algorithms , Computational Biology/methods , Computer Simulation , Humans , Neoplasms/genetics
ABSTRACT
MicroRNA (miRNA) plays an important role in the occurrence, development, diagnosis and treatment of diseases. More and more researchers begin to pay attention to the relationship between miRNA and disease. Compared with traditional biological experiments, computational method of integrating heterogeneous biological data to predict potential associations can effectively save time and cost. Considering the limitations of the previous computational models, we developed the model of deep-belief network for miRNA-disease association prediction (DBNMDA). We constructed feature vectors to pre-train restricted Boltzmann machines for all miRNA-disease pairs and applied positive samples and the same number of selected negative samples to fine-tune DBN to obtain the final predicted scores. Compared with the previous supervised models that only use pairs with known label for training, DBNMDA innovatively utilizes the information of all miRNA-disease pairs during the pre-training process. This step could reduce the impact of too few known associations on prediction accuracy to some extent. DBNMDA achieves the AUC of 0.9104 based on global leave-one-out cross validation (LOOCV), the AUC of 0.8232 based on local LOOCV and the average AUC of 0.9048 ± 0.0026 based on 5-fold cross validation. These AUCs are better than other previous models. In addition, three different types of case studies for three diseases were implemented to demonstrate the accuracy of DBNMDA. As a result, 84% (breast neoplasms), 100% (lung neoplasms) and 88% (esophageal neoplasms) of the top 50 predicted miRNAs were verified by recent literature. Therefore, we could conclude that DBNMDA is an effective method to predict potential miRNA-disease associations.
Subjects
Genetic Predisposition to Disease , MicroRNAs/genetics , Breast Neoplasms , Humans , Lung Neoplasms , Reproducibility of Results
ABSTRACT
Circular RNAs (circRNAs) are a class of single-stranded, covalently closed RNA molecules with a variety of biological functions. Studies have shown that circRNAs are involved in a variety of biological processes and play an important role in the development of various complex diseases, so the identification of circRNA-disease associations would contribute to the diagnosis and treatment of diseases. In this review, we summarize the discovery, classifications and functions of circRNAs and introduce four important diseases associated with circRNAs. Then, we list some significant and publicly accessible databases containing comprehensive annotation resources of circRNAs and experimentally validated circRNA-disease associations. Next, we introduce some state-of-the-art computational models for predicting novel circRNA-disease associations and divide them into two categories, namely network algorithm-based and machine learning-based models. Subsequently, several evaluation methods of prediction performance of these computational models are summarized. Finally, we analyze the advantages and disadvantages of different types of computational models and provide some suggestions to promote the development of circRNA-disease association identification from the perspective of the construction of new computational models and the accumulation of circRNA-related data.
Subjects
Computational Biology/methods , Neoplasms/genetics , RNA, Circular/genetics , Algorithms , Databases, Genetic , Female , Humans , Machine Learning , Models, Genetic
ABSTRACT
Mounting evidence has demonstrated the significance of taking microRNAs (miRNAs) as the target of small molecule (SM) drugs for disease treatment. Given the fact that exploring new SM-miRNA associations through biological experiments is extremely expensive, several computing models have been constructed to reveal the possible SM-miRNA associations. Here, we built a computing model of Bounded Nuclear Norm Regularization for SM-miRNA Associations prediction (BNNRSMMA). Specifically, we first constructed a heterogeneous SM-miRNA network utilizing miRNA similarity, SM similarity, confirmed SM-miRNA associations and defined a matrix to represent the heterogeneous network. Then, we constructed a model to complete this matrix by minimizing its nuclear norm. The Alternating Direction Method of Multipliers was adopted to minimize the nuclear norm and obtain predicted scores. The main innovation lies in two aspects. During completion, we limited all elements of the matrix within the interval of (0,1) to make sure they have practical significance. Besides, instead of strictly fitting all known elements, a regularization term was incorporated to tolerate the noise in integrated similarities. Furthermore, four kinds of cross-validations on two datasets and two types of case studies were performed to evaluate the predictive performance of BNNRSMMA. Finally, BNNRSMMA attained areas under the curve of 0.9822 (0.8433), 0.9793 (0.8852), 0.8253 (0.7350) and 0.9758 ± 0.0029 (0.8759 ± 0.0041) under global leave-one-out cross-validation (LOOCV), miRNA-fixed LOOCV, SM-fixed LOOCV and 5-fold cross-validation based on Dataset 1 (Dataset 2), respectively. With regard to case studies, plenty of predicted associations have been verified by the experimental literature. All these results confirmed that BNNRSMMA is a reliable tool for inferring associations.
Subjects
Computational Biology/methods , Drug Discovery/methods , Ligands , MicroRNAs/chemistry , Algorithms , Area Under Curve , Computational Biology/standards , Drug Discovery/standards , Humans , MicroRNAs/genetics , ROC Curve , Reproducibility of Results , Small Molecule Libraries
ABSTRACT
Many biological experimental studies have confirmed that microRNAs (miRNAs) play a significant role in human complex diseases. Exploring miRNA-disease associations could be conducive to understanding disease pathogenesis at the molecular level and developing disease diagnostic biomarkers. However, since conducting traditional experiments is a costly and time-consuming way, plenty of computational models have been proposed to predict miRNA-disease associations. In this study, we presented a neoteric Bayesian model (KBMFMDA) that combines kernel-based nonlinear dimensionality reduction, matrix factorization and binary classification. The main idea of KBMFMDA is to project miRNAs and diseases into a unified subspace and estimate the association network in that subspace. KBMFMDA obtained the AUCs of 0.9132, 0.8708, 0.9008±0.0044 in global and local leave-one-out and five-fold cross validation. Moreover, KBMFMDA was applied to three important human cancers in three different kinds of case studies and most of the top 50 potential disease-related miRNAs were confirmed by many experimental reports.
Subjects
Genetic Association Studies/methods , MicroRNAs , Neoplasms/genetics , Algorithms , Bayes Theorem , Colonic Neoplasms/genetics , Esophageal Neoplasms/genetics , Humans , Lymphoma/genetics
ABSTRACT
Accumulating experimental evidence has demonstrated that microRNAs (miRNAs) have a huge impact on numerous critical biological processes and they are associated with different complex human diseases. Nevertheless, the task to predict potential miRNAs related to diseases remains difficult. In this paper, we developed a Kernel Fusion-based Regularized Least Squares for MiRNA-Disease Association prediction model (KFRLSMDA), which applied kernel fusion technique to fuse similarity matrices and then utilized regularized least squares to predict potential miRNA-disease associations. To prove the effectiveness of KFRLSMDA, we adopted leave-one-out cross-validation (LOOCV) and 5-fold cross-validation and then compared KFRLSMDA with 10 previous computational models (MaxFlow, MiRAI, MIDP, RKNNMDA, MCMDA, HGIMDA, RLSMDA, HDMP, WBSMDA and RWRMDA). Outperforming other models, KFRLSMDA achieved AUCs of 0.9246 in global LOOCV, 0.8243 in local LOOCV and average AUC of 0.9175 ± 0.0008 in 5-fold cross-validation. In addition, respectively, 96%, 100% and 90% of the top 50 potential miRNAs for breast neoplasms, colon neoplasms and oesophageal neoplasms were confirmed by experimental discoveries. We also predicted potential miRNAs related to hepatocellular cancer by removing all known related miRNAs of this cancer and 98% of the top 50 potential miRNAs were verified. Furthermore, we predicted potential miRNAs related to lymphoma using the data set in the old version of the HMDD database and 80% of the top 50 potential miRNAs were confirmed. Therefore, it can be concluded that KFRLSMDA has reliable prediction performance.
Subjects
Algorithms , Computational Biology/methods , Computer Simulation , Genetic Association Studies , MicroRNAs/genetics , Neoplasms/genetics , Neoplasms/pathology , Genetic Predisposition to Disease , Humans
ABSTRACT
The central dogma of molecular biology has told that DNA sequences encode proteins through RNAs, which function as an information intermediary [...].
Subjects
Computer Simulation , Genetic Predisposition to Disease , RNA, Untranslated/genetics , Animals , Humans , RNA, Untranslated/metabolism
ABSTRACT
As microRNAs (miRNAs) have been reported to be a type of novel high-value small molecule (SM) drug targets for disease treatments, many researchers are engaged in the field of exploring new SM-miRNA associations. Nevertheless, because of the high cost, adopting traditional biological experiments constrains the efficiency of discovering new associations between SMs and miRNAs. Therefore, as an important auxiliary tool, reliable computational models will be of great help to reveal SM-miRNA associations. In this article, we developed a computational model of sparse learning and heterogeneous graph inference for small molecule-miRNA association prediction (SLHGISMMA). Initially, the sparse learning method (SLM) was implemented to decompose the SM-miRNA adjacency matrix. Then, we integrated the reacquired association information together with the similarity information of SMs and miRNAs into a heterogeneous graph to infer potential SM-miRNA associations. Here, the main innovation of SLHGISMMA lies in the introduction of SLM to eliminate noise in the original adjacency matrix to some extent, which plays an important role in performance improvement. In addition, to assess SLHGISMMA's performance, four different kinds of cross-validations were performed based on two datasets. As a result, based on dataset 1 (dataset 2), SLHGISMMA achieved areas under the curve of 0.9273 (0.7774), 0.9365 (0.7973), 0.7703 (0.6556), and 0.9241 ± 0.0052 (0.7724 ± 0.0032) in global leave-one-out cross-validation (LOOCV), miRNA-fixed local LOOCV, SM-fixed local LOOCV, and 5-fold cross-validation, respectively. Moreover, in the case study on three important SMs via removing their known associations, the results showed that most of the top 50 predicted miRNAs were confirmed by the database SM2miR v1.0 or the experimental literature.
Subjects
Computational Biology/methods , Decitabine/therapeutic use , Estradiol/therapeutic use , Fluorouracil/therapeutic use , MicroRNAs/metabolism , Neoplasms/drug therapy , Neoplasms/metabolism , Algorithms , Area Under Curve , Computer Simulation , Humans , ROC Curve
ABSTRACT
MicroRNAs (miRNAs) play a key role in many critical biological processes and are involved in the occurrence and development of complex human diseases. Many studies demonstrated that discovering the associations between small molecules (SMs) and miRNAs will facilitate the design of miRNA targeted therapeutic strategies for complex human diseases. This work presents a calculation model of cross-layer dependency inference on multilayered networks for small molecule-miRNA association prediction (CLDISMMA), which constructed multilayered networks composed of SMs, miRNAs, and diseases. It utilized the within layer topology and the known cross-layer associations to infer latent representations of all layers for SM-miRNA association prediction. In CLDISMMA, the novelties lie in introducing disease information for SM-miRNA association prediction and utilizing a regularized optimization model to describe the SM-miRNA association prediction problem. To evaluate the performance of CLDISMMA, global leave-one-out cross validation (LOOCV) and miRNA-fixed and SM-fixed local LOOCV were implemented in two data sets. In data set 1, CLDISMMA achieved AUCs of 0.9889, 0.9886, and 0.7755 in turns. The corresponding AUCs were 0.8726, 0.8798, and 0.7021 based on data set 2. In addition, CLDISMMA obtained average AUCs of 0.9887 and 0.8647 in data sets 1 and 2 under 100 times 5-fold cross validation. Furthermore, we employed CLDISMMA to predict SM-miRNA associations based on data set 1, and 21 out of the top 50 predicted associations were confirmed by experimental reports. In the case study for new SMs, 5-fluorouracil and 5-aza-2'-deoxycytidine, 40 and 30 miRNAs, respectively, were verified to be associated with them among the top 50 miRNAs predicted by CLDISMMA.
Subjects
MicroRNAs/metabolism , Small Molecule Libraries/metabolism , Humans , MicroRNAs/chemistry , Models, Molecular , Nucleic Acid Conformation , Thermodynamics
ABSTRACT
A growing number of studies have found that many complex human diseases are accompanied by aberrant expression of microRNAs (miRNAs). Small molecule (SM) drugs have been used to treat complex human diseases by affecting the expression of miRNAs. Several computational methods have been proposed to infer underlying associations between SMs and miRNAs. In this study, we propose a new computational model of random forest based small molecule-miRNA association prediction (RFSMMA), which is based on the known SM-miRNA associations in the SM2miR database. RFSMMA uses the similarity of SMs and of miRNAs as features to represent SM-miRNA pairs and then applies the random forest machine learning algorithm to the training samples to obtain a prediction model. In RFSMMA, integrating multiple kinds of similarity avoids the bias of any single similarity, and selecting the more reliable features from the original features represents SM-miRNA pairs more accurately. We carried out cross validations to assess the predictive accuracy of RFSMMA. On data set 1, RFSMMA achieved AUCs of 0.9854, 0.9839, 0.7052, and 0.9917 ± 0.0008 under global leave-one-out cross validation (LOOCV), miRNA-fixed local LOOCV, SM-fixed local LOOCV, and 5-fold cross validation, respectively. On data set 2, RFSMMA obtained AUCs of 0.8456, 0.8463, 0.6653, and 0.8389 ± 0.0033 under the same four cross-validation schemes, in the same order. In addition, we implemented a case study on three common SMs, namely 5-fluorouracil, 17β-estradiol, and 5-aza-2'-deoxycytidine. Among the top 50 associated miRNAs predicted by RFSMMA for these three SMs, 31, 32, and 28 miRNAs, respectively, were verified. Therefore, RFSMMA is shown to be an effective and reliable tool for identifying underlying SM-miRNA associations.
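As a rough illustration of how similarities can serve as features for an SM-miRNA pair, the sketch below averages several similarity matrices into one and concatenates the SM's similarity row with the miRNA's similarity row. Both choices (simple averaging and row concatenation) are assumptions made for illustration; the abstract does not state the exact integration or feature-selection procedure, and all matrices here are hypothetical.

```python
def integrate(similarity_matrices):
    """Element-wise mean of several same-shaped similarity matrices."""
    count = len(similarity_matrices)
    rows = len(similarity_matrices[0])
    cols = len(similarity_matrices[0][0])
    return [
        [sum(m[i][j] for m in similarity_matrices) / count for j in range(cols)]
        for i in range(rows)
    ]


def pair_features(sm_index, mirna_index, sm_sim, mirna_sim):
    """Feature vector of a pair: SM similarity row followed by miRNA similarity row."""
    return list(sm_sim[sm_index]) + list(mirna_sim[mirna_index])


# Hypothetical similarity matrices for 2 SMs (two similarity types) and 3 miRNAs.
sm_sim_type_1 = [[1.0, 0.2], [0.2, 1.0]]
sm_sim_type_2 = [[1.0, 0.6], [0.6, 1.0]]
mirna_sim = [[1.0, 0.3, 0.1], [0.3, 1.0, 0.5], [0.1, 0.5, 1.0]]

sm_sim = integrate([sm_sim_type_1, sm_sim_type_2])
features = pair_features(0, 2, sm_sim, mirna_sim)
print(len(features), features)
```

Vectors built this way for known associations and for sampled unobserved pairs could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.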
Subjects
Computer Simulation , MicroRNAs/metabolism , Small Molecule Libraries/metabolism , Models, Biological
ABSTRACT
MicroRNAs (miRNAs) play an important role in the prevention, diagnosis, and treatment of complex human diseases. Predicting potential miRNA-disease associations could provide important prior information for medical researchers. Therefore, reliable computational models are expected to be an effective supplement for inferring associations between miRNAs and diseases. In this study, we developed a novel computational model named Negative Samples Extraction based MiRNA-Disease Association prediction (NSEMDA). NSEMDA filtered reliable negative samples with two positive-unlabeled learning techniques, namely the Spy and Rocchio techniques, and calculated similarity weights for ambiguous samples. The positive samples, reliable negative samples, and ambiguous samples with similarity weights were used to construct a Support Vector Machine-Similarity Weight model to predict miRNA-disease associations. NSEMDA improved the credibility of negative samples and reduced the impact of noisy samples by using ambiguous samples with similarity weights to train the prediction model. As a result, NSEMDA achieved an AUC of 0.8899 in global leave-one-out cross validation (LOOCV) and an AUC of 0.8353 in local LOOCV. In 100 repetitions of 5-fold cross validation, NSEMDA obtained an average AUC of 0.8878 with a standard deviation of 0.0014. These AUCs are higher than those of many classical models. In addition, we carried out three kinds of case studies to evaluate the performance of NSEMDA. Among the top 50 potentially related miRNAs predicted by NSEMDA for esophageal neoplasms, lung neoplasms, and hepatocellular carcinoma, 46, 50, and 45 miRNAs, respectively, were verified by experimental evidence to be associated with the investigated disease. Therefore, NSEMDA would be a reliable computational model for inferring miRNA-disease associations.
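To make the negative-sample filtering step more concrete, here is a minimal sketch of the Rocchio technique as it is commonly described in the positive-unlabeled learning literature: a positive and a negative prototype are built from the normalized positive and unlabeled samples, and an unlabeled sample is kept as a reliable negative when it is more similar to the negative prototype. The weights alpha = 16 and beta = 4 are conventional defaults assumed here, the feature vectors are hypothetical, and this is not claimed to reproduce the NSEMDA implementation.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def mean_vector(vectors):
    """Component-wise mean of equally long vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rocchio_reliable_negatives(positives, unlabeled, alpha=16.0, beta=4.0):
    """Indices of unlabeled samples that lie closer to the negative prototype."""
    pos_mean = mean_vector([normalize(v) for v in positives])
    unl_mean = mean_vector([normalize(v) for v in unlabeled])
    # Unlabeled samples stand in for negatives when building the prototypes.
    pos_proto = [alpha * p - beta * u for p, u in zip(pos_mean, unl_mean)]
    neg_proto = [alpha * u - beta * p for p, u in zip(pos_mean, unl_mean)]
    return [
        i for i, v in enumerate(unlabeled)
        if cosine(v, neg_proto) > cosine(v, pos_proto)
    ]


# Hypothetical 2-D feature vectors for miRNA-disease pairs.
positives = [[1.0, 0.1], [0.9, 0.2]]
unlabeled = [[0.95, 0.15], [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]]
reliable = rocchio_reliable_negatives(positives, unlabeled)
print(reliable)
```

Unlabeled samples that are not selected (here the one resembling the positives) would remain ambiguous, which is where a similarity weight of the kind the abstract mentions could come into play.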