1.
Front Aging Neurosci ; 14: 752858, 2022.
Article En | MEDLINE | ID: mdl-35401145

Alzheimer's disease (AD) is one of the most common neurodegenerative diseases. This study aims to identify AD-related genes from transcriptomic data and to help develop new drugs to treat AD. First, we obtained differentially expressed gene (DEG)-enriched coexpression networks between AD and normal samples in multiple transcriptomics datasets by weighted gene co-expression network analysis (WGCNA). Then, a convergent genomic approach (CFG) integrating multiple lines of AD-related evidence was used to prioritize potential genes from DEG-enriched modules. Subsequently, we identified candidate genes in the list of potential genes. Lastly, we combined deepDTnet and SAveRUNNER to predict interactions among candidate genes, drugs and AD. Experiments on five datasets show that the CFG score of GJA1 is the highest among all potential driver genes of AD. Moreover, target-drug-disease network prediction indicates that GJA1 interacts with AD. Therefore, the candidate gene GJA1 is the most likely target for AD. In summary, identifying AD-related genes contributes to the understanding of AD pathophysiology and the development of new drugs.

2.
Pharmacol Res ; 159: 104932, 2020 09.
Article En | MEDLINE | ID: mdl-32473309

Precision oncology involves effectively selecting drugs for cancer patients and planning an effective treatment regimen. However, for molecularly targeted drugs, using the genomic state of the drug target to select drugs has limitations: many patients who could benefit from molecularly targeted drugs are missed because the existing target genes label patients insufficiently. For non-specific chemotherapy drugs, most first-line anticancer drugs have no biomarkers to guide doctors in choosing a treatment regimen. Furthermore, it is important to determine a long-term treatment plan based on the patient's genomic data during tumor evolution. Therefore, it is necessary to establish a tumor drug sensitivity prediction model that can assist doctors in designing a personalized tumor treatment regimen. This paper proposes a novel model to predict tumor drug sensitivity for both targeted drugs and non-specific chemotherapy drugs. The model uses statistical methods based on the bimodal distribution to select multimodal genetic data, which addresses the dimensionality challenge and reduces noise, and then uses machine learning to build a classification model that predicts the effectiveness of a drug in a tumor cell line. The experiments tested 87 molecularly targeted drugs and non-specific chemotherapy drugs. The results show that the method can effectively predict tumor drug sensitivity, with an average sensitivity of 0.98 and specificity of 0.97. The model merits wider adoption. If it can be successfully used in clinical trials, it will effectively assist doctors in developing personalized cancer treatment programs and expand the application of molecularly targeted drugs.
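The sensitivity and specificity figures quoted in this abstract are ratios over a binary confusion matrix. The following is a minimal sketch of that arithmetic, assuming labels are coded 1 for drug-sensitive and 0 for resistant; the function name and the toy labels are illustrative and do not come from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(sensitivity_specificity(y_true, y_pred))
```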


Antineoplastic Agents/pharmacology , Biomarkers, Tumor/antagonists & inhibitors , Decision Support Techniques , Genomics , Machine Learning , Neoplasms/drug therapy , Precision Medicine , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Cell Line, Tumor , Clinical Decision-Making , Databases, Genetic , Drug Screening Assays, Antitumor , Gene Expression Regulation, Neoplastic , Humans , Models, Statistical , Molecular Targeted Therapy , Neoplasms/genetics , Neoplasms/metabolism , Pharmacogenetics , Signal Transduction
3.
Sci Rep ; 10(1): 5755, 2020 04 01.
Article En | MEDLINE | ID: mdl-32238826

The widespread application of high-throughput sequencing technology has produced a large number of publicly available gene expression datasets. However, because gene expression datasets are characterized by small sample size, high dimensionality and high noise, applying biostatistics and machine learning methods to them is challenging; one symptom is the low reproducibility of important biomarkers across studies. Meta-analysis is an effective approach to these problems, but the current methods have some limitations. In this paper, we propose meta-analysis based on three nonconvex regularization methods: L1/2 regularization (meta-Half), Minimax Concave Penalty regularization (meta-MCP) and Smoothly Clipped Absolute Deviation regularization (meta-SCAD). These three nonconvex regularization methods are effective approaches for variable selection developed in recent years. Through the hierarchical decomposition of coefficients, our methods not only maintain the flexibility of variable selection and improve the efficiency of selecting important biomarkers, but also summarize and synthesize scientific evidence from multiple studies to account for the relationship between different datasets. We give efficient algorithms and theoretical properties for our methods. Furthermore, we apply our methods to simulation data and three publicly available lung cancer gene expression datasets, and compare their performance with state-of-the-art methods. Our methods perform well in simulation studies, and the analysis results on the three lung cancer gene expression datasets are clinically meaningful. Our methods can also be extended to other areas where datasets are heterogeneous.
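For readers unfamiliar with the three penalties named in this abstract, the sketch below evaluates their standard scalar forms for a single coefficient. It assumes the textbook definitions of L1/2, MCP and SCAD; the defaults gamma = 3.0 and a = 3.7 are conventional choices rather than values from the paper, and the hierarchical meta-analysis decomposition itself is not reproduced here.

```python
import math


def half_penalty(t, lam):
    """L1/2 penalty: lam * |t|^(1/2)."""
    return lam * math.sqrt(abs(t))


def mcp_penalty(t, lam, gamma=3.0):
    """Minimax Concave Penalty (requires gamma > 1)."""
    x = abs(t)
    if x <= gamma * lam:
        return lam * x - x * x / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # constant beyond gamma * lam


def scad_penalty(t, lam, a=3.7):
    """Smoothly Clipped Absolute Deviation penalty (requires a > 2)."""
    x = abs(t)
    if x <= lam:
        return lam * x  # behaves like the Lasso near zero
    if x <= a * lam:
        return (2.0 * a * lam * x - x * x - lam * lam) / (2.0 * (a - 1.0))
    return 0.5 * (a + 1.0) * lam * lam  # constant for large coefficients


for t in (0.5, 2.0, 10.0):
    print(t, half_penalty(t, 1.0), mcp_penalty(t, 1.0), scad_penalty(t, 1.0))
```

Unlike the Lasso, MCP and SCAD level off for large coefficients, so large effects are not penalized further as they grow; the L1/2 penalty keeps growing, but only with the square root of the coefficient's magnitude.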


Gene Expression Profiling/methods , Genomics/methods , Machine Learning , Algorithms , Databases, Genetic , Datasets as Topic , Gene Regulatory Networks , Humans
4.
Int J Neural Syst ; 29(9): 1950016, 2019 Nov.
Article En | MEDLINE | ID: mdl-31390912

Molecular descriptor selection is an essential procedure for improving a predictive quantitative structure-activity relationship (QSAR) model. However, a QSAR model typically contains a number of redundant, noisy and irrelevant descriptors. In this study, we propose a novel descriptor selection framework using self-paced learning (SPL) via sparse logistic regression (LR) with a Logsum penalty (SPL-Logsum), which can adaptively identify simple and complex samples while avoiding over-fitting. SPL is inspired by the learning process of humans and animals, which proceeds gradually from simple to complex samples when training models, and the Logsum-penalized LR helps to select a small subset of significant molecular descriptors for improving the QSAR models. Experimental results on simulations and three public QSAR datasets show that our proposed SPL-Logsum framework outperforms other existing sparse methods in terms of area under the curve, sensitivity, specificity, accuracy, and P-values.
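The easy-to-complex idea behind self-paced learning can be sketched with the classic hard-weighting rule, in which a sample joins training only once its loss falls below a growing "age" threshold. This is a generic illustration and not the SPL-Logsum algorithm itself: the function names are hypothetical, and the losses are held fixed here, whereas a real implementation would refit the model and recompute losses each round.

```python
def spl_hard_weights(losses, age):
    """Hard self-paced weights: 1.0 for 'easy' samples (loss < age), else 0.0."""
    return [1.0 if loss < age else 0.0 for loss in losses]


def self_paced_schedule(losses, age, growth, rounds):
    """List the selected sample indices per round as the threshold grows.

    Losses are kept fixed for illustration (an assumption); in practice the
    model is retrained on the selected samples and losses are recomputed.
    """
    schedule = []
    for _ in range(rounds):
        weights = spl_hard_weights(losses, age)
        schedule.append([i for i, w in enumerate(weights) if w == 1.0])
        age *= growth  # admit harder samples in the next round
    return schedule


print(self_paced_schedule([0.1, 0.5, 2.0], age=0.3, growth=2.0, rounds=3))
```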


Models, Biological , Quantitative Structure-Activity Relationship , Computer Simulation , Databases, Factual , Humans , Machine Learning
5.
J Chem Inf Model ; 59(7): 3340-3351, 2019 07 22.
Article En | MEDLINE | ID: mdl-31260620

Identifying drug-target interactions (DTIs) plays an important role in drug discovery, drug side-effect prediction, and drug repositioning. However, in vivo or biochemical experimental methods for identifying new DTIs are extremely expensive and time-consuming. Recently, various in silico (computational) methods have been developed for DTI prediction, such as ligand-based approaches and docking approaches, but these traditional computational methods have several limitations. This work uses a chemogenomic approach to efficiently identify potential DTI candidates: self-paced learning with collaborative matrix factorization based on weighted low-rank approximation (SPLCMF), which integrates multiple networks related to drugs and targets into regularized least-squares and focuses on learning a low-dimensional vector representation of features. The SPLCMF framework brings samples into training from easy to complex by using soft weighting, which tends to reflect the latent importance of samples in training more faithfully. Experimental results on synthetic data and five benchmark datasets show that our proposed SPLCMF outperforms other existing state-of-the-art approaches. These results indicate that SPLCMF can provide a useful tool to predict unknown DTIs, which may provide new insights into drug discovery, drug side-effect prediction, and the repositioning of existing drugs.
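Soft weighting, as mentioned in this abstract, replaces an in-or-out decision per sample with a weight between 0 and 1. The sketch below shows one common scheme, linear soft weighting, in which the weight falls linearly with the loss and reaches zero at the threshold. Whether SPLCMF uses exactly this scheme is an assumption; the function name is hypothetical.

```python
def spl_linear_soft_weights(losses, age):
    """Linear soft self-paced weights: max(0, 1 - loss / age).

    Minimizes v * loss + age * (0.5 * v**2 - v) over v in [0, 1] for each
    sample with a non-negative loss; requires age > 0.
    """
    return [max(0.0, 1.0 - loss / age) for loss in losses]


# Samples with small loss get weights near 1; those at or above the
# threshold are excluded (weight 0).
print(spl_linear_soft_weights([0.0, 0.5, 1.0, 2.0], age=1.0))
```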


Computational Biology/methods , Drug Discovery , Drug Repositioning , Drug-Related Side Effects and Adverse Reactions , Machine Learning
6.
Sci Rep ; 9(1): 8802, 2019 06 19.
Article En | MEDLINE | ID: mdl-31217424

The blood-brain barrier (BBB) is a strict permeability barrier that maintains central nervous system (CNS) homeostasis. One of the most important criteria for judging a CNS drug is whether it can permeate the BBB. Over the past 20 years, prediction approaches have usually been based on the physical characteristics and chemical structures of drugs. However, these methods are usually applicable only to small-molecule compounds that cross the BBB by passive diffusion. To deal with this problem, one of the best-known methods is the multi-core SVM method, which predicts drug penetration of the BBB from clinical phenotypes, namely drug side effects and drug indications. This paper proposes a deep learning method to predict BBB permeability based on clinical phenotype data. Validation results on three datasets show that the deep learning method achieves better performance than the other existing methods. The average accuracy of our method reaches 0.97, the AUC reaches 0.98, and the F1 score is 0.92. The results indicate that deep learning methods can significantly improve the prediction accuracy of drug BBB permeability and can help researchers reduce clinical trials and find new CNS drugs.


Blood-Brain Barrier/physiology , Deep Learning , Pharmaceutical Preparations/classification , Databases as Topic , Humans , ROC Curve , Reproducibility of Results , Support Vector Machine
7.
Int J Mol Sci ; 19(1), 2017 Dec 22.
Article En | MEDLINE | ID: mdl-29271922

The quantitative structure-activity relationship (QSAR) model searches for a reliable relationship between chemical structure and biological activity in the field of drug design and discovery. (1) Background: In the study of QSAR, the chemical structures of compounds are encoded by a substantial number of descriptors. Redundant, noisy and irrelevant descriptors adversely affect the QSAR model. Meanwhile, too many descriptors can result in overfitting or a low correlation between chemical structure and biological activity. (2) Methods: We use novel log-sum regularization to select quite a few descriptors that are relevant to biological activities. In addition, a coordinate descent algorithm, which uses novel univariate log-sum thresholding for updating the estimated coefficients, has been developed for the QSAR model. (3) Results: Experimental results on artificial data and four QSAR datasets demonstrate that our proposed log-sum method performs well compared with state-of-the-art methods. (4) Conclusions: Our proposed multiple linear regression with log-sum penalty is an effective technique for both descriptor selection and prediction of biological activity.
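The univariate update mentioned in the Methods can be illustrated by solving the one-dimensional problem: minimize 0.5 * (b - z)^2 + lam * log(|b| + eps) over b. The sketch below derives the minimizer directly from the stationarity condition and compares it with b = 0. It assumes this common form of the log-sum penalty with eps > 0 and is not necessarily the exact thresholding operator of the paper; the function name is hypothetical.

```python
import math


def logsum_threshold(z, lam, eps):
    """Minimize 0.5 * (b - z)**2 + lam * log(|b| + eps) over b (eps > 0)."""
    sign = 1.0 if z >= 0 else -1.0
    a = abs(z)  # by symmetry the minimizer has the same sign as z

    def objective(b):
        return 0.5 * (b - a) ** 2 + lam * math.log(b + eps)

    best_b, best_val = 0.0, objective(0.0)
    # Stationary points on b > 0 solve b**2 + (eps - a) * b + (lam - a * eps) = 0.
    disc = (a + eps) ** 2 - 4.0 * lam
    if disc >= 0:
        root = ((a - eps) + math.sqrt(disc)) / 2.0  # larger root = local minimum
        if root > 0 and objective(root) < best_val:
            best_b = root
    return sign * best_b


# Small inputs are set exactly to zero; large inputs are shrunk only slightly.
print(logsum_threshold(0.1, 0.1, 0.01), logsum_threshold(5.0, 0.1, 0.01))
```

In a coordinate descent loop, z would be the least-squares update for one coefficient with the others held fixed, and this function would map it to the penalized estimate.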


Algorithms , Drug Design , Quantitative Structure-Activity Relationship , Animals , Computer Simulation , Humans , Linear Models , Models, Biological
8.
Sci Rep ; 7(1): 13053, 2017 10 12.
Article En | MEDLINE | ID: mdl-29026100

Gene selection is an attractive and important task in cancer survival analysis. Most existing supervised learning methods can use only the labeled biological data, while the censored (weakly labeled) data, which are far more plentiful than the labeled data, are ignored in model building. To use the information in the censored data, a semi-supervised learning framework (the Cox-AFT model) combining the Cox proportional hazards (Cox) and accelerated failure time (AFT) models has been used in cancer research and performs better than either the Cox or the AFT model alone. This method, however, is easily affected by noise. To alleviate this problem, in this paper we combine the Cox-AFT model with the self-paced learning (SPL) method to employ the information in the censored data more effectively in a self-learning way. SPL is a reliable and stable learning mechanism, recently proposed to simulate the human learning process, that helps the AFT model automatically identify high-confidence samples and include them in training, minimizing interference from high noise. Using the SPL method produces two direct advantages: (1) the utilization of censored data is further promoted; (2) the noise delivered to the model is greatly decreased. The experimental results demonstrate the effectiveness of the proposed model compared with the traditional Cox-AFT model.


Neoplasms/mortality , Supervised Machine Learning , Survival Analysis , Algorithms , Humans , Proportional Hazards Models
9.
Biomed Mater Eng ; 26 Suppl 1: S1837-43, 2015.
Article En | MEDLINE | ID: mdl-26405955

Tuberculosis (TB), caused by infection with Mycobacterium tuberculosis, is still a major threat to human health worldwide. Current diagnostic methods have limitations, such as sample collection problems or unsatisfactory sensitivity and specificity. Moreover, it is hard to distinguish TB from some other lung diseases without invasive biopsy. In this paper, logistic models with three representative regularization approaches, namely Lasso (the most popular regularization method), L1/2 (which tends to give sparser solutions than Lasso) and Elastic Net (which encourages a grouping effect of genes in the results), are used together to select common gene signatures in microarray data of peripheral blood cells. As a result, 13 common gene signatures were selected, and a classifier based on them was then constructed with the SVM approach, which can accurately distinguish tuberculosis from other pulmonary diseases and healthy controls. On the test and validation datasets of blood gene expression profiles, the classification model achieved 91.86% sensitivity and 93.48% specificity on average. Its sensitivity is improved by 6% while using only 26% of the gene signatures compared with recent research results. These 13 gene signatures selected by our methods can be used as the basis of a blood-based test for distinguishing TB from other pulmonary diseases and healthy controls.


Blood Proteins/analysis , Diagnosis, Computer-Assisted/methods , Gene Expression Profiling/methods , Pattern Recognition, Automated/methods , Tuberculosis, Pulmonary/blood , Tuberculosis, Pulmonary/diagnosis , Algorithms , Biomarkers/blood , Humans , Logistic Models , Lung Diseases/blood , Lung Diseases/diagnosis , Reproducibility of Results , Sensitivity and Specificity , Support Vector Machine
...