Search | BVS Enfermagem Search Portal

1.

Improving bulk RNA-seq classification by transferring gene signature from single cells in acute myeloid leukemia.

Wang, Ran; Zheng, Xubin; Wang, Jun; Wan, Shibiao; Song, Fangda; Wong, Man Hon; Leung, Kwong Sak; Cheng, Lixin.

Brief Bioinform ; 23(2), 2022 03 10.

Article in English | MEDLINE | ID: mdl-35136933

ABSTRACT

The advances in single-cell RNA sequencing (scRNA-seq) technologies enable the characterization of transcriptomic profiles at the cellular level and demonstrate great promise in bulk sample analysis, thereby offering opportunities to transfer gene signatures from scRNA-seq to bulk data. However, the gene expression signatures identified from single cells are typically inapplicable to bulk RNA-seq data due to the profiling differences of distinct sequencing technologies. Here, we propose single-cell pair-wise gene expression (scPAGE), a novel method to develop single-cell gene pair signatures (scGPSs) that were beneficial to bulk RNA-seq classification to transfer knowledge across platforms. PAGE was adopted to tackle the challenge of profiling differences. We applied the method to acute myeloid leukemia (AML) and identified the scGPS from mouse scRNA-seq that allowed discriminating between AML and control cells. The scGPS was validated in bulk RNA-seq datasets and demonstrated better performance (average area under the curve [AUC] = 0.96) than the conventional gene expression strategies (average AUC ≤ 0.88), suggesting its potential in disclosing the molecular mechanism of AML. The scGPS also outperformed its bulk counterpart, which highlighted the benefit of gene signature transfer. Furthermore, we confirmed the utility of scPAGE in sepsis as an example of other disease scenarios. scPAGE leveraged the advantages of single-cell profiles to enhance the analysis of bulk samples, revealing great potential of transferring knowledge from single-cell to bulk transcriptome studies.
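
The pairwise idea in this abstract can be illustrated with a minimal sketch. This is not the authors' scPAGE implementation: the gene names, the expression values, and the `pair_features` helper are all hypothetical. It only shows why within-sample comparisons of two genes can survive a platform shift: any strictly increasing transformation of a sample's values leaves every pairwise ordering unchanged.

```python
import math

def pair_features(profile, pairs):
    """Return 1 where gene a is expressed above gene b in this sample, else 0."""
    return [1 if profile[a] > profile[b] else 0 for a, b in pairs]

# Hypothetical expression values for one sample (arbitrary units).
sample = {"G1": 5.0, "G2": 1.2, "G3": 7.5, "G4": 0.3}
pairs = [("G1", "G2"), ("G4", "G3"), ("G3", "G1")]

# Simulate a second platform with a strictly increasing rescaling
# (an assumption made purely for illustration).
shifted = {gene: 3 * math.log2(value + 1) for gene, value in sample.items()}

features = pair_features(sample, pairs)
features_shifted = pair_features(shifted, pairs)
```

Absolute values differ between `sample` and `shifted`, but the binary pair features are identical, which is the property a pair-based signature relies on.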

Subjects

Acute Myeloid Leukemia , Single-Cell Analysis , Animals , Gene Expression Profiling/methods , Acute Myeloid Leukemia/genetics , Mice , RNA-Seq , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Transcriptome

2.

Deciphering associations between gut microbiota and clinical factors using microbial modules.

Wang, Ran; Zheng, Xubin; Song, Fangda; Wong, Man Hon; Leung, Kwong Sak; Cheng, Lixin.

Bioinformatics ; 39(5), 2023 05 04.

Article in English | MEDLINE | ID: mdl-37084255

ABSTRACT

MOTIVATION: Human gut microbiota plays a vital role in maintaining body health. The dysbiosis of gut microbiota is associated with a variety of diseases. It is critical to uncover the associations between gut microbiota and disease states as well as other intrinsic or environmental factors. However, inferring alterations of individual microbial taxa based on relative abundance data likely leads to false associations and conflicting discoveries in different studies. Moreover, the effects of underlying factors and microbe-microbe interactions could lead to the alteration of larger sets of taxa. It might be more robust to investigate gut microbiota using groups of related taxa instead of the composition of individual taxa. RESULTS: We proposed a novel method to identify underlying microbial modules, i.e. groups of taxa with similar abundance patterns affected by a common latent factor, from longitudinal gut microbiota and applied it to inflammatory bowel disease (IBD). The identified modules demonstrated closer intragroup relationships, indicating potential microbe-microbe interactions and influences of underlying factors. Associations between the modules and several clinical factors were investigated, especially disease states. The IBD-associated modules performed better in stratifying the subjects compared with the relative abundance of individual taxa. The modules were further validated in external cohorts, demonstrating the efficacy of the proposed method in identifying general and robust microbial modules. The study reveals the benefit of considering the ecological effects in gut microbiota analysis and the great promise of linking clinical factors with underlying microbial modules. AVAILABILITY AND IMPLEMENTATION: https://github.com/rwang-z/microbial_module.git.

Subjects

Gastrointestinal Microbiome , Inflammatory Bowel Diseases , Humans , Microbial Interactions

3.

bvnGPS: a generalizable diagnostic model for acute bacterial and viral infection using integrative host transcriptomics and pretrained neural networks.

Li, Qizhi; Zheng, Xubin; Xie, Jize; Wang, Ran; Li, Mengyao; Wong, Man-Hon; Leung, Kwong-Sak; Li, Shuai; Geng, Qingshan; Cheng, Lixin.

Bioinformatics ; 39(3), 2023 03 01.

Article in English | MEDLINE | ID: mdl-36857587

ABSTRACT

MOTIVATION: The confusion of acute inflammation infected by virus and bacteria or noninfectious inflammation will lead to missing the best therapy occasion resulting in poor prognoses. The diagnostic model based on host gene expression has been widely used to diagnose acute infections, but the clinical usage was hindered by the capability across different samples and cohorts due to the small sample size for signature training and discovery. RESULTS: Here, we construct a large-scale dataset integrating multiple host transcriptomic data and analyze it using a sophisticated strategy which removes batch effect and extracts the common information from different cohorts based on the relative expression alteration of gene pairs. We assemble 2680 samples across 16 cohorts and separately build gene pair signature (GPS) for bacterial, viral, and noninfected patients. The three GPSs are further assembled into an antibiotic decision model (bacterial-viral-noninfected GPS, bvnGPS) using multiclass neural networks, which is able to determine whether a patient is bacterial infected, viral infected, or noninfected. bvnGPS can distinguish bacterial infection with area under the receiver operating characteristic curve (AUC) of 0.953 (95% confidence interval, 0.948-0.958) and viral infection with AUC of 0.956 (0.951-0.961) in the test set (N = 760). In the validation set (N = 147), bvnGPS also shows strong performance by attaining an AUC of 0.988 (0.978-0.998) on bacterial-versus-other and an AUC of 0.994 (0.984-1.000) on viral-versus-other. bvnGPS has the potential to be used in clinical practice and the proposed procedure provides insight into data integration, feature selection and multiclass classification for host transcriptomics data. AVAILABILITY AND IMPLEMENTATION: The codes implementing bvnGPS are available at https://github.com/Ritchiegit/bvnGPS. 
The iPAGE algorithm was constructed and the neural network was trained in Python 3.7 with Scikit-learn 0.24.1 and PyTorch 1.7. The results were visualized with R 4.2, Python 3.7, and Matplotlib 3.3.4.
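
The "bacterial-versus-other" and "viral-versus-other" AUC values reported above are one-vs-rest evaluations of a multiclass model. The following is a minimal sketch of that evaluation, not code from the bvnGPS repository; the labels, the scores, and both helper functions are hypothetical. AUC is computed here as the probability that a randomly chosen positive sample scores above a randomly chosen negative one, with ties counted as one half.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

def one_vs_rest_auc(labels, scores, positive):
    """AUC for one class against all other classes pooled together."""
    pos = [s for label, s in zip(labels, scores) if label == positive]
    neg = [s for label, s in zip(labels, scores) if label != positive]
    return auc(pos, neg)

# Hypothetical class labels and model scores for the "bacterial" class.
labels = ["bacterial", "viral", "noninfected", "bacterial", "viral"]
bacterial_scores = [0.9, 0.2, 0.1, 0.6, 0.7]

bacterial_auc = one_vs_rest_auc(labels, bacterial_scores, "bacterial")
```

In this toy example five of the six positive-negative pairs are ordered correctly, so `bacterial_auc` is 5/6.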

Subjects

Transcriptome , Virus Diseases , Humans , Computer Neural Networks , Bacteria , Virus Diseases/diagnosis , Virus Diseases/genetics , Inflammation

4.

A network-based algorithm for the identification of moonlighting noncoding RNAs and its application in sepsis.

Liu, Xueyan; Xu, Yong; Wang, Ran; Liu, Sheng; Wang, Jun; Luo, YongLun; Leung, Kwong-Sak; Cheng, Lixin.

Brief Bioinform ; 22(1): 581-588, 2021 01 18.

Article in English | MEDLINE | ID: mdl-32003790

ABSTRACT

Moonlighting proteins provide more options for cells to execute multiple functions without increasing the genome and transcriptome complexity. Although there have long been calls for computational methods for the prediction of moonlighting proteins, no method has been designed for determining moonlighting long noncoding ribonucleic acids (RNAs) (mlncRNAs). Previously, we developed an algorithm MoonFinder for the identification of mlncRNAs at the genome level based on the functional annotation and interactome data of lncRNAs and proteins. Here, we update MoonFinder to MoonFinder v2.0 by providing an extensive framework for the detection of protein modules and the establishment of RNA-module associations in human. A novel measure, moonlighting coefficient, was also proposed to assess the confidence of an ncRNA acting in a moonlighting manner. Moreover, we explored the expression characteristics of mlncRNAs in sepsis, in which we found that mlncRNAs tend to be upregulated and differentially expressed. Interestingly, the mlncRNAs are mutually exclusive in terms of coexpression when compared to the other lncRNAs. Overall, MoonFinder v2.0 is dedicated to the prediction of human mlncRNAs and thus bears great promise to serve as a valuable R package for worldwide research communities (https://cran.r-project.org/web/packages/MoonFinder/index.html). Also, our analyses provide the first attempt to characterize mlncRNA expression and coexpression properties in adult sepsis patients, which will facilitate the understanding of the interaction and expression patterns of mlncRNAs.

Subjects

Gene Regulatory Networks , Genomics/methods , Long Noncoding RNA/genetics , Sepsis/genetics , Humans , Protein Interaction Maps , Proteome/genetics , Proteome/metabolism , Long Noncoding RNA/metabolism , Sepsis/metabolism , Software

5.

Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark.

Li, Hongjian; Lu, Gang; Sze, Kam-Heung; Su, Xianwei; Chan, Wai-Yee; Leung, Kwong-Sak.

Brief Bioinform ; 22(6), 2021 11 05.

Article in English | MEDLINE | ID: mdl-34169324

ABSTRACT

The superior performance of machine-learning scoring functions for docking has caused a series of debates on whether it is due to learning knowledge from training data that are similar in some sense to the test data. With a systematically revised methodology and a blind benchmark realistically mimicking the process of prospective prediction of binding affinity, we have evaluated three broadly used classical scoring functions and five machine-learning counterparts calibrated with both random forest and extreme gradient boosting using both solo and hybrid features, showing for the first time that machine-learning scoring functions trained exclusively on a proportion of as low as 8% complexes dissimilar to the test set already outperform classical scoring functions, a percentage that is far lower than what has been recently reported on all the three CASF benchmarks. The performance of machine-learning scoring functions is underestimated due to the absence of similar samples in some artificially created training sets that discard the full spectrum of complexes to be found in a prospective environment. Given the inevitability of any degree of similarity contained in a large dataset, the criteria for scoring function selection depend on which one can make the best use of all available materials. Software code and data are provided at https://github.com/cusdulab/MLSF for interested readers to rapidly rebuild the scoring functions and reproduce our results, even to make extended analyses on their own benchmarks.

Subjects

Benchmarking/methods , Machine Learning , Statistical Models , Algorithms , Benchmarking/standards , Factual Databases , Ligands , Molecular Models , Molecular Conformation , Protein Binding , Regression Analysis , Reproducibility of Results , Workflow

6.

meGPS: a multi-omics signature for hepatocellular carcinoma detection integrating methylome and transcriptome data.

Wu, Qiong; Zheng, Xubin; Leung, Kwong-Sak; Wong, Man-Hon; Tsui, Stephen Kwok-Wing; Cheng, Lixin.

Bioinformatics ; 38(14): 3513-3522, 2022 07 11.

Article in English | MEDLINE | ID: mdl-35674358

ABSTRACT

MOTIVATION: Hepatocellular carcinoma (HCC) is a primary malignancy with a poor prognosis. Recently, multi-omics molecular-level measurement enables HCC diagnosis and prognosis prediction, which is crucial for early intervention of personalized therapy to diminish mortality. Here, we introduce a novel strategy utilizing DNA methylation and RNA expression data to achieve a multi-omics gene pair signature (GPS) for HCC discrimination. RESULTS: The immune genes with negative correlations between expression and promoter methylation are enriched in the highly connected cancer-related pathway network, which are considered as the candidates for HCC detection. After that, we separately construct a methylation GPS (mGPS) and an expression GPS (eGPS), and then assemble them as a meGPS with five gene pairs, in which the significant methylation and expression changes occur between HCC tumor and non-tumor groups. Reliable performance has been validated by independent tissue (age, gender and etiology) and blood datasets. This study proposes a procedure for multi-omics GPS identification and develops a novel HCC signature using both methylome and transcriptome data, suggesting potential molecular targets for the detection and therapy of HCC. AVAILABILITY AND IMPLEMENTATION: Models are available at https://github.com/bioinformaticStudy/meGPS.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Hepatocellular Carcinoma , Liver Neoplasms , Humans , Hepatocellular Carcinoma/diagnosis , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/metabolism , Transcriptome , Liver Neoplasms/diagnosis , Liver Neoplasms/genetics , Liver Neoplasms/metabolism , Epigenome , DNA Methylation , Tumor Biomarkers/genetics , Neoplastic Gene Expression Regulation

7.

Systematic prediction of autophagy-related proteins using Arabidopsis thaliana interactome data.

Cheng, Lixin; Zeng, Yonglun; Hu, Shuai; Zhang, Ning; Cheung, Kenneth C P; Li, Baiying; Leung, Kwong-Sak; Jiang, Liwen.

Plant J ; 105(3): 708-720, 2021 02.

Article in English | MEDLINE | ID: mdl-33128829

ABSTRACT

Autophagy is a self-degradative process that is crucial for maintaining cellular homeostasis by removing damaged cytoplasmic components and recycling nutrients. Such an evolutionarily conserved proteolysis process is regulated by the autophagy-related (Atg) proteins. The incomplete understanding of the plant autophagy proteome and the importance of a proteome-wide understanding of the autophagy pathway prompted us to predict Atg proteins and regulators in Arabidopsis. Here, we developed a systems-level algorithm to identify autophagy-related modules (ARMs) based on protein subcellular localization, protein-protein interactions, and known Atg proteins. This generates a detailed landscape of the autophagic modules in Arabidopsis. We found that the newly identified genes in each ARM tend to be upregulated and coexpressed during the senescence stage of Arabidopsis. We also demonstrated that the Golgi apparatus ARM, ARM13, functions in the autophagy process by module clustering and functional analysis. To verify the in silico analysis, the Atg candidates in ARM13 that are functionally similar to the core Atg proteins were selected for experimental validation. Interestingly, two of the previously uncharacterized proteins identified from the ARM analysis, AGD1 and Sec14, exhibited bona fide association with the autophagy protein complex in plant cells, which provides evidence for a cross-talk between intracellular pathways and autophagy. Thus, the computational framework has facilitated the identification and characterization of plant-specific autophagy-related proteins and novel autophagy proteins/regulators in higher eukaryotes.

Subjects

Arabidopsis/metabolism , Autophagy-Related Proteins/metabolism , Autophagy/physiology , Algorithms , Arabidopsis/cytology , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Autophagy-Related Protein 8 Family/genetics , Autophagy-Related Protein 8 Family/metabolism , Autophagy-Related Proteins/genetics , Beclin-1/genetics , Beclin-1/metabolism , Computational Biology/methods , Plant Gene Expression Regulation , Reproducibility of Results

8.

Long non-coding RNA pairs to assist in diagnosing sepsis.

Zheng, Xubin; Leung, Kwong-Sak; Wong, Man-Hon; Cheng, Lixin.

BMC Genomics ; 22(1): 275, 2021 Apr 16.

Article in English | MEDLINE | ID: mdl-33863291

ABSTRACT

BACKGROUND: Sepsis is the major cause of death in Intensive Care Units (ICUs) globally. Molecular detection enables rapid diagnosis that allows early intervention to minimize the death rate. Recent studies showed that long non-coding RNAs (lncRNAs) regulate proinflammatory genes and are related to the dysfunction of organs in sepsis. Identifying an lncRNA signature with absolute abundance is challenging because of the technical variation and the systematic experimental bias. RESULTS: Cohorts (n = 768) containing whole blood lncRNA profiling of sepsis patients in the Gene Expression Omnibus (GEO) database were included. We proposed a novel diagnostic strategy that made use of the relative expressions of lncRNA pairs, which are reversed between sepsis patients and normal controls (e.g. lncRNAi > lncRNAj in sepsis patients and lncRNAi < lncRNAj in normal controls), to identify 14 lncRNA pairs as a sepsis diagnostic signature (SepSigLnc). The signature was then applied to independent cohorts (n = 644) to evaluate its predictive performance across different ages and normalization methods. Compared to common machine learning models and existing signatures, SepSigLnc consistently attains better performance on the validation cohorts from the same age group (AUC = 0.990 & 0.995 in two cohorts) and across different groups (AUC = 0.878 on average), as well as cohorts processed by an alternative normalization method (AUC = 0.953 on average). Functional analysis demonstrates that the lncRNA pairs in SepSigLnc are functionally similar and tend to be implicated in the same biological processes, including cell fate commitment and cellular response to steroid hormone stimulus. CONCLUSION: Our study identified 14 lncRNA pairs as a signature that can facilitate the diagnosis of septic patients at an intervenable point when clinical manifestations are not dramatic. Also, the computational procedure can be generalized to a standard procedure for discovering diagnostic molecule signatures.
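
The reversed-pair criterion described in the RESULTS can be sketched as follows. This is an illustrative toy, not the published procedure: the lncRNA names, the expression values, the `threshold` parameter, and both helper functions are assumptions. A pair (a, b) is kept when a is expressed above b in most cases and below b in most controls.

```python
from itertools import combinations

def fraction_greater(samples, a, b):
    """Fraction of samples in which feature a is expressed above feature b."""
    return sum(1 for s in samples if s[a] > s[b]) / len(samples)

def reversed_pairs(cases, controls, threshold=0.8):
    """Ordered pairs (a, b) with a > b in most cases and a < b in most controls."""
    names = sorted(cases[0])
    selected = []
    for a, b in combinations(names, 2):
        # Check both orientations of each unordered pair.
        for x, y in ((a, b), (b, a)):
            up_in_cases = fraction_greater(cases, x, y)
            down_in_controls = fraction_greater(controls, y, x)
            if up_in_cases >= threshold and down_in_controls >= threshold:
                selected.append((x, y))
    return selected

# Hypothetical toy profiles; names and values are illustrative only.
cases = [{"lncA": 9, "lncB": 2, "lncC": 5}, {"lncA": 8, "lncB": 3, "lncC": 4}]
controls = [{"lncA": 1, "lncB": 6, "lncC": 5}, {"lncA": 2, "lncB": 7, "lncC": 4}]

signature = reversed_pairs(cases, controls)
```

Because only the ordering within each sample is used, the selection does not depend on the absolute scale of the measurements, which is the motivation the abstract gives for avoiding absolute abundance.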

Subjects

Brain Neoplasms/mortality , Long Noncoding RNA/genetics , Sepsis/diagnosis , Cohort Studies , Humans , Sepsis/genetics

9.

Discovery of vanoxerine dihydrochloride as a CDK2/4/6 triple-inhibitor for the treatment of human hepatocellular carcinoma.

Zhu, Ying; Ke, Kun-Bin; Xia, Zhong-Kun; Li, Hong-Jian; Su, Rong; Dong, Chao; Zhou, Feng-Mei; Wang, Lin; Chen, Rong; Wu, Shi-Guo; Zhao, Hui; Gu, Peng; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Zhang, Jian-Ying; Jiang, Bing-Hua; Qiu, Jian-Ge; Shi, Xi-Nan; Lin, Marie Chia-Mi.

Mol Med ; 27(1): 15, 2021 02 12.

Article in English | MEDLINE | ID: mdl-33579185

ABSTRACT

BACKGROUND: Cyclin-dependent kinases 2/4/6 (CDK2/4/6) play critical roles in cell cycle progression, and their deregulations are hallmarks of hepatocellular carcinoma (HCC). METHODS: We used a combination of computational and experimental approaches to discover a CDK2/4/6 triple-inhibitor from FDA approved small-molecule drugs for the treatment of HCC. RESULTS: We identified vanoxerine dihydrochloride as a new CDK2/4/6 inhibitor, and a strong cytotoxic drug in human HCC QGY7703 and Huh7 cells (IC50: 3.79 µM for QGY7703 and 4.04 µM for Huh7 cells). In QGY7703 and Huh7 cells, vanoxerine dihydrochloride treatment caused G1-arrest, induced apoptosis, and reduced the expressions of CDK2/4/6, cyclin D/E, retinoblastoma protein (Rb), as well as the phosphorylation of CDK2/4/6 and Rb. A drug combination study indicated that vanoxerine dihydrochloride and 5-Fu produced synergistic cytotoxicity in vitro in Huh7 cells. Finally, in an in vivo study in BALB/C nude mice subcutaneously xenografted with Huh7 cells, vanoxerine dihydrochloride (40 mg/kg, i.p.) injection for 21 days produced significant anti-tumor activity (p < 0.05), which was comparable to that achieved by 5-Fu (10 mg/kg, i.p.), with the combination treatment resulting in a synergistic effect. Immunohistochemistry staining of the tumor tissues also revealed significantly reduced expressions of Rb and CDK2/4/6 in the vanoxerine dihydrochloride treatment group. CONCLUSIONS: The present study is the first report identifying a new CDK2/4/6 triple inhibitor, vanoxerine dihydrochloride, and demonstrates that this drug represents a novel therapeutic strategy for HCC treatment.

Subjects

Hepatocellular Carcinoma/drug therapy , Cyclin-Dependent Kinase 2/antagonists & inhibitors , Cyclin-Dependent Kinase 4/antagonists & inhibitors , Cyclin-Dependent Kinase 6/antagonists & inhibitors , Fluorouracil/administration & dosage , Liver Neoplasms/drug therapy , Piperazines/administration & dosage , Animals , Hepatocellular Carcinoma/metabolism , Tumor Cell Line , Cell Proliferation/drug effects , Cell Survival/drug effects , Cyclin-Dependent Kinase 2/metabolism , Cyclin-Dependent Kinase 4/metabolism , Cyclin-Dependent Kinase 6/metabolism , Down-Regulation , Drug Synergism , Female , Fluorouracil/pharmacology , Neoplastic Gene Expression Regulation/drug effects , Humans , Subcutaneous Injections , Liver Neoplasms/metabolism , Mice , Inbred BALB C Mice , Nude Mice , Phosphorylation/drug effects , Piperazines/pharmacology , Xenograft Model Antitumor Assays

10.

Probabilistic Contextual and Structural Dependencies Learning in Grammar-Based Genetic Programming.

Wong, Pak-Kan; Wong, Man-Leung; Leung, Kwong-Sak.

Evol Comput ; 29(2): 239-268, 2021 Jun 01.

Article in English | MEDLINE | ID: mdl-33047611

ABSTRACT

Genetic Programming is a method to automatically create computer programs based on the principles of evolution. The problem of deceptiveness caused by complex dependencies among components of programs is challenging. It is important because it can misguide Genetic Programming to create suboptimal programs. Besides, a minor modification in the programs may lead to a notable change in the program behaviours and affect the final outputs. This article presents Grammar-Based Genetic Programming with Bayesian Classifiers (GBGPBC) in which the probabilistic dependencies among components of programs are captured using a set of Bayesian network classifiers. Our system was evaluated using a set of benchmark problems (the deceptive maximum problems, the royal tree problems, and the bipolar asymmetric royal tree problems). It was shown to be often more robust and more efficient in searching for the best programs than other related Genetic Programming approaches in terms of the total number of fitness evaluations. We studied what factors affect the performance of GBGPBC and discovered that robust variants of GBGPBC were consistently weakly correlated with some complexity measures. Furthermore, our approach has been applied to learn a ranking program on a set of customers in direct marketing. Our suggested solutions help companies to earn significantly more when compared with other solutions produced by several well-known machine learning algorithms, such as neural networks, logistic regression, and Bayesian networks.

Subjects

Algorithms , Computer Neural Networks , Bayes Theorem , Machine Learning , Software

11.

Classical scoring functions for docking are unable to exploit large volumes of structural and interaction data.

Li, Hongjian; Peng, Jiangjun; Sidorov, Pavel; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J.

Bioinformatics ; 35(20): 3989-3995, 2019 10 15.

Article in English | MEDLINE | ID: mdl-30873528

ABSTRACT

MOTIVATION: Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when only trained on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size and to what extent they learn from dissimilar or similar training complexes. RESULTS: We present a systematic study to investigate how the accuracy of classical and machine-learning SFs varies with protein-ligand complex similarities between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes to the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from dissimilar training complexes to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the rest of SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing. AVAILABILITY AND IMPLEMENTATION: https://github.com/HongjianLi/MLSF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
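
The experimental design described here, training on complexes of increasing similarity to the test set, can be sketched with a simple filter. This is a hedged illustration rather than the protocol from the MLSF repository: the complex identifiers, the similarity values, the cutoffs, and the `nested_training_sets` helper are hypothetical. Each training complex is assumed to carry one number, its maximum similarity to any test complex under a chosen metric.

```python
def nested_training_sets(similarity_to_test, cutoffs):
    """For each cutoff, keep the training complexes whose maximum similarity
    to any test complex does not exceed that cutoff."""
    return {
        cutoff: sorted(
            name for name, sim in similarity_to_test.items() if sim <= cutoff
        )
        for cutoff in cutoffs
    }

# Hypothetical complex IDs with similarity values in [0, 1].
similarity_to_test = {"cplx1": 0.15, "cplx2": 0.40, "cplx3": 0.75, "cplx4": 0.98}

subsets = nested_training_sets(similarity_to_test, cutoffs=[0.2, 0.5, 1.0])
```

Raising the cutoff yields nested training sets that start with only the most dissimilar complexes and end with the full set, so a scoring function can be retrained and compared at each step.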

Subjects

Machine Learning , Ligands , Protein Binding , Proteins

12.

Predicting associations among drugs, targets and diseases by tensor decomposition for drug repositioning.

Wang, Ran; Li, Shuai; Cheng, Lixin; Wong, Man Hon; Leung, Kwong Sak.

BMC Bioinformatics ; 20(Suppl 26): 628, 2019 Dec 16.

Article in English | MEDLINE | ID: mdl-31839008

ABSTRACT

BACKGROUND: Development of new drugs is a time-consuming and costly process, and the cost is still increasing in recent years. However, the number of drugs approved by FDA every year per dollar spent on development is declining. Drug repositioning, which aims to find new use of existing drugs, attracts attention of pharmaceutical researchers due to its high efficiency. A variety of computational methods for drug repositioning have been proposed based on machine learning approaches, network-based approaches, matrix decomposition approaches, etc. RESULTS: We propose a novel computational method for drug repositioning. We construct and decompose three-dimensional tensors, which consist of the associations among drugs, targets and diseases, to derive latent factors reflecting the functional patterns of the three kinds of entities. The proposed method outperforms several baseline methods in recovering missing associations. Most of the top predictions are validated by literature search and computational docking. Latent factors are used to cluster the drugs, targets and diseases into functional groups. Topological Data Analysis (TDA) is applied to investigate the properties of the clusters. We find that the latent factors are able to capture the functional patterns and underlying molecular mechanisms of drugs, targets and diseases. In addition, we focus on repurposing drugs for cancer and discover not only new therapeutic use but also adverse effects of the drugs. In the in-depth study of associations among the clusters of drugs, targets and cancer subtypes, we find there exist strong associations between particular clusters. CONCLUSIONS: The proposed method is able to recover missing associations, discover new predictions and uncover functional clusters of drugs, targets and diseases. The clustering of drugs, targets and diseases, as well as the associations among the clusters, provides a new guiding framework for drug repositioning.
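
How latent factors can score a drug-target-disease triple may be easier to see in code. The abstract does not say which decomposition was used, so the sketch below assumes a CP (CANDECOMP/PARAFAC) model purely for illustration; the rank, the factor values, and the `cp_score` helper are hypothetical. Under that assumption, the predicted strength of a triple is the sum over components of the product of the three entities' latent factors.

```python
def cp_score(drug_vec, target_vec, disease_vec):
    """Predicted association strength under an assumed rank-R CP model:
    sum over components r of drug[r] * target[r] * disease[r]."""
    return sum(a * b * c for a, b, c in zip(drug_vec, target_vec, disease_vec))

# Hypothetical rank-2 latent factors (values are illustrative only).
drug = [1.0, 0.5]
target = [2.0, 0.0]
disease = [0.5, 4.0]

score = cp_score(drug, target, disease)  # 1.0*2.0*0.5 + 0.5*0.0*4.0
```

A missing association would be ranked by this score, and entities with similar factor vectors would fall into the same cluster.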

Subjects

Computational Biology , Drug Repositioning , Cluster Analysis , Computational Biology/methods , Drug Repositioning/methods , Humans , Machine Learning

13.

Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network.

Liu, Pengfei; Li, Hongjian; Li, Shuai; Leung, Kwong-Sak.

BMC Bioinformatics ; 20(1): 408, 2019 Jul 29.

Article in English | MEDLINE | ID: mdl-31357929

ABSTRACT

BACKGROUND: Understanding the phenotypic drug response on cancer cell lines plays a vital role in anti-cancer drug discovery and re-purposing. The Genomics of Drug Sensitivity in Cancer (GDSC) database provides open data for researchers in phenotypic screening to build and test their models. Previously, most research in these areas has started from the molecular fingerprints or physicochemical features of drugs, instead of their structures. RESULTS: In this paper, a model called twin Convolutional Neural Network for drugs in SMILES format (tCNNS) is introduced for phenotypic screening. tCNNS uses one convolutional network to extract features for drugs from their simplified molecular input line entry specification (SMILES) format and another convolutional network to extract features for cancer cell lines from their genetic feature vectors. After that, a fully connected network is used to predict the interaction between the drugs and the cancer cell lines. When the training set and the testing set are divided based on the interaction pairs between drugs and cell lines, tCNNS achieves 0.826 and 0.831 for the mean and top quartile of the coefficient of determination (R2), respectively, and 0.909 and 0.912 for the mean and top quartile of the Pearson correlation (Rp), respectively, which are significantly better than those of the previous works (Ammad-Ud-Din et al., J Chem Inf Model 54:2347-9, 2014), (Haider et al., PLoS ONE 10:0144490, 2015), (Menden et al., PLoS ONE 8:61318, 2013). However, when the training set and the testing set are divided exclusively based on drugs or cell lines, the performance of tCNNS decreases significantly and Rp and R2 drop to barely above 0. CONCLUSIONS: Our approach is able to predict the drug effects on cancer cell lines with high accuracy, and its performance remains stable with less but high-quality data, and with fewer features for the cancer cell lines. tCNNS can also solve the problem of outliers in other feature space.
Besides achieving high scores in these statistical metrics, tCNNS also provides some insights into the phenotypic screening. However, the performance of tCNNS drops in the blind test.
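
The two metrics reported above, R2 and Rp, measure different things, and a small worked example makes the distinction concrete. The helpers and the toy values below are illustrative assumptions, not code or data from the tCNNS study. Pearson correlation rewards predictions that follow the trend of the true values, while the coefficient of determination also penalizes a constant offset.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical responses: every prediction is too high by 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]

r2 = r_squared(y_true, y_pred)
rp = pearson(y_true, y_pred)
```

Here the predictions track the truth perfectly apart from the offset, so `rp` is 1 while `r2` is 0.8, which is why the two numbers are reported side by side.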

Subjects

Antineoplastic Agents/therapeutic use , Deep Learning , Neoplasms/drug therapy , Computer Neural Networks , Antineoplastic Agents/pharmacology , Tumor Cell Line , Factual Databases , Genomics , Humans , Inhibitory Concentration 50 , Organ Specificity/drug effects , Phenotype , Regression Analysis

14.

Exploiting locational and topological overlap model to identify modules in protein interaction networks.

Cheng, Lixin; Liu, Pengfei; Wang, Dong; Leung, Kwong-Sak.

BMC Bioinformatics ; 20(1): 23, 2019 Jan 14.

Article in English | MEDLINE | ID: mdl-30642247

ABSTRACT

BACKGROUND: Clustering molecular networks is a typical method in systems biology, which is effective in predicting protein complexes or functional modules. However, few studies have realized that biological molecules are spatial-temporally regulated to form a dynamic cellular network and only a subset of interactions take place at the same location in cells. RESULTS: In this study, considering the subcellular localization of proteins, we first construct a co-localization human protein interaction network (PIN) and systematically investigate the relationship between subcellular localization and biological functions. After that, we propose a Locational and Topological Overlap Model (LTOM) to preprocess the co-localization PIN to identify functional modules. LTOM requires the topological overlaps, the common partners shared by two proteins, to be annotated in the same localization as the two proteins. We observed that the model has better correspondence with the reference protein complexes and shows more relevance to cancers based on both human and yeast datasets and two clustering algorithms, ClusterONE and MCL. CONCLUSION: Taking protein localization and topological overlap into consideration can improve the performance of module detection from protein interaction networks.
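
The localization constraint that LTOM places on topological overlap can be sketched in a few lines. This is a simplified illustration, not the published model: the protein names, the toy network, the compartment annotations, and the `colocalized_common_partners` helper are hypothetical, and the actual scoring formula is not reproduced. The sketch keeps only those common partners of two proteins that are annotated to a compartment both proteins share.

```python
def colocalized_common_partners(a, b, neighbors, localization):
    """Common interaction partners of a and b that share at least one
    subcellular localization with both proteins."""
    shared_loc = localization[a] & localization[b]
    common = (neighbors[a] & neighbors[b]) - {a, b}
    return {p for p in common if localization[p] & shared_loc}

# Hypothetical toy network and localization annotations.
neighbors = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3", "P4"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
}
localization = {
    "P1": {"nucleus"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"nucleus"},
    "P4": {"membrane"},
}

overlap = colocalized_common_partners("P1", "P2", neighbors, localization)
```

P3 and P4 are both common partners of P1 and P2, but only P3 sits in the nucleus with them, so P4 is excluded from the overlap.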

Subjects

Algorithms; Computational Biology/methods; Databases, Protein; Neoplasm Proteins/metabolism; Protein Interaction Mapping/methods; Protein Interaction Maps; Saccharomyces cerevisiae Proteins/metabolism; Humans; Neoplasm Proteins/chemistry; Saccharomyces cerevisiae Proteins/chemistry

15.

Identification and characterization of moonlighting long non-coding RNAs based on RNA and protein interactome.

Cheng, Lixin; Leung, Kwong-Sak.

Bioinformatics ; 34(20): 3519-3528, 2018 Oct 15.

Article in English | MEDLINE | ID: mdl-29771280

ABSTRACT

Motivation: Moonlighting proteins are a class of proteins having multiple distinct functions, which play essential roles in a variety of cellular and enzymatic functioning systems. Although there have long been calls for computational algorithms for the identification of moonlighting proteins, research on approaches to identify moonlighting long non-coding RNAs (lncRNAs) has never been undertaken. Here, we introduce a novel methodology, MoonFinder, for the identification of moonlighting lncRNAs. MoonFinder is a statistical algorithm identifying moonlighting lncRNAs without a priori knowledge through the integration of protein interactome, RNA-protein interactions and functional annotation of proteins. Results: We identify 155 moonlighting lncRNA candidates and uncover that they are a distinct class of lncRNAs characterized by specific sequence and cellular localization features. The non-coding genes that transcribe moonlighting lncRNAs tend to have shorter but more exons, and the moonlighting lncRNAs have a variable localization pattern with a high chance of residing in the cytoplasmic compartment in comparison to the other lncRNAs. Moreover, moonlighting lncRNAs and moonlighting proteins are rather mutually exclusive in terms of both their direct interactions and interacting partners. Our results also shed light on how the moonlighting candidates and their interacting proteins are implicated in the formation and development of cancers and other diseases. Availability and implementation: The code implementing MoonFinder is supplied as an R package in the supplementary material. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Proteins/genetics; RNA, Long Noncoding/genetics; Exons; Genomics; Humans; Sequence Analysis, RNA/methods

16.

Full Characterization of Localization Diversity in the Human Protein Interactome.

Cheng, Lixin; Fan, Kaili; Huang, Yan; Wang, Dong; Leung, Kwong-Sak.

J Proteome Res ; 16(8): 3019-3029, 2017 Aug 04.

Article in English | MEDLINE | ID: mdl-28707887

ABSTRACT

Spatial-temporal regulation among proteins forms dynamic networks in cells. Coexistence in common cell compartments can improve biological reliability of the protein-protein interactions. However, this is usually overlooked by most proteomic studies and leads to unrealistic discoveries. In this paper, we systematically characterize the interaction localization diversity in the human protein interactome using the localization coefficient, a novel metric proposed for assessing how diversely the interactions localize among cell compartments. Our analysis reveals the following: (1) the subcellular networks of the nucleus, cytosol, and mitochondrion are dense but the interactions tend to localize in specific cell compartments, whereas the subnetworks of the secretory-pathway, membrane, and extracellular region are sparse but the interactions are diversely localized; (2) the housekeeping proteins tend to appear in multiple compartments, while the tissue-specific proteins present a relatively flat profile of localization breadth; (3) the autophagy proteins tend to diversely localize in multiple compartments, especially those with high connectivity, compared with the apoptosis proteins; (4) the proteins targeted by small-molecule drugs show no preference for compartments, whereas the proteins directed by antibody-based drugs tend to belong to transmembrane regions with a strong diversity. In summary, our analysis provides a comprehensive view of the subcellular localization for interacting proteins, demonstrates that localization diversity is an important feature of protein interactions, and shows its ability to highlight meaningful biological functions.

Subjects

Cell Compartmentation; Protein Interaction Maps/physiology; Proteome/analysis; Subcellular Fractions/chemistry; Humans; Intracellular Space/chemistry; Protein Interaction Mapping; Proteomics/methods; Spatio-Temporal Analysis; Subcellular Fractions/physiology

17.

Old drug, new indication: Olsalazine sodium reduced serum uric acid levels in mice via inhibiting xanthine oxidoreductase activity.

Niu, Yanfen; Li, Hongjian; Gao, Lihui; Lin, Hua; Kung, Hsiangfu; Lin, Marie Chia-Mi; Leung, Kwong-Sak; Wong, Man-Hon; Xiong, Wenyong; Li, Ling.

J Pharmacol Sci ; 135(3): 114-120, 2017 Nov.

Article in English | MEDLINE | ID: mdl-29132796

ABSTRACT

Hyperuricemia, a long-term purine metabolic disorder, is a well-known risk factor for gout, hypertension and diabetes. In maintaining normal whole-body purine levels, xanthine oxidase (XOD) is a key enzyme in the purine metabolic pathway, as it catalyzes the oxidation of hypoxanthine to xanthine and finally to uric acid. Here we used the protein-ligand docking software idock to virtually screen potential XOD inhibitors from 3167 approved small compounds/drugs. The inhibitory activities of the ten compounds with the highest scores were tested on XOD in vitro. Interestingly, all ten compounds inhibited the activity of XOD to some degree. In particular, the anti-ulcerative-colitis drug olsalazine sodium demonstrated strong inhibitory activity against XOD (IC50 = 3.4 mg/L). Enzymatic kinetic studies revealed that the drug was a hybrid-type inhibitor of xanthine oxidase. Furthermore, the drug strikingly decreased serum urate levels and serum/hepatic XOD activities in a dose-dependent manner in vivo. Thus, we demonstrated a successful search for compounds/drugs for hyperuricemia through virtual screening, supporting a potential use of olsalazine sodium in the treatment of hyperuricemia.

Subjects

Aminosalicylic Acids/pharmacology; Anti-Ulcer Agents/pharmacology; Uric Acid/blood; Xanthine Dehydrogenase/antagonists & inhibitors; Xanthine Dehydrogenase/metabolism; Aminosalicylic Acids/therapeutic use; Animals; Dose-Response Relationship, Drug; Drug Evaluation, Preclinical; Hyperuricemia/drug therapy; In Vitro Techniques; Male; Mice; Structure-Activity Relationship

18.

ViRBase: a resource for virus-host ncRNA-associated interactions.

Li, Yanhui; Wang, Changliang; Miao, Zhengqiang; Bi, Xiaoman; Wu, Deng; Jin, Nana; Wang, Liqiang; Wu, Hao; Qian, Kun; Li, Chunhua; Zhang, Ting; Zhang, Chunrui; Yi, Ying; Lai, Hongyan; Hu, Yongfei; Cheng, Lixin; Leung, Kwong-Sak; Li, Xiaobo; Zhang, Fengmin; Li, Kongning; Li, Xia; Wang, Dong.

Nucleic Acids Res ; 43(Database issue): D578-82, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25274736

ABSTRACT

Increasing evidence reveals that diverse non-coding RNAs (ncRNAs) play critically important roles in viral infection. Viruses can use diverse ncRNAs to manipulate both cellular and viral gene expression to establish a host environment conducive to the completion of the viral life cycle. Many host cellular ncRNAs can also directly or indirectly influence viral replication and even target virus genomes. ViRBase (http://www.rna-society.org/virbase) aims to provide the scientific community with a resource for efficient browsing and visualization of virus-host ncRNA-associated interactions and interaction networks in viral infection. The current version of ViRBase documents more than 12,000 viral and cellular ncRNA-associated virus-virus, virus-host, host-virus and host-host interactions involving more than 460 non-redundant ncRNAs and 4400 protein-coding genes across more than 60 viruses and 20 hosts. Users can query, browse and manipulate these virus-host ncRNA-associated interactions. ViRBase will be of help in uncovering the generic organizing principles of cellular virus-host ncRNA-associated interaction networks in viral infection.

Subjects

Databases, Genetic; RNA, Untranslated/metabolism; Virus Diseases/genetics; Virus Diseases/virology; Binding Sites; Internet; Proteins/metabolism; Virus Diseases/metabolism; Viruses/metabolism

19.

A Modular Plug-And-Play Sensor System for Urban Air Pollution Monitoring: Design, Implementation and Evaluation.

Yi, Wei-Ying; Leung, Kwong-Sak; Leung, Yee.

Sensors (Basel) ; 18(1), 2017 Dec 22.

Article in English | MEDLINE | ID: mdl-29271952

ABSTRACT

Urban air pollution has caused public concern globally because it seriously affects human life. Modern monitoring systems providing pollution information with high spatio-temporal resolution have been developed to identify personal exposures. However, these systems' hardware specifications and configurations are usually fixed according to the applications. They can be inconvenient to maintain, and difficult to reconfigure and expand with respect to sensing capabilities. This paper aims at tackling these issues by adopting the proposed Modular Sensor System (MSS) architecture and Universal Sensor Interface (USI), and modular design in a sensor node. A compact MSS sensor node is implemented and evaluated. It has expandable sensor modules with plug-and-play feature and supports multiple Wireless Sensor Networks (WSNs). Evaluation results show that MSS sensor nodes can easily fit in different scenarios, adapt to reconfigurations dynamically, and detect low concentration air pollution with high energy efficiency and good data accuracy. We anticipate that the efforts on system maintenance, adaptation, and evolution can be significantly reduced when deploying the system in the field.

Subjects

Air Pollution/analysis; Computers; Wireless Technology

20.

Correcting the impact of docking pose generation error on binding affinity prediction.

Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J.

BMC Bioinformatics ; 17(Suppl 11): 308, 2016 Sep 22.

Article in English | MEDLINE | ID: mdl-28185549

ABSTRACT

BACKGROUND: Pose generation error is usually quantified as the difference between the geometry of the pose generated by the docking software and that of the same molecule co-crystallised with the considered protein. Surprisingly, the impact of this error on binding affinity prediction is yet to be systematically analysed across diverse protein-ligand complexes. RESULTS: Against commonly-held views, we have found that pose generation error generally has a small impact on the accuracy of binding affinity prediction. This is also true for large pose generation errors and it is not only observed with machine-learning scoring functions, but also with classical scoring functions such as AutoDock Vina. Furthermore, we propose a procedure to correct a substantial part of this error which consists of calibrating the scoring functions with re-docked, rather than co-crystallised, poses. In this way, the relationship between Vina-generated protein-ligand poses and their binding affinities is directly learned. As a result, test set performance after this error-correcting procedure is much closer to that of predicting the binding affinity in the absence of pose generation error (i.e. on crystal structures). We evaluated several strategies, obtaining better results for those using a single docked pose per ligand than those using multiple docked poses per ligand. CONCLUSIONS: Binding affinity prediction is often carried out on the docked pose of a known binder rather than its co-crystallised pose. Our results suggest that pose generation error is in general far less damaging for binding affinity prediction than is currently believed. Another contribution of our study is the proposal of a procedure that largely corrects for this error. The resulting machine-learning scoring function is freely available at http://istar.cse.cuhk.edu.hk/rf-score-4.tgz and http://ballester.marseille.inserm.fr/rf-score-4.tgz .
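The abstract describes pose generation error as a geometric difference between the docked pose and the co-crystallised pose but does not name the metric. A common choice for such comparisons is the root-mean-square deviation (RMSD) over matched atoms. The sketch below assumes that choice, and also assumes that both poses list the same atoms in the same order and share the protein's coordinate frame. The function name and the toy coordinates are hypothetical.

```python
import math


def pose_rmsd(docked, crystal):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples.

    Assumption: atoms are matched by position in the lists, with no
    superposition and no correction for molecular symmetry.
    """
    if not docked or len(docked) != len(crystal):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom_d, atom_c in zip(docked, crystal)
        for a, b in zip(atom_d, atom_c)
    )
    return math.sqrt(squared / len(docked))


# Toy two-atom ligand: the docked pose is shifted 1 unit along z.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
error = pose_rmsd(docked_pose, crystal_pose)
```

A value computed this way could be used to group complexes by pose generation error before comparing prediction accuracy. The abstract does not state how the authors grouped or thresholded such errors, so that use is only a suggestion.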

Subjects

Molecular Docking Simulation/standards; Nuclear Proteins/metabolism; Pyrazines/metabolism; Software; Transcription Factors/metabolism; Humans; Ligands; Nuclear Proteins/chemistry; Protein Binding; Protein Conformation; Pyrazines/chemistry; Transcription Factors/chemistry
