Search | Cancer Prevention and Control

1.

HGSMDA: miRNA-Disease Association Prediction Based on HyperGCN and Sørensen-Dice Loss.

Chang, Zhenghua; Zhu, Rong; Liu, Jinxing; Shang, Junliang; Dai, Lingyun.

Noncoding RNA ; 10(1)2024 Jan 26.

Article in English | MEDLINE | ID: mdl-38392964

ABSTRACT

Biological research has demonstrated the significance of identifying miRNA-disease associations in the context of disease prevention, diagnosis, and treatment. However, the utilization of experimental approaches involving biological subjects to infer these associations is both costly and inefficient. Consequently, there is a pressing need to devise novel approaches that offer enhanced accuracy and effectiveness. Presently, the predominant methods employed for predicting disease associations rely on Graph Convolutional Network (GCN) techniques. However, the GCN algorithm, which aggregates locally, incorporates information only from the immediate neighboring nodes of a given node at each layer. Consequently, GCN cannot simultaneously aggregate information from multiple nodes. This constraint significantly impacts the predictive efficacy of the model. To tackle this problem, we propose a novel approach, based on HyperGCN and Sørensen-Dice loss (HGSMDA), for predicting associations between miRNAs and diseases. In the initial phase, we developed multiple networks to represent the similarity between miRNAs and diseases and employed GCNs to extract information from diverse perspectives. Subsequently, we introduced HyperGCN to construct a miRNA-disease heteromorphic hypergraph using hypernodes and trained a GCN on the graph to aggregate information. Finally, we utilized the Sørensen-Dice loss function to evaluate the degree of similarity between the predicted outcomes and the ground truth values, thereby enabling the prediction of associations between miRNAs and diseases. To assess the soundness of our methodology, an extensive series of experiments was conducted employing the Human MicroRNA Disease Database (HMDD v3.2) as the dataset. The experimental outcomes indicate that HGSMDA performs favorably when compared to alternative methodologies. Furthermore, the predictive capacity of HGSMDA was corroborated through a case study focused on colon cancer. These findings strongly imply that HGSMDA represents a dependable and valid framework, thereby offering a novel avenue for investigating the intricate association between miRNAs and diseases.
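
The Sørensen-Dice loss named in this abstract can be illustrated with a minimal sketch. This is the generic soft Dice formulation, not necessarily the exact variant used in HGSMDA; the function name `dice_loss`, the smoothing constant `eps`, and the toy labels are assumptions made for illustration.

```python
def dice_loss(pred, target, eps=1e-7):
    """Soft Sørensen-Dice loss: 1 - 2*|P∩T| / (|P| + |T|).

    pred   -- predicted association scores in [0, 1]
    target -- ground-truth labels (1 = known association, 0 = unknown)
    eps    -- small constant (an assumption) that avoids division by zero
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


# Hypothetical labels for four miRNA-disease pairs.
labels = [1, 0, 1, 0]
perfect = [1.0, 0.0, 1.0, 0.0]   # predictions that match the labels
inverted = [0.0, 1.0, 0.0, 1.0]  # predictions that are exactly wrong

loss_perfect = dice_loss(perfect, labels)    # close to 0
loss_inverted = dice_loss(inverted, labels)  # close to 1
```

Because the loss depends on the overlap between predicted and true positives rather than on per-pair error, it tends to be less dominated by the many negative pairs found in sparse association matrices.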

2.

The lower He-sea points playing a significant role in postoperative ileus in colorectal cancer treated with acupuncture: based on machine-learning.

Zhang, Xu; Yang, Wenjing; Shang, Junliang; Dan, Wenchao; Shi, Lin; Tong, Li; Yang, Guowang.

Front Oncol ; 13: 1206196, 2023.

Article in English | MEDLINE | ID: mdl-37564931

ABSTRACT

Background: Postoperative ileus (POI) is a common complication following abdominal surgery, which can lead to significant negative impacts on patients' well-being and healthcare costs. However, the efficacy of current treatments is not satisfactory. The purpose of this study was to evaluate the therapeutic effects of acupuncture intervention and explore patterns of acupoint selection for treating POI in colorectal cancer (CRC) patients. Methods: We searched eight electronic databases to identify randomized controlled trials (RCTs) on acupuncture for POI in CRC and conducted a meta-analysis. Subsequently, we utilized the Apriori algorithm and the Frequent pattern growth algorithm, in conjunction with complex network and cluster analysis, to identify association rules of acupoints. Results: The meta-analysis showed that acupuncture led to significant reductions in time to first defecation (MD=-20.93, 95%CI: -25.35, -16.51; I² = 93.0%; p < 0.01; n=2805), first flatus (MD=-15.08, 95%CI: -18.39, -11.76; I² = 96%; p < 0.01; n=3284), and bowel sounds recovery (MD=-10.96, 95%CI: -14.20, -7.72; I² = 94%; p < 0.01; n=2043). A subgroup analysis revealed that acupuncture not only reduced the duration of POI when administered alongside conventional care but also further expedited the recovery of gut function after colorectal surgery when integrated into the enhanced recovery after surgery (ERAS) pathway. The studies included in the analysis reported no instances of serious adverse events associated with acupuncture. We identified Zusanli (ST36), Shangjuxu (ST37), Neiguan (PC6), Sanyinjiao (SP6), Xiajuxu (ST39), Hegu (LI4), Tianshu (ST25), and Zhongwan (RN12) as primary acupoints for treating POI. Association rule mining suggested potential acupoint combinations including {ST37, ST39}⇒{ST36}, {PC6, ST37}⇒{ST36}, {SP6, ST37}⇒{ST36}, and {ST25, ST37}⇒{ST36}. Conclusion: Meta-analysis indicates acupuncture's safety and superior effectiveness over postoperative care alone in facilitating gastrointestinal recovery. Machine-learning approaches highlight the importance of the lower He-sea points, including Zusanli (ST36) and Shangjuxu (ST37), in treating POI in CRC patients. Incorporating additional acupoints such as Neiguan (PC6) (for pain and vomiting) and Sanyinjiao (SP6) (for abdominal distension and poor appetite) can optimize treatment outcomes. These findings offer valuable insights for refining treatment protocols in both clinical and experimental settings, ultimately enhancing patient care.
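
The association rules reported in this abstract are scored, in Apriori-style mining, by support and confidence. The sketch below shows how those two quantities are computed for a rule whose antecedent is {ST37, ST39} and whose consequent is {ST36}. The prescriptions are invented for illustration and are not data from the study; the helper names are likewise assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent ∪ consequent) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical acupoint prescriptions, one set per trial (illustrative only).
prescriptions = [
    {"ST36", "ST37", "ST39"},
    {"ST36", "ST37", "PC6"},
    {"ST36", "SP6"},
    {"ST36", "ST37", "ST39", "ST25"},
    {"PC6", "LI4"},
]

rule_support = support(prescriptions, {"ST37", "ST39", "ST36"})
rule_confidence = confidence(prescriptions, {"ST37", "ST39"}, {"ST36"})
```

In this toy data the rule holds in 2 of 5 prescriptions (support 0.4), and every prescription containing both ST37 and ST39 also contains ST36 (confidence 1.0).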

3.

Network embedding framework for driver gene discovery by combining functional and structural information.

Chu, Xin; Guan, Boxin; Dai, Lingyun; Liu, Jin-Xing; Li, Feng; Shang, Junliang.

BMC Genomics ; 24(1): 426, 2023 Jul 29.

Article in English | MEDLINE | ID: mdl-37516822

ABSTRACT

Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but there is still a need to explore combining functional and structural information of genes in protein interaction networks to identify driver genes. Therefore, we propose a network embedding framework combining functional and structural information to identify driver genes. Firstly, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Secondly, the struc2vec model is used to extract gene features from the mutation integration network, which contains both the functional and structural information of genes. Finally, machine learning algorithms are utilized to identify the driver genes. Compared with four previous methods, our method can find gene pairs that are distant from each other through structural similarities and has better performance in identifying driver genes for 12 cancers in The Cancer Genome Atlas. At the same time, we also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective for feature selection to identify novel driver genes.
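
The abstract does not say which network propagation algorithm is used. One common choice is a random walk with restart, sketched below under that assumption. The function name `propagate`, the restart probability, and the four-gene path network are all hypothetical.

```python
def propagate(adj, scores, restart=0.5, tol=1e-8, max_iter=1000):
    """Random walk with restart: p <- (1 - r) * W p + r * p0.

    adj     -- symmetric adjacency matrix (list of lists) of the gene network
    scores  -- initial per-gene scores p0, e.g. mutation frequencies
    restart -- restart probability r (an assumed value, not from the paper)
    """
    n = len(adj)
    # Column-normalize the adjacency matrix; isolated genes keep a zero column.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = list(scores)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path network gene0 - gene1 - gene2 - gene3, with a mutation only in gene0.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smoothed = propagate(adjacency, [1.0, 0.0, 0.0, 0.0])
```

After propagation every gene carries a nonzero score that decays with network distance from the mutated gene, which is the sense in which mutation data and network topology are integrated into a single weighted network.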

Subjects

Algorithms, Gene Regulatory Networks, Genetic Association Studies, Machine Learning, Protein Interaction Mapping

4.

Multi-View Enhanced Tensor Nuclear Norm and Local Constraint Model for Cancer Clustering and Feature Gene Selection.

Qiao, Qian; Yuan, Sha-Sha; Shang, Junliang; Liu, Jin-Xing.

J Comput Biol ; 30(8): 889-899, 2023 08.

Article in English | MEDLINE | ID: mdl-37471239

ABSTRACT

The analysis of cancer data from multi-omics can effectively promote cancer research. The main focus of this article is to cluster cancer samples and identify feature genes to reveal the correlation between cancers and genes, with the primary approach being the analysis of multi-view cancer omics data. Our proposed solution, the Multi-View Enhanced Tensor Nuclear Norm and Local Constraint (MVET-LC) model, aims to utilize the consistency and complementarity of omics data to support biological research. The model is designed to maximize the utilization of multi-view data and incorporates a nuclear norm and local constraint to achieve this goal. The first step involves introducing the concept of enhanced partial sum of tensor nuclear norm, which significantly enhances the flexibility of the tensor nuclear norm. After that, we incorporate total variation regularization into the MVET-LC model to further augment its performance. It enables MVET-LC to make use of the relationship between tensor data structures and sparse data while paying attention to the feature details of the tensor data. To tackle the iterative optimization problem of MVET-LC, the alternating direction method of multipliers is utilized. Through experimental validation, it is demonstrated that our proposed model outperforms other comparison models.

Subjects

Algorithms, Neoplasms, Humans, Neoplasms/genetics, Cluster Analysis

5.

ETGPDA: identification of piRNA-disease associations based on embedding transformation graph convolutional network.

Meng, Xianghan; Shang, Junliang; Ge, Daohui; Yang, Yi; Zhang, Tongdui; Liu, Jin-Xing.

BMC Genomics ; 24(1): 279, 2023 May 25.

Article in English | MEDLINE | ID: mdl-37226081

ABSTRACT

BACKGROUND: Piwi-interacting RNAs (piRNAs) have been proven to be closely associated with human diseases. The identification of potential associations between piRNAs and diseases is of great significance for understanding complex diseases. Because traditional "wet experiments" are time-consuming and expensive, predicting piRNA-disease associations by computational methods is valuable. METHODS: In this paper, a method based on an embedding transformation graph convolutional network, named ETGPDA, is proposed to predict piRNA-disease associations. Specifically, a heterogeneous network is constructed based on the similarity information of piRNAs and diseases, as well as the known piRNA-disease associations, and is used to extract low-dimensional embeddings of piRNAs and diseases with a graph convolutional network with an attention mechanism. Furthermore, an embedding transformation module is developed to address the problem of embedding space inconsistency; it is more lightweight, has stronger learning ability, and achieves higher accuracy. Finally, the piRNA-disease association score is calculated from the similarity of the piRNA and disease embeddings. RESULTS: Evaluated by fivefold cross-validation, ETGPDA achieves an AUC of 0.9603, which is better than the other five selected computational models. The case studies based on head and neck squamous cell carcinoma and Alzheimer's disease further support the performance of ETGPDA. CONCLUSIONS: Hence, ETGPDA is an effective method for predicting hidden piRNA-disease associations.

Subjects

Alzheimer Disease, Head and Neck Neoplasms, Humans, Piwi-Interacting RNA, Alzheimer Disease/genetics, Learning, Research Design

6.

DM-MOGA: a multi-objective optimization genetic algorithm for identifying disease modules of non-small cell lung cancer.

Shang, Junliang; Zhu, Xuhui; Sun, Yan; Li, Feng; Kong, Xiangzhen; Liu, Jin-Xing.

BMC Bioinformatics ; 24(1): 13, 2023 Jan 09.

Article in English | MEDLINE | ID: mdl-36624376

ABSTRACT

BACKGROUND: Constructing molecular interaction networks from microarray data and then identifying disease module biomarkers can provide insight into the underlying pathogenic mechanisms of non-small cell lung cancer. A promising approach for identifying disease modules in the network is community detection. RESULTS: In order to identify disease modules from gene co-expression networks, a community detection method is proposed based on multi-objective optimization genetic algorithm with decomposition. The method is named DM-MOGA and possesses two highlights. First, the boundary correction strategy is designed for the modules obtained in the process of local module detection and pre-simplification. Second, during the evolution, we introduce Davies-Bouldin index and clustering coefficient as fitness functions which are improved and migrated to weighted networks. In order to identify modules that are more relevant to diseases, the above strategies are designed to consider the network topology of genes and the strength of connections with other genes at the same time. Experimental results of different gene expression datasets of non-small cell lung cancer demonstrate that the core modules obtained by DM-MOGA are more effective than those obtained by several other advanced module identification methods. CONCLUSIONS: The proposed method identifies disease-relevant modules by optimizing two novel fitness functions to simultaneously consider the local topology of each gene and its connection strength with other genes. The association of the identified core modules with lung cancer has been confirmed by pathway and gene ontology enrichment analysis.

Subjects

Non-Small-Cell Lung Carcinoma, Lung Neoplasms, Humans, Non-Small-Cell Lung Carcinoma/genetics, Lung Neoplasms/genetics, Gene Regulatory Networks, Microarray Analysis, Algorithms, Gene Expression Profiling/methods

7.

A network-based method for identifying cancer driver genes based on node control centrality.

Li, Feng; Li, Han; Shang, Junliang; Liu, Jin-Xing; Dai, Lingyun; Liu, Xikui; Li, Yan.

Exp Biol Med (Maywood) ; 248(3): 232-241, 2023 02.

Article in English | MEDLINE | ID: mdl-36573462

ABSTRACT

Cancer is one of the major contributors to human mortality and has a serious influence on human survival and health. In biomedical research, the identification of cancer driver genes (cancer drivers for short) is an important task; cancer drivers can promote the progression and generation of cancer. To identify cancer drivers, many methods have been developed. These computational models only identify coding cancer drivers; however, non-coding drivers likewise play significant roles in the progression of cancer. Hence, we propose a Network-based Method for identifying cancer Driver Genes based on node Control Centrality (NMDGCC), which can identify coding and non-coding cancer driver genes. The process of NMDGCC for identifying driver genes mainly includes the following two steps. In the first step, we construct a gene interaction network by using mRNAs and miRNAs expression data in the cancer state. In the second step, the control centrality of the node is used to identify cancer drivers in the constructed network. We use the breast cancer dataset from The Cancer Genome Atlas (TCGA) to verify the effectiveness of NMDGCC. Compared with the existing methods of cancer driver genes identification, NMDGCC has a better performance. NMDGCC also identifies 295 miRNAs as non-coding cancer drivers, of which 158 are related to tumorigenesis of BRCA. We also apply NMDGCC to identify driver genes related to the different breast cancer subtypes. The result shows that NMDGCC detects many cancer drivers of specific cancer subtypes.

Subjects

Breast Neoplasms, MicroRNAs, Humans, Female, Oncogenes, Breast Neoplasms/genetics, MicroRNAs/genetics, Carcinogenesis/genetics, Neoplastic Cell Transformation

8.

NESM: a network embedding method for tumor stratification by integrating multi-omics data.

Li, Feng; Sun, Zhensheng; Liu, Jin-Xing; Shang, Junliang; Dai, Lingyun; Liu, Xikui; Li, Yan.

G3 (Bethesda) ; 12(11)2022 11 04.

Article in English | MEDLINE | ID: mdl-36124952

ABSTRACT

Tumor stratification plays an important role in cancer diagnosis and individualized treatment. Recent developments in high-throughput sequencing technologies have produced huge amounts of multi-omics data, making it possible to stratify cancer types using multiple molecular datasets. We introduce a Network Embedding method for tumor Stratification by integrating Multi-omics data (NESM). NESM pregroups the samples, integrates the gene features and somatic mutations corresponding to cancer types within each group to construct patient features, and then integrates all groups to obtain comprehensive patient information. The gene features contain network topology information because they are extracted by integrating deoxyribonucleic acid methylation, messenger ribonucleic acid expression data, and protein-protein interactions through a network embedding method. On the one hand, a supervised learning method, Light Gradient Boosting Machine, is used to classify cancer types based on patient features. Compared with 3 other methods, NESM has the highest AUC in most cancer types. The average AUC for stratifying cancer types is 0.91, indicating that the patient features extracted by NESM are effective for tumor stratification. On the other hand, an unsupervised clustering algorithm, Density-Based Spatial Clustering of Applications with Noise, is utilized to divide single cancer types into subtypes. The vast majority of the subtypes identified by NESM are significantly associated with patient survival.

Subjects

Neoplasms, Humans, Neoplasms/diagnosis, Neoplasms/genetics, Cluster Analysis, Algorithms, High-Throughput Nucleotide Sequencing

9.

NEXGB: A Network Embedding Framework for Anticancer Drug Combination Prediction.

Meng, Fanjie; Li, Feng; Liu, Jin-Xing; Shang, Junliang; Liu, Xikui; Li, Yan.

Int J Mol Sci ; 23(17)2022 Aug 30.

Article in English | MEDLINE | ID: mdl-36077236

ABSTRACT

Compared to single-drug therapy, drug combinations have shown great potential in cancer treatment. Most of the current methods employ genomic data and chemical information to construct drug-cancer cell line features, but there is still a need to explore methods that incorporate topological information from the protein-protein interaction (PPI) network. Therefore, we propose a network-embedding-based prediction model, NEXGB, which integrates the corresponding protein modules of drug-cancer cell lines with PPI network information. NEXGB extracts the topological features of each protein node in a PPI network by struc2vec. Then, we combine the topological features with the target protein information of drug-cancer cell lines to generate drug features and cancer cell line features, and utilize extreme gradient boosting (XGBoost) to predict the synergistic relationship between drug combinations and cancer cell lines. We apply our model to two recently developed datasets, the Oncology-Screen dataset (Oncology-Screen) and the large drug combination dataset (DrugCombDB). The experimental results show that NEXGB outperforms five current methods, and it effectively improves the predictive power in discovering relationships between drug combinations and cancer cell lines. This further demonstrates that the network information is valid for detecting combination therapies for cancer and other complex diseases.

Subjects

Antineoplastic Combined Chemotherapy Protocols, Neoplasms, Antineoplastic Combined Chemotherapy Protocols/therapeutic use, Drug Combinations, Genomics, Humans, Neoplasms/drug therapy, Neoplasms/genetics, Protein Interaction Maps, Proteins/therapeutic use

10.

Effects of Multi-Omics Characteristics on Identification of Driver Genes Using Machine Learning Algorithms.

Li, Feng; Chu, Xin; Dai, Lingyun; Wang, Juan; Liu, Jinxing; Shang, Junliang.

Genes (Basel) ; 13(5)2022 04 19.

Article in English | MEDLINE | ID: mdl-35627101

ABSTRACT

Cancer is a complex disease caused by genomic and epigenetic alterations; hence, identifying meaningful cancer drivers is an important and challenging task. Most studies have detected cancer drivers with mutated traits, while few studies consider multiple omics characteristics as important factors. In this study, we present a framework to analyze the effects of multi-omics characteristics on the identification of driver genes. We utilize four machine learning algorithms within this framework to detect cancer driver genes in pan-cancer data, including 75 characteristics among 19,636 genes. The 75 features are divided into four types and analyzed using Kullback-Leibler divergence based on CGC genes and non-CGC genes. We detect cancer driver genes in two different ways. One is to detect driver genes from a single feature type, while the other is from the top N features. The first analysis indicates that the mutational features are the best characteristics. The second analysis reveals that the top 45 features form the most effective feature combination and are superior to the mutational features alone. The top 45 features contain not only mutational features but also the three other types of features. Therefore, our study extends the detection of cancer driver genes and provides a more comprehensive understanding of cancer mechanisms.
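
The Kullback-Leibler divergence used here to compare a feature's distribution in CGC and non-CGC genes can be sketched for discrete (binned) distributions as follows. The histograms are made up for illustration, and the binning and `eps` smoothing are assumptions rather than details taken from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    eps (an assumption) guards against empty bins in q.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


# Hypothetical three-bin histograms of one feature (each sums to 1).
cgc_hist = [0.1, 0.2, 0.7]       # feature values among CGC genes
non_cgc_hist = [0.5, 0.3, 0.2]   # feature values among non-CGC genes

divergence = kl_divergence(cgc_hist, non_cgc_hist)
reverse = kl_divergence(non_cgc_hist, cgc_hist)
```

A larger divergence means the feature separates the two gene sets more strongly, so features can be ranked by it. Note that the measure is asymmetric: `divergence` and `reverse` differ.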

Subjects

Machine Learning, Neoplasms, Algorithms, Genomics, Humans, Neoplasms/diagnosis, Neoplasms/genetics, Oncogenes

11.

NMFNA: A Non-negative Matrix Factorization Network Analysis Method for Identifying Modules and Characteristic Genes of Pancreatic Cancer.

Ding, Qian; Sun, Yan; Shang, Junliang; Li, Feng; Zhang, Yuanyuan; Liu, Jin-Xing.

Front Genet ; 12: 678642, 2021.

Article in English | MEDLINE | ID: mdl-34367241

ABSTRACT

Pancreatic cancer (PC) is a highly fatal disease, yet its causes remain unclear. Comprehensive analysis of different types of PC genetic data plays a crucial role in understanding its pathogenic mechanisms. Currently, non-negative matrix factorization (NMF)-based methods are widely used for genetic data analysis. Nevertheless, it is a challenge for them to integrate and decompose different types of genetic data simultaneously. In this paper, an NMF network analysis method, NMFNA, is proposed, which introduces a graph-regularized constraint to the NMF, for identifying modules and characteristic genes from two-type PC data of methylation (ME) and copy number variation (CNV). Firstly, three PC networks, i.e., ME network, CNV network, and ME-CNV network, are constructed using the Pearson correlation coefficient (PCC). Then, modules are detected from these three PC networks effectively due to the introduced graph-regularized constraint, which is the highlight of the NMFNA. Finally, both gene ontology (GO) and pathway enrichment analyses are performed, and characteristic genes are detected by the multimeasure score, to deeply understand biological functions of PC core modules. Experimental results demonstrated that the NMFNA facilitates the integration and decomposition of two types of PC data simultaneously and can further serve as an alternative method for detecting modules and characteristic genes from multiple genetic data of complex diseases.
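
The first step described above, building a network from Pearson correlation coefficients, can be sketched as follows. The abstract does not state how correlations are turned into edges, so the absolute-value threshold used here is an assumption, as are the gene names and profiles.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_network(profiles, threshold=0.8):
    """Return {(gene_a, gene_b): r} for gene pairs with |r| >= threshold.

    The threshold is a hypothetical cut-off chosen for this example.
    """
    genes = sorted(profiles)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(profiles[g], profiles[h])
            if abs(r) >= threshold:
                edges[(g, h)] = r
    return edges


# Hypothetical methylation profiles of three genes across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = correlation_network(profiles)
```

Only the strongly correlated pair (geneA, geneB) survives the cut-off in this toy data. The same routine applied to ME profiles, CNV profiles, or one of each would give the three network types the abstract mentions.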

12.

Multiscale part mutual information for quantifying nonlinear direct associations in networks.

Shang, Junliang; Wang, Jing; Sun, Yan; Li, Feng; Liu, Jin-Xing; Zhang, Honghai.

Bioinformatics ; 37(18): 2920-2929, 2021 09 29.

Article in English | MEDLINE | ID: mdl-33730153

ABSTRACT

MOTIVATION: For network-assisted analysis, which has become a popular method of data mining, network construction is a crucial task. Network construction relies on the accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions. RESULTS: In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed for effectively quantifying nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI in theory and derived its five important properties. Second, an experiment in a three-node network was carried out to numerically estimate its quantification ability under two cases of strong associations. Third, experiments of the MPMI and comparisons with the PMI, NPA and conditional mutual information were performed on simulated datasets and on datasets from DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. Results showed that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations. AVAILABILITY AND IMPLEMENTATION: The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Glioblastoma, Software, Humans

13.

Sparse robust graph-regularized non-negative matrix factorization based on correntropy.

Wang, Chuan-Yuan; Gao, Ying-Lian; Liu, Jin-Xing; Dai, Ling-Yun; Shang, Junliang.

J Bioinform Comput Biol ; 19(1): 2050047, 2021 02.

Article in English | MEDLINE | ID: mdl-33410727

ABSTRACT

Non-negative Matrix Factorization (NMF) has been a popular data dimension reduction method in recent years. The traditional NMF method is highly sensitive to data noise. In this paper, we propose a model called Sparse Robust Graph-regularized Non-negative Matrix Factorization based on Correntropy (SGNMFC). Maximizing correntropy replaces the traditional minimization of Euclidean distance to improve the robustness of the algorithm. Through the kernel function, correntropy can give less weight to outliers and noise in data but give greater weight to meaningful data. Meanwhile, the geometric structure of the high-dimensional data is preserved in the low-dimensional manifold through graph regularization. Feature selection and sample clustering are commonly used methods for analyzing genes. Sparse constraints are applied to the loss function to reduce matrix complexity and analysis difficulty. Compared with five other similar methods, the effectiveness of the SGNMFC model is demonstrated by differentially expressed gene selection and sample clustering experiments on three datasets from The Cancer Genome Atlas (TCGA).
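
The weighting behavior attributed to correntropy above can be seen in a small sketch. It uses the standard empirical correntropy with a Gaussian kernel; the kernel width `sigma` and the toy residuals are assumptions, and this is not the full SGNMFC objective.

```python
import math


def kernel_weight(residual, sigma=1.0):
    """Gaussian kernel value exp(-r^2 / (2 * sigma^2)) for one residual."""
    return math.exp(-(residual ** 2) / (2 * sigma ** 2))


def correntropy(x, y, sigma=1.0):
    """Empirical correntropy: mean Gaussian kernel over paired samples."""
    return sum(kernel_weight(a - b, sigma) for a, b in zip(x, y)) / len(x)


small = kernel_weight(0.1)    # about 0.995: a near-fit entry keeps full weight
outlier = kernel_weight(5.0)  # about 3.7e-6: an outlier is almost ignored

data = [1.0, 2.0, 3.0, 4.0]
noisy = [1.1, 2.0, 2.9, 9.0]  # last entry is a hypothetical outlier
similarity = correntropy(data, noisy)
```

With squared Euclidean error, the single outlier (residual 5) would contribute 25 and dominate the objective. Under correntropy its contribution is bounded and close to zero, which is the source of the robustness the abstract describes.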

Subjects

Algorithms, Computational Biology/methods, Gene Expression, Neoplasms/genetics, Cluster Analysis, Computer Graphics, Statistical Data Interpretation, Genetic Databases, Neoplastic Gene Expression Regulation, Humans

14.

DSTPCA: Double-Sparse Constrained Tensor Principal Component Analysis Method for Feature Selection.

Hu, Yue; Liu, Jin-Xing; Gao, Ying-Lian; Shang, Junliang.

IEEE/ACM Trans Comput Biol Bioinform ; 18(4): 1481-1491, 2021.

Article in English | MEDLINE | ID: mdl-31562100

ABSTRACT

The identification of differentially expressed genes plays an increasingly important role in biology. Therefore, the feature selection approach has attracted much attention in the field of bioinformatics. The most popular method, principal component analysis, studies two-dimensional data without considering the spatial geometric structure of the data. The recently proposed tensor robust principal component analysis method performs sparse and low-rank decomposition on three-dimensional tensors and effectively preserves the spatial structure. Based on this approach, the L2,1-norm regularization term is introduced into the DSTPCA (Double-Sparse Constrained Tensor Principal Component Analysis) method. The DSTPCA method removes the redundant noise by double sparse constraints on the objective function to obtain sufficiently sparse results. After the regularization norm is introduced into the model, the ADMM (alternating direction method of multipliers) algorithm is used to solve the optimization problem. In the feature selection experiments, more redundant genes were filtered out and more genes closely associated with disease were selected. Experimental results using different datasets indicate that our method outperforms other methods.
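
The L2,1-norm mentioned in this abstract is shown below for an ordinary matrix. The sketch assumes the common row-wise convention (the sum of the Euclidean norms of the rows); some papers sum over columns instead, and the tensor form used in DSTPCA is not reproduced here.

```python
import math


def l21_norm(matrix):
    """L2,1-norm: sum over rows of each row's Euclidean (L2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


# Toy matrix with one all-zero row (rows could stand for genes).
weights = [
    [3.0, 4.0],    # row norm 5
    [0.0, 0.0],    # row norm 0: this gene is dropped entirely
    [5.0, 12.0],   # row norm 13
]
penalty = l21_norm(weights)  # 5 + 0 + 13 = 18
```

Because the penalty adds up whole-row norms, minimizing it pushes entire rows toward zero at once, which is why it suits feature (gene) selection better than an entry-wise L1 penalty.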

Subjects

Computational Biology/methods, Gene Expression Profiling/methods, Principal Component Analysis, Algorithms, Humans, Neoplasms/genetics, Neoplasms/metabolism

15.

CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data.

Yuan, Xiguo; Yu, Jiaao; Xi, Jianing; Yang, Liying; Shang, Junliang; Li, Zhe; Duan, Junbo.

IEEE/ACM Trans Comput Biol Bioinform ; 18(2): 539-549, 2021.

Article in English | MEDLINE | ID: mdl-31180897

ABSTRACT

Accurate detection of copy number variations (CNVs) from short-read sequencing data is challenging due to the uneven distribution of reads and the unbalanced amplitudes of gains and losses. The direct use of read depths to measure CNVs tends to limit performance. Thus, robust computational approaches equipped with appropriate statistics are required to detect CNV regions and boundaries. This study proposes a new method called CNV_IFTV to address this need. CNV_IFTV assigns an anomaly score to each genome bin through a collection of isolation trees. The trees are trained with the isolation forest algorithm by subsampling the measured read depths. With the anomaly scores, CNV_IFTV uses a total variation model to smooth adjacent bins, leading to a denoised score profile. Finally, a statistical model is established to test the denoised scores for calling CNVs. CNV_IFTV is tested on both simulated and real data in comparison to several peer methods. The results indicate that the proposed method outperforms the peer methods. CNV_IFTV is a reliable tool for detecting CNVs from short-read sequencing data even at low coverage and low tumor purity. The detection results on tumor samples can help evaluate known cancer genes and predict target drugs for disease diagnosis.

Subjects

Algorithms, Computational Biology/methods, DNA Copy Number Variations/genetics, Statistical Models, Genetic Databases, Decision Trees, Human Genome/genetics, High-Throughput Nucleotide Sequencing, Humans

16.

Correntropy induced loss based sparse robust graph regularized extreme learning machine for cancer classification.

Ren, Liang-Rui; Gao, Ying-Lian; Liu, Jin-Xing; Shang, Junliang; Zheng, Chun-Hou.

BMC Bioinformatics ; 21(1): 445, 2020 Oct 07.

Article in English | MEDLINE | ID: mdl-33028187

ABSTRACT

BACKGROUND: As a machine learning method with high performance and excellent generalization ability, extreme learning machine (ELM) is gaining popularity in various studies. Various ELM-based methods for different fields have been proposed. However, the robustness to noise and outliers is always the main problem affecting the performance of ELM. RESULTS: In this paper, an integrated method named correntropy induced loss based sparse robust graph regularized extreme learning machine (CSRGELM) is proposed. The introduction of correntropy induced loss improves the robustness of ELM and weakens the negative effects of noise and outliers. By using the L2,1-norm to constrain the output weight matrix, we tend to obtain a sparse output weight matrix to construct a simpler single hidden layer feedforward neural network model. By introducing the graph regularization to preserve the local structural information of the data, the classification performance of the new method is further improved. Besides, we design an iterative optimization method based on the idea of half quadratic optimization to solve the non-convex problem of CSRGELM. CONCLUSIONS: The classification results on the benchmark dataset show that CSRGELM can obtain better classification results compared with other methods. More importantly, we also apply the new method to the classification problems of cancer samples and get a good classification effect.

Subjects

Machine Learning, Neoplasms/classification, Benchmarking, Computational Biology/methods, Factual Databases, Humans, Neoplasms/pathology

17.

IDSSIM: an lncRNA functional similarity calculation model based on an improved disease semantic similarity method.

Fan, Wenwen; Shang, Junliang; Li, Feng; Sun, Yan; Yuan, Shasha; Liu, Jin-Xing.

BMC Bioinformatics ; 21(1): 339, 2020 Jul 31.

Article in English | MEDLINE | ID: mdl-32736513

ABSTRACT

BACKGROUND: It has been widely accepted that long non-coding RNAs (lncRNAs) play important roles in the development and progression of human diseases. Many association prediction models have been proposed for predicting lncRNA functions and identifying potential lncRNA-disease associations. Nevertheless, little effort has been devoted to measuring lncRNA functional similarity, which is an essential part of association prediction models. RESULTS: In this study, we present an lncRNA functional similarity calculation model, IDSSIM for short, based on an improved disease semantic similarity method. Its highlight is the introduction of an information content contribution factor into the semantic value calculation, which takes into account both the hierarchical structures of disease directed acyclic graphs and the disease specificities. IDSSIM and three state-of-the-art models, i.e., LNCSIM1, LNCSIM2, and ILNCSIM, were evaluated by feeding their disease semantic similarity matrices and lncRNA functional similarity matrices, together with the corresponding matrices of human lncRNA-disease associations from either the lncRNADisease database or the MNDR database, into the association prediction method WKNKN for lncRNA-disease association prediction. In addition, case studies of breast cancer and adenocarcinoma were performed to validate the effectiveness of IDSSIM. CONCLUSIONS: The results demonstrate that, in terms of ROC curves and AUC values, IDSSIM is superior to the compared models and can effectively improve the accuracy of disease semantic similarity, thereby increasing the association prediction ability of the IDSSIM-WKNKN model. In the case studies, most of the potential disease-associated lncRNAs predicted by IDSSIM can be confirmed by databases and the literature, implying that IDSSIM can serve as a promising tool for predicting lncRNA functions, identifying potential lncRNA-disease associations, and pre-screening candidate lncRNAs for biological experiments. The IDSSIM code, all experimental data and prediction results are available online at https://github.com/CDMB-lab/IDSSIM .
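The notion of disease semantic similarity over a directed acyclic graph (DAG) can be illustrated with a short sketch. It shows only a widely used baseline scheme with a fixed decay factor per ancestor level. It does not include the information content contribution factor that IDSSIM introduces, whose formula is not given in the abstract. The toy disease hierarchy, the function names, and `delta=0.5` are illustrative assumptions.

```python
def contributions(disease, parents, delta=0.5):
    """Semantic contribution of every term in the DAG of `disease`.

    The disease itself contributes 1.0; each ancestor contributes the
    largest value of delta ** (path length) over all paths leading to it.
    """
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        term = frontier.pop()
        for parent in parents.get(term, ()):
            value = delta * contrib[term]
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(d1, d2, parents, delta=0.5):
    """Shared-ancestor contributions divided by the two semantic values."""
    c1 = contributions(d1, parents, delta)
    c2 = contributions(d2, parents, delta)
    shared = set(c1) & set(c2)
    numerator = sum(c1[t] + c2[t] for t in shared)
    return numerator / (sum(c1.values()) + sum(c2.values()))


# Hypothetical toy hierarchy (child -> parents), not taken from any database.
parents = {
    "breast cancer": ["breast disease", "cancer"],
    "adenocarcinoma": ["cancer"],
    "breast disease": ["disease"],
    "cancer": ["disease"],
}

similarity = semantic_similarity("breast cancer", "adenocarcinoma", parents)
print(similarity)
```

A fixed decay treats every ancestor at the same depth alike, regardless of how specific the term is; the abstract describes IDSSIM's contribution factor as the component that additionally accounts for disease specificity.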

Subjects

Algorithms , Computational Biology/methods , Disease/genetics , Genetic Models , Long Noncoding RNA/genetics , Semantics , Adenocarcinoma/genetics , Area Under Curve , Breast Neoplasms/genetics , Genetic Databases , Female , Humans , ROC Curve

18.

Robust hypergraph regularized non-negative matrix factorization for sample clustering and feature selection in multi-view gene expression data.

Yu, Na; Gao, Ying-Lian; Liu, Jin-Xing; Wang, Juan; Shang, Junliang.

Hum Genomics ; 13(Suppl 1): 46, 2019 Oct 22.

Article in English | MEDLINE | ID: mdl-31639067

ABSTRACT

BACKGROUND: As one of the most popular data representation methods, non-negative matrix factorization (NMF) has received wide attention in clustering and feature selection tasks. However, most previously proposed NMF-based methods do not adequately explore the hidden geometrical structure in the data. At the same time, noise and outliers are inevitably present in the data. RESULTS: To alleviate these problems, we present a novel NMF framework named robust hypergraph regularized non-negative matrix factorization (RHNMF). In particular, hypergraph Laplacian regularization is imposed to capture the geometric information of the original data. Unlike graph Laplacian regularization, which captures only pairwise relationships between sample points, hypergraph regularization captures high-order relationships among more than two sample points. Moreover, the robustness of RHNMF is enhanced by using the L2,1-norm constraint when estimating the residual, because the L2,1-norm is insensitive to noise and outliers. CONCLUSIONS: Clustering and common abnormal expression gene (com-abnormal expression gene) selection are conducted to test the validity of the RHNMF model. Extensive experimental results on multi-view datasets reveal that our proposed model outperforms other state-of-the-art methods.
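The claim that the L2,1-norm is less sensitive to outliers than a squared error can be checked on a toy residual matrix. This is a minimal sketch, not the RHNMF objective. It assumes the convention that the L2,1-norm is the sum of the Euclidean norms of the rows; the function names and the matrix values are illustrative.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows (row-wise convention assumed)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def squared_frobenius(matrix):
    """Sum of squared entries, the usual NMF residual, for comparison."""
    return sum(v * v for row in matrix for v in row)


# Toy residual (assumed values): the third row is a gross outlier.
residual = [
    [3.0, 4.0],
    [0.0, 0.0],
    [30.0, 40.0],
]

outlier_norm = math.sqrt(30.0 ** 2 + 40.0 ** 2)
share_l21 = outlier_norm / l21_norm(residual)
share_fro = outlier_norm ** 2 / squared_frobenius(residual)
print("outlier share under L2,1-norm:        ", share_l21)
print("outlier share under squared Frobenius:", share_fro)
```

Because row norms are not squared, the outlier row accounts for a smaller share of the L2,1 total than of the squared Frobenius total, so it pulls less on the fit.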

Subjects

Algorithms , Genetic Databases , Neoplastic Gene Expression Regulation , Cluster Analysis , Humans , Neoplasms/genetics

19.

PSO-CFDP: A Particle Swarm Optimization-Based Automatic Density Peaks Clustering Method for Cancer Subtyping.

Zhu, Xuhui; Shang, Junliang; Sun, Yan; Li, Feng; Liu, Jin-Xing; Yuan, Shasha.

Hum Hered ; 84(1): 9-20, 2019.

Article in English | MEDLINE | ID: mdl-31412348

ABSTRACT

Cancer subtyping is of great importance for the prediction, diagnosis, and precise treatment of cancer patients. Many clustering methods have been proposed for cancer subtyping. In 2014, a clustering algorithm named Clustering by Fast Search and Find of Density Peaks (CFDP) was published in Science; it has since been applied to cancer subtyping with attractive results. However, CFDP requires two key parameters (cluster centers and cutoff distance) to be set manually, and their optimal values are difficult to determine. To overcome this limitation, an automatic clustering method named PSO-CFDP is proposed in this paper, in which cluster centers and cutoff distance are determined automatically by running an improved particle swarm optimization (PSO) algorithm multiple times. Experiments using PSO-CFDP, as well as LR-CFDP, STClu, CH-CCFDAC, and CFDP, were performed on four benchmark datasets and two real cancer gene expression datasets. The results show that PSO-CFDP can determine cluster centers and cutoff distance automatically within controllable time/cost and, therefore, improve the accuracy of cancer subtyping.
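The two parameters that CFDP leaves to the user can be seen in a small sketch of the density-peaks idea as it is commonly described: each point gets a local density (number of neighbours within the cutoff distance) and a separation (distance to the nearest point of higher density), and points with both values high are taken as cluster centers. This is a simplified illustration, not PSO-CFDP; the particle swarm search is not implemented. The toy points, the cutoff `1.2`, and the choice of two centers are assumptions set by hand here.

```python
import math


def density_peaks(points, cutoff):
    """Return (rho, delta) for each point.

    rho[i]   = number of other points closer than `cutoff`
    delta[i] = distance to the nearest point with strictly higher rho,
               or the largest distance from i if no such point exists
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Toy 2-D data (assumed): a dense group near the origin and a small one far away.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]

rho, delta = density_peaks(points, cutoff=1.2)  # cutoff chosen manually
n_centers = 2                                   # also chosen manually
centers = sorted(
    range(len(points)), key=lambda i: rho[i] * delta[i], reverse=True
)[:n_centers]
print("rho:", rho)
print("center indices:", centers)
```

Both `cutoff` and `n_centers` are hard-coded in this sketch; these are the quantities the abstract says PSO-CFDP determines automatically.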

Subjects

Algorithms , Cluster Analysis , Neoplasms/classification , Gene Expression , Humans , Neoplasms/genetics

20.

Network Analyses of Integrated Differentially Expressed Genes in Papillary Thyroid Carcinoma to Identify Characteristic Genes.

Shang, Junliang; Ding, Qian; Yuan, Shasha; Liu, Jin-Xing; Li, Feng; Zhang, Honghai.

Genes (Basel) ; 10(1), 2019 Jan 14.

Article in English | MEDLINE | ID: mdl-30646607

ABSTRACT

Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer. Identifying characteristic genes of PTC is of great importance for revealing its potential genetic mechanisms. In this paper, we propose a framework, as well as a measure named Normalized Centrality Measure (NCM), to identify characteristic genes of PTC. The framework consists of four steps. First, both up-regulated genes and down-regulated genes, collectively called differentially expressed genes (DEGs), were screened and integrated from four datasets, that is, GSE3467, GSE3678, GSE33630, and GSE58545. Second, an interaction network of DEGs was constructed, in which each node represents a gene and each edge represents an interaction between the nodes it links. Third, both traditional measures and the NCM measure were used to analyze the topological properties of each node in the network; compared with traditional measures, the NCM measure identified more genes related to PTC. Fourth, by mining the high-density subgraphs of this network and performing Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, several meaningful results were captured, most of which were shown to be associated with PTC. The experimental results show that this network framework and the NCM measure are useful for identifying more characteristic genes of PTC.
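The third step, scoring each node by its topological properties, can be illustrated with one traditional measure. The sketch below computes normalized degree centrality on a small undirected interaction network. It is not the NCM measure, whose definition is not given in the abstract; the gene labels `G1` to `G5`, the edges, and the function name are hypothetical placeholders.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality: degree divided by (number of nodes - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical interaction network; labels are placeholders, not real genes.
edges = [
    ("G1", "G2"),
    ("G1", "G3"),
    ("G1", "G4"),
    ("G2", "G3"),
    ("G4", "G5"),
]

centrality = degree_centrality(edges)
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(centrality)
print("most central node:", ranking[0])
```

Ranking nodes by such a score and inspecting the top entries is the general pattern; the abstract reports that the NCM measure identified more PTC-related genes than traditional measures of this kind.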

Subjects

Neoplastic Gene Expression Regulation , Gene Regulatory Networks , Papillary Thyroid Cancer/genetics , Thyroid Neoplasms/genetics , Humans
