Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 64
Filtrar
Más filtros

Bases de datos
Tipo del documento
País de afiliación
Intervalo de año de publicación
1.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36585783

RESUMEN

The inference of gene regulatory networks (GRNs) is of great importance for understanding the complex regulatory mechanisms within cells. The emergence of single-cell RNA-sequencing (scRNA-seq) technologies enables the measure of gene expression levels for individual cells, which promotes the reconstruction of GRNs at single-cell resolution. However, existing network inference methods are mainly designed for data collected from a single data source, which ignores the information provided by multiple related data sources. In this paper, we propose a multi-view contrastive learning (DeepMCL) model to infer GRNs from scRNA-seq data collected from multiple data sources or time points. We first represent each gene pair as a set of histogram images, and then introduce a deep Siamese convolutional neural network with contrastive loss to learn the low-dimensional embedding for each gene pair. Moreover, an attention mechanism is introduced to integrate the embeddings extracted from different data sources and different neighbor gene pairs. Experimental results on synthetic and real-world datasets validate the effectiveness of our contrastive learning and attention mechanisms, demonstrating the effectiveness of our model in integrating multiple data sources for GRN inference.


Asunto(s)
Algoritmos , Redes Reguladoras de Genes , Redes Neurales de la Computación , Secuenciación del Exoma , Expresión Génica
2.
Bioinformatics ; 40(4)2024 Mar 29.
Artículo en Inglés | MEDLINE | ID: mdl-38547401

RESUMEN

MOTIVATION: Single-cell clustering plays a crucial role in distinguishing between cell types, facilitating the analysis of cell heterogeneity mechanisms. While many existing clustering methods rely solely on gene expression data obtained from single-cell RNA sequencing techniques to identify cell clusters, the information contained in mono-omic data is often limited, leading to suboptimal clustering performance. The emergence of single-cell multi-omics sequencing technologies enables the integration of multiple omics data for identifying cell clusters, but how to integrate different omics data effectively remains challenging. In addition, designing a clustering method that performs well across various types of multi-omics data poses a persistent challenge due to the data's inherent characteristics. RESULTS: In this paper, we propose a graph-regularized multi-view ensemble clustering (GRMEC-SC) model for single-cell clustering. Our proposed approach can adaptively integrate multiple omics data and leverage insights from multiple base clustering results. We extensively evaluate our method on five multi-omics datasets through a series of rigorous experiments. The results of these experiments demonstrate that our GRMEC-SC model achieves competitive performance across diverse multi-omics datasets with varying characteristics. AVAILABILITY AND IMPLEMENTATION: Implementation of GRMEC-SC, along with examples, can be found on the GitHub repository: https://github.com/polarisChen/GRMEC-SC.


Asunto(s)
Aprendizaje Automático , Multiómica , Análisis por Conglomerados , Análisis de la Célula Individual , Algoritmos
3.
Bioinformatics ; 40(3)2024 Mar 04.
Artículo en Inglés | MEDLINE | ID: mdl-38426338

RESUMEN

MOTIVATION: Retrosynthesis is a critical task in drug discovery, aimed at finding a viable pathway for synthesizing a given target molecule. Many existing approaches frame this task as a graph-generating problem. Specifically, these methods first identify the reaction center, and break a targeted molecule accordingly to generate the synthons. Reactants are generated by either adding atoms sequentially to synthon graphs or by directly adding appropriate leaving groups. However, both of these strategies have limitations. Adding atoms results in a long prediction sequence that increases the complexity of generation, while adding leaving groups only considers those in the training set, which leads to poor generalization. RESULTS: In this paper, we propose a novel end-to-end graph generation model for retrosynthesis prediction, which sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants. Given that chemically meaningful motifs fall between the size of atoms and leaving groups, our model achieves lower prediction complexity than adding atoms and demonstrates superior performance than adding leaving groups. We evaluate our proposed model on a benchmark dataset and show that it significantly outperforms previous state-of-the-art models. Furthermore, we conduct ablation studies to investigate the contribution of each component of our proposed model to the overall performance on benchmark datasets. Experiment results demonstrate the effectiveness of our model in predicting retrosynthesis pathways and suggest its potential as a valuable tool in drug discovery. AVAILABILITY AND IMPLEMENTATION: All code and data are available at https://github.com/szu-ljh2020/MARS.


Asunto(s)
Benchmarking , Descubrimiento de Drogas , Sistemas de Lectura
4.
Brief Bioinform ; 23(5)2022 09 20.
Artículo en Inglés | MEDLINE | ID: mdl-36047285

RESUMEN

Advances in single-cell RNA sequencing (scRNA-seq) technologies has provided an unprecedent opportunity for cell-type identification. As clustering is an effective strategy towards cell-type identification, various computational approaches have been proposed for clustering scRNA-seq data. Recently, with the emergence of cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), the cell surface expression of specific proteins and the RNA expression on the same cell can be captured, which provides more comprehensive information for cell analysis. However, existing single cell clustering algorithms are mainly designed for single-omic data, and have difficulties in handling multi-omics data with diverse characteristics efficiently. In this study, we propose a novel deep embedded multi-omics clustering with collaborative training (DEMOC) model to perform joint clustering on CITE-seq data. Our model can take into account the characteristics of transcriptomic and proteomic data, and make use of the consistent and complementary information provided by different data sources effectively. Experiment results on two real CITE-seq datasets demonstrate that our DEMOC model not only outperforms state-of-the-art single-omic clustering methods, but also achieves better and more stable performance than existing multi-omics clustering methods. We also apply our model on three scRNA-seq datasets to assess the performance of our model in rare cell-type identification, novel cell-subtype detection and cellular heterogeneity analysis. Experiment results illustrate the effectiveness of our model in discovering the underlying patterns of data.


Asunto(s)
Perfilación de la Expresión Génica , Análisis de la Célula Individual , Algoritmos , Análisis por Conglomerados , Epítopos , Perfilación de la Expresión Génica/métodos , Proteómica , ARN , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos
5.
Brief Bioinform ; 23(1)2022 01 17.
Artículo en Inglés | MEDLINE | ID: mdl-34864871

RESUMEN

Advances in high-throughput experimental technologies promote the accumulation of vast number of biomedical data. Biomedical link prediction and single-cell RNA-sequencing (scRNA-seq) data imputation are two essential tasks in biomedical data analyses, which can facilitate various downstream studies and gain insights into the mechanisms of complex diseases. Both tasks can be transformed into matrix completion problems. For a variety of matrix completion tasks, matrix factorization has shown promising performance. However, the sparseness and high dimensionality of biomedical networks and scRNA-seq data have raised new challenges. To resolve these issues, various matrix factorization methods have emerged recently. In this paper, we present a comprehensive review on such matrix factorization methods and their usage in biomedical link prediction and scRNA-seq data imputation. Moreover, we select representative matrix factorization methods and conduct a systematic empirical comparison on 15 real data sets to evaluate their performance under different scenarios. By summarizing the experimental results, we provide general guidelines for selecting matrix factorization methods for different biomedical matrix completion tasks and point out some future directions to further improve the performance for biomedical link prediction and scRNA-seq data imputation.


Asunto(s)
Análisis de Datos , Análisis de la Célula Individual , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Secuenciación del Exoma
6.
Brief Bioinform ; 23(1)2022 01 17.
Artículo en Inglés | MEDLINE | ID: mdl-34571530

RESUMEN

The identification of differentially expressed genes between different cell groups is a crucial step in analyzing single-cell RNA-sequencing (scRNA-seq) data. Even though various differential expression analysis methods for scRNA-seq data have been proposed based on different model assumptions and strategies recently, the differentially expressed genes identified by them are quite different from each other, and the performances of them depend on the underlying data structures. In this paper, we propose a new ensemble learning-based differential expression analysis method, scDEA, to produce a more stable and accurate result. scDEA integrates the P-values obtained from 12 individual differential expression analysis methods for each gene using a P-value combination method. Comprehensive experiments show that scDEA outperforms the state-of-the-art individual methods with different experimental settings and evaluation metrics. We expect that scDEA will serve a wide range of users, including biologists, bioinformaticians and data scientists, who need to detect differentially expressed genes in scRNA-seq data.


Asunto(s)
ARN , Análisis de la Célula Individual , Perfilación de la Expresión Génica/métodos , Aprendizaje Automático , ARN/genética , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Secuenciación del Exoma
7.
Brief Bioinform ; 22(6)2021 11 05.
Artículo en Inglés | MEDLINE | ID: mdl-33975339

RESUMEN

The mechanisms controlling biological process, such as the development of disease or cell differentiation, can be investigated by examining changes in the networks of gene dependencies between states in the process. High-throughput experimental methods, like microarray and RNA sequencing, have been widely used to gather gene expression data, which paves the way to infer gene dependencies based on computational methods. However, most differential network analysis methods are designed to deal with fully observed data, but missing values, such as the dropout events in single-cell RNA-sequencing data, are frequent. New methods are needed to take account of these missing values. Moreover, since the changes of gene dependencies may be driven by certain perturbed genes, considering the changes in gene expression levels may promote the identification of gene network rewiring. In this study, a novel weighted differential network estimation (WDNE) model is proposed to handle multi-platform gene expression data with missing values and take account of changes in gene expression levels. Simulation studies demonstrate that WDNE outperforms state-of-the-art differential network estimation methods. When applied WDNE to infer differential gene networks associated with drug resistance in ovarian tumors, cell differentiation and breast tumor heterogeneity, the hub genes in the estimated differential gene networks can provide important insights into the underlying mechanisms. Furthermore, a Matlab toolbox, differential network analysis toolbox, was developed to implement the WDNE model and visualize the estimated differential networks.


Asunto(s)
Algoritmos , Neoplasias de la Mama , Resistencia a Antineoplásicos/genética , Regulación Neoplásica de la Expresión Génica , Redes Reguladoras de Genes , Modelos Genéticos , Neoplasias Ováricas , Neoplasias de la Mama/genética , Neoplasias de la Mama/metabolismo , Femenino , Perfilación de la Expresión Génica , Humanos , Neoplasias Ováricas/genética , Neoplasias Ováricas/metabolismo
8.
Brief Bioinform ; 22(4)2021 07 20.
Artículo en Inglés | MEDLINE | ID: mdl-33276376

RESUMEN

Disease-gene association through genome-wide association study (GWAS) is an arduous task for researchers. Investigating single nucleotide polymorphisms that correlate with specific diseases needs statistical analysis of associations. Considering the huge number of possible mutations, in addition to its high cost, another important drawback of GWAS analysis is the large number of false positives. Thus, researchers search for more evidence to cross-check their results through different sources. To provide the researchers with alternative and complementary low-cost disease-gene association evidence, computational approaches come into play. Since molecular networks are able to capture complex interplay among molecules in diseases, they become one of the most extensively used data for disease-gene association prediction. In this survey, we aim to provide a comprehensive and up-to-date review of network-based methods for disease gene prediction. We also conduct an empirical analysis on 14 state-of-the-art methods. To summarize, we first elucidate the task definition for disease gene prediction. Secondly, we categorize existing network-based efforts into network diffusion methods, traditional machine learning methods with handcrafted graph features and graph representation learning methods. Thirdly, an empirical analysis is conducted to evaluate the performance of the selected methods across seven diseases. We also provide distinguishing findings about the discussed methods based on our empirical analysis. Finally, we highlight potential research directions for future studies on disease gene prediction.


Asunto(s)
Bases de Datos de Ácidos Nucleicos , Enfermedad/genética , Aprendizaje Automático , Polimorfismo de Nucleótido Simple , Estudio de Asociación del Genoma Completo , Humanos
9.
Bioinformatics ; 37(23): 4414-4423, 2021 12 07.
Artículo en Inglés | MEDLINE | ID: mdl-34245246

RESUMEN

MOTIVATION: Differential network analysis is an important tool to investigate the rewiring of gene interactions under different conditions. Several computational methods have been developed to estimate differential networks from gene expression data, but most of them do not consider that gene network rewiring may be driven by the differential expression of individual genes. New differential network analysis methods that simultaneously take account of the changes in gene interactions and changes in expression levels are needed. RESULTS: : In this article, we propose a differential network analysis method that considers the differential expression of individual genes when identifying differential edges. First, two hypothesis test statistics are used to quantify changes in partial correlations between gene pairs and changes in expression levels for individual genes. Then, an optimization framework is proposed to combine the two test statistics so that the resulting differential network has a hierarchical property, where a differential edge can be considered only if at least one of the two involved genes is differentially expressed. Simulation results indicate that our method outperforms current state-of-the-art methods. We apply our method to identify the differential networks between the luminal A and basal-like subtypes of breast cancer and those between acute myeloid leukemia and normal samples. Hub nodes in the differential networks estimated by our method, including both differentially and nondifferentially expressed genes, have important biological functions. AVAILABILITY AND IMPLEMENTATION: All the datasets underlying this article are publicly available. Processed data and source code can be accessed through the Github repository at https://github.com/Zhangxf-ccnu/chNet. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Neoplasias de la Mama , Programas Informáticos , Humanos , Femenino , Simulación por Computador , Redes Reguladoras de Genes , Neoplasias de la Mama/genética , Expresión Génica
10.
Bioinformatics ; 36(10): 3131-3138, 2020 05 01.
Artículo en Inglés | MEDLINE | ID: mdl-32073600

RESUMEN

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) methods make it possible to reveal gene expression patterns at single-cell resolution. Due to technical defects, dropout events in scRNA-seq will add noise to the gene-cell expression matrix and hinder downstream analysis. Therefore, it is important for recovering the true gene expression levels before carrying out downstream analysis. RESULTS: In this article, we develop an imputation method, called scTSSR, to recover gene expression for scRNA-seq. Unlike most existing methods that impute dropout events by borrowing information across only genes or cells, scTSSR simultaneously leverages information from both similar genes and similar cells using a two-side sparse self-representation model. We demonstrate that scTSSR can effectively capture the Gini coefficients of genes and gene-to-gene correlations observed in single-molecule RNA fluorescence in situ hybridization (smRNA FISH). Down-sampling experiments indicate that scTSSR performs better than existing methods in recovering the true gene expression levels. We also show that scTSSR has a competitive performance in differential expression analysis, cell clustering and cell trajectory inference. AVAILABILITY AND IMPLEMENTATION: The R package is available at https://github.com/Zhangxf-ccnu/scTSSR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Perfilación de la Expresión Génica , Programas Informáticos , Hibridación Fluorescente in Situ , Análisis de Secuencia de ARN , Análisis de la Célula Individual
11.
Bioinformatics ; 36(9): 2755-2762, 2020 05 01.
Artículo en Inglés | MEDLINE | ID: mdl-31971577

RESUMEN

MOTIVATION: Reconstruction of cancer gene networks from gene expression data is important for understanding the mechanisms underlying human cancer. Due to heterogeneity, the tumor tissue samples for a single cancer type can be divided into multiple distinct subtypes (inter-tumor heterogeneity) and are composed of non-cancerous and cancerous cells (intra-tumor heterogeneity). If tumor heterogeneity is ignored when inferring gene networks, the edges specific to individual cancer subtypes and cell types cannot be characterized. However, most existing network reconstruction methods do not simultaneously take inter-tumor and intra-tumor heterogeneity into account. RESULTS: In this article, we propose a new Gaussian graphical model-based method for jointly estimating multiple cancer gene networks by simultaneously capturing inter-tumor and intra-tumor heterogeneity. Given gene expression data of heterogeneous samples for different cancer subtypes, a non-cancerous network shared across different cancer subtypes and multiple subtype-specific cancerous networks are estimated jointly. Tumor heterogeneity can be revealed by the difference in the estimated networks. The performance of our method is first evaluated using simulated data, and the results indicate that our method outperforms other state-of-the-art methods. We also apply our method to The Cancer Genome Atlas breast cancer data to reconstruct non-cancerous and subtype-specific cancerous gene networks. Hub nodes in the networks estimated by our method perform important biological functions associated with breast cancer development and subtype classification. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/Zhangxf-ccnu/NETI2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Neoplasias de la Mama , Redes Reguladoras de Genes , Genoma , Humanos , Distribución Normal , Programas Informáticos
12.
Bioinformatics ; 36(11): 3474-3481, 2020 06 01.
Artículo en Inglés | MEDLINE | ID: mdl-32145009

RESUMEN

MOTIVATION: Predicting potential links in biomedical bipartite networks can provide useful insights into the diagnosis and treatment of complex diseases and the discovery of novel drug targets. Computational methods have been proposed recently to predict potential links for various biomedical bipartite networks. However, existing methods are usually rely on the coverage of known links, which may encounter difficulties when dealing with new nodes without any known link information. RESULTS: In this study, we propose a new link prediction method, named graph regularized generalized matrix factorization (GRGMF), to identify potential links in biomedical bipartite networks. First, we formulate a generalized matrix factorization model to exploit the latent patterns behind observed links. In particular, it can take into account the neighborhood information of each node when learning the latent representation for each node, and the neighborhood information of each node can be learned adaptively. Second, we introduce two graph regularization terms to draw support from affinity information of each node derived from external databases to enhance the learning of latent representations. We conduct extensive experiments on six real datasets. Experiment results show that GRGMF can achieve competitive performance on all these datasets, which demonstrate the effectiveness of GRGMF in prediction potential links in biomedical bipartite networks. AVAILABILITY AND IMPLEMENTATION: The package is available at https://github.com/happyalfred2016/GRGMF. CONTACT: leouyang@szu.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

13.
Bioinformatics ; 35(17): 3184-3186, 2019 09 01.
Artículo en Inglés | MEDLINE | ID: mdl-30689728

RESUMEN

SUMMARY: To identify biological network rewiring under different conditions, we develop a user-friendly R package, named DiffNetFDR, to implement two methods developed for testing the difference in different Gaussian graphical models. Compared to existing tools, our methods have the following features: (i) they are based on Gaussian graphical models which can capture the changes of conditional dependencies; (ii) they determine the tuning parameters in a data-driven manner; (iii) they take a multiple testing procedure to control the overall false discovery rate; and (iv) our approach defines the differential network based on partial correlation coefficients so that the spurious differential edges caused by the variants of conditional variances can be excluded. We also develop a Shiny application to provide easier analysis and visualization. Simulation studies are conducted to evaluate the performance of our methods. We also apply our methods to two real gene expression datasets. The effectiveness of our methods is validated by the biological significance of the identified differential networks. AVAILABILITY AND IMPLEMENTATION: R package and Shiny app are available at https://github.com/Zhangxf-ccnu/DiffNetFDR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Programas Informáticos , Expresión Génica , Distribución Normal
14.
Bioinformatics ; 35(22): 4827-4829, 2019 11 01.
Artículo en Inglés | MEDLINE | ID: mdl-31125056

RESUMEN

SUMMARY: Imputation of dropout events that may mislead downstream analyses is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We develop EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result. A Shiny application is developed to provide easier implementation and visualization. Experiment results show that EnImpute outperforms the individual state-of-the-art methods in almost all situations. EnImpute is useful for correcting the noisy scRNA-seq data before performing downstream analysis. AVAILABILITY AND IMPLEMENTATION: The R package and Shiny application are available through Github at https://github.com/Zhangxf-ccnu/EnImpute. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Análisis de Secuencia de ARN , Análisis de la Célula Individual , ARN , Programas Informáticos , Secuenciación del Exoma
15.
BMC Bioinformatics ; 20(Suppl 19): 657, 2019 Dec 24.
Artículo en Inglés | MEDLINE | ID: mdl-31870274

RESUMEN

BACKGROUND: Synthetic lethality has attracted a lot of attentions in cancer therapeutics due to its utility in identifying new anticancer drug targets. Identifying synthetic lethal (SL) interactions is the key step towards the exploration of synthetic lethality in cancer treatment. However, biological experiments are faced with many challenges when identifying synthetic lethal interactions. Thus, it is necessary to develop computational methods which could serve as useful complements to biological experiments. RESULTS: In this paper, we propose a novel graph regularized self-representative matrix factorization (GRSMF) algorithm for synthetic lethal interaction prediction. GRSMF first learns the self-representations from the known SL interactions and further integrates the functional similarities among genes derived from Gene Ontology (GO). It can then effectively predict potential SL interactions by leveraging the information provided by known SL interactions and functional annotations of genes. Extensive experiments on the synthetic lethal interaction data downloaded from SynLethDB database demonstrate the superiority of our GRSMF in predicting potential synthetic lethal interactions, compared with other competing methods. Moreover, case studies of novel interactions are conducted in this paper for further evaluating the effectiveness of GRSMF in synthetic lethal interaction prediction. CONCLUSIONS: In this paper, we demonstrate that by adaptively exploiting the self-representation of original SL interaction data, and utilizing functional similarities among genes to enhance the learning of self-representation matrix, our GRSMF could predict potential SL interactions more accurately than other state-of-the-art SL interaction prediction methods.


Asunto(s)
Neoplasias/genética , Algoritmos , Antineoplásicos/uso terapéutico , Humanos , Neoplasias/tratamiento farmacológico
16.
Bioinformatics ; 34(9): 1571-1573, 2018 05 01.
Artículo en Inglés | MEDLINE | ID: mdl-29309511

RESUMEN

Summary: We develop DiffGraph, an R package that integrates four influential differential graphical models for identifying gene network rewiring under two different conditions from gene expression data. The input and output of different models are packaged in the same format, making it convenient for users to compare different models using a wide range of datasets and carry out follow-up analysis. Furthermore, the inferred differential networks can be visualized both non-interactively and interactively. The package is useful for identifying gene network rewiring from input datasets, comparing the predictions of different methods and visualizing the results. Availability and implementation: The package is available at https://github.com/Zhangxf-ccnu/DiffGraph. Contact: leouyang@szu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.


Asunto(s)
Biología Computacional/métodos , Visualización de Datos , Redes Reguladoras de Genes , Modelos Genéticos , Programas Informáticos
17.
Bioinformatics ; 34(7): 1099-1107, 2018 04 01.
Artículo en Inglés | MEDLINE | ID: mdl-29126180

RESUMEN

Motivation: The identification of repetitive elements is important in genome assembly and phylogenetic analyses. The existing de novo repeat identification methods exploiting the use of short reads are impotent in identifying long repeats. Since long reads are more likely to cover repeat regions completely, using long reads is more favorable for recognizing long repeats. Results: In this study, we propose a novel de novo repeat elements identification method namely RepLong based on PacBio long reads. Given that the reads mapped to the repeat regions are highly overlapped with each other, the identification of repeat elements is equivalent to the discovery of consensus overlaps between reads, which can be further cast into a community detection problem in the network of read overlaps. In RepLong, we first construct a network of read overlaps based on pair-wise alignment of the reads, where each vertex indicates a read and an edge indicates a substantial overlap between the corresponding two reads. Secondly, the communities whose intra connectivity is greater than the inter connectivity are extracted based on network modularity optimization. Finally, representative reads in each community are extracted to form the repeat library. Comparison studies on Drosophila melanogaster and human long read sequencing data with genome-based and short-read-based methods demonstrate the efficiency of RepLong in identifying long repeats. RepLong can handle lower coverage data and serve as a complementary solution to the existing methods to promote the repeat identification performance on long-read sequencing data. Availability and implementation: The software of RepLong is freely available at https://github.com/ruiguo-bio/replong. Contact: ywsun@szu.edu.cn or zhuzx@szu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.


Asunto(s)
Genoma , Filogenia , Secuencias Repetitivas de Ácidos Nucleicos , Análisis de Secuencia de ADN/métodos , Programas Informáticos , Algoritmos , Animales , Drosophila melanogaster/genética , Genómica/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Humanos
18.
Bioinformatics ; 33(16): 2436-2445, 2017 Aug 15.
Artículo en Inglés | MEDLINE | ID: mdl-28407042

RESUMEN

MOTIVATION: Understanding how gene regulatory networks change under different cellular states is important for revealing insights into network dynamics. Gaussian graphical models, which assume that the data follow a joint normal distribution, have been used recently to infer differential networks. However, the distributions of the omics data are non-normal in general. Furthermore, although much biological knowledge (or prior information) has been accumulated, most existing methods ignore the valuable prior information. Therefore, new statistical methods are needed to relax the normality assumption and make full use of prior information. RESULTS: We propose a new differential network analysis method to address the above challenges. Instead of using Gaussian graphical models, we employ a non-paranormal graphical model that can relax the normality assumption. We develop a principled model to take into account the following prior information: (i) a differential edge less likely exists between two genes that do not participate together in the same pathway; (ii) changes in the networks are driven by certain regulator genes that are perturbed across different cellular states and (iii) the differential networks estimated from multi-view gene expression data likely share common structures. Simulation studies demonstrate that our method outperforms other graphical model-based algorithms. We apply our method to identify the differential networks between platinum-sensitive and platinum-resistant ovarian tumors, and the differential networks between the proneural and mesenchymal subtypes of glioblastoma. Hub nodes in the estimated differential networks rediscover known cancer-related regulator genes and contain interesting predictions. AVAILABILITY AND IMPLEMENTATION: The source code is at https://github.com/Zhangxf-ccnu/pDNA. CONTACT: szuouyl@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Biología Computacional/métodos , Redes Reguladoras de Genes , Programas Informáticos , Algoritmos , Simulación por Computador , Femenino , Glioblastoma/genética , Humanos , Distribución Normal , Neoplasias Ováricas/genética
19.
Methods ; 129: 41-49, 2017 10 01.
Artículo en Inglés | MEDLINE | ID: mdl-28579401

RESUMEN

Recovering gene regulatory networks and exploring the network rewiring between two different disease states are important for revealing the mechanisms behind disease progression. The advent of high-throughput experimental techniques has enabled the possibility of inferring gene regulatory networks and differential networks using computational methods. However, most of existing differential network analysis methods are designed for single-platform data analysis and assume that differences between networks are driven by individual edges. Therefore, they cannot take into account the common information shared across different data platforms and may fail in identifying driver genes that lead to the change of network. In this study, we develop a node-based multi-view differential network analysis model to simultaneously estimate multiple gene regulatory networks and their differences from multi-platform gene expression data. Our model can leverage the strength across multiple data platforms to improve the accuracy of network inference and differential network estimation. Simulation studies demonstrate that our model can obtain more accurate estimations of gene regulatory networks and differential networks than other existing state-of-the-art models. We apply our model on TCGA ovarian cancer samples to identify network rewiring associated with drug resistance. We observe from our experiments that the hub nodes of our identified differential networks include known drug resistance-related genes and potential targets that are useful to improve the treatment of drug resistant tumors.


Asunto(s)
Biología Computacional/métodos , Perfilación de la Expresión Génica/métodos , Expresión Génica/genética , Redes Reguladoras de Genes/genética , Algoritmos , Humanos , Modelos Genéticos
20.
BMC Bioinformatics ; 18(Suppl 13): 463, 2017 Dec 01.
Artículo en Inglés | MEDLINE | ID: mdl-29219066

RESUMEN

BACKGROUND: The accurate identification of protein complexes is important for the understanding of cellular organization. Up to now, computational methods for protein complex detection are mostly focus on mining clusters from protein-protein interaction (PPI) networks. However, PPI data collected by high-throughput experimental techniques are known to be quite noisy. It is hard to achieve reliable prediction results by simply applying computational methods on PPI data. Behind protein interactions, there are protein domains that interact with each other. Therefore, based on domain-protein associations, the joint analysis of PPIs and domain-domain interactions (DDI) has the potential to obtain better performance in protein complex detection. As traditional computational methods are designed to detect protein complexes from a single PPI network, it is necessary to design a new algorithm that could effectively utilize the information inherent in multiple heterogeneous networks. RESULTS: In this paper, we introduce a novel multi-network clustering algorithm to detect protein complexes from multiple heterogeneous networks. Unlike existing protein complex identification algorithms that focus on the analysis of a single PPI network, our model can jointly exploit the information inherent in PPI and DDI data to achieve more reliable prediction results. Extensive experiment results on real-world data sets demonstrate that our method can predict protein complexes more accurately than other state-of-the-art protein complex identification algorithms. CONCLUSIONS: In this work, we demonstrate that the joint analysis of PPI network and DDI network can help to improve the accuracy of protein complex detection.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Mapeo de Interacción de Proteínas/métodos , Proteínas de Saccharomyces cerevisiae/metabolismo , Saccharomyces cerevisiae/metabolismo , Análisis por Conglomerados , Mapas de Interacción de Proteínas
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA