1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38975895

ABSTRACT

Spatial transcriptomics provides valuable insights into gene expression within the native tissue context, effectively merging molecular data with spatial information to uncover intricate cellular relationships and tissue organizations. In this context, deciphering cellular spatial domains becomes essential for revealing complex cellular dynamics and tissue structures. However, current methods encounter challenges in seamlessly integrating gene expression data with spatial information, resulting in less informative representations of spots and suboptimal accuracy in spatial domain identification. We introduce stCluster, a novel method that integrates graph contrastive learning with multi-task learning to refine informative representations for spatial transcriptomic data, consequently improving spatial domain identification. stCluster first leverages graph contrastive learning technology to obtain discriminative representations capable of recognizing spatially coherent patterns. Through jointly optimizing multiple tasks, stCluster further fine-tunes the representations to be able to capture complex relationships between gene expression and spatial organization. Benchmarked against six state-of-the-art methods, the experimental results reveal its proficiency in accurately identifying complex spatial domains across various datasets and platforms, spanning tissue, organ, and embryo levels. Moreover, stCluster can effectively denoise the spatial gene expression patterns and enhance the spatial trajectory inference. The source code of stCluster is freely available at https://github.com/hannshu/stCluster.


Subject(s)
Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Computational Biology/methods , Algorithms , Humans , Animals , Software , Machine Learning
2.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38221904

ABSTRACT

Identifying the binding affinity between a drug and its target is essential in drug discovery and repurposing. Numerous computational approaches have been proposed for understanding these interactions. However, most existing methods utilize either the molecular structure information of drugs and targets or the interaction information of drug-target bipartite networks, but not both. They may therefore fail to combine molecule-scale and network-scale features to obtain high-quality representations. In this study, we propose CSCo-DTA, a novel cross-scale graph contrastive learning approach for drug-target binding affinity prediction. The proposed model combines features learned from the molecular scale and the network scale to capture information from both local and global perspectives. We conducted experiments on two benchmark datasets, and the proposed model outperformed existing state-of-the-art methods. The ablation experiment demonstrated the significance and efficacy of multi-scale features and cross-scale contrastive learning modules in improving the prediction performance. Moreover, we applied CSCo-DTA to predict novel potential targets for Erlotinib and validated the predicted targets with molecular docking analysis.


Subject(s)
Benchmarking , Learning , Molecular Docking Simulation , Drug Delivery Systems , Drug Discovery
3.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36847701

ABSTRACT

Emerging studies have shown that circular RNAs (circRNAs) are involved in a variety of biological processes and play a key role in disease diagnosis, treatment and inference. Although many methods, including traditional machine learning and deep learning, have been developed to predict associations between circRNAs and diseases, the biological function of circRNAs has not been fully exploited. Some methods have explored disease-related circRNAs based on different views, but how to use multi-view data about circRNAs efficiently is still not well studied. Therefore, we propose a computational model, CLCDA, to predict potential circRNA-disease associations based on collaborative learning with circRNA multi-view functional annotations. First, we extract circRNA multi-view functional annotations and build a circRNA association network for each view to enable effective network fusion. Then, a collaborative deep learning framework for multi-view information is designed to obtain circRNA multi-source information features, which makes full use of the internal relationships among the circRNA views. We build a network consisting of circRNAs and diseases from their functional similarity and extract the consistency description information of circRNAs and diseases. Last, we predict potential associations between circRNAs and diseases with a graph autoencoder. Our computational model performs better in predicting candidate disease-related circRNAs than existing ones. Furthermore, case studies on several common diseases, in which the method finds previously unknown circRNAs related to them, show its high practicability. The experiments show that CLCDA can efficiently predict disease-related circRNAs and is helpful for the diagnosis and treatment of human disease.


Subject(s)
Deep Learning , Interdisciplinary Placement , Humans , RNA, Circular/genetics , Machine Learning , Computational Biology/methods
4.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37903416

ABSTRACT

The emergence of single-cell RNA sequencing (scRNA-seq) technology has revolutionized the identification of cell types and the study of cellular states at a single-cell level. Despite its significant potential, scRNA-seq data analysis is plagued by the issue of missing values. Many existing imputation methods rely on simplistic data distribution assumptions while ignoring the intrinsic gene expression distribution specific to cells. This work presents a novel deep-learning model, named scMultiGAN, for scRNA-seq imputation, which utilizes multiple collaborative generative adversarial networks (GAN). Unlike traditional GAN-based imputation methods that generate missing values based on random noises, scMultiGAN employs a two-stage training process and utilizes multiple GANs to achieve cell-specific imputation. Experimental results show the efficacy of scMultiGAN in imputation accuracy, cell clustering, differential gene expression analysis and trajectory analysis, significantly outperforming existing state-of-the-art techniques. Additionally, scMultiGAN is scalable to large scRNA-seq datasets and consistently performs well across sequencing platforms. The scMultiGAN code is freely available at https://github.com/Galaxy8172/scMultiGAN.


Subject(s)
Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Cluster Analysis , Exome Sequencing , Data Analysis , Sequence Analysis, RNA , Gene Expression Profiling
5.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38290765

ABSTRACT

SUMMARY: Single-cell multi-omics technologies provide a unique platform for characterizing cell states and reconstructing developmental process by simultaneously quantifying and integrating molecular signatures across various modalities, including genome, transcriptome, epigenome, and other omics layers. However, there is still an urgent unmet need for novel computational tools in this nascent field, which are critical for both effective and efficient interrogation of functionality across different omics modalities. Scbean represents a user-friendly Python library, designed to seamlessly incorporate a diverse array of models for the examination of single-cell data, encompassing both paired and unpaired multi-omics data. The library offers uniform and straightforward interfaces for tasks, such as dimensionality reduction, batch effect elimination, cell label transfer from well-annotated scRNA-seq data to scATAC-seq data, and the identification of spatially variable genes. Moreover, Scbean's models are engineered to harness the computational power of GPU acceleration through Tensorflow, rendering them capable of effortlessly handling datasets comprising millions of cells. AVAILABILITY AND IMPLEMENTATION: Scbean is released on the Python Package Index (PyPI) (https://pypi.org/project/scbean/) and GitHub (https://github.com/jhu99/scbean) under the MIT license. The documentation and example code can be found at https://scbean.readthedocs.io/en/latest/.


Subject(s)
Multiomics , Software , Genome , Transcriptome , Single-Cell Analysis , Data Analysis
6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34585247

ABSTRACT

Single-cell technologies provide new ways to profile the transcriptomic landscape, chromatin accessibility and spatial expression patterns in heterogeneous tissues at single-cell resolution. With the enormous number of single-cell datasets being generated, a key analytic challenge is to integrate these datasets to gain biological insights into cellular compositions. Here, we developed a domain-adversarial and variational approximation framework, DAVAE, which can integrate multiple single-cell datasets across samples, technologies and modalities with a single strategy. In addition, DAVAE can integrate paired ATAC and transcriptome profiles that are measured simultaneously from the same cell. With a mini-batch stochastic gradient descent strategy, it scales to large datasets and can be accelerated by GPUs. Results on seven real data integration applications demonstrated the effectiveness and scalability of DAVAE in batch-effect removal, transfer learning and cell-type prediction for multiple single-cell datasets across samples, technologies and modalities. Availability: DAVAE has been implemented in the toolkit package "scbean" in the PyPI repository, and the source code is freely accessible at https://github.com/jhu99/scbean. All data and source code for reproducing the results of this paper are accessible at https://github.com/jhu99/davae_paper.


Subject(s)
Single-Cell Analysis , Software , Algorithms , Chromatin , Transcriptome
7.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36305456

ABSTRACT

Long non-coding RNAs (lncRNAs) can disrupt the biological functions of protein-coding genes (PCGs) to cause cancer. However, the relationship between lncRNAs and PCGs remains unclear and difficult to predict. Machine learning has achieved a satisfactory performance in association prediction, but to our knowledge, it is currently less used in lncRNA-PCG association prediction. Therefore, we introduce GAE-LGA, a powerful deep learning model with graph autoencoders as components, to recognize potential lncRNA-PCG associations. GAE-LGA jointly explored lncRNA-PCG learning and cross-omics correlation learning for effective lncRNA-PCG association identification. The functional similarity and multi-omics similarity of lncRNAs and PCGs were accumulated and encoded by graph autoencoders to extract feature representations of lncRNAs and PCGs, which were subsequently used for decoding to obtain candidate lncRNA-PCG pairs. Comprehensive evaluation demonstrated that GAE-LGA can successfully capture lncRNA-PCG associations with strong robustness and outperformed other machine learning-based identification methods. Furthermore, multi-omics features were shown to improve the performance of lncRNA-PCG association identification. In conclusion, GAE-LGA can act as an efficient application for lncRNA-PCG association prediction with the following advantages: It fuses multi-omics information into the similarity network, making the feature representation more accurate; it can predict lncRNA-PCG associations for new lncRNAs and identify potential lncRNA-PCG associations with high accuracy.


Subject(s)
Neoplasms , RNA, Long Noncoding , Humans , Computational Biology/methods , Machine Learning , Neoplasms/genetics , RNA, Long Noncoding/genetics , Proteins/genetics
8.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34727570

ABSTRACT

Brain disease gene identification is critical for revealing the biological mechanism and developing drugs for brain diseases. To enhance the identification of brain disease genes, similarity-based computational methods, especially network-based methods, have been adopted for narrowing down the search space. However, these network-based methods only use molecular networks, ignoring brain connectome data, which have been widely used in many brain-related studies. In our study, we propose a novel framework, named brainMI, for integrating brain connectome data and molecular-based gene association networks to predict brain disease genes. For the consistent representation of molecular-based network data and brain connectome data, brainMI first constructs a novel gene network, called brain functional connectivity (BFC)-based gene network, based on resting-state functional magnetic resonance imaging data and brain region-specific gene expression data. Then, a multiple network integration method is proposed to learn low-dimensional features of genes by integrating the BFC-based gene network and existing protein-protein interaction networks. Finally, these features are utilized to predict brain disease genes with a support vector machine-based model. We evaluate brainMI on four brain diseases: Alzheimer's disease, Parkinson's disease, major depressive disorder and autism. brainMI achieves scores of 0.761, 0.729, 0.728 and 0.744 using the BFC-based gene network alone and enhances the molecular network-based performance by 6.3% on average. In addition, the results show that brainMI achieves higher performance in predicting brain disease genes than three existing state-of-the-art methods.


Subject(s)
Alzheimer Disease , Connectome , Depressive Disorder, Major , Brain/diagnostic imaging , Connectome/methods , Humans , Magnetic Resonance Imaging/methods
9.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34545927

ABSTRACT

Quantitative trait locus (QTL) analyses of multiomic molecular traits, such as gene transcription (eQTL), DNA methylation (mQTL) and histone modification (haQTL), have been widely used to infer the functional effects of genome variants. However, QTL discovery is largely restricted by limited study sample sizes, which demand a higher minor allele frequency threshold and thereby leave many molecular trait-variant associations missing. This is especially prominent in single-cell molecular QTL studies because of sample availability and cost. A method that addresses this problem is urgently needed to enhance the discoveries of current molecular QTL studies with small sample sizes. In this study, we present an efficient computational framework called xQTLImp to impute missing molecular QTL associations. In the local-region imputation, xQTLImp uses a multivariate Gaussian model to impute the missing associations by leveraging known association statistics of variants and the surrounding linkage disequilibrium (LD). In the genome-wide imputation, novel procedures are implemented to improve efficiency, including dynamically constructing a reused LD buffer, adopting multiple heuristic strategies and parallel computing. Experiments on various multiomic bulk and single-cell sequencing-based QTL datasets have demonstrated the high imputation accuracy and novel QTL discovery ability of xQTLImp. Finally, a C++ software package is freely available at https://github.com/stormlovetao/QTLIMP.
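
As a rough illustration of what imputing an association under a multivariate Gaussian model can look like, the sketch below uses the standard conditional mean of a zero-mean Gaussian: the missing z-score is estimated as Sigma_mo * Sigma_oo^-1 * z_o, where Sigma_oo is the LD (correlation) matrix among observed variants and Sigma_mo holds the LD between the missing variant and the observed ones. This is not the xQTLImp implementation; the function names, the optional `ridge` term and the toy numbers are assumptions made for illustration.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; consider a ridge term")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def impute_z(ld_observed, ld_missing_to_observed, z_observed, ridge=0.0):
    """Conditional mean of a missing z-score given the observed z-scores.

    ld_observed            -- LD matrix among observed variants (Sigma_oo)
    ld_missing_to_observed -- LD between the missing variant and each observed one (Sigma_mo)
    ridge                  -- hypothetical diagonal regularisation, not taken from the paper
    """
    n = len(z_observed)
    regularised = [
        [ld_observed[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # Sigma_oo is symmetric, so solving Sigma_oo w = Sigma_om gives the imputation weights.
    weights = solve(regularised, ld_missing_to_observed)
    return sum(w * z for w, z in zip(weights, z_observed))


# Toy example: two observed variants with z-scores 2.0 and 1.0.
z_hat = impute_z([[1.0, 0.5], [0.5, 1.0]], [0.6, 0.3], [2.0, 1.0])
print(z_hat)
```

With a single observed variant the formula reduces to multiplying its z-score by the LD coefficient, which is a quick sanity check for any implementation.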


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Genome-Wide Association Study/methods , Genotype , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide , Sample Size
10.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36151714

ABSTRACT

The three-dimensional genome structure plays a key role in cellular function and gene regulation. Single-cell Hi-C (high-resolution chromosome conformation capture) technology can capture genome structure information at the cell level, which provides the opportunity to study how genome structure varies among different cell types. Recently, a few methods have been designed for single-cell Hi-C clustering. In this manuscript, we perform an in-depth benchmark study of available single-cell Hi-C data clustering methods, implementing an evaluation system for multiple clustering frameworks based on both human and mouse datasets. We compare eight methods in terms of visualization and clustering performance. Performance is evaluated using four benchmark metrics: adjusted Rand index, normalized mutual information, homogeneity and Fowlkes-Mallows index. Furthermore, we also evaluate the eight methods for the task of separating cells at different stages of the cell cycle based on single-cell Hi-C data.
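
To make two of the four benchmark metrics concrete, the following sketch computes the adjusted Rand index and the Fowlkes-Mallows index from pair counts, using only the Python standard library. It is a generic illustration of the textbook definitions, not the evaluation code used in the study; the function names and the toy labels are assumptions.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_true, labels_pred):
    """Count cell pairs grouped together in both labelings, in the truth, and in the prediction."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    contingency = Counter(zip(labels_true, labels_pred))
    same_both = sum(comb(c, 2) for c in contingency.values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    return same_both, same_true, same_pred, comb(len(labels_true), 2)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance agreement (1.0 means identical partitions)."""
    same_both, same_true, same_pred, total = pair_counts(labels_true, labels_pred)
    if total == 0:
        return 1.0
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0
    return (same_both - expected) / (max_index - expected)


def fowlkes_mallows_index(labels_true, labels_pred):
    """Geometric mean of pairwise precision and recall."""
    same_both, same_true, same_pred, _ = pair_counts(labels_true, labels_pred)
    if same_true == 0 or same_pred == 0:
        return 0.0
    return same_both / sqrt(same_true * same_pred)


# Toy example: six cells, two true cell types, three predicted clusters.
truth = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(truth, predicted), fowlkes_mallows_index(truth, predicted))
```

Both metrics depend only on which cells are grouped together, so relabeling the clusters does not change the score.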


Subject(s)
Chromatin , Chromosomes , Humans , Mice , Animals , Cluster Analysis , Genome , Molecular Conformation
11.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36622018

ABSTRACT

MOTIVATION: Single-cell multimodal assays allow us to simultaneously measure two different molecular features of the same cell, enabling new insights into cellular heterogeneity, cell development and diseases. However, most existing methods suffer from inaccurate dimensionality reduction for the joint-modality data, hindering their discovery of novel or rare cell subpopulations. RESULTS: Here, we present VIMCCA, a computational framework based on variational-assisted multi-view canonical correlation analysis to integrate paired multimodal single-cell data. Our statistical model uses a common latent variable to interpret the common source of variances in two different data modalities. Our approach jointly learns an inference model and two modality-specific non-linear models by leveraging variational inference and deep learning. We perform VIMCCA and compare it with 10 existing state-of-the-art algorithms on four paired multi-modal datasets sequenced by different protocols. Results demonstrate that VIMCCA facilitates integrating various types of joint-modality data, thus leading to more reliable and accurate downstream analysis. VIMCCA improves our ability to identify novel or rare cell subtypes compared to existing widely used methods. Besides, it can also facilitate inferring cell lineage based on joint-modality profiles. AVAILABILITY AND IMPLEMENTATION: The VIMCCA algorithm has been implemented in our toolkit package scbean (≥0.5.0), and its code has been archived at https://github.com/jhu99/scbean under MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Models, Statistical , Cell Differentiation , Cell Lineage
12.
Int J Mol Sci ; 25(8)2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38673997

ABSTRACT

The pathogenesis of carcinoma is believed to come from the combined effect of polygenic variation, and the initiation and progression of malignant tumors are closely related to the dysregulation of biological pathways. Quantifying the alteration in pathway activation and identifying coordinated patterns of pathway dysfunction are imperative for understanding the malignancy process and distinguishing different tumor stages or clinical outcomes of individual patients. In this study, we have conducted in silico pathway activation analysis using Riemannian manifold (RiePath) toward pan-cancer personalized characterization, which is the first attempt to apply Riemannian manifold theory to measure the extent of pathway dysregulation in individual patients on the tangent space of the Riemannian manifold. RiePath effectively integrates pathway and gene expression information, not only generating a relatively low-dimensional and biologically relevant representation, but also identifying a robust panel of biologically meaningful pathway signatures as biomarkers. The pan-cancer analysis across 16 cancer types reveals the capability of RiePath to evaluate pathway activation accurately and identify clinical outcome-related pathways. We believe that RiePath has the potential to provide new prospects in understanding the molecular mechanisms of complex diseases and may find broader applications in predicting biomarkers for other intricate diseases.


Subject(s)
Neoplasms , Precision Medicine , Humans , Neoplasms/genetics , Neoplasms/metabolism , Precision Medicine/methods , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Gene Expression Regulation, Neoplastic , Signal Transduction , Gene Expression Profiling/methods , Algorithms , Computational Biology/methods , Gene Regulatory Networks , Computer Simulation
13.
BMC Bioinformatics ; 24(1): 213, 2023 May 23.
Article in English | MEDLINE | ID: mdl-37221476

ABSTRACT

BACKGROUND: Structural variations (SVs) refer to variations in an organism's chromosome structure that exceed a length of 50 base pairs. They play a significant role in genetic diseases and evolutionary mechanisms. While long-read sequencing technology has led to the development of numerous SV callers, their performance has been suboptimal. Researchers have observed that current SV callers often miss true SVs and generate many false SVs, especially in repetitive regions and areas with multi-allelic SVs. These errors are due to the messy alignments of long-read data, which are affected by their high error rate. Therefore, there is a need for a more accurate SV caller. RESULTS: We propose a new method, SVcnn, a more accurate deep learning-based method for detecting SVs using long-read sequencing data. We run SVcnn and other SV callers on three real datasets and find that SVcnn improves the F1-score by 2-8% compared with the second-best method when the read depth is greater than 5×. More importantly, SVcnn has better performance for detecting multi-allelic SVs. CONCLUSIONS: SVcnn is an accurate deep learning-based method to detect SVs. The program is available at https://github.com/nwpuzhengyan/SVcnn.


Subject(s)
Deep Learning , Biological Evolution
14.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33963834

ABSTRACT

Different subtypes of the same cancer often show distinct genomic signatures and require targeted treatments. The differences at the cellular and molecular levels of the tumor microenvironment in different cancer subtypes have significant effects on tumor pathogenesis and prognostic outcomes. Although there has been significant research on the prognostic association of tumor infiltrating lymphocytes in selected histological subtypes, few investigations have systematically reported the prognostic impacts of immune cells in molecular subtypes, as quantified by machine learning approaches on multi-omics datasets. This paper describes a new computational framework, ProTICS, to quantify the differences in the proportion of immune cells in the tumor microenvironment and estimate their prognostic effects in different subtypes. First, we stratified patients into molecular subtypes based on gene expression and methylation profiles by applying a nonnegative tensor factorization technique. Then we quantified the proportion of cell types in each specimen using an mRNA-based deconvolution method. For tumors in each subtype, we estimated the prognostic effects of immune cell types by applying Cox proportional hazard regression. At the molecular level, we also predicted the prognosis of signature genes for each subtype. Finally, we benchmarked the performance of ProTICS on three TCGA datasets and another independent METABRIC dataset. ProTICS successfully stratified tumors into different molecular subtypes manifested by distinct overall survival. Furthermore, the different immune cell types showed distinct prognostic patterns with respect to molecular subtypes. This study provides new insights into the prognostic association between immune cells and molecular subtypes, showing the utility of immune cells as potential prognostic markers. Availability: R code is available at https://github.com/liu-shuhui/ProTICS.


Subject(s)
Biomarkers, Tumor , Computational Biology/methods , Lymphocytes, Tumor-Infiltrating/immunology , Lymphocytes, Tumor-Infiltrating/metabolism , Neoplasms/etiology , Neoplasms/mortality , Tumor Microenvironment , Algorithms , Databases, Genetic , Gene Expression Profiling/methods , Humans , Immunophenotyping , Lymphocytes, Tumor-Infiltrating/pathology , Neoplasms/diagnosis , Prognosis , Proportional Hazards Models , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology
15.
Brief Bioinform ; 22(2): 2096-2105, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32249297

ABSTRACT

MOTIVATION: The emergence of abundant biological networks, which benefit from the development of advanced high-throughput techniques, contributes to describing and modeling complex internal interactions among biological entities such as genes and proteins. Multiple networks provide rich information for inferring the function of genes or proteins. To extract functional patterns of genes based on multiple heterogeneous networks, network embedding-based methods, aiming to capture non-linear and low-dimensional feature representation based on network biology, have recently achieved remarkable performance in gene function prediction. However, existing methods do not consider the shared information among different networks during the feature learning process. RESULTS: Taking the correlation among the networks into account, we design a novel semi-supervised autoencoder method to integrate multiple networks and generate a low-dimensional feature representation. Then we utilize a convolutional neural network based on the integrated feature embedding to annotate unlabeled gene functions. We test our method on both yeast and human datasets and compare with three state-of-the-art methods. The results demonstrate the superior performance of our method. We not only provide a comprehensive analysis of the performance of the newly proposed algorithm but also provide a tool for extracting features of genes based on multiple networks, which can be used in the downstream machine learning task. AVAILABILITY: DeepMNE-CNN is freely available at https://github.com/xuehansheng/DeepMNE-CNN. CONTACT: jiajiepeng@nwpu.edu.cn; shang@nwpu.edu.cn; jianye.hao@tju.edu.cn.


Subject(s)
Deep Learning , Neural Networks, Computer , Algorithms , Gene Regulatory Networks , Genes, Fungal , Humans , Molecular Sequence Annotation , Yeasts/genetics
16.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33517357

ABSTRACT

Accurately identifying potential drug-target interactions (DTIs) is a key step in drug discovery. Although many related experimental studies have been carried out for identifying DTIs in the past few decades, biological experiment-based DTI identification is still time-consuming and expensive. Therefore, it is of great significance to develop effective computational methods for identifying DTIs. In this paper, we develop a novel end-to-end learning-based framework based on heterogeneous graph convolutional networks for DTI prediction, called end-to-end graph (EEG)-DTI. Given a heterogeneous network containing multiple types of biological entities (i.e. drug, protein, disease, side-effect), EEG-DTI learns the low-dimensional feature representation of drugs and targets using a graph convolutional network-based model and predicts DTIs based on the learned features. During the training process, EEG-DTI learns the feature representation of nodes in an end-to-end mode. The evaluation shows that EEG-DTI performs better than existing state-of-the-art methods. The data and source code are available at: https://github.com/MedicineBiology-AI/EEG-DTI.


Subject(s)
Computer Simulation , Drug Development , Drug Discovery , Machine Learning , Pharmaceutical Preparations/chemistry , Software , Drug-Related Side Effects and Adverse Reactions , Humans , Proteins/chemistry , Proteins/metabolism
17.
Bioinformatics ; 38(9): 2536-2543, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35199150

ABSTRACT

MOTIVATION: Biomarkers with prognostic ability and biological interpretability can be used to support decision-making in the survival analysis. Genes usually form functional modules to play synergistic roles, such as pathways. Predicting significant features from the functional level can effectively reduce the adverse effects of heterogeneity and obtain more reproducible and interpretable biomarkers. Personalized pathway activation inference can quantify the dysregulation of essential pathways involved in the initiation and progression of cancers, and can contribute to the development of personalized medical treatments. RESULTS: In this study, we propose a novel method to evaluate personalized pathway activation based on signaling entropy for survival analysis (SEPA), which is a new attempt to introduce the information-theoretic entropy in generating pathway representation for each patient. SEPA effectively integrates pathway-level information into gene expression data, converting the high-dimensional gene expression data into the low-dimensional biological pathway activation scores. SEPA shows its classification power on the prognostic pan-cancer genomic data, and the potential pathway markers identified based on SEPA have statistical significance in the discrimination of high- and low-risk cohorts and are likely to be associated with the initiation and progress of cancers. The results show that SEPA scores can be used as an indicator to precisely distinguish cancer patients with different clinical outcomes, and identify important pathway features with strong discriminative power and biological interpretability. AVAILABILITY AND IMPLEMENTATION: The MATLAB-package for SEPA is freely available from https://github.com/xingyili/SEPA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neoplasms , Humans , Entropy , Neoplasms/genetics , Survival Analysis , Algorithms , Biomarkers
18.
Methods ; 192: 77-84, 2021 08.
Article in English | MEDLINE | ID: mdl-32946974

ABSTRACT

Analyzing disease-disease relationships plays an important role in understanding disease mechanisms and finding alternative uses for a drug. A disease is usually the result of an abnormal state of multiple molecular processes. Since biological networks can model the interplay of multiple molecular processes, network-based methods have recently been proposed to uncover disease-disease relationships. Given a disease and a network, the disease can be represented as a subnetwork constructed from the disease genes involved in the given network, named the disease subnetwork. Because it is difficult to learn the feature representation of disease subnetworks, most existing methods are unsupervised and do not use labeled information. To fill this gap, we propose a novel method named SubNet2vec to learn the feature vectors of diseases from their corresponding subnetworks in the biological network. By utilizing the feature representation of disease subnetworks, we can analyze disease-disease relationships in a supervised fashion. The evaluation results show that the proposed framework outperforms some state-of-the-art approaches by a large margin on disease-disease/disease-drug association prediction. The source code and data are available at https://github.com/MedicineBiology-AI/SubNet2vec.git.


Subject(s)
Software , Pharmaceutical Preparations
19.
BMC Bioinformatics ; 22(1): 5, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407064

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) enables many in-depth transcriptomic analyses at single-cell resolution. It is already widely used for exploring the dynamic development process of life, studying gene regulation mechanisms, and discovering new cell types. However, the low RNA capture rate, which causes highly sparse expression with dropout, makes downstream analyses difficult. RESULTS: We propose a new method, SCC, to impute the dropouts of scRNA-seq data. Experimental results show that SCC gives competitive results compared to two existing methods while showing superiority in reducing the intra-class distance of cells and improving the clustering accuracy in both simulated and real data. CONCLUSIONS: SCC is an effective tool to resolve the dropout noise in scRNA-seq data. The code is freely accessible at https://github.com/nwpuzhengyan/SCC.


Subject(s)
Gene Expression Profiling/methods , RNA, Small Cytoplasmic/genetics , Single-Cell Analysis/methods , Gene Expression Regulation/genetics , Genomics/methods , Models, Genetic
20.
BMC Bioinformatics ; 22(Suppl 9): 281, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-34433409

ABSTRACT

BACKGROUND: It is important to understand the composition of cell types and their proportions in intact tissues, as changes in certain cell types are the underlying cause of disease in humans. Although cell type compositions and ratios can be obtained by single-cell sequencing, single-cell sequencing is currently expensive and cannot be applied in clinical studies involving a large number of subjects. Therefore, it is useful to apply the bulk RNA-Seq dataset and the single-cell RNA dataset to deconvolute and obtain the cell type composition in the tissue. RESULTS: By analyzing the existing cell population prediction methods, we found that most of them need a cell-type-specific gene expression profile as the input signature matrix. However, in real applications, it is not always possible to find an available signature matrix. To solve this problem, we proposed a novel method, named DCap, to predict cell abundance. DCap is a deconvolution method based on non-negative least squares. During the non-negative least squares calculation, DCap considers the weights resulting from the measurement noise of bulk RNA-seq and the calculation error of single-cell RNA-seq data, and performs a weighted iterative calculation based on least squares. By weighting the bulk tissue gene expression matrix and the single-cell gene expression matrix, DCap minimizes the measurement error of bulk RNA-Seq and also reduces errors resulting from differences in the number of expressed genes in the same type of cells in different samples. Evaluation tests show that DCap performs better in cell type abundance prediction than existing methods. CONCLUSION: DCap solves the deconvolution problem using weighted non-negative least squares to predict cell type abundance in tissues. DCap has better prediction results and does not need a signature matrix that gives the cell-type-specific gene expression profile to be prepared in advance. By using DCap, we can better study the changes in cell proportion in diseased tissues and provide more information on the follow-up treatment of diseases.
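
The idea of weighted non-negative least squares deconvolution can be sketched as follows: given a genes-by-cell-types reference matrix (assumed here to be, for example, average single-cell expression per cell type), a bulk expression vector and one weight per gene, find non-negative abundances that minimise the weighted squared error. The sketch uses plain cyclic coordinate descent; it is not the DCap algorithm, and the function names, the fixed weights and the toy matrices are assumptions chosen for illustration.

```python
def weighted_nnls(reference, bulk, weights, n_iter=500):
    """Minimise sum_i w_i * (bulk_i - sum_j reference[i][j] * x_j)**2 subject to x_j >= 0."""
    n_genes = len(reference)
    n_types = len(reference[0])
    x = [0.0] * n_types
    residual = list(bulk)  # equals bulk - reference @ x while x is all zeros
    for _ in range(n_iter):
        for j in range(n_types):
            col = [reference[i][j] for i in range(n_genes)]
            denom = sum(w * a * a for w, a in zip(weights, col))
            if denom == 0.0:
                continue
            # Exact minimiser along coordinate j, clipped at zero.
            step = sum(w * a * r for w, a, r in zip(weights, col, residual)) / denom
            new_xj = max(0.0, x[j] + step)
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - a * delta for r, a in zip(residual, col)]
                x[j] = new_xj
    return x


def to_proportions(x):
    """Rescale non-negative abundances so that they sum to one."""
    total = sum(x)
    return [v / total for v in x] if total > 0 else list(x)


# Toy example: four genes, two cell types mixed at 30% and 70%.
reference = [[10.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 7.0]]
bulk = [3.7, 3.8, 6.6, 5.5]
gene_weights = [1.0, 0.5, 1.0, 2.0]  # hypothetical per-gene confidence weights
abundance = to_proportions(weighted_nnls(reference, bulk, gene_weights))
print(abundance)
```

In a weighted iterative scheme, the per-gene weights would be re-estimated from the residuals between rounds; here they are fixed to keep the example short.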


Subject(s)
Gene Expression Profiling , RNA , Humans , RNA/genetics , RNA-Seq , Sequence Analysis, RNA , Exome Sequencing