Results 1 - 20 of 6,474
1.
J Bioinform Comput Biol ; 22(3): 2450015, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39036845

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology has generated vast amounts of data. However, these data often exhibit batch effects due to various factors such as different time points, experimental personnel, and instruments used, which can obscure the biological differences in the data itself. Based on the characteristics of scRNA-seq data, we designed a dense deep residual network model, referred to as NDnetwork. Subsequently, we combined the NDnetwork model with the MNN method to correct batch effects in scRNA-seq data, and named it the NDMNN method. Comprehensive experimental results demonstrate that the NDMNN method outperforms existing commonly used methods for correcting batch effects in scRNA-seq data. As the scale of single-cell sequencing continues to expand, we believe that NDMNN will be a valuable tool for researchers in the biological community for correcting batch effects in their studies. The source code and experimental results of the NDMNN method can be found at https://github.com/mustang-hub/NDMNN.
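
As a rough illustration of the mutual nearest neighbors (MNN) idea that the abstract above builds on: two cells from different batches form an MNN pair when each is among the other's k nearest neighbors. The sketch below is not the NDMNN implementation; the function names, the Euclidean distance, and the toy coordinates are assumptions made only for this example.

```python
import math


def k_nearest(query, reference, k):
    """Return the indices of the k points in `reference` closest to `query`."""
    order = sorted(range(len(reference)),
                   key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])


def mutual_nearest_pairs(batch_a, batch_b, k=1):
    """Return (i, j) pairs where a[i] and b[j] are in each other's k-NN sets."""
    a_to_b = [k_nearest(p, batch_b, k) for p in batch_a]
    b_to_a = [k_nearest(q, batch_a, k) for q in batch_b]
    return [(i, j)
            for i in range(len(batch_a))
            for j in sorted(a_to_b[i])
            if i in b_to_a[j]]


# Toy data (hypothetical): two cells in batch A, three cells in batch B.
batch_a = [(0.0, 0.0), (10.0, 10.0)]
batch_b = [(0.5, 0.5), (10.5, 10.5), (5.0, 5.2)]
print(mutual_nearest_pairs(batch_a, batch_b, k=1))  # [(0, 0), (1, 1)]
```

The third cell in batch B has a nearest neighbor in batch A, but that neighbor does not choose it back, so it forms no pair. MNN-style batch correction typically estimates batch offsets from the matched pairs only.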


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , RNA-Seq/methods , Computational Biology/methods , Software , Humans , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Algorithms , Deep Learning , Single-Cell Gene Expression Analysis
2.
J Bioinform Comput Biol ; 22(3): 2450007, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39036848

ABSTRACT

For sequencing-based spatial transcriptomics data, the gene-spot count matrix is highly sparse. This feature is similar to scRNA-seq. The goal of this paper is to identify whether there exist genes that are frequently under-detected in Visium compared to bulk RNA-seq, and the underlying potential mechanism of under-detection in Visium. We collected paired Visium and bulk RNA-seq data for 28 human samples and 19 mouse samples, which covered diverse tissue sources. We compared the two data types and observed that there indeed exists a collection of genes frequently under-detected in Visium compared to bulk RNA-seq. We performed a motif search to examine the last 350 bp of the frequently under-detected genes, and we observed that the poly (T) motif was significantly enriched in genes identified from both human and mouse data, which matches with our previous finding about frequently under-detected genes in scRNA-seq. We hypothesized that the poly (T) motif may be able to form a hairpin structure with the poly (A) tails of their mRNA transcripts, making it difficult for their mRNA transcripts to be captured during Visium library preparation.


Subject(s)
Gene Expression Profiling , Transcriptome , Mice , Humans , Animals , Gene Expression Profiling/methods , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , RNA-Seq/methods , Computational Biology/methods , Nucleotide Motifs
3.
Nat Commun ; 15(1): 6167, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039053

ABSTRACT

Translating RNA-seq into clinical diagnostics requires ensuring the reliability and cross-laboratory consistency of detecting clinically relevant subtle differential expressions, such as those between different disease subtypes or stages. As part of the Quartet project, we present an RNA-seq benchmarking study across 45 laboratories using the Quartet and MAQC reference samples spiked with ERCC controls. Based on multiple types of 'ground truth', we systematically assess the real-world RNA-seq performance and investigate the influencing factors involved in 26 experimental processes and 140 bioinformatics pipelines. Here we show greater inter-laboratory variations in detecting subtle differential expressions among the Quartet samples. Experimental factors including mRNA enrichment and strandedness, and each bioinformatics step, emerge as primary sources of variations in gene expression. We underscore the profound influence of experimental execution, and provide best practice recommendations for experimental designs, strategies for filtering low-expression genes, and the optimal gene annotation and analysis pipelines. In summary, this study lays the foundation for developing and quality control of RNA-seq for clinical diagnostic purposes.


Subject(s)
Benchmarking , Computational Biology , Quality Control , RNA-Seq , Reference Standards , Benchmarking/methods , Humans , RNA-Seq/methods , RNA-Seq/standards , Computational Biology/methods , Reproducibility of Results , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/standards , Gene Expression Profiling/methods , Gene Expression Profiling/standards , RNA, Messenger/genetics , RNA, Messenger/metabolism
4.
PLoS One ; 19(7): e0300565, 2024.
Article in English | MEDLINE | ID: mdl-39018275

ABSTRACT

mRNA-seq data analysis is a powerful technique for inferring information from biological systems of interest. Specifically, the sequenced RNA fragments are aligned with genomic reference sequences, and we count the number of sequence fragments corresponding to each gene for each condition. A gene is identified as differentially expressed (DE) if the difference in its count numbers between conditions is statistically significant. Several statistical analysis methods have been developed to detect DE genes based on RNA-seq data. However, the existing methods could suffer decreasing power to identify DE genes arising from overdispersion and limited sample size, where overdispersion refers to the empirical phenomenon that the variance of read counts is larger than the mean of read counts. We propose a new differential expression analysis procedure: heterogeneous overdispersion genes testing (DEHOGT) based on heterogeneous overdispersion modeling and a post-hoc inference procedure. DEHOGT integrates sample information from all conditions and provides a more flexible and adaptive overdispersion modeling for the RNA-seq read count. DEHOGT adopts a gene-wise estimation scheme to enhance the detection power of differentially expressed genes when the number of replicates is limited as long as the number of conditions is large. DEHOGT is tested on the synthetic RNA-seq read count data and outperforms two popular existing methods, DESeq2 and edgeR, in detecting DE genes. We apply the proposed method to a test dataset using RNA-seq data from microglial cells. DEHOGT tends to detect more differentially expressed genes potentially related to microglial cells under different stress hormone treatments.
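
To make the notion of overdispersion in the abstract above concrete, the sketch below compares the sample variance of one gene's read counts with its mean. It also gives a method-of-moments dispersion estimate under the commonly used negative binomial form Var = mu + alpha * mu^2. This is not the DEHOGT estimator; the function name and the toy counts are assumptions for illustration only.

```python
import statistics


def dispersion_summary(counts):
    """Return (mean, variance, alpha) for one gene's read counts.

    alpha is a method-of-moments estimate from Var = mu + alpha * mu**2;
    alpha > 0 means the variance exceeds the mean (overdispersion).
    """
    mean = float(statistics.mean(counts))
    if mean == 0:
        raise ValueError("dispersion is undefined when all counts are zero")
    variance = float(statistics.variance(counts))  # unbiased sample variance
    alpha = (variance - mean) / mean ** 2
    return mean, variance, alpha


# Hypothetical read counts for a single gene across five replicates.
mean, variance, alpha = dispersion_summary([2, 4, 6, 8, 30])
print(mean, variance, alpha)
```

A Poisson model would force the variance to equal the mean (alpha = 0). With few replicates this per-gene estimate is very noisy, which is the small-sample difficulty the abstract refers to.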


Subject(s)
Gene Expression Profiling , Gene Expression Profiling/methods , Animals , Sequence Analysis, RNA/methods , Humans , RNA-Seq/methods , Algorithms , Mice , RNA, Messenger/genetics
5.
PLoS One ; 19(7): e0305907, 2024.
Article in English | MEDLINE | ID: mdl-39052586

ABSTRACT

The mechanisms governing gene regulation in domestic Yuzhong pigeon breast muscle development remain largely elusive. Here, we conducted a comparative analysis using Iso-seq and RNA-seq data from domestic Yuzhong pigeons and European meat pigeons to uncover signaling pathways and genes involved in breast muscle development. The Iso-seq data from domestic Yuzhong pigeons yielded 131,377,075 subreads, resulting in 16,587 non-redundant high-quality full-length transcripts post-correction. Furthermore, utilizing pfam, CPC, PLEK, and CPAT, we predicted 5575, 4973, 2333, and 4336 lncRNAs, respectively. Notably, several genes potentially implicated in breast muscle development were identified, including tropomyosin beta chain, myosin regulatory light chain 2, and myosin binding protein C. KEGG enrichment analysis revealed critical signaling pathways in breast muscle development, spanning carbon metabolism, biosynthesis of amino acids, glycolysis/gluconeogenesis, estrogen signaling, PI3K-AKT signaling, protein processing in the endoplasmic reticulum, oxidative phosphorylation, pentose phosphate pathway, fructose and mannose metabolism, and tight junctions. These findings offer insights into the biological processes driving breast muscle development in domestic Yuzhong pigeon, contributing to our understanding of this complex phenomenon.


Subject(s)
Columbidae , Muscle Development , RNA-Seq , Animals , Columbidae/genetics , Columbidae/growth & development , Columbidae/metabolism , Muscle Development/genetics , Signal Transduction/genetics , Sequence Analysis, RNA , RNA, Long Noncoding/genetics
6.
Genes (Basel) ; 15(7)2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39062661

ABSTRACT

In recent years, there has been a growing interest in profiling multiomic modalities within individual cells simultaneously. One such example is integrating combined single-cell RNA sequencing (scRNA-seq) data and single-cell transposase-accessible chromatin sequencing (scATAC-seq) data. Integrated analysis of diverse modalities has helped researchers make more accurate predictions and gain a more comprehensive understanding than with single-modality analysis. However, generating such multimodal data is technically challenging and expensive, leading to limited availability of single-cell co-assay data. Here, we propose a model for cross-modal prediction between the transcriptome and chromatin profiles in single cells. Our model is based on a deep neural network architecture that learns the latent representations from the source modality and then predicts the target modality. It demonstrates reliable performance in accurately translating between these modalities across multiple paired human scATAC-seq and scRNA-seq datasets. Additionally, we developed CrossMP, a web-based portal allowing researchers to upload their single-cell modality data through an interactive web interface and predict the other type of modality data, using high-performance computing resources plugged at the backend.


Subject(s)
Chromatin Immunoprecipitation Sequencing , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , RNA-Seq/methods , Chromatin Immunoprecipitation Sequencing/methods , Software , Internet , Transcriptome/genetics , Sequence Analysis, RNA/methods , Chromatin/genetics , Chromatin/metabolism , Single-Cell Gene Expression Analysis
7.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39060167

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) enables the exploration of biological heterogeneity among different cell types within tissues at single-cell resolution. Inferring cell types within tissues is foundational for downstream research. Most existing methods for cell type inference based on scRNA-seq data primarily utilize highly variable genes (HVGs) with higher expression levels as clustering features, overlooking the contribution of HVGs with lower expression levels. To address this, we have designed a novel cell type inference method for scRNA-seq data, termed scLEGA. scLEGA employs a novel zero-inflated negative binomial (ZINB) loss function that fully considers the contribution of genes with lower expression levels and combines two distinct scRNA-seq clustering strategies through a multi-head attention mechanism. It utilizes a low-expression optimized denoising autoencoder, based on the novel ZINB model, to extract low-dimensional features and handle dropout events, and a GCN-based graph autoencoder (GAE) that leverages neighbor information to guide dimensionality reduction. The iterative fusion of denoising and topological embedding in scLEGA facilitates the acquisition of cluster-friendly cell representations in the hidden embedding, where similar cells are brought closer together. Compared to 12 state-of-the-art cell type inference methods on 15 scRNA-seq datasets, scLEGA demonstrates superior performance in clustering accuracy, scalability, and stability. Our scLEGA model codes are freely available at https://github.com/Masonze/scLEGA-main.
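
For readers unfamiliar with ZINB losses, the sketch below evaluates the standard zero-inflated negative binomial negative log-likelihood for a single count. It is the textbook form, not the modified low-expression loss that scLEGA proposes. The parameter names (mu for the mean, theta for the inverse dispersion, pi for the dropout probability) are assumptions for this example.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial (mean mu, inverse dispersion theta)."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of count x under a zero-inflated negative binomial.

    With probability pi the count is a structural zero (dropout);
    otherwise it is drawn from the negative binomial.
    """
    nb_prob = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        return -math.log(pi + (1.0 - pi) * nb_prob)
    return -math.log((1.0 - pi) * nb_prob)


# A zero is cheaper to explain when dropout is allowed (pi > 0).
print(zinb_nll(0, mu=2.0, theta=1.5, pi=0.3), zinb_nll(0, mu=2.0, theta=1.5, pi=0.0))
```

In an autoencoder setting, mu, theta, and pi would be produced per gene and per cell by the decoder, and the loss would be the sum of this quantity over all entries of the count matrix.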


Subject(s)
RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , RNA-Seq/methods , Humans , Software , Algorithms , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Computational Biology/methods , Single-Cell Gene Expression Analysis
8.
Biomolecules ; 14(7)2024 Jun 27.
Article in English | MEDLINE | ID: mdl-39062480

ABSTRACT

Understanding the dynamics of gene regulatory networks (GRNs) across diverse cell types poses a challenge yet holds immense value in unraveling the molecular mechanisms governing cellular processes. Current computational methods, which rely solely on expression changes from bulk RNA-seq and/or scRNA-seq data, often result in high rates of false positives and low precision. Here, we introduce an advanced computational tool, DeepIMAGER, for inferring cell-specific GRNs through deep learning and data integration. DeepIMAGER employs a supervised approach that transforms the co-expression patterns of gene pairs into image-like representations and leverages transcription factor (TF) binding information for model training. It is trained using comprehensive datasets that encompass scRNA-seq profiles and ChIP-seq data, capturing TF-gene pair information across various cell types. Comprehensive validations on six cell lines show that DeepIMAGER exhibits superior performance compared with ten popular GRN inference tools and has remarkable robustness against dropout-zero events. DeepIMAGER was applied to scRNA-seq datasets of multiple myeloma (MM) and detected potential GRNs for the TFs RORC, MITF, and FOXD2 in MM dendritic cells. This technical innovation, combined with its capability to accurately decode GRNs from scRNA-seq, establishes DeepIMAGER as a valuable tool for unraveling complex regulatory networks in various cell types.


Subject(s)
Gene Regulatory Networks , RNA-Seq , Humans , RNA-Seq/methods , Transcription Factors/metabolism , Transcription Factors/genetics , Deep Learning , Multiple Myeloma/genetics , Computational Biology/methods , Sequence Analysis, RNA/methods , Software , Single-Cell Analysis/methods , Single-Cell Gene Expression Analysis
9.
Biomolecules ; 14(7)2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39062554

ABSTRACT

In studying the molecular underpinning of spermatogenesis, we expect to understand the fundamental biological processes better and potentially identify genes that may lead to novel diagnostic and therapeutic strategies toward precision medicine in male infertility. In this review, we emphasized our perspective that the path forward necessitates integrative studies that rely on complementary approaches and types of data. To comprehensively analyze spermatogenesis, this review proposes four axes of integration. First, spanning the analysis of spermatogenesis in the healthy state alongside pathologies. Second, the experimental analysis of model systems (in which we can deploy treatments and perturbations) alongside human data. Third, the phenotype is measured alongside its underlying molecular profiles using known markers augmented with unbiased profiles. Finally, the testicular cells are studied as ecosystems, analyzing the germ cells alongside the states observed in the supporting somatic cells. Recently, the study of spermatogenesis has been advancing using single-cell RNA sequencing, where scientists have uncovered the unique stages of germ cell development in mice, revealing new regulators of spermatogenesis and previously unknown cell subtypes in the testis. An in-depth analysis of meiotic and postmeiotic stages led to the discovery of marker genes for spermatogonia, Sertoli and Leydig cells and further elucidated all the other germline and somatic cells in the testis microenvironment in normal and pathogenic conditions. The outcome of an integrative analysis of spermatogenesis using advanced molecular profiling technologies such as scRNA-seq has already propelled our biological understanding, with additional studies expected to have clinical implications for the study of male fertility. By uncovering new genes and pathways involved in abnormal spermatogenesis, we may gain insights into subfertility or sterility.


Subject(s)
RNA-Seq , Single-Cell Analysis , Spermatogenesis , Spermatogenesis/genetics , Humans , Male , Animals , Single-Cell Analysis/methods , Mice , RNA-Seq/methods , Germ Cells/metabolism , Testis/metabolism , Infertility, Male/genetics , Single-Cell Gene Expression Analysis
10.
Genome Biol ; 25(1): 169, 2024 07 01.
Article in English | MEDLINE | ID: mdl-38956606

ABSTRACT

BACKGROUND: Computational cell type deconvolution enables the estimation of cell type abundance from bulk tissues and is important for understanding tissue microenvironment, especially in tumor tissues. With rapid development of deconvolution methods, many benchmarking studies have been published aiming for a comprehensive evaluation of these methods. Benchmarking studies rely on cell-type resolved single-cell RNA-seq data to create simulated pseudobulk datasets by adding individual cell types in controlled proportions. RESULTS: In our work, we show that the standard application of this approach, which uses randomly selected single cells, regardless of the intrinsic difference between them, generates synthetic bulk expression values that lack appropriate biological variance. We demonstrate why and how the current bulk simulation pipeline with random cells is unrealistic and propose a heterogeneous simulation strategy as a solution. The heterogeneously simulated bulk samples match up with the variance observed in real bulk datasets and therefore provide concrete benefits for benchmarking in several ways. We demonstrate that conceptual classes of deconvolution methods differ dramatically in their robustness to heterogeneity with reference-free methods performing particularly poorly. For regression-based methods, the heterogeneous simulation provides an explicit framework to disentangle the contributions of reference construction and regression methods to performance. Finally, we perform an extensive benchmark of diverse methods across eight different datasets and find BayesPrism and a hybrid MuSiC/CIBERSORTx approach to be the top performers. CONCLUSIONS: Our heterogeneous bulk simulation method and the entire benchmarking framework are implemented in a user-friendly package https://github.com/humengying0907/deconvBenchmarking and https://doi.org/10.5281/zenodo.8206516 , enabling further developments in deconvolution methods.
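
The standard pseudobulk simulation described in the abstract above (randomly drawn single cells, mixed in controlled proportions and summed) can be sketched as follows. This illustrates only the conventional random-cell approach, not the authors' heterogeneous strategy or their deconvBenchmarking package. The function name, data layout, and toy counts are assumptions.

```python
import random


def simulate_pseudobulk(cells_by_type, proportions, n_cells, rng):
    """Sum the counts of randomly drawn cells, mixed in the given proportions.

    cells_by_type: dict mapping cell type -> list of per-cell count vectors.
    proportions:   dict mapping cell type -> fraction of the mixture.
    """
    n_genes = len(next(iter(cells_by_type.values()))[0])
    bulk = [0] * n_genes
    for cell_type, fraction in proportions.items():
        n_draw = round(fraction * n_cells)
        # Cells are sampled with replacement, ignoring any differences between them.
        for cell in rng.choices(cells_by_type[cell_type], k=n_draw):
            for gene_index, count in enumerate(cell):
                bulk[gene_index] += count
    return bulk


# Toy reference (hypothetical): two genes, two cell types.
reference = {"T": [[1, 0], [1, 0]], "B": [[0, 2], [0, 2]]}
rng = random.Random(0)
print(simulate_pseudobulk(reference, {"T": 0.75, "B": 0.25}, n_cells=8, rng=rng))  # [6, 4]
```

Because every simulated sample averages over cells pooled from the same reference, repeated runs with the same proportions differ very little. This is the missing biological variance that the abstract identifies as a weakness of random-cell simulation.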


Subject(s)
Benchmarking , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Computer Simulation , RNA-Seq/methods , Computational Biology/methods
11.
Sci Adv ; 10(27): eadj7402, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38959321

ABSTRACT

The study of the tumor microbiome has been garnering increased attention. We developed a computational pipeline (CSI-Microbes) for identifying microbial reads from single-cell RNA sequencing (scRNA-seq) data and for analyzing differential abundance of taxa. Using a series of controlled experiments and analyses, we performed the first systematic evaluation of the efficacy of recovering microbial unique molecular identifiers by multiple scRNA-seq technologies, which identified the newer 10x chemistries (3' v3 and 5') as the best suited approach. We analyzed patient esophageal and colorectal carcinomas and found that reads from distinct genera tend to co-occur in the same host cells, testifying to possible intracellular polymicrobial interactions. Microbial reads are disproportionately abundant within myeloid cells that up-regulate proinflammatory cytokines like IL1B and CXCL8, while infected tumor cells up-regulate antigen processing and presentation pathways. These results show that myeloid cells with bacteria engulfed are a major source of bacterial RNA within the tumor microenvironment (TME) and may inflame the TME and influence immunotherapy response.


Subject(s)
Bacteria , RNA-Seq , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , RNA-Seq/methods , Bacteria/genetics , Tumor Microenvironment , Myeloid Cells/metabolism , Myeloid Cells/microbiology , Sequence Analysis, RNA/methods , Colorectal Neoplasms/microbiology , Colorectal Neoplasms/genetics , Computational Biology/methods , RNA, Bacterial/genetics , Esophageal Neoplasms/microbiology , Esophageal Neoplasms/genetics , Microbiota , Single-Cell Gene Expression Analysis
12.
Front Immunol ; 15: 1399856, 2024.
Article in English | MEDLINE | ID: mdl-38962008

ABSTRACT

Objective: Rheumatoid arthritis (RA) is a systemic disease that attacks the joints and causes a heavy economic burden on humans worldwide. T cells regulate RA progression and are considered crucial targets for therapy. Therefore, we aimed to integrate multiple datasets to explore the mechanisms of RA. Moreover, we established a T cell-related diagnostic model to provide a new method for RA immunotherapy. Methods: scRNA-seq and bulk-seq datasets for RA were obtained from the Gene Expression Omnibus (GEO) database. Various methods were used to analyze and characterize the T cell heterogeneity of RA. Using Mendelian randomization (MR) and expression quantitative trait loci (eQTL), we screened for potential pathogenic T cell marker genes in RA. Subsequently, we selected an optimal machine learning approach by comparing the nine types of machine learning in predicting RA to identify T cell-related diagnostic features to construct a nomogram model. Patients with RA were divided into different T cell-related clusters using the consensus clustering method. Finally, we performed immune cell infiltration and clinical correlation analyses of T cell-related diagnostic features. Results: By analyzing the scRNA-seq dataset, we obtained 10,211 cells that were annotated into 7 different subtypes based on specific marker genes. By integrating the eQTL from blood and RA GWAS, combined with XGB machine learning, we identified a total of 8 T cell-related diagnostic features (MIER1, PPP1CB, ICOS, GADD45A, CD3D, SLFN5, PIP4K2A, and IL6ST). Consensus clustering analysis showed that RA could be classified into two different T-cell patterns (Cluster 1 and Cluster 2), with Cluster 2 having a higher T-cell score than Cluster 1. The two clusters involved different pathways and had different immune cell infiltration states. There was no difference in age or sex between the two different T cell patterns. In addition, ICOS and IL6ST were negatively correlated with age in RA patients. 
Conclusion: Our findings elucidate the heterogeneity of T cells in RA and the communication role of these cells in an RA immune microenvironment. The construction of T cell-related diagnostic models provides a resource for guiding RA immunotherapeutic strategies.


Subject(s)
Arthritis, Rheumatoid , Mendelian Randomization Analysis , Quantitative Trait Loci , RNA-Seq , Single-Cell Analysis , Humans , Arthritis, Rheumatoid/genetics , Arthritis, Rheumatoid/immunology , Arthritis, Rheumatoid/diagnosis , Single-Cell Analysis/methods , Nomograms , Machine Learning , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Gene Expression Profiling , Single-Cell Gene Expression Analysis
13.
Nat Commun ; 15(1): 5665, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38969631

ABSTRACT

The paradigm for macrophage characterization has evolved from the simple M1/M2 dichotomy to a more complex model that encompasses the broad spectrum of macrophage phenotypic diversity, due to differences in ontogeny and/or local stimuli. We currently lack an in-depth pan-cancer single cell RNA-seq (scRNAseq) atlas of tumour-associated macrophages (TAMs) that fully captures this complexity. In addition, an increased understanding of macrophage diversity could help to explain the variable responses of cancer patients to immunotherapy. Our atlas includes well established macrophage subsets as well as a number of additional ones. We associate macrophage composition with tumour phenotype and show macrophage subsets can vary between primary and metastatic tumours growing in sites like the liver. We also examine macrophage-T cell functional cross talk and identify two subsets of TAMs associated with T cell activation. Analysis of TAM signatures in a large cohort of immune checkpoint inhibitor-treated patients (CPI1000+) identifies multiple TAM subsets associated with response, including the presence of a subset of TAMs that upregulate collagen-related genes. Finally, we demonstrate the utility of our data as a resource and reference atlas for mapping of novel macrophage datasets using projection. Overall, these advances represent an important step in both macrophage classification and overcoming resistance to immunotherapies in cancer.


Subject(s)
Immunotherapy , Neoplasms , Tumor-Associated Macrophages , Humans , Immunotherapy/methods , Tumor-Associated Macrophages/immunology , Tumor-Associated Macrophages/metabolism , Neoplasms/immunology , Neoplasms/therapy , Neoplasms/pathology , Neoplasms/genetics , Tumor Microenvironment/immunology , Single-Cell Analysis , T-Lymphocytes/immunology , RNA-Seq , Immune Checkpoint Inhibitors/therapeutic use , Immune Checkpoint Inhibitors/pharmacology , Macrophages/immunology , Gene Expression Regulation, Neoplastic
14.
Eur J Med Res ; 29(1): 358, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38970067

ABSTRACT

Ovarian cancer (OC) was the fifth leading cause of cancer death and the deadliest gynecological cancer in women. This was largely attributed to its late diagnosis, high therapeutic resistance, and a dearth of effective treatments. Clinical and preclinical studies have revealed that tumor-infiltrating CD8+ T cells often lost their effector function; this dysfunctional state of CD8+ T cells was known as exhaustion. Our objective was to identify genes associated with exhausted CD8+ T cells (CD8TEXGs) and their prognostic significance in OC. We downloaded the RNA-seq and clinical data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. CD8TEXGs were initially identified from single-cell RNA-seq (scRNA-seq) datasets, then univariate Cox regression, the least absolute shrinkage and selection operator (LASSO), and multivariate Cox regression were utilized to calculate the risk score and to develop the CD8TEXGs risk signature. Kaplan-Meier analysis, univariate Cox regression, multivariate Cox regression, time-dependent receiver operating characteristics (ROC), nomogram, and calibration were conducted to verify and evaluate the risk signature. Gene set enrichment analyses (GSEA) in the risk groups were used to identify the pathways most closely correlated with each risk group. The role of the risk score was further explored in relation to homologous recombination repair deficiency (HRD), BRCA1/2 gene mutations and tumor mutation burden (TMB). A risk signature with 4 CD8TEXGs in OC was finally built in the TCGA database and further validated in large GEO cohorts. The signature also demonstrated broad applicability across various types of cancer in the pan-cancer analysis. The high-risk score was significantly associated with a worse prognosis and the risk score was proven to be an independent prognostic biomarker. The 1-, 3-, and 5-year ROC values, nomogram, calibration, and comparison with the previously published models confirmed the excellent prediction power of this model. The low-risk group patients tended to exhibit a higher HRD score, BRCA1/2 gene mutation ratio and TMB. The low-risk group patients were more sensitive to Poly-ADP-ribose polymerase inhibitors (PARPi). Our findings on the value of CD8TEXGs in prognosis and drug response provided valuable insights into the molecular mechanisms and clinical management of OC.


Subject(s)
CD8-Positive T-Lymphocytes , Ovarian Neoplasms , Humans , Female , Ovarian Neoplasms/genetics , CD8-Positive T-Lymphocytes/immunology , CD8-Positive T-Lymphocytes/metabolism , Prognosis , RNA-Seq/methods , Biomarkers, Tumor/genetics , Single-Cell Analysis/methods , Gene Expression Regulation, Neoplastic , Single-Cell Gene Expression Analysis
15.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38960404

ABSTRACT

Recent advances in microfluidics and sequencing technologies allow researchers to explore cellular heterogeneity at single-cell resolution. In recent years, deep learning frameworks, such as generative models, have brought great changes to the analysis of transcriptomic data. Nevertheless, relying on the potential space of these generative models alone is insufficient to generate biological explanations. In addition, most of the previous work based on generative models is limited to shallow neural networks with one to three layers of latent variables, which may limit the capabilities of the models. Here, we propose a deep interpretable generative model called d-scIGM for single-cell data analysis. d-scIGM combines sawtooth connectivity techniques and residual networks, thereby constructing a deep generative framework. In addition, d-scIGM incorporates hierarchical prior knowledge of biological domains to enhance the interpretability of the model. We show that d-scIGM achieves excellent performance in a variety of fundamental tasks, including clustering, visualization, and pseudo-temporal inference. Through topic pathway studies, we found that d-scIGM-learned topics are better enriched for biologically meaningful pathways compared to the baseline models. Furthermore, the analysis of drug response data shows that d-scIGM can capture drug response patterns in large-scale experiments, which provides a promising way to elucidate the underlying biological mechanisms. Lastly, in the melanoma dataset, d-scIGM accurately identified different cell types and revealed multiple melanin-related driver genes and key pathways, which are critical for understanding disease mechanisms and drug development.


Subject(s)
Deep Learning , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , RNA-Seq/methods , Computational Biology/methods , Algorithms , Sequence Analysis, RNA/methods , Neural Networks, Computer , Single-Cell Gene Expression Analysis
16.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38960406

ABSTRACT

Spatial transcriptomics data play a crucial role in cancer research, providing a nuanced understanding of the spatial organization of gene expression within tumor tissues. Unraveling the spatial dynamics of gene expression can unveil key insights into tumor heterogeneity and aid in identifying potential therapeutic targets. However, in many large-scale cancer studies, spatial transcriptomics data are limited, with bulk RNA-seq and corresponding Whole Slide Image (WSI) data being more common (e.g. TCGA project). To address this gap, there is a critical need to develop methodologies that can estimate gene expression at near-cell (spot) level resolution from existing WSI and bulk RNA-seq data. This approach is essential for reanalyzing expansive cohort studies and uncovering novel biomarkers that have been overlooked in the initial assessments. In this study, we present STGAT (Spatial Transcriptomics Graph Attention Network), a novel approach leveraging Graph Attention Networks (GAT) to discern spatial dependencies among spots. Trained on spatial transcriptomics data, STGAT is designed to estimate gene expression profiles at spot-level resolution and predict whether each spot represents tumor or non-tumor tissue, especially in patient samples where only WSI and bulk RNA-seq data are available. Comprehensive tests on two breast cancer spatial transcriptomics datasets demonstrated that STGAT outperformed existing methods in accurately predicting gene expression. Further analyses using the TCGA breast cancer dataset revealed that gene expression estimated from tumor-only spots (predicted by STGAT) provides more accurate molecular signatures for breast cancer sub-type and tumor stage prediction, and also leads to improved patient survival and disease-free analysis. Availability: Code is available at https://github.com/compbiolabucf/STGAT.


Subject(s)
Gene Expression Profiling , RNA-Seq , Transcriptome , Humans , RNA-Seq/methods , Gene Expression Profiling/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Gene Expression Regulation, Neoplastic , Computational Biology/methods , Female , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism
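
Editorial sketch for record 16: the abstract says STGAT uses graph attention to capture spatial dependencies among spots. The Python below illustrates generic single-head graph attention only (softmax-normalised LeakyReLU scores over each spot's neighbours, then a weighted sum). It is not the STGAT architecture; the spot names, feature vectors, neighbour lists, and the vectors `a_src` and `a_dst` are all hypothetical values chosen for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def leaky_relu(x, slope=0.2):
    """LeakyReLU non-linearity applied to a raw attention score."""
    return x if x > 0 else slope * x


def attention_weights(h, neighbors, a_src, a_dst):
    """Softmax-normalised attention of each spot over its neighbours.

    The score for (i, j) is LeakyReLU(a_src . h[i] + a_dst . h[j]), which
    equals a^T [h_i || h_j] with a split into its two halves.
    """
    weights = {}
    for i, nbrs in neighbors.items():
        scores = [leaky_relu(dot(a_src, h[i]) + dot(a_dst, h[j])) for j in nbrs]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return weights


def aggregate(h, weights):
    """Replace each spot's features by the attention-weighted neighbour sum."""
    out = {}
    for i, w in weights.items():
        dim = len(h[i])
        out[i] = [sum(w[j] * h[j][k] for j in w) for k in range(dim)]
    return out


# Hypothetical toy data: three spots with two features each.
h = {"s0": [1.0, 0.0], "s1": [0.0, 1.0], "s2": [1.0, 1.0]}
neighbors = {"s0": ["s0", "s1"], "s1": ["s0", "s1", "s2"], "s2": ["s1", "s2"]}
a_src, a_dst = [0.5, -0.5], [1.0, 0.25]

weights = attention_weights(h, neighbors, a_src, a_dst)
updated = aggregate(h, weights)
```

In a trained network the projection of the features and the attention vector would be learned; here they are fixed so that the normalisation step is easy to inspect.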
17.
Nat Commun ; 15(1): 5600, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961061

ABSTRACT

ezSingleCell is an interactive and easy-to-use application for analysing various single-cell and spatial omics data types without requiring prior programming knowledge. It combines the best-performing publicly available methods for in-depth data analysis, integration, and interactive data visualization. ezSingleCell consists of five modules, each designed to be a comprehensive workflow for one data type or task. In addition, ezSingleCell allows crosstalk between different modules within a unified interface. Acceptable input data can be in a variety of formats while the output consists of publication-ready figures and tables. In-depth manuals and video tutorials are available to guide users on the analysis workflows and parameter adjustments to suit their study aims. ezSingleCell's streamlined interface can analyse a standard scRNA-seq dataset of 3000 cells in less than five minutes. ezSingleCell is available in two forms: an installation-free web application ( https://immunesinglecell.org/ezsc/ ) or a software package with a shinyApp interface ( https://github.com/JinmiaoChenLab/ezSingleCell2 ) for offline analysis.


Subject(s)
Single-Cell Analysis , Software , Single-Cell Analysis/methods , Humans , Workflow , Computational Biology/methods , User-Computer Interface , RNA-Seq/methods
18.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38975891

ABSTRACT

Unsupervised feature selection is a critical step for efficient and accurate analysis of single-cell RNA-seq data. Previous benchmarks used two different criteria to compare feature selection methods: (i) proportion of ground-truth marker genes included in the selected features and (ii) accuracy of cell clustering using ground-truth cell types. Here, we systematically compare the performance of 11 feature selection methods for both criteria. We first demonstrate the discordance between these criteria and suggest using the latter. We then compare the distributions of mean expression of the selected genes across feature selection methods. We show that lowly expressed genes exhibit markedly high coefficients of variation and are mostly excluded by high-performance methods. In particular, high-deviation- and high-expression-based methods outperform the method widely used in the Seurat package in clustering cells and data visualization. We further show they also enable a clear separation of the same cell type from different tissues as well as accurate estimation of cell trajectories.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , Gene Expression Profiling/methods , Algorithms , Computational Biology/methods , Sequence Analysis, RNA/methods , RNA-Seq/methods
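
Editorial sketch for record 18: the abstract reports that lowly expressed genes show high coefficients of variation and tend to be excluded by expression-based selection. The Python below illustrates that relationship on made-up counts. It is not one of the 11 benchmarked methods; the gene names, count vectors, and the helper `top_k_by_mean` are hypothetical.

```python
import statistics


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean of the counts."""
    mean = statistics.fmean(counts)
    if mean == 0:
        return float("inf")  # CV is undefined for an all-zero gene
    return statistics.pstdev(counts) / mean


def top_k_by_mean(expression, k):
    """Toy high-expression selection: keep the k genes with the largest mean."""
    ranked = sorted(
        expression,
        key=lambda gene: statistics.fmean(expression[gene]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical counts for three genes across eight cells.
expression = {
    "gene_low": [0, 0, 0, 1, 0, 0, 2, 0],
    "gene_mid": [3, 5, 2, 4, 6, 3, 4, 5],
    "gene_high": [10, 12, 9, 11, 10, 13, 8, 11],
}

cv = {gene: coefficient_of_variation(c) for gene, c in expression.items()}
selected = top_k_by_mean(expression, k=2)
```

With these toy counts the sparse gene has the largest coefficient of variation and is the one dropped by the mean-based filter, which mirrors the qualitative pattern described in the abstract.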
19.
CNS Neurosci Ther ; 30(7): e14850, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39021287

ABSTRACT

INTRODUCTION: Glioma is the most frequent and lethal form of primary brain tumor. The molecular mechanism of oncogenesis and progression of glioma still remains unclear, rendering the therapeutic effect of conventional radiotherapy, chemotherapy, and surgical resection insufficient. In this study, we sought to explore the function in glioma of HEC1 (highly expressed in cancer 1), a component of the NDC80 complex that is crucial in the regulation of the kinetochore. METHODS: Bulk RNA-seq and scRNA-seq analyses were used to infer HEC1 function, and in vitro experiments validated its function. RESULTS: HEC1 overexpression was observed in glioma and was indicative of poor prognosis and malignant clinical features, which was confirmed in human glioma tissues. High HEC1 expression was correlated with a more active cell cycle, DNA-associated activities, and the formation of an immunosuppressive tumor microenvironment, including interaction with immune cells, and correlated strongly with infiltrating immune cells and enhanced expression of immune checkpoints. In vitro experiments and RNA-seq further confirmed the role of HEC1 in promoting cell proliferation and the expression of DNA replication and repair pathways in glioma. Coculture assay confirmed that HEC1 promotes microglial migration and the transformation of M1-phenotype macrophages to the M2 phenotype. CONCLUSION: Altogether, these findings demonstrate that HEC1 may be a potential prognostic marker and an immunotherapeutic target in glioma.


Subject(s)
Brain Neoplasms , Glioma , Macrophages , RNA-Seq , Humans , Glioma/genetics , Glioma/pathology , Glioma/metabolism , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Brain Neoplasms/metabolism , Prognosis , Macrophages/metabolism , Single-Cell Analysis , Male , Female , Tumor Microenvironment/genetics , Cell Line, Tumor , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Middle Aged , Cell Proliferation , Single-Cell Gene Expression Analysis , Cytoskeletal Proteins
20.
PLoS Comput Biol ; 20(7): e1011620, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38976751

ABSTRACT

Boolean networks are widely employed to model the qualitative dynamics of cell fate processes by describing the change of binary activation states of genes and transcription factors with time. Being able to bridge such qualitative states with quantitative measurements of gene expression in cells, such as scRNA-seq, is a cornerstone for data-driven model construction and validation. On one hand, scRNA-seq binarisation is a key step for inferring and validating Boolean models. On the other hand, the generation of synthetic scRNA-seq data from baseline Boolean models provides an important asset to benchmark inference methods. However, linking characteristics of scRNA-seq datasets, including dropout events, with Boolean states is a challenging task. We present scBoolSeq, a method for the bidirectional linking of scRNA-seq data and Boolean activation state of genes. Given a reference scRNA-seq dataset, scBoolSeq computes statistical criteria to classify the empirical gene pseudocount distributions as either unimodal, bimodal, or zero-inflated, and fits a probabilistic model of dropouts, with gene-dependent parameters. From these learnt distributions, scBoolSeq can both binarise scRNA-seq datasets and generate synthetic scRNA-seq datasets from Boolean traces, as issued from Boolean networks, using biased sampling and dropout simulation. We present a case study demonstrating the application of scBoolSeq's binarisation scheme in data-driven model inference. Furthermore, we compare synthetic scRNA-seq data generated by scBoolSeq with BoolODE's data for the same Boolean network model. The comparison shows that our method better reproduces the statistics of real scRNA-seq datasets, such as the mean-variance and mean-dropout relationships, while exhibiting clearly defined trajectories in two-dimensional projections of the data.


Subject(s)
Computational Biology , Single-Cell Analysis , Computational Biology/methods , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Humans , RNA-Seq/methods , RNA-Seq/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Algorithms , Gene Regulatory Networks/genetics , Models, Statistical , Software , Single-Cell Gene Expression Analysis
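
Editorial sketch for record 20: the abstract refers to Boolean traces produced by Boolean networks and to binarisation of expression values. The Python below shows a generic synchronous Boolean network update and a naive fixed-threshold binarisation. It does not reproduce scBoolSeq's distribution-based criteria or its dropout model; the two-gene network, the threshold, and the function names are hypothetical.

```python
def binarise(values, threshold):
    """Naive binarisation: a value is 'active' if it exceeds the threshold."""
    return [v > threshold for v in values]


def step(state, rules):
    """Synchronous update: every gene is recomputed from the same old state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def trace(state, rules, n_steps):
    """Boolean trace: the initial state followed by n_steps successive updates."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        states.append(state)
    return states


# Hypothetical two-gene network in which A and B repress each other.
rules = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
}

stable = trace({"A": True, "B": False}, rules, 2)      # stays at a fixed point
oscillating = trace({"A": True, "B": True}, rules, 2)  # flips between two states
binary = binarise([0.0, 0.2, 3.1, 4.0], threshold=1.0)
```

A single global threshold is used here only to keep the example short; the abstract describes gene-dependent parameters learnt from a reference dataset.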