Results 1 - 20 of 1,014
1.
Nat Commun ; 15(1): 6541, 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39095360

ABSTRACT

Recent advances in spatial omics have expanded the spectrum of profiled molecular categories beyond transcriptomics. However, many of these technologies are constrained by limited spatial resolution, hindering our ability to deeply characterize intricate tissue architectures. Existing computational methods primarily focus on the resolution enhancement of transcriptomics data, lacking the adaptability to address the emerging spatial omics technologies that profile various omics types. Here, we introduce soScope, a unified generative framework designed to enhance data quality and spatial resolution for molecular profiles obtained from diverse spatial technologies. soScope aggregates multimodal tissue information from omics, spatial relations and images, and jointly infers omics profiles at enhanced resolutions with omics-specific modeling through distribution priors. With comprehensive evaluations on diverse spatial omics platforms, including Visium, Xenium, spatial-CUT&Tag, and slide-DNA/RNA-seq, soScope improves performances in identifying biologically meaningful intestine and kidney architectures, revealing embryonic heart structure that cannot be resolved at the original resolution and correcting sample and technical biases arising from sequencing and sample processing. Furthermore, soScope extends to spatial multiomics technology spatial-CITE-seq and spatial ATAC-RNA-seq, leveraging cross-omics reference for simultaneous multiomics enhancement. soScope provides a versatile tool to improve the utilization of continually expanding spatial omics technologies and resources.


Subject(s)
Transcriptome , Animals , Mice , Gene Expression Profiling/methods , Computational Biology/methods , Genomics/methods , Humans , Kidney/metabolism , RNA-Seq/methods
2.
Article in English | MEDLINE | ID: mdl-39049508

ABSTRACT

Gene set scoring (GSS) has been routinely conducted for gene expression analysis of bulk or single-cell RNA sequencing (RNA-seq) data, which helps to decipher single-cell heterogeneity and cell type-specific variability by incorporating prior knowledge from functional gene sets. Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a powerful technique for interrogating single-cell chromatin-based gene regulation, and genes or gene sets with dynamic regulatory potentials can be regarded as cell type-specific markers as if in single-cell RNA-seq (scRNA-seq). However, there are few GSS tools specifically designed for scATAC-seq, and the applicability and performance of RNA-seq GSS tools on scATAC-seq data remain to be investigated. Here, we systematically benchmarked ten GSS tools, including four bulk RNA-seq tools, five scRNA-seq tools, and one scATAC-seq method. First, using matched scATAC-seq and scRNA-seq datasets, we found that the performance of GSS tools on scATAC-seq data was comparable to that on scRNA-seq, suggesting their applicability to scATAC-seq. Then, the performance of different GSS tools was extensively evaluated using up to ten scATAC-seq datasets. Moreover, we evaluated the impact of gene activity conversion, dropout imputation, and gene set collections on the results of GSS. Results show that dropout imputation can significantly promote the performance of almost all GSS tools, while the impact of gene activity conversion methods or gene set collections on GSS performance is more dependent on GSS tools or datasets. Finally, we provided practical guidelines for choosing appropriate preprocessing methods and GSS tools in different application scenarios.


Subject(s)
Algorithms , Benchmarking , Chromatin Immunoprecipitation Sequencing , Single-Cell Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/standards , Humans , Chromatin Immunoprecipitation Sequencing/methods , RNA-Seq/methods , RNA-Seq/standards , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/standards , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Chromatin/genetics , Chromatin/metabolism
3.
PLoS One ; 19(7): e0305857, 2024.
Article in English | MEDLINE | ID: mdl-39037985

ABSTRACT

Traditional models for identifying differentially expressed genes (DEGs) have limitations on datasets with small sample sizes because they require distribution assumptions to be met; otherwise, sample variation results in high false-positive/negative rates. In contrast, tabular data models based on deep learning (DL) frameworks do not need to consider the data distribution types and sample variation. However, applying DL to RNA-Seq data is still a challenge due to the lack of proper labeling and the small sample size compared to the number of genes. Data augmentation (DA) extracts data features using different methods and procedures, which can significantly increase complementary pseudo-values from limited data without significant additional cost. Based on this, we combine DA with a DL framework-based tabular data model and propose a model, TabDEG, to predict DEGs and their up-regulation/down-regulation directions from gene expression data obtained from The Cancer Genome Atlas database. Compared to five counterpart methods, TabDEG has high sensitivity and low misclassification rates. Experiments show that TabDEG is robust and effective in enhancing data features to facilitate classification of high-dimensional small sample size datasets and validate that TabDEG-predicted DEGs are mapped to important gene ontology terms and pathways associated with cancer.


Subject(s)
Deep Learning , RNA-Seq , Humans , RNA-Seq/methods , Gene Expression Profiling/methods , Neoplasms/genetics , Computational Biology/methods , Databases, Genetic , Gene Expression Regulation, Neoplastic
4.
J Bioinform Comput Biol ; 22(3): 2450015, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39036845

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology has generated vast amounts of data. However, these data often exhibit batch effects due to various factors such as different time points, experimental personnel, and instruments used, which can obscure the biological differences in the data itself. Based on the characteristics of scRNA-seq data, we designed a dense deep residual network model, referred to as NDnetwork. Subsequently, we combined the NDnetwork model with the MNN method to correct batch effects in scRNA-seq data, and named it the NDMNN method. Comprehensive experimental results demonstrate that the NDMNN method outperforms existing commonly used methods for correcting batch effects in scRNA-seq data. As the scale of single-cell sequencing continues to expand, we believe that NDMNN will be a valuable tool for researchers in the biological community for correcting batch effects in their studies. The source code and experimental results of the NDMNN method can be found at https://github.com/mustang-hub/NDMNN.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , RNA-Seq/methods , Computational Biology/methods , Software , Humans , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Algorithms , Deep Learning , Single-Cell Gene Expression Analysis
5.
J Bioinform Comput Biol ; 22(3): 2450007, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39036848

ABSTRACT

For sequencing-based spatial transcriptomics data, the gene-spot count matrix is highly sparse. This feature is similar to scRNA-seq. The goal of this paper is to identify whether there exist genes that are frequently under-detected in Visium compared to bulk RNA-seq, and the underlying potential mechanism of under-detection in Visium. We collected paired Visium and bulk RNA-seq data for 28 human samples and 19 mouse samples, which covered diverse tissue sources. We compared the two data types and observed that there indeed exists a collection of genes frequently under-detected in Visium compared to bulk RNA-seq. We performed a motif search to examine the last 350 bp of the frequently under-detected genes, and we observed that the poly (T) motif was significantly enriched in genes identified from both human and mouse data, which matches with our previous finding about frequently under-detected genes in scRNA-seq. We hypothesized that the poly (T) motif may be able to form a hairpin structure with the poly (A) tails of their mRNA transcripts, making it difficult for their mRNA transcripts to be captured during Visium library preparation.


Subject(s)
Gene Expression Profiling , Transcriptome , Mice , Humans , Animals , Gene Expression Profiling/methods , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , RNA-Seq/methods , Computational Biology/methods , Nucleotide Motifs
6.
Nat Commun ; 15(1): 6167, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039053

ABSTRACT

Translating RNA-seq into clinical diagnostics requires ensuring the reliability and cross-laboratory consistency of detecting clinically relevant subtle differential expressions, such as those between different disease subtypes or stages. As part of the Quartet project, we present an RNA-seq benchmarking study across 45 laboratories using the Quartet and MAQC reference samples spiked with ERCC controls. Based on multiple types of 'ground truth', we systematically assess the real-world RNA-seq performance and investigate the influencing factors involved in 26 experimental processes and 140 bioinformatics pipelines. Here we show greater inter-laboratory variations in detecting subtle differential expressions among the Quartet samples. Experimental factors including mRNA enrichment and strandedness, and each bioinformatics step, emerge as primary sources of variations in gene expression. We underscore the profound influence of experimental execution, and provide best practice recommendations for experimental designs, strategies for filtering low-expression genes, and the optimal gene annotation and analysis pipelines. In summary, this study lays the foundation for developing and quality control of RNA-seq for clinical diagnostic purposes.


Subject(s)
Benchmarking , Computational Biology , Quality Control , RNA-Seq , Reference Standards , Benchmarking/methods , Humans , RNA-Seq/methods , RNA-Seq/standards , Computational Biology/methods , Reproducibility of Results , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/standards , Gene Expression Profiling/methods , Gene Expression Profiling/standards , RNA, Messenger/genetics , RNA, Messenger/metabolism
7.
BMC Res Notes ; 17(1): 204, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39049055

ABSTRACT

OBJECTIVE: In 2004, 59 incidents of food poisoning were reported in Japan after consumption of angel-wing mushrooms, Pleurocybella porrigens. Consequently, 17 individuals died of acute encephalopathy. In 2023, we proved that a lectin, pleurocybelline, and pleurocybellaziridine (PA) from this mushroom caused damage to the brains of mice. Although we reported genomic and transcriptomic data of P. porrigens in 2013, the assembly quality of the transcriptomic data was inadequate for accurate functional annotation. Thus, we obtained detailed transcriptomic data on the fruiting bodies and mycelia of this mushroom using Illumina NovaSeq 6000. RESULTS: De novo assembly data indicated that the N50 lengths for the fruiting bodies and mycelia were improved compared with those previously reported. The differential expression analysis between the fruiting bodies and the mycelia revealed that 1,937 and 1,555 genes were significantly up-regulated in the fruiting bodies and the mycelia, respectively. The biological functions of P. porrigens transcripts, including PA biosynthetic pathways, were investigated using BLAST search, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes pathway analysis. The obtained results revealed that L-valine, a predicted precursor of PA, is biosynthesized in the fruiting bodies and mycelia. Furthermore, real-time RT-PCR was performed to evaluate the accuracy of the results of differential expression analysis.


Subject(s)
Fruiting Bodies, Fungal , Mycelium , Fruiting Bodies, Fungal/genetics , Mycelium/genetics , Mice , Animals , Agaricales/genetics , Agaricales/metabolism , RNA-Seq/methods , Brain Diseases/genetics , Brain Diseases/metabolism , Transcriptome/genetics , Gene Expression Regulation, Fungal/drug effects , Mushroom Poisoning
8.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38975891

ABSTRACT

Unsupervised feature selection is a critical step for efficient and accurate analysis of single-cell RNA-seq data. Previous benchmarks used two different criteria to compare feature selection methods: (i) proportion of ground-truth marker genes included in the selected features and (ii) accuracy of cell clustering using ground-truth cell types. Here, we systematically compare the performance of 11 feature selection methods for both criteria. We first demonstrate the discordance between these criteria and suggest using the latter. We then compare, between feature selection methods, the distribution of the mean expression of the selected genes. We show that lowly expressed genes exhibit very high coefficients of variation and are mostly excluded by high-performance methods. In particular, high-deviation- and high-expression-based methods outperform the method widely used in the Seurat package in clustering cells and data visualization. We further show they also enable a clear separation of the same cell type from different tissues as well as accurate estimation of cell trajectories.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , Gene Expression Profiling/methods , Algorithms , Computational Biology/methods , Sequence Analysis, RNA/methods , RNA-Seq/methods
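
Note on record 8: the abstract's observation that lowly expressed genes show very high coefficients of variation can be illustrated with a small sketch. The gene names and counts below are made-up toy values, not data from the study, and the function name is hypothetical; the coefficient of variation is taken as the population standard deviation divided by the mean.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Return std/mean for one gene's counts across cells (inf if the mean is 0)."""
    m = mean(counts)
    if m == 0:
        return float("inf")
    return pstdev(counts) / m


# Toy counts per gene across six cells (illustrative values, not real data).
toy_counts = {
    "low_gene": [0, 0, 0, 1, 0, 0],
    "high_gene": [9, 11, 10, 12, 8, 10],
}

cv_by_gene = {gene: coefficient_of_variation(c) for gene, c in toy_counts.items()}
for gene, cv in cv_by_gene.items():
    print(f"{gene}: mean={mean(toy_counts[gene]):.2f}, CV={cv:.2f}")
```

In this toy case a single stray count in an otherwise silent gene yields a much larger coefficient of variation than a well-expressed gene, which is one reason variance-based selection can favour sparse genes.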
9.
Physiol Plant ; 176(4): e14418, 2024.
Article in English | MEDLINE | ID: mdl-39004808

ABSTRACT

Plant organelle transcription has been studied for decades. As techniques advanced, so did the fields of mitochondrial and plastid transcriptomics. The current view is that organelle genomes are pervasively transcribed, irrespective of their size, content, structure, and taxonomic origin. However, little is known about the nature of organelle noncoding transcriptomes, including pervasively transcribed noncoding RNAs (ncRNAs). Next-generation sequencing data have uncovered small ncRNAs in the organelles of plants and other organisms, but long ncRNAs remain poorly understood. Here, we argue that publicly available third-generation long-read RNA sequencing data from plants can provide a fine-tuned picture of long ncRNAs within organelles. Indeed, given their bloated architectures, plant mitochondrial genomes are well suited for studying pervasive transcription of ncRNAs. Ultimately, we hope to showcase this new avenue of plant research while also underlining the limitations of the proposed approach.


Subject(s)
RNA, Antisense , RNA, Long Noncoding , RNA, Plant , High-Throughput Nucleotide Sequencing/methods , Organelles/genetics , Organelles/metabolism , Plants/genetics , RNA, Antisense/genetics , RNA, Long Noncoding/genetics , RNA, Plant/genetics , RNA-Seq/methods , Sequence Analysis, RNA/methods , Transcriptome/genetics
10.
Nat Commun ; 15(1): 5941, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39009595

ABSTRACT

Recent development of RNA velocity uses master equations to establish the kinetics of the life cycle of RNAs from unspliced RNA to spliced RNA (i.e., mature RNA) to degradation. To feed this kinetic analysis, simultaneous measurement of unspliced RNA and spliced RNA in single cells is greatly desired. However, the majority of single-cell RNA-seq chemistry primarily captures mature RNA species to measure gene expressions. Here, we develop a one-step total-RNA chemistry-based single-cell RNA-seq method: snapTotal-seq. We benchmark this method with multiple single-cell RNA-seq assays in their performance in kinetic analysis of cell cycle by RNA velocity. Next, with LASSO regression between transcription factors, we identify the critical regulatory hubs mediating the cell cycle dynamics. We also apply snapTotal-seq to profile the oncogene-induced senescence and identify the key regulatory hubs governing the entry of senescence. Furthermore, from the comparative analysis of unspliced RNA and spliced RNA, we identify a significant portion of genes whose expression changes occur in spliced RNA but not to the same degree in unspliced RNA, indicating these gene expression changes are mainly controlled by post-transcriptional regulation. Overall, we demonstrate that snapTotal-seq can provide enriched information about gene regulation, especially during the transition between cell states.


Subject(s)
Cell Cycle , RNA , Single-Cell Analysis , Transcription Factors , Single-Cell Analysis/methods , Transcription Factors/metabolism , Transcription Factors/genetics , Humans , Cell Cycle/genetics , RNA/metabolism , RNA/genetics , RNA Splicing , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Gene Expression Regulation , Cellular Senescence/genetics , RNA-Seq/methods , Kinetics
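
Note on record 10: the life cycle the abstract describes (unspliced RNA, then spliced RNA, then degradation) is commonly summarized in RNA velocity work by a pair of rate equations. The form below is the usual deterministic sketch, not necessarily the exact formulation used by the authors; the symbols are assumptions: u and s are unspliced and spliced abundances, and alpha, beta, gamma are the transcription, splicing, and degradation rates.

```latex
\begin{aligned}
\frac{du}{dt} &= \alpha - \beta\,u, \\
\frac{ds}{dt} &= \beta\,u - \gamma\,s .
\end{aligned}
```

Under this sketch the velocity of a gene is ds/dt, which is why measuring u and s in the same cell is needed to estimate it.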
11.
Int J Mol Sci ; 25(13), 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-39000413

ABSTRACT

Our study aims to address the methodological challenges frequently encountered in RNA-Seq data analysis within cancer studies. Specifically, it enhances the identification of key genes involved in axillary lymph node metastasis (ALNM) in breast cancer. We employ Generalized Linear Models with Quasi-Likelihood (GLMQLs) to manage the inherently discrete and overdispersed nature of RNA-Seq data, marking a significant improvement over conventional methods such as the t-test, which assumes a normal distribution and equal variances across samples. We utilize the Trimmed Mean of M-values (TMMs) method for normalization to address library-specific compositional differences effectively. Our study focuses on a distinct cohort of 104 untreated patients from the TCGA Breast Invasive Carcinoma (BRCA) dataset to maintain an untainted genetic profile, thereby providing more accurate insights into the genetic underpinnings of lymph node metastasis. This strategic selection paves the way for developing early intervention strategies and targeted therapies. Our analysis is exclusively dedicated to protein-coding genes, enriched by the Magnitude Altitude Scoring (MAS) system, which rigorously identifies key genes that could serve as predictors in developing an ALNM predictive model. Our novel approach has pinpointed several genes significantly linked to ALNM in breast cancer, offering vital insights into the molecular dynamics of cancer development and metastasis. These genes, including ERBB2, CCNA1, FOXC2, LEFTY2, VTN, ACKR3, and PTGS2, are involved in key processes like apoptosis, epithelial-mesenchymal transition, angiogenesis, response to hypoxia, and KRAS signaling pathways, which are crucial for tumor virulence and the spread of metastases. 
Moreover, the approach has also emphasized the importance of the small proline-rich protein family (SPRR), including SPRR2B, SPRR2E, and SPRR2D, recognized for their significant involvement in cancer-related pathways and their potential as therapeutic targets. Important transcripts such as H3C10, H1-2, PADI4, and others have been highlighted as critical in modulating the chromatin structure and gene expression, fundamental for the progression and spread of cancer.


Subject(s)
Breast Neoplasms , Gene Expression Regulation, Neoplastic , Lymphatic Metastasis , Humans , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Lymphatic Metastasis/genetics , Female , RNA-Seq/methods , Gene Expression Profiling/methods , Lymph Nodes/pathology , Axilla , Biomarkers, Tumor/genetics , Sequence Analysis, RNA/methods
12.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38980373

ABSTRACT

Inferring gene regulatory networks (GRNs) allows us to obtain a deeper understanding of cellular function and disease pathogenesis. Recent advances in single-cell RNA sequencing (scRNA-seq) technology have improved the accuracy of GRN inference. However, many methods for inferring individual GRNs from scRNA-seq data are limited because they overlook intercellular heterogeneity and similarities between different cell subpopulations, which are often present in the data. Here, we propose a deep learning-based framework, DeepGRNCS, for jointly inferring GRNs across cell subpopulations. We follow the commonly accepted hypothesis that the expression of a target gene can be predicted based on the expression of transcription factors (TFs) due to underlying regulatory relationships. We initially processed scRNA-seq data by discretizing data scattering using the equal-width method. Then, we trained deep learning models to predict target gene expression from TFs. By individually removing each TF from the expression matrix, we used pre-trained deep model predictions to infer regulatory relationships between TFs and genes, thereby constructing the GRN. Our method outperforms existing GRN inference methods for various simulated and real scRNA-seq datasets. Finally, we applied DeepGRNCS to non-small cell lung cancer scRNA-seq data to identify key genes in each cell subpopulation and analyzed their biological relevance. In conclusion, DeepGRNCS effectively predicts cell subpopulation-specific GRNs. The source code is available at https://github.com/Nastume777/DeepGRNCS.


Subject(s)
Deep Learning , Gene Regulatory Networks , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Transcription Factors/genetics , Transcription Factors/metabolism , Computational Biology/methods , Sequence Analysis, RNA/methods , RNA-Seq/methods
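
Note on record 12: the abstract mentions discretizing expression values with the equal-width method. A minimal sketch of equal-width binning follows; the number of bins and the example values are assumptions chosen for illustration, and the function name is hypothetical rather than taken from the DeepGRNCS code.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Illustrative expression values for one gene across six cells.
expression = [0.0, 0.5, 1.0, 2.0, 3.5, 4.0]
bins = equal_width_bins(expression, n_bins=4)
print(bins)
```

Equal-width binning keeps the interval size constant, so sparse data can leave most cells in the lowest bin; that trade-off is worth keeping in mind when choosing the bin count.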
13.
Nat Commun ; 15(1): 5983, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39013860

ABSTRACT

Single-cell sequencing is frequently affected by "omission" due to limitations in sequencing throughput, yet bulk RNA-seq may contain these ostensibly "omitted" cells. Here, we introduce the single cell trajectory blending from Bulk RNA-seq (BulkTrajBlend) algorithm, a component of the OmicVerse suite that leverages a Beta-Variational AutoEncoder for data deconvolution and graph neural networks for the discovery of overlapping communities. This approach effectively interpolates and restores the continuity of "omitted" cells within single-cell RNA sequencing datasets. Furthermore, OmicVerse provides an extensive toolkit for both bulk and single cell RNA-seq analysis, offering seamless access to diverse methodologies, streamlining computational processes, fostering exquisite data visualization, and facilitating the extraction of significant biological insights to advance scientific research.


Subject(s)
Algorithms , Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Sequence Analysis, RNA/methods , Computational Biology/methods , RNA-Seq/methods , Neural Networks, Computer , Software , High-Throughput Nucleotide Sequencing/methods
14.
BMC Genomics ; 25(1): 697, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39014352

ABSTRACT

BACKGROUND: Real-time quantitative PCR (RT-qPCR) is one of the most widely used gene expression analyses for validating RNA-seq data. This technique requires reference genes that are stable and highly expressed, at least across the different biological conditions present in the transcriptome. Reference and variable candidate gene selection is often neglected, leading to misinterpretation of the results. RESULTS: We developed software named "Gene Selector for Validation" (GSV), which identifies the best reference and variable candidate genes for validation within a quantitative transcriptome. This tool also filters the candidate genes concerning the RT-qPCR assay detection limit. GSV was compared with other software using synthetic datasets and performed better, removing stable low-expression genes from the reference candidate list and creating the variable-expression validation list. GSV software was used on a real case, an Aedes aegypti transcriptome. The top GSV reference candidate genes were selected for RT-qPCR analysis, confirming that eiF1A and eiF3j were the most stable genes tested. The tool also confirmed that traditional mosquito reference genes were less stable in the analyzed samples, highlighting the possibility of inappropriate choices. A meta-transcriptome dataset with more than ninety thousand genes was also processed successfully. CONCLUSION: The GSV tool is a time- and cost-effective tool that can be used to select reference and validation candidate genes from the biological conditions present in transcriptomic data.


Subject(s)
Real-Time Polymerase Chain Reaction , Reference Standards , Software , Real-Time Polymerase Chain Reaction/methods , Real-Time Polymerase Chain Reaction/standards , Animals , RNA-Seq/methods , RNA-Seq/standards , Gene Expression Profiling/methods , Transcriptome
15.
PLoS Comput Biol ; 20(7): e1011620, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38976751

ABSTRACT

Boolean networks are largely employed to model the qualitative dynamics of cell fate processes by describing the change of binary activation states of genes and transcription factors with time. Being able to bridge such qualitative states with quantitative measurements of gene expression in cells, such as scRNA-seq, is a cornerstone for data-driven model construction and validation. On the one hand, scRNA-seq binarisation is a key step for inferring and validating Boolean models. On the other hand, the generation of synthetic scRNA-seq data from baseline Boolean models provides an important asset to benchmark inference methods. However, linking characteristics of scRNA-seq datasets, including dropout events, with Boolean states is a challenging task. We present scBoolSeq, a method for the bidirectional linking of scRNA-seq data and Boolean activation state of genes. Given a reference scRNA-seq dataset, scBoolSeq computes statistical criteria to classify the empirical gene pseudocount distributions as either unimodal, bimodal, or zero-inflated, and fits a probabilistic model of dropouts, with gene-dependent parameters. From these learnt distributions, scBoolSeq can both binarise scRNA-seq datasets and generate synthetic scRNA-seq datasets from Boolean traces, as issued from Boolean networks, using biased sampling and dropout simulation. We present a case study demonstrating the application of scBoolSeq's binarisation scheme in data-driven model inference. Furthermore, we compare synthetic scRNA-seq data generated by scBoolSeq with data generated by BoolODE for the same Boolean Network model. The comparison shows that our method better reproduces the statistics of real scRNA-seq datasets, such as the mean-variance and mean-dropout relationships while exhibiting clearly defined trajectories in two-dimensional projections of the data.


Subject(s)
Computational Biology , Single-Cell Analysis , Computational Biology/methods , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Humans , RNA-Seq/methods , RNA-Seq/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Algorithms , Gene Regulatory Networks/genetics , Models, Statistical , Software , Single-Cell Gene Expression Analysis
16.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38960406

ABSTRACT

Spatial transcriptomics data play a crucial role in cancer research, providing a nuanced understanding of the spatial organization of gene expression within tumor tissues. Unraveling the spatial dynamics of gene expression can unveil key insights into tumor heterogeneity and aid in identifying potential therapeutic targets. However, in many large-scale cancer studies, spatial transcriptomics data are limited, with bulk RNA-seq and corresponding Whole Slide Image (WSI) data being more common (e.g. TCGA project). To address this gap, there is a critical need to develop methodologies that can estimate gene expression at near-cell (spot) level resolution from existing WSI and bulk RNA-seq data. This approach is essential for reanalyzing expansive cohort studies and uncovering novel biomarkers that have been overlooked in the initial assessments. In this study, we present STGAT (Spatial Transcriptomics Graph Attention Network), a novel approach leveraging Graph Attention Networks (GAT) to discern spatial dependencies among spots. Trained on spatial transcriptomics data, STGAT is designed to estimate gene expression profiles at spot-level resolution and predict whether each spot represents tumor or non-tumor tissue, especially in patient samples where only WSI and bulk RNA-seq data are available. Comprehensive tests on two breast cancer spatial transcriptomics datasets demonstrated that STGAT outperformed existing methods in accurately predicting gene expression. Further analyses using the TCGA breast cancer dataset revealed that gene expression estimated from tumor-only spots (predicted by STGAT) provides more accurate molecular signatures for breast cancer sub-type and tumor stage prediction, and also leads to improved patient survival and disease-free analysis. Availability: Code is available at https://github.com/compbiolabucf/STGAT.


Subject(s)
Gene Expression Profiling , RNA-Seq , Transcriptome , Humans , RNA-Seq/methods , Gene Expression Profiling/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Gene Expression Regulation, Neoplastic , Computational Biology/methods , Female , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism
17.
Nat Commun ; 15(1): 5600, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961061

ABSTRACT

ezSingleCell is an interactive and easy-to-use application for analysing various single-cell and spatial omics data types without requiring prior programing knowledge. It combines the best-performing publicly available methods for in-depth data analysis, integration, and interactive data visualization. ezSingleCell consists of five modules, each designed to be a comprehensive workflow for one data type or task. In addition, ezSingleCell allows crosstalk between different modules within a unified interface. Acceptable input data can be in a variety of formats while the output consists of publication ready figures and tables. In-depth manuals and video tutorials are available to guide users on the analysis workflows and parameter adjustments to suit their study aims. ezSingleCell's streamlined interface can analyse a standard scRNA-seq dataset of 3000 cells in less than five minutes. ezSingleCell is available in two forms: an installation-free web application ( https://immunesinglecell.org/ezsc/ ) or a software package with a shinyApp interface ( https://github.com/JinmiaoChenLab/ezSingleCell2 ) for offline analysis.


Subject(s)
Single-Cell Analysis , Software , Single-Cell Analysis/methods , Humans , Workflow , Computational Biology/methods , User-Computer Interface , RNA-Seq/methods
18.
Genome Biol ; 25(1): 169, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38956606

ABSTRACT

BACKGROUND: Computational cell type deconvolution enables the estimation of cell type abundance from bulk tissues and is important for understanding tissue microenvironment, especially in tumor tissues. With rapid development of deconvolution methods, many benchmarking studies have been published aiming for a comprehensive evaluation of these methods. Benchmarking studies rely on cell-type resolved single-cell RNA-seq data to create simulated pseudobulk datasets by adding individual cell types in controlled proportions. RESULTS: In our work, we show that the standard application of this approach, which uses randomly selected single cells, regardless of the intrinsic difference between them, generates synthetic bulk expression values that lack appropriate biological variance. We demonstrate why and how the current bulk simulation pipeline with random cells is unrealistic and propose a heterogeneous simulation strategy as a solution. The heterogeneously simulated bulk samples match up with the variance observed in real bulk datasets and therefore provide concrete benefits for benchmarking in several ways. We demonstrate that conceptual classes of deconvolution methods differ dramatically in their robustness to heterogeneity with reference-free methods performing particularly poorly. For regression-based methods, the heterogeneous simulation provides an explicit framework to disentangle the contributions of reference construction and regression methods to performance. Finally, we perform an extensive benchmark of diverse methods across eight different datasets and find BayesPrism and a hybrid MuSiC/CIBERSORTx approach to be the top performers. CONCLUSIONS: Our heterogeneous bulk simulation method and the entire benchmarking framework are implemented in a user-friendly package available at https://github.com/humengying0907/deconvBenchmarking and https://doi.org/10.5281/zenodo.8206516 , enabling further developments in deconvolution methods.


Subject(s)
Benchmarking , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Computer Simulation , RNA-Seq/methods , Computational Biology/methods
19.
Sci Adv ; 10(27): eadj7402, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38959321

ABSTRACT

The study of the tumor microbiome has been garnering increased attention. We developed a computational pipeline (CSI-Microbes) for identifying microbial reads from single-cell RNA sequencing (scRNA-seq) data and for analyzing differential abundance of taxa. Using a series of controlled experiments and analyses, we performed the first systematic evaluation of the efficacy of recovering microbial unique molecular identifiers by multiple scRNA-seq technologies, which identified the newer 10x chemistries (3' v3 and 5') as the best-suited approach. We analyzed patient esophageal and colorectal carcinomas and found that reads from distinct genera tend to co-occur in the same host cells, testifying to possible intracellular polymicrobial interactions. Microbial reads are disproportionately abundant within myeloid cells that up-regulate proinflammatory cytokines like IL1B and CXCL8, while infected tumor cells up-regulate antigen processing and presentation pathways. These results show that myeloid cells with engulfed bacteria are a major source of bacterial RNA within the tumor microenvironment (TME) and may inflame the TME and influence immunotherapy response.


Subject(s)
Bacteria , RNA-Seq , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , RNA-Seq/methods , Bacteria/genetics , Tumor Microenvironment , Myeloid Cells/metabolism , Myeloid Cells/microbiology , Sequence Analysis, RNA/methods , Colorectal Neoplasms/microbiology , Colorectal Neoplasms/genetics , Computational Biology/methods , RNA, Bacterial/genetics , Esophageal Neoplasms/microbiology , Esophageal Neoplasms/genetics , Microbiota , Single-Cell Gene Expression Analysis
20.
Biomolecules ; 14(7)2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39062554

ABSTRACT

In studying the molecular underpinnings of spermatogenesis, we expect to better understand the fundamental biological processes and potentially identify genes that may lead to novel diagnostic and therapeutic strategies toward precision medicine in male infertility. In this review, we emphasize our perspective that the path forward necessitates integrative studies that rely on complementary approaches and types of data. To comprehensively analyze spermatogenesis, this review proposes four axes of integration. First, spermatogenesis is analyzed in the healthy state alongside pathologies. Second, model systems (in which we can deploy treatments and perturbations) are analyzed experimentally alongside human data. Third, the phenotype is measured alongside its underlying molecular profiles using known markers augmented with unbiased profiles. Finally, the testicular cells are studied as ecosystems, analyzing the germ cells alongside the states observed in the supporting somatic cells. Recently, the study of spermatogenesis has been advancing using single-cell RNA sequencing, where scientists have uncovered the unique stages of germ cell development in mice, revealing new regulators of spermatogenesis and previously unknown cell subtypes in the testis. An in-depth analysis of meiotic and postmeiotic stages led to the discovery of marker genes for spermatogonia, Sertoli and Leydig cells and further elucidated all the other germline and somatic cells in the testis microenvironment in normal and pathogenic conditions. The outcome of an integrative analysis of spermatogenesis using advanced molecular profiling technologies such as scRNA-seq has already propelled our biological understanding, with additional studies expected to have clinical implications for the study of male fertility. By uncovering new genes and pathways involved in abnormal spermatogenesis, we may gain insights into subfertility or sterility.


Subject(s)
RNA-Seq , Single-Cell Analysis , Spermatogenesis , Spermatogenesis/genetics , Humans , Male , Animals , Single-Cell Analysis/methods , Mice , RNA-Seq/methods , Germ Cells/metabolism , Testis/metabolism , Infertility, Male/genetics , Single-Cell Gene Expression Analysis