ABSTRACT
PURPOSE: This study aimed to investigate the relationship between the interferon-gamma (IFN-γ) pathway in different tumor microenvironments (TME) and patients' prognosis, as well as the regulatory mechanisms of this pathway in tumor cells. METHODS: Using RNA-seq data from the TCGA database, we analyzed the predictive value of the IFN-γ pathway across various tumors. We employed a univariate Cox regression model to assess the prognostic significance of IFN-γ signaling in different tumor types. Additionally, we analyzed single-cell RNA sequencing (scRNA-seq) data from the Gene Expression Omnibus (GEO) database to examine the distribution characteristics of the IFN-γ pathway and explore its regulatory mechanisms, highlighting how IFN-γ influenced cellular interactions within the TME. RESULTS: Our analysis revealed a significant association between the IFN-γ pathway and adverse prognosis in pan-cancer tissues (P < 0.001). Interestingly, the direction of this association (positive or negative) varied across tumor types. Through a detailed examination of scRNA-seq data, we found that the IFN-γ pathway exerted substantial regulatory effects on stromal and immune cells. In contrast, its expression and regulatory patterns in tumor cells were diverse and heterogeneous. Further analysis indicated that the IFN-γ pathway not only enhanced the immunogenicity of tumor cells but also inhibited their proliferation. Cell-cell interaction analysis confirmed the pivotal role of the IFN-γ pathway within the overall regulatory network. Moreover, we identified HMGB2 (high mobility group box 2) in T cells as a potential key regulator of tumor cell proliferation. CONCLUSIONS: The IFN-γ pathway exhibited a dual function by both suppressing tumor cell proliferation and enhancing their immunogenicity, positioning it as a pivotal target for refined cancer diagnosis and cancer strategies.
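As an illustration of the univariate Cox regression step mentioned in METHODS, the following is a minimal sketch, not the authors' pipeline. It fits a single-covariate Cox model by Newton-Raphson on the partial likelihood. The survival times, event flags, and pathway scores are invented toy values, and the function name `cox_univariate` is hypothetical.

```python
import math

def cox_univariate(times, events, x, n_iter=100, tol=1e-9):
    """Fit a one-covariate Cox model; returns (beta, hazard_ratio).

    Maximizes the partial likelihood by Newton-Raphson. Tied event
    times are handled in the Breslow style (shared risk set).
    """
    n = len(times)
    beta = 0.0
    for _ in range(n_iter):
        grad = 0.0   # first derivative of the log partial likelihood
        info = 0.0   # negative second derivative (observed information)
        for i in range(n):
            if not events[i]:
                continue  # censored patients contribute only via risk sets
            # Risk set: everyone still under observation at this event time.
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            grad += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info == 0.0:
            break
        # Clamp the Newton step to keep the toy optimizer stable.
        step = max(-1.0, min(1.0, grad / info))
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)

# Toy cohort (assumed values): survival time, event flag (1 = death,
# 0 = censored), and a per-patient pathway score.
times = [2, 3, 5, 7, 8, 11, 12, 15]
events = [1, 1, 1, 0, 1, 1, 0, 1]
scores = [1.8, 0.4, 1.5, 0.9, 1.2, 0.3, 0.6, 0.2]

beta, hazard_ratio = cox_univariate(times, events, scores)
print(f"beta = {beta:.3f}, hazard ratio = {hazard_ratio:.3f}")
```

A hazard ratio above 1 would correspond to a higher pathway score being associated with worse survival, and a value below 1 to the opposite. In practice a dedicated survival-analysis library would be used, since it also reports confidence intervals and P values.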
ABSTRACT
Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods like neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs from low-dimensional manifolds. However, scRNA-seq data suffer from the 'curse of dimensionality', leading to the over-smoothing of data when using imputation methods. To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which enhances the preservation of data structures and allows for a refined use of PCA dimensions and diffusion parameters (e.g., k-nearest neighbors, exponentiation of the Markov matrix) to minimize noise introduction. This approach enables a more accurate construction of the exponentiated Markov matrix (cell neighborhood graph), surpassing methods like MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated through various scRNA-seq datasets, demonstrating improved cell phenotype representation. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.
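The graph-diffusion idea described above (a k-nearest-neighbor graph turned into a Markov matrix, which is then exponentiated and applied to the expression data) can be sketched as follows. This is a simplified illustration, not the sc-PHENIX or MAGIC implementation. It uses uniform neighbor weights instead of an adaptive kernel, and the low-dimensional coordinates are invented toy values standing in for a PCA or PCA-UMAP embedding. The function names are hypothetical.

```python
import math

def knn_markov(points, k):
    """Row-stochastic transition matrix over a kNN graph.

    Each cell is connected to itself and its k nearest neighbours,
    with equal transition probability to each.
    """
    n = len(points)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n))
        neighbours = [j for _, j in dists[: k + 1]]  # self has distance 0
        for j in neighbours:
            P[i][j] = 1.0 / len(neighbours)
    return P

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def diffuse(P, X, t):
    """Apply t diffusion steps, i.e. compute P^t X."""
    out = X
    for _ in range(t):
        out = matmul(P, out)
    return out

# Toy embedding (assumed): two well-separated groups of three cells.
embedding = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
             (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
# One gene per cell; the zeros mimic dropout events.
expression = [[4.0], [0.0], [5.0], [0.0], [1.0], [2.0]]

P = knn_markov(embedding, k=2)
imputed = diffuse(P, expression, t=3)
for before, after in zip(expression, imputed):
    print(before[0], "->", round(after[0], 3))
```

In this toy case the graph never links the two groups, so each cell is smoothed only toward its own group mean. If the neighbor graph is built in a space where groups are poorly separated, or if k and t are too large, diffusion mixes distinct phenotypes, which is the over-smoothing problem the abstract refers to.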
ABSTRACT
BACKGROUND: Normalization is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) datasets. Its main goal is to make gene counts comparable within and between cells. To do so, normalization methods must account for technical and biological variability. Numerous normalization methods have been developed, addressing different sources of dispersion and making specific assumptions about the count data. MAIN BODY: The selection of a normalization method has a direct impact on downstream analysis, for example differential gene expression and cluster identification. Thus, the objective of this review is to guide the reader in making an informed decision on the most appropriate normalization method to use. To this end, we first give an overview of the different single-cell sequencing platforms and methods commonly used, including isolation and library preparation protocols. Next, we discuss the inherent sources of variability of scRNA-seq datasets. We describe the categories of normalization methods and include examples of each. We also delineate imputation and batch-effect correction methods. Furthermore, we describe data-driven metrics commonly used to evaluate the performance of normalization methods. We also discuss common scRNA-seq methods and toolkits used for integrated data analysis. CONCLUSIONS: According to the correction performed, normalization methods can be broadly classified as within-sample and between-sample algorithms. Moreover, with respect to the mathematical model used, normalization methods can further be classified into global scaling methods, generalized linear models, mixed methods, and machine learning-based methods. Each of these methods has pros and cons and makes different statistical assumptions. However, no single normalization method performs best in all cases. Instead, metrics such as silhouette width, K-nearest neighbor batch-effect test, or Highly Variable Genes are recommended to assess the performance of normalization methods.
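To make the "global scaling" category concrete, here is a minimal sketch of library-size normalization followed by a log transform. It is only one common instance of global scaling and is not taken from the review. The scale factor of 10,000 and the toy count matrix are assumptions, and the function name is hypothetical.

```python
import math

def library_size_normalize(counts, scale=10_000):
    """Scale each cell to the same total count, then apply log(1 + x).

    `counts` is a list of cells, each a list of raw gene counts.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        factor = scale / total if total > 0 else 0.0
        normalized.append([math.log1p(c * factor) for c in cell])
    return normalized

# Two toy cells with identical composition but a tenfold difference
# in sequencing depth (assumed values).
raw_counts = [
    [10, 30, 60],
    [1, 3, 6],
]
normalized_counts = library_size_normalize(raw_counts)
for cell in normalized_counts:
    print([round(v, 3) for v in cell])
```

After scaling, the two cells have the same normalized profile, which shows the within-cell depth correction that global scaling is meant to provide. It does not address gene-specific or batch-specific effects, which is where the other categories listed above come in.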
Subject(s)
Single-Cell Analysis, Animals, Humans, Algorithms, Gene Expression Profiling/methods, Gene Expression Profiling/standards, RNA-Seq/methods, RNA-Seq/standards, Sequence Analysis, RNA/methods, Single-Cell Analysis/methods, Transcriptome, Datasets as Topic
ABSTRACT
Pathway analysis is an important step in the interpretation of single-cell transcriptomic data, as it provides powerful information to detect which cellular processes are active in each individual cell. We have recently developed a protein-protein interaction network-based framework to quantify pluripotency-associated pathways from scRNA-seq data. Here, we extend this approach to quantify the activity of a pathway associated with any biological process, or even any list of genes. A systems-level characterization of pathway activities across multiple cell types provides a broadly applicable tool for the analysis of pathways in both healthy and disease conditions. Dysregulated cellular functions are a hallmark of a wide spectrum of human disorders, including cancer and autoimmune diseases. Here, we illustrate our method by analyzing various biological processes in healthy and cancerous breast samples. Using this approach, we found that breast tumor cells, even when they form a single group in the UMAP space, keep diverse biological programs active in a differentiated manner within the cluster.
• We implement a protein-protein interaction network-based approach to quantify the activity of different biological processes.
• The methodology can be used for cell annotation in scRNA-seq studies and is freely available as an R package.
ABSTRACT
CTCF is an architectonic protein that organizes the genome inside the nucleus in almost all eukaryotic cells. There is evidence that CTCF plays a critical role during spermatogenesis as its depletion produces abnormal sperm and infertility. However, defects produced by its depletion throughout spermatogenesis have not been fully characterized. In this work, we performed single cell RNA sequencing in spermatogenic cells with and without CTCF. We uncovered defects in transcriptional programs that explain the severity of the damage in the produced sperm. In the early stages of spermatogenesis, transcriptional alterations are mild. As germ cells go through the specialization stage or spermiogenesis, transcriptional profiles become more altered. We found morphology defects in spermatids that support the alterations in their transcriptional profiles. Altogether, our study sheds light on the contribution of CTCF to the phenotype of male gametes and provides a fundamental description of its role at different stages of spermiogenesis.
ABSTRACT
Artificial intelligence is revolutionizing all fields that affect people's lives and health. One of the most critical applications is in the study of tumors. This is the case for glioblastoma (GBM), whose behavior needs to be understood in order to develop effective therapies. Advances in single-cell RNA sequencing (scRNA-seq) make it possible to study the cellular and molecular heterogeneity of GBM. Given that these tumors contain different cell groups, there is a need to apply Machine Learning (ML) algorithms, which allow information to be extracted to understand how the cancer changes and to broaden the search for effective treatments. We propose multiple comparisons of ML algorithms to classify cell groups based on GBM scRNA-seq data. This broad comparison can show the scientific and medical community which models achieve the best performance in this task. The following cell groups are classified in this work: Tumor Core (TC), Tumor Periphery (TP) and Normal Periphery (NP), in binary and multi-class scenarios. This work presents the biomarker candidates found for the models with the best results. The analyses presented here allow us to verify the biomarker candidates to understand the genetic characteristics of GBM, which may be affected by a suitable identification of GBM heterogeneity. For the four scenarios covered, this work obtained cross-validation results of $93.03\% \pm 5.37\%$, $97.42\% \pm 3.94\%$, $98.27\% \pm 1.81\%$ and $93.04\% \pm 6.88\%$ for the classification of TP versus TC, TP versus NP, NP versus TP and TC (TPC) and NP versus TP versus TC, respectively.
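The "mean ± standard deviation" form of the cross-validation results above can be illustrated with a short sketch. This is not one of the models compared in the study. It uses a nearest-centroid classifier as a simple stand-in, a fixed interleaved fold assignment, and an invented two-gene toy data set with TP and TC labels. All function names are hypothetical.

```python
import statistics

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )

def cross_validate(X, y, k):
    """k-fold cross-validation; returns (mean accuracy, sample std dev)."""
    folds = [list(range(i, len(X), k)) for i in range(k)]  # interleaved folds
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(
            nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# Toy expression of two genes per cell (assumed values), binary TP vs TC.
X = [[1.0, 5.2], [5.1, 0.9], [0.8, 4.9], [4.8, 1.2],
     [1.2, 5.1], [5.3, 1.1], [0.9, 4.7], [4.9, 0.8]]
y = ["TP", "TC", "TP", "TC", "TP", "TC", "TP", "TC"]

mean_acc, std_acc = cross_validate(X, y, k=4)
print(f"accuracy: {100 * mean_acc:.2f}% ± {100 * std_acc:.2f}%")
```

The standard deviation is taken across folds, so a wide spread indicates that performance depends strongly on which cells were held out. Real scRNA-seq data would normally call for stratified folds and many more features than this toy case.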
Subject(s)
Glioblastoma, Humans, Glioblastoma/genetics, Glioblastoma/pathology, Artificial Intelligence, Biomarkers, Machine Learning, Sequence Analysis, RNA/methods, Single-Cell Analysis/methods
ABSTRACT
BACKGROUND: Triple-negative breast cancer (TNBC) is a subtype of breast cancer with high tumoral heterogeneity, but its detailed regulatory network is not well understood. METHODS: Using single-cell RNA-sequencing (scRNA-seq) data analysis, we comprehensively investigated the transcriptional profile of different subtypes of TNBC epithelial cells with gene regulatory network (GRN) and alternative splicing (AS) event analysis, as well as the crosstalk between epithelial and non-epithelial cells. RESULTS: We found that the luminal progenitor subtype exhibited the most complex GRN and splicing events. In addition, hnRNPs negatively regulate AS events in the luminal progenitor subtype. We also explored the cellular crosstalk among endothelial cells, stromal cells and immune cells in TNBC and discovered that NOTCH4 was a key receptor and prognostic marker in endothelial cells, which provides a potential biomarker and target for TNBC intervention. CONCLUSIONS: In summary, our study elaborates on the cellular heterogeneity of TNBC, revealing that NOTCH4 in endothelial cells was critical for TNBC intervention. This in-depth understanding of the epithelial and non-epithelial cell network would provide a theoretical basis for the development of new drugs targeting this sophisticated network in TNBC.
Subject(s)
Triple Negative Breast Neoplasms, Humans, Triple Negative Breast Neoplasms/genetics, Endothelial Cells, Alternative Splicing, Computational Biology, Sequence Analysis, RNA
ABSTRACT
Trajectory inference is a common application of scRNA-seq data. However, it is often necessary to first determine the origin of the trajectories, that is, the stem or progenitor cells. In this work, we propose a computational tool to quantify pluripotency from single-cell transcriptomics data. This approach uses the protein-protein interaction (PPI) network associated with the differentiation process as a scaffold and the gene expression matrix to calculate a score that we call differentiation activity. This score reflects how active the differentiation network is in each cell. We benchmark the performance of our algorithm against two previously published tools, LandSCENT (Chen et al., 2019) and CytoTRACE (Gulati et al., 2020), on four healthy human data sets: breast, colon, hematopoietic and lung. We show that our algorithm is more efficient than LandSCENT and requires less RAM than the other programs. We also illustrate a complete workflow from the count matrix to trajectory inference using the breast data set.
• ORIGINS is a methodology to quantify pluripotency from scRNA-seq data implemented as a freely available R package.
• ORIGINS uses the protein-protein interaction network associated with differentiation and the data set expression matrix to calculate a score (differentiation activity) that quantifies pluripotency for each cell.
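The abstract does not give the formula behind the differentiation activity score, so the following is only a hedged sketch of one way a PPI network and an expression matrix could be combined into a per-cell score. It sums, over all network edges, the product of the expression of the two interacting genes. This edge-product rule is an assumption made for illustration and should not be read as the ORIGINS algorithm. The gene names G1 to G3, the edges, and the function name are hypothetical placeholders.

```python
def network_activity(expression, genes, edges):
    """Per-cell score: sum over PPI edges of the product of endpoint expression.

    `expression` is a list of cells (rows) with one value per gene in `genes`.
    Edges that mention a gene absent from `genes` are skipped.
    """
    index = {gene: i for i, gene in enumerate(genes)}
    scores = []
    for cell in expression:
        score = sum(
            cell[index[a]] * cell[index[b]]
            for a, b in edges
            if a in index and b in index
        )
        scores.append(score)
    return scores

# Placeholder gene names and interactions (assumed, not real PPI data).
genes = ["G1", "G2", "G3"]
ppi_edges = [("G1", "G2"), ("G2", "G3")]
expression = [
    [2.0, 3.0, 1.0],  # cell with both interactions co-expressed
    [0.0, 3.0, 0.0],  # cell expressing only one partner of each pair
]

activity = network_activity(expression, genes, ppi_edges)
print(activity)
```

Under this rule an edge contributes only when both interacting genes are expressed in the same cell, so the score reflects co-expression along the network rather than the expression of isolated genes.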
ABSTRACT
Dixenic parasites often encounter environmental extremes during the transition from vector to host. Preadapted transmission stages overcome these challenges to promote parasites' survival and ensure life cycle progression. Recently, Vigneron et al. and Briggs et al. used single-cell transcriptomics to investigate developmental stage specific gene expression patterns during parasite differentiation.
Subject(s)
Parasites, Trypanosoma brucei brucei, Animals, Life Cycle Stages/genetics, Parasites/genetics, Transcriptome, Trypanosoma brucei brucei/genetics
ABSTRACT
The mammary gland is a highly dynamic organ that undergoes periods of expansion, differentiation and cell death in each reproductive cycle. Partly because of the dynamic nature of the gland, mammary epithelial cells (MECs) are extraordinarily heterogeneous. Single-cell RNA-seq (scRNA-seq) analyses have contributed to understanding the cellular and transcriptional heterogeneity of this complex tissue. Here, we integrate scRNA-seq data from three foundational reports that have explored the mammary gland cell populations throughout development at the single-cell level using 10× Chromium Drop-Seq. We center our analysis on post-natal development of the mammary gland, from puberty to post-involution. The new integrated study corresponds to RNA sequences from 53,686 individual cells, which greatly outnumbers the three initial data sets. The large volume of information provides new insights, such as a better resolution of the previously detected Procr+ stem-like cell subpopulation or the identification of a novel group of MECs expressing immune-like markers. Moreover, we present new pseudo-temporal trajectories of MEC populations at two resolution levels, that is, either considering all mammary cell subtypes or focusing specifically on the luminal lineages. Interestingly, the luminal-restricted analysis reveals distinct expression patterns of various genes that encode milk proteins, suggesting specific and non-redundant roles for each of them. In summary, our data show that the application of bioinformatic tools to integrate multiple scRNA-seq data sets helps to describe and interpret the high level of plasticity involved in gene expression regulation throughout mammary gland post-natal development.