ABSTRACT
Single-cell ATAC-seq (scATAC-seq) is a recently developed approach that provides a means to investigate open chromatin at the single-cell level and to assess epigenetic regulation and transcription factor binding landscapes. The sparsity of scATAC-seq data calls for imputation. Similarly, preprocessing (filtering) may be required to reduce the computational load caused by the large number of open regions. However, optimal strategies for imputation and preprocessing have not yet been evaluated together. We present SAPIEnS (scATAC-seq Preprocessing and Imputation Evaluation System), a benchmark of scATAC-seq imputation frameworks that combine state-of-the-art imputation methods with commonly used preprocessing techniques. We assess different types of scATAC-seq analysis, i.e., clustering, visualization, and digital genomic footprinting, and identify optimal preprocessing-imputation strategies. We discuss the benefits of the imputation framework depending on the task and the number of dataset features (peaks). We conclude that preprocessing with the Boruta method is beneficial for the majority of tasks, whereas imputation is helpful mostly for small datasets. We also implement the SAPIEnS database of pre-computed transcription factor footprints, based on imputed data, together with their activity scores in specific cell types. SAPIEnS is published at: https://github.com/lab-medvedeva/SAPIEnS. The SAPIEnS database is available at: https://sapiensdb.com.
Subject(s)
Epigenesis, Genetic, Genomics, Genomics/methods, Transcription Factors/genetics, Transcription Factors/metabolism, Gene Expression Regulation, Cluster Analysis
ABSTRACT
Identification of genes and molecular pathways with congruent profiles in proteomic and transcriptomic datasets may lead to the discovery of promising transcriptomic biomarkers that are more relevant to phenotypic changes. In this study, we conducted a comparative analysis of 943 paired RNA and proteomic profiles obtained for the same samples of seven human cancer types from The Cancer Genome Atlas (TCGA) and the NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) (two major open human cancer proteomic and transcriptomic databases), covering 15,112 protein-coding genes and 1611 molecular pathways. Overall, our findings demonstrated a statistically significant improvement in the congruence between RNA and proteomic profiles when the analysis was performed at the level of molecular pathways rather than at the level of individual gene products. Transition to the molecular pathway level of data analysis increased the correlation to 0.19-0.57 (Pearson) and 0.14-0.57 (Spearman), a 2- to 3-fold gain for some cancer types. The gain in correlation upon transition to pathway-level analysis can be used to refine omics data by identifying outliers that can be excluded from the comparison of RNA and proteomic profiles. We suggest using sample-wise and gene-wise correlations for individual genes and molecular pathways as a measure of the quality of paired RNA/protein molecular data. We also provide a database of human genes, molecular pathways, and samples annotated with the correlation between RNA and protein products to facilitate the exploration of new cancer transcriptomic biomarkers and molecular mechanisms at different levels of human gene expression.
Subject(s)
Neoplasms, Humans, Neoplasms/genetics, Neoplasms/metabolism, Proteomics/methods, Transcriptome, Databases, Genetic, RNA/metabolism, RNA/genetics, Gene Expression Profiling, Data Accuracy, Biomarkers, Tumor/genetics, Biomarkers, Tumor/metabolism, Gene Expression Regulation, Neoplastic
ABSTRACT
Lapatinib is a targeted therapeutic that inhibits the HER2 and EGFR proteins. It is used for the therapy of HER2-positive breast cancer, although not all patients respond to it. Using human blood serum samples from 14 female donors (taken separately or pooled), we found that human blood serum dramatically abolishes lapatinib-mediated growth inhibition of the human breast squamous carcinoma cell line SK-BR-3. This antagonism between lapatinib and human serum was associated with cancellation of the drug-induced G1/S cell cycle transition arrest. RNA sequencing revealed 308 differentially expressed genes in the presence of lapatinib. Remarkably, when combined with lapatinib, human blood serum was capable of restoring both the cell growth rate and the expression of 96.1% of the genes whose expression was altered by lapatinib treatment alone. Co-administration of EGF with lapatinib also restored cell growth and cancelled the expression changes of 95.8% of the genes specific to lapatinib treatment of SK-BR-3 cells. Differential gene expression analysis also showed that, in the presence of human serum or EGF, lapatinib was unable to inhibit the Toll-like receptor signaling pathway or to alter the expression of genes linked to the Gene Ontology term Focal adhesion.
Subject(s)
Cell Proliferation, ErbB Receptors, Lapatinib, Receptor, ErbB-2, Humans, Lapatinib/pharmacology, Receptor, ErbB-2/metabolism, ErbB Receptors/metabolism, Female, Cell Line, Tumor, Cell Proliferation/drug effects, Carcinoma, Squamous Cell/drug therapy, Carcinoma, Squamous Cell/metabolism, Carcinoma, Squamous Cell/pathology, Antineoplastic Agents/pharmacology, Breast Neoplasms/drug therapy, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, Gene Expression Regulation, Neoplastic/drug effects, Serum/metabolism
ABSTRACT
Multiple sclerosis (MS) is an autoimmune disease of the central nervous system that still lacks a cure. Treatment typically focuses on slowing progression and managing MS symptoms. Single-cell transcriptomics allows the immune system, the key player in MS onset and development, to be investigated in great detail, increasing our understanding of MS mechanisms and stimulating the discovery of targets for potential therapies. Still, de novo drug development takes decades; this time can be reduced by drug repositioning. A promising approach is to select potential drugs based on activated or inhibited genes and pathways. In this study, we explored public single-cell RNA data from an experiment with six patients, covering peripheral blood mononuclear cells (PBMC) and cerebrospinal fluid (CSF) cells of patients with MS and idiopathic intracranial hypertension. We demonstrate that the AIM2 inflammasome, SMAD2/3 signaling, and complement activation pathways are activated in MS in different CSF and PBMC immune cells. Using genes from the top-activated pathways, we detected several promising small molecules that may reverse the transcriptomic signatures of MS immune cells, including AG14361, FGIN-1-27, CA-074, ARP 101, Flunisolide, and JAK3 Inhibitor VI. Among these molecules, we also detected the FDA-approved MS drug Mitoxantrone, supporting the reliability of our approach.
Subject(s)
Multiple Sclerosis, Humans, Multiple Sclerosis/drug therapy, Multiple Sclerosis/genetics, Drug Repositioning, Leukocytes, Mononuclear/metabolism, Reproducibility of Results, Single-Cell Gene Expression Analysis, RNA/metabolism
ABSTRACT
Single-cell RNA-seq data contain many dropouts, caused by the low number and inefficient capture of mRNAs in individual cells, which hamper downstream analyses. Here, we present Epi-Impute, a computational method for dropout imputation that reconciles expression and epigenomic data. Epi-Impute leverages single-cell ATAC-seq data as an additional source of information about gene activity to reduce the number of dropouts. We demonstrate that Epi-Impute outperforms existing methods, especially for very sparse single-cell RNA-seq data sets, significantly reducing imputation error. At the same time, Epi-Impute accurately captures the primary distribution of gene expression across cells while preserving gene-gene and cell-cell relationships in the data. Moreover, Epi-Impute enables the discovery of functionally relevant cell clusters owing to the increased resolution of scRNA-seq data after imputation.
Subject(s)
Chromatin Immunoprecipitation Sequencing, Software, Sequence Analysis, RNA/methods, Single-Cell Gene Expression Analysis, Single-Cell Analysis/methods, Gene Expression Profiling
ABSTRACT
In 2021, the fifth edition of the WHO classification of tumors of the central nervous system (WHO CNS5) was published. Molecular features of tumors were directly incorporated into the diagnostic decision tree, thus affecting both the typing and staging of tumors. This changed the traditional approach, which was based solely on histopathological classification. The Cancer Genome Atlas project (TCGA) is one of the main sources of molecular information about gliomas, including clinically annotated transcriptomic and genomic profiles. Although TCGA itself played a pivotal role in developing the WHO CNS5 classification, its own databases still retain outdated diagnoses that frequently appear incorrect and misleading according to WHO CNS5 standards. We aimed to define up-to-date annotations for the gliomas in the TCGA database that other scientists can use in their research. Based on WHO CNS5 guidelines, we developed an algorithm for the reclassification of TCGA glioma samples by molecular features. We updated the tumor type and diagnosis for 828 of a total of 1122 TCGA glioma cases, after which the available transcriptomic and methylation data showed clustering more consistent with the updated grouping. We also observed better stratification by overall survival for the updated diagnoses, although WHO grade 3 IDH-mutant oligodendrogliomas and astrocytomas remain indistinguishable. We also detected altered performance of previously proposed diagnostic transcriptomic molecular biomarkers (expression of the SPRY1, CRNDE, and FREM2 genes and the FREM2 molecular pathway) and of a prognostic gene signature (FN1, ITGA5, OSMR, and NGFR) after reclassification. Thus, we conclude that further efforts are needed to reconsider glioma molecular biomarkers.
Subject(s)
Brain Neoplasms, Central Nervous System Neoplasms, Glioma, Humans, Brain Neoplasms/metabolism, Transcriptome, Glioma/metabolism, Central Nervous System Neoplasms/genetics, Genomics, Biomarkers, Tumor/genetics, Epigenesis, Genetic, World Health Organization, Mutation, Isocitrate Dehydrogenase/genetics
ABSTRACT
Previously, we have shown that aggregating RNA-level gene expression profiles into quantitative molecular pathway activation metrics reduces batch effects and improves agreement between different experimental platforms. Here, we investigate whether pathway-level data analysis provides any advantage when comparing transcriptomic and proteomic data. We compare the paired proteomic and transcriptomic gene expression and pathway activation profiles obtained for the same human cancer biosamples in The Cancer Genome Atlas (TCGA) and the NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) projects, for a total of 755 samples of glioblastoma, breast, liver, lung, ovarian, pancreatic, and uterine cancers. In the CPTAC assay, expression levels of 15,112 protein-coding genes were profiled using Thermo QE series mass spectrometers. In TCGA, RNA expression levels of the same genes were obtained for the same biosamples using the Illumina HiSeq 4000 platform. At the gene level, absolute gene expression values are compared, whereas pathway-level comparisons are made between pathway activation levels (PALs) calculated using average sample-normalized transcriptomic and proteomic profiles. We observed remarkably different average correlations between the primary RNA and protein expression data for different cancer types: Spearman Rho between 0.017 (p = 1.7 × 10^-13) and 0.27 (p < 2.2 × 10^-16). However, at the pathway level we detected statistically significantly higher correlations overall: averaged Rho between 0.022 (p < 2.2 × 10^-16) and 0.56 (p < 2.2 × 10^-16). Thus, we conclude that data analysis at the PAL level yields more similar results when comparing high-throughput RNA and protein expression profiles.
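The gene-level versus pathway-level comparison described above can be illustrated with a minimal Python sketch. It computes Spearman's rho (the Pearson correlation of average ranks, so ties are handled) between paired RNA and protein values, first across genes and then across pathway scores. All gene names, pathway memberships, and values are hypothetical toy data, and the `pathway_scores` helper simply averages member genes as a stand-in for the PAL formula; this is not the authors' pipeline.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pathway_scores(gene_values, pathways):
    """Hypothetical pathway score: mean of member-gene values (not the PAL formula)."""
    return {p: mean(gene_values[g] for g in genes) for p, genes in pathways.items()}


# Hypothetical paired RNA and protein values for six genes in one sample.
rna = {"A": 5.0, "B": 1.0, "C": 3.0, "D": 2.0, "E": 4.0, "F": 6.0}
prot = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 4.0, "E": 6.0, "F": 5.0}
# Hypothetical pathway memberships.
pathways = {"P1": ["A", "B"], "P2": ["C", "D"], "P3": ["E", "F"]}

genes = sorted(rna)
gene_rho = spearman_rho([rna[g] for g in genes], [prot[g] for g in genes])

rna_pw, prot_pw = pathway_scores(rna, pathways), pathway_scores(prot, pathways)
names = sorted(pathways)
pathway_rho = spearman_rho([rna_pw[p] for p in names], [prot_pw[p] for p in names])

print(f"gene-level rho = {gene_rho:.3f}, pathway-level rho = {pathway_rho:.3f}")
```

The toy values only show the mechanics: averaging over pathway members can smooth gene-level discordance, but whether correlation rises in real data is an empirical question that the study addresses with PALs, not with this simplified mean.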
Subject(s)
Neoplasms, Transcriptome, Gene Expression Profiling/methods, Humans, Mass Spectrometry, Neoplasms/genetics, Neoplasms/metabolism, Proteomics, RNA
ABSTRACT
The EGFR, BRAF, PIK3CA, and KRAS genes play major roles in the EGFR pathway and harbor activating mutations that predict response to many targeted therapeutics. However, connections between these mutations and EGFR pathway expression patterns remain unexplored. Here, we investigated transcriptomic associations with these activating mutations in three ways. First, we compared the expression of these genes in mutant and wild-type tumors using RNA sequencing profiles from The Cancer Genome Atlas project database (n = 3660). Second, mutations were associated with the activation level of the EGFR pathway. Third, they were associated with gene signatures built from genes of this pathway that were differentially expressed between mutant and wild-type tumors. We found that EGFR pathway upregulation was linked with mutations in the BRAF (thyroid cancer, melanoma) and PIK3CA (breast cancer) genes. Gene signatures were associated with BRAF (thyroid cancer, melanoma), EGFR (squamous cell lung cancer), KRAS (colorectal cancer), and PIK3CA (breast cancer) mutations. However, only for the BRAF gene signature in thyroid cancer did we observe strong biomarker diagnostic capacity, with AUC > 0.7 (0.809). Next, we validated this signature on an independent literature-based dataset (n = 127 fresh-frozen tissue samples, AUC 0.912) and on an experimental dataset (n = 42 formalin-fixed, paraffin-embedded tissue samples, AUC 0.822). Our results suggest that RNA sequencing profiles can be used for robust identification of the BRAF V600E mutation (replacement of valine at position 600 with glutamic acid) in the papillary subtype of thyroid cancer, and provide evidence that specific gene expression levels could carry information about driver carcinogenic mutations.
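The AUC values quoted above summarize how well a signature score separates mutant from wild-type samples. As a minimal sketch of one standard way to compute ROC AUC (not necessarily the procedure used in the study), the function below estimates the probability that a randomly chosen positive sample (e.g., BRAF-mutant) scores higher than a randomly chosen negative one, counting ties as one half; this is the Mann-Whitney formulation. The scores and labels are hypothetical, and how the signature score itself is derived is not shown.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison, i.e. Mann-Whitney U / (n_pos * n_neg).

    labels: 1 for the positive class (e.g. mutant), 0 for the negative class
    (e.g. wild type). A tie between a positive and a negative score counts 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical signature scores for 4 mutant (1) and 4 wild-type (0) samples.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.5, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}, above 0.7: {auc > 0.7}")
```

The pairwise loop is O(n_pos × n_neg), which is fine for cohorts of a few hundred samples; a rank-based formula gives the same value faster for larger data.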
Subject(s)
Breast Neoplasms, Lung Neoplasms, Melanoma, Mutation, Neoplasm Proteins, Signal Transduction/genetics, Thyroid Neoplasms, Breast Neoplasms/genetics, Breast Neoplasms/metabolism, Breast Neoplasms/pathology, ErbB Receptors/genetics, ErbB Receptors/metabolism, Female, Humans, Lung Neoplasms/genetics, Lung Neoplasms/metabolism, Lung Neoplasms/pathology, Male, Melanoma/genetics, Melanoma/metabolism, Melanoma/pathology, Neoplasm Proteins/genetics, Neoplasm Proteins/metabolism, Thyroid Neoplasms/genetics, Thyroid Neoplasms/metabolism, Thyroid Neoplasms/pathology
ABSTRACT
Regardless of the presence or absence of specific diagnostic mutations, many cancer patients fail to respond to EGFR-targeted therapeutics, and a personalized approach is needed to identify putative (non)responders. We previously found that human peripheral blood and EGF can modulate the activity of EGFR-specific drugs in inhibiting the clonogenicity of model EGFR-positive A431 squamous carcinoma cells. Here, we report that human serum can dramatically abolish the inhibition of cell growth rate by the EGFR-specific drugs cetuximab and erlotinib. We show that this phenomenon is linked with derepression of the drug-induced G1/S cell cycle transition arrest. Furthermore, A431 cell growth inhibition by cetuximab, erlotinib, and EGF correlates with decreased activity of the ERK1/2 proteins. In turn, EGF- and human serum-mediated rescue of drug-treated A431 cells restores ERK1/2 activity in functional tests. RNA sequencing revealed 1271 and 1566 differentially expressed genes (DEGs) in the presence of cetuximab and erlotinib, respectively. Erlotinib- and cetuximab-specific DEGs significantly overlapped. Interestingly, the expression of 100% and 75% of these DEGs returns to the no-drug level when EGF or a mixed human serum sample, respectively, is added along with cetuximab. In the case of erlotinib, EGF and human serum restore the expression of 39% and 83% of DEGs, respectively. We further assessed differential molecular pathway activation levels and propose that EGF/human serum-mediated A431 resistance to EGFR drugs can be largely explained by reactivation of the MAPK signaling cascade.
Subject(s)
Carcinoma, Squamous Cell, Serum, Humans, Cetuximab/pharmacology, Cetuximab/therapeutic use, Epidermal Growth Factor/pharmacology, Erlotinib Hydrochloride/pharmacology, Erlotinib Hydrochloride/therapeutic use, Carcinoma, Squamous Cell/drug therapy, Cell Cycle, ErbB Receptors
ABSTRACT
Trastuzumab, a HER2-targeted antibody, is widely used for targeted therapy of HER2-positive breast cancer (BC) patients; yet, not all of them respond to this treatment. Here, we investigated whether endogenous factors in human peripheral blood may interfere with trastuzumab activity on the growth of HER2-overexpressing BT474 cells. Across 33 individual BC patient blood samples added to the media, BT474 sensitivity to trastuzumab varied up to 14-fold. In the absence of trastuzumab, human peripheral blood serum samples could inhibit BT474 growth, and this effect varied ~10-fold across 50 individual samples. In turn, epidermal growth factor (EGF) suppressed the trastuzumab effect on BT474 cell growth. Trastuzumab treatment increased the proportion of BT474 cells in the G0/G1 phases of the cell cycle, whereas simultaneous addition of EGF decreased it, although not to the control level. We used RNA sequencing profiling of gene expression to elucidate the molecular mechanisms involved in EGF- and human serum-mediated attenuation of the trastuzumab effect on BT474 cell growth. Bioinformatic analysis of the molecular profiles suggested that trastuzumab acts similarly to inhibition of the PI3K/Akt/mTOR signaling axis, and that the mechanism of EGF suppression of trastuzumab activity may be associated with parallel activation of PKC and the transcription factors ETV1-ETV5.
ABSTRACT
In this study, we report 31 spinal intramedullary astrocytoma (SIA) RNA sequencing (RNA-seq) profiles for 25 adult patients with documented clinical annotations. To our knowledge, this is the first clinically annotated RNA-seq dataset of spinal astrocytomas derived from the intradural intramedullary compartment. We compared these tumor profiles with previously obtained healthy central nervous system (CNS) RNA-seq data for spinal cord and brain and identified SIA-specific gene sets and molecular pathways. Our findings suggest a trend toward upregulation in SIA of pathways governing interactions with immune cells and downregulation of pathways for neuronal functioning in the context of normal CNS activity. In two patient tumor biosamples, we identified diagnostic KIAA1549-BRAF fusion oncogenes, and we also found 16 new SIA-associated fusion transcripts. In addition, we bioinformatically simulated the activities of targeted cancer drugs in SIA samples and predicted that several tyrosine kinase inhibitors and thalidomide analogs could be effective as second-line treatment agents to help prevent SIA recurrence and progression.
ABSTRACT
OncoboxPD (Oncobox pathway databank), available at https://open.oncobox.com, is a collection of 51 672 uniformly processed human molecular pathways. Superposition of all pathways forms an interactome graph of protein-protein interactions and metabolic reactions containing 361 654 interactions and 64 095 molecular participants. Pathways are uniformly classified by biological process, and each pathway node is algorithmically annotated with a specific functional activator/repressor role. This enables online calculation of statistically supported pathway activation levels (PALs) from custom RNA/protein expression profiles with the built-in bioinformatic tool. Each pathway can be visualized as a static or dynamic graph, where vertices are the molecules participating in the pathway and edges are the interactions or reactions between them. Differentially expressed nodes in a pathway can be visualized in two-color mode with a user-defined color scale. For every comparison, OncoboxPD also generates a graph summarizing the top up- and downregulated pathways.
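To make the link between node roles and PALs concrete, here is a minimal Python sketch. It assumes the PAL form described in earlier Oncobox publications, PAL = 100 · Σ ARR_n · log10(CNR_n) / Σ |ARR_n|, where CNR_n is a gene's case-to-normal expression ratio and ARR_n its activator/repressor role. The ARR coding (+1 activator, -1 repressor, ±0.5 probable role, 0 unclear), the function name, and the gene names are assumptions for illustration; OncoboxPD's online tool may normalize, filter, or test significance differently, and the statistical support mentioned above is not reproduced here.

```python
import math


def pathway_activation_level(cnr, arr):
    """Hypothetical PAL sketch: 100 * sum(ARR_n * log10(CNR_n)) / sum(|ARR_n|).

    cnr: gene -> case-to-normal expression ratio (must be > 0; log10 of a
         non-positive value raises ValueError)
    arr: gene -> activator/repressor role (assumed coding: +1 activator,
         -1 repressor, +/-0.5 probable role, 0 unclear)
    Only genes present in both mappings contribute to the score.
    """
    genes = [g for g in arr if g in cnr]
    denom = sum(abs(arr[g]) for g in genes)
    if denom == 0:
        return 0.0  # no informative nodes: report no activation
    num = sum(arr[g] * math.log10(cnr[g]) for g in genes)
    return 100.0 * num / denom


# Hypothetical three-node pathway: two activators and one repressor.
arr = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1}
cnr = {"GENE_A": 10.0, "GENE_B": 1.0, "GENE_C": 0.1}
pal = pathway_activation_level(cnr, arr)
print(f"PAL = {pal:.1f}")
```

Under this form, an upregulated activator and a downregulated repressor both push the PAL upward, which is why each node's functional role must be annotated before pathway activation can be scored.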
ABSTRACT
Many patients fail to respond to EGFR-targeted therapeutics, and personalized diagnostics are needed to identify putative responders. We investigated 1630 colorectal and lung squamous carcinomas and 1357 normal lung and colon samples and observed huge variation in EGFR pathway activation in both cancerous and healthy tissues, irrespective of EGFR gene mutation status. We investigated whether human blood serum can affect squamous carcinoma cell growth and EGFR drug response. We demonstrate that human serum antagonizes the effects of the EGFR-targeted drugs erlotinib and cetuximab on A431 squamous carcinoma cells, increasing the IC50 about 2- and 20-fold, respectively. The effects on clonogenicity varied significantly across individual serum samples in every experiment, with differences of up to 100%. EGF concentration could explain many effects of the blood serum samples, and EGFR ligand-depleted serum had a smaller effect on drug sensitivity.
ABSTRACT
Glioblastoma is the most common and most malignant brain tumor worldwide, with a 10-year survival rate of only 0.7%. Aggressive multimodal treatment is not enough to extend life expectancy and provide a good quality of life for glioblastoma patients. In addition, despite decades of research, there are no established biomarkers for early disease diagnosis or for monitoring patient response to treatment. High-throughput sequencing technologies allow the identification of unique molecules from large clinically annotated datasets. Thus, the aim of our study was to identify significant molecular changes between short- and long-term glioblastoma survivors by transcriptome RNA sequencing profiling, followed by differential pathway activation level analysis. We used data from the publicly available repositories The Cancer Genome Atlas (TCGA; number of annotated cases = 135) and the Chinese Glioma Genome Atlas (CGGA; number of annotated cases = 218), as well as experimental clinically annotated glioblastoma tissue samples from the Institute of Pathology, Faculty of Medicine in Ljubljana, corresponding to 2-58 months of overall survival (OS) (n = 16). We found one differential gene, the long noncoding RNA CRNDE, whose overexpression correlated with poor patient OS. Moreover, we identified overlapping sets of congruently regulated differential genes involved in cell growth, division, and migration; structure and dynamics of the extracellular matrix; DNA methylation; and regulation through noncoding RNAs. Gene ontology analysis can provide additional information about the function of protein-coding and noncoding genes of interest and the processes in which they are involved. In the future, this can shape the design of more targeted therapeutic approaches.
ABSTRACT
Multiple myeloma (MM) affects ~500,000 people and causes ~100,000 deaths annually; it is currently considered treatable but incurable. There are several MM chemotherapy regimens, eleven of which include bortezomib, a proteasome-targeted drug. MM patients respond differently to bortezomib, and new prognostic biomarkers are needed to personalize treatment. However, there is a shortage of clinically annotated MM molecular data that could be used to establish novel molecular diagnostics. We report new RNA sequencing profiles for 53 MM patients annotated with responses to two similar chemotherapy regimens, bortezomib, doxorubicin, dexamethasone (PAD) and bortezomib, cyclophosphamide, dexamethasone (VCD), or to their combination. Fourteen patients received both PAD and VCD, six received only PAD, and 33 received only VCD. We compared the profiles of good and poor responders and found five genes commonly regulated here and in previous datasets for other bortezomib regimens (all upregulated in good responders): FGFR3, MAF, IGHA2, IGHV1-69, and GRB14. Four of these genes are linked with known immunoglobulin locus rearrangements. We then used five machine learning (ML) methods to build classifiers distinguishing good and poor responders for two cohorts: PAD + VCD (53 patients) and, separately, VCD (47 patients). We showed that applying FloWPS dynamic data trimming was beneficial for all ML methods tested in both cohorts, as well as in the previous MM bortezomib datasets. However, the ML models built for the different datasets could not be transferred across datasets, which may be due to different treatment regimens, experimental profiling methods, and MM heterogeneity.