1.
Proc Natl Acad Sci U S A ; 120(6): e2217868120, 2023 02 07.
Article in English | MEDLINE | ID: mdl-36719923

ABSTRACT

Single-cell RNA sequencing combined with genome-scale metabolic models (GEMs) has the potential to unravel the differences in metabolism across both cell types and cell states but requires new computational methods. Here, we present a method for generating cell-type-specific genome-scale models from clusters of single-cell RNA-Seq profiles. Specifically, we developed a method to estimate the minimum number of cells required to pool to obtain stable models, a bootstrapping strategy for estimating statistical inference, and a faster version of the task-driven integrative network inference for tissues algorithm for generating context-specific GEMs. In addition, we evaluated the effect of different RNA-Seq normalization methods on model topology and differences in models generated from single-cell and bulk RNA-Seq data. We applied our methods on data from mouse cortex neurons and cells from the tumor microenvironment of lung cancer and in both cases found that almost every cell subtype had a unique metabolic profile. In addition, our approach was able to detect cancer-associated metabolic differences between cancer cells and healthy cells, showcasing its utility. We also contextualized models from 202 single-cell clusters across 19 human organs using data from Human Protein Atlas and made these available in the web portal Metabolic Atlas, thereby providing a valuable resource to the scientific community. With the ever-increasing availability of single-cell RNA-Seq datasets and continuously improved GEMs, their combination holds promise to become an important approach in the study of human metabolism.
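The pooling and bootstrapping steps mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy count matrix, and the use of a per-gene mean as the pooled profile are all assumptions made for illustration.

```python
import random

def pool_profile(cells):
    """Average per-gene counts over a list of cells (each cell is a list of counts)."""
    n_genes = len(cells[0])
    return [sum(c[g] for c in cells) / len(cells) for g in range(n_genes)]

def bootstrap_profiles(cells, n_boot, rng):
    """Resample cells with replacement and pool each resample into one profile."""
    profiles = []
    for _ in range(n_boot):
        resample = [rng.choice(cells) for _ in cells]
        profiles.append(pool_profile(resample))
    return profiles

# Hypothetical cluster: 4 cells x 3 genes of raw counts.
cluster = [[5, 0, 2], [7, 1, 0], [6, 0, 1], [4, 2, 3]]
rng = random.Random(0)
boot = bootstrap_profiles(cluster, n_boot=200, rng=rng)

# Spread of gene 0 across bootstrap replicates: a rough stability indicator.
gene0 = [p[0] for p in boot]
spread = max(gene0) - min(gene0)
```

A narrow spread across replicates suggests the pooled profile is stable for that gene. In a sketch like this, one would increase the number of pooled cells until the spread falls below a chosen tolerance.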


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Animals , Mice , Humans , Gene Expression Profiling/methods , Algorithms , RNA-Seq , Genome/genetics , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods
2.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37587790

ABSTRACT

Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.


Subject(s)
Algorithms , DNA Methylation , Humans , Neural Networks, Computer , Epigenesis, Genetic , Risk Factors
3.
Mol Syst Biol ; 17(9): e10105, 2021 09.
Article in English | MEDLINE | ID: mdl-34528760

ABSTRACT

Tumor cell heterogeneity is a crucial characteristic of malignant brain tumors and underpins phenomena such as therapy resistance and tumor recurrence. Advances in single-cell analysis have enabled the delineation of distinct cellular states of brain tumor cells, but the time-dependent changes in such states remain poorly understood. Here, we construct quantitative models of the time-dependent transcriptional variation of patient-derived glioblastoma (GBM) cells. We build the models by sampling and profiling barcoded GBM cells and their progeny over the course of 3 weeks and by fitting a mathematical model to estimate changes in GBM cell states and their growth rates. Our model suggests a hierarchical yet plastic organization of GBM, where the rates and patterns of cell state switching are partly patient-specific. Therapeutic interventions produce complex dynamic effects, including inhibition of specific states and altered differentiation. Our method provides a general strategy to uncover time-dependent changes in cancer cells and offers a way to evaluate and predict how therapy affects cell state composition.
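A model of cell states with state-specific growth and switching can be sketched as a small linear system. The abstract does not give the model equations, so the form below (exponential growth per state plus first-order switching), the function name, and all parameter values are assumptions for illustration only.

```python
def simulate_states(n0, growth, switch, days, dt=0.01):
    """Euler-integrate dn_i/dt = g_i*n_i + sum_j(k_ji*n_j) - sum_j(k_ij)*n_i."""
    n = list(n0)
    k = len(n)
    steps = int(round(days / dt))
    for _ in range(steps):
        dn = []
        for i in range(k):
            inflow = sum(switch[j][i] * n[j] for j in range(k) if j != i)
            outflow = sum(switch[i][j] for j in range(k) if j != i) * n[i]
            dn.append(growth[i] * n[i] + inflow - outflow)
        n = [n[i] + dt * dn[i] for i in range(k)]
    return n

# Hypothetical two-state example followed for 3 weeks.
start = [100.0, 0.0]
growth = [0.30, 0.05]                  # per-day net growth rates (assumed)
switch = [[0.0, 0.10], [0.20, 0.0]]    # switch[i][j]: rate from state i to j (assumed)
final = simulate_states(start, growth, switch, days=21)
fractions = [x / sum(final) for x in final]
```

Fitting such a model to data would mean adjusting `growth` and `switch` until the simulated state fractions match the observed ones at each sampled time point.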


Subject(s)
Brain Neoplasms , Glioblastoma , Brain Neoplasms/genetics , Cell Line, Tumor , Glioblastoma/genetics , Humans , Neoplasm Recurrence, Local , Single-Cell Analysis
4.
PLoS Comput Biol ; 13(6): e1005608, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28640810

ABSTRACT

Recent technological advancements have made time-resolved, quantitative, multi-omics data available for many model systems, which could be integrated for systems pharmacokinetic use. Here, we present large-scale simulation modeling (LASSIM), which is a novel mathematical tool for performing large-scale inference using mechanistically defined ordinary differential equations (ODE) for gene regulatory networks (GRNs). LASSIM integrates structural knowledge about regulatory interactions and non-linear equations with multiple steady state and dynamic response expression datasets. The rationale behind LASSIM is that biological GRNs can be simplified using a limited subset of core genes that are assumed to regulate all other gene transcription events in the network. The LASSIM method is implemented as a general-purpose toolbox using the PyGMO Python package to make the most of multicore computers and high performance clusters, and is available at https://gitlab.com/Gustafsson-lab/lassim. As a method, LASSIM works in two steps, where it first infers a non-linear ODE system of the pre-specified core gene expression. Second, LASSIM in parallel optimizes the parameters that model the regulation of peripheral genes by core system genes. We showed the usefulness of this method by applying LASSIM to infer a large-scale non-linear model of naïve Th2 cell differentiation, made possible by integrating Th2 specific bindings, time-series together with six public and six novel siRNA-mediated knock-down experiments. ChIP-seq showed significant overlap for all tested transcription factors. Next, we performed novel time-series measurements of total T-cells during differentiation towards Th2 and verified that our LASSIM model could monitor those data significantly better than comparable models that used the same Th2 bindings. 
In summary, the LASSIM toolbox opens the door to a new type of model-based data analysis that combines the strengths of reliable mechanistic models with truly systems-level data. We demonstrate the power of this approach by inferring a mechanistically motivated, genome-wide model of the Th2 transcription regulatory system, which plays an important role in several immune related diseases.


Subject(s)
Chromosome Mapping/methods , Models, Genetic , Proteome/metabolism , Signal Transduction/physiology , Software , Th2 Cells/metabolism , Algorithms , Cell Differentiation/physiology , Cells, Cultured , Computer Simulation , Gene Expression Regulation, Developmental/physiology , Humans , Programming Languages
5.
Acta Oncol ; 57(10): 1352-1358, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29733238

ABSTRACT

PURPOSE: To find out what organs and doses are most relevant for 'radiation-induced urgency syndrome' in order to derive the corresponding dose-response relationships as an aid for avoiding the syndrome in the future. MATERIAL AND METHODS: From a larger group of gynecological cancer survivors followed up for 2-14 years, we identified 98 who had undergone external beam radiation therapy but not brachytherapy and who did not have a stoma. Of those survivors, 24 developed urgency syndrome. Based on the loading factor from a factor analysis, and symptom frequency, 15 symptoms were weighted together to a score interpreted as the intensity of radiation-induced urgency symptom. On reactivated dose plans, we contoured the small intestine, sigmoid colon and the rectum (separate from the anal-sphincter region) and we exported the dose-volume histograms for each survivor. Dose-response relationships between each organ at risk and urgency syndrome were estimated by fitting the data to the Probit, RS, LKB and gEUD models. RESULTS: The rectum and sigmoid colon have steep dose-response relationships for urgency syndrome for Probit, RS and LKB. The dose-response parameters for the rectum were D50: 51.3, 51.4, and 51.3 Gy, γ50 = 1.19 for all models, s was 7.0e-09 for RS and n was 9.9 × 10^7 for LKB. For sigmoid colon, D50 were 51.6, 51.6, and 51.5 Gy, γ50 were 1.20, 1.25, and 1.27, s was 2.8 for RS and n was 0.079 for LKB. CONCLUSIONS: Primarily the dose to sigmoid colon as well as the rectum is related to urgency syndrome among gynecological cancer survivors. Separate delineation of the rectum and sigmoid colon in order to incorporate the dose-response results may aid in reduction of the incidence of the urgency syndrome.
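For orientation, the sketch below evaluates one common parameterization of the Probit dose-response model and the generalized equivalent uniform dose (gEUD). The abstract does not state the exact model formulas used, so these forms are an assumption. The D50 and γ50 values are the rectum values reported above; the dose-volume bins are hypothetical.

```python
import math

def probit_response(dose, d50, gamma50):
    """P(D) = 0.5 * (1 - erf(sqrt(pi) * gamma50 * (1 - D / D50)))."""
    return 0.5 * (1.0 - math.erf(math.sqrt(math.pi) * gamma50 * (1.0 - dose / d50)))

def geud(doses, volumes, a):
    """Generalized EUD: (sum_i v_i * D_i**a) ** (1/a), with v_i normalized to sum to 1."""
    total = sum(volumes)
    return sum((v / total) * d ** a for d, v in zip(doses, volumes)) ** (1.0 / a)

# Rectum parameters reported in the abstract for the Probit model.
D50, GAMMA50 = 51.3, 1.19
p_at_d50 = probit_response(D50, D50, GAMMA50)    # 0.5 by construction
p_at_45 = probit_response(45.0, D50, GAMMA50)    # below 0.5, since 45 Gy < D50

# Hypothetical differential DVH: dose bins (Gy) and volume fractions.
mean_dose = geud([20.0, 40.0, 60.0], [0.5, 0.3, 0.2], a=1.0)  # a = 1 gives the mean dose
```

In this parameterization γ50 is the normalized slope of the curve at D50, so a larger γ50 means a steeper dose-response relationship.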


Subject(s)
Colon, Sigmoid/radiation effects , Genital Neoplasms, Female/radiotherapy , Radiation Injuries/etiology , Rectum/radiation effects , Aged , Dose-Response Relationship, Radiation , Female , Humans , Intestine, Small/radiation effects , Middle Aged , Organs at Risk , Radiotherapy Dosage
6.
Acta Oncol ; 56(5): 682-691, 2017 May.
Article in English | MEDLINE | ID: mdl-28366105

ABSTRACT

BACKGROUND: It is unknown whether smoking, age at time of radiotherapy, or time since radiotherapy influences the intensity of late radiation-induced bowel syndromes. MATERIAL AND METHODS: We have previously identified 28 symptoms decreasing bowel health among 623 gynecological-cancer survivors (three to twelve years after radiotherapy) and 344 matched population-based controls. The 28 symptoms were grouped into five separate late bowel syndromes through factor analysis. Here, we related possible predictors of bowel health to syndrome intensity, by combining factor analysis weights and symptom frequency on a person-incidence scale. RESULTS: A strong (p < .001) association between smoking and radiation-induced urgency syndrome was found with a syndrome intensity (normalized factor score) of 0.4 (never smoker), 1.2 (former smoker) and 2.5 (current smoker). Excessive gas discharge was also related to smoking (p = .001). Younger age at treatment resulted in a higher intensity, except for the leakage syndrome. For the urgency syndrome, intensity decreased with time since treatment. CONCLUSIONS: Smoking aggravates the radiation-induced urgency syndrome and excessive gas discharge syndrome. Smoking cessation may promote bowel health among gynecological-cancer survivors. Furthermore, by understanding the mechanism for the decline in urgency-syndrome intensity over time, we may identify new strategies for prevention and alleviation.


Subject(s)
Cancer Survivors , Genital Neoplasms, Female/radiotherapy , Intestines/radiation effects , Irritable Bowel Syndrome/etiology , Radiation Injuries/etiology , Radiotherapy/adverse effects , Tobacco Smoking/adverse effects , Adolescent , Adult , Age Factors , Aged , Aged, 80 and over , Case-Control Studies , Female , Follow-Up Studies , Humans , Intestines/pathology , Male , Middle Aged , Prognosis , Young Adult
7.
Nucleic Acids Res ; 43(15): e98, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-25953855

ABSTRACT

Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (cancerlandscapes.org), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets.


Subject(s)
Models, Genetic , Models, Statistical , Neoplasms/genetics , Antineoplastic Agents/pharmacology , Chromosome Deletion , Chromosomes, Human, Pair 11 , DNA Copy Number Variations , DNA Methylation , Genomics/methods , Glioma/genetics , Humans , Internet , Isocitrate Dehydrogenase/genetics , Kaplan-Meier Estimate , MicroRNAs/metabolism , Mutation , Neoplasms/mortality , RNA, Messenger/metabolism , Software
8.
PLoS Genet ; 10(1): e1004059, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24391521

ABSTRACT

Altered DNA methylation patterns in CD4(+) T-cells indicate the importance of epigenetic mechanisms in inflammatory diseases. However, the identification of these alterations is complicated by the heterogeneity of most inflammatory diseases. Seasonal allergic rhinitis (SAR) is an optimal disease model for the study of DNA methylation because of its well-defined phenotype and etiology. We generated genome-wide DNA methylation (N(patients) = 8, N(controls) = 8) and gene expression (N(patients) = 9, N(controls) = 10) profiles of CD4(+) T-cells from SAR patients and healthy controls using Illumina's HumanMethylation450 and HT-12 microarrays, respectively. DNA methylation profiles clearly and robustly distinguished SAR patients from controls, during and outside the pollen season. In agreement with previously published studies, gene expression profiles of the same samples failed to separate patients and controls. Separation by methylation (N(patients) = 12, N(controls) = 12), but not by gene expression (N(patients) = 21, N(controls) = 21) was also observed in an in vitro model system in which purified PBMCs from patients and healthy controls were challenged with allergen. We observed changes in the proportions of memory T-cell populations between patients (N(patients) = 35) and controls (N(controls) = 12), which could explain the observed difference in DNA methylation. Our data highlight the potential of epigenomics in the stratification of immune disease and represent the first successful molecular classification of SAR using CD4(+) T-cells.


Subject(s)
CD4-Positive T-Lymphocytes/metabolism , DNA Methylation/genetics , Epigenesis, Genetic , Rhinitis, Allergic, Seasonal/genetics , Adult , Allergens/genetics , Allergens/immunology , CD4-Positive T-Lymphocytes/immunology , Gene Expression , Genome, Human , Humans , Pathology, Molecular , Pollen/immunology , Rhinitis, Allergic, Seasonal/immunology , Rhinitis, Allergic, Seasonal/pathology
9.
Biostatistics ; 13(4): 748-61, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22699861

ABSTRACT

With the growing availability of omics data generated to describe different cells and tissues, the modeling and interpretation of such data has become increasingly important. Pathways are sets of reactions involving genes, metabolites, and proteins highlighting functional modules in the cell. Therefore, to discover activated or perturbed pathways when comparing two conditions, for example two different tissues, it is beneficial to use several types of omics data. We present a model that integrates transcriptomic and metabolomic data in order to make an informed pathway-level decision. Since metabolites can be seen as end-points of perturbations happening at the gene level, the gene expression data constitute the explanatory variables in a sparse regression model for the metabolite data. Sophisticated model selection procedures are developed to determine an appropriate model. We demonstrate that the transcript profiles can be used to informatively explain the metabolite data from cancer cell lines. Simulation studies further show that the proposed model offers a better performance in identifying active pathways than, for example, enrichment methods performed separately on the transcript and metabolite data.
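The idea of regressing a metabolite level on gene expression with a sparsity penalty can be sketched with a plain lasso solved by coordinate descent. The abstract does not specify the penalty or the model selection procedure, so the lasso objective, the function names, and the toy data below are assumptions rather than the authors' method.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_wo_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
                norm += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return b

# Toy data: 4 samples x 3 genes (orthogonal columns); the metabolite
# depends strongly on gene 0 and only weakly on gene 1.
genes = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
metabolite = [2.05, 1.95, -1.95, -2.05]
coef = lasso_cd(genes, metabolite, lam=0.1)
```

With this penalty strength, the weak gene-1 effect is shrunk to exactly zero and only gene 0 is retained, which is the kind of sparse explanation a pathway-level decision could then be based on.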


Subject(s)
Data Interpretation, Statistical , Metabolomics , Models, Biological , Transcriptome , Computer Simulation , Models, Genetic
10.
Mol Syst Biol ; 7: 486, 2011 Apr 26.
Article in English | MEDLINE | ID: mdl-21525872

ABSTRACT

DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.


Subject(s)
Gene Dosage , Glioblastoma/genetics , Nerve Tissue Proteins/metabolism , Nervous System Neoplasms/genetics , Nuclear Proteins/metabolism , Transcriptional Activation/genetics , Tumor Suppressor Protein p53/metabolism , Cell Line, Tumor , Chromosome Aberrations , Databases, Factual , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Genome, Human , Genome-Wide Association Study , Glioblastoma/metabolism , Glioblastoma/mortality , Glioblastoma/pathology , Humans , Models, Genetic , Nerve Tissue Proteins/genetics , Nervous System Neoplasms/metabolism , Nervous System Neoplasms/mortality , Nervous System Neoplasms/pathology , Nuclear Proteins/genetics , Prognosis , Software , Tumor Suppressor Protein p53/genetics
11.
Adv Exp Med Biol ; 736: 617-43, 2012.
Article in English | MEDLINE | ID: mdl-22161356

ABSTRACT

One of the central problems of cancer systems biology is to understand the complex molecular changes of cancerous cells and tissues, and use this understanding to support the development of new targeted therapies. EPoC (Endogenous Perturbation analysis of Cancer) is a network modeling technique for tumor molecular profiles. EPoC models are constructed from combined copy number aberration (CNA) and mRNA data and aim to (1) identify genes whose copy number aberrations significantly affect target mRNA expression and (2) generate markers for long- and short-term survival of cancer patients. Models are constructed by a combination of regression and bootstrapping methods. Prognostic scores are obtained from a singular value decomposition of the networks. We have previously analyzed the performance of EPoC using glioblastoma data from The Cancer Genome Atlas (TCGA) consortium, and have shown that the resulting network models contain both known and candidate disease-relevant genes as network hubs, and uncover predictors of patient survival. Here, we give a practical guide to performing EPoC modeling using R, and present a set of alternative modeling frameworks.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks/genetics , Models, Genetic , Neoplasms/genetics , Systems Biology/methods , Algorithms , Computational Biology/classification , Gene Dosage , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks/drug effects , Genetic Predisposition to Disease/genetics , Glioblastoma/drug therapy , Glioblastoma/genetics , Humans , Neoplasms/drug therapy , Prognosis , Reproducibility of Results , Survival Analysis
12.
Cancer Cell Int ; 11: 9, 2011 Apr 14.
Article in English | MEDLINE | ID: mdl-21492432

ABSTRACT

BACKGROUND: There are currently three postulated genomic subtypes of the childhood tumour neuroblastoma (NB): Type 1, Type 2A, and Type 2B. The most aggressive forms of NB are characterized by amplification of the oncogene MYCN (MNA) and low expression of the favourable marker NTRK1. Recently, mutations or high expression of the familial predisposition gene Anaplastic Lymphoma Kinase (ALK) were associated with unfavourable biology of sporadic NB. Also, various other genes have been linked to NB pathogenesis. RESULTS: The present study explores subgroup discrimination by gene expression profiling using three published microarray studies on NB (47 samples). Four distinct clusters were identified by Principal Components Analysis (PCA) in two separate data sets, which could be verified by an unsupervised hierarchical clustering in a third independent data set (101 NB samples) using a set of 74 discriminative genes. The expression signature of six NB-associated genes, ALK, BIRC5, CCND1, MYCN, NTRK1, and PHOX2B, significantly discriminated the four clusters (p < 0.05, one-way ANOVA test). PCA clusters p1, p2, and p3 were found to correspond well to the postulated subtypes 1, 2A, and 2B, respectively. Remarkably, a fourth novel cluster was detected in all three independent data sets. This cluster comprised mainly 11q-deleted MNA-negative tumours with low expression of ALK, BIRC5, and PHOX2B, and was significantly associated with higher tumour stage, poor outcome and poor survival compared to the Type 1-corresponding favourable group (INSS stage 4 and/or dead of disease, p < 0.05, Fisher's exact test). CONCLUSIONS: Based on expression profiling we have identified four molecular subgroups of neuroblastoma, which can be distinguished by a 6-gene signature. The fourth subgroup has not been described elsewhere, and efforts are currently being made to further investigate this group's specific characteristics.

13.
PLoS One ; 16(4): e0250004, 2021.
Article in English | MEDLINE | ID: mdl-33861779

ABSTRACT

BACKGROUND: The study aims to determine possible dose-volume response relationships between the rectum, sigmoid colon and small intestine and the 'excessive mucus discharge' syndrome after pelvic radiotherapy for gynaecological cancer. METHODS AND MATERIALS: From a larger cohort, 98 gynaecological cancer survivors were included in this study. These survivors, who were followed for 2 to 14 years, received external beam radiation therapy but not brachytherapy and did not have a stoma. Thirteen of the 98 developed excessive mucus discharge syndrome. Three self-assessed symptoms were weighted together to produce a score interpreted as 'excessive mucus discharge' syndrome based on the factor loadings from factor analysis. The dose-volume histograms (DVHs) for the rectum, sigmoid colon and small intestine for each survivor were exported from the treatment planning systems. The dose-volume response relationships for excessive mucus discharge and each organ at risk were estimated by fitting the data to the Probit, RS, LKB and gEUD models. RESULTS: The small intestine was found to have steep dose-response curves, having estimated dose-response parameters: γ50: 1.28, 1.23, 1.32, D50: 61.6, 63.1, 60.2 for Probit, RS and LKB respectively. The sigmoid colon (AUC: 0.68) and the small intestine (AUC: 0.65) had the highest AUC values. For the small intestine, the DVHs for survivors with and without excessive mucus discharge were well separated for low to intermediate doses; this was not true for the sigmoid colon. Based on all results, we interpret the results for the small intestine to reflect a relevant link. CONCLUSION: An association was found between the mean dose to the small intestine and the occurrence of 'excessive mucus discharge'. When trying to reduce and even eliminate the incidence of 'excessive mucus discharge', it would be useful and important to separately delineate the small intestine and implement the dose-response estimations reported in the study.


Subject(s)
Colon, Sigmoid/metabolism , Genital Neoplasms, Female/radiotherapy , Intestine, Small/metabolism , Mucus/metabolism , Rectum/metabolism , Aged , Area Under Curve , Colon, Sigmoid/radiation effects , Dose-Response Relationship, Radiation , Female , Humans , Intestine, Small/radiation effects , Middle Aged , Organs at Risk , ROC Curve , Radiation, Ionizing , Radiotherapy Dosage , Rectum/radiation effects
14.
Cancer Med ; 9(10): 3551-3562, 2020 05.
Article in English | MEDLINE | ID: mdl-32207233

ABSTRACT

BACKGROUND: Characterizing breast cancer progression and aggressiveness relies on categorical descriptions of tumor stage and grade. Interpreting these categorical descriptions is challenging because stage convolutes the size and spread of the tumor and no consensus exists to define high/low grade tumors. METHODS: We address this challenge of heterogeneity in patient-specific cancer samples by adapting and applying several tools originally created for understanding heterogeneity and phenotype development in single cells (specifically, single-cell topological data analysis and Wanderlust) to create a continuous metric describing breast cancer progression using bulk RNA-seq samples from individual patient tumors. We also created a linear regression-based method to predict tumor aggressiveness in vivo from bulk RNA-seq data. RESULTS: We found that breast cancer proceeds along three convergent phenotype trajectories: luminal, HER2-enriched, and basal-like. Furthermore, 31 296 genes (for luminal cancers), 17 827 genes (for HER2-enriched), and 18 505 genes (for basal-like) are dynamically differentially expressed during breast cancer progression. Across progression trajectories, our results show that expression of genes related to ADP-ribosylation decreased as tumors progressed (while PARP1 and PARP2 increased or remained stable), suggesting the potential for a differential response to PARP inhibitors based on cancer progression. Additionally, we developed a 132-gene expression regression equation to predict mitotic index and a 23-gene expression regression equation to predict growth rate from a single breast cancer biopsy. CONCLUSION: Our results suggest that breast cancer dynamically changes during disease progression, and growth rate of the cancer cells is associated with distinct transcriptional profiles.


Subject(s)
Breast Neoplasms/genetics , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Breast Neoplasms/pathology , Databases, Genetic , Disease Progression , Female , Humans , Mitotic Index , Phenotype , Poly (ADP-Ribose) Polymerase-1/genetics , Poly(ADP-ribose) Polymerase Inhibitors , Poly(ADP-ribose) Polymerases/genetics , Prognosis , RNA-Seq , Transcriptome
15.
PLoS One ; 15(9): e0239495, 2020.
Article in English | MEDLINE | ID: mdl-32956417

ABSTRACT

Cell-type specific gene expression profiles are needed for many computational methods operating on bulk RNA-Seq samples, such as deconvolution of cell-type fractions and digital cytometry. However, the gene expression profile of a cell type can vary substantially due to both technical factors and biological differences in cell state and surroundings, reducing the efficacy of such methods. Here, we investigated which factors contribute most to this variation. We evaluated different normalization methods, quantified the variance explained by different factors, evaluated the effect on deconvolution of cell type fractions, and examined the differences between UMI-based single-cell RNA-Seq and bulk RNA-Seq. We investigated a collection of publicly available bulk and single-cell RNA-Seq datasets containing B and T cells, and found that the technical variation across laboratories is substantial, even for genes specifically selected for deconvolution, and this variation has a confounding effect on deconvolution. Tissue of origin is also a substantial factor, highlighting the challenge of using cell type profiles derived from blood with mixtures from other tissues. We also show that much of the differences between UMI-based single-cell and bulk RNA-Seq methods can be explained by the number of read duplicates per mRNA molecule in the single-cell sample. Our work shows the importance of either matching or correcting for technical factors when creating cell-type specific gene expression profiles that are to be used together with bulk samples.


Subject(s)
B-Lymphocytes/chemistry , Sequence Analysis, RNA , T-Lymphocytes/chemistry , Transcriptome , Adult , Base Composition , Datasets as Topic , Fetal Blood/cytology , Humans , Infant, Newborn , Principal Component Analysis , Single-Cell Analysis , Specimen Handling
16.
PLoS One ; 15(12): e0243360, 2020.
Article in English | MEDLINE | ID: mdl-33270740

ABSTRACT

Single-cell RNA sequencing has become a valuable tool for investigating cell types in complex tissues, where clustering of cells enables the identification and comparison of cell populations. Although many studies have sought to develop and compare different clustering approaches, a deeper investigation into the properties of the resulting populations is lacking. Specifically, the presence of misclassified cells can influence downstream analyses, highlighting the need to assess subpopulation purity and to detect such cells. We developed DSAVE (Down-SAmpling based Variation Estimation), a method to evaluate the purity of single-cell transcriptome clusters and to identify misclassified cells. The method utilizes down-sampling to eliminate differences in sampling noise and uses a log-likelihood based metric to help identify misclassified cells. In addition, DSAVE estimates the number of cells needed in a population to achieve a stable average gene expression profile within a certain gene expression range. We show that DSAVE can be used to find potentially misclassified cells that are not detectable by similar tools and reveal the cause of their divergence from the other cells, such as differing cell state or cell type. With the growing use of single-cell RNA-seq, we foresee that DSAVE will be an increasingly useful tool for comparing and purifying subpopulations in single-cell RNA-Seq datasets.
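The down-sampling step can be illustrated by drawing a fixed number of molecules from a count vector without replacement, so that profiles being compared carry the same total count and therefore comparable sampling noise. This is a generic sketch; the function name and the toy counts are assumptions, and DSAVE's actual procedure may differ.

```python
import random

def downsample_counts(counts, target, rng):
    """Randomly keep `target` molecules from a per-gene count vector (no replacement)."""
    total = sum(counts)
    if target > total:
        raise ValueError("target exceeds total counts")
    # One entry per observed molecule, labeled by its gene index.
    molecules = [g for g, c in enumerate(counts) for _ in range(c)]
    kept = rng.sample(molecules, target)
    out = [0] * len(counts)
    for g in kept:
        out[g] += 1
    return out

# Hypothetical pooled UMI counts for 4 genes, reduced to 50 molecules.
rng = random.Random(1)
original = [120, 30, 0, 50]
reduced = downsample_counts(original, target=50, rng=rng)
```

Genes with zero counts stay at zero, and no gene can exceed its original count, so the reduced vector is a valid sub-sample of the original population.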


Subject(s)
Algorithms , Databases, Nucleic Acid , RNA-Seq , Single-Cell Analysis , Software , Transcriptome
17.
Nat Commun ; 11(1): 71, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31900415

ABSTRACT

Despite advances in the molecular exploration of paediatric cancers, approximately 50% of children with high-risk neuroblastoma lack effective treatment. To identify therapeutic options for this group of high-risk patients, we combine predictive data mining with experimental evaluation in patient-derived xenograft cells. Our proposed algorithm, TargetTranslator, integrates data from tumour biobanks, pharmacological databases, and cellular networks to predict how targeted interventions affect mRNA signatures associated with high patient risk or disease processes. We find more than 80 targets to be associated with neuroblastoma risk and differentiation signatures. Selected targets are evaluated in cell lines derived from high-risk patients to demonstrate reversal of risk signatures and malignant phenotypes. Using neuroblastoma xenograft models, we establish CNR2 and MAPK8 as promising candidates for the treatment of high-risk neuroblastoma. We expect that our method, available as a public tool (targettranslator.org), will enhance and expedite the discovery of risk-associated targets for paediatric and adult cancers.


Subject(s)
Antineoplastic Agents/administration & dosage , Neuroblastoma/drug therapy , Neuroblastoma/genetics , Animals , Cell Line, Tumor , Drug Evaluation, Preclinical , Female , Humans , Male , Mice , Mice, Nude , Mitogen-Activated Protein Kinase 8/antagonists & inhibitors , Mitogen-Activated Protein Kinase 8/genetics , Mitogen-Activated Protein Kinase 8/metabolism , Neuroblastoma/metabolism , Receptor, Cannabinoid, CB2/antagonists & inhibitors , Receptor, Cannabinoid, CB2/genetics , Receptor, Cannabinoid, CB2/metabolism , Xenograft Model Antitumor Assays , Zebrafish
18.
Biostatistics ; 9(3): 540-54, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18256042

ABSTRACT

Model-based clustering is a popular tool for summarizing high-dimensional data. With the number of high-throughput large-scale gene expression studies still on the rise, the need for effective data-summarizing tools has never been greater. By grouping genes according to a common experimental expression profile, we may gain new insight into the biological pathways that steer biological processes of interest. Clustering of gene profiles can also assist in assigning functions to genes that have not yet been functionally annotated. In this paper, we propose 2 model selection procedures for model-based clustering. Model selection in model-based clustering has to date focused on the identification of data dimensions that are relevant for clustering. However, in more complex data structures, with multiple experimental factors, such an approach does not provide easily interpreted clustering outcomes. We propose a mixture model with multiple levels that provides sparse representations both "within" and "between" cluster profiles. We explore various flexible "within-cluster" parameterizations and discuss how efficient parameterizations can greatly enhance the objective interpretability of the generated clusters. Moreover, we allow for a sparse "between-cluster" representation with a different number of clusters at different levels of an experimental factor of interest. This enhances interpretability of clusters generated in multiple-factor contexts. Interpretable cluster profiles can assist in detecting biologically relevant groups of genes that may be missed with less efficient parameterizations. We use our multilevel mixture model to mine a proliferating cell line expression data set for annotational context and regulatory motifs. We also investigate the performance of the multilevel clustering approach on several simulated data sets.


Subject(s)
Cluster Analysis , Gene Expression Profiling/methods , Models, Genetic , Models, Statistical , Principal Component Analysis , Animals , Biomarkers/analysis , Cell Differentiation , Clone Cells , Computer Simulation , Data Compression/methods , Data Compression/statistics & numerical data , Databases, Genetic , Decision Theory , Factor Analysis, Statistical , Gene Expression , Gene Expression Profiling/statistics & numerical data , Neuroglia/physiology , Neurons/physiology , Oligonucleotide Array Sequence Analysis/methods , Rats , Research Design , Stem Cells/physiology , Weights and Measures
19.
Genome Med ; 12(1): 4, 2019 12 31.
Article in English | MEDLINE | ID: mdl-31892363

ABSTRACT

Personalized medicine requires the integration and processing of vast amounts of data. Here, we propose a solution to this challenge that is based on constructing Digital Twins. These are high-resolution models of individual patients that are computationally treated with thousands of drugs to find the drug that is optimal for the patient.


Subject(s)
Precision Medicine , Databases, Factual , Disease/genetics , Humans , Neural Networks, Computer
20.
Mol Biol Cell ; 16(11): 5103-14, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16120643

ABSTRACT

Temporal and spatial assembly of signal transduction machinery determines dendrite branch patterning, a process crucial for proper synaptic transmission. Our laboratory previously cloned and characterized cypin, a protein that decreases PSD-95 family member localization and regulates dendrite number. Cypin contains zinc binding, collapsin response mediator protein (CRMP) homology, and PSD-95, Discs large, zona occludens-1 binding domains. Both the zinc binding and CRMP homology domains are needed for dendrite patterning. In addition, cypin binds tubulin via its CRMP homology domain to promote microtubule assembly. Using a yeast two-hybrid screen of a rat brain cDNA library with cypin lacking the carboxyl terminal eight amino acids as bait, we identified snapin as a cypin binding partner. Here, we show by affinity chromatography and coimmunoprecipitation that the carboxyl-terminal coiled-coil domain (H2) of snapin is required for cypin binding. In addition, snapin binds to cypin's CRMP homology domain, which is where tubulin binds. We also show that snapin competes with tubulin for binding to cypin, resulting in decreased microtubule assembly. Subsequently, overexpression of snapin in primary cultures of hippocampal neurons results in decreased primary dendrites present on these neurons and increased probability of branching. Together, our data suggest that snapin regulates dendrite number in developing neurons by modulating cypin-promoted microtubule assembly.


Subject(s)
Body Patterning/physiology , Carrier Proteins/metabolism , Dendrites/physiology , Guanine Deaminase/metabolism , Microtubules/physiology , Vesicular Transport Proteins/metabolism , Animals , Binding, Competitive , COS Cells , Cell Culture Techniques , Chlorocebus aethiops , Chromatography, Affinity , Hippocampus/embryology , Microtubules/metabolism , Models, Biological , Neurons/metabolism , Protein Structure, Tertiary , Rats , Synaptosomes/metabolism , Transfection