Abstract
Rhabdomyosarcoma (RMS) is a group of pediatric cancers with features of developing skeletal muscle. The cellular hierarchy and mechanisms leading to developmental arrest remain elusive. Here, we combined single-cell RNA sequencing, mass cytometry, and high-content imaging to resolve intratumoral heterogeneity of patient-derived primary RMS cultures. We show that the aggressive alveolar RMS (aRMS) subtype contains plastic muscle stem-like cells and cycling progenitors that drive tumor growth, and a subpopulation of differentiated cells that lost its proliferative potential and correlates with better outcomes. While chemotherapy eliminates cycling progenitors, it enriches aRMS for muscle stem-like cells. We screened for drugs hijacking aRMS toward clinically favorable subpopulations and identified a combination of RAF and MEK inhibitors that potently induces myogenic differentiation and inhibits tumor growth. Overall, our work provides insights into the developmental states underlying aRMS aggressiveness, chemoresistance, and progression and identifies the RAS pathway as a promising therapeutic target.
Subjects
Antineoplastic Agents, Alveolar Rhabdomyosarcoma, Rhabdomyosarcoma, Child, Humans, Alveolar Rhabdomyosarcoma/drug therapy, Alveolar Rhabdomyosarcoma/genetics, Alveolar Rhabdomyosarcoma/pathology, Rhabdomyosarcoma/drug therapy, Rhabdomyosarcoma/genetics, Rhabdomyosarcoma/pathology, Skeletal Muscle/metabolism, Cell Differentiation, Antineoplastic Agents/therapeutic use, Tumor Cell Line
Abstract
MOTIVATION: Signaling pathways control cellular behavior. Dysregulated pathways, for example, due to mutations that cause genes and proteins to be expressed abnormally, can lead to diseases, such as cancer. RESULTS: We introduce a novel computational approach, called Differential Causal Effects (dce), which compares normal to cancerous cells using the statistical framework of causality. The method detects individual edges in a signaling pathway that are dysregulated in cancer cells, while accounting for confounding. Hence, technical artifacts have less influence on the results and dce is more likely to detect the true biological signals. We extend the approach to handle unobserved dense confounding, where each latent variable, such as batch effects or cell cycle states, affects many covariates. We show that dce outperforms competing methods on synthetic datasets and on CRISPR knockout screens. We validate its latent confounding adjustment properties on a GTEx (Genotype-Tissue Expression) dataset. Finally, in an exploratory analysis on breast cancer data from TCGA (The Cancer Genome Atlas), we recover known and discover new genes involved in breast cancer progression. AVAILABILITY AND IMPLEMENTATION: The method dce is freely available as an R package on Bioconductor (https://bioconductor.org/packages/release/bioc/html/dce.html) as well as on https://github.com/cbg-ethz/dce. The GitHub repository also contains the Snakemake workflows needed to reproduce all results presented here. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Breast Neoplasms, Software, Humans, Female, Genome, Signal Transduction
Abstract
A major goal in proteomics is the comprehensive and accurate description of a proteome. This task includes not only the identification of proteins in a sample, but also the accurate quantification of their abundance. Although mass spectrometry typically provides information on peptide identity and abundance in a sample, it does not directly measure the concentration of the corresponding proteins. Specifically, most mass-spectrometry-based approaches (e.g. shotgun proteomics or selected reaction monitoring) allow one to quantify peptides using chromatographic peak intensities or spectral counting information. Ultimately, based on these measurements, one wants to infer the concentrations of the corresponding proteins. Inferring properties of the proteins based on experimental peptide evidence is often a complex problem because of the ambiguity of peptide assignments and different chemical properties of the peptides that affect the observed concentrations. We present SCAMPI, a novel generic and statistically sound framework for computing protein abundance scores based on quantified peptides. In contrast to most previous approaches, our model explicitly includes information from shared peptides to improve protein quantitation, especially in eukaryotes with many homologous sequences. The model accounts for uncertainty in the input data, leading to statistical prediction intervals for the protein scores. Furthermore, peptides with extreme abundances can be reassessed and classified as either regular data points or actual outliers. We used the proposed model with several datasets and compared its performance to that of other, previously used approaches for protein quantification in bottom-up mass spectrometry.
Subjects
Computational Biology/methods, Statistical Data Interpretation, Proteins/analysis, Proteomics/statistics & numerical data, Tumor Cell Line, Protein Databases/statistics & numerical data, Humans, Isotope Labeling/methods, Leptospira interrogans/metabolism, Acute Myeloid Leukemia/metabolism, Markov Chains, Proteomics/methods, Research Design, Software
Abstract
BACKGROUND: Renal cell carcinoma (RCC) is characterized by a number of diverse molecular aberrations that differ among individuals. Recent approaches to molecularly classify RCC were based on clinical, pathological, and single molecular parameters. As a consequence, gene expression patterns reflecting the sum of genetic aberrations in individual tumors may not have been recognized. In an attempt to uncover such molecular features in RCC, we used a novel, unbiased and integrative approach. METHODS: We integrated gene expression data from 97 primary RCC of different pathologic parameters, 15 RCC metastases as well as 34 cancer cell lines for two-way unsupervised hierarchical clustering using gene groups suggested by the PANTHER Classification System. We depicted the genomic landscape of the resulting tumor groups by means of single nucleotide polymorphism (SNP) technology. Finally, the results were analyzed immunohistochemically using a tissue microarray (TMA) composed of 254 RCC. RESULTS: We found robust, genome-wide expression signatures, which split RCC into three distinct molecular subgroups. These groups remained stable even if randomly selected gene sets were clustered. Notably, the pattern obtained from RCC cell lines was clearly distinguishable from that of primary tumors. SNP array analysis demonstrated differing frequencies of chromosomal copy number alterations among RCC subgroups. TMA analysis with group-specific markers showed a prognostic significance of the different groups. CONCLUSION: We propose the existence of characteristic and histologically independent genome-wide expression outputs in RCC with potential biological and clinical relevance.
Subjects
Renal Cell Carcinoma/classification, Gene Expression Profiling, Kidney Neoplasms/classification, Renal Cell Carcinoma/genetics, Renal Cell Carcinoma/mortality, Renal Cell Carcinoma/pathology, Tumor Cell Line, Cluster Analysis, DNA Copy Number Variations, Humans, Kidney Neoplasms/genetics, Kidney Neoplasms/mortality, Kidney Neoplasms/pathology, Single Nucleotide Polymorphism, Prognosis, Proportional Hazards Models
Abstract
DNA-encoded chemical libraries, i.e., collections of compounds individually coupled to distinctive DNA fragments serving as amplifiable identification barcodes, represent a new tool for the de novo discovery of small molecule ligands to target proteins of pharmaceutical interest. Here, we describe the design and synthesis of a novel DNA-encoded chemical library containing one million small molecules. The library was synthesized by combinatorial assembly of three sets of chemical building blocks using Diels-Alder cycloadditions and by the stepwise build-up of the DNA barcodes. Model selections were performed to test library performance and to develop a statistical method for the analysis of high-throughput sequencing data. A library selection against carbonic anhydrase IX revealed a new class of submicromolar bis(sulfonamide) inhibitors. One of these inhibitors was synthesized in the absence of the DNA-tag and showed accumulation in hypoxic tumor tissue sections in vitro and tumor targeting in vivo.
Subjects
Neoplasm Antigens, Carbonic Anhydrase Inhibitors/chemical synthesis, Carbonic Anhydrases, Combinatorial Chemistry Techniques/methods, Small Molecule Libraries/chemical synthesis, Sulfonamides/chemical synthesis, Adenocarcinoma/drug therapy, Neoplasm Antigens/metabolism, Carbonic Anhydrase IX, Carbonic Anhydrase Inhibitors/pharmacology, Carbonic Anhydrases/metabolism, Tumor Cell Line, Colorectal Neoplasms/drug therapy, DNA/chemistry, Fluorescent Dyes/analysis, Gene Library, High-Throughput Nucleotide Sequencing, High-Throughput Screening Assays, Humans, Kinetics, Molecular Imaging, Molecular Targeted Therapy, Neoplasm Transplantation, Quantitative Structure-Activity Relationship, Small Molecule Libraries/pharmacology, Structure-Activity Relationship, Sulfonamides/pharmacology
Abstract
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we handle dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We also address the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Subjects
Computational Biology/methods, Statistical Models, Proteins/analysis, Proteomics/methods, Algorithms, Animals, Arabidopsis Proteins/analysis, Protein Databases, Drosophila Proteins/analysis, Markov Chains, Peptides/analysis, Proteome/analysis, Reproducibility of Results, Saccharomyces cerevisiae Proteins/analysis
Abstract
Large contingency tables summarizing categorical variables arise in many areas. One example is in biology, where large numbers of biomarkers are cross-tabulated according to their discrete expression level. Interactions of the variables are of great interest and are generally studied with log-linear models. The structure of a log-linear model can be visually represented by a graph from which the conditional independence structure can then be easily read off. However, since the number of parameters in a saturated model grows exponentially in the number of variables, this generally comes with a heavy computational burden. Even if we restrict ourselves to models of lower-order interactions or other sparse structures, we are faced with the problem of a large number of cells which play the role of sample size. This is in sharp contrast to high-dimensional regression or classification procedures because, in addition to a high-dimensional parameter, we also have to deal with the analogue of a huge sample size. Furthermore, high-dimensional tables naturally feature a large number of sampling zeros which often leads to the nonexistence of the maximum likelihood estimate. We therefore present a decomposition approach, where we first divide the problem into several lower-dimensional problems and then combine these to form a global solution. Our methodology is computationally feasible for log-linear interaction models with many categorical variables each or some of them having many levels. We demonstrate the proposed method on simulated data and apply it to a bio-medical problem in cancer research.
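As a rough illustration of the exponential growth mentioned in this abstract, the following sketch counts the cells of a contingency table and the parameters of a hierarchical log-linear model that contains all interaction terms up to a chosen order. The function names `n_cells` and `n_params` are hypothetical, and the intercept is counted as a parameter; this is plain arithmetic, not the decomposition approach the authors present.

```python
from itertools import combinations
from math import prod


def n_cells(levels):
    """Number of cells in the table: the product of the numbers of levels."""
    return prod(levels)


def n_params(levels, max_order):
    """Parameters of a log-linear model with all interactions up to max_order.

    An interaction among a subset S of variables contributes
    prod(l_i - 1 for i in S) parameters; the empty subset is the intercept.
    """
    total = 0
    for k in range(max_order + 1):
        for subset in combinations(levels, k):
            total += prod(l - 1 for l in subset)
    return total


if __name__ == "__main__":
    binary20 = [2] * 20
    print("cells (= saturated model size):", n_cells(binary20))
    print("parameters with pairwise terms only:", n_params(binary20, 2))
```

With `max_order` equal to the number of variables, the count equals the number of cells, which is why restricting to lower-order interactions shrinks the parameter vector but not the table itself.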
Subjects
Biometry/methods, Statistical Data Interpretation, Biological Models, Statistical Models, Computer Simulation
Abstract
PURPOSE: Tumor stage and nuclear grade are the most important prognostic parameters of clear cell renal cell carcinoma (ccRCC). The progression risk of ccRCC remains difficult to predict, particularly for tumors with organ-confined stage and intermediate differentiation grade. Elucidating molecular pathways deregulated in ccRCC may point to novel prognostic parameters that facilitate planning of therapeutic approaches. EXPERIMENTAL DESIGN: Using tissue microarrays, expression patterns of 15 different proteins were evaluated in over 800 ccRCC patients to analyze pathways reported to be physiologically controlled by the tumor suppressors von Hippel-Lindau protein and phosphatase and tensin homologue (PTEN). Tumor staging and grading were improved by performing variable selection using Cox regression and a recursive bootstrap elimination scheme. RESULTS: Patients with pT2 and pT3 tumors that were p27 and CAIX positive had a better outcome than those with all remaining marker combinations. A prolonged survival among patients with intermediate grade (grade 2) correlated with both nuclear p27 and cytoplasmic PTEN expression, as well as with inactive, nonphosphorylated ribosomal protein S6. By applying graphical log-linear modeling for over 700 ccRCC for which the molecular parameters were available, only a weak conditional dependence existed between the expression of p27, PTEN, CAIX, and p-S6, suggesting that the dysregulation of several independent pathways is crucial for tumor progression. CONCLUSIONS: The use of recursive bootstrap elimination, as well as graphical log-linear modeling for comprehensive tissue microarray (TMA) data analysis, allows the unraveling of complex molecular contexts and may improve predictive evaluations for patients with advanced renal cancer.
Subjects
Tumor Biomarkers/analysis, Renal Cell Carcinoma/metabolism, Kidney Neoplasms/metabolism, Neoplasm Proteins/analysis, Protein Array Analysis, Adolescent, Adult, Aged, Aged 80 and over, Renal Cell Carcinoma/mortality, Renal Cell Carcinoma/pathology, Disease Progression, Female, Humans, Kidney Neoplasms/mortality, Kidney Neoplasms/pathology, Linear Models, Male, Middle Aged, Neoplasm Staging, Phenotype, Prognosis
Abstract
DNA-encoded chemical libraries are promising tools for the discovery of ligands toward protein targets of pharmaceutical relevance. DNA-encoded small molecules can be enriched in affinity-based selections and their unique DNA "barcode" allows the amplification and identification by high-throughput sequencing. We describe selection experiments using a DNA-encoded 4000-compound library generated by Diels-Alder cycloadditions. High-throughput sequencing enabled the identification and relative quantification of library members before and after selection. Sequence enrichment profiles corresponding to the "bar-coded" library members were validated by affinity measurements of single compounds. We were able to affinity mature trypsin inhibitors and identify a series of albumin binders for the conjugation of pharmaceuticals. Furthermore, we discovered a ligand for the antiapoptotic Bcl-xL protein and a class of tumor necrosis factor (TNF) binders that completely inhibited TNF-mediated killing of L-M fibroblasts in vitro.
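The relative quantification of library members before and after selection can be pictured with a small sketch. It assumes barcode read counts are already available as dictionaries and computes a simple fold enrichment of relative frequencies with a pseudocount; the function name `fold_enrichment`, the pseudocount, and the barcode labels are hypothetical, and this is not the statistical analysis used by the authors.

```python
def fold_enrichment(before, after, pseudocount=1.0):
    """Ratio of relative barcode frequencies after vs. before selection.

    `before` and `after` map barcode -> read count. The pseudocount keeps
    the ratio finite for barcodes missing from one of the two samples.
    """
    codes = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(codes)
    total_after = sum(after.values()) + pseudocount * len(codes)
    ratios = {}
    for code in codes:
        freq_before = (before.get(code, 0) + pseudocount) / total_before
        freq_after = (after.get(code, 0) + pseudocount) / total_after
        ratios[code] = freq_after / freq_before
    return ratios


if __name__ == "__main__":
    # Hypothetical read counts for two barcodes.
    before = {"code_A": 100, "code_B": 100}
    after = {"code_A": 10, "code_B": 190}
    for code, ratio in sorted(fold_enrichment(before, after).items()):
        print(code, round(ratio, 3))
```

A ratio above 1 marks a barcode that became more frequent after selection; as the abstract notes, such profiles still need validation by affinity measurements of single compounds.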
Subjects
DNA/chemistry, Tumor Necrosis Factor Inhibitors, Animals, Base Sequence, Cell Line, Drug Design, High-Throughput Screening Assays, Immobilized Proteins/metabolism, Ligands, Mice, Small Molecule Libraries, Tumor Necrosis Factors/metabolism, bcl-X Protein/antagonists & inhibitors, bcl-X Protein/metabolism
Abstract
In many studies, particularly in the field of systems biology, it is essential that identical protein sets are precisely quantified in multiple samples such as those representing differentially perturbed cell states. The high degree of reproducibility required for such experiments has not been achieved by classical mass spectrometry-based proteomics methods. In this study we describe the implementation of a targeted quantitative approach by which predetermined protein sets are first identified and subsequently quantified at high sensitivity reliably in multiple samples. This approach consists of three steps. First, the proteome is extensively mapped out by multidimensional fractionation and tandem mass spectrometry, and the data generated are assembled in the PeptideAtlas database. Second, based on this proteome map, peptides uniquely identifying the proteins of interest, proteotypic peptides, are selected, and multiple reaction monitoring (MRM) transitions are established and validated by MS2 spectrum acquisition. This process of peptide selection, transition selection, and validation is supported by a suite of software tools, TIQAM (Targeted Identification for Quantitative Analysis by MRM), described in this study. Third, the selected target protein set is quantified in multiple samples by MRM. Applying this approach we were able to reliably quantify low abundance virulence factors from cultures of the human pathogen Streptococcus pyogenes exposed to increasing amounts of plasma. The resulting quantitative protein patterns enabled us to clearly define the subset of virulence proteins that is regulated upon plasma exposure.
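The second step, selecting peptides that uniquely identify a protein of interest, can be sketched as a simple uniqueness filter over a protein-to-peptide map. The function name `unique_peptides` and the protein and peptide labels are hypothetical; the sketch only checks uniqueness within the given map and does not reproduce TIQAM, which also covers transition selection and validation.

```python
from collections import defaultdict


def unique_peptides(protein_to_peptides):
    """For each protein, list the peptides that map to no other protein."""
    owners = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            owners[peptide].add(protein)

    result = {protein: [] for protein in protein_to_peptides}
    for peptide, proteins in owners.items():
        if len(proteins) == 1:
            (protein,) = proteins  # the single owner of this peptide
            result[protein].append(peptide)
    return {protein: sorted(peps) for protein, peps in result.items()}


if __name__ == "__main__":
    # Hypothetical proteome map: peptide "CCR" is shared by all three proteins.
    proteome = {
        "P1": ["AAK", "CCR"],
        "P2": ["CCR", "DDK"],
        "P3": ["CCR"],
    }
    print(unique_peptides(proteome))
```

A protein left with an empty list, such as `P3` here, has no uniquely identifying peptide in the map and could not be targeted by this criterion alone.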
Subjects
Proteome/analysis, Proteomics/methods, Streptococcus pyogenes/chemistry, Streptococcus pyogenes/pathogenicity, Virulence Factors/analysis, Humans, Peptides/analysis, Software
Abstract
We propose a unified and flexible framework for ensemble learning in the presence of censoring. For right-censored data, we introduce a random forest algorithm and a generic gradient boosting algorithm for the construction of prognostic and diagnostic models. The methodology is utilized for predicting the survival time of patients suffering from acute myeloid leukemia based on clinical and genetic covariates. Furthermore, we compare the diagnostic capabilities of the proposed censored data random forest and boosting methods, applied to the recurrence-free survival time of node-positive breast cancer patients, with previously published findings.
Subjects
Survival Analysis, Algorithms, Breast Neoplasms/pathology, Female, Humans, Acute Myelomonocytic Leukemia/diagnosis, Lymphatic Metastasis/pathology, Statistical Models, Prognosis
Abstract
Rhabdomyosarcoma is a pediatric tumor type, which is classified based on histological criteria into two major subgroups, namely embryonal rhabdomyosarcoma and alveolar rhabdomyosarcoma. Most, but not all, alveolar rhabdomyosarcomas carry the specific PAX3(7)/FKHR translocation, whereas there is no consistent genetic abnormality recognized in embryonal rhabdomyosarcoma. To gain additional insight into the genetic characteristics of these subtypes, we used oligonucleotide microarrays to measure the expression profiles of a group of 29 rhabdomyosarcoma biopsy samples (15 embryonal rhabdomyosarcoma, and 10 translocation-positive and 4 translocation-negative alveolar rhabdomyosarcoma). Hierarchical clustering revealed expression signatures clearly discriminating all three of the subgroups. Differentially expressed genes included several tyrosine kinases and G protein-coupled receptors, which might be amenable to pharmacological intervention. In addition, the alveolar rhabdomyosarcoma signature was used to classify an additional alveolar rhabdomyosarcoma case lacking any known PAX3 or PAX7 fusion as belonging to the translocation-positive group, leading to the identification of a novel translocation t(2;2)(q35;p23), which generates a fusion protein composed of PAX3 and the nuclear receptor coactivator NCOA1, with transactivation properties similar to those of PAX3/FKHR. These experiments demonstrate for the first time that gene expression profiling is capable of identifying novel chromosomal translocations.
Subjects
Human Chromosome Pair 2/genetics, DNA-Binding Proteins/genetics, Oncogene Fusion Proteins/genetics, Alveolar Rhabdomyosarcoma/genetics, Embryonal Rhabdomyosarcoma/genetics, Transcription Factors/genetics, Base Sequence, Gene Expression Profiling, Histone Acetyltransferases, Humans, Molecular Sequence Data, Nuclear Receptor Coactivator 1, Oligonucleotide Array Sequence Analysis, PAX3 Transcription Factor, Paired Box Transcription Factors, Alveolar Rhabdomyosarcoma/metabolism, Embryonal Rhabdomyosarcoma/metabolism, Trans-Activators/genetics, Genetic Translocation
Abstract
BACKGROUND AND OBJECTIVES: Childhood acute lymphoblastic leukemia (ALL) is a heterogeneous disease. There are several distinct genetic subtypes, characterized by typical changes in gene expression pattern. In addition to cytogenetic markers, the in vivo response to treatment is an emerging prognostic marker for risk stratification. However, it has not yet been reported whether gene expression profiles can predict risk group stratification already at the time of diagnosis. DESIGN AND METHODS: We analyzed bone marrow samples of 31 ALL patients to identify changes in gene expression that are associated with the current risk assignment, irrespective of the genetic subtype. Gene expression profiles were established using oligonucleotide microarrays. RESULTS: Considering all low- and high-risk patients, no gene was capable of predicting the risk assignment already at time of diagnosis. However, screening for risk group associated genes using more homogeneous subsets of patients revealed 10(6) discriminatory probe sets. The prognostic significance of these probe sets was subsequently determined for the entire series of patients. Using the selected subgroups as the training set and the remaining samples as an independent test set, logistic regression using 3 predictor variables could accurately predict current risk assignment for 10 out of 12 patients. INTERPRETATION AND CONCLUSIONS: Gene expression profiles established from a cytogenetically heterogeneous study group are not, as yet, sufficiently accurate to be used prognostically in a clinical setting. Additional risk-associated gene expression analyses need to be performed in more homogeneous sets of patients.
Subjects
Gene Expression Profiling, Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics, Child, Preschool Child, Female, Humans, Male, Multigene Family/physiology, Residual Neoplasm/diagnosis, Precursor Cell Lymphoblastic Leukemia-Lymphoma/therapy, Prognosis, Risk Factors, Survival Rate
Abstract
MOTIVATION: Microarray experiments generate large datasets with expression values for thousands of genes but no more than a few dozen samples. Accurate supervised classification of tissue samples in such high-dimensional problems is difficult but often crucial for successful diagnosis and treatment. A promising way to meet this challenge is by using boosting in conjunction with decision trees. RESULTS: We demonstrate that the generic boosting algorithm needs some modification to become an accurate classifier in the context of gene expression data. In particular, we present a feature preselection method, a more robust boosting procedure and a new approach for multi-categorical problems. This allows for a slight to drastic increase in performance and yields competitive results on several publicly available datasets. AVAILABILITY: Software for the modified boosting algorithms as well as for decision trees is available for free in R at http://stat.ethz.ch/~dettling/boosting.html.
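To make "boosting in conjunction with decision trees" concrete, here is a minimal sketch of generic AdaBoost with one-split trees (decision stumps) for labels in {-1, +1}. It is a textbook illustration under stated assumptions, not the modified procedure of the abstract: there is no feature preselection, no robustified loss, and no multi-class handling, all function names are hypothetical, and at least one feature is assumed to take two or more distinct values.

```python
import math


def fit_stump(X, y, w):
    """Best stump as (weighted_error, feature, threshold, sign)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] > t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best


def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] > t else -sign


def adaboost(X, y, rounds=10):
    """Return a list of (alpha, stump) pairs fitted on reweighted samples."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        # Clamp the error so alpha stays finite for a perfect stump.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: two "genes", six samples, two classes.
    X = [[0.1, 5.0], [0.4, 4.0], [0.35, 1.0], [0.8, 2.0], [0.9, 0.5], [0.7, 4.5]]
    y = [-1, -1, -1, 1, 1, 1]
    model = adaboost(X, y, rounds=5)
    print([predict(model, row) for row in X])
```

The exhaustive threshold search over every feature is what becomes costly with thousands of genes, which is one motivation for preselecting features before boosting.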
Subjects
Algorithms, Cluster Analysis, Gene Expression Profiling/methods, Neoplastic Gene Expression Regulation/genetics, Neoplasms/classification, Neoplasms/genetics, Oligonucleotide Array Sequence Analysis/methods, Automated Pattern Recognition, Genetic Databases, Decision Trees, Reproducibility of Results, Sensitivity and Specificity
Abstract
BACKGROUND: We focus on microarray data where experiments monitor gene expression in different tissues and where each experiment is equipped with an additional response variable such as a cancer type. Although the number of measured genes is in the thousands, it is assumed that only a few marker components of gene subsets determine the type of a tissue. Here we present a new method for finding such groups of genes by directly incorporating the response variables into the grouping process, yielding a supervised clustering algorithm for genes. RESULTS: An empirical study on eight publicly available microarray datasets shows that our algorithm identifies gene clusters with excellent predictive potential, often superior to classification with state-of-the-art methods based on single genes. Permutation tests and bootstrapping provide evidence that the output is reasonably stable and more than a noise artifact. CONCLUSIONS: In contrast to other methods such as hierarchical clustering, our algorithm identifies several gene clusters whose expression levels clearly distinguish the different tissue types. The identification of such gene clusters is potentially useful for medical diagnostics and may at the same time reveal insights into functional genomics.