ABSTRACT
We conducted comprehensive integrative molecular analyses of the complete set of tumors in The Cancer Genome Atlas (TCGA), consisting of approximately 10,000 specimens and representing 33 types of cancer. We performed molecular clustering using data on chromosome-arm-level aneuploidy, DNA hypermethylation, mRNA, and miRNA expression levels and reverse-phase protein arrays, of which all, except for aneuploidy, revealed clustering primarily organized by histology, tissue type, or anatomic origin. The influence of cell type was evident in DNA-methylation-based clustering, even after excluding sites with known preexisting tissue-type-specific methylation. Integrative clustering further emphasized the dominant role of cell-of-origin patterns. Molecular similarities among histologically or anatomically related cancer types provide a basis for focused pan-cancer analyses, such as pan-gastrointestinal, pan-gynecological, pan-kidney, and pan-squamous cancers, and those related by stemness features, which in turn may inform strategies for future therapeutic development.
Subjects
Neoplasms/pathology , Aneuploidy , Chromosomes/genetics , Cluster Analysis , CpG Islands , DNA Methylation , Databases, Factual , Humans , MicroRNAs/metabolism , Mutation , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Neoplasms/genetics , RNA, Messenger/metabolism
ABSTRACT
Cancer progression involves the gradual loss of a differentiated phenotype and acquisition of progenitor and stem-cell-like features. Here, we provide novel stemness indices for assessing the degree of oncogenic dedifferentiation. We used an innovative one-class logistic regression (OCLR) machine-learning algorithm to extract transcriptomic and epigenetic feature sets derived from non-transformed pluripotent stem cells and their differentiated progeny. Using OCLR, we were able to identify previously undiscovered biological mechanisms associated with the dedifferentiated oncogenic state. Analyses of the tumor microenvironment revealed an unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells. We found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors. Application of our stemness indices to single-cell data revealed patterns of intra-tumor molecular heterogeneity. Finally, the indices allowed for the identification of novel targets and possible targeted therapies aimed at tumor differentiation.
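The scoring step of such a stemness index can be sketched as follows. This is a minimal sketch under stated assumptions: it assumes an OCLR weight vector (one weight per gene) has already been trained on stem-cell samples, and it assumes, as a detail not stated in this abstract, that a sample's raw score is the Spearman correlation between those weights and the sample's expression profile, min-max rescaled to [0, 1] across the scored cohort. The function names are hypothetical.

```python
from typing import List, Sequence


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / (vx * vy) ** 0.5


def stemness_indices(weights: Sequence[float], samples: List[Sequence[float]]) -> List[float]:
    """Hypothetical index: Spearman(weights, sample), rescaled to [0, 1] over the cohort."""
    raw = [spearman(weights, s) for s in samples]
    lo = min(raw)
    shifted = [r - lo for r in raw]
    hi = max(shifted)
    return [v / hi if hi > 0 else 0.0 for v in shifted]
```

Because the rescaling uses the cohort's own minimum and maximum, indices produced this way are comparable within one scoring run but not across runs on different cohorts.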
Subjects
Cell Dedifferentiation/genetics , Machine Learning , Neoplasms/pathology , Carcinogenesis , DNA Methylation , Databases, Genetic , Epigenesis, Genetic , Humans , MicroRNAs/metabolism , Neoplasm Metastasis , Neoplasms/genetics , Stem Cells/cytology , Stem Cells/metabolism , Transcriptome , Tumor Microenvironment
ABSTRACT
The Cancer Genome Atlas (TCGA) has catalyzed systematic characterization of diverse genomic alterations underlying human cancers. At this historic junction marking the completion of genomic characterization of over 11,000 tumors from 33 cancer types, we present our current understanding of the molecular processes governing oncogenesis. We illustrate our insights into cancer through synthesis of the findings of the TCGA PanCancer Atlas project on three facets of oncogenesis: (1) somatic driver mutations, germline pathogenic variants, and their interactions in the tumor; (2) the influence of the tumor genome and epigenome on transcriptome and proteome; and (3) the relationship between tumor and the microenvironment, including implications for drugs targeting driver events and immunotherapies. These results will anchor future characterization of rare and common tumor types, primary and relapsed tumors, and cancers across ancestry groups and will guide the deployment of clinical genomic sequencing.
Subjects
Carcinogenesis/genetics , Genomics , Neoplasms/pathology , DNA Repair/genetics , Databases, Genetic , Genes, Neoplasm , Humans , Metabolic Networks and Pathways/genetics , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Transcriptome , Tumor Microenvironment/genetics
ABSTRACT
While mutations affecting protein-coding regions have been examined across many cancers, structural variants at the genome-wide level are still poorly defined. Through integrative deep whole-genome and -transcriptome analysis of 101 castration-resistant prostate cancer metastases (109X tumor/38X normal coverage), we identified structural variants altering critical regulators of tumorigenesis and progression not detectable by exome approaches. Notably, we observed amplification of an intergenic enhancer region 624 kb upstream of the androgen receptor (AR) in 81% of patients, correlating with increased AR expression. Tandem duplication hotspots also occur near MYC, in lncRNAs associated with post-translational MYC regulation. Classes of structural variations were linked to distinct DNA repair deficiencies, suggesting their etiology, including associations of CDK12 mutation with tandem duplications, TP53 inactivation with inverted rearrangements and chromothripsis, and BRCA2 inactivation with deletions. Together, these observations provide a comprehensive view of how structural variations affect critical regulators in metastatic prostate cancer.
Subjects
Genomic Structural Variation/genetics , Prostatic Neoplasms/genetics , Aged , Aged, 80 and over , BRCA2 Protein/metabolism , Cyclin-Dependent Kinases/metabolism , DNA Copy Number Variations , Exome , Gene Expression Profiling/methods , Genomics/methods , Humans , Male , Middle Aged , Mutation , Neoplasm Metastasis/genetics , Proto-Oncogene Proteins c-myc/genetics , Proto-Oncogene Proteins c-myc/metabolism , Receptors, Androgen/genetics , Receptors, Androgen/metabolism , Tandem Repeat Sequences/genetics , Tumor Suppressor Protein p53/metabolism , Whole Genome Sequencing/methods
ABSTRACT
We used clinical tissue from lethal metastatic castration-resistant prostate cancer (CRPC) patients obtained at rapid autopsy to evaluate diverse genomic, transcriptomic, and phosphoproteomic datasets for pathway analysis. Using Tied Diffusion through Interacting Events (TieDIE), we integrated differentially expressed master transcriptional regulators, functionally mutated genes, and differentially activated kinases in CRPC tissues to synthesize a robust signaling network consisting of druggable kinase pathways. Using MSigDB hallmark gene sets, six major signaling pathways with phosphorylation of several key residues were significantly enriched in CRPC tumors after incorporation of phosphoproteomic data. Individual autopsy profiles developed using these hallmarks revealed clinically relevant pathway information potentially suitable for patient stratification and targeted therapies in late-stage prostate cancer. Here, we describe phosphorylation-based cancer hallmarks using integrated personalized signatures (pCHIPS) that shed light on the diversity of activated signaling pathways in metastatic CRPC while providing an integrative, pathway-based reference for drug prioritization in individual patients.
Subjects
Phosphoproteins/analysis , Prostatic Neoplasms, Castration-Resistant/chemistry , Proteome/analysis , Algorithms , Humans , Male , Precision Medicine , Prostatic Neoplasms, Castration-Resistant/metabolism , Signal Transduction , Transcriptome
ABSTRACT
Recent genomic analyses of pathologically defined tumor types identify "within-a-tissue" disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head and neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multiplatform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All data sets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.
Subjects
Neoplasms/classification , Neoplasms/genetics , Cluster Analysis , Humans , Neoplasms/pathology , Transcriptome
ABSTRACT
The androgen receptor (AR) antagonist enzalutamide is one of the principal treatments for men with castration-resistant prostate cancer (CRPC). However, not all patients respond, and resistance mechanisms are largely unknown. We hypothesized that genomic and transcriptional features from metastatic CRPC biopsies prior to treatment would be predictive of de novo treatment resistance. To this end, we conducted a phase II trial of enzalutamide treatment (160 mg/d) in 36 men with metastatic CRPC. Thirty-four patients were evaluable for the primary end point of a prostate-specific antigen (PSA)50 response (PSA decline ≥50% at 12 wk vs. baseline). Nine patients were classified as nonresponders (PSA decline <50%), and 25 patients were classified as responders (PSA decline ≥50%). Failure to achieve a PSA50 was associated with shorter progression-free survival, time on treatment, and overall survival, demonstrating PSA50's utility. Targeted DNA-sequencing was performed on 26 of 36 biopsies, and RNA-sequencing was performed on 25 of 36 biopsies that contained sufficient material. Using computational methods, we measured AR transcriptional function and performed gene set enrichment analysis (GSEA) to identify pathways whose activity state correlated with de novo resistance. TP53 gene alterations were more common in nonresponders, although this did not reach statistical significance (P = 0.055). AR gene alterations and AR expression were similar between groups. Importantly, however, transcriptional measurements demonstrated that specific gene sets, including those linked to low AR transcriptional activity and a stemness program, were activated in nonresponders. Our results suggest that patients whose tumors harbor this program should be considered for clinical trials testing rational agents to overcome de novo enzalutamide resistance.
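The PSA50 end point defined above is a simple percentage threshold, sketched below. The `Patient` record, its field names, and the example PSA values are hypothetical illustrations, not trial data; the rule itself (decline of at least 50% at 12 weeks relative to baseline) is the one stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    psa_baseline: float  # serum PSA at treatment start (units cancel out)
    psa_week12: float    # serum PSA at week 12


def psa_decline_percent(baseline: float, week12: float) -> float:
    """Percent decline from baseline; negative if PSA rose."""
    if baseline <= 0:
        raise ValueError("baseline PSA must be positive")
    return 100.0 * (baseline - week12) / baseline


def is_psa50_responder(p: Patient) -> bool:
    """Responder if PSA fell by at least 50% at week 12 versus baseline."""
    return psa_decline_percent(p.psa_baseline, p.psa_week12) >= 50.0
```

For example, a hypothetical patient whose PSA falls from 20.0 to 10.0 sits exactly on the threshold and counts as a responder, while one falling to 10.5 (a 47.5% decline) does not.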
Subjects
Antineoplastic Agents/administration & dosage , Drug Resistance, Neoplasm , Phenylthiohydantoin/analogs & derivatives , Prostatic Neoplasms, Castration-Resistant/genetics , Receptors, Androgen/administration & dosage , Receptors, Androgen/genetics , Aged , Aged, 80 and over , Benzamides , Gene Expression Profiling , Humans , Male , Middle Aged , Nitriles , Phenylthiohydantoin/administration & dosage , Prostate-Specific Antigen/metabolism , Prostatic Neoplasms, Castration-Resistant/drug therapy , Prostatic Neoplasms, Castration-Resistant/metabolism , Receptors, Androgen/metabolism
ABSTRACT
Advancements in sequencing have led to the proliferation of multi-omic profiles of human cells under different conditions and perturbations. In addition, many databases have amassed information about pathways and gene "signatures": patterns of gene expression associated with specific cellular and phenotypic contexts. An important current challenge in systems biology is to leverage such knowledge about gene coordination to maximize the predictive power and generalization of models applied to high-throughput datasets. However, few such integrative approaches exist that also provide interpretable results quantifying the importance of individual genes and pathways to model accuracy. We introduce AKLIMATE, the first kernel-based stacked learner that seamlessly incorporates multi-omics feature data with prior information in the form of pathways for either regression or classification tasks. AKLIMATE uses a novel multiple-kernel learning framework where individual kernels capture the prediction propensities recorded in random forests, each built from a specific pathway gene set that integrates all omics data for its member genes. AKLIMATE has comparable or improved performance relative to state-of-the-art methods on diverse phenotype learning tasks, including predicting microsatellite instability in endometrial and colorectal cancer, survival in breast cancer, and cell line response to gene knockdowns. We show how AKLIMATE is able to connect feature data across data platforms through their common pathways to identify several known and novel contributors to cancer and synthetic lethality.
Subjects
Genomics , Machine Learning , Neoplasms/classification , Neoplasms/genetics , Cell Line, Tumor , Gene Knockdown Techniques , Humans , Phenotype , RNA, Small Interfering/genetics , Survival Analysis
ABSTRACT
It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
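One common way to score a ranked list of predicted edges against a gold standard derived from interventional data is the area under the ROC curve (AUROC). The sketch below assumes that kind of scoring (the abstract says only that networks were "scored in terms of causal validity with respect to unseen interventional data") and computes AUROC by direct pairwise comparison, which is the Mann-Whitney formulation. The labels here are hypothetical: 1 means the gold standard says the edge reflects a real causal effect.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random true edge outscores a random false edge (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of edges; a rank-based formula gives the same value faster, but the explicit form makes the definition easy to check.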
Subjects
Causality , Gene Regulatory Networks , Neoplasms/genetics , Protein Interaction Mapping/methods , Software , Systems Biology , Algorithms , Computational Biology , Computer Simulation , Gene Expression Profiling , Humans , Models, Biological , Signal Transduction , Tumor Cells, Cultured
ABSTRACT
BACKGROUND: The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single-nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. RESULTS: The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. CONCLUSIONS: The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the value of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
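At its simplest, quantifying and removing germline leakage means matching somatic calls against a set of known germline variants for the same individual. The sketch below shows that exact-match idea only; it is not GermlineFilter's implementation, and the variant tuples, function names, and example coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def count_leaked(somatic: Iterable[Variant], germline: Iterable[Variant]) -> int:
    """Number of distinct somatic calls that exactly match a known germline variant."""
    return len(set(somatic) & set(germline))


def filter_leaked(somatic: Iterable[Variant], germline: Iterable[Variant]) -> List[Variant]:
    """Drop somatic calls that match a germline variant, preserving input order."""
    g = set(germline)
    return [v for v in somatic if v not in g]
```

For instance, with somatic calls at chr1:100, chr2:200 and chr3:300 and a germline set containing the same allele at chr2:200, one call counts as leaked and the filtered list keeps the other two.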
Subjects
Genome, Human , Germ Cells/metabolism , Polymorphism, Single Nucleotide , Algorithms , Humans , Internet , Neoplasms/genetics , Neoplasms/pathology , User-Computer Interface , Whole Genome Sequencing
ABSTRACT
BACKGROUND: Platform-specific error profiles necessitate confirmatory studies where predictions made on data generated using one technology are additionally verified by processing the same samples on an orthogonal technology. However, verifying all predictions can be costly and redundant, and testing a subset of findings is often used to estimate the true error profile. RESULTS: To determine how to create subsets of predictions for validation that maximize accuracy of global error profile inference, we developed Valection, a software program that implements multiple strategies for the selection of verification candidates. We evaluated these selection strategies on one simulated and two experimental datasets. CONCLUSIONS: Valection is implemented in multiple programming languages, available at: http://labs.oicr.on.ca/boutros-lab/software/valection.
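One plausible selection strategy, choosing roughly equal numbers of verification candidates from each caller, is sketched below. It is a hypothetical illustration of the idea of a selection strategy, not Valection's API or its exact algorithm: callers are visited round-robin, each contributes one not-yet-chosen prediction per round, and selection stops when the verification budget is spent or all predictions are used.

```python
import random
from typing import Dict, List, Set


def select_equal_per_caller(calls_by_caller: Dict[str, List[str]], budget: int,
                            seed: int = 0) -> Set[str]:
    """Hypothetical 'equal per caller' selection of verification candidates."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    # shuffled private copy of each caller's predictions
    pools = {c: rng.sample(calls, len(calls)) for c, calls in calls_by_caller.items()}
    selected: Set[str] = set()
    progress = True
    while len(selected) < budget and progress:
        progress = False
        for caller in sorted(pools):  # round-robin keeps callers balanced
            pool = pools[caller]
            while pool and pool[-1] in selected:
                pool.pop()  # already chosen via another caller
            if pool and len(selected) < budget:
                selected.add(pool.pop())
                progress = True
    return selected
```

A design consequence worth noting: predictions shared by several callers can be picked through any of them, so callers with many private predictions still receive their share of the budget.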
Subjects
Sequence Analysis, DNA/methods , Software Validation
ABSTRACT
Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been large interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they guide mechanistic and translational investigations.
Subjects
Gene Regulatory Networks , Genome , Neoplasms/genetics , Signal Transduction/physiology , Humans
ABSTRACT
The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.
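The observation that an ensemble of pipelines can beat the best single pipeline can be illustrated with a majority-vote ensemble scored by F1. Majority voting is assumed here as one simple ensemble rule; the abstract does not specify which rule was used. In the invented toy example in the usage note, each caller misses one true variant and adds one private false positive.

```python
from collections import Counter
from typing import List, Optional, Set


def majority_vote(call_sets: List[Set[str]], min_votes: Optional[int] = None) -> Set[str]:
    """Keep variants reported by at least `min_votes` callers (default: strict majority)."""
    if min_votes is None:
        min_votes = len(call_sets) // 2 + 1
    counts = Counter(v for calls in call_sets for v in set(calls))
    return {v for v, n in counts.items() if n >= min_votes}


def f1(pred: Set[str], truth: Set[str]) -> float:
    """Harmonic mean of precision and recall."""
    tp = len(pred & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(truth)
    return 2 * precision * recall / (precision + recall)
```

With truth {a, b, c, d} and callers {a, b, c, x}, {a, b, d, y} and {a, c, d, z}, each caller scores F1 = 0.75, while the majority vote recovers exactly {a, b, c, d} (F1 = 1.0), because the errors are uncorrelated across callers.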
Subjects
Benchmarking , Crowdsourcing , Genome , Neoplasms/genetics , Polymorphism, Single Nucleotide , Algorithms , Humans
ABSTRACT
Evidence from numerous cancers suggests that increased aggressiveness is accompanied by up-regulation of signaling pathways and acquisition of properties common to stem cells. It is unclear if different subtypes of late-stage cancer vary in stemness properties and whether or not these subtypes are transcriptionally similar to normal tissue stem cells. We report a gene signature specific for human prostate basal cells that is differentially enriched in various phenotypes of late-stage metastatic prostate cancer. We FACS-purified and transcriptionally profiled basal and luminal epithelial populations from the benign and cancerous regions of primary human prostates. High-throughput RNA sequencing showed the basal population to be defined by genes associated with stem cell signaling programs and invasiveness. Application of a 91-gene basal signature to gene expression datasets from patients with organ-confined or hormone-refractory metastatic prostate cancer revealed that metastatic small cell neuroendocrine carcinoma was molecularly more stem-like than either metastatic adenocarcinoma or organ-confined adenocarcinoma. Bioinformatic analysis of the basal cell and two human small cell gene signatures identified a set of E2F target genes common between prostate small cell neuroendocrine carcinoma and primary prostate basal cells. Taken together, our data suggest that aggressive prostate cancer shares a conserved transcriptional program with normal adult prostate basal stem cells.
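Applying a gene signature to an expression dataset requires some per-sample score. The abstract does not state which scoring method was used; the sketch below swaps in a simple, common alternative: the mean per-gene z-score of the signature genes. The gene names and values in the usage note are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, List


def signature_scores(expr: Dict[str, List[float]], signature: Iterable[str]) -> List[float]:
    """Mean per-gene z-score of the signature genes, one score per sample."""
    genes = [g for g in signature if g in expr]  # skip signature genes absent from the dataset
    if not genes:
        raise ValueError("no signature genes found in the expression data")
    n_samples = len(expr[genes[0]])
    z: Dict[str, List[float]] = {}
    for g in genes:
        vals = expr[g]
        mu, sd = mean(vals), pstdev(vals)
        # standardize each gene across samples; constant genes contribute 0
        z[g] = [(v - mu) / sd if sd > 0 else 0.0 for v in vals]
    return [mean(z[g][i] for g in genes) for i in range(n_samples)]
```

For example, with hypothetical genes G1 = [1, 2, 3] and G2 = [2, 4, 6] across three samples, the scores increase from the first sample to the third, and the middle sample scores 0. Z-scoring each gene first keeps highly expressed genes from dominating the signature score.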
Subjects
Gene Expression Profiling , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Stem Cells/metabolism , Antigens, CD/metabolism , Epithelial Cells/metabolism , Female , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Male , Mammary Glands, Human/cytology , Neoplasm Metastasis , Neuroendocrine Tumors/genetics , Neuroendocrine Tumors/pathology , Phenotype , Proto-Oncogene Proteins c-myc/metabolism , Sequence Analysis, RNA , Signal Transduction/genetics , Transcription Factors/metabolism
ABSTRACT
We present a novel regularization scheme called the Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway-agnostic approach.
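How a network-aware penalty "steers solutions toward sets of mechanistically interlinked genes" can be sketched with the regularizer alone. This assumes, as a formulation not given in this abstract, a penalty of the form λ1 Σ_j d_j |w_j| + (λ2/2) wᵀPw, with P taken here to be the graph Laplacian of a gene network. With the Laplacian, wᵀPw equals the sum over network edges of (w_i − w_j)², so connected genes with similar weights are penalized less. All names below are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def laplacian(adj: Matrix) -> Matrix:
    """Graph Laplacian L = D - A for a symmetric adjacency matrix with zero diagonal."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def gelnet_penalty(w: Sequence[float], d: Sequence[float], P: Matrix,
                   lambda1: float, lambda2: float) -> float:
    """Weighted L1 term plus quadratic network term (assumed form)."""
    l1 = sum(dj * abs(wj) for dj, wj in zip(d, w))
    n = len(w)
    quad = sum(w[i] * P[i][j] * w[j] for i in range(n) for j in range(n))
    return lambda1 * l1 + 0.5 * lambda2 * quad
```

For two linked genes plus one isolated gene, the weights [1, 1, 0] incur no network penalty, whereas [1, −1, 0] incur (λ2/2)·4. Both have the same L1 norm, so only the network term separates them.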
Subjects
Chromosome Mapping/methods , Models, Genetic , Pattern Recognition, Automated/methods , Protein Interaction Mapping/methods , Proteome/genetics , Signal Transduction/genetics , Animals , Computer Simulation , Humans
ABSTRACT
High-throughput data sets such as genome-wide protein-protein interactions, protein-DNA interactions and gene expression data have been published for several model systems, especially for human cancer samples. The University of California, Santa Cruz (UCSC) Interaction Browser (http://sysbio.soe.ucsc.edu/nets) is an online tool for biologists to view high-throughput data sets simultaneously for the analysis of functional relationships between biological entities. Users can access several public interaction networks and functional genomics data sets through the portal as well as upload their own networks and data sets for analysis. Users can navigate through correlative relationships for focused sets of genes belonging to biological pathways using a standard web browser. Using a new visual modality called the CircleMap, multiple 'omics' data sets can be viewed simultaneously within the context of curated, predicted, directed and undirected regulatory interactions. The Interaction Browser provides an integrative viewing of biological networks based on the consensus of many observations about genes and their products, which may provide new insights about normal and disease processes not obvious from any isolated data set.
Subjects
Gene Regulatory Networks , Software , Colorectal Neoplasms/genetics , Computer Graphics , DNA Copy Number Variations , DNA Methylation , Gene Expression , Genomics , Humans , Internet , Mutation , Protein Interaction Mapping
ABSTRACT
Breast cancers are composed of molecularly distinct subtypes that may respond differently to pathway-targeted therapies now under development. Collections of breast cancer cell lines mirror many of the molecular subtypes and pathways found in tumors, suggesting that treatment of cell lines with candidate therapeutic compounds can guide identification of associations between molecular subtypes, pathways, and drug response. In a test of 77 therapeutic compounds, nearly all drugs showed differential responses across these cell lines, and approximately one third showed subtype-, pathway-, and/or genomic aberration-specific responses. These observations suggest mechanisms of response and resistance and may inform efforts to develop molecular assays that predict clinical response.
Subjects
Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Breast Neoplasms/classification , Breast Neoplasms/drug therapy , Signal Transduction/drug effects , Breast Neoplasms/genetics , Cell Line, Tumor , Drug Screening Assays, Antitumor , Female , Gene Dosage/genetics , Humans , Models, Biological , Signal Transduction/genetics , Transcription, Genetic/drug effects
ABSTRACT
MOTIVATION: Identifying the cellular wiring that connects genomic perturbations to transcriptional changes in cancer is essential to gain a mechanistic understanding of disease initiation, progression and ultimately to predict drug response. We have developed a method called Tied Diffusion Through Interacting Events (TieDIE) that uses a network diffusion approach to connect genomic perturbations to gene expression changes characteristic of cancer subtypes. The method computes a subnetwork of protein-protein interactions, predicted transcription factor-to-target connections and curated interactions from literature that connects genomic and transcriptomic perturbations. RESULTS: Application of TieDIE to The Cancer Genome Atlas and a breast cancer cell line dataset identified key signaling pathways, with examples impinging on MYC activity. Interlinking genes are predicted to correspond to essential components of cancer signaling and may provide a mechanistic explanation of tumor character and suggest subtype-specific drug targets. AVAILABILITY: Software is available from the Stuart lab's wiki: https://sysbiowiki.soe.ucsc.edu/tiedie. CONTACT: jstuart@ucsc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
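The "tied diffusion" idea, keeping nodes reached by diffusion from both the genomic perturbations and the expression changes, can be sketched as follows. Two substitutions are labeled plainly: TieDIE itself uses a heat-diffusion kernel, whereas this sketch swaps in a random walk with restart as a simpler diffusion; and combining the two diffused scores by their minimum, then thresholding, is an assumption made for illustration. The graph and seed weights in the usage note are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # undirected adjacency lists; every neighbour must also be a key


def random_walk_with_restart(graph: Graph, seeds: Dict[str, float],
                             restart: float = 0.3, iters: int = 100) -> Dict[str, float]:
    """Diffuse seed weights over the graph; returns a probability per node."""
    nodes = sorted(graph)
    total = sum(seeds.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one seed with positive weight must be a graph node")
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(iters):
        # restart step: jump back to the seeds with probability `restart`
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            moving = (1.0 - restart) * p[n]
            nbrs = graph[n]
            if not nbrs:
                nxt[n] += moving  # isolated node keeps its mass
                continue
            for m in nbrs:
                nxt[m] += moving / len(nbrs)
        p = nxt
    return p


def tied_scores(graph: Graph, upstream: Dict[str, float], downstream: Dict[str, float],
                restart: float = 0.3) -> Dict[str, float]:
    up = random_walk_with_restart(graph, upstream, restart)
    down = random_walk_with_restart(graph, downstream, restart)
    # a node scores highly only if BOTH diffusions reach it
    return {n: min(up[n], down[n]) for n in graph}


def tied_linkers(graph: Graph, upstream: Dict[str, float], downstream: Dict[str, float],
                 threshold: float, restart: float = 0.3) -> Set[str]:
    scores = tied_scores(graph, upstream, downstream, restart)
    return {n for n, s in scores.items() if s >= threshold}
```

On a chain F-A-B-C-D-E with a mutated gene A upstream and an expression-changed gene E downstream, an intermediate node such as C receives a high tied score, while the dangling node F, close to A but far from E, does not. This is the behaviour that lets linker genes between perturbations and transcriptional changes stand out.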
Subjects
Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Breast Neoplasms/classification , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cell Line, Tumor , Female , Gene Expression Profiling , Genomics , Humans , Neoplasms/genetics , Protein Interaction Mapping , Signal Transduction , Software , Transcription Factors/metabolism
ABSTRACT
Inferring gene regulatory networks from single-cell RNA-sequencing trajectories has been an active area of research, yet methods are still needed to identify regulators governing cell transitions. We developed DREAMIT (Dynamic Regulation of Expression Across Modules in Inferred Trajectories) to annotate transcription-factor activity along single-cell trajectory branches, using ensembles of relations to target genes. Using a benchmark representing several different tissues, as well as external validation with ATAC-Seq and Perturb-Seq data on hematopoietic cells, the method was found to have higher tissue-specific sensitivity and specificity than competing approaches.