ABSTRACT
Cancer driver events refer to key genetic aberrations that drive oncogenesis; however, their exact molecular mechanisms remain insufficiently understood. Here, our multi-omics pan-cancer analysis uncovers insights into the impacts of cancer drivers by identifying their significant cis-effects and distal trans-effects quantified at the RNA, protein, and phosphoprotein levels. Salient observations include the association of point mutations and copy-number alterations with the rewiring of protein interaction networks, and notably, most cancer genes converge toward similar molecular states denoted by sequence-based kinase activity profiles. A correlation between predicted neoantigen burden and measured T cell infiltration suggests potential vulnerabilities for immunotherapies. Patterns of cancer hallmarks vary by polygenic protein abundance ranging from uniform to heterogeneous. Overall, our work demonstrates the value of comprehensive proteogenomics in understanding the functional states of oncogenic drivers and their links to cancer development, surpassing the limitations of studying individual cancer types.
Subject(s)
Neoplasms , Proteogenomics , Humans , Neoplasms/genetics , Oncogenes , Cell Transformation, Neoplastic/genetics , DNA Copy Number Variations
ABSTRACT
To explore the biology of lung adenocarcinoma (LUAD) and identify new therapeutic opportunities, we performed comprehensive proteogenomic characterization of 110 tumors and 101 matched normal adjacent tissues (NATs) incorporating genomics, epigenomics, deep-scale proteomics, phosphoproteomics, and acetylproteomics. Multi-omics clustering revealed four subgroups defined by key driver mutations, country, and gender. Proteomic and phosphoproteomic data illuminated biology downstream of copy number aberrations, somatic mutations, and fusions and identified therapeutic vulnerabilities associated with driver events involving KRAS, EGFR, and ALK. Immune subtyping revealed a complex landscape, reinforced the association of STK11 with immune-cold behavior, and underscored a potential immunosuppressive role of neutrophil degranulation. Smoking-associated LUADs showed correlation with other environmental exposure signatures and a field effect in NATs. Matched NATs allowed identification of differentially expressed proteins with potential diagnostic and therapeutic utility. This proteogenomics dataset represents a unique public resource for researchers and clinicians seeking to better understand and treat lung adenocarcinomas.
Subject(s)
Adenocarcinoma of Lung/drug therapy , Adenocarcinoma of Lung/genetics , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Proteogenomics , Adenocarcinoma of Lung/immunology , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/metabolism , Carcinogenesis/genetics , Carcinogenesis/pathology , DNA Copy Number Variations/genetics , DNA Methylation/genetics , Female , Humans , Lung Neoplasms/immunology , Male , Middle Aged , Mutation/genetics , Oncogene Proteins, Fusion , Phenotype , Phosphoproteins/metabolism , Proteome/metabolism
ABSTRACT
We report a comprehensive proteogenomics analysis, including whole-genome sequencing, RNA sequencing, and proteomics and phosphoproteomics profiling, of 218 tumors across 7 histological types of childhood brain cancer: low-grade glioma (n = 93), ependymoma (32), high-grade glioma (25), medulloblastoma (22), ganglioglioma (18), craniopharyngioma (16), and atypical teratoid rhabdoid tumor (12). Proteomics data identify common biological themes that span histological boundaries, suggesting that treatments used for one histological type may be applied effectively to other tumors sharing similar proteomics features. Immune landscape characterization reveals diverse tumor microenvironments across and within diagnoses. Proteomics data further reveal functional effects of somatic mutations and copy number variations (CNVs) not evident in transcriptomics data. Kinase-substrate association and co-expression network analysis identify important biological mechanisms of tumorigenesis. This is the first large-scale proteogenomics analysis across traditional histological boundaries to uncover foundational pediatric brain tumor biology and inform rational treatment selection.
Subject(s)
Brain Neoplasms/genetics , Brain Neoplasms/pathology , Proteogenomics , Brain Neoplasms/immunology , Child , DNA Copy Number Variations/genetics , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Genome, Human , Glioma/genetics , Glioma/pathology , Humans , Lymphocytes, Tumor-Infiltrating/immunology , Mutation/genetics , Neoplasm Grading , Neoplasm Recurrence, Local/pathology , Phosphoproteins/metabolism , Phosphorylation , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcriptome/genetics
ABSTRACT
The integration of mass spectrometry-based proteomics with next-generation DNA and RNA sequencing profiles tumors more comprehensively. Here this "proteogenomics" approach was applied to 122 treatment-naive primary breast cancers accrued to preserve post-translational modifications, including protein phosphorylation and acetylation. Proteogenomics challenged standard breast cancer diagnoses, provided detailed analysis of the ERBB2 amplicon, defined tumor subsets that could benefit from immune checkpoint therapy, and allowed more accurate assessment of Rb status for prediction of CDK4/6 inhibitor responsiveness. Phosphoproteomics profiles uncovered novel associations between tumor suppressor loss and targetable kinases. Acetylproteome analysis highlighted acetylation on key nuclear proteins involved in the DNA damage response and revealed cross-talk between cytoplasmic and mitochondrial acetylation and metabolism. Our results underscore the potential of proteogenomics for clinical investigation of breast cancer through more accurate annotation of targetable pathways and biological features of this remarkably heterogeneous malignancy.
Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Carcinogenesis/genetics , Carcinogenesis/pathology , Molecular Targeted Therapy , Proteogenomics , APOBEC Deaminases/metabolism , Adult , Aged , Aged, 80 and over , Breast Neoplasms/immunology , Breast Neoplasms/therapy , Cohort Studies , DNA Damage , DNA Repair , Female , Humans , Immunotherapy , Metabolomics , Middle Aged , Mutagenesis/genetics , Phosphorylation , Protein Kinase Inhibitors/pharmacology , Protein Kinases/metabolism , Receptor, ErbB-2/metabolism , Retinoblastoma Protein/metabolism , Tumor Microenvironment/immunology
ABSTRACT
We undertook a comprehensive proteogenomic characterization of 95 prospectively collected endometrial carcinomas, comprising 83 endometrioid and 12 serous tumors. This analysis revealed possible new consequences of perturbations to the p53 and Wnt/β-catenin pathways, identified a potential role for circRNAs in the epithelial-mesenchymal transition, and provided new information about proteomic markers of clinical and genomic tumor subgroups, including relationships to known druggable pathways. An extensive genome-wide acetylation survey yielded insights into regulatory mechanisms linking Wnt signaling and histone acetylation. We also characterized aspects of the tumor immune landscape, including immunogenic alterations, neoantigens, common cancer/testis antigens, and the immune microenvironment, all of which can inform immunotherapy decisions. Collectively, our multi-omic analyses provide a valuable resource for researchers and clinicians, identify new molecular associations of potential mechanistic significance in the development of endometrial cancers, and suggest novel approaches for identifying potential therapeutic targets.
Subject(s)
Carcinoma/genetics , Endometrial Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Proteome/genetics , Transcriptome , Acetylation , Animals , Antigens, Neoplasm/genetics , Carcinoma/immunology , Carcinoma/pathology , Endometrial Neoplasms/immunology , Endometrial Neoplasms/pathology , Epithelial-Mesenchymal Transition/genetics , Feedback, Physiological , Female , Genomic Instability , Humans , Mice , MicroRNAs/genetics , MicroRNAs/metabolism , Microsatellite Repeats , Phosphorylation , Protein Processing, Post-Translational , Proteome/metabolism , Signal Transduction
ABSTRACT
To elucidate the deregulated functional modules that drive clear cell renal cell carcinoma (ccRCC), we performed comprehensive genomic, epigenomic, transcriptomic, proteomic, and phosphoproteomic characterization of treatment-naive ccRCC and paired normal adjacent tissue samples. Genomic analyses identified a distinct molecular subgroup associated with genomic instability. Integration of proteogenomic measurements uniquely identified protein dysregulation of cellular mechanisms impacted by genomic alterations, including oxidative phosphorylation-related metabolism, protein translation processes, and phospho-signaling modules. To assess the degree of immune infiltration in individual tumors, we identified microenvironment cell signatures that delineated four immune-based ccRCC subtypes characterized by distinct cellular pathways. This study reports a large-scale proteogenomic analysis of ccRCC to discern the functional impact of genomic alterations and provides evidence for rational treatment selection stemming from ccRCC pathobiology.
Subject(s)
Carcinoma, Renal Cell/genetics , Neoplasm Proteins/genetics , Proteogenomics , Transcriptome/genetics , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/genetics , Biomarkers, Tumor/immunology , Carcinoma, Renal Cell/immunology , Carcinoma, Renal Cell/pathology , Disease-Free Survival , Exome/genetics , Female , Gene Expression Regulation, Neoplastic/genetics , Genome, Human/genetics , Humans , Male , Middle Aged , Neoplasm Proteins/immunology , Oxidative Phosphorylation , Phosphorylation/genetics , Signal Transduction/genetics , Transcriptome/immunology , Tumor Microenvironment/genetics , Tumor Microenvironment/immunology , Exome Sequencing
ABSTRACT
Cancer progression involves the gradual loss of a differentiated phenotype and acquisition of progenitor and stem-cell-like features. Here, we provide novel stemness indices for assessing the degree of oncogenic dedifferentiation. We used an innovative one-class logistic regression (OCLR) machine-learning algorithm to extract transcriptomic and epigenetic feature sets derived from non-transformed pluripotent stem cells and their differentiated progeny. Using OCLR, we were able to identify previously undiscovered biological mechanisms associated with the dedifferentiated oncogenic state. Analyses of the tumor microenvironment revealed unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells. We found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors. Application of our stemness indices to single-cell data revealed patterns of intra-tumor molecular heterogeneity. Finally, the indices allowed for the identification of novel targets and possible targeted therapies aimed at tumor differentiation.
Subject(s)
Cell Dedifferentiation/genetics , Machine Learning , Neoplasms/pathology , Carcinogenesis , DNA Methylation , Databases, Genetic , Epigenesis, Genetic , Humans , MicroRNAs/metabolism , Neoplasm Metastasis , Neoplasms/genetics , Stem Cells/cytology , Stem Cells/metabolism , Transcriptome , Tumor Microenvironment
ABSTRACT
The Cancer Genome Atlas (TCGA) has catalyzed systematic characterization of diverse genomic alterations underlying human cancers. At this historic junction marking the completion of genomic characterization of over 11,000 tumors from 33 cancer types, we present our current understanding of the molecular processes governing oncogenesis. We illustrate our insights into cancer through synthesis of the findings of the TCGA PanCancer Atlas project on three facets of oncogenesis: (1) somatic driver mutations, germline pathogenic variants, and their interactions in the tumor; (2) the influence of the tumor genome and epigenome on transcriptome and proteome; and (3) the relationship between tumor and the microenvironment, including implications for drugs targeting driver events and immunotherapies. These results will anchor future characterization of rare and common tumor types, primary and relapsed tumors, and cancers across ancestry groups and will guide the deployment of clinical genomic sequencing.
Subject(s)
Carcinogenesis/genetics , Genomics , Neoplasms/pathology , DNA Repair/genetics , Databases, Genetic , Genes, Neoplasm , Humans , Metabolic Networks and Pathways/genetics , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Transcriptome , Tumor Microenvironment/genetics
ABSTRACT
Identifying molecular cancer drivers is critical for precision oncology. Multiple advanced algorithms to identify drivers now exist, but systematic attempts to combine and optimize them on large datasets are few. We report a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations. We identify 299 driver genes with implications regarding their anatomical sites and cancer/cell types. Sequence- and structure-based analyses identified >3,400 putative missense driver mutations supported by multiple lines of evidence. Experimental validation confirmed 60%-85% of predicted mutations as likely drivers. We found that >300 MSI tumors are associated with high PD-1/PD-L1, and 57% of tumors analyzed harbor putative clinically actionable events. Our study represents the most comprehensive discovery of cancer genes and mutations to date and will serve as a blueprint for future biological and clinical endeavors.
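The idea of pooling calls from many driver-detection tools can be sketched as a simple vote count. This is a minimal illustration only, not the consensus strategy used in the study, which the abstract does not describe; the function name, the tool names, the gene names, and the `min_tools` threshold are all hypothetical.

```python
from collections import Counter


def consensus_drivers(tool_calls, min_tools=2):
    """Return genes nominated by at least `min_tools` tools, sorted by name.

    tool_calls: dict mapping a tool name to the list of genes it nominated.
    """
    votes = Counter()
    for genes in tool_calls.values():
        # Deduplicate so each tool contributes at most one vote per gene.
        votes.update(set(genes))
    return sorted(gene for gene, n in votes.items() if n >= min_tools)


# Hypothetical toy calls from three placeholder tools.
tool_calls = {
    "tool_a": ["GENE_A", "GENE_B", "GENE_X"],
    "tool_b": ["GENE_A", "GENE_B"],
    "tool_c": ["GENE_A", "GENE_Y"],
}
consensus = consensus_drivers(tool_calls, min_tools=2)
print(consensus)
```

Raising `min_tools` trades sensitivity for specificity: fewer genes survive, but each is supported by more independent methods.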
Subject(s)
Neoplasms/pathology , Algorithms , B7-H1 Antigen/genetics , Computational Biology , Databases, Genetic , Entropy , Humans , Microsatellite Instability , Mutation , Neoplasms/genetics , Neoplasms/immunology , Principal Component Analysis , Programmed Cell Death 1 Receptor/genetics
ABSTRACT
We performed an extensive immunogenomic analysis of more than 10,000 tumors comprising 33 diverse cancer types by utilizing data compiled by TCGA. Across cancer types, we identified six immune subtypes (wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, immunologically quiet, and TGF-β dominant) characterized by differences in macrophage or lymphocyte signatures, Th1:Th2 cell ratio, extent of intratumoral heterogeneity, aneuploidy, extent of neoantigen load, overall cell proliferation, expression of immunomodulatory genes, and prognosis. Specific driver mutations correlated with lower (CTNNB1, NRAS, or IDH1) or higher (BRAF, TP53, or CASP8) leukocyte levels across all cancers. Multiple control modalities of the intracellular and extracellular networks (transcription, microRNAs, copy number, and epigenetic processes) were involved in tumor-immune cell interactions, both across and within immune subtypes. Our immunogenomics pipeline to characterize these heterogeneous tumors and the resulting data are intended to serve as a resource for future targeted studies to further advance the field.
Subject(s)
Genomics/methods , Neoplasms , Adolescent , Adult , Aged , Aged, 80 and over , Child , Female , Humans , Interferon-gamma/genetics , Interferon-gamma/immunology , Macrophages/immunology , Male , Middle Aged , Neoplasms/classification , Neoplasms/genetics , Neoplasms/immunology , Prognosis , Th1-Th2 Balance/physiology , Transforming Growth Factor beta/genetics , Transforming Growth Factor beta/immunology , Wound Healing/genetics , Wound Healing/immunology , Young Adult
ABSTRACT
Prediction of driver genes (tumor suppressors and oncogenes) is an essential step in understanding cancer development and discovering potential novel treatments. We recently proposed Moonlight as a bioinformatics framework to predict driver genes and analyze them in a system-biology-oriented manner based on -omics integration. Moonlight uses gene expression as a primary data source and combines it with patterns related to cancer hallmarks and regulatory networks to identify oncogenic mediators. Once the oncogenic mediators are identified, it is important to include extra levels of evidence, called mechanistic indicators, to identify driver genes and to link the observed gene expression changes to the underlying alteration that promotes them. Such a mechanistic indicator could be for example a mutation in the regulatory regions for the candidate gene. Here, we developed new functionalities and released Moonlight2 to provide the user with a mutation-based mechanistic indicator as a second layer of evidence. These functionalities analyze mutations in a cancer cohort to classify them into driver and passenger mutations. Those oncogenic mediators with at least one driver mutation are retained as the final set of driver genes. We applied Moonlight2 to the basal-like breast cancer subtype, lung adenocarcinoma and thyroid carcinoma using data from The Cancer Genome Atlas. For example, in basal-like breast cancer, we found four oncogenes (COPZ2, SF3B4, KRTCAP2 and POLR2J) and nine tumor suppressor genes (KIR2DL4, KIF26B, ARL15, ARHGAP25, EMCN, GMFG, TPK1, NR5A2 and TEK) containing a driver mutation in their promoter region, possibly explaining their deregulation. Moonlight2R is available at https://github.com/ELELAB/Moonlight2R.
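The final selection rule described above (keep only oncogenic mediators that carry at least one driver mutation) can be sketched as a set filter. This is an illustration of the rule as stated in the abstract, not the Moonlight2R API; the function name, the record layout, and the gene names are hypothetical.

```python
def final_driver_genes(oncogenic_mediators, mutations):
    """Keep mediators that carry at least one mutation labeled 'driver'.

    oncogenic_mediators: dict mapping gene -> predicted role.
    mutations: list of dicts with 'gene' and 'label' keys, where label is
               'driver' or 'passenger' (hypothetical record layout).
    """
    genes_with_driver = {m["gene"] for m in mutations if m["label"] == "driver"}
    return {
        gene: role
        for gene, role in oncogenic_mediators.items()
        if gene in genes_with_driver
    }


# Hypothetical toy inputs.
mediators = {
    "GENE_A": "oncogene",
    "GENE_B": "tumor suppressor",
    "GENE_C": "oncogene",
}
mutations = [
    {"gene": "GENE_A", "label": "driver"},
    {"gene": "GENE_A", "label": "passenger"},
    {"gene": "GENE_B", "label": "passenger"},
    {"gene": "GENE_D", "label": "driver"},  # mutated, but not a mediator
]
drivers = final_driver_genes(mediators, mutations)
print(drivers)
```

A gene must pass both layers of evidence: GENE_D has a driver mutation but is dropped because expression-based analysis did not flag it as a mediator.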
Subject(s)
Breast Neoplasms , Lung Neoplasms , Neoplasms , Humans , Female , Workflow , Oncogenes , Neoplasms/genetics , Mutation , Breast Neoplasms/genetics , Lung Neoplasms/genetics , Gene Regulatory Networks , RNA Splicing Factors/genetics , RNA Polymerase II/genetics
ABSTRACT
The authors present pathwayPCA, an R/Bioconductor package for integrative pathway analysis that utilizes modern statistical methodology, including supervised and adaptive, elastic-net, sparse principal component analysis. pathwayPCA can be applied to continuous, binary, and survival outcomes in studies with multiple covariates and/or interaction effects. It outperforms several alternative methods at identifying disease-associated pathways in integrative analysis using both simulated and real datasets. In addition, several case studies are provided to illustrate pathwayPCA analysis with gene selection, estimating, and visualizing sample-specific pathway activities, identifying sex-specific pathway effects in kidney cancer, and building integrative models for predicting patient prognosis. pathwayPCA is an open-source R package, freely available through the Bioconductor repository. pathwayPCA is expected to be a useful tool for empowering the wider scientific community to analyze and interpret the wealth of available proteomics data, along with other types of molecular data recently made available by Clinical Proteomic Tumor Analysis Consortium and other large consortiums.
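As a rough illustration of a sample-specific pathway activity, the sketch below scores each sample by its projection onto the first principal component of the genes in one pathway. It uses plain PCA computed by power iteration in pure Python; it is not the pathwayPCA API and does not implement the supervised, adaptive, elastic-net, or sparse variants the package provides. The function name, gene names, and expression values are hypothetical.

```python
import math


def first_pc_scores(matrix, n_iter=200):
    """Project samples (rows) onto the first principal component of the columns."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Sample covariance matrix of the genes (p x p).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Power iteration for the leading eigenvector, started from a uniform vector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data; keep the current vector
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


# Hypothetical expression values: gene -> measurements across four samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [1.5, 3.0, 4.5, 6.0],
    "GENE_D": [5.0, 1.0, 4.0, 2.0],  # not a member of the pathway
}
pathway = ["GENE_A", "GENE_B", "GENE_C"]

# Build a samples x genes matrix restricted to the pathway members.
matrix = [list(sample) for sample in zip(*(expression[g] for g in pathway))]
scores = first_pc_scores(matrix)
print(scores)
```

The resulting per-sample scores could then be tested against an outcome; the sign of a principal component is arbitrary in general, so only relative ordering and magnitude are meaningful.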
Subject(s)
Genomics , Proteomics , Computational Biology , Humans , Software
ABSTRACT
The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. The amount of genomic data is now massive, and substantial effort and new tools are required to unveil the information hidden in the data. The Genomic Data Commons (GDC) Data Portal is a platform that hosts different genomic studies, including those from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives, accounting for more than 40 tumor types originating from nearly 30,000 patients. Such platforms, although very attractive, must ensure that the stored data are easily accessible and adequately harmonized. Moreover, they focus primarily on storing data in a single place and do not provide a comprehensive toolkit for analysis and interpretation of the data. To meet this urgent need, comprehensive but easily accessible computational methods for integrative analyses of genomic data, built on a robust statistical and theoretical framework, are required. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks in terms of i) more accurate and flexible pipelines for differential expression analyses, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from other platforms, and iv) support for other genomics datasets, exemplified here by the TARGET data. Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, because variation in purity can confound differential expression analysis. With this in mind, we implemented these filtering procedures in TCGAbiolinks. Moreover, a limitation of some of the TCGA datasets is the unavailability or paucity of corresponding normal samples. We thus integrated into TCGAbiolinks the option to use normal samples from the Genotype-Tissue Expression (GTEx) project, another large-scale repository cataloging gene expression from healthy individuals. The new functionalities are available in TCGAbiolinks version 2.8 and higher, released in Bioconductor version 3.7.
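Purity-based filtering of the kind mentioned above can be sketched as a threshold on a per-sample purity estimate applied before differential expression analysis. This is a generic illustration, not the TCGAbiolinks interface; the function name, sample identifiers, purity values, and the 0.6 cutoff are all hypothetical.

```python
def filter_by_purity(purity, threshold=0.6):
    """Return sample IDs whose estimated tumor purity meets the threshold.

    purity: dict mapping sample ID -> purity estimate in [0, 1], or None
            when no estimate is available (such samples are dropped).
    """
    return sorted(
        sample
        for sample, value in purity.items()
        if value is not None and value >= threshold
    )


# Hypothetical purity estimates for four tumor samples.
purity = {"S1": 0.90, "S2": 0.45, "S3": 0.60, "S4": None}
kept = filter_by_purity(purity, threshold=0.6)
print(kept)
```

Only the retained samples would then enter the tumor-versus-normal comparison; the appropriate cutoff depends on the estimation method and the tumor type.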
Subject(s)
High-Throughput Nucleotide Sequencing , Neoplasms/genetics , Carcinogenesis , Datasets as Topic , Genome, Human , Humans
ABSTRACT
BACKGROUND & AIMS: Patients with severe alcoholic hepatitis (AH) have a high risk of death within 90 days. Corticosteroids, which can cause severe adverse events, are the only treatment that increases short-term survival. It is a challenge to predict outcomes of patients with severe AH. Therefore, we developed a scoring system to predict patient survival, integrating baseline molecular and clinical variables. METHODS: We obtained fixed liver biopsy samples from 71 consecutive patients diagnosed with severe AH and treated with corticosteroids from July 2006 through December 2013 in Brussels, Belgium (derivation cohort). Gene expression patterns were analyzed by microarrays and clinical data were collected for 180 days. We identified gene expression signatures and clinical data that are associated with survival without liver transplantation at 90 and 180 days after initiation of corticosteroid therapy. Findings were validated using liver biopsies from 48 consecutive patients with severe AH treated with corticosteroids, collected from March 2010 through February 2015 at hospitals in Belgium and Switzerland (validation cohort 1) and in liver biopsies from 20 patients (9 received corticosteroid treatment), collected from January 2012 through May 2015 in the United States (validation cohort 2). RESULTS: We integrated data on expression patterns of 123 genes and the model for end-stage liver disease (MELD) scores to assign patients to groups with poor survival (29% survived 90 days and 26% survived 180 days) and good survival (76% survived 90 days and 65% survived 180 days) (P < .001) in the derivation cohort. We named this assignment system the gene signature-MELD (gs-MELD) score. In validation cohort 1, the gs-MELD score discriminated patients with poor survival (43% survived 90 days) from those with good survival (96% survived 90 days) (P < .001). 
The gs-MELD score also discriminated between patients with a poor survival at 180 days (34% survived) and a good survival at 180 days (84% survived) (P < .001). The time-dependent area under the receiver operator characteristic curve for the score was 0.86 (95% confidence interval 0.73-0.99) for survival at 90 days, and 0.83 (95% confidence interval 0.71-0.96) for survival at 180 days. This score outperformed other clinical models to predict survival of patients with severe AH in validation cohort 1. In validation cohort 2, the gs-MELD discriminated patients with a poor survival at 90 days (12% survived) from those with a good survival at 90 days (100%) (P < .001). CONCLUSIONS: We integrated data on baseline liver gene expression pattern and the MELD score to create the gs-MELD scoring system, which identifies patients with severe AH, treated or not with corticosteroids, most and least likely to survive for 90 and 180 days.
Subject(s)
Decision Support Techniques , Gene Expression Profiling/methods , Hepatitis, Alcoholic/diagnosis , Hepatitis, Alcoholic/genetics , Transcriptome , Adrenal Cortex Hormones/therapeutic use , Adult , Area Under Curve , Belgium , Biopsy , Female , Genetic Markers , Genetic Predisposition to Disease , Hepatitis, Alcoholic/drug therapy , Hepatitis, Alcoholic/mortality , Humans , Kaplan-Meier Estimate , Male , Middle Aged , Oligonucleotide Array Sequence Analysis , Phenotype , Predictive Value of Tests , Proportional Hazards Models , ROC Curve , Reproducibility of Results , Risk Assessment , Risk Factors , Severity of Illness Index , Time Factors , Treatment Outcome
ABSTRACT
BACKGROUND: Modern high-throughput genomic technologies provide a comprehensive picture of molecular changes in pan-cancer studies. Although different cancer gene signatures have been revealed, the mechanism of tumourigenesis has yet to be completely understood. Pathways and networks are important tools to explain the role of genes in functional genomic studies. However, few methods consider the functionally unequal roles of genes in pathways and the complex gene-gene interactions in a network. RESULTS: We present a novel method in pan-cancer analysis that identifies de-regulated genes with a functional role by integrating pathway and network data. A pan-cancer analysis of 7158 tumour/normal samples from 16 cancer types identified 895 genes with a central role in pathways and de-regulated in cancer. Comparing our approach with 15 current tools that identify cancer driver genes, we found that 35.6% of the 895 genes identified by our method have been found as cancer driver genes by at least 2 of the 15 tools. Finally, we applied a machine learning algorithm on 16 independent GEO cancer datasets to validate the diagnostic role of cancer driver genes for each cancer. We obtained a list of the top-ten cancer driver genes for each cancer considered in this study. CONCLUSIONS: Our analysis 1) confirmed that there are several known cancer driver genes in common among different types of cancer, and 2) highlighted that cancer driver genes are able to regulate crucial pathways.
Subject(s)
Biomarkers, Tumor/genetics , Gene Regulatory Networks , Genomics/methods , Neoplasms/genetics , Signal Transduction , Algorithms , Case-Control Studies , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans
ABSTRACT
SUMMARY: Identifying molecular cancer subtypes from multi-omics data is an important step in personalized medicine. We introduce CancerSubtypes, an R package for identifying cancer subtypes using multi-omics data, including gene expression, miRNA expression and DNA methylation data. CancerSubtypes integrates four main computational methods that are highly cited for cancer subtype identification and provides a standardized framework for data pre-processing, feature selection, and follow-up analyses of the results, including result computation, biological validation and visualization. The input and output of each step in the framework are packaged in the same data format, making it convenient to compare different methods. The package is useful for inferring cancer subtypes from an input genomic dataset, comparing the predictions from different well-known methods and testing new subtype discovery methods, as shown with different application scenarios in the Supplementary Material. AVAILABILITY AND IMPLEMENTATION: The package is implemented in R and available under the GPL-2 license from the Bioconductor website (http://bioconductor.org/packages/CancerSubtypes/). CONTACT: thuc.le@unisa.edu.au or jiuyong.li@unisa.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Neoplasms/classification , Neoplasms/genetics , Software , Computer Graphics , DNA Methylation , Gene Expression , Genomics , Humans , MicroRNAs/metabolism , Neoplasms/metabolism
ABSTRACT
The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries.