ABSTRACT
Characterization of the prostate cancer transcriptome and genome has identified chromosomal rearrangements and copy number gains and losses, including ETS gene family fusions, PTEN loss and androgen receptor (AR) amplification, which drive prostate cancer development and progression to lethal, metastatic castration-resistant prostate cancer (CRPC). However, less is known about the role of mutations. Here we sequenced the exomes of 50 lethal, heavily pre-treated metastatic CRPCs obtained at rapid autopsy (including three different foci from the same patient) and 11 treatment-naive, high-grade localized prostate cancers. We identified low overall mutation rates even in heavily treated CRPCs (2.00 per megabase) and confirmed the monoclonal origin of lethal CRPC. Integrating exome copy number analysis identified disruptions of CHD1 that define a subtype of ETS gene family fusion-negative prostate cancer. Similarly, we demonstrate that ETS2, which is deleted in approximately one-third of CRPCs (commonly through TMPRSS2:ERG fusions), is also deregulated through mutation. Furthermore, we identified recurrent mutations in multiple chromatin- and histone-modifying genes, including MLL2 (mutated in 8.6% of prostate cancers), and demonstrate interaction of the MLL complex with the AR, which is required for AR-mediated signalling. We also identified novel recurrent mutations in the AR collaborating factor FOXA1, which is mutated in 5 of 147 (3.4%) prostate cancers (both untreated localized prostate cancer and CRPC), and showed that mutated FOXA1 represses androgen signalling and increases tumour growth. Proteins that physically interact with the AR, such as the ERG gene fusion product, FOXA1, MLL2, UTX (also known as KDM6A) and ASXL1 were found to be mutated in CRPC. In summary, we describe the mutational landscape of a heavily treated metastatic cancer, identify novel mechanisms of AR signalling deregulated in prostate cancer, and prioritize candidates for future study.
Subject(s)
Prostatic Neoplasms/genetics , Cell Proliferation , Cells, Cultured , Hepatocyte Nuclear Factor 3-alpha/genetics , Humans , Male , Molecular Sequence Data , Mutation , Orchiectomy , Prostatic Neoplasms/pathology , Receptors, Androgen/metabolism , Sequence Alignment , Signal Transduction
ABSTRACT
BACKGROUND: Identifying the key "driver" mutations that are responsible for tumorigenesis is critical to the development of new oncology drugs. Owing to multiple pharmacological successes in treating cancers caused by such driver mutations, a large body of methods has been developed to differentiate these mutations from the benign "passenger" mutations that occur in the tumor but do not advance the disease. Under the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of algorithms that identify these clusters has become a critical area of research. RESULTS: We have developed a novel methodology, QuartPAC (Quaternary Protein Amino acid Clustering), that identifies non-random mutational clustering while utilizing the protein quaternary structure in 3D space. By integrating the spatial information in the Protein Data Bank (PDB) and the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), QuartPAC is able to identify clusters that are otherwise missed in a variety of proteins. The R package is available on Bioconductor at: http://bioconductor.jp/packages/3.1/bioc/html/QuartPAC.html. CONCLUSION: QuartPAC provides a unique tool to identify mutational clustering while accounting for the complete folded protein quaternary structure.
Subject(s)
Proteins/chemistry , User-Computer Interface , Algorithms , Cluster Analysis , Databases, Protein , Humans , Internet , Mutation , Neoplasms/genetics , Neoplasms/pathology , Protein Structure, Quaternary , Proteins/genetics , Proteins/metabolism
ABSTRACT
BACKGROUND: It is well known that the development of cancer is caused by the accumulation of somatic mutations within the genome. For oncogenes specifically, current research suggests that there is a small set of "driver" mutations that are primarily responsible for tumorigenesis. Further, due to recent pharmacological successes in treating these driver mutations and their resulting tumors, a variety of approaches have been developed to identify potential driver mutations using methods such as machine learning and mutational clustering. We propose a novel methodology that increases our power to identify mutational clusters by taking into account protein tertiary structure via a graph theoretical approach. RESULTS: We have designed and implemented GraphPAC (Graph Protein Amino acid Clustering) to identify mutational clustering while considering protein spatial structure. Using GraphPAC, we are able to detect novel clusters in proteins that are known to exhibit mutation clustering as well as identify clusters in proteins without evidence of prior clustering based on current methods. Specifically, by utilizing the spatial information available in the Protein Data Bank (PDB) along with the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), GraphPAC identifies new mutational clusters in well known oncogenes such as EGFR and KRAS. Further, by utilizing graph theory to account for the tertiary structure, GraphPAC discovers clusters in DPP4, NRP1 and other proteins not identified by existing methods. The R package is available at: http://bioconductor.org/packages/release/bioc/html/GraphPAC.html. CONCLUSION: GraphPAC provides an alternative to iPAC and an extension to current methodology when identifying potential activating driver mutations by utilizing a graph theoretic approach when considering protein tertiary structure.
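The idea of using a graph over the folded structure to reorder residues can be illustrated with a minimal sketch. This is not GraphPAC's implementation: the greedy nearest-neighbour path, the function name `nearest_neighbor_order`, and the toy coordinates are all assumptions chosen for illustration. The sketch only shows how a path through 3D residue positions can place spatially close residues next to each other in a one-dimensional ordering, on which a sequence-based clustering test could then be run.

```python
import math

def nearest_neighbor_order(coords):
    """Greedy path through residue coordinates (hypothetical helper).

    Starts at residue 0 and repeatedly moves to the closest unvisited
    residue; ties are broken by the lower residue index.
    """
    if not coords:
        return []
    remaining = set(range(1, len(coords)))
    order = [0]
    while remaining:
        last = coords[order[-1]]
        nxt = min(remaining, key=lambda i: (math.dist(last, coords[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Toy "hairpin": residues 0 and 5 are far apart in sequence but close in space.
coords = [
    (0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0),
    (8.0, 1.0, 0.0), (4.0, 1.0, 0.0), (0.0, 1.0, 0.0),
]
order = nearest_neighbor_order(coords)
print(order)
```

In the toy hairpin, residue 5 is visited immediately after residue 0, so mutations at those two positions become adjacent in the reordered sequence even though they are five positions apart in the primary sequence.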
Subject(s)
Mutation , Protein Structure, Tertiary/genetics , Cluster Analysis , Genes, Neoplasm , Proteins/genetics
ABSTRACT
BACKGROUND: Current research suggests that a small set of "driver" mutations are responsible for tumorigenesis while a larger body of "passenger" mutations occur in the tumor but do not progress the disease. Due to recent pharmacological successes in treating cancers caused by driver mutations, a variety of methodologies that attempt to identify such mutations have been developed. Based on the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of cluster identification algorithms has become critical. RESULTS: We have developed a novel methodology, SpacePAC (Spatial Protein Amino acid Clustering), that identifies mutational clustering by considering the protein tertiary structure directly in 3D space. By combining the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) and the spatial information in the Protein Data Bank (PDB), SpacePAC is able to identify novel mutation clusters in many proteins such as FGFR3 and CHRM2. In addition, SpacePAC is better able to localize the most significant mutational hotspots as demonstrated in the cases of BRAF and ALK. The R package is available on Bioconductor at: http://www.bioconductor.org/packages/release/bioc/html/SpacePAC.html. CONCLUSION: SpacePAC adds a valuable tool to the identification of mutational clusters while considering protein tertiary structure.
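A rough sketch of testing for mutational clustering directly in 3D space is given below. It is an illustration under stated assumptions, not the SpacePAC package's code: the statistic (the largest number of mutations inside a sphere of fixed radius centred on any residue), the uniform null model, and the names `max_sphere_count` and `clustering_p_value` are hypothetical choices made for this example, and the coordinates and counts are toy data.

```python
import math
import random

def max_sphere_count(coords, counts, radius):
    """Largest number of mutations within `radius` of any residue."""
    best = 0
    for center in coords:
        total = sum(
            c for xyz, c in zip(coords, counts)
            if math.dist(center, xyz) <= radius
        )
        best = max(best, total)
    return best

def clustering_p_value(coords, counts, radius, n_sim=200, seed=0):
    """Permutation-style p-value under an ASSUMED uniform null model.

    Each simulation redistributes the same number of mutations uniformly
    at random over the residues and recomputes the sphere statistic.
    """
    rng = random.Random(seed)
    observed = max_sphere_count(coords, counts, radius)
    n_mut, n_res = sum(counts), len(coords)
    hits = 0
    for _ in range(n_sim):
        sim = [0] * n_res
        for _ in range(n_mut):
            sim[rng.randrange(n_res)] += 1
        if max_sphere_count(coords, sim, radius) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_sim + 1)

# Toy data: 20 residues on a line, 3.8 units apart, with a hotspot at 5 and 6.
coords = [(3.8 * i, 0.0, 0.0) for i in range(20)]
counts = [0] * 20
counts[5] = counts[6] = 5
observed, p_value = clustering_p_value(coords, counts, radius=4.0)
print(observed, p_value)
```

A real analysis would read residue coordinates from a PDB structure and mutation counts from a resource such as COSMIC, and would need to correct for testing multiple radii and multiple proteins.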
Subject(s)
Computational Biology/methods , Mutation , Proteins/chemistry , Proteins/genetics , Algorithms , Cluster Analysis , Databases, Protein , Genes, Neoplasm/genetics , Humans , Neoplasms/genetics , Protein Structure, Tertiary
ABSTRACT
BACKGROUND: Human cancer is caused by the accumulation of somatic mutations in tumor suppressors and oncogenes within the genome. In the case of oncogenes, recent theory suggests that there are only a few key "driver" mutations responsible for tumorigenesis. As there have been significant pharmacological successes in developing drugs that treat cancers that carry these driver mutations, several methods that rely on mutational clustering have been developed to identify them. However, these methods consider proteins as a single strand without taking their spatial structures into account. We propose an extension to current methodology that incorporates protein tertiary structure in order to increase our power when identifying mutation clustering. RESULTS: We have developed iPAC (identification of Protein Amino acid Clustering), an algorithm that identifies non-random somatic mutations in proteins while taking into account the three-dimensional protein structure. By using the tertiary information, we are able to detect novel clusters in proteins that are known to exhibit mutation clustering and to identify clusters in proteins without evidence of clustering based on existing methods. For example, by combining the data in the Protein Data Bank (PDB) and the Catalogue of Somatic Mutations in Cancer (COSMIC), our algorithm identifies new mutational clusters in well-known cancer proteins such as KRAS and PI3KCα. Further, by utilizing the tertiary structure, our algorithm also identifies clusters in EGFR, EIF2AK2, and other proteins that are not identified by current methodology. The R package is available at: http://www.bioconductor.org/packages/2.12/bioc/html/iPAC.html. CONCLUSION: Our algorithm extends the current methodology to identify oncogenic activating driver mutations by utilizing tertiary protein structure when identifying nonrandom somatic residue mutation clusters.
Subject(s)
Algorithms , Mutation , Neoplasm Proteins/genetics , Protein Structure, Tertiary , Cluster Analysis , Humans , Neoplasm Proteins/chemistry
ABSTRACT
RATIONALE: Therapeutic administration of psychedelics has shown significant potential in historical accounts and recent clinical trials in the treatment of depression and other mood disorders. A recent randomized double-blind phase-IIb study demonstrated the safety and efficacy of COMP360, COMPASS Pathways' proprietary synthetic formulation of psilocybin, in participants with treatment-resistant depression. OBJECTIVE: While the phase-IIb results are promising, the treatment works for only a portion of the population, and early prediction of outcome is a key objective because it would allow early identification of those likely to require alternative treatment. METHODS: Transcripts were made from audio recordings of the psychological support session between participant and therapist 1 day after COMP360 administration. A zero-shot machine learning classifier based on the BART large language model was used to compute two-dimensional sentiment (valence and arousal) for the participant and therapist from the transcript. These scores, combined with the Emotional Breakthrough Index (EBI) and treatment arm, were used to predict treatment outcome as measured by MADRS scores. (Code and data are available at https://github.com/compasspathways/Sentiment2D.) RESULTS: Two multinomial logistic regression models were fit to predict responder status at week 3 and through week 12. Cross-validation of these models resulted in 85% and 88% accuracy and AUC values of 88% and 85%. CONCLUSIONS: A machine learning algorithm using NLP and EBI accurately predicts long-term patient response, allowing rapid prognostication of personalized response to psilocybin treatment and insight into therapeutic model optimization. Further research is required to understand whether language data from earlier stages in the therapeutic process hold similar predictive power.
ABSTRACT
Peptide mapping with liquid chromatography-tandem mass spectrometry (LC-MS/MS) is an important analytical method for characterization of post-translational and chemical modifications in therapeutic proteins. Despite its importance, there is currently no consensus on the statistical analysis of the resulting data. In this manuscript, we distinguish three statistical goals for therapeutic protein characterization: (1) estimation of site occupancy of modifications in one condition, (2) detection of differential site occupancy between conditions, and (3) estimation of combined site occupancy across multiple modification sites. We propose an approach, which addresses these goals in terms of summarizing the quantitative information from the mass spectra, statistical modeling, and model-based analysis of LC-MS/MS data. We illustrate the approach using an LC-MS/MS experiment from an antibody-drug conjugate and its monoclonal antibody intermediate. The performance was compared to a 'naïve' data analysis approach, by using computer simulation, evaluation of differential site occupancy in positive and negative controls, and comparisons of estimated site occupancy with orthogonal experimental measurements of N-linked glycoforms and total oxidation. The results demonstrated the importance of replicated studies of protein characterization, and of appropriate statistical modeling, for reproducible, accurate and efficient site occupancy estimation and differential analysis.
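The first two statistical goals can be made concrete with a minimal sketch of a simple, replicate-averaged calculation. This is an assumption about what a basic summary might look like, not the model-based approach the abstract proposes and not necessarily the 'naïve' analysis it was compared against: site occupancy is taken as the modified signal divided by the total (modified plus unmodified) signal for a site, averaged over replicates, and the two conditions are compared by a plain difference of means. The function names and the intensity values are hypothetical.

```python
from statistics import mean

def site_occupancy(modified, unmodified):
    """Fraction of signal at one site that carries the modification."""
    total = modified + unmodified
    if total <= 0:
        raise ValueError("no signal recorded for this site")
    return modified / total

def mean_occupancy(replicates):
    """Average occupancy over (modified, unmodified) intensity pairs."""
    return mean(site_occupancy(m, u) for m, u in replicates)

# Hypothetical intensities for one site, three replicates per condition.
condition_a = [(10.0, 90.0), (12.0, 88.0), (8.0, 92.0)]
condition_b = [(30.0, 70.0), (28.0, 72.0), (32.0, 68.0)]

occupancy_a = mean_occupancy(condition_a)
occupancy_b = mean_occupancy(condition_b)
difference = occupancy_b - occupancy_a
print(occupancy_a, occupancy_b, difference)
```

A difference of means like this carries no measure of uncertainty, which is one reason replicated experiments and an explicit statistical model matter for differential site occupancy.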
Subject(s)
Biological Products/chemistry , Biostatistics , Protein Processing, Post-Translational , Proteins/chemistry , Technology, Pharmaceutical , Biological Products/pharmacology , Chromatography, High Pressure Liquid , Peptide Mapping , Proteins/pharmacology , Tandem Mass Spectrometry
ABSTRACT
Cancers exhibit extensive mutational heterogeneity, and the resulting long-tail phenomenon complicates the discovery of genes and pathways that are significantly mutated in cancer. We perform a pan-cancer analysis of mutated networks in 3,281 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) using HotNet2, a new algorithm to find mutated subnetworks that overcomes the limitations of existing single-gene, pathway and network approaches. We identify 16 significantly mutated subnetworks that comprise well-known cancer signaling pathways as well as subnetworks with less characterized roles in cancer, including cohesin, condensin and others. Many of these subnetworks exhibit co-occurring mutations across samples. These subnetworks contain dozens of genes with rare somatic mutations across multiple cancers; many of these genes have additional evidence supporting a role in cancer. By illuminating these rare combinations of mutations, pan-cancer network analyses provide a roadmap to investigate new diagnostic and therapeutic opportunities across cancer types.