ABSTRACT
Model systems are an essential resource in cancer research. They simulate effects that can be extrapolated to humans, but carry a risk of inaccurately representing human biology. This inaccuracy can lead to inconclusive experiments or misleading results, underscoring the need for an improved process for translating model system findings into human-relevant data. We present a process for applying joint dimension reduction (jDR) to horizontally integrate gene expression data across model systems and human tumor cohorts. We then use this approach to combine human TCGA gene expression data with data from human cancer cell lines and mouse model tumors. By identifying the aspects of genomic variation that act jointly across cohorts, we demonstrate how predictive modeling and clinical biomarkers from model systems can be improved.
Subjects
Neoplasms , Transcriptome , Animals , Mice , Humans , Neoplasms/genetics , Neoplasms/pathology , Gene Expression Profiling , Biomarkers
ABSTRACT
Primary liver cancer, consisting of both cholangiocarcinoma (CCA) and hepatocellular carcinoma (HCC), is the second leading cause of cancer deaths worldwide. Our goal is to genomically characterize rare HCC subclasses to provide insight into disease biology. Leveraging The Cancer Genome Atlas (TCGA) to perform a combined analysis of CCA (n = 36) and HCC (n = 275), we integrated multiple genomic platforms to assess transcriptional profiles, mutational signatures, and copy number patterns, and to uncover underlying etiology and lineage-specific patterns. We identified two molecular classes distinct from prototypical HCC tumors. The first, CCA-Like, although histologically indistinguishable from HCC, had enrichment of CCA mutations (IDH1, BAP1), mutational signatures, and transcriptional patterns (SOX9, KRT19). CCA-Like, however, retained a copy number landscape similar to HCC, suggesting a hepatocellular lineage. The second, Blast-Like, is enriched in TP53 mutations, HBV infection, and exposure-related mutational signatures, and is transcriptionally similar to hepatoblasts. Although these subclasses are molecularly distinct, they both have worse progression-free survival than classical HCC tumors, yet are clinically treated the same. The identification and characterization of the CCA-Like and Blast-Like subclasses advance our knowledge of HCC and highlight an urgent need for class-specific biomarkers and targeted therapy.
Subjects
Hepatocellular Carcinoma/genetics , DNA Copy Number Variations , Liver Neoplasms/genetics , Mutation , Genetic Transcription , Humans
ABSTRACT
The E3 ubiquitin ligase Rad18 promotes a damage-tolerant and error-prone mode of DNA replication termed trans-lesion synthesis that is pathologically activated in cancer. However, the impact of vertebrate Rad18 on cancer genomes is not known. To determine how Rad18 affects mutagenesis in vivo, we have developed and implemented a novel computational pipeline to analyze genomes of carcinogen (7,12-dimethylbenz[a]anthracene, DMBA)-induced skin tumors from Rad18+/+ and Rad18-/- mice. We show that Rad18 mediates specific mutational signatures characterized by high levels of A(T)>T(A) single nucleotide variations (SNVs). In Rad18-/- tumors, an alternative mutation pattern arises, which is characterized by increased numbers of deletions >4 bp. Comparison with annotated human mutational signatures shows that COSMIC signature 22 predominates in Rad18+/+ tumors whereas Rad18-/- tumors are characterized by increased contribution of COSMIC signature 3 (a hallmark of BRCA-mutant tumors). Analysis of The Cancer Genome Atlas shows that RAD18 expression is strongly associated with high SNV burdens, suggesting RAD18 also promotes mutagenesis in human cancers. Taken together, our results show Rad18 promotes mutagenesis in vivo, modulates DNA repair pathway choice in neoplastic cells, and mediates specific mutational signatures that are present in human tumors.
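The A(T)>T(A) notation above groups a substitution with its reverse-complement partner. As a minimal sketch of that bookkeeping (not the authors' pipeline, which the abstract does not describe), the Python below collapses SNVs into the six strand-agnostic classes, writing each with a pyrimidine reference base so that A>T and T>A are both counted as "T>A". The function names and the example calls are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def canonical_class(ref: str, alt: str) -> str:
    """Collapse a substitution with its reverse-complement partner.

    Purine-reference changes are rewritten on the opposite strand, so
    A>T and T>A both map to the single class "T>A".
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in "AG":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def class_fractions(snvs):
    """Return the fraction of SNVs in each strand-agnostic class."""
    counts = Counter(canonical_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical (ref, alt) calls for illustration only; not data from the study.
example_snvs = [("A", "T"), ("T", "A"), ("A", "T"), ("C", "T"), ("G", "A")]
fractions = class_fractions(example_snvs)
```

Comparing such per-tumor fractions between genotypes is one simple way to see a shift in substitution spectrum; matching against COSMIC signatures additionally requires trinucleotide context, which this sketch omits.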
ABSTRACT
The Mre11-Rad50-Nbs1 complex is a DNA double-strand break sensor that mediates a tumor-suppressive DNA damage response (DDR) in cells undergoing oncogenic stress, yet the mechanisms underlying this effect are poorly understood. Using a genetically inducible primary mammary epithelial cell model, we demonstrate that Mre11 suppresses proliferation and DNA damage induced by diverse oncogenic drivers through a p53-independent mechanism. Breast tumorigenesis models engineered to express a hypomorphic Mre11 allele exhibit increased levels of oncogene-induced DNA damage, R-loop accumulation, and chromosomal instability with a characteristic copy number loss phenotype. Mre11 complex dysfunction is identified in a subset of human triple-negative breast cancers and is associated with increased sensitivity to DNA-damaging therapy and inhibitors of ataxia telangiectasia and Rad3 related (ATR) and poly (ADP-ribose) polymerase (PARP). Thus, deficiencies in the Mre11-dependent DDR drive proliferation and genome instability patterns in p53-deficient breast cancers and represent an opportunity for therapeutic exploitation.
Subjects
Carcinogenesis/pathology , DNA Damage , Genomic Instability , Tumor Suppressor Protein p53/metabolism , Animals , Ataxia Telangiectasia Mutated Proteins/antagonists & inhibitors , Ataxia Telangiectasia Mutated Proteins/metabolism , Breast Neoplasms/pathology , Tumor Cell Line , Cell Proliferation , Cultured Cells , Chromosomal Instability , Epithelial Cells/metabolism , Gene Dosage , HEK293 Cells , Humans , MRE11 Homologue Protein/metabolism , Animal Mammary Glands/pathology , Mice , Biological Models , Oncogenes , Phenotype , Poly(ADP-ribose) Polymerase Inhibitors/pharmacology , R-Loop Structures
ABSTRACT
BACKGROUND: Contamination of reagents and cross-contamination across samples is a long-recognized issue in molecular biology laboratories. While often innocuous, contamination can lead to inaccurate results. Cantalupo et al., for example, found HeLa-derived human papillomavirus 18 (H-HPV18) in several of The Cancer Genome Atlas (TCGA) RNA-sequencing samples. This work motivated us to assess a greater number of samples and determine the origin of possible contaminations using viral sequences. To detect viruses with high specificity, we developed the publicly available workflow, VirDetect, that detects virus and laboratory vector sequences in RNA-seq samples. We applied VirDetect to 9143 RNA-seq samples sequenced at one TCGA sequencing center (28/33 cancer types) over 5 years. RESULTS: We confirmed that H-HPV18 was present in many samples and determined that viral transcripts from H-HPV18 significantly co-occurred with those from xenotropic mouse leukemia virus-related virus (XMRV). Using laboratory metadata and viral transcription, we determined that the likely contaminant was a pool of cell lines known as the "common reference", which was sequenced alongside TCGA RNA-seq samples as a control to monitor quality across technology transitions (i.e. microarray to GAII to HiSeq), and to link RNA-seq to previous generation microarrays that standardly used the "common reference". One of the cell lines in the pool was a laboratory isolate of MCF-7, which we discovered was infected with XMRV; another constituent of the pool was likely HeLa cells. CONCLUSIONS: Altogether, this indicates a multi-step contamination process. First, MCF-7 was infected with an XMRV. Second, this infected cell line was added to a pool of cell lines, which contained HeLa. Finally, RNA from this pool of cell lines contaminated several TCGA tumor samples, most likely during library construction. Thus, these human tumors with H-HPV18 or XMRV reads were likely not infected with H-HPV18 or XMRV.
Subjects
DNA Contamination , High-Throughput Nucleotide Sequencing/standards , Molecular Diagnostic Techniques/standards , Neoplasms/genetics , RNA , Animals , Tumor Cell Line , Computational Biology/methods , HeLa Cells , Humans , Mice , Neoplasms/diagnosis , Neoplasms/virology , Phylogeny , Software , Workflow
ABSTRACT
Polymerase theta (Pol θ, gene name Polq) is a widely conserved DNA polymerase that mediates a microhomology-mediated, error-prone, double strand break (DSB) repair pathway, referred to as Theta Mediated End Joining (TMEJ). Cells with homologous recombination deficiency are reliant on TMEJ for DSB repair. It is unknown whether deficiencies in other components of the DNA damage response (DDR) also result in Pol θ addiction. Here we use a CRISPR genetic screen to uncover 140 Polq synthetic lethal (PolqSL) genes, the majority of which were previously unknown. Functional analyses indicate that Pol θ/TMEJ addiction is associated with increased levels of replication-associated DSBs, regardless of the initial source of damage. We further demonstrate that approximately 30% of TCGA breast cancers have genetic alterations in PolqSL genes and exhibit genomic scars of Pol θ/TMEJ hyperactivity, thereby substantially expanding the subset of human cancers for which Pol θ inhibition represents a promising therapeutic strategy.
Subjects
Breast Neoplasms/genetics , DNA End-Joining Repair/genetics , DNA-Directed DNA Polymerase/genetics , Aminoquinolines/toxicity , Animals , CRISPR-Cas Systems/genetics , Cell Line , Double-Stranded DNA Breaks , DNA-Directed DNA Polymerase/metabolism , HEK293 Cells , Humans , Mice , Mitomycin/toxicity , Picolinic Acids/toxicity , DNA Polymerase theta
ABSTRACT
BACKGROUND: Measures of the adaptive immune response have prognostic and predictive associations in melanoma and other cancer types. Specifically, intratumoral T cell density and function have considerable prognostic and predictive value in skin cutaneous melanoma (SKCM). Less is known about the significance of tumor-infiltrating B cells in SKCM. Our goal was to understand the prognostic and predictive value of B cell phenotypic subsets in SKCM using RNA sequencing. METHODS: We used our previously published algorithm, V'DJer, to assemble B cell receptor (BCR) repertoires and estimate diversity from short-read RNA sequencing (RNA-seq). We applied machine learning-based cellular phenotype classifiers to measure relative similarity of bulk tumor sample gene expression profiles and different B cell phenotypes. We assessed these aspects of B cell biology in 473 SKCM from the Cancer Genome Atlas Project (TCGA) as well as in RNA-seq data corresponding to tumor samples procured from patients who received CTLA-4 and PD-1 inhibitors for metastatic SKCM. RESULTS: We found that the BCR repertoire was associated with different clinical factors, such as tumor tissue site and sex. However, increased clonality of the BCR repertoire was favorably prognostic in SKCM and was prognostic even after first conditioning on various clinical factors. Mutation burden was not correlated with any BCR measurement, and no specific mutation had an altered BCR repertoire. Lack of an assembled BCR in pre-treatment tumor tissues was associated with a lack of anti-tumor response to a CTLA-4 inhibitor in metastatic SKCM. CONCLUSIONS: These findings suggest an important prognostic and predictive role for B cell characteristics in SKCM. This has implications for melanoma immunobiology and potential development of immunogenomics features to predict survival and response to immunotherapy.
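The abstract above does not state how repertoire diversity and clonality were computed. One common convention, shown here as a hedged sketch rather than the authors' definition, takes diversity as the Shannon entropy of clone read counts and clonality as one minus the entropy normalized by its maximum. The function names and example counts are hypothetical.

```python
import math


def shannon_entropy(clone_counts):
    """Shannon entropy (in nats) of a clone-size distribution."""
    total = sum(clone_counts)
    if total <= 0:
        raise ValueError("need at least one read")
    return -sum((c / total) * math.log(c / total) for c in clone_counts if c > 0)


def clonality(clone_counts):
    """1 - normalized entropy: 0 for perfectly even clones, 1 for a single clone.

    This is an assumed, commonly used definition, not one taken from the study.
    """
    n_clones = sum(1 for c in clone_counts if c > 0)
    if n_clones == 0:
        raise ValueError("need at least one clone")
    if n_clones == 1:
        return 1.0
    return 1.0 - shannon_entropy(clone_counts) / math.log(n_clones)


# Hypothetical read counts per assembled BCR clone, for illustration only.
even_repertoire = [25, 25, 25, 25]
skewed_repertoire = [97, 1, 1, 1]
even_score = clonality(even_repertoire)
skewed_score = clonality(skewed_repertoire)
```

Under this convention a repertoire dominated by one expanded clone scores close to 1, while an evenly distributed repertoire scores close to 0.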
Subjects
B-Lymphocytes/immunology , Tumor Biomarkers/immunology , Tumor-Infiltrating Lymphocytes/immunology , Melanoma/immunology , Skin Neoplasms/immunology , Tumor Biomarkers/standards , Humans , Melanoma/pathology , Skin Neoplasms/pathology
ABSTRACT
MOTIVATION: Genomic variant detection from next-generation sequencing has become established as an extremely important component of research and clinical diagnoses in both cancer and Mendelian disorders. Insertions and deletions (indels) are a common source of variation and can frequently impact functionality, thus making their detection vitally important. While substantial effort has gone into detecting indels from DNA, there is still opportunity for improvement. Further, detection of indels from RNA-Seq data has largely been an afterthought and offers another critical area for variant detection. RESULTS: We present here ABRA2, a redesign of the original ABRA implementation that offers support for realignment of both RNA and DNA short reads. The process results in improved accuracy and scalability including support for human whole genomes. Results demonstrate substantial improvement in indel detection for a variety of data types, including those that were not previously supported by ABRA. Further, ABRA2 results in broad improvements to variant calling accuracy across a wide range of post-processing workflows including whole genomes, targeted exomes and transcriptome sequencing. AVAILABILITY AND IMPLEMENTATION: ABRA2 is implemented in a combination of Java and C/C++ and is freely available to all from: https://github.com/mozack/abra2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
INDEL Mutation , RNA , DNA Sequence Analysis , Software , DNA , High-Throughput Nucleotide Sequencing , Humans
ABSTRACT
Epstein-Barr virus (EBV) is convincingly associated with gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. To test the hypothesis that there are additional cancer types with high prevalence of EBV, we determined EBV viral expression in all the Cancer Genome Atlas Project (TCGA) mRNA sequencing (mRNA-seq) samples (n = 10,396) from 32 different tumor types. We found that EBV was present in gastric adenocarcinoma and lymphoma, as expected, and was also present in >5% of samples in 10 additional tumor types. For most samples, EBV transcript levels were low, which suggests that EBV was likely present due to infected infiltrating B cells. In order to determine if there was a difference in the B-cell populations, we assembled B-cell receptors for each sample and found B-cell receptor abundance (P ≤ 1.4 × 10^-20) and diversity (P ≤ 8.3 × 10^-27) were significantly higher in EBV-positive samples. Moreover, diversity was independent of B-cell abundance, suggesting that the presence of EBV was associated with an increased and altered B-cell population. IMPORTANCE: Around 20% of human cancers are associated with viruses. Epstein-Barr virus (EBV) contributes to gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. We assessed the prevalence of EBV in RNA-seq from 32 tumor types in the Cancer Genome Atlas Project (TCGA) and found EBV to be present in >5% of samples in 12 tumor types. EBV infects epithelial cells and B cells and in B cells causes proliferation. We hypothesized that the low expression of EBV in most of the tumor types was due to infiltration of B cells into the tumor. The increase in B-cell abundance and diversity in subjects where EBV was detected in the tumors strengthens this hypothesis. Overall, we found that EBV was associated with an increased and altered immune response. This result is not evidence of causality, but a potential novel biomarker for tumor immune status.
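The ">5% of samples" criterion above is a per-tumor-type prevalence cut-off. The sketch below shows one way such a tally could be computed from per-sample EBV read counts. The positivity threshold `min_reads`, the function names, and the tumor-type labels are assumptions for illustration, since the abstract does not give the detection rule.

```python
def ebv_prevalence(samples, min_reads=1):
    """Fraction of EBV-positive samples per tumor type.

    `samples` is an iterable of (tumor_type, ebv_read_count) pairs.
    `min_reads` is an assumed positivity cut-off, not one from the study.
    """
    totals, positives = {}, {}
    for tumor_type, reads in samples:
        totals[tumor_type] = totals.get(tumor_type, 0) + 1
        if reads >= min_reads:
            positives[tumor_type] = positives.get(tumor_type, 0) + 1
    return {t: positives.get(t, 0) / n for t, n in totals.items()}


def types_above(prevalence, threshold=0.05):
    """Tumor types whose EBV prevalence strictly exceeds the threshold."""
    return sorted(t for t, p in prevalence.items() if p > threshold)


# Hypothetical cohort: 2 of 20 TYPE_A samples and 1 of 40 TYPE_B samples positive.
example = (
    [("TYPE_A", 5)] * 2 + [("TYPE_A", 0)] * 18
    + [("TYPE_B", 3)] * 1 + [("TYPE_B", 0)] * 39
)
prevalence = ebv_prevalence(example)
flagged = types_above(prevalence)
```

Here TYPE_A (10% positive) passes the 5% cut-off and TYPE_B (2.5% positive) does not.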
ABSTRACT
We performed an extensive immunogenomic analysis of more than 10,000 tumors comprising 33 diverse cancer types by utilizing data compiled by TCGA. Across cancer types, we identified six immune subtypes (wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, immunologically quiet, and TGF-β dominant) characterized by differences in macrophage or lymphocyte signatures, Th1:Th2 cell ratio, extent of intratumoral heterogeneity, aneuploidy, extent of neoantigen load, overall cell proliferation, expression of immunomodulatory genes, and prognosis. Specific driver mutations correlated with lower (CTNNB1, NRAS, or IDH1) or higher (BRAF, TP53, or CASP8) leukocyte levels across all cancers. Multiple control modalities of the intracellular and extracellular networks (transcription, microRNAs, copy number, and epigenetic processes) were involved in tumor-immune cell interactions, both across and within immune subtypes. Our immunogenomics pipeline to characterize these heterogeneous tumors and the resulting data are intended to serve as a resource for future targeted studies to further advance the field.
Subjects
Genomics/methods , Neoplasms , Adolescent , Adult , Aged , Aged 80 and over , Child , Female , Humans , Interferon-gamma/genetics , Interferon-gamma/immunology , Macrophages/immunology , Male , Middle Aged , Neoplasms/classification , Neoplasms/genetics , Neoplasms/immunology , Prognosis , Th1-Th2 Balance/physiology , Transforming Growth Factor beta/genetics , Transforming Growth Factor beta/immunology , Wound Healing/genetics , Wound Healing/immunology , Young Adult
ABSTRACT
Changes in the quantity of genetic material, known as somatic copy number alterations (CNAs), can drive tumorigenesis. Many methods exist for assessing CNAs using microarrays, but considerable technical issues limit current CNA calling based upon DNA sequencing. We present SynthEx, a novel tool for detecting CNAs from whole exome and genome sequencing. SynthEx utilizes a "synthetic-normal" strategy to overcome technical and financial issues. In terms of accuracy and precision, SynthEx is highly comparable to array-based methods and outperforms sequencing-based CNA detection tools. SynthEx robustly identifies CNAs using sequencing data without the additional costs associated with matched normal specimens.
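The abstract above does not say how the "synthetic normal" is constructed. As a sketch under a loudly stated assumption, the code below treats it as an average of library-size-scaled coverage from unrelated normal samples, then reports per-bin log2 ratios of tumor coverage against that reference. This illustrates the general idea of calling copy number without a matched normal; it is not SynthEx's actual algorithm, and all names and numbers are hypothetical.

```python
import math


def synthetic_normal(normal_depths):
    """Average per-bin coverage fraction across unrelated normals.

    ASSUMPTION: pooling unrelated normals stands in for a matched normal;
    the source abstract does not describe the real construction.
    """
    n_bins = len(normal_depths[0])
    scaled = []
    for depths in normal_depths:
        total = sum(depths)
        scaled.append([d / total for d in depths])  # library-size scaling
    return [sum(s[i] for s in scaled) / len(scaled) for i in range(n_bins)]


def log2_ratios(tumor_depths, reference, pseudo=1e-9):
    """Per-bin log2(tumor fraction / reference fraction)."""
    total = sum(tumor_depths)
    return [
        math.log2((t / total + pseudo) / (r + pseudo))
        for t, r in zip(tumor_depths, reference)
    ]


# Hypothetical read depths over four bins, for illustration only.
normals = [[100, 100, 100, 100], [200, 200, 200, 200]]
tumor = [100, 100, 200, 100]  # third bin carries twice the coverage

reference = synthetic_normal(normals)
ratios = log2_ratios(tumor, reference)
```

In this toy case the third bin sits one log2 unit above the others, the pattern expected for a doubling of coverage. A real caller would also correct for GC content, tumor purity, and the baseline shift a gain introduces, and would segment adjacent bins.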
Subjects
Computational Biology/methods , DNA Copy Number Variations , Genetic Heterogeneity , Neoplasms/genetics , DNA Sequence Analysis , Software , Cluster Analysis , Exome , Exons , Gene Dosage , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results , DNA Sequence Analysis/methods
ABSTRACT
We report the discovery of a claudin-low molecular subtype of high-grade bladder cancer that shares characteristics with the homonymous subtype of breast cancer. Claudin-low bladder tumors were enriched for multiple genetic features including increased rates of RB1, EP300, and NCOR1 mutations; increased frequency of EGFR amplification; decreased rates of FGFR3, ELF3, and KDM6A mutations; and decreased frequency of PPARG amplification. While claudin-low tumors showed the highest expression of immune gene signatures, they also demonstrated gene expression patterns consistent with those observed in active immunosuppression. This did not appear to be due to differences in predicted neoantigen burden, but rather was associated with broad upregulation of cytokine and chemokine levels from low PPARG activity, allowing unopposed NFKB activity. Taken together, these results define a molecular subtype of bladder cancer with distinct molecular features and an immunologic profile that would, in theory, be primed for immunotherapeutic response.
Subjects
Claudins/genetics , Urinary Bladder Neoplasms/genetics , Neoplasm Antigens/metabolism , Chemokines/immunology , Cytokines/immunology , Humans , Immune Tolerance , Leukocytes/immunology , NF-kappa B/metabolism , PPAR gamma/metabolism , Tumor Microenvironment , Urinary Bladder Neoplasms/classification , Urinary Bladder Neoplasms/immunology
ABSTRACT
MOTIVATION: B-cell receptor (BCR) repertoire profiling is an important tool for understanding the biology of diverse immunologic processes. Current methods for analyzing adaptive immune receptor repertoires depend upon PCR amplification of VDJ rearrangements followed by long-read amplicon sequencing spanning the VDJ junctions. While this approach has proven to be effective, it is frequently not feasible due to cost or limited sample material. Additionally, there are many existing datasets where short-read RNA sequencing data are available but PCR-amplified BCR data are not. RESULTS: We present here V'DJer, an assembly-based method that reconstructs adaptive immune receptor repertoires from short-read RNA sequencing data. This method captures expressed BCR loci from a standard RNA-seq assay. We applied this method to 473 melanoma samples from The Cancer Genome Atlas and demonstrate V'DJer's ability to accurately reconstruct BCR repertoires from short-read mRNA-seq data. AVAILABILITY AND IMPLEMENTATION: V'DJer is implemented in C/C++, freely available for academic use and can be downloaded from GitHub: https://github.com/mozack/vdjer. CONTACT: benjamin_vincent@med.unc.edu or parkerjs@email.unc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Computational Biology/methods , B-Cell Antigen Receptors/genetics , RNA Sequence Analysis/methods , Software , B-Lymphocytes , Humans
ABSTRACT
PURPOSE: To evaluate germline variants in hereditary cancer susceptibility genes among unselected cancer patients undergoing tumor-germline sequencing. EXPERIMENTAL DESIGN: Germline sequence data from 439 individuals undergoing tumor-germline dyad sequencing through the LCCC1108/UNCseq™ (NCT01457196) study were analyzed for genetic variants in 36 hereditary cancer susceptibility genes. These variants were analyzed as an exploratory research study to determine whether pathogenic variants exist within the germline of patients undergoing tumor-germline sequencing. Patients were unselected with respect to indicators of hereditary cancer predisposition. RESULTS: Variants indicative of hereditary cancer predisposition were identified in 19 (4.3%) patients. For about half (10/19), these findings represent new diagnostic information with potentially important implications for the patient and their family. The others were previously identified through clinical genetic evaluation secondary to suspicion of a hereditary cancer predisposition. Genes with pathogenic variants included ATM, BRCA1, BRCA2, CDKN2A, and CHEK2. In contrast, a substantial proportion of patients (178, 40.5%) had Variants of Uncertain Significance (VUS), 24 of whom had VUS in genes pertinent to the presenting cancer. Another 143 had VUS in other hereditary cancer genes, and 11 had VUS in both pertinent and nonpertinent genes. CONCLUSIONS: Germline analysis in tumor-germline sequencing dyads will occasionally reveal significant germline findings that were clinically occult, which could be beneficial for patients and their families. However, given the low yield for unexpected germline variation and the large proportion of patients with VUS results, analysis and return of germline results should adhere to guidelines for secondary findings rather than diagnostic hereditary cancer testing. Clin Cancer Res; 22(16); 4087-94. ©2016 AACR. See related commentary by Mandelker, p. 3987.
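The counts reported in the abstract above can be checked with simple arithmetic. The short sketch below recomputes the stated percentages from the stated counts and confirms that the three VUS groups add up to the reported total. The variable names are illustrative, and the reading of the 24 and 143 groups as mutually exclusive with the group of 11 is an inference from the fact that they sum to 178.

```python
# Counts taken directly from the abstract.
cohort_size = 439
pathogenic = 19
new_findings = 10
vus_total = 178
vus_pertinent_only = 24   # VUS in genes pertinent to the presenting cancer
vus_other_only = 143      # VUS in other hereditary cancer genes
vus_both = 11             # VUS in both pertinent and nonpertinent genes


def percent(part, whole):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / whole, 1)


pathogenic_pct = percent(pathogenic, cohort_size)        # 4.3
vus_pct = percent(vus_total, cohort_size)                # 40.5
new_fraction_pct = percent(new_findings, pathogenic)     # 52.6
vus_groups_sum = vus_pertinent_only + vus_other_only + vus_both  # 178
```

The recomputed values match the abstract: 19/439 is 4.3%, 178/439 is 40.5%, and 10/19 (52.6%) is consistent with "about half".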
Subjects
Germ-Line Mutation , Neoplasms/diagnosis , Neoplasms/genetics , Tumor Biomarkers , Genetic Predisposition to Disease , Genetic Testing , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/mortality , Hereditary Neoplastic Syndromes/diagnosis , Hereditary Neoplastic Syndromes/genetics , Prognosis
ABSTRACT
MOTIVATION: Variant detection from next-generation sequencing (NGS) data is an increasingly vital aspect of disease diagnosis, treatment and research. Commonly used NGS-variant analysis tools generally rely on accurately mapped short reads to identify somatic variants and germ-line genotypes. Existing NGS read mappers have difficulty accurately mapping short reads containing complex variation (i.e. more than a single base change), thus making identification of such variants difficult or impossible. Insertions and deletions (indels) in particular have been an area of great difficulty. Indels are frequent and can have substantial impact on function, which makes their detection all the more imperative. RESULTS: We present ABRA, an assembly-based realigner, which uses an efficient and flexible localized de novo assembly followed by global realignment to more accurately remap reads. This results in enhanced performance for indel detection as well as improved accuracy in variant allele frequency estimation. AVAILABILITY AND IMPLEMENTATION: ABRA is implemented in a combination of Java and C/C++ and is freely available for download at https://github.com/mozack/abra.
Subjects
Computational Biology/methods , INDEL Mutation , Sequence Alignment/methods , DNA Sequence Analysis/methods , Algorithms , Gene Frequency , Human Genome , Genotype , High-Throughput Nucleotide Sequencing , Humans , Programming Languages , Software
ABSTRACT
Identifying somatic mutations is critical for cancer genome characterization and for prioritizing patient treatment. DNA whole exome sequencing (DNA-WES) is currently the most popular technology; however, this yields low sensitivity in low purity tumors. RNA sequencing (RNA-seq) covers the expressed exome with depth proportional to expression. We hypothesized that integrating DNA-WES and RNA-seq would enable superior mutation detection versus DNA-WES alone. We developed a first-of-its-kind method, called UNCeqR, that detects somatic mutations by integrating patient-matched RNA-seq and DNA-WES. In simulation, the integrated DNA and RNA model outperformed the DNA-WES only model. Validation by patient-matched whole genome sequencing demonstrated superior performance of the integrated model over DNA-WES only models, including a published method and published mutation profiles. Genome-wide mutational analysis of breast and lung cancer cohorts (n = 871) revealed remarkable tumor genomics properties. Low purity tumors experienced the largest gains in mutation detection by integrating RNA-seq and DNA-WES. RNA provided greater mutation signal than DNA in expressed mutations. Compared to earlier studies on this cohort, UNCeqR increased mutation rates of driver and therapeutically targeted genes (e.g. PIK3CA, ERBB2 and FGFR2). In summary, integrating RNA-seq with DNA-WES increases mutation detection performance, especially for low purity tumors.