ABSTRACT
Identifying non-coding mutations that drive tumorigenesis requires different approaches from those used for coding mutations. Testing for enriched associations between mutated regulatory elements and altered cis-regulation in tumors is a promising way to stratify candidate non-coding driver mutations. Here we provide a bioinformatics pipeline to mine data from the NCI Genomic Data Commons (GDC) for such associations. The pipeline integrates RNA and whole-genome sequencing with genotyping data to reveal putative non-coding driver mutations by cancer type. For complete information on the generation and use of this protocol, please refer to Cheng et al. (2021).
Subject(s)
Carcinogenesis/genetics; Computational Biology/methods; Mutation/genetics; Neoplasms/genetics; Regulatory Sequences, Nucleic Acid/genetics; Databases, Genetic; Humans
ABSTRACT
Despite the recent availability of complete genome sequences of tumors from thousands of patients, isolating disease-causing (driver) non-coding mutations from the plethora of somatic variants remains challenging, and only a handful of validated examples exist. By integrating whole-genome sequencing, genetic data, and allele-specific gene expression from TCGA, we identified 320 somatic non-coding mutations that affect gene expression in cis (FDR<0.25). These mutations cluster into 47 cis-regulatory elements that modulate expression of their subject genes through diverse molecular mechanisms. We further show that these mutations have hallmark features of non-coding drivers; namely, that they preferentially disrupt transcription factor binding motifs, are associated with a selective advantage, increased oncogene expression and decreased tumor suppressor expression.
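The abstract does not say which statistical test was used to call cis-acting mutations at FDR<0.25. As a minimal sketch of how such a threshold is commonly applied, the example below assumes an exact two-sided binomial test for allelic imbalance in RNA-seq reads, followed by Benjamini-Hochberg adjustment. The mutation names and read counts are hypothetical, and neither function is taken from the authors' pipeline.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n reads.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical (mutant-allele reads, total reads) at heterozygous sites.
read_counts = {"mut_A": (18, 20), "mut_B": (11, 20), "mut_C": (3, 24)}

pvals = [binom_two_sided_p(k, n) for k, n in read_counts.values()]
qvals = benjamini_hochberg(pvals)

# Keep mutations whose allelic imbalance passes the FDR < 0.25 cutoff.
significant = [name for name, q in zip(read_counts, qvals) if q < 0.25]
print(significant)
```

With these illustrative counts, the two mutations showing strong allelic skew pass the cutoff, while the balanced one does not. A real analysis would also need to account for reference-mapping bias and copy-number changes, which this sketch ignores.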