ABSTRACT
Transposon-derived transcripts are abundant in RNA sequences, yet their landscape and function, especially for fusion transcripts derived from unannotated or somatically acquired transposons, remain underexplored. Here, we developed a new bioinformatic tool to detect transposon-fusion transcripts in RNA-sequencing data and performed a pan-cancer analysis of 10,257 cancer samples across 34 cancer types as well as 3,088 normal tissue samples. We identified 52,277 cancer-specific fusions, with ~30 events per cancer, and hotspot loci within transposons vulnerable to fusion formation. Exonization of intronic transposons was the most prevalent type of genic fusion, while somatic L1 insertions constituted a small fraction of cancer-specific fusions. Source L1s and HERVs, but not Alus, showed decreased DNA methylation in cancer upon fusion formation. Overall, cancer-specific L1 fusions were enriched in tumor suppressors, while Alu fusions were enriched in oncogenes, including recurrent Alu fusions in EZH2 predictive of patient survival. We also demonstrated that transposon-derived peptides triggered CD8+ T-cell activation to an extent comparable to that triggered by EBV. Our findings reveal distinct epigenetic and tumorigenic mechanisms underlying transposon fusions across different families and highlight transposons as novel therapeutic targets and a source of potent neoantigens.
ABSTRACT
MOTIVATION: Essential gene signatures for cancer growth have typically been identified via RNAi or CRISPR-Cas9. Here, we propose an alternative method that reveals essential gene signatures by analysing genomic expression profiles in compound-treated cells. Given the large amount of existing compound-induced data, essential gene signatures can be characterized efficiently at genomic scale without the technical challenges of the previous techniques. RESULTS: An essential gene is characterized as a gene showing a positive correlation between its down-regulation and the cell growth inhibition induced by diverse compounds, with data collected from LINCS and CGP. Among 12 741 genes, 1092, 1228, 827, 962, 1664, 580 and 829 essential genes are characterized for the A375, A549, BT20, LNCAP, MCF7, MDAMB231 and PC3 cell lines, respectively (P-value ≤ 1.0E-05). Comparisons with previously identified essential genes yield significant overlaps in A375 and A549 (P-value ≤ 5.0E-05), and the 103 common essential genes are enriched in processes crucial for cancer growth. In most comparisons in A375, MCF7, BT20 and A549, the characterized essential genes show stronger essential characteristics than those from the previous techniques, i.e. high gene expression, high degrees of protein-protein interactions, many homologs and few paralogs. Remarkably, the essential genes characterized by both the previous and the proposed techniques show more significant essential characteristics than those identified solely by the previous techniques. We expect that this work provides new perspectives on essential gene signatures. AVAILABILITY AND IMPLEMENTATION: The Python implementations are available at https://github.com/jmjung83/deconvolution_of_essential_gene_signitures. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
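The selection criterion in RESULTS (down-regulation of a gene correlating positively with compound-induced growth inhibition) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy data, the use of Pearson correlation, and the `min_r` cut-off, which stands in for the P-value threshold reported in the abstract. It is not the authors' implementation, which is available at the repository given above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def candidate_essential_genes(expression_change, growth_inhibition, min_r=0.8):
    """Return {gene: r} for genes whose down-regulation tracks growth inhibition.

    expression_change: gene -> expression change per compound (negative = down-regulated)
    growth_inhibition: growth inhibition per compound, in the same compound order
    min_r: hypothetical correlation cut-off (the paper uses a P-value threshold instead)
    """
    hits = {}
    for gene, changes in expression_change.items():
        down_regulation = [-c for c in changes]  # larger value = stronger down-regulation
        r = pearson(down_regulation, growth_inhibition)
        if r >= min_r:
            hits[gene] = r
    return hits


# Toy data for four hypothetical compounds in one cell line.
growth_inhibition = [0.1, 0.4, 0.6, 0.9]
expression_change = {
    "GENE_A": [-0.2, -0.8, -1.2, -1.8],  # down-regulation rises with inhibition
    "GENE_B": [0.5, -0.1, 0.3, 0.0],     # only a weak relationship
}
hits = candidate_essential_genes(expression_change, growth_inhibition)
```

In this toy case only `GENE_A` passes the cut-off; a real analysis would run the test per cell line across many compounds and correct for multiple testing.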
Subjects
Essential Genes, Genomics, Neoplasms, Gene Expression, Genomics/methods, Humans, Neoplasms/genetics
ABSTRACT
In silico network-based methods have shown promising results in the field of drug development. Yet most networks used in previous research have not included context information, even though biological associations actually occur in specific contexts. Here, we reconstruct an anatomical context-specific network by assigning contexts to biological associations using protein expression data and scientific literature. Furthermore, we employ the context-specific network to analyze drug effects with a proximity measure between drug targets and diseases. Unlike previous context-specific networks, our network includes intercellular associations and phenomic-level entities such as biological processes in order to represent the human body. We observe that performance in inferring drug-disease associations increases when context information and phenomic-level entities are added. In particular, hypertension, a disease related to multiple organs and associated with several phenomic-level entities, is analyzed in detail to investigate how our network facilitates the inference of drug-disease associations. Our results indicate that the inclusion of context information, intercellular associations, and phenomic-level entities can contribute to better prediction of drug-disease associations and provide detailed insight into how drugs affect diseases in the human body.
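As a rough illustration of the idea, the sketch below restricts a network to one anatomical context and then scores a drug by the mean shortest-path distance from each target to its nearest disease node. The abstract does not specify the proximity measure, so this "closest" distance is an assumed stand-in, and the edge format, node names, and function names are all hypothetical.

```python
from collections import deque


def context_subgraph(edges, context):
    """Build an undirected adjacency map from edges annotated with the given context."""
    graph = {}
    for u, v, contexts in edges:
        if context in contexts:
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    return graph


def shortest_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_proximity(graph, targets, disease_nodes):
    """Mean distance from each drug target to its nearest disease node.

    Returns infinity if any target cannot reach a disease node in this context.
    """
    total = 0
    for target in targets:
        dist = shortest_distances(graph, target)
        reachable = [dist[d] for d in disease_nodes if d in dist]
        if not reachable:
            return float("inf")
        total += min(reachable)
    return total / len(targets)


# Toy network: (node, node, set of anatomical contexts where the association holds).
edges = [
    ("T1", "P1", {"kidney", "heart"}),
    ("P1", "D1", {"kidney"}),
    ("T2", "D1", {"kidney"}),
    ("T1", "D1", {"liver"}),
]
kidney_proximity = closest_proximity(context_subgraph(edges, "kidney"), ["T1", "T2"], ["D1"])
liver_proximity = closest_proximity(context_subgraph(edges, "liver"), ["T1", "T2"], ["D1"])
```

The same drug receives different scores in different contexts, because associations that do not hold in a given tissue are excluded before distances are computed.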