ABSTRACT
Tumors are complex masses composed of malignant and non-malignant cells. Variation in tumor purity (proportion of cancer cells in a sample) can both confound integrative analysis and enable studies of tumor heterogeneity. Here we developed PUREE, which uses a weakly supervised learning approach to infer tumor purity from a tumor gene expression profile. PUREE was trained on gene expression data and genomic consensus purity estimates from 7864 solid tumor samples. PUREE predicted purity with high accuracy across distinct solid tumor types and generalized to tumor samples from unseen tumor types and cohorts. Gene features of PUREE were further validated using single-cell RNA-seq data from distinct tumor types. In a comprehensive benchmark, PUREE outperformed existing transcriptome-based purity estimation approaches. Overall, PUREE is a highly accurate and versatile method for estimating tumor purity and interrogating tumor heterogeneity from bulk tumor gene expression data, which can complement genomics-based approaches or be used in settings where genomic data is unavailable.
Subjects
Gene Expression Profiling, Neoplasms, Humans, Gene Expression Profiling/methods, Neoplasms/genetics, Transcriptome, Genomics
ABSTRACT
Angiosarcomas are rare, clinically aggressive tumors with limited treatment options and a dismal prognosis. We analyzed angiosarcomas from 68 patients, integrating information from multiomic sequencing, NanoString immuno-oncology profiling, and multiplex immunohistochemistry and immunofluorescence for tumor-infiltrating immune cells. Through whole-genome sequencing (n = 18), 50% of the cutaneous head and neck angiosarcomas exhibited higher tumor mutation burden (TMB) and UV mutational signatures; others were mutationally quiet and non-UV driven. NanoString profiling revealed 3 distinct patient clusters represented by lack (clusters 1 and 2) or enrichment (cluster 3) of immune-related signaling and immune cells. Neutrophils (CD15+), macrophages (CD68+), cytotoxic T cells (CD8+), Tregs (FOXP3+), and PD-L1+ cells were enriched in cluster 3 relative to clusters 2 and 1. Likewise, tumor inflammation signature (TIS) scores were highest in cluster 3 (7.54 vs. 6.71 vs. 5.75, respectively; P < 0.0001). Head and neck angiosarcomas were predominant in clusters 1 and 3, providing the rationale for checkpoint immunotherapy, especially in the latter subgroup with both high TMB and TIS scores. Cluster 2 was enriched for secondary angiosarcomas and exhibited higher expression of DNMT1, BRD3/4, MYC, HRAS, and PDGFRB, in keeping with the upregulation of epigenetic and oncogenic signaling pathways amenable to targeted therapies. Molecular and immunological dissection of angiosarcomas may provide insights into opportunities for precision medicine.
Subjects
Hemangiosarcoma, Neoplasm Proteins, Tumor Cell Line, Female, Hemangiosarcoma/classification, Hemangiosarcoma/genetics, Hemangiosarcoma/immunology, Humans, Inflammation/classification, Inflammation/genetics, Inflammation/immunology, Male, Neoplasm Proteins/genetics, Neoplasm Proteins/immunology
ABSTRACT
Predicting motif pairs from a set of protein sequences based on protein-protein interaction data is an important but difficult computational problem. Tan et al. proposed a solution to this problem. However, the scoring function used in their approach (based on chi-squared testing) is not adequate, and the approach is not scalable: it may take days to process a set of 5000 protein sequences with about 20,000 interactions. Later, Leung et al. proposed an improved scoring function and faster algorithms for the same problem, but their model is complicated. The exact value of its scoring function is not easy to compute, so an estimated value is used in practice. In this paper, we derive a better model to capture the significance of a given motif pair based on a clustering notion. We develop a fast heuristic algorithm to solve the problem. The algorithm is able to locate the correct motif pair in the yeast data set in about 45 minutes for 5000 protein sequences and 20,000 interactions. Moreover, we derive a lower bound on the p-value that a motif pair must reach in order to be distinguishable from random motif pairs. The lower bound has been verified using simulated data sets. AVAILABILITY: http://alse.cs.hku.hk/motif_pair.
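To make the chi-squared scoring mentioned in the abstract above concrete, here is a minimal sketch of how a candidate motif pair could be scored against interaction data. It is an illustration only, not the scoring function of Tan et al., Leung et al., or this paper. The 2x2 table layout (protein pairs that contain the motif pair vs. pairs that interact) and the names `chi_squared_2x2` and `motif_pair_score` are assumptions, and motifs are treated as plain substrings.

```python
from itertools import combinations


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero marginal means the table carries no association signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def motif_pair_score(proteins, interactions, motif_a, motif_b):
    """Score a motif pair by association with interaction (hypothetical layout).

    proteins: dict mapping protein name -> sequence string
    interactions: iterable of (name, name) pairs, treated as unordered
    """
    interacting = {frozenset(pair) for pair in interactions}
    a = b = c = d = 0
    # Enumerate every unordered protein pair; this is quadratic in the
    # number of proteins and is meant for small examples only.
    for p, q in combinations(sorted(proteins), 2):
        has_motifs = (
            (motif_a in proteins[p] and motif_b in proteins[q])
            or (motif_a in proteins[q] and motif_b in proteins[p])
        )
        interacts = frozenset((p, q)) in interacting
        if has_motifs and interacts:
            a += 1
        elif has_motifs:
            b += 1
        elif interacts:
            c += 1
        else:
            d += 1
    return chi_squared_2x2(a, b, c, d)


# Toy data (invented for illustration, not from the paper).
toy_proteins = {
    "P1": "AAKLM", "P2": "GGRST", "P3": "AAKQQ",
    "P4": "TTRSTV", "P5": "CCCCC", "P6": "DDDDD",
}
toy_interactions = [("P1", "P2"), ("P3", "P4"), ("P1", "P4"), ("P2", "P3")]
toy_score = motif_pair_score(toy_proteins, toy_interactions, "AAK", "RST")
```

A larger statistic indicates that protein pairs carrying the two motifs interact more often than would be expected if motif content and interaction were independent.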
Subjects
Algorithms, Cluster Analysis, Automated Pattern Recognition/methods, Protein Interaction Mapping/methods, Proteins/chemistry, Proteins/metabolism, Protein Sequence Analysis/methods, Amino Acid Motifs, Amino Acid Sequence, Binding Sites, Molecular Sequence Data, Protein Binding, Tertiary Protein Structure
ABSTRACT
Effective development of host cells for therapeutic protein production is hampered by the poor characterization of cellular transfection. Here, we employed a multi-omics-based systems biotechnology approach to elucidate the genotypic and phenotypic differences between a wild-type and recombinant antibody-producing Chinese hamster ovary (CHO) cell line. At the genomic level, we observed extensive rearrangements in specific targeted loci linked to transgene integration sites. Transcriptional re-wiring of DNA damage repair and cellular metabolism in the antibody producer, via changes in gene copy numbers, was also detected. Subsequent integration of transcriptomic data with a genome-scale metabolic model showed a substantial increase in energy metabolism in the antibody producer. Metabolomics, lipidomics, and glycomics analyses revealed an elevation in long-chain lipid species, potentially associated with protein transport and secretion requirements, and a surprising stability of N-glycosylation profiles between both cell lines. Overall, the proposed knowledge-based systems biotechnology framework can further accelerate mammalian cell-line engineering in a targeted manner.
Subjects
CHO Cells/metabolism, Recombinant Proteins/biosynthesis, Systems Biology/methods, Animals, Biotechnology/methods, Cricetulus, Gene Dosage/genetics, Genome, Glycomics, Glycosylation, Mammals/genetics, Metabolomics, Recombinant Proteins/metabolism, Transcriptome, Transfection/methods, Transgenes/genetics
ABSTRACT
Hepatocellular carcinoma (HCC) has one of the poorest survival rates among cancers. Using multi-regional sampling of nine resected HCC with different aetiologies, here we construct phylogenetic relationships of these sectors, showing diverse levels of genetic sharing, spanning early to late diversification. Unlike the variegated pattern found in colorectal cancers, a large proportion of HCC display a clear isolation-by-distance pattern where spatially closer sectors are genetically more similar. Two resected intra-hepatic metastases showed genetic divergence occurring before and after primary tumour diversification, respectively. Metastatic tumours had much higher variability than their primary tumours, suggesting that intra-hepatic metastasis is accompanied by rapid diversification at the distant location. The presence of co-existing mutations offers the possibility of drug repositioning for HCC treatment. Taken together, these insights into intra-tumour heterogeneity allow for a comprehensive understanding of the evolutionary trajectories of HCC and suggest novel avenues for personalized therapy.