ABSTRACT
Systematic studies of cancer genomes have provided unprecedented insights into the molecular nature of cancer. Using this information to guide the development and application of therapies in the clinic is challenging. Here, we report how cancer-driven alterations identified in 11,289 tumors from 29 tissues (integrating somatic mutations, copy number alterations, DNA methylation, and gene expression) can be mapped onto 1,001 molecularly annotated human cancer cell lines and correlated with sensitivity to 265 drugs. We find that cell lines faithfully recapitulate oncogenic alterations identified in tumors, find that many of these associate with drug sensitivity/resistance, and highlight the importance of tissue lineage in mediating drug response. Logic-based modeling uncovers combinations of alterations that sensitize to drugs, while machine learning demonstrates the relative importance of different data types in predicting drug response. Our analysis and datasets are rich resources to link genotypes with cellular phenotypes and to identify therapeutic options for selected cancer sub-populations.
Subjects
Antineoplastic Agents/therapeutic use, Neoplasms/drug therapy, Analysis of Variance, Tumor Cell Line, DNA Methylation, Antineoplastic Drug Resistance/genetics, Gene Dosage, Humans, Genetic Models, Mutation, Neoplasms/genetics, Oncogenes, Precision Medicine
ABSTRACT
Understanding how a subset of expressed genes dictates cellular phenotype is a considerable challenge owing to the large numbers of molecules involved, their combinatorics and the plethora of cellular behaviours that they determine [1,2]. Here we reduced this complexity by focusing on cellular organization, a key readout and driver of cell behaviour [3,4], at the level of major cellular structures that represent distinct organelles and functional machines, and generated the WTC-11 hiPSC Single-Cell Image Dataset v1, which contains more than 200,000 live cells in 3D, spanning 25 key cellular structures. The scale and quality of this dataset permitted the creation of a generalizable analysis framework to convert raw image data of cells and their structures into dimensionally reduced, quantitative measurements that can be interpreted by humans, and to facilitate data exploration. This framework embraces the vast cell-to-cell variability that is observed within a normal population, facilitates the integration of cell-by-cell structural data and allows quantitative analyses of distinct, separable aspects of organization within and across different cell populations. We found that the integrated intracellular organization of interphase cells was robust to the wide range of variation in cell shape in the population; that the average locations of some structures became polarized in cells at the edges of colonies while maintaining the 'wiring' of their interactions with other structures; and that, by contrast, changes in the location of structures during early mitotic reorganization were accompanied by changes in their wiring.
Subjects
Induced Pluripotent Stem Cells, Intracellular Space, Humans, Induced Pluripotent Stem Cells/cytology, Single-Cell Analysis, Datasets as Topic, Interphase, Cell Shape, Mitosis, Cell Polarity, Cell Survival
ABSTRACT
We introduce a framework for end-to-end integrative modeling of 3D single-cell multi-channel fluorescent image data of diverse subcellular structures. We employ stacked conditional β-variational autoencoders to first learn a latent representation of cell morphology, and then learn a latent representation of subcellular structure localization which is conditioned on the learned cell morphology. Our model is flexible and can be trained on images of arbitrary subcellular structures and at varying degrees of sparsity and reconstruction fidelity. We train our full model on 3D cell image data and explore design trade-offs in the 2D setting. Once trained, our model can be used to predict plausible locations of structures in cells where these structures were not imaged. The trained model can also be used to quantify the variation in the location of subcellular structures by generating plausible instantiations of each structure in arbitrary cell geometries. We apply our trained model to a small drug perturbation screen to demonstrate its applicability to new data. We show how the latent representations of drugged cells differ from unperturbed cells as expected by on-target effects of the drugs.
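As a reading aid, the training objective of a conditional β-variational autoencoder is commonly written in the form below. This is the generic textbook form, not an equation taken from the paper, and every symbol is an assumption introduced for illustration: x is an input image, c is the conditioning input (for example, the learned cell morphology), z is the latent vector, q_φ is the encoder, p_θ is the decoder, and p(z) is assumed to be a standard normal prior.

```latex
\mathcal{L}(\theta, \phi; x, c)
  = \mathbb{E}_{q_\phi(z \mid x, c)}\bigl[\log p_\theta(x \mid z, c)\bigr]
  - \beta \, D_{\mathrm{KL}}\bigl(q_\phi(z \mid x, c) \,\|\, p(z)\bigr)
```

In this form, a larger β penalizes the latent code more strongly and tends to give sparser latent usage, while a smaller β favors reconstruction fidelity, which is one way to read the trade-off between sparsity and fidelity mentioned in the abstract.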
Subjects
Cell Nucleus/physiology, Cell Shape/physiology, Induced Pluripotent Stem Cells/cytology, Intracellular Space, Biological Models, Cultured Cells, Computational Biology, Humans, Three-Dimensional Imaging, Intracellular Space/chemistry, Intracellular Space/metabolism, Intracellular Space/physiology, Fluorescence Microscopy, Single-Cell Analysis
ABSTRACT
Preterm birth (PTB) complications are the leading cause of long-term morbidity and mortality in children. By using whole blood samples, we integrated whole-genome sequencing (WGS), RNA sequencing (RNA-seq), and DNA methylation data for 270 PTB and 521 control families. We analyzed this combined dataset to identify genomic variants associated with PTB and performed secondary analyses to identify variants associated with very early PTB (VEPTB) as well as other subcategories of disease that may contribute to PTB. We identified differentially expressed genes (DEGs) and methylated genomic loci and performed expression and methylation quantitative trait loci analyses to link genomic variants to these expression and methylation changes. We performed enrichment tests to identify overlaps between new and known PTB candidate gene systems. We identified 160 significant genomic variants associated with PTB-related phenotypes. The most significant variants, DEGs, and differentially methylated loci were associated with VEPTB. Integration of all data types identified a set of 72 candidate biomarker genes for VEPTB, encompassing new genes and those previously associated with PTB. Notably, PTB-associated genes RAB31 and RBPJ were identified by all three data types (WGS, RNA-seq, and methylation). Pathways associated with VEPTB include EGFR and prolactin signaling pathways, inflammation- and immunity-related pathways, chemokine signaling, IFN-γ signaling, and Notch1 signaling. Progress in identifying molecular components of a complex disease is aided by integrated analyses of multiple molecular data types and clinical data. With these data, and by stratifying PTB by subphenotype, we have identified associations between VEPTB and the underlying biology.
Subjects
Genetic Predisposition to Disease/genetics, Premature Birth/genetics, DNA Methylation/genetics, Female, Genomics/methods, Humans, Newborn Infant, Male, Phenotype, Single Nucleotide Polymorphism/genetics, Signal Transduction/genetics, Whole Genome Sequencing/methods
ABSTRACT
RECQ5 (RECQL5) is one of several human helicases that dissociate RAD51-DNA filaments. The gene that encodes RECQ5 is frequently amplified in human tumors, but it is not known whether amplification correlates with increased gene expression, or how increased RECQ5 levels affect DNA repair at nicks and double-strand breaks. Here, we address these questions. We show that RECQ5 gene amplification correlates with increased gene expression in human tumors, by in silico analysis of over 9000 individual tumors representing 32 tumor types in the TCGA dataset. We demonstrate that, at double-strand breaks, increased RECQ5 levels inhibited canonical homology-directed repair (HDR) by double-stranded DNA donors, phenocopying the effect of BRCA deficiency. Conversely, at nicks, increased RECQ5 levels stimulated 'alternative' HDR by single-stranded DNA donors, which is normally suppressed by RAD51; this was accompanied by stimulation of mutagenic end-joining. Even modest changes (2-fold) in RECQ5 levels caused significant dysregulation of repair, especially HDR. These results suggest that in some tumors, RECQ5 gene amplification may have profound consequences for genomic instability.
Subjects
Genomic Instability/genetics, Neoplasms/genetics, Rad51 Recombinase/genetics, RecQ Helicases/genetics, Computer Simulation, Double-Stranded DNA Breaks, DNA End-Joining Repair/genetics, DNA Repair/genetics, Gene Amplification/genetics, Neoplastic Gene Expression Regulation, Humans, Mutagenesis, Neoplasms/pathology, Recombinational DNA Repair/genetics, Signal Transduction/genetics
ABSTRACT
Cancer researchers have long recognized that somatic mutations are not uniformly distributed within genes. However, most approaches for identifying cancer mutations focus on either the entire-gene or single amino-acid level. We have bridged these two methodologies with a multiscale mutation clustering algorithm that identifies variable length mutation clusters in cancer genes. We ran our algorithm on 539 genes using the combined mutation data in 23 cancer types from The Cancer Genome Atlas (TCGA) and identified 1295 mutation clusters. The resulting mutation clusters cover a wide range of scales and often overlap with many kinds of protein features including structured domains, phosphorylation sites, and known single nucleotide variants. We statistically associated these multiscale clusters with gene expression and drug response data to illuminate the functional and clinical consequences of mutations in our clusters. Interestingly, we find multiple clusters within individual genes that have differential functional associations: these include PTEN, FUBP1, and CDH1. This methodology has potential implications in identifying protein regions for drug targets, understanding the biological underpinnings of cancer, and personalizing cancer treatments. Toward this end, we have made the mutation clusters and the clustering algorithm available to the public. Clusters and pathway associations can be interactively browsed at m2c.systemsbiology.net. The multiscale mutation clustering algorithm is available at https://github.com/IlyaLab/M2C.
Subjects
Algorithms, DNA Mutational Analysis/methods, Neoplastic Gene Expression Regulation/genetics, Multigene Family/genetics, Mutation/genetics, Neoplasms/genetics, Signal Transduction/genetics, Chromosome Mapping, Neoplasm DNA/genetics, Neoplasm Genes/genetics, High-Throughput Nucleotide Sequencing, Humans
ABSTRACT
Genomic information is encoded on a wide range of distance scales, ranging from tens of bases to megabases. We developed a multiscale framework to analyze and visualize the information content of genomic signals. Different types of signals, such as G+C content or DNA methylation, are characterized by distinct patterns of signal enrichment or depletion across scales spanning several orders of magnitude. These patterns are associated with a variety of genomic annotations. By integrating the information across all scales, we demonstrated improved prediction of gene expression from polymerase II chromatin immunoprecipitation sequencing (ChIP-seq) measurements, and we observed that gene expression differences in colorectal cancer are related to methylation patterns that extend beyond the single-gene scale. Our software is available at https://github.com/tknijnen/msr/.
Subjects
Genomics/methods, Software, Transcriptome, Animals, DNA/chemistry, DNA Methylation, Humans, DNA Sequence Analysis
ABSTRACT
MOTIVATION: Combining P-values from multiple statistical tests is a common exercise in bioinformatics. However, this procedure is non-trivial for dependent P-values. Here, we discuss an empirical adaptation of Brown's method (an extension of Fisher's method) for combining dependent P-values which is appropriate for the large and correlated datasets found in high-throughput biology. RESULTS: We show that the Empirical Brown's method (EBM) outperforms Fisher's method as well as alternative approaches for combining dependent P-values using both noisy simulated data and gene expression data from The Cancer Genome Atlas. AVAILABILITY AND IMPLEMENTATION: The Empirical Brown's method is available in Python, R, and MATLAB and can be obtained from https://github.com/IlyaLab/CombiningDependentPvaluesUsingEBM. The R code is also available as a Bioconductor package from https://www.bioconductor.org/packages/devel/bioc/html/EmpiricalBrownsMethod.html. CONTACT: Theo.Knijnenburg@systemsbiology.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
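For orientation, the baseline that Brown's method extends can be sketched in a few lines. The block below implements Fisher's method only, which assumes independent P-values; it is not the EBM implementation from the repository above, and the function names `chi2_sf_even` and `fisher_combine` are hypothetical. Fisher's statistic is -2 times the sum of the log P-values, which follows a chi-square distribution with 2k degrees of freedom under independence. Because 2k is always even, the survival function has a closed form and no external library is needed.

```python
import math


def chi2_sf_even(x, df):
    """Survival function P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # current series term (x/2)^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Combine independent P-values with Fisher's method (hypothetical helper)."""
    if not pvalues or any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even(statistic, 2 * len(pvalues))
```

When the P-values are positively correlated, this calculation tends to overstate significance. Brown's method addresses that by matching the mean and variance of the same statistic to a scaled chi-square distribution, and the empirical variant described in the abstract estimates the required covariances from the data.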
Subjects
High-Throughput Screening Assays, Software, Algorithms, Statistical Data Interpretation, Humans, Neoplasms
ABSTRACT
Regulation of gene expression involves the orchestrated interaction of a large number of proteins with transcriptional regulatory elements in the context of chromatin. Our understanding of gene regulation is limited by the lack of a protein measurement technology that can systematically detect and quantify the ensemble of proteins associated with the transcriptional regulatory elements of specific genes. Here, we introduce a set of selected reaction monitoring (SRM) assays for the systematic measurement of 464 proteins with known or suspected roles in transcriptional regulation at RNA polymerase II transcribed promoters in Saccharomyces cerevisiae. Measurement of these proteins in nuclear extracts by SRM permitted the reproducible quantification of 42% of the proteins over a wide range of abundances. By deploying the assay to systematically identify DNA binding transcriptional regulators that interact with the environmentally regulated FLO11 promoter in cell extracts, we identified 15 regulators that bound specifically to distinct regions along ~600 bp of the regulatory sequence. Importantly, the dataset includes a number of regulators that have been shown to either control FLO11 expression or localize to these regulatory regions in vivo. We further validated the utility of the approach by demonstrating that two of the SRM-identified factors, Mot3 and Azf1, are required for proper FLO11 expression. These results demonstrate the utility of SRM-based targeted proteomics to guide the identification of gene-specific transcriptional regulators.
Subjects
Fungal DNA/metabolism, Fungal Gene Expression Regulation, Genetic Association Studies, Mass Spectrometry/methods, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/genetics, Transcription Factors/metabolism, Cell Nucleus/metabolism, DNA-Binding Proteins/metabolism, Genetic Promoter Regions/genetics, Protein Binding/genetics, Proteome/genetics, Proteome/metabolism, Repressor Proteins/metabolism, Reproducibility of Results, Saccharomyces cerevisiae/growth & development, Trans-Activators/metabolism
ABSTRACT
Phenylethanol has a characteristic rose-like aroma that makes it a popular ingredient in foods, beverages and cosmetics. Microbial production of phenylethanol currently relies on whole-cell bioconversion of phenylalanine with yeasts that harbour an Ehrlich pathway for phenylalanine catabolism. Complete biosynthesis of phenylethanol from a cheap carbon source, such as glucose, provides an economically attractive alternative to phenylalanine bioconversion. In this study, synthetic genetic array (SGA) screening was applied to identify genes involved in regulation of phenylethanol synthesis in Saccharomyces cerevisiae. The screen focused on transcriptional regulation of ARO10, which encodes the major decarboxylase involved in conversion of phenylpyruvate to phenylethanol. A deletion in ARO8, which encodes an aromatic amino acid transaminase, was found to underlie the transcriptional upregulation of ARO10 during growth with ammonium sulphate as the sole nitrogen source. Physiological characterization revealed that the aro8Δ mutation led to substantial changes in the absolute and relative intracellular concentrations of amino acids. Moreover, deletion of ARO8 led to de novo production of phenylethanol during growth on a glucose synthetic medium with ammonium as the sole nitrogen source. The aro8Δ mutation also stimulated phenylethanol production when combined with other, previously documented, mutations that deregulate aromatic amino acid biosynthesis in S. cerevisiae. The resulting engineered S. cerevisiae strain produced >3 mM phenylethanol from glucose during growth on a simple synthetic medium. The strong impact of a transaminase deletion on intracellular amino acid concentrations opens new possibilities for yeast-based production of amino acid-derived products.
Subjects
Gene Deletion, Glucose/metabolism, Phenylethyl Alcohol/metabolism, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae/enzymology, Transaminases/genetics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/growth & development, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/metabolism, Transaminases/metabolism
ABSTRACT
BACKGROUND: A central challenge in cancer research is to create models that bridge the gap between the molecular level on which interventions can be designed and the cellular and tissue levels on which the disease phenotypes are manifested. This study was undertaken to construct such a model from functional annotations and explore its use when integrated with large-scale cancer genomics data. METHODS: We created a map that connects genes to cancer hallmarks via signaling pathways. We projected gene mutation and focal copy number data from various cancer types onto this map. We performed statistical analyses to uncover mutually exclusive and co-occurring oncogenic aberrations within this topology. RESULTS: Our analysis showed that although the genetic fingerprint of tumor types could be very different, there was less variation at the level of hallmarks, consistent with the idea that different genetic alterations have similar functional outcomes. Additionally, we showed how the multilevel map could help to clarify the role of infrequently mutated genes, and we demonstrated that mutually exclusive gene mutations were more prevalent in pathways, whereas many co-occurring gene mutations were associated with hallmark characteristics. CONCLUSIONS: Overlaying this map with gene mutation and focal copy number data from various cancer types makes it possible to investigate the similarities and differences between tumor samples systematically at the levels of not only genes but also pathways and hallmarks.
Subjects
Genomics, Mutation, Neoplasms, Neoplastic Processes, Humans, Signal Transduction
ABSTRACT
Cells exposed to stimuli exhibit a wide range of responses ensuring phenotypic variability across the population. Such single cell behavior is often examined by flow cytometry; however, gating procedures typically employed to select a small subpopulation of cells with similar morphological characteristics make it difficult, even impossible, to quantitatively compare cells across a large variety of experimental conditions because these conditions can lead to profound morphological variations. To overcome these limitations, we developed a regression approach to correct for variability in fluorescence intensity due to differences in cell size and granularity without discarding any of the cells, which gating ipso facto does. This approach enables quantitative studies of cellular heterogeneity and transcriptional noise in high-throughput experiments involving thousands of samples. We used this approach to analyze a library of yeast knockout strains and reveal genes required for the population to establish a bimodal response to oleic acid induction. We identify a group of epigenetic regulators and nucleoporins that, by maintaining an 'unresponsive population,' may provide the population with the advantage of diversified bet hedging.
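The general idea of regressing out morphology instead of gating can be illustrated as follows. This is a minimal sketch under stated assumptions, not the authors' method: it assumes an ordinary least-squares linear model with an intercept, that forward scatter and side scatter serve as proxies for cell size and granularity, and that the fluorescence values have already been put on a suitable scale (for example, log-transformed). The function name `regress_out` is hypothetical. The residuals are the part of the fluorescence not explained by morphology, and every cell is kept.

```python
def regress_out(y, predictors):
    """Return residuals of y after least-squares regression on the predictors.

    y          : list of per-cell fluorescence values (assumed pre-scaled)
    predictors : list of lists, one per covariate (e.g. forward scatter,
                 side scatter), each the same length as y
    An intercept column is added automatically.
    """
    n = len(y)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])

    # Normal equations: (X^T X) beta = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(rows[m][i] * y[m] for m in range(n)) for i in range(k)]

    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        a[col], a[piv] = a[piv], a[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for j in range(col, k):
                a[r][j] -= f * a[col][j]
            c[r] -= f * c[col]

    # Back substitution
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(a[r][j] * beta[j] for j in range(r + 1, k))
        beta[r] = (c[r] - s) / a[r][r]

    # Residual = observed value minus fitted value, one per cell
    return [y[m] - sum(b * v for b, v in zip(beta, rows[m])) for m in range(n)]
```

A call such as `regress_out(log_gfp, [fsc, ssc])` would return one corrected value per cell, so distributions can be compared across conditions without discarding any cells.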
Subjects
Epigenomics, Flow Cytometry, High-Throughput Screening Assays, Statistical Models, Saccharomyces cerevisiae/cytology, Saccharomyces cerevisiae/genetics, Genetic Transcription/drug effects, Cell Size, Flow Cytometry/methods, Flow Cytometry/statistics & numerical data, Fluorescence, Genetic Variation, Glucose/metabolism, Glucose/pharmacology, Green Fluorescent Proteins/analysis, Mutation, Nuclear Pore Complex Proteins/genetics, Nuclear Pore Complex Proteins/metabolism, Oleic Acid/metabolism, Oleic Acid/pharmacology, Genetically Modified Organisms/genetics, Genetically Modified Organisms/metabolism, Saccharomyces cerevisiae/drug effects, Saccharomyces cerevisiae/metabolism, Telomere-Binding Proteins/genetics, Telomere-Binding Proteins/metabolism
ABSTRACT
Cells are complex systems in which many functions are performed by different genetically defined and encoded functional modules. To systematically understand how these modules respond to drug or genetic perturbations, we develop a functional module states framework. Using this framework, we (1) define the drug-induced transcriptional state space for breast cancer cell lines using large public gene expression datasets and reveal that the transcriptional states are associated with drug concentration and drug targets, (2) identify potential targetable vulnerabilities through integrative analysis of transcriptional states after drug treatment and gene knockdown-associated cancer dependency, and (3) use functional module states to predict transcriptional state-dependent drug sensitivity and build prediction models for drug response. This approach achieves prediction performance similar to that of approaches using high-dimensional gene expression values, with the added advantage of more clearly revealing biologically relevant transcriptional states and key regulators.
Subjects
Breast Neoplasms, Gene Expression Profiling/methods, Machine Learning, Molecular Targeted Therapy, Transcriptome, Female, Humans
ABSTRACT
BACKGROUND: In computational biology, permutation tests have become a widely used tool to assess the statistical significance of an event under investigation. However, the common way of computing the P-value, which expresses the statistical significance, requires a very large number of permutations when small (and thus interesting) P-values are to be accurately estimated. This is computationally expensive and often infeasible. Recently, we proposed an alternative estimator, which requires far fewer permutations compared to the standard empirical approach while still reliably estimating small P-values. RESULTS: The proposed P-value estimator has been enriched with additional functionalities and is made available to the general community through a public website and web service, called EPEPT. This means that the EPEPT routines can be accessed not only via a website, but also programmatically using any programming language that can interact with the web. Examples of web service clients in multiple programming languages can be downloaded. Additionally, EPEPT accepts data of various common experiment types used in computational biology. For these experiment types EPEPT first computes the permutation values and then performs the P-value estimation. Finally, the source code of EPEPT can be downloaded. CONCLUSIONS: Different types of users, such as biologists, bioinformaticians and software engineers, can use the method in an appropriate and simple way.
Subjects
Computational Biology/methods, Software, Humans, Internet, Programming Languages, Regression Analysis
ABSTRACT
MOTIVATION: Histone acetylation (HAc) is associated with open chromatin, and HAc has been shown to facilitate transcription factor (TF) binding in mammalian cells. In the innate immune system context, epigenetic studies strongly implicate HAc in the transcriptional response of activated macrophages. We hypothesized that using data from large-scale sequencing of a HAc chromatin immunoprecipitation assay (ChIP-Seq) would improve the performance of computational prediction of binding locations of TFs mediating the response to a signaling event, namely, macrophage activation. RESULTS: We tested this hypothesis using a multi-evidence approach for predicting binding sites. As a training/test dataset, we used ChIP-Seq-derived TF binding site locations for five TFs in activated murine macrophages. Our model combined TF binding site motif scanning with evidence from sequence-based sources and from HAc ChIP-Seq data, using a weighted sum of thresholded scores. We find that using HAc data significantly improves the performance of motif-based TF binding site prediction. Furthermore, we find that within regions of high HAc, local minima of the HAc ChIP-Seq signal are particularly strongly correlated with TF binding locations. Our model, using motif scanning and HAc local minima, improves the sensitivity for TF binding site prediction by approximately 50% over a model based on motif scanning alone, at a false positive rate cutoff of 0.01. AVAILABILITY: The data and software source code for model training and validation are freely available online at http://magnet.systemsbiology.net/hac.
Subjects
Chromatin Immunoprecipitation/methods, Macrophage Activation, Transcription Factors/metabolism, Acetylation, Animals, Binding Sites, Genome, Histones/metabolism, Mice, Biological Models, Software
ABSTRACT
Cellular and molecular aberrations contribute to the disparity of human cancer incidence and etiology between ancestry groups. Multiomics profiling in The Cancer Genome Atlas (TCGA) allows for querying of the molecular underpinnings of ancestry-specific discrepancies in human cancer. Here, we provide a protocol for integrative associative analysis of ancestry with molecular correlates, including somatic mutations, DNA methylation, mRNA transcription, miRNA transcription, and pathway activity, using TCGA data. This protocol can be generalized to analyze other cancer cohorts and human diseases. For complete details on the use and execution of this protocol, please refer to Carrot-Zhang et al. (2020).
Subjects
Genomics/methods, Genetic Models, Neoplasms/genetics, DNA Methylation/genetics, Genetic Databases, Female, Humans, Male, MicroRNAs/genetics, Genetic Transcription/genetics
ABSTRACT
Although some cell types may be defined anatomically or by physiological function, a rigorous definition of cell state remains elusive. Here, we develop a quantitative, imaging-based platform for the systematic and automated classification of subcellular organization in single cells. We use this platform to quantify subcellular organization and gene expression in >30,000 individual human induced pluripotent stem cell-derived cardiomyocytes, producing a publicly available dataset that describes the population distributions of local and global sarcomere organization, mRNA abundance, and correlations between these traits. While the mRNA abundance of some phenotypically important genes correlates with subcellular organization (e.g., the beta-myosin heavy chain, MYH7), these two cellular metrics are heterogeneous and often uncorrelated, which suggests that gene expression alone is not sufficient to classify cell states. Instead, we posit that cell state should be defined by observing full distributions of quantitative, multidimensional traits in single cells that also account for space, time, and function.
Subjects
Induced Pluripotent Stem Cells, Cell Differentiation/genetics, Humans, Cardiac Myocytes/metabolism, Transcriptome/genetics
ABSTRACT
MOTIVATION: Permutation tests have become a standard tool to assess the statistical significance of an event under investigation. The statistical significance, as expressed in a P-value, is calculated as the fraction of permutation values that are at least as extreme as the original statistic, which was derived from non-permuted data. This empirical method directly couples both the minimal obtainable P-value and the resolution of the P-value to the number of permutations. Thereby, it imposes upon itself the need for a very large number of permutations when small P-values are to be accurately estimated. This is computationally expensive and often infeasible. RESULTS: A method of computing P-values based on tail approximation is presented. The tail of the distribution of permutation values is approximated by a generalized Pareto distribution. A good fit and thus accurate P-value estimates can be obtained with a drastically reduced number of permutations when compared with the standard empirical way of computing P-values. AVAILABILITY: The Matlab code can be obtained from the corresponding author on request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
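The contrast between the two estimators can be made concrete with a short sketch. The first function is the standard empirical fraction described above, whose smallest nonzero value is 1/N for N permutations. The second approximates the tail with a generalized Pareto distribution (GPD). Several details are assumptions made for illustration and should not be read as the paper's procedure: the GPD parameters are fitted by the method of moments (a simple stand-in for whatever estimator the authors used), the threshold is placed midway between the 250th and 251st largest permutation values, and the names `empirical_pvalue`, `gpd_tail_pvalue` and `n_exceed` are hypothetical.

```python
import math


def empirical_pvalue(observed, permuted):
    """Fraction of permutation values at least as extreme as the observed one."""
    return sum(1 for v in permuted if v >= observed) / len(permuted)


def gpd_tail_pvalue(observed, permuted, n_exceed=250):
    """Tail-approximated P-value using a method-of-moments GPD fit (a sketch)."""
    ordered = sorted(permuted, reverse=True)
    n = len(ordered)
    if not 1 < n_exceed < n:
        raise ValueError("n_exceed must be between 2 and len(permuted) - 1")

    # Assumed threshold: midway between the last tail value and the next one
    threshold = 0.5 * (ordered[n_exceed - 1] + ordered[n_exceed])
    if observed <= threshold:
        # Not in the tail: the empirical estimate is adequate here
        return empirical_pvalue(observed, permuted)

    # Exceedances over the threshold and their first two sample moments
    excess = [v - threshold for v in ordered[:n_exceed]]
    mean = sum(excess) / n_exceed
    var = sum((e - mean) ** 2 for e in excess) / (n_exceed - 1)
    if var == 0.0:
        raise ValueError("tail values are all identical")

    # Method-of-moments estimates of the GPD shape and scale
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)

    # GPD survival function evaluated at the observed exceedance
    y = observed - threshold
    if abs(shape) < 1e-12:
        tail = math.exp(-y / scale)
    else:
        base = 1.0 + shape * y / scale
        tail = base ** (-1.0 / shape) if base > 0.0 else 0.0

    # Scale by the fraction of permutation values that lie in the tail
    return (n_exceed / n) * tail
```

With 1,000 permutation values and an observed statistic larger than all of them, `empirical_pvalue` can only return 0, whereas the tail fit extrapolates to a small positive estimate. The quality of that estimate depends on how well the GPD fits, which should be checked before relying on it.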
Subjects
Computational Biology/methods, Algorithms, Gene Expression Profiling/methods, Statistical Models
ABSTRACT
Cancer cells are adept at reprogramming energy metabolism, and the precise manifestation of this metabolic reprogramming exhibits heterogeneity across individuals (and from cell to cell). In this study, we analyzed the metabolic differences between interpersonal heterogeneous cancer phenotypes. We used divergence analysis on gene expression data of 1156 breast normal and tumor samples from The Cancer Genome Atlas (TCGA) and integrated this information with a genome-scale reconstruction of human metabolism to generate personalized, context-specific metabolic networks. Using this approach, we classified the samples into four distinct groups based on their metabolic profiles. Enrichment analysis of the subsystems indicated that amino acid metabolism, fatty acid oxidation, citric acid cycle, androgen and estrogen metabolism, and reactive oxygen species (ROS) detoxification distinguished these four groups. Additionally, we developed a workflow to identify potential drugs that can selectively target genes associated with the reactions of interest. MG-132 (a proteasome inhibitor) and OSU-03012 (a celecoxib derivative) were the top-ranking drugs identified from our analysis and known to have anti-tumor activity. Our approach has the potential to provide mechanistic insights into cancer-specific metabolic dependencies, ultimately enabling the identification of potential drug targets for each patient independently, contributing to a rational personalized medicine approach.
ABSTRACT
We evaluated ancestry effects on mutation rates, DNA methylation, and mRNA and miRNA expression among 10,678 patients across 33 cancer types from The Cancer Genome Atlas. We demonstrated that cancer subtypes and ancestry-related technical artifacts are important confounders that have been insufficiently accounted for. Once accounted for, ancestry-associated differences spanned all molecular features and hundreds of genes. Biologically significant differences were usually tissue specific but not specific to cancer. However, admixture and pathway analyses suggested some of these differences are causally related to cancer. Specific findings included increased FBXW7 mutations in patients of African origin, decreased VHL and PBRM1 mutations in renal cancer patients of African origin, and decreased immune activity in bladder cancer patients of East Asian origin.