1.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37587790

ABSTRACT

Precision medicine relies on the identification of robust disease and risk factor signatures from omics data. However, current knowledge-driven approaches may overlook novel or unexpected phenomena due to the inherent biases in biological knowledge. In this study, we present a data-driven signature discovery workflow for DNA methylation analysis utilizing network-coherent autoencoders (NCAEs) with biologically relevant latent embeddings. First, we explored the architecture space of autoencoders trained on a large-scale pan-tissue compendium (n = 75 272) of human epigenome-wide association studies. We observed the emergence of co-localized patterns in the deep autoencoder latent space representations that corresponded to biological network modules. We determined the NCAE configuration with the strongest co-localization and centrality signals in the human protein interactome. Leveraging the NCAE embeddings, we then trained interpretable deep neural networks for risk factor (aging, smoking) and disease (systemic lupus erythematosus) prediction and classification tasks. Remarkably, our NCAE embedding-based models outperformed existing predictors, revealing novel DNA methylation signatures enriched in gene sets and pathways associated with the studied condition in each case. Our data-driven biomarker discovery workflow provides a generally applicable pipeline to capture relevant risk factor and disease information. By surpassing the limitations of knowledge-driven methods, our approach enhances the understanding of complex epigenetic processes, facilitating the development of more effective diagnostic and therapeutic strategies.


Subjects
Algorithms , DNA Methylation , Humans , Neural Networks, Computer , Epigenesis, Genetic , Risk Factors
2.
Proc Natl Acad Sci U S A ; 120(6): e2217868120, 2023 02 07.
Article in English | MEDLINE | ID: mdl-36719923

ABSTRACT

Single-cell RNA sequencing combined with genome-scale metabolic models (GEMs) has the potential to unravel differences in metabolism across both cell types and cell states, but it requires new computational methods. Here, we present a method for generating cell-type-specific genome-scale models from clusters of single-cell RNA-Seq profiles. Specifically, we developed a method to estimate the minimum number of cells that must be pooled to obtain stable models, a bootstrapping strategy for statistical inference, and a faster version of the task-driven integrative network inference for tissues algorithm for generating context-specific GEMs. In addition, we evaluated the effect of different RNA-Seq normalization methods on model topology, as well as the differences between models generated from single-cell and bulk RNA-Seq data. We applied our methods to data from mouse cortex neurons and from cells of the lung cancer tumor microenvironment, and in both cases found that almost every cell subtype had a unique metabolic profile. Our approach was also able to detect cancer-associated metabolic differences between cancer cells and healthy cells, showcasing its utility. We also contextualized models from 202 single-cell clusters across 19 human organs using data from the Human Protein Atlas and made these available in the web portal Metabolic Atlas, thereby providing a valuable resource to the scientific community. With the ever-increasing availability of single-cell RNA-Seq datasets and continuously improved GEMs, their combination holds promise as an important approach to the study of human metabolism.
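The pooling and bootstrapping idea can be illustrated with a minimal Python sketch. It assumes raw per-cell counts for a single cluster, draws cells with replacement, sums them into a pseudo-bulk profile and normalizes to counts per million (CPM). The function names, the CPM normalization, the coefficient-of-variation stability proxy and the toy data are all hypothetical; the sketch does not reproduce the authors' stability criterion or their model-generation algorithm.

```python
import random

def bootstrap_pseudobulk(counts, n_boot, n_cells=None, seed=0):
    """Build n_boot pooled (pseudo-bulk) profiles from one cell cluster.

    counts:  list of per-cell count vectors, one list of gene counts per cell.
    n_cells: cells drawn with replacement per pool (defaults to cluster size).
    Each pooled profile is normalized to counts per million (CPM).
    """
    rng = random.Random(seed)
    n_cells = n_cells or len(counts)
    n_genes = len(counts[0])
    profiles = []
    for _ in range(n_boot):
        pooled = [0.0] * n_genes
        for _ in range(n_cells):
            cell = counts[rng.randrange(len(counts))]
            for g, c in enumerate(cell):
                pooled[g] += c
        total = sum(pooled) or 1.0  # guard against an all-zero pool
        profiles.append([1e6 * x / total for x in pooled])
    return profiles

def mean_cv(profiles):
    """Average per-gene coefficient of variation across bootstrap replicates."""
    n = len(profiles)
    cvs = []
    for g in range(len(profiles[0])):
        vals = [p[g] for p in profiles]
        mean = sum(vals) / n
        if mean > 0:
            var = sum((v - mean) ** 2 for v in vals) / n
            cvs.append(var ** 0.5 / mean)
    return sum(cvs) / len(cvs) if cvs else 0.0

# Toy usage: compare replicate variability for increasing pool sizes.
cluster = [[5, 0, 3], [2, 4, 1], [0, 1, 7], [3, 3, 3]]
for size in (5, 50, 500):
    print(size, round(mean_cv(bootstrap_pseudobulk(cluster, 20, size)), 3))
```

Tracking how replicate variability falls as the pool grows is one plausible way to choose a minimum pool size; the threshold at which a pool counts as "stable" would be a separate, study-specific decision.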


Subjects
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Animals , Mice , Humans , Gene Expression Profiling/methods , Algorithms , RNA-Seq , Genome/genetics , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods
3.
Mol Syst Biol ; 17(9): e10105, 2021 09.
Article in English | MEDLINE | ID: mdl-34528760

ABSTRACT

Tumor cell heterogeneity is a crucial characteristic of malignant brain tumors and underpins phenomena such as therapy resistance and tumor recurrence. Advances in single-cell analysis have enabled the delineation of distinct cellular states of brain tumor cells, but the time-dependent changes in such states remain poorly understood. Here, we construct quantitative models of the time-dependent transcriptional variation of patient-derived glioblastoma (GBM) cells. We build the models by sampling and profiling barcoded GBM cells and their progeny over the course of 3 weeks and by fitting a mathematical model to estimate changes in GBM cell states and their growth rates. Our model suggests a hierarchical yet plastic organization of GBM, where the rates and patterns of cell state switching are partly patient-specific. Therapeutic interventions produce complex dynamic effects, including inhibition of specific states and altered differentiation. Our method provides a general strategy to uncover time-dependent changes in cancer cells and offers a way to evaluate and predict how therapy affects cell state composition.


Subjects
Brain Neoplasms , Glioblastoma , Brain Neoplasms/genetics , Cell Line, Tumor , Glioblastoma/genetics , Humans , Neoplasm Recurrence, Local , Single-Cell Analysis
4.
PLoS One ; 16(4): e0250004, 2021.
Article in English | MEDLINE | ID: mdl-33861779

ABSTRACT

BACKGROUND: The study aims to determine possible dose-volume response relationships between the rectum, sigmoid colon and small intestine and the 'excessive mucus discharge' syndrome after pelvic radiotherapy for gynaecological cancer. METHODS AND MATERIALS: From a larger cohort, 98 gynaecological cancer survivors were included in this study. These survivors, who were followed for 2 to 14 years, had received external beam radiation therapy but not brachytherapy, and did not have a stoma. Thirteen of the 98 developed excessive mucus discharge syndrome. Three self-assessed symptoms were weighted together, using the factor loadings from a factor analysis, into a score interpreted as the 'excessive mucus discharge' syndrome. The dose-volume histograms (DVHs) of the rectum, sigmoid colon and small intestine of each survivor were exported from the treatment planning systems. The dose-volume response relationships between excessive mucus discharge and each organ at risk were estimated by fitting the data to the Probit, RS, LKB and gEUD models. RESULTS: The small intestine was found to have steep dose-response curves, with estimated dose-response parameters γ50 = 1.28, 1.23 and 1.32 and D50 = 61.6, 63.1 and 60.2 for the Probit, RS and LKB models, respectively. The sigmoid colon (AUC: 0.68) and the small intestine (AUC: 0.65) had the highest AUC values. For the small intestine, the DVHs of survivors with and without excessive mucus discharge were well separated at low to intermediate doses; this was not true for the sigmoid colon. Taken together, we interpret the small-intestine results as reflecting a relevant link. CONCLUSION: An association was found between the mean dose to the small intestine and the occurrence of 'excessive mucus discharge'. To reduce, or even eliminate, the incidence of 'excessive mucus discharge', it would be useful to delineate the small intestine separately and to apply the dose-response estimates reported in this study.
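The Probit and LKB fits rest on a sigmoid dose-response curve. The minimal sketch below assumes the common parameterization in which γ50 is the normalized slope at D50, P(D) = Φ(√(2π)·γ50·(D/D50 − 1)) with Φ the standard normal CDF, and, for LKB, evaluates that curve at the generalized equivalent uniform dose gEUD = (Σ v_i·D_i^(1/n))^n of a differential DVH. The function names, the two-bin DVH and the choice n = 1 are illustrative assumptions, not the study's fitted model; with n = 1, gEUD reduces to the mean dose, the quantity named in the conclusion.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_response(dose, d50, gamma50):
    """Probit dose-response; gamma50 is the normalized slope D * dP/dD at D50."""
    return normal_cdf(math.sqrt(2.0 * math.pi) * gamma50 * (dose / d50 - 1.0))

def geud(dvh, n):
    """Generalized EUD of a differential DVH: (sum v_i * D_i**(1/n))**n.

    dvh: list of (dose, fractional_volume) pairs whose volumes sum to 1.
    """
    return sum(v * d ** (1.0 / n) for d, v in dvh) ** n

def lkb_ntcp(dvh, d50, gamma50, n):
    """LKB model: the probit curve evaluated at the DVH's gEUD."""
    return probit_response(geud(dvh, n), d50, gamma50)

# Illustrative two-bin DVH (not study data): half the organ at 20, half at 60.
dvh = [(20.0, 0.5), (60.0, 0.5)]
# Small-intestine LKB estimates quoted above; n = 1 is an assumed volume
# parameter that makes gEUD equal to the mean dose.
print(lkb_ntcp(dvh, d50=60.2, gamma50=1.32, n=1.0))
```

Fitting D50, γ50 and n to patient data would then mean maximizing the likelihood of the observed syndrome outcomes over these parameters, which the sketch leaves out.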


Subjects
Colon, Sigmoid/metabolism , Genital Neoplasms, Female/radiotherapy , Intestine, Small/metabolism , Mucus/metabolism , Rectum/metabolism , Aged , Area Under Curve , Colon, Sigmoid/radiation effects , Dose-Response Relationship, Radiation , Female , Humans , Intestine, Small/radiation effects , Middle Aged , Organs at Risk , ROC Curve , Radiation, Ionizing , Radiotherapy Dosage , Rectum/radiation effects
5.
Cancer Med ; 9(10): 3551-3562, 2020 05.
Article in English | MEDLINE | ID: mdl-32207233

ABSTRACT

BACKGROUND: Characterizing breast cancer progression and aggressiveness relies on categorical descriptions of tumor stage and grade. Interpreting these categorical descriptions is challenging because stage conflates the size and spread of the tumor, and no consensus exists on how to define high- and low-grade tumors. METHODS: We address this challenge of heterogeneity in patient-specific cancer samples by adapting and applying several tools originally created for understanding heterogeneity and phenotype development in single cells (specifically, single-cell topological data analysis and Wanderlust) to create a continuous metric describing breast cancer progression using bulk RNA-seq samples from individual patient tumors. We also created a linear regression-based method to predict tumor aggressiveness in vivo from bulk RNA-seq data. RESULTS: We found that breast cancer proceeds along three convergent phenotype trajectories: luminal, HER2-enriched, and basal-like. Furthermore, 31 296 genes (for luminal cancers), 17 827 genes (for HER2-enriched), and 18 505 genes (for basal-like) are dynamically differentially expressed during breast cancer progression. Across progression trajectories, our results show that expression of genes related to ADP-ribosylation decreased as tumors progressed (while PARP1 and PARP2 increased or remained stable), suggesting the potential for a differential response to PARP inhibitors based on cancer progression. Additionally, we developed a 132-gene expression regression equation to predict mitotic index and a 23-gene expression regression equation to predict growth rate from a single breast cancer biopsy. CONCLUSION: Our results suggest that breast cancer dynamically changes during disease progression, and growth rate of the cancer cells is associated with distinct transcriptional profiles.


Subjects
Breast Neoplasms/genetics , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Breast Neoplasms/pathology , Databases, Genetic , Disease Progression , Female , Humans , Mitotic Index , Phenotype , Poly (ADP-Ribose) Polymerase-1/genetics , Poly(ADP-ribose) Polymerase Inhibitors , Poly(ADP-ribose) Polymerases/genetics , Prognosis , RNA-Seq , Transcriptome
6.
Nat Commun ; 11(1): 71, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31900415

ABSTRACT

Despite advances in the molecular exploration of paediatric cancers, approximately 50% of children with high-risk neuroblastoma lack effective treatment. To identify therapeutic options for this group of high-risk patients, we combine predictive data mining with experimental evaluation in patient-derived xenograft cells. Our proposed algorithm, TargetTranslator, integrates data from tumour biobanks, pharmacological databases, and cellular networks to predict how targeted interventions affect mRNA signatures associated with high patient risk or disease processes. We find more than 80 targets to be associated with neuroblastoma risk and differentiation signatures. Selected targets are evaluated in cell lines derived from high-risk patients to demonstrate reversal of risk signatures and malignant phenotypes. Using neuroblastoma xenograft models, we establish CNR2 and MAPK8 as promising candidates for the treatment of high-risk neuroblastoma. We expect that our method, available as a public tool (targettranslator.org), will enhance and expedite the discovery of risk-associated targets for paediatric and adult cancers.


Subjects
Antineoplastic Agents/administration & dosage , Neuroblastoma/drug therapy , Neuroblastoma/genetics , Animals , Cell Line, Tumor , Drug Evaluation, Preclinical , Female , Humans , Male , Mice , Mice, Nude , Mitogen-Activated Protein Kinase 8/antagonists & inhibitors , Mitogen-Activated Protein Kinase 8/genetics , Mitogen-Activated Protein Kinase 8/metabolism , Neuroblastoma/metabolism , Receptor, Cannabinoid, CB2/antagonists & inhibitors , Receptor, Cannabinoid, CB2/genetics , Receptor, Cannabinoid, CB2/metabolism , Xenograft Model Antitumor Assays , Zebrafish
7.
Acta Oncol ; 57(10): 1352-1358, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29733238

ABSTRACT

PURPOSE: To find out which organs and doses are most relevant for 'radiation-induced urgency syndrome' in order to derive the corresponding dose-response relationships as an aid for avoiding the syndrome in the future. MATERIAL AND METHODS: From a larger group of gynecological cancer survivors followed up for 2-14 years, we identified 98 who had undergone external beam radiation therapy but not brachytherapy and who did not have a stoma. Of those survivors, 24 developed urgency syndrome. Based on the factor loadings from a factor analysis and on symptom frequency, 15 symptoms were weighted together into a score interpreted as the intensity of the radiation-induced urgency syndrome. On reactivated dose plans, we contoured the small intestine, sigmoid colon and rectum (separately from the anal-sphincter region) and exported the dose-volume histograms for each survivor. Dose-response relationships between each organ at risk and urgency syndrome were estimated by fitting the data to the Probit, RS, LKB and gEUD models. RESULTS: The rectum and sigmoid colon have steep dose-response relationships for urgency syndrome under the Probit, RS and LKB models. The dose-response parameters for the rectum were D50 = 51.3, 51.4 and 51.3 Gy, γ50 = 1.19 for all models, s = 7.0e-09 for RS and n = 9.9 × 10⁷ for LKB. For the sigmoid colon, D50 was 51.6, 51.6 and 51.5 Gy, γ50 was 1.20, 1.25 and 1.27, s was 2.8 for RS and n was 0.079 for LKB. CONCLUSIONS: Urgency syndrome among gynecological cancer survivors is primarily related to the dose to the sigmoid colon as well as to the rectum. Delineating the rectum and sigmoid colon separately in order to incorporate these dose-response results may help reduce the incidence of urgency syndrome.
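The relative seriality (RS) model named above combines a point dose-response with an organ-architecture parameter s. The sketch below assumes Källman's formulation: the point response P(D) = 2^(−exp(e·γ·(1 − D/D50))) is combined over a differential DVH as P = [1 − Π(1 − P(D_i)^s)^(Δv_i)]^(1/s). The function names, the DVH and the reading of the reported γ50 as the model's γ are assumptions for illustration, not the study's fitting code.

```python
import math

def rs_point_response(dose, d50, gamma):
    """Kallman point response: P(D) = 2 ** (-exp(e * gamma * (1 - D / D50)))."""
    return 2.0 ** (-math.exp(math.e * gamma * (1.0 - dose / d50)))

def rs_ntcp(dvh, d50, gamma, s):
    """Relative seriality model over a differential DVH.

    dvh: list of (dose_Gy, fractional_volume) pairs whose volumes sum to 1.
    s:   relative seriality; s = 1 describes a fully serial organ and
         s -> 0 a fully parallel one.
    """
    prod = 1.0
    for dose, dv in dvh:
        prod *= (1.0 - rs_point_response(dose, d50, gamma) ** s) ** dv
    return (1.0 - prod) ** (1.0 / s)

# Illustrative DVH (not study data) with the sigmoid-colon RS estimates quoted
# above, treating the reported gamma50 as the model's gamma (an assumption).
dvh = [(30.0, 0.4), (45.0, 0.4), (55.0, 0.2)]
print(rs_ntcp(dvh, d50=51.6, gamma=1.25, s=2.8))
```

For a uniform dose equal to D50 the model returns 0.5 whatever the value of s, which is a convenient sanity check when fitting very small s values such as the rectal estimate.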


Subjects
Colon, Sigmoid/radiation effects , Genital Neoplasms, Female/radiotherapy , Radiation Injuries/etiology , Rectum/radiation effects , Aged , Dose-Response Relationship, Radiation , Female , Humans , Intestine, Small/radiation effects , Middle Aged , Organs at Risk , Radiotherapy Dosage
8.
Acta Oncol ; 56(5): 682-691, 2017 May.
Article in English | MEDLINE | ID: mdl-28366105

ABSTRACT

BACKGROUND: It is unknown whether smoking, age at the time of radiotherapy or time since radiotherapy influences the intensity of late radiation-induced bowel syndromes. MATERIAL AND METHODS: We previously identified 28 symptoms that decrease bowel health among 623 gynecological-cancer survivors (three to twelve years after radiotherapy) and 344 matched population-based controls. The 28 symptoms were grouped into five separate late bowel syndromes through factor analysis. Here, we related possible predictors of bowel health to syndrome intensity by combining factor-analysis weights and symptom frequency on a person-incidence scale. RESULTS: A strong (p < .001) association between smoking and radiation-induced urgency syndrome was found, with a syndrome intensity (normalized factor score) of 0.4 (never smokers), 1.2 (former smokers) and 2.5 (current smokers). Excessive gas discharge was also related to smoking (p = .001). Younger age at treatment resulted in higher intensity, except for the leakage syndrome. For the urgency syndrome, intensity decreased with time since treatment. CONCLUSIONS: Smoking aggravates the radiation-induced urgency syndrome and the excessive gas discharge syndrome. Smoking cessation may promote bowel health among gynecological-cancer survivors. Furthermore, by understanding the mechanism behind the decline in urgency-syndrome intensity over time, we may identify new strategies for prevention and alleviation.


Subjects
Cancer Survivors , Genital Neoplasms, Female/radiotherapy , Intestines/radiation effects , Irritable Bowel Syndrome/etiology , Radiation Injuries/etiology , Radiotherapy/adverse effects , Tobacco Smoking/adverse effects , Adolescent , Adult , Age Factors , Aged , Aged, 80 and over , Case-Control Studies , Female , Follow-Up Studies , Humans , Intestines/pathology , Male , Middle Aged , Prognosis , Young Adult
9.
PLoS One ; 12(2): e0171461, 2017.
Article in English | MEDLINE | ID: mdl-28158314

ABSTRACT

BACKGROUND: During radiotherapy, unwanted radiation to the normal tissue surrounding the tumor triggers survivorship diseases; we lack a nosology for the radiation-induced survivorship diseases that decrease bowel health, and we do not know which symptoms are related to which diseases. METHODS: Gynecological-cancer survivors were followed up two to 15 years after undergoing radiotherapy; in a postal questionnaire, they reported the frequency of 28 different symptoms related to bowel health. Population-based controls provided the same information. Using a modified factor analysis, we determined the optimal number of factors, the factor loadings for each symptom, factor-specific factor-loading cutoffs and factor scores. RESULTS: Altogether, data from 623 survivors and 344 population-based controls were analyzed. Six factors best explain the correlation structure of the symptoms; for five of these, a statistically significant difference (P < 0.001, Mann-Whitney U test) in factor-score quantiles was found between survivors and controls. Taken together, these five factors explain 42 percent of the variance of the symptoms. We interpreted these five factors as radiation-induced syndromes that may reflect distinct survivorship diseases. We obtained the following frequencies, defined as survivors having a factor loading above the 95th percentile of the controls: urgency syndrome (190 of 623, 30 percent), leakage syndrome (164 of 623, 26 percent), excessive gas discharge (93 of 623, 15 percent), excessive mucus discharge (102 of 623, 16 percent) and blood discharge (63 of 623, 10 percent). CONCLUSION: Late effects of radiotherapy include five syndromes affecting bowel health; studying them and identifying the underlying survivorship diseases, instead of the approximately 30 long-term symptoms they produce, will simplify the search for prevention, alleviation and elimination.
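The frequency definition above compares each survivor's per-factor value with a cutoff taken from the control distribution. A minimal plain-Python sketch of that comparison follows; it assumes linear interpolation between order statistics for the percentile (one of several common conventions), and the function names and toy values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile using linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def exceedance_frequency(survivor_values, control_values, q=95.0):
    """Count and fraction of survivors above the controls' q-th percentile."""
    threshold = percentile(control_values, q)
    n_above = sum(1 for v in survivor_values if v > threshold)
    return n_above, n_above / len(survivor_values)

# Toy usage (hypothetical values, one factor).
controls = [0.1, 0.3, 0.2, 0.5, 0.4, 0.9, 0.6, 0.2, 0.3, 0.7]
survivors = [0.95, 0.2, 1.4, 0.6, 2.1]
print(exceedance_frequency(survivors, controls))
```

Because the cutoff comes from the controls alone, roughly 5 percent of controls would exceed it by construction, which gives a natural baseline for reading survivor frequencies such as 190 of 623.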


Subjects
Genital Neoplasms, Female/radiotherapy , Radiation Injuries/diagnosis , Radiotherapy/adverse effects , Aged , Aged, 80 and over , Female , Genital Neoplasms, Female/surgery , Humans , Middle Aged , Quality of Life , Surveys and Questionnaires
10.
Methods Inf Med ; 55(5): 431-439, 2016 Oct 17.
Article in English | MEDLINE | ID: mdl-27588322

ABSTRACT

BACKGROUND: In the field of radiation oncology, the use of extensive patient-reported outcomes (PROs) to measure adverse side effects after radiotherapy in cancer patients is increasingly common. Factor analysis (FA) has the potential to identify an optimal number of latent factors (i.e., symptom groups). However, the ultimate goal of treatment response modeling is to understand the relationship between treatment variables, such as radiation dose, and the symptom groups resulting from FA. Hence, it is crucial to identify clinically more relevant symptom groups, and improved response variables derived from those symptom groups, for quantitative analysis. OBJECTIVES: The goal of this study is to design a computational method for finding clinically relevant symptom groups from PROs and to test associations between symptom groups and radiation dose. METHODS: We propose a novel approach in which exploratory factor analysis (EFA) is followed by confirmatory factor analysis (CFA) to determine the relevant number of symptom groups. We also propose to use a combination of the symptoms in an identified symptom group as a new response variable in linear regression analysis to investigate the relationship between the symptom group and dose-volume variables. RESULTS: We analyzed patient-reported gastrointestinal symptom profiles from three datasets of prostate cancer patients treated with radiotherapy. The final structural model of each dataset was validated using the other two datasets and compared with four other existing FA methods. Our systematic EFA-CFA approach provided clinically more relevant solutions than the other methods, resulting in new clinically relevant outcome variables that enabled quantitative analysis. As a result, statistically significant correlations were found between symptom groups identified by FA and some dose-volume variables of relevant anatomic structures.
CONCLUSIONS: Our proposed method can aid in the process of understanding PROs and provide a basis for improving our understanding of radiation-induced side effects.


Subjects
Factor Analysis, Statistical , Patient Reported Outcome Measures , Cluster Analysis , Cohort Studies , Computer Simulation , Data Accuracy , Databases as Topic , Humans , Linear Models , Radiation Dosage , Reproducibility of Results
11.
EBioMedicine ; 12: 72-85, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27667176

ABSTRACT

Glioblastomas are characterized by transcriptionally distinct subtypes, but despite possible clinical relevance, their regulation remains poorly understood. The commonly used molecular classification systems for glioblastoma all identify a subtype with high expression of mesenchymal marker transcripts, strongly associated with invasive growth. We used a comprehensive data-driven network modeling technique (augmented sparse inverse covariance selection, aSICS) to define separate genomic, epigenetic, and transcriptional regulators of glioblastoma subtypes. Our model identified Annexin A2 (ANXA2) as a novel methylation-controlled positive regulator of the mesenchymal subtype. Subsequent evaluation in two independent cohorts established ANXA2 expression as a prognostic factor that is dependent on ANXA2 promoter methylation. ANXA2 knockdown in primary glioblastoma stem cell-like cultures suppressed known mesenchymal master regulators, and abrogated cell proliferation and invasion. Our results place ANXA2 at the apex of a regulatory cascade that determines glioblastoma mesenchymal transformation and validate aSICS as a general methodology to uncover regulators of cancer subtypes.


Subjects
Annexin A2/metabolism , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Glioblastoma/genetics , Glioblastoma/metabolism , Mesenchymoma/genetics , Mesenchymoma/metabolism , Algorithms , Annexin A2/genetics , Biomarkers, Tumor , Cell Line, Tumor , Computational Biology/methods , DNA Methylation , Databases, Nucleic Acid , Epithelial-Mesenchymal Transition , Gene Expression Profiling , Gene Knockdown Techniques , Glioblastoma/mortality , Glioblastoma/pathology , Humans , Mesenchymoma/mortality , Mesenchymoma/pathology , Molecular Sequence Annotation , Neoplasm Grading , Neoplastic Stem Cells/metabolism , Prognosis , Promoter Regions, Genetic
12.
Nucleic Acids Res ; 43(15): e98, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-25953855

ABSTRACT

Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data-driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (cancerlandscapes.org), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets.


Subjects
Models, Genetic , Models, Statistical , Neoplasms/genetics , Antineoplastic Agents/pharmacology , Chromosome Deletion , Chromosomes, Human, Pair 11 , DNA Copy Number Variations , DNA Methylation , Genomics/methods , Glioma/genetics , Humans , Internet , Isocitrate Dehydrogenase/genetics , Kaplan-Meier Estimate , MicroRNAs/metabolism , Mutation , Neoplasms/mortality , RNA, Messenger/metabolism , Software
13.
PLoS One ; 8(7): e68598, 2013.
Article in English | MEDLINE | ID: mdl-23935877

ABSTRACT

Functionally interacting perturbations, such as synergistic drug pairs or synthetic lethal gene pairs, are of key interest in both pharmacology and functional genomics. However, finding such pairs by traditional screening methods is both time-consuming and costly. We present a novel computational-experimental framework for the efficient identification of synergistic target pairs, applicable to screening systems with sizes on the order of current drug, small RNA or SGA (Synthetic Genetic Array) libraries (>1000 targets). This framework exploits the fact that the response of a drug pair in a given system, or the propensity of a pair of genes to interact functionally, can be partly predicted computationally from (i) a small set of experimentally determined target pairs and (ii) pre-existing data (e.g. gene ontology, PPI) on the similarities between targets. Predictions are obtained by a novel matrix algebraic technique based on cyclical projections onto convex sets. We demonstrate the efficiency of the proposed method using drug-drug interaction data from seven cancer cell lines and gene-gene interaction data from yeast SGA screens. Our protocol increases the rate of synergism discovery significantly over traditional screening, by up to 7-fold. Our method is easy to implement and could be applied to accelerate pair screening for both animal and microbial systems.
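The abstract does not list which convex sets the projections use. As a hedged illustration of cyclical projections onto convex sets in general, the sketch below completes a partially measured, symmetric pair-score matrix by alternating between two convex sets: matrices that agree with the measured entries, and symmetric positive semidefinite matrices (a hypothetical stand-in for similarity-based constraints). When the sets intersect, each projection never moves the estimate further from any matrix in the intersection, and the iterates converge to a point in it. All names and data are hypothetical and this is not the authors' technique.

```python
import numpy as np

def project_psd(m):
    """Nearest symmetric positive semidefinite matrix in Frobenius norm."""
    sym = (m + m.T) / 2.0
    vals, vecs = np.linalg.eigh(sym)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

def project_observed(m, observed, mask):
    """Nearest matrix that agrees with the measured entries (an affine set)."""
    out = m.copy()
    out[mask] = observed[mask]
    return out

def pocs_complete(observed, mask, n_iter=500):
    """Cyclically project onto the two convex sets, starting from a zero fill."""
    x = np.where(mask, observed, 0.0)
    for _ in range(n_iter):
        x = project_observed(project_psd(x), observed, mask)
    return x

# Toy usage: a hidden low-rank similarity matrix with a subset of pairs measured.
rng = np.random.default_rng(0)
factors = rng.normal(size=(8, 2))
truth = factors @ factors.T
mask = rng.random((8, 8)) < 0.6
mask = mask | mask.T            # measure (i, j) and (j, i) together
np.fill_diagonal(mask, True)    # assume self-pairs are always known
completed = pocs_complete(truth, mask)
```

In a screening loop, the completed matrix would rank unmeasured pairs for the next experimental round; the choice of constraint sets is where prior knowledge such as target similarity would enter.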


Subjects
Algorithms , Antineoplastic Agents/pharmacology , Epistasis, Genetic , High-Throughput Screening Assays , Saccharomyces cerevisiae/genetics , Animals , Antineoplastic Agents/chemistry , Cell Line, Tumor , Drug Interactions , Genes, Lethal , Genes, Synthetic , Humans
14.
Genome Biol ; 13(6): R46, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22703998

ABSTRACT

BACKGROUND: Complex diseases are associated with altered interactions between thousands of genes. We developed a novel method to identify and prioritize disease genes, which was generally applicable to complex diseases. RESULTS: We identified modules of highly interconnected genes in disease-specific networks derived from integrating gene-expression and protein interaction data. We examined whether those modules were enriched for disease-associated SNPs and could be used to find novel genes for functional studies. First, we analyzed publicly available gene expression microarray and genome-wide association study (GWAS) data from 13 highly diverse complex diseases. In each disease, highly interconnected genes formed modules, which were significantly enriched for genes harboring disease-associated SNPs. To test whether such modules could be used to find novel genes for functional studies, we repeated the analyses using our own gene expression microarray and GWAS data from seasonal allergic rhinitis. We identified a novel gene, FGF2, whose relevance was supported by functional studies using combined small interfering RNA-mediated knock-down and gene expression microarrays. The modules in the 13 complex diseases analyzed here tended to overlap and were enriched for pathways related to oncological, metabolic and inflammatory diseases. This suggested that the union of the modules would be associated with a general increase in susceptibility to complex diseases. Indeed, we found that this union was enriched with GWAS genes for 145 other complex diseases. CONCLUSIONS: Modules of highly interconnected complex disease genes were enriched for disease-associated SNPs and could be used to find novel genes for functional studies.
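Module enrichment for GWAS genes is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the abstract does not state which test was used, so this choice is an assumption. The sketch computes the probability of seeing at least the observed overlap when a module of the given size is drawn at random from the gene universe. The function name and toy numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_gwas, n_module, n_overlap):
    """One-sided P(X >= n_overlap) for X ~ Hypergeometric.

    n_universe: genes in the network (the background).
    n_gwas:     background genes that harbor disease-associated SNPs.
    n_module:   genes in the module, drawn without replacement.
    n_overlap:  module genes that are also GWAS genes.
    """
    denom = comb(n_universe, n_module)
    upper = min(n_gwas, n_module)
    numer = sum(
        comb(n_gwas, k) * comb(n_universe - n_gwas, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return numer / denom  # exact integer arithmetic until this division

# Toy usage (hypothetical counts): a 40-gene module containing 12 GWAS genes,
# drawn from a 10 000-gene network that contains 500 GWAS genes in total.
print(hypergeom_enrichment_p(10_000, 500, 40, 12))
```

Using Python's arbitrary-precision integers keeps the tail sum exact even for very small p-values; a statistics library would give the same number via the hypergeometric survival function.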


Subjects
Genetic Predisposition to Disease/genetics , Genome, Human , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Protein Interaction Maps , Rhinitis, Allergic, Seasonal/genetics , Databases, Genetic , Fibroblast Growth Factor 2/genetics , Fibroblast Growth Factor 2/metabolism , Gene Expression Profiling/methods , Gene Expression Regulation , Gene Regulatory Networks , Genetic Pleiotropy , Humans , Inflammation/genetics , Oligonucleotide Array Sequence Analysis/methods , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Sensitivity and Specificity
15.
Biostatistics ; 13(4): 748-61, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22699861

ABSTRACT

With the growing availability of omics data generated to describe different cells and tissues, the modeling and interpretation of such data has become increasingly important. Pathways are sets of reactions involving genes, metabolites, and proteins highlighting functional modules in the cell. Therefore, to discover activated or perturbed pathways when comparing two conditions, for example two different tissues, it is beneficial to use several types of omics data. We present a model that integrates transcriptomic and metabolomic data in order to make an informed pathway-level decision. Since metabolites can be seen as end-points of perturbations happening at the gene level, the gene expression data constitute the explanatory variables in a sparse regression model for the metabolite data. Sophisticated model selection procedures are developed to determine an appropriate model. We demonstrate that the transcript profiles can be used to informatively explain the metabolite data from cancer cell lines. Simulation studies further show that the proposed model offers a better performance in identifying active pathways than, for example, enrichment methods performed separately on the transcript and metabolite data.
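The core idea of explaining a metabolite by a sparse set of genes can be illustrated with the lasso, here solved by proximal gradient descent (ISTA). This is a swapped-in, generic technique: the abstract describes its own sparse regression model and model selection procedures, which the sketch does not reproduce. Names, the penalty value and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2  # 1/L with L = largest eigenvalue of X^T X / n
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Toy usage: one "metabolite" driven by 2 of 10 "genes" (synthetic, noiseless).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]
coef = lasso_ista(X, y, lam=0.05)
```

The nonzero coefficients point to the genes (and, through gene-to-pathway annotation, the pathways) that best explain the metabolite; in practice the penalty would be chosen by a model selection procedure rather than fixed by hand.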


Subjects
Data Interpretation, Statistical , Metabolomics , Models, Biological , Transcriptome , Computer Simulation , Models, Genetic
16.
Adv Exp Med Biol ; 736: 617-43, 2012.
Article in English | MEDLINE | ID: mdl-22161356

ABSTRACT

One of the central problems of cancer systems biology is to understand the complex molecular changes of cancerous cells and tissues, and use this understanding to support the development of new targeted therapies. EPoC (Endogenous Perturbation analysis of Cancer) is a network modeling technique for tumor molecular profiles. EPoC models are constructed from combined copy number aberration (CNA) and mRNA data and aim to (1) identify genes whose copy number aberrations significantly affect target mRNA expression and (2) generate markers for long- and short-term survival of cancer patients. Models are constructed by a combination of regression and bootstrapping methods. Prognostic scores are obtained from a singular value decomposition of the networks. We have previously analyzed the performance of EPoC using glioblastoma data from The Cancer Genome Atlas (TCGA) consortium, and have shown that resulting network models contain both known and candidate disease-relevant genes as network hubs, as well as uncover predictors of patient survival. Here, we give a practical guide to performing EPoC modeling in R, and present a set of alternative modeling frameworks.


Subjects
Computational Biology/methods , Gene Regulatory Networks/genetics , Models, Genetic , Neoplasms/genetics , Systems Biology/methods , Algorithms , Computational Biology/classification , Gene Dosage , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks/drug effects , Genetic Predisposition to Disease/genetics , Glioblastoma/drug therapy , Glioblastoma/genetics , Humans , Neoplasms/drug therapy , Prognosis , Reproducibility of Results , Survival Analysis
17.
Mol Syst Biol ; 7: 486, 2011 Apr 26.
Article in English | MEDLINE | ID: mdl-21525872

ABSTRACT

DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.
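The abstract does not say exactly how the singular value decomposition is turned into a patient score. One plausible reading, offered only as an assumption, works in three steps:

- take the leading left singular vector of the network matrix (target genes × CNA regulators);
- project each patient's target-gene expression onto it;
- split patients at the median score.

The sketch below does this with power iteration. It is not EPoC's published formula, and the function names are hypothetical.

```python
import math
import statistics
from typing import List


def leading_left_singular_vector(A: List[List[float]], n_iter: int = 100) -> List[float]:
    """Approximate the top left singular vector of A by power iteration on A A^T."""
    m, k = len(A), len(A[0])
    u = [1.0] * m
    for _ in range(n_iter):
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(k)]  # v = A^T u
        u = [sum(A[i][j] * v[j] for j in range(k)) for i in range(m)]  # u = A v
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            break  # the start vector is orthogonal to the column space of A
        u = [x / norm for x in u]
    # Singular vectors are defined only up to sign; make the largest entry positive.
    top = max(range(m), key=lambda i: abs(u[i]))
    return [-x for x in u] if u[top] < 0 else u


def prognostic_scores(A: List[List[float]], expression: List[List[float]]) -> List[float]:
    """Hypothetical score: project each patient's target-gene expression onto u."""
    u = leading_left_singular_vector(A)
    return [sum(ui * xi for ui, xi in zip(u, patient)) for patient in expression]


def split_by_median(scores: List[float]) -> List[str]:
    """Label patients 'high' or 'low' relative to the median score."""
    med = statistics.median(scores)
    return ["high" if s > med else "low" for s in scores]
```

Which score group corresponds to long-term survival would have to be determined against outcome data. The sketch does not assume a direction.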


Subjects
Gene Dosage; Glioblastoma/genetics; Nerve Tissue Proteins/metabolism; Nervous System Neoplasms/genetics; Nuclear Proteins/metabolism; Transcriptional Activation/genetics; Tumor Suppressor Protein p53/metabolism; Cell Line, Tumor; Chromosome Aberrations; Databases, Factual; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Genome, Human; Genome-Wide Association Study; Glioblastoma/metabolism; Glioblastoma/mortality; Glioblastoma/pathology; Humans; Models, Genetic; Nerve Tissue Proteins/genetics; Nervous System Neoplasms/metabolism; Nervous System Neoplasms/mortality; Nervous System Neoplasms/pathology; Nuclear Proteins/genetics; Prognosis; Software; Tumor Suppressor Protein p53/genetics
18.
Cancer Cell Int ; 11: 9, 2011 Apr 14.
Article in English | MEDLINE | ID: mdl-21492432

ABSTRACT

BACKGROUND: There are currently three postulated genomic subtypes of the childhood tumour neuroblastoma (NB): Type 1, Type 2A, and Type 2B. The most aggressive forms of NB are characterized by amplification of the oncogene MYCN (MNA) and low expression of the favourable marker NTRK1. Recently, mutations or high expression of the familial predisposition gene Anaplastic Lymphoma Kinase (ALK) were associated with unfavourable biology of sporadic NB. Various other genes have also been linked to NB pathogenesis. RESULTS: The present study explores subgroup discrimination by gene expression profiling using three published microarray studies on NB (47 samples). Principal Components Analysis (PCA) identified four distinct clusters in two separate data sets. These clusters were verified by unsupervised hierarchical clustering in a third independent data set (101 NB samples) using a set of 74 discriminative genes. The expression signature of six NB-associated genes, ALK, BIRC5, CCND1, MYCN, NTRK1, and PHOX2B, significantly discriminated the four clusters (p < 0.05, one-way ANOVA test). PCA clusters p1, p2, and p3 corresponded well to the postulated subtypes 1, 2A, and 2B, respectively. Remarkably, a fourth, novel cluster was detected in all three independent data sets. This cluster comprised mainly 11q-deleted MNA-negative tumours with low expression of ALK, BIRC5, and PHOX2B. It was significantly associated with higher tumour stage, poor outcome, and poor survival compared with the favourable group corresponding to Type 1 (INSS stage 4 and/or dead of disease, p < 0.05, Fisher's exact test). CONCLUSIONS: Based on expression profiling, we have identified four molecular subgroups of neuroblastoma that can be distinguished by a 6-gene signature. The fourth subgroup has not been described elsewhere, and efforts are under way to investigate its specific characteristics further.
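The per-gene significance test reported above is a one-way ANOVA across the four clusters. It reduces to an F statistic that compares between-cluster and within-cluster variance. The sketch below computes it in plain Python, and the function names are hypothetical. The p-value would come from the F distribution with (k - 1, N - k) degrees of freedom. That step is omitted here so the sketch stays within the standard library.

```python
from typing import Dict, List, Tuple


def one_way_anova_f(groups: List[List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    if ss_within == 0.0:
        return float("inf"), df_between, df_within  # no within-group variance
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def anova_per_gene(expr: Dict[str, List[float]], labels: List[str]) -> Dict[str, float]:
    """F statistic of each gene's expression across the sample clusters in `labels`."""
    clusters = sorted(set(labels))
    result = {}
    for gene, values in expr.items():
        groups = [[v for v, lab in zip(values, labels) if lab == c] for c in clusters]
        result[gene] = one_way_anova_f(groups)[0]
    return result
```

Applied to the six signature genes with the four PCA cluster labels, this would give one F statistic per gene. Larger values indicate stronger discrimination between clusters.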

19.
Bioinformatics ; 21(22): 4155-61, 2005 Nov 15.
Article in English | MEDLINE | ID: mdl-16118262

ABSTRACT

MOTIVATION: Significance analysis of differential expression in DNA microarray data is an important task. Much current research focuses on developing improved tests and software tools. The task is difficult not only because of the high dimensionality of the data (number of genes) but also because missing values are often present in non-negligible numbers. There is thus a great need to impute these missing values reliably before statistical analysis. Many imputation methods have been developed for DNA microarray data, but their impact on statistical analyses has not been well studied. In this work we examine how missing values and their imputation affect significance analysis of differential expression. RESULTS: We develop a new imputation method (LinCmb) that outperforms widely used methods in terms of normalized root mean squared error. Its estimates are convex combinations of the estimates of existing methods. We find that LinCmb adapts to the structure of the data. If the data are heterogeneous or there are few missing values, LinCmb puts more weight on local imputation methods; if the data are homogeneous or there are many missing values, it puts more weight on global imputation methods. LinCmb is therefore also a useful tool for understanding the merits of different imputation methods. We also demonstrate that missing values affect significance analysis. The simulations use two datasets, different amounts of missing values, different imputation methods, and three tests: the standard t-test, the regularized t-test, and ANOVA. We conclude that good imputation alleviates the impact of missing values and should be an integral part of microarray data analysis. The most competitive methods are LinCmb, GMC, and BPCA. Popular imputation schemes such as SVD, row mean, and KNN all exhibit high variance and poor performance. The regularized t-test is less affected by missing values than the standard t-test.
AVAILABILITY: Matlab code is available on request from the authors.
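LinCmb's central idea can be sketched for two methods. The final estimate is a convex combination of estimates from existing imputation methods, with weights chosen from the data. The sketch rests on several assumptions made for brevity:

- it combines only row-mean and column-mean imputation, simple stand-ins rather than the methods the authors combine;
- it fits the weight by least squares on observed entries that are deliberately hidden;
- it uses one common NRMSE definition, RMSE divided by the standard deviation of the true values.

All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # genes x arrays; None marks a missing value


def row_mean_impute(m: Matrix) -> List[List[float]]:
    """Replace each missing value with the mean of the observed values in its row (gene)."""
    out = []
    for row in m:
        observed = [x for x in row if x is not None]
        mu = sum(observed) / len(observed) if observed else 0.0
        out.append([mu if x is None else x for x in row])
    return out


def col_mean_impute(m: Matrix) -> List[List[float]]:
    """Replace each missing value with the mean of the observed values in its column (array)."""
    mus = []
    for j in range(len(m[0])):
        observed = [row[j] for row in m if row[j] is not None]
        mus.append(sum(observed) / len(observed) if observed else 0.0)
    return [[mus[j] if x is None else x for j, x in enumerate(row)] for row in m]


def convex_weight(est_a: List[float], est_b: List[float], truth: List[float]) -> float:
    """Least-squares weight w in [0, 1] for the combined estimate w*a + (1 - w)*b."""
    num = sum((t - b) * (a - b) for a, b, t in zip(est_a, est_b, truth))
    den = sum((a - b) ** 2 for a, b in zip(est_a, est_b))
    if den == 0.0:
        return 0.5  # the two estimates coincide, so every weight gives the same result
    # The squared error is a convex quadratic in w, so clipping its minimiser is optimal.
    return min(1.0, max(0.0, num / den))


def fit_weight(m: Matrix, hidden: List[Tuple[int, int]]) -> float:
    """Hide some observed entries, impute them with both methods, and fit w on them."""
    masked = [list(row) for row in m]
    truth = []
    for i, j in hidden:
        truth.append(masked[i][j])  # assumes (i, j) is observed in m
        masked[i][j] = None
    a, b = row_mean_impute(masked), col_mean_impute(masked)
    return convex_weight([a[i][j] for i, j in hidden], [b[i][j] for i, j in hidden], truth)


def nrmse(est: List[float], truth: List[float]) -> float:
    """Root mean squared error divided by the standard deviation of the true values."""
    n = len(truth)
    rmse = math.sqrt(sum((e - t) ** 2 for e, t in zip(est, truth)) / n)
    mu = sum(truth) / n
    sd = math.sqrt(sum((t - mu) ** 2 for t in truth) / n)
    return rmse / sd
```

A weight near 1 means the row-mean estimate dominates on the hidden entries. With more than two methods, the same least-squares idea extends to a weight vector constrained to be non-negative and to sum to one.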


Subjects
Oligonucleotide Array Sequence Analysis/methods; Algorithms; Analysis of Variance; Cluster Analysis; Data Interpretation, Statistical; False Positive Reactions; Gene Expression Profiling/methods; Gene Expression Regulation; Humans; Liver/metabolism; Liver Neoplasms/genetics; Microarray Analysis; Models, Genetic; Models, Statistical; Sample Size; Software
20.
Bioinformatics ; 19(9): 1100-9, 2003 Jun 12.
Article in English | MEDLINE | ID: mdl-12801870

ABSTRACT

MOTIVATION: Microarray technology allows the simultaneous monitoring of thousands of genes in each sample. The resulting high-dimensional gene expression data can be used to study the similarity of gene expression profiles across samples and thereby cluster genes; the clusters may be indicative of genetic pathways. Parallel to gene clustering is the important application of sample classification based on all or selected gene expressions. Gene clustering and sample classification are often undertaken separately, or in a directional manner (one as an aid for the other). However, separating these two tasks may obscure informative structure in the data. Here we present an algorithm for the simultaneous clustering of genes and subset selection of gene clusters for sample classification. We develop a new model selection criterion based on Rissanen's MDL (minimum description length) principle. For the first time, an MDL code length is given for both explanatory variables (genes) and response variables (sample class labels). The final output of the proposed algorithm is a sparse and interpretable classification rule based on cluster centroids or on the genes closest to the centroids. RESULTS: Our algorithm for simultaneous gene clustering and subset selection for classification is applied to three publicly available data sets. For all three data sets, we obtain sparse and interpretable classification models based on cluster centroids. At the same time, these models achieve test error rates competitive with the best reported methods. Compared with classification models based on single-gene selection, our rules are stable: the number of clusters shows little variability, and the cluster centroids are well correlated (or consistent) across different cross-validation samples. We also discuss models in which the cluster centroids are replaced with the genes closest to them. These models show test error rates comparable to those of models based on single-gene selection, but are sparser and more stable. Moreover, we comment on how the inclusion of a classification criterion affects the gene clustering, bringing out class-informative structure in the data.
AVAILABILITY: The methods presented in this paper have been implemented in the R language. The source code is available from the first author.
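A classification rule built on gene-cluster centroids can be illustrated in two steps. First, average the genes of each cluster within every sample to obtain centroid features. Second, classify samples on those features. In this sketch the clusters are assumed to be given, and a nearest-class-mean rule stands in for the paper's classifier. The MDL-based joint selection of clusters is not reproduced, and all names are hypothetical.

```python
from typing import Dict, List


def cluster_centroids(expr: List[List[float]], clusters: Dict[str, List[int]]) -> List[List[float]]:
    """Per-sample centroid features: row s holds the mean expression of each gene cluster.

    expr[g][s] is the expression of gene g in sample s; clusters maps a cluster
    name to the row indices of its member genes (feature order = dict order).
    """
    n_samples = len(expr[0])
    return [[sum(expr[g][s] for g in genes) / len(genes) for genes in clusters.values()]
            for s in range(n_samples)]


def fit_class_means(X: List[List[float]], y: List[str]) -> Dict[str, List[float]]:
    """Mean centroid-feature vector for each class label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(model: Dict[str, List[float]], x: List[float]) -> str:
    """Assign x to the class whose mean is nearest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], x)))
```

Using one feature per cluster rather than one per gene is what keeps such a rule sparse. Replacing each centroid with its closest gene, as the abstract describes, would only change how `cluster_centroids` builds the features.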


Subjects
Algorithms; Cluster Analysis; Gene Expression Profiling/methods; Models, Genetic; Models, Statistical; Neoplasms/classification; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis/methods; Cell Line, Tumor; Colonic Neoplasms/classification; Colonic Neoplasms/genetics; Databases, Genetic; Gene Expression Regulation, Neoplastic/genetics; Humans; Leukemia/classification; Leukemia/genetics; Pattern Recognition, Automated; Principal Component Analysis; Reproducibility of Results; Sensitivity and Specificity