ABSTRACT
Cellular response to a perturbation is the result of a dynamic system of biological variables linked in a complex network. A major challenge in drug and disease studies is identifying the key factors of a biological network that are essential in determining the cell's fate. Our goal is to identify perturbed pathways from high-throughput gene expression data. We develop a three-level hierarchical model, where (i) the first level captures the relationship between gene expression and biological pathways using confirmatory factor analysis, (ii) the second level models the behavior within an underlying network of pathways induced by an unknown perturbation using a conditional autoregressive model, and (iii) the third level is a spike-and-slab prior on the perturbations. We then identify perturbations through posterior-based variable selection. We illustrate our approach using gene transcription drug perturbation profiles from the DREAM7 drug sensitivity prediction challenge data set. Our proposed method identified regulatory pathways that are known to play a causative role and that were not readily resolved using gene set enrichment analysis or exploratory factor models. Simulation results assess the performance of this model relative to a network-free variant and its robustness to inaccuracies in biological databases.
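As a reading aid, the three levels can be written in one generic form. This is only a sketch under assumed notation, not the authors' exact specification: the symbols $y_i$ (expression vector for sample $i$), $\Lambda$ (loading matrix with zeros fixed by pathway membership), $\eta$ (pathway activities), $w_{jk}$ (network weights), $\rho$, $\theta_j$ (perturbation of pathway $j$), $\gamma_j$ (inclusion indicator), and $\pi$ are all introduced here for illustration.

```latex
% Level 1: confirmatory factor analysis (zero pattern of \Lambda fixed a priori)
y_i = \Lambda \eta_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \Psi)

% Level 2: conditional autoregressive model over the pathway network
\eta_j \mid \eta_{-j} \sim N\!\Big(\theta_j + \rho \sum_{k \neq j} w_{jk}\,(\eta_k - \theta_k),\; \tau^2\Big)

% Level 3: spike-and-slab prior on the perturbations
\theta_j \mid \gamma_j \sim (1 - \gamma_j)\,\delta_0 + \gamma_j\, N(0, \sigma_\theta^2),
\qquad \gamma_j \sim \mathrm{Bernoulli}(\pi)
```

Under this assumed form, posterior-based variable selection amounts to ranking pathways by the posterior inclusion probability $P(\gamma_j = 1 \mid y)$.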
ABSTRACT
Genome-wide association studies (GWAS) have successfully identified genetic loci associated with glycemic traits. However, characterizing the functional significance of these loci has proven challenging. We sought to gain insights into the regulation of fasting insulin and fasting glucose through the use of gene expression microarray data from peripheral blood samples of participants without diabetes in the Framingham Heart Study (FHS) (n = 5,056), the Rotterdam Study (RS) (n = 723), and the InCHIANTI Study (Invecchiare in Chianti) (n = 595). Using a false discovery rate q < 0.05, we identified three transcripts associated with fasting glucose and 433 transcripts associated with fasting insulin levels after adjusting for age, sex, technical covariates, and complete blood cell counts. Among the findings, circulating IGF2BP2 transcript levels were positively associated with fasting insulin in both the FHS and RS. Using 1000 Genomes-imputed genotype data, we identified 47,587 cis-expression quantitative trait loci (eQTL) and 6,695 trans-eQTL associated with the 433 significant insulin-associated transcripts. Of note, we identified a trans-eQTL (rs592423) whose A allele was associated with higher IGF2BP2 levels and with fasting insulin in an independent genetic meta-analysis comprising 50,823 individuals. We conclude that integration of genomic and transcriptomic data implicates circulating IGF2BP2 mRNA levels in glucose and insulin homeostasis.
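The false discovery rate threshold used above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which the abstract does not name, and runs on made-up p-values rather than study data; the function name `benjamini_hochberg` and the variable names are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each q-value is the minimum over all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q

# Illustrative p-values only (not from the study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
q_values = benjamini_hochberg(p_values)
significant = q_values < 0.05                  # transcripts passing q < 0.05
```

In a transcript-wide analysis the same function would be applied to one p-value per transcript, and the transcripts with `significant` set to true would be the ones reported.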