ABSTRACT
This paper concerns the identification of gene co-expression modules in transcriptomics data, i.e., collections of genes that are highly co-expressed and potentially linked to a biological mechanism. Weighted gene co-expression network analysis (WGCNA) is a widely used method for module detection based on the computation of eigengenes, the weights of the first principal component of the module gene expression matrix. The eigengene has been used as a centroid in a k-means algorithm to improve module memberships. In this paper, we present four new module representatives: the eigengene subspace, flag mean, flag median and module expression vector. The eigengene subspace, flag mean and flag median are subspace module representatives that capture more of the variance in gene expression within a module. The module expression vector is a weighted centroid of the module that leverages the structure of the module gene co-expression network. We use these module representatives in Linde-Buzo-Gray clustering algorithms to refine WGCNA module membership. We evaluate these methods on two transcriptomics data sets. We find that most of our module refinement techniques improve upon the WGCNA modules according to two statistics: (1) how well the modules classify between phenotypes and (2) the biological significance of the modules based on Gene Ontology terms.
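The eigengene and its use as a clustering centroid can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the paper's implementation: each gene is standardized across samples, the eigengene is taken as the leading right singular vector of the standardized module matrix, gene-to-module similarity is signed Pearson correlation, and each pass is a Lloyd (k-means) reassignment step starting from given WGCNA labels, so the codebook-splitting initialization of the full Linde-Buzo-Gray algorithm is omitted. The function names `standardize`, `eigengene` and `refine_modules` are hypothetical.

```python
import numpy as np


def standardize(X):
    """Center and scale each gene (row) across samples.

    Assumes no gene has zero variance across samples.
    """
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def eigengene(X):
    """Unit-norm first principal component of one module's expression matrix.

    X has shape (n_genes_in_module, n_samples); the result has length n_samples.
    """
    Z = standardize(X)
    # The leading right singular vector of Z is the first principal component
    # over samples; because every row of Z is centered, it is also mean-zero.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    v = vt[0]
    # SVD signs are arbitrary; orient v to agree with the module's average profile.
    if v @ Z.mean(axis=0) < 0:
        v = -v
    return v


def refine_modules(X, labels, max_iter=20):
    """Lloyd-style refinement (hypothetical sketch): move each gene to the
    module whose eigengene it correlates with most (signed correlation assumed).

    X has shape (n_genes, n_samples); labels holds one module id per gene.
    """
    labels = np.asarray(labels).copy()
    Z = standardize(X)
    # Scale rows to unit norm so that a dot product with a unit-norm,
    # mean-zero eigengene equals the Pearson correlation.
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    for _ in range(max_iter):
        modules = np.unique(labels)  # modules that lose all genes disappear
        centroids = np.stack([eigengene(X[labels == m]) for m in modules])
        corr = Zn @ centroids.T  # shape (n_genes, n_modules)
        new_labels = modules[np.argmax(corr, axis=1)]
        if np.array_equal(new_labels, labels):
            break  # converged: no gene changed module
        labels = new_labels
    return labels
```

Using a single vector per module keeps each centroid cheap to recompute; the subspace representatives named above would instead keep several leading singular vectors per module and compare genes to a subspace rather than to one direction.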
Subjects
Gene Expression Profiling, Gene Regulatory Networks, Algorithms, Phenotype
ABSTRACT
This paper introduces a pathway expression framework as an approach for constructing derived biomarkers. The pathway expression framework incorporates the biological connections of genes, yielding a biologically relevant model. Using this framework, we distinguish between shedding subjects post-infection and all subjects pre-infection in human blood transcriptomic samples from subjects challenged with various respiratory viruses: H1N1, H3N2, HRV (human rhinovirus), and RSV (respiratory syncytial virus). Additionally, pathway expression data are used to select discriminatory pathways from these experiments. The classification results and selected pathways are benchmarked against standard gene-expression-based classification and pathway ranking methods. We find that using the pathway expression data together with the selected pathways, which have minimal overlap with the high-ranking pathways found by traditional methods, improves classification rates across experiments.
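The abstract does not state how gene connections enter a pathway's expression value. As a purely hypothetical stand-in, one simple way to let network structure shape a pathway-level feature is to weight each gene by its degree within the pathway graph and take the weighted average of those genes' expression in each sample. The degree-centrality weighting and the function names `degree_weights` and `pathway_expression` are our assumptions for illustration, not the framework described above.

```python
def degree_weights(genes, edges):
    """Normalized degree centrality of each gene inside one pathway graph.

    Hypothetical weighting for illustration; edges are undirected (u, v) pairs.
    """
    degree = {g: 0 for g in genes}
    for u, v in edges:
        # Count only edges between distinct genes that are both in the pathway.
        if u != v and u in degree and v in degree:
            degree[u] += 1
            degree[v] += 1
    total = sum(degree.values())
    if total == 0:
        # No internal edges: fall back to an unweighted mean.
        return {g: 1 / len(genes) for g in genes}
    return {g: d / total for g, d in degree.items()}


def pathway_expression(expr, genes, edges):
    """Per-sample pathway score as a degree-weighted mean of gene expression.

    expr maps gene -> list of expression values (one per sample); pathway
    genes missing from expr are ignored.
    """
    present = [g for g in genes if g in expr]
    if not present:
        raise ValueError("no pathway genes are measured in expr")
    weights = degree_weights(present, edges)
    n_samples = len(expr[present[0]])
    return [sum(weights[g] * expr[g][s] for g in present) for s in range(n_samples)]
```

Computing one such score per pathway and stacking the results gives a samples-by-pathways matrix that could be fed to a classifier in place of the gene-level features.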