A novel pathway-based distance score enhances assessment of disease heterogeneity in gene expression.
Yan, Xiting; Liang, Anqi; Gomez, Jose; Cohn, Lauren; Zhao, Hongyu; Chupp, Geoffrey L.
Affiliation
  • Yan X; Center for Pulmonary Personalized Medicine, Section of Pulmonary, Critical Care, and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, 06520, USA. xiting.yan@yale.edu.
  • Liang A; Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA. xiting.yan@yale.edu.
  • Gomez J; Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA.
  • Cohn L; Center for Pulmonary Personalized Medicine, Section of Pulmonary, Critical Care, and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, 06520, USA.
  • Zhao H; Center for Pulmonary Personalized Medicine, Section of Pulmonary, Critical Care, and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, 06520, USA.
  • Chupp GL; Center for Pulmonary Personalized Medicine, Section of Pulmonary, Critical Care, and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, 06520, USA.
BMC Bioinformatics ; 18(1): 309, 2017 Jun 20.
Article in English | MEDLINE | ID: mdl-28637421
ABSTRACT

BACKGROUND:

Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biological samples. However, gene expression data are often noisy and genes are often relatively highly correlated, so traditional distances such as the Euclidean distance may not discriminate effectively between biologically different samples. An alternative way to examine disease phenotypes is to use pre-defined biological pathways, which have been shown to be perturbed in different ways in different subjects who have similar clinical features. We hypothesize that differences in the expression of genes in a given pathway are more predictive of biological differences between samples than standard approaches, and that integrating them into clustering analysis will enhance the robustness and accuracy of the clustering method. To examine this hypothesis, we developed a novel computational method that assesses the biological differences between samples from gene expression data by assuming that ontologically defined biological pathways behave similarly in biologically similar samples.

RESULTS:

Pre-defined biological pathways were downloaded, and the genes in each pathway were used to cluster samples with a Gaussian mixture model. The clustering results across different pathways were then summarized to calculate the pathway-based distance score between samples. This method was applied to both simulated and real data sets and compared to the traditional Euclidean distance and to another pathway-based clustering method, Pathifier. The results show that the pathway-based distance score performs significantly better than the Euclidean distance, especially when the heterogeneity is low and genes in the same pathways are correlated. Compared to Pathifier, our approach achieved higher accuracy and robustness for small pathways. For large pathways, downsampling them into smaller pathways allowed our approach to achieve comparable performance.
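
The summarizing step described above can be illustrated with a minimal sketch. The abstract does not give the formula used to combine the per-pathway clusterings, so the rule below is an assumption made only for illustration: the distance between two samples is taken to be the fraction of pathways in which they fall into different clusters. The function name `pathway_distance`, the pathway names, and the toy labels are all hypothetical; in practice the labels would come from fitting a Gaussian mixture model to the genes of each pathway (for example with scikit-learn's `GaussianMixture`), a step not shown here.

```python
from itertools import combinations


def pathway_distance(labels_by_pathway):
    """Return a symmetric sample-by-sample distance matrix (list of lists).

    labels_by_pathway maps a pathway name to one cluster label per sample.
    ASSUMPTION (not from the abstract): the distance between two samples is
    the fraction of pathways in which they are assigned to different clusters.
    """
    label_lists = list(labels_by_pathway.values())
    if not label_lists:
        raise ValueError("at least one pathway is required")
    n_samples = len(label_lists[0])
    if any(len(labels) != n_samples for labels in label_lists):
        raise ValueError("every pathway must label the same number of samples")

    dist = [[0.0] * n_samples for _ in range(n_samples)]
    for i, j in combinations(range(n_samples), 2):
        # Count the pathways whose clustering separates sample i from sample j.
        disagreements = sum(labels[i] != labels[j] for labels in label_lists)
        dist[i][j] = dist[j][i] = disagreements / len(label_lists)
    return dist


# Hypothetical per-pathway cluster labels for four samples.
toy_labels = {
    "pathway_A": [0, 0, 1, 1],
    "pathway_B": [0, 0, 0, 1],
    "pathway_C": [1, 1, 0, 0],
}
D = pathway_distance(toy_labels)
```

In this toy example, samples 0 and 1 share a cluster in every pathway and so get distance 0, while samples 0 and 3 are separated in every pathway and get distance 1. The resulting matrix could then be passed to any distance-based clustering routine in place of a Euclidean distance matrix.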

CONCLUSIONS:

We have developed a novel distance score that represents the biological differences between samples using gene expression data and pre-defined biological pathway information. Applying this distance score yields more accurate, robust, and biologically meaningful clustering of both simulated and real data than traditional methods. Its performance is also comparable to or better than that of Pathifier.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Algorithms / Gene Expression / Metabolic Networks and Pathways Study type: Prognostic_studies Aspects: Patient_preference Limits: Humans Language: En Journal: BMC Bioinformatics Journal subject: INFORMATICA MEDICA Year: 2017 Document type: Article Affiliation country: United States of America
