Interpolation based consensus clustering for gene expression time series.

Chiu, Tai-Yu; Hsu, Ting-Chieh; Yen, Chia-Cheng; Wang, Jia-Shung

Affiliation

Chiu TY; Department of Computer Science, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, HsinChu, 30013, Taiwan. tychiu@vc.cs.nthu.edu.tw.
Hsu TC; Department of Computer Science, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, HsinChu, 30013, Taiwan. tchsu@vc.cs.nthu.edu.tw.
Yen CC; Department of Computer Science, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, HsinChu, 30013, Taiwan. ccyen@vc.cs.nthu.edu.tw.
Wang JS; Department of Computer Science, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, HsinChu, 30013, Taiwan. jswang@cs.nthu.edu.tw.

BMC Bioinformatics; 16: 117, 2015 Apr 16.

Article in En | MEDLINE | ID: mdl-25888019

ABSTRACT

BACKGROUND:

Unsupervised analyses such as clustering are essential tools for interpreting time-series expression data from microarrays. Several clustering algorithms have been developed to analyze gene expression data. Early methods such as k-means, hierarchical clustering, and self-organizing maps are popular for their simplicity. However, because of measurement noise and uncertainty, these common algorithms have low accuracy. Moreover, because gene expression is a temporal process, the relationship between successive time points should be considered in the analyses. In addition, biological processes are generally continuous, whereas time-series experiments often yield too few data points; as a result, compensating for missing data can also be an issue.

RESULTS:

An affinity propagation-based clustering algorithm for time-series gene expression data is proposed. The algorithm explores the relationship between genes using a sliding-window mechanism to extract a large number of features. In addition, the time-course datasets are resampled with spline interpolation to predict the unobserved values. Finally, a consensus process is applied to enhance the robustness of the method. Real gene expression datasets were analyzed to demonstrate the accuracy and efficiency of the algorithm.
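The resampling and sliding-window steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a natural cubic spline (standing in for the cubic B-spline the paper names), a uniform resampling grid, and a fixed window width, and the names `natural_cubic_spline`, `resample`, `sliding_windows`, and `points_per_interval` are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys).

    Assumption: natural boundary conditions (zero second derivative at both ends).
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])

    # Forward sweep of the tridiagonal solve for the quadratic coefficients c.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]

    # Back substitution gives the per-interval polynomial coefficients b, c, d.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Locate the interval containing x, clamping to the first or last one.
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def resample(times, values, points_per_interval=4):
    """Resample one expression profile on a uniform grid spanning the measured times."""
    spline = natural_cubic_spline(times, values)
    step_count = (len(times) - 1) * points_per_interval
    t0, t1 = times[0], times[-1]
    grid = [t0 + (t1 - t0) * k / step_count for k in range(step_count + 1)]
    return grid, [spline(t) for t in grid]


def sliding_windows(series, width, step=1):
    """Split a resampled profile into overlapping windows, one feature vector each."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]
```

For example, `grid, profile = resample([0, 1, 2, 4], [0.1, 0.9, 0.4, 0.2])` followed by `sliding_windows(profile, width=5)` turns four unevenly spaced measurements into 13 resampled values and nine overlapping feature windows.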

CONCLUSION:

The proposed algorithm benefits from cubic B-spline interpolation, a sliding window, affinity propagation, a gene relativity graph, and a consensus process and, as a result, provides both appropriate and effective clustering of time-series gene expression data. The proposed method was tested with gene expression data from the Yeast galactose dataset, the Yeast cell-cycle dataset (Y5), and the Yeast sporulation dataset; the results illustrated the relationships between the expressed genes, which may give some insight into the biological processes involved.
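The abstract does not spell out how the consensus process combines individual clusterings. One common scheme, shown here purely as an assumption and not as the authors' method, is a co-association matrix: count how often each pair of genes lands in the same cluster across runs, then group genes whose agreement exceeds a threshold. The names `co_association`, `consensus_clusters`, and `threshold` are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of runs in which each pair of items shares a cluster label."""
    n = len(partitions[0])
    runs = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[count / runs for count in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Group items whose co-association exceeds the threshold (connected components)."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three hypothetical clustering runs over four genes (label values are arbitrary).
runs = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
print(consensus_clusters(runs))  # genes 0 and 1 group together, as do genes 2 and 3
```

Label values need not match across runs, because only pairwise agreement is counted; this is what makes the scheme usable with algorithms such as affinity propagation, whose cluster identifiers differ from run to run.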

Subject(s)

Algorithms; Computer Graphics; Gene Expression Profiling/methods; Gene Expression Regulation, Fungal; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Cell Cycle/physiology; Cluster Analysis; Consensus Sequence; Galactose/metabolism; Oligonucleotide Array Sequence Analysis/methods; Spores, Fungal/physiology; Time Factors

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Saccharomyces cerevisiae / Algorithms / Computer Graphics / Gene Expression Regulation, Fungal / Gene Expression Profiling / Saccharomyces cerevisiae Proteins
Type of study: Prognostic_studies
Language: En
Journal: BMC Bioinformatics
Year: 2015
Document type: Article
