TimesVector: a vectorized clustering approach to the analysis of time series transcriptome data from multiple phenotypes.

Jung, Inuk; Jo, Kyuri; Kang, Hyejin; Ahn, Hongryul; Yu, Youngjae; Kim, Sun

Affiliation

Jung I; Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-Gu, Seoul, 151-747, Republic of Korea.
Jo K; Department of Computer Science and Engineering.
Kang H; Department of Applied Biology and Chemistry, Seoul National University, Gwanak-Gu, Seoul, 151-744, Republic of Korea.
Ahn H; Department of Computer Science and Engineering.
Yu Y; Department of Computer Science and Engineering.
Kim S; Interdisciplinary Program in Bioinformatics, Seoul National University, Gwanak-Gu, Seoul, 151-747, Republic of Korea.

Bioinformatics; 33(23): 3827-3835, 2017 Dec 01.

Article in En | MEDLINE | ID: mdl-28096084

ABSTRACT

MOTIVATION:

Identifying biologically meaningful gene expression patterns from time series gene expression data is important for understanding the underlying biological mechanisms. To identify gene sets that are significantly perturbed between different phenotypes, analysis of time series transcriptome data must consider both the time and the sample dimensions. The analysis therefore searches for gene sets that exhibit similar or different expression patterns between two or more sample conditions, which makes the data three-dimensional, i.e. gene-time-condition. The computational complexity of analyzing such data is very high compared to the already difficult NP-hard two-dimensional biclustering problem. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression patterns in two sample conditions.

RESULTS:

We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters for detecting similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. Using four sets of time series gene expression data, generated by both microarray and high-throughput sequencing platforms, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved the clustering quality compared to existing triclustering tools, and only TimesVector successfully detected clusters with differential expression patterns across conditions.

AVAILABILITY AND IMPLEMENTATION:

The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/.

CONTACT:

sunkim.bioinfo@snu.ac.kr

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
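The idea behind steps (i) and (ii) in RESULTS can be illustrated with a small sketch. This is not the TimesVector implementation: the abstract does not name the dimension reduction or clustering method, so the sketch assumes a plain cosine-similarity k-means on unit-normalized vectors, skips dimension reduction and the rescue step (iii), and uses a hypothetical correlation threshold to label a cluster as "similar" or "different" across conditions. All function names and the toy data are illustrative assumptions.

```python
import numpy as np


def concatenate_conditions(data):
    """Flatten a (genes, conditions, timepoints) array into one
    time-condition concatenated vector per gene."""
    n_genes, n_cond, n_time = data.shape
    return data.reshape(n_genes, n_cond * n_time)


def unit_normalize(x):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return x / norms


def farthest_point_init(x, k):
    """Deterministic seeding: start from row 0, then repeatedly add the
    row that is least similar to the centroids chosen so far."""
    chosen = [0]
    while len(chosen) < k:
        sim = (x @ x[chosen].T).max(axis=1)
        chosen.append(int(np.argmin(sim)))
    return x[chosen].copy()


def cosine_kmeans(vectors, k, n_iter=50):
    """Assumed stand-in for the clustering in step (i): k-means that
    assigns each gene to the centroid with the highest cosine similarity."""
    x = unit_normalize(vectors)
    centroids = farthest_point_init(x, k)
    for _ in range(n_iter):
        labels = np.argmax(x @ centroids.T, axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                updated[j] = members.mean(axis=0)
        updated = unit_normalize(updated)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    labels = np.argmax(x @ centroids.T, axis=1)
    return labels, centroids


def condition_pattern(centroid, n_cond, n_time, threshold=0.5):
    """Hypothetical version of step (ii): split a centroid back into
    per-condition profiles and compare them by Pearson correlation."""
    segments = centroid.reshape(n_cond, n_time)
    corr = np.corrcoef(segments)
    min_corr = corr[np.triu_indices(n_cond, k=1)].min()
    return "similar" if min_corr >= threshold else "different"


# Toy data: 6 genes, 2 conditions, 4 time points.
rising = np.array([1.0, 2.0, 3.0, 4.0])
falling = rising[::-1]
same = np.stack([rising, rising])    # rises in both conditions
diff = np.stack([rising, falling])   # rises in one, falls in the other
data = np.stack([same, 2 * same, 0.5 * same, diff, 2 * diff, 0.5 * diff])

vectors = concatenate_conditions(data)
labels, centroids = cosine_kmeans(vectors, k=2)
patterns = [condition_pattern(c, n_cond=2, n_time=4) for c in centroids]
print(labels, patterns)
```

Concatenating the per-condition time series lets an ordinary vector clustering method see all conditions at once, so genes that behave alike in one condition but diverge in another end up in separate clusters. The per-condition comparison afterwards is what distinguishes a cluster with a shared pattern from one with a condition-specific pattern.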

Subjects

Algorithms; Cluster Analysis; Gene Expression Profiling/methods; Phenotype; Transcriptome; High-Throughput Nucleotide Sequencing; Oligonucleotide Array Sequence Analysis; Reproducibility of Results; Software; Time Factors

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Phenotype / Algorithms / Cluster Analysis / Gene Expression Profiling / Transcriptome | Type of study: Prognostic_studies | Language: En | Year of publication: 2017 | Document type: Article
