Disentangling multidimensional spatio-temporal data into their common and aberrant responses.

Chang, Young Hwan; Korkola, James; Amin, Dhara N; Moasser, Mark M; Carmena, Jose M; Gray, Joe W; Tomlin, Claire J

Affiliation

Chang YH; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
Korkola J; Department of Biomedical Engineering and the Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, USA.
Amin DN; Department of Medicine, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA.
Moasser MM; Department of Medicine, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, USA.
Carmena JM; Department of Electrical Engineering and Computer Sciences, Helen Wills Neuroscience Institute, University of California, Berkeley and UCB/UCSF Graduate Program in Bioengineering, CA, USA.
Gray JW; Department of Biomedical Engineering and the Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, USA.
Tomlin CJ; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA; Faculty Scientist, Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

PLoS One; 10(4): e0121607, 2015.

Article in English | MEDLINE | ID: mdl-25901353

ABSTRACT

With the advent of high-throughput measurement techniques, scientists and engineers are starting to grapple with massive data sets and are encountering challenges in organizing, processing, and extracting information into meaningful structures. Multidimensional spatio-temporal biological data sets, such as time-series gene expression with various perturbations over different cell lines, or neural spike trains across many experimental trials, have the potential to yield insight into the dynamic behavior of the system. For this potential to be realized, we need a suitable representation to understand the data. A general question is how to organize the observed data into meaningful structures and how to find an appropriate similarity measure. A natural way of viewing these complex high-dimensional data sets is to examine and analyze the large-scale features first and then to focus on the interesting details. Since the wide range of experiments and the unknown complexity of the underlying system contribute to the heterogeneity of biological data, we develop a new method by proposing an extension of Robust Principal Component Analysis (RPCA), which models common variations across multiple experiments as the low-rank component and anomalies across these experiments as the sparse component. We show that, by separating these common responses from abnormal responses, the proposed method is able to find distinct subtypes and classify data sets in a robust way without any prior knowledge. Thus, the proposed method provides a new representation of these data sets, which has the potential to help users acquire new insight from data.
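To make the low-rank plus sparse idea concrete, the following is a minimal sketch of standard RPCA (principal component pursuit solved with an augmented Lagrangian iteration), not the extension proposed in the article. The data layout (rows as measured features, columns as experiments), the function names, and the default choices of `lam` and `mu` are assumptions made for illustration; they follow commonly used defaults and are not taken from the abstract.

```python
import numpy as np


def soft_threshold(X, tau):
    """Shrink every entry of X toward zero by tau (proximal map of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Shrink the singular values of X by tau (proximal map of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into a low-rank part L and a sparse part S with M ~ L + S.

    Hypothetical helper for illustration: plain RPCA, not the authors' extension.
    Assumes M is a non-zero 2-D float array.
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # common default weight on the sparse term
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())    # common default penalty parameter
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                        # Lagrange multiplier for M = L + S
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)   # update the common (low-rank) part
        S = soft_threshold(M - L + Y / mu, lam / mu)  # update the aberrant (sparse) part
        R = M - L - S                                 # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S


# Synthetic illustration: a rank-2 "common response" plus a few large "aberrant" entries.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
S0 = np.zeros_like(L0)
mask = rng.random(L0.shape) < 0.05
S0[mask] = 10.0 * rng.choice([-1.0, 1.0], size=int(mask.sum()))
M = L0 + S0

L, S = rpca(M)
```

In this sketch, `L` plays the role of the responses shared across experiments and `S` collects the entries that deviate from them; inspecting which columns of `S` carry large values is one plausible way to flag aberrant experiments.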

Subjects

Algorithms; Biomarkers, Tumor/genetics; Breast Neoplasms/genetics; Gene Expression Regulation, Neoplastic/drug effects; Gene Regulatory Networks; Neural Networks, Computer; Breast Neoplasms/drug therapy; Female; Humans; Lapatinib; Mutation/genetics; Principal Component Analysis; Protein Array Analysis; Proto-Oncogene Proteins c-akt/genetics; Quinazolines/pharmacology

Full text: 1 Database: MEDLINE Main subject: Algorithms / Breast Neoplasms / Biomarkers, Tumor / Gene Expression Regulation, Neoplastic / Neural Networks, Computer / Gene Regulatory Networks Limits: Female / Humans Language: En Year of publication: 2015 Document type: Article
