Accelerating high-dimensional clustering with lossless data reduction.

Qaqish, Bahjat F; O'Brien, Jonathon J; Hibbard, Jonathan C; Clowers, Katie J

Affiliation

Qaqish BF; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
O'Brien JJ; Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA.
Hibbard JC; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Clowers KJ; Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA.

Bioinformatics ; 33(18): 2867-2872, 2017 Sep 15.

Article in English | MEDLINE | ID: mdl-28520900

ABSTRACT

MOTIVATION: For cluster analysis, high-dimensional data are associated with instability, decreased classification accuracy and a high computational burden. The last of these challenges can be eliminated as a serious concern. For applications where dimension reduction techniques are not implemented, we propose a temporary transformation that accelerates computations with no loss of information. The algorithm can be applied to any statistical procedure that depends only on Euclidean distances, and it can be implemented sequentially to enable analyses of data that would otherwise exceed memory limitations.

RESULTS: The method is easily implemented in common statistical software as a standard pre-processing step. The benefit of our algorithm grows with the dimensionality of the problem and the complexity of the analysis. Consequently, our simple algorithm not only decreases the computation time for routine analyses but also opens the door to calculations that might otherwise have been too burdensome to attempt.

AVAILABILITY AND IMPLEMENTATION: R, Matlab and SAS/IML code for implementing lossless data reduction is freely available in the Appendix.

CONTACT: obrienj@hms.harvard.edu.
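The abstract does not spell out the transformation, so the following is only an illustrative sketch of one standard way to reduce dimension without changing Euclidean distances. It is not the authors' published code. It assumes fewer observations than variables (n < p) and a data matrix of full row rank, and it swaps in a Cholesky factorization of the Gram matrix as the reduction step. All function names (`gram_matrix`, `cholesky`, `reduce_rows`, `distance`) and the simulated data are hypothetical. The idea is that squared distances depend only on inner products between observations, so any n x n matrix with the same inner products yields the same distances.

```python
import math
import random


def gram_matrix(rows):
    """Return the n x n matrix of inner products between observations."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def cholesky(g):
    """Return lower-triangular L with L * L^T == g (g must be positive definite)."""
    n = len(g)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("Gram matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def reduce_rows(rows):
    """Map n observations in p dimensions to n observations in n dimensions.

    The rows of the result have the same pairwise inner products as the
    input rows, hence the same pairwise Euclidean distances.
    """
    return cholesky(gram_matrix(rows))


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Simulated example (assumption): 6 observations on 500 variables.
random.seed(0)
n, p = 6, 500
data = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
reduced = reduce_rows(data)  # 6 x 6 instead of 6 x 500

# Largest discrepancy between original and reduced pairwise distances.
max_error = max(
    abs(distance(data[i], data[j]) - distance(reduced[i], reduced[j]))
    for i in range(n)
    for j in range(i + 1, n)
)
```

In this sketch, a distance-based procedure such as hierarchical or k-means clustering would be run on `reduced` rather than `data`, at a per-distance cost proportional to n rather than p. The pure-Python loops keep the example dependency-free; in practice a linear algebra library would do the factorization, and a rank-deficient data matrix would need a different factorization than plain Cholesky.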

Subjects

Cluster Analysis; Computational Biology/methods; Software; Algorithms; DNA Methylation; Fungal Proteins/genetics; Gene Expression Regulation, Fungal; Humans; Proteomics/methods; Yeasts/genetics; Yeasts/metabolism

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Software / Cluster Analysis / Computational Biology | Limits: Humans | Language: English | Journal: Bioinformatics | Journal subject: MEDICAL INFORMATICS | Publication year: 2017 | Document type: Article | Country of affiliation: United States
