Exploring high-dimensional biological data with sparse contrastive principal component analysis.

Boileau, Philippe; Hejazi, Nima S; Dudoit, Sandrine

Affiliation

Boileau P; Graduate Group in Biostatistics.
Hejazi NS; Graduate Group in Biostatistics.
Dudoit S; Center for Computational Biology.

Bioinformatics; 36(11): 3422-3430, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32176249

ABSTRACT

MOTIVATION:

Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances. However, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously.

RESULTS:

Inspired by recent proposals for making use of control data in the removal of unwanted variation, we propose a variant of principal component analysis (PCA), sparse contrastive PCA, that extracts sparse, stable, interpretable and relevant biological signal. The new methodology is compared to competing dimensionality reduction approaches through a simulation study and via analyses of several publicly available protein expression, microarray gene expression and single-cell transcriptome sequencing datasets.

AVAILABILITY AND IMPLEMENTATION:

A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in this article is also available via GitHub.

CONTACT:

philippe_boileau@berkeley.edu.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
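As a rough illustration of the idea in the RESULTS section (using control data to cancel unwanted variation, then keeping only a few non-zero loadings), the following is a minimal sketch and not the scPCA package's algorithm. It assumes the common contrastive PCA formulation, in which the leading eigenvector of the target covariance minus alpha times the control (background) covariance is taken, and it uses plain soft-thresholding as a crude stand-in for a sparsity penalty. The function names, the toy data, and the parameters alpha and penalty are all hypothetical and chosen only for this example.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equal-length observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvector(mat, iters=500):
    """Power iteration for the eigenvector of the largest (signed) eigenvalue.

    A contrastive matrix can be indefinite, so it is shifted by a bound on its
    spectral radius (max absolute row sum) to make every eigenvalue >= 0.
    Assumes a non-zero matrix and a start vector not orthogonal to the answer.
    """
    p = len(mat)
    shift = max(sum(abs(x) for x in row) for row in mat)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(p)) + shift * v[i]
             for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def soft_threshold(v, penalty):
    """Shrink loadings toward zero, drop the small ones, and renormalize."""
    shrunk = [math.copysign(max(abs(x) - penalty, 0.0), x) for x in v]
    norm = math.sqrt(sum(x * x for x in shrunk))
    return [x / norm for x in shrunk] if norm > 0 else shrunk


def sparse_contrastive_direction(target, background, alpha=1.0, penalty=0.1):
    """Illustrative only: leading direction of C_target - alpha * C_background,
    sparsified by soft-thresholding (an assumption, not the published method)."""
    c_t, c_b = covariance(target), covariance(background)
    p = len(c_t)
    contrast = [[c_t[i][j] - alpha * c_b[i][j] for j in range(p)]
                for i in range(p)]
    return soft_threshold(leading_eigenvector(contrast), penalty)


# Toy data: three features built from mutually orthogonal +/-1 patterns.
h1 = [1, 1, -1, -1, 1, 1, -1, -1]   # nuisance pattern, shared by both datasets
h2 = [1, -1, 1, -1, 1, -1, 1, -1]   # signal pattern, present only in the target
h3 = [1, 1, 1, 1, -1, -1, -1, -1]   # low-variance pattern, shared by both
target = [[3 * a, 2 * b, 0.5 * c + 0.2 * b] for a, b, c in zip(h1, h2, h3)]
background = [[3 * a, 0.0, 0.5 * c] for a, b, c in zip(h1, h2, h3)]

# Ordinary PCA direction of the target data alone, for comparison.
pca_direction = leading_eigenvector(covariance(target))
# Contrastive, sparsified direction using the background as control data.
direction = sparse_contrastive_direction(target, background,
                                         alpha=1.0, penalty=0.15)
print("PCA direction:        ", pca_direction)
print("sparse contrastive:   ", direction)
```

In this toy setting the ordinary PCA direction is dominated by the high-variance nuisance feature shared with the control data, while the contrastive direction concentrates on the feature that varies only in the target data, and the thresholding step removes the small leftover loading. Here alpha and penalty are fixed by hand; how the scPCA package selects its tuning parameters is not described in this abstract.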

Subject(s)

High-Throughput Nucleotide Sequencing; Software; Principal Component Analysis

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Software / High-Throughput Nucleotide Sequencing | Language: En | Journal: Bioinformatics | Journal subject: Medical Informatics | Year: 2020 | Document type: Article