Exploring high-dimensional biological data with sparse contrastive principal component analysis.
Bioinformatics
; 36(11): 3422-3430, 2020 06 01.
Article
en En
| MEDLINE
| ID: mdl-32176249
ABSTRACT
MOTIVATION Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances. However, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously. RESULTS:
Inspired by recent proposals for making use of control data in the removal of unwanted variation, we propose a variant of principal component analysis (PCA), sparse contrastive PCA that extracts sparse, stable, interpretable and relevant biological signal. The new methodology is compared to competing dimensionality reduction approaches through a simulation study and via analyses of several publicly available protein expression, microarray gene expression and single-cell transcriptome sequencing datasets. AVAILABILITY AND IMPLEMENTATION A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in this article is also available via GitHub. CONTACT philippe_boileau@berkeley.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Texto completo:
1
Colección:
01-internacional
Base de datos:
MEDLINE
Asunto principal:
Programas Informáticos
/
Secuenciación de Nucleótidos de Alto Rendimiento
Idioma:
En
Revista:
Bioinformatics
Asunto de la revista:
INFORMATICA MEDICA
Año:
2020
Tipo del documento:
Article