Dhaka: variational autoencoder for unmasking tumor heterogeneity from single cell genomic data.

Rashid, Sabrina; Shah, Sohrab; Bar-Joseph, Ziv; Pandya, Ravi

Affiliation

Rashid S; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15232, USA.
Shah S; Department of Computer Science.
Bar-Joseph Z; Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
Pandya R; Department of Molecular Oncology, BC Cancer Agency, Vancouver, BC V5Z 4E6, Canada.

Bioinformatics; 37(11): 1535-1543, 2021 Jul 12.

Article in English | MEDLINE | ID: mdl-30768159

ABSTRACT

MOTIVATION:

Intra-tumor heterogeneity is one of the key confounding factors in deciphering tumor evolution. Malignant cells exhibit variations in their gene expression, copy numbers and mutations even when they originate from a single progenitor cell. Single cell sequencing of tumor cells has recently emerged as a viable option for unmasking the underlying tumor heterogeneity. However, extracting features from single cell genomic data in order to infer the cells' evolutionary trajectory remains computationally challenging because the data are extremely noisy and sparse.

RESULTS:

Here we describe 'Dhaka', a variational autoencoder method that transforms single cell genomic data into a reduced-dimension feature space that is more efficient in differentiating between (hidden) tumor subpopulations. Our method is general and can be applied to several different types of genomic data, including copy number variation from scDNA-Seq and gene expression from scRNA-Seq experiments. We tested the method on synthetic data and on six single cell cancer datasets in which the number of cells ranges from 250 to 6000 per sample. Analysis of the resulting feature space revealed subpopulations of cells and their marker genes. The features can also be used to infer the lineage and/or differentiation trajectory between cells, greatly improving upon prior methods suggested for feature extraction and dimensionality reduction of such data.

AVAILABILITY AND IMPLEMENTATION:

All the datasets used in the paper are publicly available, and the software package and supporting information are available on GitHub: https://github.com/MicrosoftGenomics/Dhaka.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

Full text: 1 | Database: MEDLINE | Language: En | Publication year: 2021 | Document type: Article
