False signals induced by single-cell imputation.
Andrews, Tallulah S; Hemberg, Martin.
Affiliation
  • Andrews TS; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.
  • Hemberg M; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK.
F1000Res; 7: 1740, 2018.
Article in English | MEDLINE | ID: mdl-30906525
ABSTRACT

Background:

Single-cell RNASeq is a powerful tool for measuring gene expression at the resolution of individual cells. A significant challenge in the analysis of these data is the large number of zero values, which represent either missing data or no expression. Several imputation approaches have been proposed to deal with this issue, but since these methods generally rely on structure inherent to the dataset under consideration, they may not provide any additional information.

Methods:

We evaluated the risk of generating false-positive or irreproducible results when imputing data with five different methods. We applied each method to a variety of simulated datasets as well as to permuted real single-cell RNASeq datasets, and considered the number of false-positive gene-gene correlations and differentially expressed genes. Using matched 10X Chromium and Smartseq2 data from the Tabula Muris database, we examined the reproducibility of markers before and after imputation.
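
The permutation idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline and does not implement MAGIC, knn-smooth, or any other published method: the toy count matrix, the plain k-nearest-neighbour averaging in `knn_smooth`, the correlation cut-off of 0.5, and all function names are assumptions made for illustration. Each gene is shuffled independently across cells, so any gene-gene correlation found afterwards is a false positive by construction; the sketch counts such pairs before and after smoothing.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permute_genes(cells, rng):
    """Shuffle each gene independently across cells, removing gene-gene structure."""
    n_genes = len(cells[0])
    columns = []
    for g in range(n_genes):
        col = [cell[g] for cell in cells]
        rng.shuffle(col)
        columns.append(col)
    return [[columns[g][c] for g in range(n_genes)] for c in range(len(cells))]


def knn_smooth(cells, k):
    """Toy smoother (assumption): mean of each cell and its k nearest cells."""
    smoothed = []
    for cell in cells:
        ranked = sorted(cells, key=lambda other: math.dist(cell, other))
        group = ranked[: k + 1]  # the nearest entry is at distance 0 (the cell itself)
        smoothed.append([sum(vals) / len(group) for vals in zip(*group)])
    return smoothed


def count_correlated_pairs(cells, threshold):
    """Number of gene pairs whose absolute Pearson correlation exceeds threshold."""
    n_genes = len(cells[0])
    cols = [[cell[g] for cell in cells] for g in range(n_genes)]
    return sum(
        1
        for i in range(n_genes)
        for j in range(i + 1, n_genes)
        if abs(pearson(cols[i], cols[j])) > threshold
    )


rng = random.Random(0)
# Hypothetical stand-in for a real count matrix: 60 cells x 20 genes.
raw = [[rng.randint(0, 10) for _ in range(20)] for _ in range(60)]
null = permute_genes(raw, rng)

before = count_correlated_pairs(null, 0.5)
after = count_correlated_pairs(knn_smooth(null, 10), 0.5)
print("correlated pairs on permuted data:", before)
print("correlated pairs after smoothing: ", after)
```

Because the neighbours used for smoothing are chosen from the same data that are later tested, the smoothed matrix can show correlations that the permuted input cannot contain, which is the circularity the study set out to measure.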

Results:

The extent of false-positive signals introduced by imputation varied considerably by method. The data-smoothing methods, MAGIC and knn-smooth, generated a very high number of false positives in both real and simulated data. Model-based imputation methods typically generated fewer false positives, but this varied greatly depending on how well the datasets conformed to the underlying model. Furthermore, only SAVER exhibited reproducibility comparable to that of unimputed data across the matched datasets.

Conclusions:

Imputation of single-cell RNASeq data introduces circularity that can generate false-positive results. Thus, statistical tests applied to imputed data should be treated with care. Additional filtering by effect size can reduce but not fully eliminate these effects. Of the methods we considered, SAVER was the least likely to generate false or irreproducible results, and should therefore be favoured over alternatives if imputation is necessary.
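
As a rough illustration of filtering by effect size, the sketch below keeps a gene only if it passes both a significance cut-off and a minimum absolute log2 fold change. The gene names, p-values, expression values, pseudocount, and both thresholds are hypothetical and are not taken from the study.

```python
import math


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression in two groups, with a pseudocount to avoid log(0)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def filter_hits(results, max_p=0.05, min_abs_lfc=1.0):
    """Keep genes passing both the p-value and the effect-size cut-off (assumed values)."""
    return [
        gene
        for gene, (p_value, lfc) in results.items()
        if p_value < max_p and abs(lfc) >= min_abs_lfc
    ]


# Hypothetical test results: gene -> (p-value, log2 fold change).
results = {
    "geneA": (0.001, log2_fold_change([7, 7, 7], [1, 1, 1])),        # significant, large effect
    "geneB": (0.001, log2_fold_change([3.2, 3.0, 3.1], [3, 3, 3])),  # significant, tiny effect
    "geneC": (0.40, log2_fold_change([15, 15, 15], [1, 1, 1])),      # large effect, not significant
}

print(filter_hits(results))  # prints ['geneA']
```

A filter of this kind removes hits such as `geneB`, whose small differences can become statistically significant once imputation has reduced the variance, but it cannot remove false positives whose inflated effect sizes also pass the threshold.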
Full text: 1 Database: MEDLINE Language: En Publication year: 2018 Document type: Article