Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments.
BinTayyash, Nuha; Georgaka, Sokratia; John, S T; Ahmed, Sumon; Boukouvalas, Alexis; Hensman, James; Rattray, Magnus.
Affiliations
  • BinTayyash N; School of Computer Science, University of Manchester, Manchester M13 9PL, UK.
  • Georgaka S; Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK.
  • John ST; Secondmind, Cambridge CB2 1LA, UK.
  • Ahmed S; Finnish Center for Artificial Intelligence, FCAI, Department of Computer Science, Aalto University, Finland.
  • Boukouvalas A; Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK.
  • Hensman J; Institute of Information Technology, University of Dhaka, Dhaka 1000, Bangladesh.
  • Rattray M; Amazon, Cambridge CB1 2GA, UK.
Bioinformatics; 37(21): 3788-3795, 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34213536
ABSTRACT
MOTIVATION:

The negative binomial distribution has been shown to be a good model for counts data from both bulk and single-cell RNA-sequencing (RNA-seq). Gaussian process (GP) regression provides a useful non-parametric approach for modelling temporal or spatial changes in gene expression. However, currently available GP regression methods that implement negative binomial likelihood models do not scale to the increasingly large datasets being produced by single-cell and spatial transcriptomics.

RESULTS:

The GPcounts package implements GP regression methods for modelling counts data using a negative binomial likelihood function. Computational efficiency is achieved through the use of variational Bayesian inference. The GP function models changes in the mean of the negative binomial likelihood through a logarithmic link function, and the dispersion parameter is fitted by maximum likelihood. We validate the method on simulated time course data, where it identifies changes in over-dispersed counts data better than methods based on Gaussian or Poisson likelihoods. To demonstrate temporal inference, we apply GPcounts to single-cell RNA-seq datasets after pseudotime and branching inference. To demonstrate spatial inference, we apply GPcounts to data from the mouse olfactory bulb to identify spatially variable genes and compare the results to two published GP methods. We also provide the option of modelling additional dropout using a zero-inflated negative binomial. Our results show that GPcounts can be used to model temporal and spatial counts data in cases where simpler Gaussian and Poisson likelihoods are unrealistic.

AVAILABILITY AND IMPLEMENTATION:

GPcounts is implemented using the GPflow library in Python and is available at https://github.com/ManchesterBioinference/GPcounts along with the data, code and notebooks required to reproduce the results presented here. The version used for this paper is archived at https://doi.org/10.5281/zenodo.5027066.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
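The likelihood described in the Results can be illustrated with a minimal sketch. It is not the GPcounts implementation and does not use GPflow or variational inference. It assumes the common mean-dispersion parameterisation of the negative binomial (variance = mu + alpha * mu^2), treats the latent values f as fixed numbers rather than draws from a GP, and fits the dispersion by a simple grid search. The function names, the toy counts and the grid are all hypothetical.

```python
import math


def nb_log_pmf(y, f, alpha):
    """Log-probability of count y under a negative binomial whose mean is
    exp(f) and whose dispersion is alpha (assumed: var = mu + alpha * mu**2)."""
    mu = math.exp(f)  # logarithmic link: f is the latent function value
    r = 1.0 / alpha   # "number of failures" form of the same distribution
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def poisson_log_pmf(y, f):
    """Log-probability of count y under a Poisson with mean exp(f)."""
    return y * f - math.exp(f) - math.lgamma(y + 1)


def fit_dispersion(counts, f_values, grid):
    """Maximum-likelihood dispersion over a grid, with f held fixed."""
    def total_log_lik(alpha):
        return sum(nb_log_pmf(y, f, alpha) for y, f in zip(counts, f_values))
    return max(grid, key=total_log_lik)


# Toy over-dispersed counts (made up for illustration only).
counts = [0, 3, 25, 1, 40, 2, 0, 18]
# Constant latent function set to the log of the sample mean.
f_values = [math.log(sum(counts) / len(counts))] * len(counts)
# Hypothetical search grid for the dispersion parameter.
grid = [0.01 * 2 ** k for k in range(12)]

alpha_hat = fit_dispersion(counts, f_values, grid)
nb_ll = sum(nb_log_pmf(y, f, alpha_hat) for y, f in zip(counts, f_values))
pois_ll = sum(poisson_log_pmf(y, f) for y, f in zip(counts, f_values))

print("fitted dispersion:", alpha_hat)
print("negative binomial log-likelihood:", nb_ll)
print("Poisson log-likelihood:", pois_ll)
```

On counts like these, where the variance far exceeds the mean, the negative binomial log-likelihood is higher than the Poisson one at the same mean, which is the situation the abstract refers to as over-dispersion. As alpha approaches zero the two likelihoods coincide.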

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Models, Statistical / Gene Expression Profiling | Type of study: Prognostic_studies | Limits: Animals | Language: En | Year of publication: 2021 | Document type: Article
