Your browser doesn't support javascript.
loading
A comprehensive assessment of hurdle and zero-inflated models for single cell RNA-sequencing analysis.
Cui, Tao; Wang, Tingting.
Afiliação
  • Cui T; Department of Pharmacology and Physiology Georgetown University Medical Center  SE407 Med/Dent 3900 Reservoir Road, N.W. Washington D.C., USA.
  • Wang T; Department of Pharmacology and Physiology Georgetown University Medical Center  SE407 Med/Dent 3900 Reservoir Road, N.W. Washington D.C., USA.
Brief Bioinform ; 24(5)2023 09 20.
Article em En | MEDLINE | ID: mdl-37507115
Single cell RNA-sequencing (scRNA-seq) technology has significantly advanced the understanding of transcriptomic signatures. Although various statistical models have been used to describe the distribution of gene expression across cells, a comprehensive assessment of the different models is missing. Moreover, the growing number of features associated with scRNA-seq datasets creates new challenges for analytical accuracy and computing speed. Here, we developed a Python-based package (TensorZINB) to solve the zero-inflated negative binomial (ZINB) model using the TensorFlow deep learning framework. We used a sequential initialization method to solve the numerical stability issues associated with hurdle and zero-inflated models. A recursive feature selection protocol was used to optimize feature selections for data processing and downstream differentially expressed gene (DEG) analysis. We proposed a class of hybrid models combining nested models to further improve the model's performance. Additionally, we developed a new method to convert a continuous distribution to its equivalent discrete form, so that statistical models can be fairly compared. Finally, we showed that the proposed TensorFlow algorithm (TensorZINB) was numerically stable and that its computing speed and performance were superior to those of existing ZINB solvers. Moreover, we implemented seven hurdle and zero-inflated statistical models in Python and systematically assessed their performance using a real scRNA-seq dataset. We demonstrated that the ZINB model achieved the lowest Akaike information criterion compared with other models tested. Taken together, TensorZINB was accurate, efficient and scalable for the implementation of ZINB and for large-scale scRNA-seq data analysis with DEG identification.
Assuntos
Palavras-chave

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Modelos Estatísticos / Perfilação da Expressão Gênica Tipo de estudo: Prognostic_studies / Risk_factors_studies Idioma: En Revista: Brief Bioinform Assunto da revista: BIOLOGIA / INFORMATICA MEDICA Ano de publicação: 2023 Tipo de documento: Article País de afiliação: Estados Unidos País de publicação: Reino Unido

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Modelos Estatísticos / Perfilação da Expressão Gênica Tipo de estudo: Prognostic_studies / Risk_factors_studies Idioma: En Revista: Brief Bioinform Assunto da revista: BIOLOGIA / INFORMATICA MEDICA Ano de publicação: 2023 Tipo de documento: Article País de afiliação: Estados Unidos País de publicação: Reino Unido