ABSTRACT
Recent advances in single-cell RNA sequencing (scRNA-seq) techniques have stimulated efforts to identify and characterize the cellular composition of complex tissues. With the advent of various sequencing techniques, automated cell-type annotation using a well-annotated scRNA-seq reference has become popular. However, this approach relies on the diversity of cell types in the reference, which may not capture all the cell types present in the query data of interest. Query data generally contain unseen cell types because most data atlases are obtained for different purposes and with different techniques. Identifying previously unseen cell types is essential for improving annotation accuracy and enabling novel biological discoveries. To address this challenge, we propose mtANN (multiple-reference-based scRNA-seq data annotation), a new method that automatically annotates query data while accurately identifying unseen cell types with the aid of multiple references. Key innovations of mtANN include the integration of deep learning and ensemble learning to improve prediction accuracy, and the introduction of a new metric that considers three complementary aspects to distinguish between unseen cell types and shared cell types. Additionally, we provide a data-driven method to adaptively select a threshold for identifying previously unseen cell types. We demonstrate the advantages of mtANN over state-of-the-art methods for unseen cell-type identification and cell-type annotation on two benchmark dataset collections, as well as its predictive power on a collection of COVID-19 datasets. The source code and tutorial are available at https://github.com/Zhangxf-ccnu/mtANN.
Subjects
RNA Sequence Analysis, Single-Cell Analysis, Single-Cell Analysis/methods, RNA Sequence Analysis/methods, Humans, COVID-19/diagnosis, Software
ABSTRACT
Inferring gene co-expression networks from high-throughput gene expression data is an important task in bioinformatics. Gene networks often exhibit modular structures. Although several Gaussian graphical model-based methods have been developed to estimate gene co-expression networks by incorporating a modular structural prior, none of them takes into account the modular structures captured by prior networks (e.g., protein interaction networks). In this study, we propose a novel prior network-dependent gene network inference (pGNI) method that estimates gene co-expression networks by integrating gene expression data and prior protein interaction network data. The underlying modular structure is learned from both sets of data. Through simulation studies, we demonstrate the feasibility and effectiveness of our method. We also apply our method to two real datasets. The modular structures in the networks estimated by our method are biologically significant.