Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data.

Xu, Fan; Wang, Shike; Dai, Xinnan; Mundra, Piyushkumar A; Zheng, Jie.

Methods; 189: 65-73, 2021 May.

Article in English | MEDLINE | ID: mdl-33039573

ABSTRACT

Single-cell protein abundance is a fundamental type of information for characterizing cell states. Due to high cost and technical barriers, however, direct quantification of proteins is difficult. Single-cell RNA sequencing (scRNA-seq) data, which serve as a cost-effective substitute for single-cell proteomics, may not accurately reflect protein expression levels because of measurement error, noise, and post-transcriptional and translational regulation, among other factors. Recently emerging single-cell multimodal omics technologies, e.g. CITE-seq and REAP-seq, can simultaneously profile RNA and protein abundances in single cells, providing labeled data for predictive modeling in a supervised learning framework. A deep neural network-based transfer learning method has been applied to the imputation of surface protein abundances from single-cell transcriptomic data. However, it is unclear whether the artificial neural network is the best model, and it is desirable to improve the prediction performance (e.g. accuracy, interpretability) of machine learning models. In this paper, we compared several tree-based ensemble learning methods with neural network models and found that ensemble learning often performed better than neural networks, with Random Forest (RF) performing best overall. Moreover, we used the feature importance scores from RF to interpret the biological mechanisms underlying the prediction. Our study demonstrates the effectiveness of ensemble learning for reliable prediction of protein abundance from single-cell multimodal omics data, and paves the way for knowledge discovery by mining single-cell multi-omics data at large scale.
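The idea of a tree-based ensemble with feature importance scores can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses bagged one-split regression trees (stumps), a much simplified relative of Random Forest, on synthetic data in which a single hypothetical gene (column 0) drives the protein level. All function names, the data, and the parameters are assumptions made for illustration only.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(X, y):
    """Find the single (feature, threshold) split with the largest error reduction."""
    base = sse(y)
    # Tuple layout: (gain, feature, threshold, left mean, right mean)
    best = (0.0, None, None, mean(y), mean(y))
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            gain = base - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, j, t, mean(left), mean(right))
    return best


def predict_stump(stump, row):
    _, j, t, left_mean, right_mean = stump
    if j is None:  # no useful split was found; fall back to the mean
        return left_mean
    return left_mean if row[j] <= t else right_mean


def fit_bagged_stumps(X, y, n_estimators=15, seed=0):
    """Fit each stump on a bootstrap resample of the cells (bagging)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict(stumps, row):
    """Ensemble prediction: average over all stumps."""
    return mean(predict_stump(s, row) for s in stumps)


def feature_importance(stumps, n_features):
    """Share of the total error reduction attributed to each feature."""
    gains = [0.0] * n_features
    for gain, j, *_ in stumps:
        if j is not None:
            gains[j] += gain
    total = sum(gains)
    return [g / total for g in gains] if total else gains


# Synthetic "cells": 4 hypothetical gene expression values per cell;
# the protein level depends only on gene 0 plus a little noise.
rng = random.Random(42)
X = [[rng.random() for _ in range(4)] for _ in range(100)]
y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]

model = fit_bagged_stumps(X, y)
importance = feature_importance(model, 4)
print("importance per gene:", [round(v, 3) for v in importance])
```

In practice one would more likely reach for an established Random Forest implementation; the sketch only shows why importance scores can point back to the genes that carry the predictive signal.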

Subjects

Computational Biology/methods; Deep Learning; Gene Expression Regulation; Membrane Proteins/genetics; Transcriptome; Humans; Sequence Analysis, RNA; Single-Cell Analysis

PIKE-R2P: Protein-protein interaction network-based knowledge embedding with graph neural network for single-cell RNA to protein prediction.

Dai, Xinnan; Xu, Fan; Wang, Shike; Mundra, Piyushkumar A; Zheng, Jie.

BMC Bioinformatics; 22(Suppl 6): 139, 2021 Jun 02.

Article in English | MEDLINE | ID: mdl-34078261

ABSTRACT

BACKGROUND: Recent advances in the simultaneous measurement of RNA and protein abundances at the single-cell level provide a unique opportunity to predict protein abundance from scRNA-seq data using machine learning models. However, existing machine learning methods have not sufficiently considered the relationships among proteins. RESULTS: We formulate this task in a multi-label prediction framework in which multiple proteins are linked to each other at the single-cell level. We then propose a novel method for single-cell RNA to protein prediction, named PIKE-R2P, which incorporates protein-protein interactions (PPI) and prior knowledge embedding into a graph neural network. Compared with existing methods, PIKE-R2P significantly improved prediction performance, yielding smaller errors and higher correlations with the gold standard measurements. CONCLUSION: The superior performance of PIKE-R2P indicates that adding prior knowledge of PPI to graph neural networks can be a powerful strategy for cross-modality prediction of protein abundances at the single-cell level.
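Two ideas in this abstract can be sketched in a few lines: sharing information between interacting proteins, and scoring predictions by error and correlation against measured values. The code below is not PIKE-R2P. The PPI graph, the protein names P1 to P3, the `propagate` function, and the blending weight `alpha` are all hypothetical, and a single fixed neighbour-averaging step stands in for the learned message passing of a graph neural network. RMSE and Pearson correlation are assumed here as typical examples of "error" and "correlation"; the abstract does not name the exact metrics.

```python
from math import sqrt
from statistics import mean

# Hypothetical toy PPI graph: protein -> list of interacting partners.
ppi = {"P1": ["P2"], "P2": ["P1"], "P3": []}


def propagate(pred, graph, alpha=0.5):
    """Blend each protein's prediction with the mean prediction of its PPI neighbours.

    alpha is an assumed blending weight; a real graph neural network would
    learn how much information to pass along each edge.
    """
    out = {}
    for protein, value in pred.items():
        neighbours = graph.get(protein, [])
        if neighbours:
            neighbour_mean = mean(pred[n] for n in neighbours)
            out[protein] = (1 - alpha) * value + alpha * neighbour_mean
        else:
            out[protein] = value  # isolated protein: prediction is unchanged
    return out


def rmse(pred, truth):
    """Root mean squared error between predicted and measured abundances."""
    return sqrt(mean((p - t) ** 2 for p, t in zip(pred, truth)))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Per-protein predictions for one hypothetical cell.
raw = {"P1": 1.0, "P2": 3.0, "P3": 5.0}
smoothed = propagate(raw, ppi)
print(smoothed)

# Scoring predictions for one protein across three hypothetical cells.
print(rmse([1, 2, 3], [1, 2, 5]), pearson([1, 2, 3], [2, 4, 6]))
```

Treating the proteins jointly, as the multi-label framing does, is what makes a step like `propagate` possible: each protein's estimate can draw on the estimates of its interaction partners.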

Subjects

Protein Interaction Maps; RNA; Algorithms; Machine Learning; Neural Networks, Computer
