Results 1 - 20 of 43
1.
Nat Methods ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38844628

ABSTRACT

Large pretrained models have become foundation models, leading to breakthroughs in natural language processing and related fields. Developing foundation models to decipher the 'languages' of cells and facilitate biomedical research is promising yet challenging. Here we developed a large pretrained model, scFoundation (also named 'xTrimoscFoundationα'), with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the number of trainable parameters, the dimensionality of genes and the volume of training data. Its asymmetric transformer-like architecture and pretraining task design enable it to effectively capture complex contextual relations among genes across a variety of cell types and states. Experiments showed its merit as a foundation model, achieving state-of-the-art performance in a diverse array of single-cell analysis tasks such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.

2.
Cancer Immunol Res ; 12(2): 232-246, 2024 02 02.
Article in English | MEDLINE | ID: mdl-38091354

ABSTRACT

Isocitrate dehydrogenase (IDH)-wild-type (WT) high-grade gliomas, especially glioblastomas, are highly aggressive and have an immunosuppressive tumor microenvironment. Although tumor-infiltrating immune cells are known to play a critical role in glioma genesis, their heterogeneity and intercellular interactions remain poorly understood. In this study, we constructed a single-cell transcriptome landscape of immune cells from tumor tissue and matching peripheral blood mononuclear cells (PBMC) from IDH-WT high-grade glioma patients. Our analysis identified two subsets of tumor-associated macrophages (TAM) in tumors with the highest protumorigenesis signatures, highlighting their potential role in glioma progression. We also investigated the T-cell trajectory and identified the aryl hydrocarbon receptor (AHR) as a regulator of T-cell dysfunction, providing a potential target for glioma immunotherapy. We further demonstrated that knockout of AHR decreased chimeric antigen receptor (CAR) T-cell exhaustion and improved CAR T-cell antitumor efficacy both in vitro and in vivo. Finally, we explored intercellular communication mediated by ligand-receptor interactions within the tumor microenvironment and PBMCs and revealed the unique cellular interactions present in the tumor microenvironment. Taken together, our study provides a comprehensive immune landscape of IDH-WT high-grade gliomas and offers potential drug targets for glioma immunotherapy.


Subjects
Brain Neoplasms , Glioma , Humans , Isocitrate Dehydrogenase/genetics , Leukocytes, Mononuclear/pathology , Gene Expression Profiling , Mutation , Tumor Microenvironment/genetics
3.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-36975610

ABSTRACT

MOTIVATION: We have entered the multi-omics era and can measure cells from different aspects. Hence, we can obtain a more comprehensive view by integrating or matching data from different spaces that correspond to the same object. However, this is particularly challenging in the single-cell multi-omics setting because such data are very sparse and extremely high-dimensional. Although some techniques can measure scATAC-seq and scRNA-seq simultaneously, the data are usually highly noisy due to the limitations of the experimental environment. RESULTS: To promote single-cell multi-omics research, we address these challenges by proposing a novel framework, contrastive cycle adversarial autoencoders (Con-AAE), which can align and integrate single-cell RNA-seq data and single-cell ATAC-seq data. Con-AAE can efficiently map such highly sparse and noisy data from different spaces to a coordinated subspace, where alignment and integration become easier. We demonstrate its advantages on several datasets. AVAILABILITY AND IMPLEMENTATION: Zenodo link: https://zenodo.org/badge/latestdoi/368779433. GitHub: https://github.com/kakarotcq/Con-AAE.


Subjects
Multiomics , Single-Cell Analysis , Single-Cell Analysis/methods , Exome Sequencing , Sequence Analysis, RNA
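
The Con-AAE abstract describes a contrastive component that pulls paired scRNA-seq and scATAC-seq profiles together in a shared subspace. The sketch below shows only a generic InfoNCE-style contrastive loss over paired embeddings, not the full cycle adversarial autoencoder and not Con-AAE's exact objective. It assumes row i of each embedding list belongs to the same cell; the helper names `cosine` and `info_nce` and the temperature of 0.1 are hypothetical illustration choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(rna_emb, atac_emb, temperature=0.1):
    """Generic InfoNCE loss over paired embeddings (hypothetical helper).

    Row i of rna_emb and row i of atac_emb are assumed to come from the same
    cell; all other ATAC rows serve as negatives for RNA row i.
    """
    n = len(rna_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(rna_emb[i], atac_emb[j]) / temperature for j in range(n)]
        # Stable log-sum-exp over all candidate partners of cell i.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
        # Negative log softmax probability assigned to the true partner.
        total += log_denom - logits[i]
    return total / n


rna = [[1.0, 0.0], [0.0, 1.0]]
atac_aligned = [[0.9, 0.1], [0.1, 0.9]]   # partners point the same way
atac_swapped = [[0.1, 0.9], [0.9, 0.1]]   # partners are mismatched
print(info_nce(rna, atac_aligned) < info_nce(rna, atac_swapped))  # True
```

Minimizing such a loss rewards embeddings in which each cell's two modalities are closer to each other than to any other cell's profile, which is one common way to make cross-modal alignment easier.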
4.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36070863

ABSTRACT

Computational recovery of gene regulatory networks (GRNs) has recently shifted from bulk-cell data toward algorithms designed for single-cell data. In this work, we investigate whether widely available bulk-cell data can be leveraged to assist GRN prediction for single cells. We infer cell-type-specific GRNs from both single-cell RNA sequencing data and a generic GRN derived from bulk cells by constructing a weakly supervised learning framework based on the axial transformer. Through extensive experiments, we verify our hypothesis that bulk-cell transcriptomic data are a valuable resource that can improve single-cell GRN prediction. Our GRN-transformer achieves state-of-the-art prediction accuracy compared with existing supervised and unsupervised approaches. In addition, we show that our method can identify important transcription factors and potential regulations for Alzheimer's disease risk genes by using the predicted GRN. Availability: The implementation of GRN-transformer is available at https://github.com/HantaoShu/GRN-Transformer.


Subjects
Computational Biology , Gene Regulatory Networks , Algorithms , Computational Biology/methods , Transcription Factors/genetics , Transcriptome
5.
Cell Discov ; 8(1): 68, 2022 Jul 19.
Article in English | MEDLINE | ID: mdl-35853872

ABSTRACT

The clear cell renal cell carcinoma (ccRCC) microenvironment consists of many different cell types and structural components that play critical roles in cancer progression and drug resistance, but the cellular architecture and underlying gene regulatory features of ccRCC have not been fully characterized. Here, we applied single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) to generate transcriptional and epigenomic landscapes of ccRCC. We identified tumor cell-specific regulatory programs mediated by four key transcription factors (TFs) (HOXC5, VENTX, ISL1, and OTP), and these TFs have prognostic significance in The Cancer Genome Atlas (TCGA) database. Targeting these TFs via short hairpin RNAs (shRNAs) or small molecule inhibitors decreased tumor cell proliferation. We next performed an integrative analysis of chromatin accessibility and gene expression for CD8+ T cells and macrophages to reveal the different regulatory elements in their subgroups. Furthermore, we delineated the intercellular communications mediated by ligand-receptor interactions within the tumor microenvironment. Taken together, our multiomics approach further clarifies the cellular heterogeneity of ccRCC and identifies potential therapeutic targets.

6.
Sensors (Basel) ; 22(11)2022 May 27.
Article in English | MEDLINE | ID: mdl-35684708

ABSTRACT

It is hard to deploy deep learning models directly on today's smartphones because of the substantial computational costs introduced by millions of parameters. To compress such models, we develop an ℓ0-based sparse group lasso model called MobilePrune, which can generate extremely compact neural network models for both desktop and mobile platforms. We adopt a group lasso penalty to enforce sparsity at the group level, which benefits General Matrix Multiply (GEMM), and develop the very first algorithm that can optimize the ℓ0 norm exactly and achieve a global convergence guarantee in the deep learning context. MobilePrune also allows complicated group structures (i.e., trees and overlapping groups) to be applied in the group penalty to suit DNN models with more complex architectures. Empirically, we observe substantial reductions in model size and computational cost for various popular deep learning models on multiple benchmark datasets compared with state-of-the-art methods. More importantly, the compressed models are deployed on the Android system to confirm that our approach achieves lower response delay and battery consumption on mobile phones.


Subjects
Data Compression , Neural Networks, Computer , Algorithms , Physical Phenomena
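
MobilePrune's abstract combines a group lasso penalty (group-level sparsity that suits GEMM) with an ℓ0 penalty. The sketch below shows the two standard proximal operators behind penalties of this kind: block soft-thresholding for a group lasso term and element-wise hard thresholding for an ℓ0 term. It is a generic illustration on plain Python lists, not MobilePrune's own solver; the function names and the step size `t` are hypothetical.

```python
import math


def group_soft_threshold(w, lam, t=1.0):
    """Proximal operator of t * lam * ||w||_2 for one parameter group.

    Shrinks the whole group toward zero and zeroes it out entirely when its
    Euclidean norm is at most t * lam (group-level sparsity).
    """
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= t * lam:
        return [0.0] * len(w)
    scale = 1.0 - t * lam / norm
    return [scale * x for x in w]


def hard_threshold(w, lam, t=1.0):
    """Proximal operator of t * lam * ||w||_0 (element-wise).

    Keeping w_i costs t * lam while zeroing it costs w_i**2 / 2, so an entry
    survives only when |w_i| > sqrt(2 * t * lam).
    """
    cutoff = math.sqrt(2.0 * t * lam)
    return [x if abs(x) > cutoff else 0.0 for x in w]


# A group of two weights (e.g., one filter) and a vector of single weights.
print(group_soft_threshold([3.0, 4.0], lam=2.5))   # [1.5, 2.0]
print(group_soft_threshold([0.3, 0.4], lam=1.0))   # [0.0, 0.0]
print(hard_threshold([0.1, -2.0, 0.5], lam=0.5))   # [0.0, -2.0, 0.0]
```

Zeroing whole groups (such as entire filters or rows) is what keeps the remaining matrices dense and GEMM-friendly, whereas unstructured element-wise zeros alone rarely speed up dense matrix multiplication.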
7.
Proc Natl Acad Sci U S A ; 119(11): e2122954119, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35238654

ABSTRACT

SIGNIFICANCE: SARS-CoV-2 continues to evolve through emerging variants, which are increasingly observed to have higher transmissibility. Despite the wide application of vaccines and antibodies, selection pressure on the Spike protein may drive further evolution of variants carrying mutations that can evade the immune response. To keep pace with the virus's evolution, we introduced a deep learning approach to redesign the complementarity-determining regions (CDRs) to target multiple virus variants and obtained an antibody that broadly neutralizes SARS-CoV-2 variants.


Subjects
Broadly Neutralizing Antibodies/immunology , COVID-19/immunology , SARS-CoV-2/immunology , Antibodies, Neutralizing/immunology , Antibodies, Viral/immunology , Broadly Neutralizing Antibodies/pharmacology , COVID-19 Vaccines/immunology , Complementarity Determining Regions , Deep Learning , Epitopes/immunology , Humans , Immunotherapy/methods , Neutralization Tests/methods , Protein Domains , SARS-CoV-2/pathogenicity , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/immunology
8.
Bioinformatics ; 38(6): 1607-1614, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34999749

ABSTRACT

MOTIVATION: Rapidly generated scRNA-seq datasets enable us to understand cellular differences and the function of each individual cell at single-cell resolution. Cell-type classification, which aims to characterize and label groups of cells according to their gene expression, is one of the most important steps in single-cell analysis. To facilitate manual curation, supervised learning methods have been used to classify cells automatically. Most existing supervised learning approaches use only annotated cells in the training step while ignoring the more abundant unannotated cells. In this article, we propose scPretrain, a multi-task self-supervised learning approach that jointly considers annotated and unannotated cells for cell-type classification. scPretrain consists of a pre-training step and a fine-tuning step. In the pre-training step, scPretrain uses a multi-task learning framework to train a feature extraction encoder based on each dataset's pseudo-labels, using only unannotated cells. In the fine-tuning step, scPretrain fine-tunes this feature extraction encoder using the limited annotated cells in a new dataset. RESULTS: We evaluated scPretrain on 60 diverse datasets from different technologies, species and organs, and obtained a significant improvement in both cell-type classification and cell clustering. Moreover, the representations obtained by scPretrain in the pre-training step also enhanced the performance of conventional classifiers, such as random forest, logistic regression and support vector machines. scPretrain can effectively use the massive amount of unlabeled data and can be applied to annotate the growing number of scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION: The data and code underlying this article are available in scPretrain: Multi-task self-supervised learning for cell type classification, at https://github.com/ruiyi-zhang/scPretrain and https://zenodo.org/record/5802306.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Random Forest , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Support Vector Machine
9.
Nat Comput Sci ; 2(3): 169-178, 2022 Mar.
Article in English | MEDLINE | ID: mdl-38177446

ABSTRACT

Resonance structures and features are ubiquitous in optical science. However, capturing their time dynamics in real-world scenarios suffers from long data acquisition time and low analysis accuracy due to slow convergence and limited time windows. Here we report a physics-informed recurrent neural network to forecast the time-domain response of optical resonances and infer corresponding resonance frequencies by acquiring a fraction of the sequence as input. The model is trained in a two-step multi-fidelity framework for high-accuracy forecast, using first a large amount of low-fidelity physical-model-generated synthetic data and then a small set of high-fidelity application-specific data. Through simulations and experiments, we demonstrate that the model is applicable to a wide range of resonances, including dielectric metasurfaces, graphene plasmonics and ultra-strongly coupled Landau polaritons, where our model captures small signal features and learns physical quantities. The demonstrated machine-learning algorithm can help to accelerate the exploration of physical phenomena and device design under resonance-enhanced light-matter interaction.

11.
NAR Genom Bioinform ; 3(4): lqab097, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34729476

ABSTRACT

Prediction of cancer-specific drug responses, as well as identification of the corresponding drug-sensitive genes and pathways, remains a major biological and clinical challenge. Deep learning models hold immense promise for better drug response prediction, but most of them cannot provide biological and clinical interpretability. Visible neural network (VNN) models have emerged to address this problem by giving neurons biological meanings and directly casting biological networks into the models. However, the biological networks used in VNNs are often redundant and contain components that are irrelevant to the downstream predictions. VNNs built on these redundant networks are therefore overparameterized, which significantly limits their predictive and explanatory power. To overcome this problem, we treat the edges and nodes in the biological networks used in VNNs as features and develop a sparse learning framework, ParsVNN, to learn parsimonious VNNs containing only the edges and nodes that contribute most to the prediction task. We applied ParsVNN to build cancer-specific VNN models that predict drug response for five different cancer types. We demonstrated that the parsimonious VNNs built by ParsVNN are superior to other state-of-the-art methods in prediction performance and in the identification of cancer driver genes. Furthermore, we found that the pathways selected by ParsVNN have great potential to predict clinical outcomes and to recommend synergistic drug combinations.

12.
Nature ; 600(7889): 536-542, 2021 12.
Article in English | MEDLINE | ID: mdl-34819669

ABSTRACT

The cell is a multi-scale structure with modular organization across at least four orders of magnitude [1]. Two central approaches for mapping this structure, protein fluorescent imaging and protein biophysical association, each generate extensive datasets, but of distinct qualities and resolutions that are typically treated separately [2,3]. Here we integrate immunofluorescence images in the Human Protein Atlas [4] with affinity purifications in BioPlex [5] to create a unified hierarchical map of human cell architecture. Integration is achieved by configuring each approach as a general measure of protein distance, then calibrating the two measures using machine learning. The map, known as the multi-scale integrated cell (MuSIC 1.0), resolves 69 subcellular systems, of which approximately half are to our knowledge undocumented. Accordingly, we perform 134 additional affinity purifications and validate subunit associations for the majority of systems. The map reveals a pre-ribosomal RNA processing assembly and accessory factors, which we show govern rRNA maturation, and functional roles for SRRM1 and FAM120C in chromatin and RPS3A in splicing. By integration across scales, MuSIC increases the resolution of imaging while giving protein interactions a spatial dimension, paving the way to incorporate diverse types of data in proteome-wide cell maps.


Subjects
Chromosomes , Proteome , Antigens, Nuclear/genetics , Antigens, Nuclear/metabolism , Chromatin/genetics , Chromosomes/metabolism , Humans , Nuclear Matrix-Associated Proteins/metabolism , Proteome/metabolism , RNA, Ribosomal , RNA-Binding Proteins/genetics
13.
Nat Cancer ; 2(2): 233-244, 2021 02.
Article in English | MEDLINE | ID: mdl-34223192

ABSTRACT

Cell-line screens create expansive datasets for learning predictive markers of drug response, but these models do not readily translate to the clinic with its diverse contexts and limited data. In the present study, we apply a recently developed technique, few-shot machine learning, to train a versatile neural network model in cell lines that can be tuned to new contexts using few additional samples. The model quickly adapts when switching among different tissue types and in moving from cell-line models to clinical contexts, including patient-derived tumor cells and patient-derived xenografts. It can also be interpreted to identify the molecular features most important to a drug response, highlighting critical roles for RB1 and SMAD4 in the response to CDK inhibition and RNF8 and CHD4 in the response to ATM inhibition. The few-shot learning framework provides a bridge from the many samples surveyed in high-throughput screens (n-of-many) to the distinctive contexts of individual patients (n-of-one).


Subjects
Machine Learning , Neural Networks, Computer , DNA-Binding Proteins , Humans , Ubiquitin-Protein Ligases
14.
Bioinformatics ; 37(Suppl_1): i254-i261, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252932

ABSTRACT

MOTIVATION: The prediction of binding between peptides and major histocompatibility complex (MHC) molecules plays an important role in neoantigen identification. Although a large number of computational methods have been developed to address this problem, they produce high false-positive rates in practical applications because, in most cases, a single residue mutation may substantially alter the binding affinity of a peptide to MHC, which conventional deep learning methods cannot identify. RESULTS: We developed a differential boundary tree-based model, named DBTpred, to address this problem. We demonstrated that DBTpred can accurately predict MHC class I binding affinity compared with state-of-the-art deep learning methods. We also present a parallel training algorithm that accelerates the training and inference process, enabling DBTpred to be applied to large datasets. By investigating the statistical properties of differential boundary trees and the prediction paths to test samples, we revealed that DBTpred can provide an intuitive interpretation and possible hints for detecting important residue mutations that can substantially influence binding affinity. AVAILABILITY AND IMPLEMENTATION: The DBTpred package is implemented in Python and freely available at: https://github.com/fpy94/DBT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Histocompatibility Antigens Class I , Peptides , Algorithms , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class I/metabolism , Humans , Major Histocompatibility Complex , Peptides/metabolism , Protein Binding
15.
Bioinformatics ; 37(Suppl_1): i410-i417, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252957

ABSTRACT

MOTIVATION: Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models can accurately quantify the similarity between diseases and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions to this problem significantly increase the risk of overfitting and decrease the generalizability of the models. RESULTS: In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at the training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model can still efficiently incorporate its information during training. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, in which the dropout probability of the GNN model is determined by another GNN model with a mirrored architecture. To evaluate our method, we compared it with four state-of-the-art methods for prioritizing candidate disease genes on the Online Mendelian Inheritance in Man dataset. Extensive evaluations show that, when all features are available, our model can improve prediction accuracy compared with other methods. More importantly, our model can make very accurate predictions when >90% of the features are missing at the test stage. AVAILABILITY AND IMPLEMENTATION: Our method is implemented in Python 3.7 and PyTorch 1.5.0, and the method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.


Subjects
Databases, Genetic , Neural Networks, Computer , Algorithms , Humans , Machine Learning
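
The abstract above describes a Heteroscedastic Gaussian Dropout in which a mirrored GNN sets the dropout probability. The sketch below assumes the standard Gaussian dropout formulation: each activation is multiplied by noise drawn from N(1, p/(1 - p)), which matches the mean and variance of rescaled Bernoulli dropout with drop probability p, and each unit carries its own p ("heteroscedastic"). The per-unit probabilities are passed in directly as a stand-in for the second GNN's output; the function name is hypothetical and this is not the authors' implementation.

```python
import math
import random


def heteroscedastic_gaussian_dropout(activations, drop_probs, rng=random):
    """Multiplicative Gaussian dropout with a separate drop probability per unit.

    Each activation h_i is multiplied by noise ~ N(1, p_i / (1 - p_i)), the
    Gaussian counterpart of inverted Bernoulli dropout with drop probability p_i.
    In the paper's setting p_i would come from a second (mirrored) GNN; here it
    is simply an input list (an assumption for illustration).
    """
    out = []
    for h, p in zip(activations, drop_probs):
        if not 0.0 <= p < 1.0:
            raise ValueError("drop probability must lie in [0, 1)")
        std = math.sqrt(p / (1.0 - p))
        out.append(h * rng.gauss(1.0, std))
    return out


rng = random.Random(0)
h = [2.0, 2.0]
p = [0.0, 0.5]  # unit 0 is never perturbed; unit 1 gets noise variance 1.0
samples = [heteroscedastic_gaussian_dropout(h, p, rng) for _ in range(20000)]
mean_unit1 = sum(s[1] for s in samples) / len(samples)
print(samples[0][0], mean_unit1)
```

Because the noise has mean 1, the expected activation is unchanged while units with larger p become noisier, which is how a learned, per-unit p can down-weight unreliable inputs during training.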
16.
Immunity ; 54(6): 1304-1319.e9, 2021 06 08.
Article in English | MEDLINE | ID: mdl-34048708

ABSTRACT

Despite mounting evidence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) engagement with immune cells, most immune cells express little, if any, of the canonical SARS-CoV-2 receptor, angiotensin-converting enzyme 2 (ACE2). Here, using a myeloid cell receptor-focused ectopic expression screen, we identified several C-type lectins (DC-SIGN, L-SIGN, LSECtin, ASGR1, and CLEC10A) and Tweety family member 2 (TTYH2) as glycan-dependent binding partners of the SARS-CoV-2 spike. Except for TTYH2, these molecules primarily interacted with spike via regions outside the receptor-binding domain. Single-cell RNA sequencing analysis of pulmonary cells from individuals with coronavirus disease 2019 (COVID-19) indicated predominant expression of these molecules on myeloid cells. Although these receptors do not support active replication of SARS-CoV-2, their engagement with the virus induced robust proinflammatory responses in myeloid cells that correlated with COVID-19 severity. We also generated a bispecific anti-spike nanobody that blocked not only ACE2-mediated infection but also the myeloid receptor-mediated proinflammatory responses. Our findings suggest that SARS-CoV-2-myeloid receptor interactions promote immune hyperactivation and represent potential targets for COVID-19 therapy.


Subjects
COVID-19/metabolism , COVID-19/virology , Host-Pathogen Interactions , Lectins, C-Type/metabolism , Membrane Proteins/metabolism , Myeloid Cells/immunology , Myeloid Cells/metabolism , Neoplasm Proteins/metabolism , SARS-CoV-2/physiology , Angiotensin-Converting Enzyme 2/metabolism , Binding Sites , COVID-19/genetics , Cell Line , Cytokines , Gene Expression Regulation , Host-Pathogen Interactions/genetics , Host-Pathogen Interactions/immunology , Humans , Inflammation Mediators/metabolism , Lectins, C-Type/chemistry , Membrane Proteins/chemistry , Models, Molecular , Neoplasm Proteins/chemistry , Protein Binding , Protein Conformation , Single-Domain Antibodies/immunology , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/immunology , Spike Glycoprotein, Coronavirus/metabolism , Structure-Activity Relationship
17.
Nat Comput Sci ; 1(7): 491-501, 2021 Jul.
Article in English | MEDLINE | ID: mdl-38217125

ABSTRACT

Gene regulatory networks (GRNs) encode the complex molecular interactions that govern cell identity. Here we propose DeepSEM, a deep generative model that can jointly infer GRNs and biologically meaningful representations of single-cell RNA sequencing (scRNA-seq) data. In particular, we developed a neural network version of the structural equation model (SEM) to explicitly model the regulatory relationships among genes. Benchmark results show that DeepSEM achieves comparable or better performance than state-of-the-art methods on a variety of single-cell computational tasks, such as GRN inference and scRNA-seq data visualization, clustering and simulation. In addition, the gene regulations predicted by DeepSEM on cell-type marker genes in the mouse cortex can be validated by epigenetic data, which further demonstrates the accuracy and efficiency of our method. DeepSEM can provide a useful and powerful tool to analyze scRNA-seq data and infer GRNs.
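
DeepSEM is described as a neural network version of the structural equation model. For orientation, the sketch below shows the classic linear SEM that such models generalize: each gene's expression equals a weighted sum of its regulators' expression plus noise, x = W^T x + z, so the noise can be recovered as z = (I - W^T) x. The convention that `W[i][j]` is the effect of gene i on gene j, the helper names, and the acyclic toy network are assumptions for illustration; DeepSEM itself replaces parts of this linear model with neural networks.

```python
def sem_residuals(x, W):
    """Linear SEM residuals z_j = x_j - sum_i W[i][j] * x_i for one cell.

    W[i][j] is the (assumed) regulatory weight of gene i on gene j.
    """
    n = len(x)
    return [x[j] - sum(W[i][j] * x[i] for i in range(n)) for j in range(n)]


def sem_generate(z, W):
    """Solve x = W^T x + z by fixed-point iteration.

    For an acyclic network W is nilpotent, so n iterations give the exact
    solution x = (I - W^T)^{-1} z.
    """
    n = len(z)
    x = list(z)
    for _ in range(n):
        x = [z[j] + sum(W[i][j] * x[i] for i in range(n)) for j in range(n)]
    return x


# Toy network: gene 0 activates gene 1 with weight 2.0.
W = [[0.0, 2.0],
     [0.0, 0.0]]
x = sem_generate([1.0, 0.5], W)
print(x)                     # [1.0, 2.5]
print(sem_residuals(x, W))   # [1.0, 0.5]
```

Generating expression from noise and mapping expression back to noise are inverse operations, which is why the learned weights W can be read off directly as a GRN.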

18.
Sensors (Basel) ; 21(1)2020 Dec 26.
Article in English | MEDLINE | ID: mdl-33375324

ABSTRACT

In this paper, we propose AirSign, a novel user authentication technology that provides users with more convenient, intuitive, and secure ways of interacting with smartphones in daily settings. AirSign leverages both acoustic and motion sensors to authenticate users as they sign their signatures in the air with a smartphone, without requiring any special hardware. The technology actively transmits inaudible acoustic signals from the earpiece speaker, receives the echoes through both built-in microphones to "illuminate" the signature and hand geometry, and authenticates users according to unique features extracted from the echoes and motion sensors. To evaluate our system, we collected registered, genuine, and forged signatures from 30 participants; applying AirSign to this dataset, we successfully distinguished genuine from forged signatures with a 97.1% F-score while requesting only seven signatures during the registration phase.
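
AirSign's result is reported as an F-score. Assuming this is the balanced F1 score, the sketch below recalls how it combines precision and recall, treating genuine signatures as the positive class. The counts in the example are made up for illustration and are not AirSign's data.

```python
def f1_score(tp, fp, fn):
    """Balanced F-score (F1) from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)   # accepted signatures that were genuine
    recall = tp / (tp + fn)      # genuine signatures that were accepted
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 genuine accepted, 5 forgeries accepted, 10 genuine rejected.
print(round(f1_score(tp=90, fp=5, fn=10), 3))  # 0.923
```

Because F1 is the harmonic mean of precision and recall, a system cannot score highly by accepting every attempt (low precision) or by rejecting most genuine users (low recall).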

19.
Cancer Cell ; 38(5): 672-684.e6, 2020 11 09.
Article in English | MEDLINE | ID: mdl-33096023

ABSTRACT

Most drugs entering clinical trials fail, often related to an incomplete understanding of the mechanisms governing drug response. Machine learning techniques hold immense promise for better drug response predictions, but most have not reached clinical practice due to their lack of interpretability and their focus on monotherapies. We address these challenges by developing DrugCell, an interpretable deep learning model of human cancer cells trained on the responses of 1,235 tumor cell lines to 684 drugs. Tumor genotypes induce states in cellular subsystems that are integrated with drug structure to predict response to therapy and, simultaneously, learn biological mechanisms underlying the drug response. DrugCell predictions are accurate in cell lines and also stratify clinical outcomes. Analysis of DrugCell mechanisms leads directly to the design of synergistic drug combinations, which we validate systematically by combinatorial CRISPR, drug-drug screening in vitro, and patient-derived xenografts. DrugCell provides a blueprint for constructing interpretable models for predictive medicine.


Subjects
Antineoplastic Agents/therapeutic use , Computational Biology/methods , Neoplasms/drug therapy , Antineoplastic Agents/pharmacology , Cell Line, Tumor , Databases, Factual , Deep Learning , Drug Screening Assays, Antitumor , Drug Synergism , Genotype , Humans , Neoplasms/genetics , Patient-Specific Modeling
20.
Bioinformatics ; 36(Suppl_1): i542-i550, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657383

ABSTRACT

MOTIVATION: Cellular Indexing of Transcriptomes and Epitopes by sequencing (CITE-seq) couples the measurement of surface marker proteins with simultaneous sequencing of mRNA at the single-cell level, bringing accurate cell surface phenotyping to single-cell transcriptomics. Unfortunately, multiplets in CITE-seq datasets create artificial cell types (ACT) and complicate the automation of cell surface phenotyping. RESULTS: We propose CITE-sort, an artificial-cell-type-aware surface marker clustering method for CITE-seq. CITE-sort is aware of, and robust to, multiplet-induced ACT. We benchmarked CITE-sort on real and simulated CITE-seq datasets and compared it against canonical clustering methods. We show that CITE-sort produces the best clustering performance across the board. CITE-sort not only accurately identifies real biological cell types (BCT) but also consistently and reliably separates multiplet-induced artificial-cell-type droplet clusters from real BCT droplet clusters. In addition, CITE-sort organizes its clustering process as a binary tree, which facilitates interpretation and verification of its clustering result and simplifies cell-type annotation with domain knowledge in CITE-seq. AVAILABILITY AND IMPLEMENTATION: http://github.com/QiuyuLian/CITE-sort. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Cluster Analysis , Epitopes , Sequence Analysis, RNA , Software