ABSTRACT
Feature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single-cell clustering pipelines. Existing feature selection methods perform inconsistently across datasets, occasionally even resulting in poorer clustering accuracy than without feature selection. Moreover, existing methods ignore information contained in gene-gene correlations. Here, we introduce DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection algorithm that leverages gene-gene correlations with a novel measure of inhomogeneity in feature space, termed the Density Index (DI). Despite selecting a relatively small number of genes, DUBStepR substantially outperformed existing single-cell feature selection methods across diverse clustering benchmarks. Additionally, DUBStepR was the only method to robustly deconvolve T and NK heterogeneity by identifying disease-associated common and rare cell types and subtypes in PBMCs from rheumatoid arthritis patients. DUBStepR is scalable to over a million cells, and can be straightforwardly applied to other data types such as single-cell ATAC-seq. We propose DUBStepR as a general-purpose feature selection solution for accurately clustering single-cell data.
Subjects
Machine Learning, Single-Cell Analysis/methods, Algorithms, Rheumatoid Arthritis, Chromatin Immunoprecipitation Sequencing, Cluster Analysis, Gene Expression, Mitochondrial Genes, Humans, RNA-Seq, Research Design, RNA Sequence Analysis, Software
ABSTRACT
BACKGROUND: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. RESULTS: We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. CONCLUSIONS: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus .
Subjects
RNA, Single-Cell Analysis, Cluster Analysis, Gene Expression Profiling, Mononuclear Leukocytes, RNA Sequence Analysis
ABSTRACT
We report a case of isodicentric chromosome 15 (idic(15) chromosome), the presence of which resulted in uncontrolled seizures, including epileptic spasms, tonic seizures, and global developmental delay. A 10-month-old female infant was referred to our pediatric neurology clinic because of uncontrolled seizures and global developmental delay. She had experienced generalized tonic-clonic seizures since 7 months of age. At referral, she had not achieved head control and presented with generalized hypotonia. Her brain magnetic resonance imaging scans and metabolic evaluation results were normal. Routine karyotyping indicated the presence of a supernumerary marker chromosome of unknown origin (47,XX,+mar). An array-comparative genomic hybridization (CGH) analysis revealed amplification from 15q11.1 to 15q13.1. Subsequent fluorescence in situ hybridization analysis confirmed an idic(15) chromosome. Array-CGH analysis is advantageous for determining the origin of a supernumerary marker chromosome, and could be a useful method for the genetic diagnosis of epilepsy syndromes associated with various chromosomal aberrations.