Results 1 - 20 of 38
1.
J Biomed Inform ; 154: 104648, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38692464

ABSTRACT

BACKGROUND: Advances in artificial intelligence (AI) have realized the potential of revolutionizing healthcare, such as predicting disease progression via longitudinal inspection of Electronic Health Records (EHRs) and lab tests from patients admitted to Intensive Care Units (ICU). Although substantial literature exists addressing broad subjects, including the prediction of mortality, length-of-stay, and readmission, studies focusing on forecasting Acute Kidney Injury (AKI), specifically dialysis anticipation like Continuous Renal Replacement Therapy (CRRT) are scarce. The technicality of how to implement AI remains elusive. OBJECTIVE: This study aims to elucidate the important factors and methods that are required to develop effective predictive models of AKI and CRRT for patients admitted to ICU, using EHRs in the Medical Information Mart for Intensive Care (MIMIC) database. METHODS: We conducted a comprehensive comparative analysis of established predictive models, considering both time-series measurements and clinical notes from MIMIC-IV databases. Subsequently, we proposed a novel multi-modal model which integrates embeddings of top-performing unimodal models, including Long Short-Term Memory (LSTM) and BioMedBERT, and leverages both unstructured clinical notes and structured time series measurements derived from EHRs to enable the early prediction of AKI and CRRT. RESULTS: Our multimodal model achieved a lead time of at least 12 h ahead of clinical manifestation, with an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.888 for AKI and 0.997 for CRRT, as well as an Area Under the Precision Recall Curve (AUPRC) of 0.727 for AKI and 0.840 for CRRT, respectively, which significantly outperformed the baseline models. Additionally, we performed a SHapley Additive exPlanation (SHAP) analysis using the expected gradients algorithm, which highlighted important, previously underappreciated predictive features for AKI and CRRT. 
CONCLUSION: Our study revealed the importance and the technicality of applying longitudinal, multimodal modeling to improve early prediction of AKI and CRRT, offering insights for timely interventions. The performance and interpretability of our model indicate its potential for further assessment towards clinical applications, to ultimately optimize AKI management and enhance patient outcomes.


Subject(s)
Acute Kidney Injury , Electronic Health Records , Intensive Care Units , Acute Kidney Injury/therapy , Humans , Longitudinal Studies , Renal Replacement Therapy , Artificial Intelligence , Forecasting , Length of Stay , Male , Databases, Factual , Female
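
Note on the metrics in record 1: the abstract reports AUROC and AUPRC without saying how they were computed. The sketch below shows one common way to calculate both from labels and predicted scores. The function names and the toy data are illustrative assumptions, not taken from the study, and the AUPRC here is the step-wise "average precision" estimate, which may differ from the estimator the authors used.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Toy example (hypothetical data): 1 = event occurred, 0 = no event.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
```

AUPRC depends on class prevalence, whereas AUROC does not. This is one reason the two numbers reported for the same outcome can sit far apart.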
2.
medRxiv ; 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38559064

ABSTRACT

Background: Advances in artificial intelligence (AI) have realized the potential of revolutionizing healthcare, such as predicting disease progression via longitudinal inspection of Electronic Health Records (EHRs) and lab tests from patients admitted to Intensive Care Units (ICU). Although substantial literature exists addressing broad subjects, including the prediction of mortality, length-of-stay, and readmission, studies focusing on forecasting Acute Kidney Injury (AKI), specifically dialysis anticipation like Continuous Renal Replacement Therapy (CRRT) are scarce. The technicality of how to implement AI remains elusive. Objective: This study aims to elucidate the important factors and methods that are required to develop effective predictive models of AKI and CRRT for patients admitted to ICU, using EHRs in the Medical Information Mart for Intensive Care (MIMIC) database. Methods: We conducted a comprehensive comparative analysis of established predictive models, considering both time-series measurements and clinical notes from MIMIC-IV databases. Subsequently, we proposed a novel multi-modal model which integrates embeddings of top-performing unimodal models, including Long Short-Term Memory (LSTM) and BioMedBERT, and leverages both unstructured clinical notes and structured time series measurements derived from EHRs to enable the early prediction of AKI and CRRT. Results: Our multimodal model achieved a lead time of at least 12 hours ahead of clinical manifestation, with an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.888 for AKI and 0.997 for CRRT, as well as an Area Under the Precision Recall Curve (AUPRC) of 0.727 for AKI and 0.840 for CRRT, respectively, which significantly outperformed the baseline models. Additionally, we performed a SHapley Additive exPlanation (SHAP) analysis using the expected gradients algorithm, which highlighted important, previously underappreciated predictive features for AKI and CRRT. 
Conclusion: Our study revealed the importance and the technicality of applying longitudinal, multimodal modeling to improve early prediction of AKI and CRRT, offering insights for timely interventions. The performance and interpretability of our model indicate its potential for further assessment towards clinical applications, to ultimately optimize AKI management and enhance patient outcomes.

3.
Commun Biol ; 7(1): 326, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38486077

ABSTRACT

Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered from different times, tissues, and patients, introduces large between-group distance and obscures the true identities of cells. To solve this problem, we introduce Label-Aware Distance (LAD), a metric using temporal/spatial locality of the batch effect to control for such factors. We validate LAD on simulated data as well as apply it to a mouse retina development dataset and a lung dataset. We also found the utility of our approach in understanding the progression of the Coronavirus Disease 2019 (COVID-19). LAD provides better cell embedding than state-of-the-art batch correction methods on longitudinal datasets. It can be used in distance-based clustering and visualization methods to combine the power of multiple samples to help make biological findings.


Subject(s)
Cluster Analysis , Animals , Mice , Gene Expression
4.
Nat Med ; 30(3): 772-784, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38238616

ABSTRACT

There is a pressing need for allogeneic chimeric antigen receptor (CAR)-immune cell therapies that are safe, effective and affordable. We conducted a phase 1/2 trial of cord blood-derived natural killer (NK) cells expressing anti-CD19 chimeric antigen receptor and interleukin-15 (CAR19/IL-15) in 37 patients with CD19+ B cell malignancies. The primary objectives were safety and efficacy, defined as day 30 overall response (OR). Secondary objectives included day 100 response, progression-free survival, overall survival and CAR19/IL-15 NK cell persistence. No notable toxicities such as cytokine release syndrome, neurotoxicity or graft-versus-host disease were observed. The day 30 and day 100 OR rates were 48.6% for both. The 1-year overall survival and progression-free survival were 68% and 32%, respectively. Patients who achieved OR had higher levels and longer persistence of CAR-NK cells. Receiving CAR-NK cells from a cord blood unit (CBU) with nucleated red blood cells ≤ 8 × 10⁷ and a collection-to-cryopreservation time ≤ 24 h was the most significant predictor for superior outcome. NK cells from these optimal CBUs were highly functional and enriched in effector-related genes. In contrast, NK cells from suboptimal CBUs had upregulation of inflammation, hypoxia and cellular stress programs. Finally, using multiple mouse models, we confirmed the superior antitumor activity of CAR/IL-15 NK cells from optimal CBUs in vivo. These findings uncover new features of CAR-NK cell biology and underscore the importance of donor selection for allogeneic cell therapies. ClinicalTrials.gov identifier: NCT03056339.


Subject(s)
Hematopoietic Stem Cell Transplantation , Neoplasms , Receptors, Chimeric Antigen , Animals , Mice , Humans , Receptors, Chimeric Antigen/genetics , Interleukin-15 , Killer Cells, Natural , Immunotherapy, Adoptive/adverse effects , Antigens, CD19 , Adaptor Proteins, Signal Transducing
6.
Nat Biotechnol ; 2023 Aug 17.
Article in English | MEDLINE | ID: mdl-37592035

ABSTRACT

Single-cell omics technologies enable molecular characterization of diverse cell types and states, but how the resulting transcriptional and epigenetic profiles depend on the cell's genetic background remains understudied. We describe Monopogen, a computational tool to detect single-nucleotide variants (SNVs) from single-cell sequencing data. Monopogen leverages linkage disequilibrium from external reference panels to identify germline SNVs and detects putative somatic SNVs using allele cosegregating patterns at the cell population level. It can identify 100 K to 3 M germline SNVs achieving a genotyping accuracy of 95%, together with hundreds of putative somatic SNVs. Monopogen-derived genotypes enable global and local ancestry inference and identification of admixed samples. It identifies variants associated with cardiomyocyte metabolic levels and epigenomic programs. It also improves putative somatic SNV detection that enables clonal lineage tracing in primary human clonal hematopoiesis. Monopogen brings together population genetics, cell lineage tracing and single-cell omics to uncover genetic determinants of cellular processes.

7.
Res Sq ; 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37547002

ABSTRACT

Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered from different times, tissues, and patients, introduces large between-group distance and obscures the true identities of cells. To solve this problem, we introduce Batch-Corrected Distance (BCD), a metric using temporal/spatial locality of the batch effect to control for such factors. We validate BCD on simulated data as well as apply it to a mouse retina development dataset and a lung dataset. We also found the utility of our approach in understanding the progression of the Coronavirus Disease 2019 (COVID-19). BCD achieves more accurate clusters and better visualizations than state-of-the-art batch correction methods on longitudinal datasets. BCD can be directly integrated with most clustering and visualization methods to enable more scientific findings.

8.
Sci Adv ; 9(30): eadd6997, 2023 07 28.
Article in English | MEDLINE | ID: mdl-37494448

ABSTRACT

Chimeric antigen receptor (CAR) engineering of natural killer (NK) cells is promising, with early-phase clinical studies showing encouraging responses. However, the transcriptional signatures that control the fate of CAR-NK cells after infusion and factors that influence tumor control remain poorly understood. We performed single-cell RNA sequencing and mass cytometry to study the heterogeneity of CAR-NK cells and their in vivo evolution after adoptive transfer, from the phase of tumor control to relapse. Using a preclinical model of noncurative lymphoma and samples from a responder and a nonresponder patient treated with CAR19/IL-15 NK cells, we observed the emergence of NK cell clusters with distinct patterns of activation, function, and metabolic signature associated with different phases of in vivo evolution and tumor control. Interaction with the highly metabolically active tumor resulted in loss of metabolic fitness in NK cells that could be partly overcome by incorporation of IL-15 in the CAR construct.


Subject(s)
Receptors, Chimeric Antigen , Humans , Receptors, Chimeric Antigen/genetics , Receptors, Chimeric Antigen/metabolism , Interleukin-15/genetics , Interleukin-15/metabolism , Cytokines/metabolism , Cell Line, Tumor , Killer Cells, Natural , Cell- and Tissue-Based Therapy
9.
bioRxiv ; 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38187699

ABSTRACT

Key to understanding many biological phenomena is knowing the temporal ordering of cellular events, which often requires continuous direct observations [1, 2]. An alternative solution involves the utilization of irreversible genetic changes, such as naturally occurring mutations, to create indelible markers that enable retrospective temporal ordering [3-8]. Using NSC-seq, a newly designed and validated multi-purpose single-cell CRISPR platform, we developed a molecular clock approach to record the timing of cellular events and clonality in vivo, while incorporating assigned cell state and lineage information. Using this approach, we uncovered precise timing of tissue-specific cell expansion during murine embryonic development and identified new intestinal epithelial progenitor states by their unique genetic histories. NSC-seq analysis of murine adenomas and single-cell multi-omic profiling of human precancers as part of the Human Tumor Atlas Network (HTAN), including 116 scRNA-seq datasets and clonal analysis of 418 human polyps, demonstrated the occurrence of polyancestral initiation in 15-30% of colonic precancers, revealing their origins from multiple normal founders. Thus, our multimodal framework augments existing single-cell analyses and lays the foundation for in vivo multimodal recording, enabling the tracking of lineage and temporal events during development and tumorigenesis.

10.
Life Sci Alliance ; 5(12)2022 12.
Article in English | MEDLINE | ID: mdl-36241426

ABSTRACT

The FcγRII (CD32) ligands are IgFc fragments and pentraxins. The existence of additional ligands is unknown. We engineered T cells with human chimeric receptors resulting from the fusion between CD32 extracellular portion and transmembrane CD8α linked to CD28/ζ chain intracellular moiety (CD32-CR). Transduced T cells recognized three breast cancer (BC) and one colon cancer cell line among 15 tested in the absence of targeting antibodies. Sensitive BC cell conjugation with CD32-CR T cells induced CD32 polarization and down-regulation, CD107a release, mutual elimination, and proinflammatory cytokine production unaffected by human IgGs but enhanced by cetuximab. CD32-CR T cells protected immunodeficient mice from subcutaneous growth of MDA-MB-468 BC cells. RNAseq analysis identified a 42 gene fingerprint predicting BC cell sensitivity and favorable outcomes in advanced BC. ICAM1 was a major regulator of CD32-CR T cell-mediated cytotoxicity. CD32-CR T cells may help identify cell surface CD32 ligand(s) and novel prognostically relevant transcriptomic signatures and develop innovative BC treatments.


Subject(s)
Breast Neoplasms , T-Lymphocytes , Animals , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Breast Neoplasms/therapy , CD28 Antigens/metabolism , Cetuximab/metabolism , Female , Humans , Ligands , Mice
11.
Genes Dev ; 2022 Aug 25.
Article in English | MEDLINE | ID: mdl-36008138

ABSTRACT

Stem cells are fundamental units of tissue remodeling whose functions are dictated by lineage-specific transcription factors. Home to epidermal stem cells and their upward-stratifying progenies, skin relies on its secretory functions to form the outermost protective barrier, of which a transcriptional orchestrator has been elusive. KLF5 is a Krüppel-like transcription factor broadly involved in development and regeneration whose lineage specificity, if any, remains unclear. Here we report KLF5 specifically marks the epidermis, and its deletion leads to skin barrier dysfunction in vivo. Lipid envelopes and secretory lamellar bodies are defective in KLF5-deficient skin, accompanied by preferential loss of complex sphingolipids. KLF5 binds to and transcriptionally regulates genes encoding rate-limiting sphingolipid metabolism enzymes. Remarkably, skin barrier defects elicited by KLF5 ablation can be rescued by dietary interventions. Finally, we found that KLF5 is widely suppressed in human diseases with disrupted epidermal secretion, and its regulation of sphingolipid metabolism is conserved in human skin. Altogether, we established KLF5 as a disease-relevant transcription factor governing sphingolipid metabolism and barrier function in the skin, likely representing a long-sought secretory lineage-defining factor across tissue types.

12.
Genome Biol ; 23(1): 112, 2022 05 09.
Article in English | MEDLINE | ID: mdl-35534898

ABSTRACT

Integration of single-cell multiomics profiles generated by different single-cell technologies from the same biological sample is still challenging. Previous approaches based on shared features have only provided approximate solutions. Here, we present a novel mathematical solution named bi-order canonical correlation analysis (bi-CCA), which extends the widely used CCA approach to iteratively align the rows and the columns between data matrices. Bi-CCA is generally applicable to combinations of any two single-cell modalities. Validations using co-assayed ground truth data and application to a CAR-NK study and a fetal muscle atlas demonstrate its capability in generating accurate multimodal co-embeddings and discovering cellular identity.

13.
BMC Bioinformatics ; 23(1): 2, 2022 Jan 04.
Article in English | MEDLINE | ID: mdl-34983369

ABSTRACT

Cellular heterogeneity underlies cancer evolution and metastasis. Advances in single-cell technologies such as single-cell RNA sequencing and mass cytometry have enabled interrogation of cell type-specific expression profiles and abundance across heterogeneous cancer samples obtained from clinical trials and preclinical studies. However, challenges remain in determining sample sizes needed for ascertaining changes in cell type abundances in a controlled study. To address this statistical challenge, we have developed a new approach, named Sensei, to determine the number of samples and the number of cells that are required to ascertain such changes between two groups of samples in single-cell studies. Sensei expands the t-test and models the cell abundances using a beta-binomial distribution. We evaluate the mathematical accuracy of Sensei and provide practical guidelines on over 20 cell types in over 30 cancer types based on knowledge acquired from The Cancer Genome Atlas (TCGA) and prior single-cell studies. We provide a web application to enable user-friendly study design via https://kchen-lab.github.io/sensei/table_beta.html .


Subject(s)
Neoplasms , Software , Binomial Distribution , Humans , Neoplasms/genetics , Research Design , Sample Size
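
Note on the study-design question in record 13: the abstract says cell abundances are modeled with a beta-binomial distribution and compared between two groups with a t-test-style statistic. The sketch below estimates power by simulation under that kind of model. It is a Monte Carlo stand-in, not the Sensei calculation. The function names, the `concentration` parameter, and the normal approximation for the critical value are all assumptions introduced here, and the approximation is crude when the number of samples per group is small.

```python
import random
from statistics import NormalDist, mean, variance


def simulate_proportions(rng, n_samples, n_cells, mean_prop, concentration):
    """Draw observed cell-type proportions for one group from a beta-binomial."""
    a = mean_prop * concentration
    b = (1.0 - mean_prop) * concentration
    proportions = []
    for _ in range(n_samples):
        p = rng.betavariate(a, b)  # sample-level true abundance
        count = sum(1 for _ in range(n_cells) if rng.random() < p)  # binomial draw
        proportions.append(count / n_cells)
    return proportions


def estimated_power(n_samples, n_cells, mean_a, mean_b,
                    concentration=50.0, alpha=0.05, n_trials=500, seed=0):
    """Fraction of simulated studies in which a Welch-type statistic rejects.

    Assumption: the two-sided critical value comes from a standard normal,
    not from a t distribution. Requires n_samples >= 2.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(n_trials):
        x = simulate_proportions(rng, n_samples, n_cells, mean_a, concentration)
        y = simulate_proportions(rng, n_samples, n_cells, mean_b, concentration)
        se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
        if se > 0 and abs(mean(x) - mean(y)) / se > z_crit:
            rejections += 1
    return rejections / n_trials


# Hypothetical design: 10 samples per group, 200 cells per sample,
# a cell type at 10% abundance in one group and 30% in the other.
print(estimated_power(10, 200, 0.10, 0.30, n_trials=200))
```

Under these assumptions, varying `n_samples` and `n_cells` separately shows how each one changes the estimated power, which is the trade-off the abstract describes.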
14.
Nat Commun ; 13(1): 474, 2022 01 25.
Article in English | MEDLINE | ID: mdl-35078987

ABSTRACT

The specificity of CRISPR/Cas9 genome editing is largely determined by the sequences of guide RNA (gRNA) and the targeted DNA, yet the sequence-dependent rules underlying off-target effects are not fully understood. To systematically explore the sequence determinants governing CRISPR/Cas9 specificity, here we describe a dual-target system to measure the relative cleavage rate between off- and on-target sequences (off-on ratios) of 1902 gRNAs on 13,314 synthetic target sequences, and reveal a set of sequence rules involving 2 factors in off-targeting: 1) a guide-intrinsic mismatch tolerance (GMT) independent of the mismatch context; 2) an "epistasis-like" combinatorial effect of multiple mismatches, which are associated with the free-energy landscape in R-loop formation and are explainable by a multi-state kinetic model. These sequence rules lead to the development of MOFF, a model-based predictor of Cas9-mediated off-target effects. Moreover, the "epistasis-like" combinatorial effect suggests a strategy of allele-specific genome editing using mismatched guides. With the aid of MOFF prediction, this strategy significantly improves the selectivity and expands the application domain of Cas9-based allele-specific editing, as tested in a high-throughput allele-editing screen on 18 cancer hotspot mutations.


Subject(s)
Base Sequence/genetics , CRISPR-Cas Systems , Gene Editing/methods , Mutation , Neoplasms/therapy , RNA, Guide, Kinetoplastida/chemistry , Cell Line , Humans , Neoplasms/genetics , Neoplasms/pathology , RNA, Guide, Kinetoplastida/genetics
15.
Cell Rep ; 36(3): 109432, 2021 07 20.
Article in English | MEDLINE | ID: mdl-34270918

ABSTRACT

Adoptive cell therapy with virus-specific T cells has been used successfully to treat life-threatening viral infections, supporting application of this approach to coronavirus disease 2019 (COVID-19). We expand severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) T cells from the peripheral blood of COVID-19-recovered donors and non-exposed controls using different culture conditions. We observe that the choice of cytokines modulates the expansion, phenotype, and hierarchy of antigenic recognition by SARS-CoV-2 T cells. Culture with interleukin (IL)-2/4/7, but not under other cytokine-driven conditions, results in more than 1,000-fold expansion in SARS-CoV-2 T cells with a retained phenotype, function, and hierarchy of antigenic recognition compared with baseline (pre-expansion) samples. Expanded cytotoxic T lymphocytes (CTLs) are directed against structural SARS-CoV-2 proteins, including the receptor-binding domain of Spike. SARS-CoV-2 T cells cannot be expanded efficiently from the peripheral blood of non-exposed controls. Because corticosteroids are used for management of severe COVID-19, we propose an efficient strategy to inactivate the glucocorticoid receptor gene (NR3C1) in SARS-CoV-2 CTLs using CRISPR-Cas9 gene editing.

16.
Mol Biol Evol ; 38(10): 4463-4474, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34152401

ABSTRACT

The Peranakan Chinese are culturally unique descendants of immigrants from China who settled in the Malay Archipelago ∼300-500 years ago. Today, among large communities in Southeast Asia, the Peranakans have preserved Chinese traditions with strong influence from the local indigenous Malays. Yet, whether or to what extent genetic admixture co-occurred with the cultural mixture has been a topic of ongoing debate. We performed whole-genome sequencing (WGS) on 177 Singapore (SG) Peranakans and analyzed the data jointly with WGS data of Asian and European populations. We estimated that Peranakan Chinese inherited ∼5.62% (95% confidence interval [CI]: 4.76-6.49%) Malay ancestry, much higher than that in SG Chinese (1.08%, 0.65-1.51%), southern Chinese (0.86%, 0.50-1.23%), and northern Chinese (0.25%, 0.18-0.32%). A sex-biased admixture history, in which the Malay ancestry was contributed primarily by females, was supported by X chromosomal variants, and mitochondrial (MT) and Y haplogroups. Finally, we identified an ancient admixture event shared by Peranakan Chinese and SG Chinese ∼1,612 (95% CI: 1,345-1,923) years ago, coinciding with the settlement history of Han Chinese in southern China, apart from the recent admixture event with Malays unique to Peranakan Chinese ∼190 (159-213) years ago. These findings greatly advance our understanding of the dispersal history of Chinese and their interaction with indigenous populations in Southeast Asia.


Subject(s)
Asian People , Genetics, Population , Asia, Southeastern , Asian People/genetics , China , Female , Humans , Whole Genome Sequencing
17.
J Clin Invest ; 131(14)2021 07 15.
Article in English | MEDLINE | ID: mdl-34138753

ABSTRACT

Glioblastoma multiforme (GBM), the most aggressive brain cancer, recurs because glioblastoma stem cells (GSCs) are resistant to all standard therapies. We showed that GSCs, but not normal astrocytes, are sensitive to lysis by healthy allogeneic natural killer (NK) cells in vitro. Mass cytometry and single-cell RNA sequencing of primary tumor samples revealed that GBM tumor-infiltrating NK cells acquired an altered phenotype associated with impaired lytic function relative to matched peripheral blood NK cells from patients with GBM or healthy donors. We attributed this immune evasion tactic to direct cell-to-cell contact between GSCs and NK cells via αv integrin-mediated TGF-β activation. Treatment of GSC-engrafted mice with allogeneic NK cells in combination with inhibitors of integrin or TGF-β signaling or with TGFBR2 gene-edited allogeneic NK cells prevented GSC-induced NK cell dysfunction and tumor growth. These findings reveal an important mechanism of NK cell immune evasion by GSCs and suggest the αv integrin/TGF-β axis as a potentially useful therapeutic target in GBM.


Subject(s)
Glioblastoma/immunology , Integrins/immunology , Killer Cells, Natural/immunology , Neoplasm Proteins/immunology , Neoplastic Stem Cells/immunology , Transforming Growth Factor beta/immunology , Animals , Female , Glioblastoma/genetics , Glioblastoma/pathology , Glioblastoma/therapy , Heterografts , Humans , Integrins/genetics , Killer Cells, Natural/pathology , Male , Mice , Neoplasm Proteins/genetics , Neoplasm Transplantation , Neoplastic Stem Cells/pathology , Receptor, Transforming Growth Factor-beta Type II/genetics , Receptor, Transforming Growth Factor-beta Type II/immunology , Transforming Growth Factor beta/genetics
18.
Genome Biol ; 22(1): 70, 2021 02 23.
Article in English | MEDLINE | ID: mdl-33622385

ABSTRACT

We present a Minimal Event Distance Aneuploidy Lineage Tree (MEDALT) algorithm that infers the evolutionary history of a cell population based on single-cell copy number (SCCN) profiles, and a statistical routine named lineage speciation analysis (LSA), which facilitates discovery of fitness-associated alterations and genes from SCCN lineage trees. MEDALT appears more accurate than phylogenetics approaches in reconstructing copy number lineage. Using data from 20 triple-negative breast cancer patients, our approaches effectively prioritize genes that are essential for breast cancer cell fitness and predict patient survival, including those implicating convergent evolution. The source code of our study is available at https://github.com/KChen-lab/MEDALT .


Subject(s)
Aneuploidy , Computational Biology/methods , Gene Dosage , RNA-Seq , Single-Cell Analysis , Software , Algorithms , Evolution, Molecular , Genetic Association Studies , Genetic Fitness , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , RNA-Seq/methods , Single-Cell Analysis/methods
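
Note on the tree construction in record 18: the abstract names a "minimal event distance" between copy number profiles but does not define it. The sketch below uses one simplified reading: the fewest single-copy gains or losses of contiguous segments needed to turn one profile into another, followed by a minimum spanning tree rooted at a diploid profile. This is an illustrative assumption, not the published algorithm. It ignores any constraint on regaining fully deleted segments, and it uses Prim's algorithm on a symmetric distance as a stand-in for whatever tree-building step MEDALT actually applies. All names and profiles are hypothetical.

```python
def event_distance(profile_a, profile_b):
    """Fewest +/-1 events on contiguous segments that turn profile_a into profile_b."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same segments")
    diff = [b - a for a, b in zip(profile_a, profile_b)]
    padded = [0] + diff + [0]
    # Each event shifts exactly two boundaries of the difference vector (+1 and -1),
    # so the total boundary change divided by two is the minimum number of events.
    boundary_change = sum(abs(padded[i] - padded[i - 1]) for i in range(1, len(padded)))
    return boundary_change // 2


def lineage_tree(profiles, root="root"):
    """Prim's algorithm: return {child: parent} for a spanning tree grown from `root`."""
    in_tree = {root}
    parent = {}
    while len(in_tree) < len(profiles):
        best = None
        for u in sorted(in_tree):
            for v in sorted(profiles):
                if v in in_tree:
                    continue
                d = event_distance(profiles[u], profiles[v])
                if best is None or d < best[0]:
                    best = (d, u, v)
        _, u, v = best
        parent[v] = u
        in_tree.add(v)
    return parent


# Hypothetical copy number profiles over four genomic segments.
cells = {
    "root": [2, 2, 2, 2],  # assumed diploid ancestor
    "c1": [2, 3, 3, 2],    # one gain spanning segments 2-3
    "c2": [2, 3, 4, 2],    # the same gain plus a further gain of segment 3
    "c3": [1, 1, 2, 2],    # one loss spanning segments 1-2
}
print(lineage_tree(cells))
```

In this toy example, `c2` attaches to `c1` rather than to the root because it is one event away from `c1` and two events away from the diploid profile.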
19.
Cytometry A ; 99(9): 899-909, 2021 09.
Article in English | MEDLINE | ID: mdl-33342071

ABSTRACT

Signal intensity measured in a mass cytometry (CyTOF) channel can often be affected by the neighboring channels due to technological limitations. Such signal artifacts are known as spillover effects and can substantially limit the accuracy of cell population clustering. Current approaches reduce these effects by using additional beads for normalization purposes known as single-stained controls. While effective in compensating for spillover effects, incorporating single-stained controls can be costly and require customized panel design. This is especially evident when executing large-scale immune profiling studies. We present a novel statistical method, named CytoSpill that independently quantifies and compensates the spillover effects in CyTOF data without requiring the use of single-stained controls. Our method utilizes knowledge-guided modeling and statistical techniques, such as finite mixture modeling and sequential quadratic programming, to achieve optimal error correction. We evaluated our method using five publicly available CyTOF datasets obtained from human peripheral blood mononuclear cells (PBMCs), C57BL/6J mouse bone marrow, healthy human bone marrow, chronic lymphocytic leukemia patient, and healthy human cord blood samples. In the PBMCs with known ground truth, our method achieved comparable results to experiments that incorporated single-stained controls. In datasets without ground-truth, our method not only reduced spillover on likely affected markers, but also led to the discovery of potentially novel subpopulations expressing functionally meaningful, cluster-specific markers. CytoSpill (developed in R) will greatly enhance the execution of large-scale cellular profiling of tumor immune microenvironment, development of novel immunotherapy, and the discovery of immune-specific biomarkers. The implementation of our method can be found at https://github.com/KChen-lab/CytoSpill.git.


Subject(s)
Leukocytes, Mononuclear , Animals , Biomarkers , Cluster Analysis , Flow Cytometry , Humans , Mice , Mice, Inbred C57BL
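
Note on spillover compensation in record 19: the abstract describes estimating spillover without single-stained controls and then compensating for it. The sketch below covers only the compensation step, assuming the spillover matrix is already known. It solves a plain linear system and clips negative values to zero. This is a simplification introduced here, not the finite mixture modeling and sequential quadratic programming the abstract mentions. The matrix values and function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [[float(x) for x in row] + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("spillover matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def compensate(observed, spillover):
    """Recover per-channel signal from observed intensities.

    Model (assumed): observed[j] = sum_i true[i] * spillover[i][j], where
    spillover[i][j] is the fraction of channel i's signal seen in channel j.
    """
    n = len(observed)
    transposed = [[spillover[i][j] for i in range(n)] for j in range(n)]
    return [max(0.0, value) for value in solve_linear(transposed, observed)]


# Hypothetical three-channel panel: 5% of channel 1 leaks into channel 2,
# and 2% of channel 2 leaks into channel 3.
spill = [
    [1.0, 0.05, 0.0],
    [0.0, 1.0, 0.02],
    [0.0, 0.0, 1.0],
]
print(compensate([100.0, 55.0, 11.0], spill))
```

Clipping at zero keeps intensities non-negative but is cruder than a constrained fit, which is one reason a method might prefer an optimization-based solver.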
20.
Nat Comput Sci ; 1(5): 374-384, 2021 May.
Article in English | MEDLINE | ID: mdl-36969355

ABSTRACT

A key challenge in studying organisms and diseases is to detect rare molecular programs and rare cell populations (RCPs) that drive development, differentiation, and transformation. Molecular features such as genes and proteins defining RCPs are often unknown and difficult to detect from unenriched single-cell data, using conventional dimensionality reduction and clustering-based approaches. Here, we propose an unsupervised approach, SCMER (Single-Cell Manifold presERving feature selection), which selects a compact set of molecular features with definitive meanings that preserve the manifold of the data. We applied SCMER in the context of hematopoiesis, lymphogenesis, tumorigenesis, and drug resistance and response. We found that SCMER can identify non-redundant features that sensitively delineate both common cell lineages and rare cellular states. SCMER can be used for discovering molecular features in a high dimensional dataset, designing targeted, cost-effective assays for clinical applications, and facilitating multi-modality integration.
