Results 1 - 20 of 38
1.
J Biomed Inform ; 154: 104648, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38692464

ABSTRACT

BACKGROUND: Advances in artificial intelligence (AI) have realized the potential of revolutionizing healthcare, such as predicting disease progression via longitudinal inspection of Electronic Health Records (EHRs) and lab tests from patients admitted to Intensive Care Units (ICU). Although substantial literature exists addressing broad subjects, including the prediction of mortality, length-of-stay, and readmission, studies focusing on forecasting Acute Kidney Injury (AKI), specifically dialysis anticipation like Continuous Renal Replacement Therapy (CRRT) are scarce. The technicality of how to implement AI remains elusive. OBJECTIVE: This study aims to elucidate the important factors and methods that are required to develop effective predictive models of AKI and CRRT for patients admitted to ICU, using EHRs in the Medical Information Mart for Intensive Care (MIMIC) database. METHODS: We conducted a comprehensive comparative analysis of established predictive models, considering both time-series measurements and clinical notes from MIMIC-IV databases. Subsequently, we proposed a novel multi-modal model which integrates embeddings of top-performing unimodal models, including Long Short-Term Memory (LSTM) and BioMedBERT, and leverages both unstructured clinical notes and structured time series measurements derived from EHRs to enable the early prediction of AKI and CRRT. RESULTS: Our multimodal model achieved a lead time of at least 12 h ahead of clinical manifestation, with an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.888 for AKI and 0.997 for CRRT, as well as an Area Under the Precision Recall Curve (AUPRC) of 0.727 for AKI and 0.840 for CRRT, respectively, which significantly outperformed the baseline models. Additionally, we performed a SHapley Additive exPlanation (SHAP) analysis using the expected gradients algorithm, which highlighted important, previously underappreciated predictive features for AKI and CRRT. 
CONCLUSION: Our study revealed the importance and the technicality of applying longitudinal, multimodal modeling to improve early prediction of AKI and CRRT, offering insights for timely interventions. The performance and interpretability of our model indicate its potential for further assessment towards clinical applications, to ultimately optimize AKI management and enhance patient outcomes.
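
The abstract above reports AUROC and AUPRC values. As a reminder of what these two metrics measure, here is a minimal standard-library sketch. It is not the authors' evaluation code: the function names and the toy labels and scores are illustrative assumptions, AUPRC is computed as step-wise average precision, and tied scores are not given special handling in the precision-recall calculation.

```python
def auroc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank cases from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Hypothetical toy data: 1 = event occurred, 0 = no event.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
```

AUROC is insensitive to class prevalence, whereas AUPRC depends on it, which is one reason both are commonly reported for rare outcomes.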

2.
medRxiv ; 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38559064

ABSTRACT

Background: Advances in artificial intelligence (AI) have realized the potential of revolutionizing healthcare, such as predicting disease progression via longitudinal inspection of Electronic Health Records (EHRs) and lab tests from patients admitted to Intensive Care Units (ICU). Although substantial literature exists addressing broad subjects, including the prediction of mortality, length-of-stay, and readmission, studies focusing on forecasting Acute Kidney Injury (AKI), specifically dialysis anticipation like Continuous Renal Replacement Therapy (CRRT) are scarce. The technicality of how to implement AI remains elusive. Objective: This study aims to elucidate the important factors and methods that are required to develop effective predictive models of AKI and CRRT for patients admitted to ICU, using EHRs in the Medical Information Mart for Intensive Care (MIMIC) database. Methods: We conducted a comprehensive comparative analysis of established predictive models, considering both time-series measurements and clinical notes from MIMIC-IV databases. Subsequently, we proposed a novel multi-modal model which integrates embeddings of top-performing unimodal models, including Long Short-Term Memory (LSTM) and BioMedBERT, and leverages both unstructured clinical notes and structured time series measurements derived from EHRs to enable the early prediction of AKI and CRRT. Results: Our multimodal model achieved a lead time of at least 12 hours ahead of clinical manifestation, with an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.888 for AKI and 0.997 for CRRT, as well as an Area Under the Precision Recall Curve (AUPRC) of 0.727 for AKI and 0.840 for CRRT, respectively, which significantly outperformed the baseline models. Additionally, we performed a SHapley Additive exPlanation (SHAP) analysis using the expected gradients algorithm, which highlighted important, previously underappreciated predictive features for AKI and CRRT. 
Conclusion: Our study revealed the importance and the technicality of applying longitudinal, multimodal modeling to improve early prediction of AKI and CRRT, offering insights for timely interventions. The performance and interpretability of our model indicate its potential for further assessment towards clinical applications, to ultimately optimize AKI management and enhance patient outcomes.

3.
Commun Biol ; 7(1): 326, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38486077

ABSTRACT

Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered from different times, tissues, and patients, introduces large between-group distance and obscures the true identities of cells. To solve this problem, we introduce Label-Aware Distance (LAD), a metric using temporal/spatial locality of the batch effect to control for such factors. We validate LAD on simulated data as well as apply it to a mouse retina development dataset and a lung dataset. We also found the utility of our approach in understanding the progression of the Coronavirus Disease 2019 (COVID-19). LAD provides better cell embedding than state-of-the-art batch correction methods on longitudinal datasets. It can be used in distance-based clustering and visualization methods to combine the power of multiple samples to help make biological findings.


Subject(s)
Cluster Analysis , Animals , Mice , Gene Expression
4.
Nat Med ; 30(3): 772-784, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38238616

ABSTRACT

There is a pressing need for allogeneic chimeric antigen receptor (CAR)-immune cell therapies that are safe, effective and affordable. We conducted a phase 1/2 trial of cord blood-derived natural killer (NK) cells expressing anti-CD19 chimeric antigen receptor and interleukin-15 (CAR19/IL-15) in 37 patients with CD19+ B cell malignancies. The primary objectives were safety and efficacy, defined as day 30 overall response (OR). Secondary objectives included day 100 response, progression-free survival, overall survival and CAR19/IL-15 NK cell persistence. No notable toxicities such as cytokine release syndrome, neurotoxicity or graft-versus-host disease were observed. The day 30 and day 100 OR rates were 48.6% for both. The 1-year overall survival and progression-free survival were 68% and 32%, respectively. Patients who achieved OR had higher levels and longer persistence of CAR-NK cells. Receiving CAR-NK cells from a cord blood unit (CBU) with nucleated red blood cells ≤ 8 × 10⁷ and a collection-to-cryopreservation time ≤ 24 h was the most significant predictor for superior outcome. NK cells from these optimal CBUs were highly functional and enriched in effector-related genes. In contrast, NK cells from suboptimal CBUs had upregulation of inflammation, hypoxia and cellular stress programs. Finally, using multiple mouse models, we confirmed the superior antitumor activity of CAR/IL-15 NK cells from optimal CBUs in vivo. These findings uncover new features of CAR-NK cell biology and underscore the importance of donor selection for allogeneic cell therapies. ClinicalTrials.gov identifier: NCT03056339.


Subject(s)
Hematopoietic Stem Cell Transplantation , Neoplasms , Receptors, Chimeric Antigen , Animals , Mice , Humans , Receptors, Chimeric Antigen/genetics , Interleukin-15 , Killer Cells, Natural , Immunotherapy, Adoptive/adverse effects , Antigens, CD19 , Adaptor Proteins, Signal Transducing
6.
Res Sq ; 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37547002

ABSTRACT

Clustering and visualization are essential parts of single-cell gene expression data analysis. The Euclidean distance used in most distance-based methods is not optimal. The batch effect, i.e., the variability among samples gathered from different times, tissues, and patients, introduces large between-group distance and obscures the true identities of cells. To solve this problem, we introduce Batch-Corrected Distance (BCD), a metric using temporal/spatial locality of the batch effect to control for such factors. We validate BCD on simulated data as well as apply it to a mouse retina development dataset and a lung dataset. We also found the utility of our approach in understanding the progression of the Coronavirus Disease 2019 (COVID-19). BCD achieves more accurate clusters and better visualizations than state-of-the-art batch correction methods on longitudinal datasets. BCD can be directly integrated with most clustering and visualization methods to enable more scientific findings.

7.
Nat Biotechnol ; 2023 Aug 17.
Article in English | MEDLINE | ID: mdl-37592035

ABSTRACT

Single-cell omics technologies enable molecular characterization of diverse cell types and states, but how the resulting transcriptional and epigenetic profiles depend on the cell's genetic background remains understudied. We describe Monopogen, a computational tool to detect single-nucleotide variants (SNVs) from single-cell sequencing data. Monopogen leverages linkage disequilibrium from external reference panels to identify germline SNVs and detects putative somatic SNVs using allele cosegregating patterns at the cell population level. It can identify 100 K to 3 M germline SNVs achieving a genotyping accuracy of 95%, together with hundreds of putative somatic SNVs. Monopogen-derived genotypes enable global and local ancestry inference and identification of admixed samples. It identifies variants associated with cardiomyocyte metabolic levels and epigenomic programs. It also improves putative somatic SNV detection that enables clonal lineage tracing in primary human clonal hematopoiesis. Monopogen brings together population genetics, cell lineage tracing and single-cell omics to uncover genetic determinants of cellular processes.

8.
Sci Adv ; 9(30): eadd6997, 2023 Jul 28.
Article in English | MEDLINE | ID: mdl-37494448

ABSTRACT

Chimeric antigen receptor (CAR) engineering of natural killer (NK) cells is promising, with early-phase clinical studies showing encouraging responses. However, the transcriptional signatures that control the fate of CAR-NK cells after infusion and factors that influence tumor control remain poorly understood. We performed single-cell RNA sequencing and mass cytometry to study the heterogeneity of CAR-NK cells and their in vivo evolution after adoptive transfer, from the phase of tumor control to relapse. Using a preclinical model of noncurative lymphoma and samples from a responder and a nonresponder patient treated with CAR19/IL-15 NK cells, we observed the emergence of NK cell clusters with distinct patterns of activation, function, and metabolic signature associated with different phases of in vivo evolution and tumor control. Interaction with the highly metabolically active tumor resulted in loss of metabolic fitness in NK cells that could be partly overcome by incorporation of IL-15 in the CAR construct.


Subject(s)
Receptors, Chimeric Antigen , Humans , Receptors, Chimeric Antigen/genetics , Receptors, Chimeric Antigen/metabolism , Interleukin-15/genetics , Interleukin-15/metabolism , Cytokines/metabolism , Cell Line, Tumor , Killer Cells, Natural , Cell- and Tissue-Based Therapy
9.
bioRxiv ; 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38187699

ABSTRACT

Key to understanding many biological phenomena is knowing the temporal ordering of cellular events, which often requires continuous direct observations [1, 2]. An alternative solution involves the utilization of irreversible genetic changes, such as naturally occurring mutations, to create indelible markers that enable retrospective temporal ordering [3-8]. Using NSC-seq, a newly designed and validated multi-purpose single-cell CRISPR platform, we developed a molecular clock approach to record the timing of cellular events and clonality in vivo, while incorporating assigned cell state and lineage information. Using this approach, we uncovered precise timing of tissue-specific cell expansion during murine embryonic development and identified new intestinal epithelial progenitor states by their unique genetic histories. NSC-seq analysis of murine adenomas and single-cell multi-omic profiling of human precancers as part of the Human Tumor Atlas Network (HTAN), including 116 scRNA-seq datasets and clonal analysis of 418 human polyps, demonstrated the occurrence of polyancestral initiation in 15-30% of colonic precancers, revealing their origins from multiple normal founders. Thus, our multimodal framework augments existing single-cell analyses and lays the foundation for in vivo multimodal recording, enabling the tracking of lineage and temporal events during development and tumorigenesis.

10.
Life Sci Alliance ; 5(12), 2022 Dec.
Article in English | MEDLINE | ID: mdl-36241426

ABSTRACT

The FcγRII (CD32) ligands are IgFc fragments and pentraxins. The existence of additional ligands is unknown. We engineered T cells with human chimeric receptors resulting from the fusion between CD32 extracellular portion and transmembrane CD8α linked to CD28/ζ chain intracellular moiety (CD32-CR). Transduced T cells recognized three breast cancer (BC) and one colon cancer cell line among 15 tested in the absence of targeting antibodies. Sensitive BC cell conjugation with CD32-CR T cells induced CD32 polarization and down-regulation, CD107a release, mutual elimination, and proinflammatory cytokine production unaffected by human IgGs but enhanced by cetuximab. CD32-CR T cells protected immunodeficient mice from subcutaneous growth of MDA-MB-468 BC cells. RNAseq analysis identified a 42 gene fingerprint predicting BC cell sensitivity and favorable outcomes in advanced BC. ICAM1 was a major regulator of CD32-CR T cell-mediated cytotoxicity. CD32-CR T cells may help identify cell surface CD32 ligand(s) and novel prognostically relevant transcriptomic signatures and develop innovative BC treatments.


Subject(s)
Breast Neoplasms , T-Lymphocytes , Animals , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Breast Neoplasms/therapy , CD28 Antigens/metabolism , Cetuximab/metabolism , Female , Humans , Ligands , Mice
11.
Genes Dev ; 2022 Aug 25.
Article in English | MEDLINE | ID: mdl-36008138

ABSTRACT

Stem cells are fundamental units of tissue remodeling whose functions are dictated by lineage-specific transcription factors. Home to epidermal stem cells and their upward-stratifying progenies, skin relies on its secretory functions to form the outermost protective barrier, of which a transcriptional orchestrator has been elusive. KLF5 is a Krüppel-like transcription factor broadly involved in development and regeneration whose lineage specificity, if any, remains unclear. Here we report KLF5 specifically marks the epidermis, and its deletion leads to skin barrier dysfunction in vivo. Lipid envelopes and secretory lamellar bodies are defective in KLF5-deficient skin, accompanied by preferential loss of complex sphingolipids. KLF5 binds to and transcriptionally regulates genes encoding rate-limiting sphingolipid metabolism enzymes. Remarkably, skin barrier defects elicited by KLF5 ablation can be rescued by dietary interventions. Finally, we found that KLF5 is widely suppressed in human diseases with disrupted epidermal secretion, and its regulation of sphingolipid metabolism is conserved in human skin. Altogether, we established KLF5 as a disease-relevant transcription factor governing sphingolipid metabolism and barrier function in the skin, likely representing a long-sought secretory lineage-defining factor across tissue types.

12.
Genome Biol ; 23(1): 112, 2022 May 09.
Article in English | MEDLINE | ID: mdl-35534898

ABSTRACT

Integration of single-cell multiomics profiles generated by different single-cell technologies from the same biological sample is still challenging. Previous approaches based on shared features have only provided approximate solutions. Here, we present a novel mathematical solution named bi-order canonical correlation analysis (bi-CCA), which extends the widely used CCA approach to iteratively align the rows and the columns between data matrices. Bi-CCA is generally applicable to combinations of any two single-cell modalities. Validations using co-assayed ground truth data and application to a CAR-NK study and a fetal muscle atlas demonstrate its capability in generating accurate multimodal co-embeddings and discovering cellular identity.

13.
Nat Commun ; 13(1): 474, 2022 Jan 25.
Article in English | MEDLINE | ID: mdl-35078987

ABSTRACT

The specificity of CRISPR/Cas9 genome editing is largely determined by the sequences of guide RNA (gRNA) and the targeted DNA, yet the sequence-dependent rules underlying off-target effects are not fully understood. To systematically explore the sequence determinants governing CRISPR/Cas9 specificity, here we describe a dual-target system to measure the relative cleavage rate between off- and on-target sequences (off-on ratios) of 1902 gRNAs on 13,314 synthetic target sequences, and reveal a set of sequence rules involving 2 factors in off-targeting: 1) a guide-intrinsic mismatch tolerance (GMT) independent of the mismatch context; 2) an "epistasis-like" combinatorial effect of multiple mismatches, which are associated with the free-energy landscape in R-loop formation and are explainable by a multi-state kinetic model. These sequence rules lead to the development of MOFF, a model-based predictor of Cas9-mediated off-target effects. Moreover, the "epistasis-like" combinatorial effect suggests a strategy of allele-specific genome editing using mismatched guides. With the aid of MOFF prediction, this strategy significantly improves the selectivity and expands the application domain of Cas9-based allele-specific editing, as tested in a high-throughput allele-editing screen on 18 cancer hotspot mutations.


Subject(s)
Base Sequence/genetics , CRISPR-Cas Systems , Gene Editing/methods , Mutation , Neoplasms/therapy , RNA, Guide, Kinetoplastida/chemistry , Cell Line , Humans , Neoplasms/genetics , Neoplasms/pathology , RNA, Guide, Kinetoplastida/genetics
14.
BMC Bioinformatics ; 23(1): 2, 2022 Jan 04.
Article in English | MEDLINE | ID: mdl-34983369

ABSTRACT

Cellular heterogeneity underlies cancer evolution and metastasis. Advances in single-cell technologies such as single-cell RNA sequencing and mass cytometry have enabled interrogation of cell type-specific expression profiles and abundance across heterogeneous cancer samples obtained from clinical trials and preclinical studies. However, challenges remain in determining sample sizes needed for ascertaining changes in cell type abundances in a controlled study. To address this statistical challenge, we have developed a new approach, named Sensei, to determine the number of samples and the number of cells that are required to ascertain such changes between two groups of samples in single-cell studies. Sensei expands the t-test and models the cell abundances using a beta-binomial distribution. We evaluate the mathematical accuracy of Sensei and provide practical guidelines on over 20 cell types in over 30 cancer types based on knowledge acquired from the cancer cell atlas (TCGA) and prior single-cell studies. We provide a web application to enable user-friendly study design via https://kchen-lab.github.io/sensei/table_beta.html .
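
The abstract above pairs a beta-binomial model of cell abundances with a t-test. The following sketch is not Sensei's calculation, which the abstract does not spell out. It is a brute-force Monte Carlo approximation of the same question under stated assumptions: all function names, the mean/concentration parameterization of the beta distribution, and the default settings are hypothetical, and the default critical value 2.101 is the two-sided 5% Student's t threshold for 18 degrees of freedom, so it only applies to 10 samples per group.

```python
import math
import random
import statistics


def simulate_counts(rng, mean_prop, concentration, n_samples, n_cells):
    """Draw per-sample counts of one cell type from a beta-binomial model."""
    alpha = mean_prop * concentration
    beta = (1.0 - mean_prop) * concentration
    counts = []
    for _ in range(n_samples):
        p = rng.betavariate(alpha, beta)  # sample-specific true proportion
        # Binomial draw: each of n_cells cells belongs to the type with prob. p.
        counts.append(sum(rng.random() < p for _ in range(n_cells)))
    return counts


def pooled_t(x, y):
    """Student's two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    diff = statistics.mean(x) - statistics.mean(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    if sp2 == 0.0:
        # No within-group variation: any nonzero difference is unbounded evidence.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(sp2 * (1.0 / nx + 1.0 / ny))


def estimate_power(mean_a, mean_b, n_samples=10, n_cells=200,
                   concentration=50.0, t_crit=2.101, n_sim=200, seed=0):
    """Fraction of simulated studies in which |t| exceeds t_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        a = [c / n_cells for c in
             simulate_counts(rng, mean_a, concentration, n_samples, n_cells)]
        b = [c / n_cells for c in
             simulate_counts(rng, mean_b, concentration, n_samples, n_cells)]
        if abs(pooled_t(a, b)) > t_crit:
            rejections += 1
    return rejections / n_sim


# Hypothetical scenario: a cell type at 10% abundance in one group, 40% in the other.
print(estimate_power(0.10, 0.40))
```

Repeating the call over a grid of `n_samples` and `n_cells` values shows how power trades off between recruiting more samples and profiling more cells per sample.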


Subject(s)
Neoplasms , Software , Binomial Distribution , Humans , Neoplasms/genetics , Research Design , Sample Size
15.
Cell Rep ; 36(3): 109432, 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-34270918

ABSTRACT

Adoptive cell therapy with virus-specific T cells has been used successfully to treat life-threatening viral infections, supporting application of this approach to coronavirus disease 2019 (COVID-19). We expand severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) T cells from the peripheral blood of COVID-19-recovered donors and non-exposed controls using different culture conditions. We observe that the choice of cytokines modulates the expansion, phenotype, and hierarchy of antigenic recognition by SARS-CoV-2 T cells. Culture with interleukin (IL)-2/4/7, but not under other cytokine-driven conditions, results in more than 1,000-fold expansion in SARS-CoV-2 T cells with a retained phenotype, function, and hierarchy of antigenic recognition compared with baseline (pre-expansion) samples. Expanded cytotoxic T lymphocytes (CTLs) are directed against structural SARS-CoV-2 proteins, including the receptor-binding domain of Spike. SARS-CoV-2 T cells cannot be expanded efficiently from the peripheral blood of non-exposed controls. Because corticosteroids are used for management of severe COVID-19, we propose an efficient strategy to inactivate the glucocorticoid receptor gene (NR3C1) in SARS-CoV-2 CTLs using CRISPR-Cas9 gene editing.

16.
Mol Biol Evol ; 38(10): 4463-4474, 2021 Sep 27.
Article in English | MEDLINE | ID: mdl-34152401

ABSTRACT

The Peranakan Chinese are culturally unique descendants of immigrants from China who settled in the Malay Archipelago ∼300-500 years ago. Today, among large communities in Southeast Asia, the Peranakans have preserved Chinese traditions with strong influence from the local indigenous Malays. Yet, whether or to what extent genetic admixture co-occurred with the cultural mixture has been a topic of ongoing debate. We performed whole-genome sequencing (WGS) on 177 Singapore (SG) Peranakans and analyzed the data jointly with WGS data of Asian and European populations. We estimated that Peranakan Chinese inherited ∼5.62% (95% confidence interval [CI]: 4.76-6.49%) Malay ancestry, much higher than that in SG Chinese (1.08%, 0.65-1.51%), southern Chinese (0.86%, 0.50-1.23%), and northern Chinese (0.25%, 0.18-0.32%). A sex-biased admixture history, in which the Malay ancestry was contributed primarily by females, was supported by X chromosomal variants, and mitochondrial (MT) and Y haplogroups. Finally, we identified an ancient admixture event shared by Peranakan Chinese and SG Chinese ∼1,612 (95% CI: 1,345-1,923) years ago, coinciding with the settlement history of Han Chinese in southern China, apart from the recent admixture event with Malays unique to Peranakan Chinese ∼190 (159-213) years ago. These findings greatly advance our understanding of the dispersal history of Chinese and their interaction with indigenous populations in Southeast Asia.


Subject(s)
Asian People , Genetics, Population , Asia, Southeastern , Asian People/genetics , China , Female , Humans , Whole Genome Sequencing
17.
J Clin Invest ; 131(14), 2021 Jul 15.
Article in English | MEDLINE | ID: mdl-34138753

ABSTRACT

Glioblastoma multiforme (GBM), the most aggressive brain cancer, recurs because glioblastoma stem cells (GSCs) are resistant to all standard therapies. We showed that GSCs, but not normal astrocytes, are sensitive to lysis by healthy allogeneic natural killer (NK) cells in vitro. Mass cytometry and single-cell RNA sequencing of primary tumor samples revealed that GBM tumor-infiltrating NK cells acquired an altered phenotype associated with impaired lytic function relative to matched peripheral blood NK cells from patients with GBM or healthy donors. We attributed this immune evasion tactic to direct cell-to-cell contact between GSCs and NK cells via αv integrin-mediated TGF-β activation. Treatment of GSC-engrafted mice with allogeneic NK cells in combination with inhibitors of integrin or TGF-β signaling or with TGFBR2 gene-edited allogeneic NK cells prevented GSC-induced NK cell dysfunction and tumor growth. These findings reveal an important mechanism of NK cell immune evasion by GSCs and suggest the αv integrin/TGF-β axis as a potentially useful therapeutic target in GBM.


Subject(s)
Glioblastoma/immunology , Integrins/immunology , Killer Cells, Natural/immunology , Neoplasm Proteins/immunology , Neoplastic Stem Cells/immunology , Transforming Growth Factor beta/immunology , Animals , Female , Glioblastoma/genetics , Glioblastoma/pathology , Glioblastoma/therapy , Heterografts , Humans , Integrins/genetics , Killer Cells, Natural/pathology , Male , Mice , Neoplasm Proteins/genetics , Neoplasm Transplantation , Neoplastic Stem Cells/pathology , Receptor, Transforming Growth Factor-beta Type II/genetics , Receptor, Transforming Growth Factor-beta Type II/immunology , Transforming Growth Factor beta/genetics
18.
Genome Biol ; 22(1): 70, 2021 Feb 23.
Article in English | MEDLINE | ID: mdl-33622385

ABSTRACT

We present a Minimal Event Distance Aneuploidy Lineage Tree (MEDALT) algorithm that infers the evolution history of a cell population based on single-cell copy number (SCCN) profiles, and a statistical routine named lineage speciation analysis (LSA), which facilitates discovery of fitness-associated alterations and genes from SCCN lineage trees. MEDALT appears more accurate than phylogenetics approaches in reconstructing copy number lineage. From data from 20 triple-negative breast cancer patients, our approaches effectively prioritize genes that are essential for breast cancer cell fitness and predict patient survival, including those implicating convergent evolution. The source code of our study is available at https://github.com/KChen-lab/MEDALT .


Subject(s)
Aneuploidy , Computational Biology/methods , Gene Dosage , RNA-Seq , Single-Cell Analysis , Software , Algorithms , Evolution, Molecular , Genetic Association Studies , Genetic Fitness , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , RNA-Seq/methods , Single-Cell Analysis/methods
19.
Nat Comput Sci ; 1(5): 374-384, 2021 May.
Article in English | MEDLINE | ID: mdl-36969355

ABSTRACT

A key challenge in studying organisms and diseases is to detect rare molecular programs and rare cell populations (RCPs) that drive development, differentiation, and transformation. Molecular features such as genes and proteins defining RCPs are often unknown and difficult to detect from unenriched single-cell data, using conventional dimensionality reduction and clustering-based approaches. Here, we propose an unsupervised approach, SCMER (Single-Cell Manifold presERving feature selection), which selects a compact set of molecular features with definitive meanings that preserve the manifold of the data. We applied SCMER in the context of hematopoiesis, lymphogenesis, tumorigenesis, and drug resistance and response. We found that SCMER can identify non-redundant features that sensitively delineate both common cell lineages and rare cellular states. SCMER can be used for discovering molecular features in a high dimensional dataset, designing targeted, cost-effective assays for clinical applications, and facilitating multi-modality integration.

20.
Brief Bioinform ; 22(3), 2021 May 20.
Article in English | MEDLINE | ID: mdl-32591784

ABSTRACT

Whole-exome sequencing (WES) has been widely used to study the role of protein-coding variants in genetic diseases. Non-coding regions, typically covered by sparse off-target data, are often discarded by conventional WES analyses. Here, we develop a genotype calling pipeline named WEScall to analyse both target and off-target data. We leverage linkage disequilibrium shared within study samples and from an external reference panel to improve genotyping accuracy. In an application to WES of 2527 Chinese and Malays, WEScall can reduce the genotype discordance rate from 0.26% (SE = 6.4 × 10⁻⁶) to 0.08% (SE = 3.6 × 10⁻⁶) across 1.1 million single nucleotide polymorphisms (SNPs) in the deeply sequenced target regions. Furthermore, we obtain genotypes at 0.70% (SE = 3.0 × 10⁻⁶) discordance rate across 5.2 million off-target SNPs, which had ~1.2× mean sequencing depth. Using this dataset, we perform genome-wide association studies of 10 metabolic traits. Despite our small sample size, we identify 10 loci at genome-wide significance (P < 5 × 10⁻⁸), including eight well-established loci. The two novel loci, both associated with glycated haemoglobin levels, are GPATCH8-SLC4A1 (rs369762319, P = 2.56 × 10⁻¹²) and ROR2 (rs1201042, P = 3.24 × 10⁻⁸). Finally, using summary statistics from UK Biobank and Biobank Japan, we show that polygenic risk prediction can be significantly improved for six out of nine traits by incorporating off-target data (P < 0.01). These results demonstrate WEScall as a useful tool to facilitate WES studies with decent amounts of off-target data.
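
The abstract above summarizes genotyping accuracy as a discordance rate with a standard error. A minimal sketch of that kind of summary follows. It is an illustration only: the function name, the 0/1/2 allele-count encoding with `None` for missing calls, and the simple binomial standard error are assumptions, and the abstract does not say how its own standard errors were derived.

```python
import math


def discordance_rate(calls_a, calls_b):
    """Share of jointly called genotypes that disagree, with a binomial SE.

    Genotypes are encoded as alternate-allele counts (0, 1 or 2);
    None marks a missing call and the pair is skipped.
    """
    compared = [(a, b) for a, b in zip(calls_a, calls_b)
                if a is not None and b is not None]
    if not compared:
        raise ValueError("no genotype pairs available for comparison")
    mismatches = sum(a != b for a, b in compared)
    rate = mismatches / len(compared)
    # Assumes independent comparisons, which real genotype data may violate.
    se = math.sqrt(rate * (1.0 - rate) / len(compared))
    return rate, se


# Hypothetical calls for ten sites from a pipeline and a reference call set.
pipeline = [0, 1, 2, 1, None, 0, 2, 1, 0, 1]
reference = [0, 1, 2, 2, 1, 0, 2, 1, None, 1]
rate, se = discordance_rate(pipeline, reference)
print(rate, se)
```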


Subject(s)
Exome Sequencing/methods , Genetic Predisposition to Disease , Genotype , Anion Exchange Protein 1, Erythrocyte/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Linkage Disequilibrium , Muscle Proteins/genetics , Polymorphism, Single Nucleotide
...