Results 1 - 20 of 34
1.
Genome Res ; 33(10): 1757-1773, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37903634

ABSTRACT

Rapid advances in spatial transcriptomics (ST) have revolutionized the interrogation of spatial heterogeneity and increased the demand for comprehensive methods to effectively characterize spatial domains. As a prerequisite for ST data analysis, spatial domain characterization is a crucial step for downstream analyses and biological implications. Here we propose a prior-based self-attention framework for spatial transcriptomics (PAST), a variational graph convolutional autoencoder for ST, which effectively integrates prior information via a Bayesian neural network, captures spatial patterns via a self-attention mechanism, and enables scalable application via a ripple walk sampler strategy. Through comprehensive experiments on data sets generated by different technologies, we show that PAST can effectively characterize spatial domains and facilitate various downstream analyses, including ST visualization, spatial trajectory inference, and pseudotime analysis. We also highlight the advantages of PAST for multislice joint embedding and automatic annotation of spatial domains in newly sequenced ST data. Compared with existing methods, PAST is the first ST method that integrates reference data to analyze ST data. We anticipate that PAST will open up new avenues for researchers to decipher ST data with customized reference data, which expands the applicability of ST technology.


Subjects
Gene Expression Profiling; Transcriptome; Bayes Theorem; Neural Networks, Computer; Spatial Analysis
2.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36513377

ABSTRACT

Single-cell analysis is a valuable approach for dissecting cellular heterogeneity, and single-cell chromatin accessibility sequencing (scCAS) can profile the epigenetic landscapes of thousands of individual cells. Analyzing scCAS data is challenging because of its high dimensionality and its higher degree of sparsity compared with scRNA-seq data. Topic modeling in single-cell data analysis can lead to robust identification of cell types and can provide insight into regulatory mechanisms. A reference-guided approach may facilitate the analysis of scCAS data by utilizing the information in existing datasets. We present RefTM (Reference-guided Topic Modeling of single-cell chromatin accessibility data), which not only utilizes the information in existing bulk chromatin accessibility and annotated scCAS data, but also takes advantage of topic models for single-cell data analysis. RefTM simultaneously models: (1) the shared biological variation among reference data and the target scCAS data; (2) the unique biological variation in scCAS data; and (3) other variations from known covariates in scCAS data.


Subjects
Chromatin; Chromatin/genetics
3.
Brief Bioinform ; 25(1), 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38113078

ABSTRACT

Single-cell chromatin accessibility sequencing (scCAS) technologies have enabled characterizing the epigenomic heterogeneity of individual cells. However, the identification of features of scCAS data that are relevant to underlying biological processes remains a significant gap. Here, we introduce a novel method, Cofea, to fill this gap. Through comprehensive experiments on 5 simulated and 54 real datasets, Cofea demonstrates its superiority in capturing cellular heterogeneity and facilitating downstream analysis. Applying this method to the identification of cell type-specific peaks and candidate enhancers, as well as to pathway enrichment analysis and partitioned heritability analysis, we illustrate the potential of Cofea to uncover functional biological processes.


Subjects
Chromatin; Regulatory Sequences, Nucleic Acid; Chromatin/genetics
4.
Bioinformatics ; 40(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38625746

ABSTRACT

MOTIVATION: With the rapid advancement of single-cell sequencing technology, it becomes gradually possible to delve into the cellular responses to various external perturbations at the gene expression level. However, obtaining perturbed samples in certain scenarios may be considerably challenging, and the substantial costs associated with sequencing also curtail the feasibility of large-scale experimentation. A repertoire of methodologies has been employed for forecasting perturbative responses in single-cell gene expression. However, existing methods primarily focus on the average response of a specific cell type to perturbation, overlooking the single-cell specificity of perturbation responses and a more comprehensive prediction of the entire perturbation response distribution. RESULTS: Here, we present scPRAM, a method for predicting perturbation responses in single-cell gene expression based on attention mechanisms. Leveraging variational autoencoders and optimal transport, scPRAM aligns cell states before and after perturbation, followed by accurate prediction of gene expression responses to perturbations for unseen cell types through attention mechanisms. Experiments on multiple real perturbation datasets involving drug treatments and bacterial infections demonstrate that scPRAM attains heightened accuracy in perturbation prediction across cell types, species, and individuals, surpassing existing methodologies. Furthermore, scPRAM demonstrates outstanding capability in identifying differentially expressed genes under perturbation, capturing heterogeneity in perturbation responses across species, and maintaining stability in the presence of data noise and sample size variations. AVAILABILITY AND IMPLEMENTATION: https://github.com/jiang-q19/scPRAM and https://doi.org/10.5281/zenodo.10935038.


Subjects
Single-Cell Analysis; Single-Cell Analysis/methods; Humans; Gene Expression Profiling/methods; Computational Biology/methods; Algorithms; Gene Expression
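
The scPRAM abstract above names attention mechanisms as the way responses are predicted for unseen cells. The following is only a minimal sketch of generic scaled dot-product attention, not scPRAM's implementation: the function names, the idea of averaging per-cell "response vectors", and all inputs are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_shift(query, keys, shifts):
    """Attention-weighted average of per-cell response vectors.

    query  : latent vector of an unperturbed cell (hypothetical input)
    keys   : latent vectors of reference cells with known responses
    shifts : response vector (perturbed minus control) per reference cell
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product similarity between the query and every key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(shifts[0])
    return [sum(w * s[d] for w, s in zip(weights, shifts)) for d in range(dim)]
```

Reference cells whose latent vectors resemble the query receive larger weights, so the predicted shift is specific to each cell rather than a single cell-type average.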
5.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38588573

ABSTRACT

SUMMARY: Recent technical advancements in single-cell chromatin accessibility sequencing (scCAS) have brought new insights to the characterization of epigenetic heterogeneity. As single-cell genomics experiments scale up to hundreds of thousands of cells, the demand for computational resources for downstream analysis grows intractably large and exceeds the capabilities of most researchers. Here, we propose EpiCarousel, a tailored Python package based on lazy loading, parallel processing, and community detection for memory- and time-efficient identification of metacells, i.e. the emergence of homogenous cells, in large-scale scCAS data. Through comprehensive experiments on five datasets of various protocols, sample sizes, dimensions, number of cell types, and degrees of cell-type imbalance, EpiCarousel outperformed baseline methods in systematic evaluation of memory usage, computational time, and multiple downstream analyses including cell type identification. Moreover, EpiCarousel executes preprocessing and downstream cell clustering on the atlas-level dataset with 707 043 cells and 1 154 611 peaks within 2 h consuming <75 GB of RAM and provides superior performance for characterizing cell heterogeneity than state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: The EpiCarousel software is well-documented and freely available at https://github.com/biox-nku/epicarousel. It can be seamlessly interoperated with extensive scCAS analysis toolkits.


Subjects
Chromatin; Single-Cell Analysis; Software; Chromatin/metabolism; Single-Cell Analysis/methods; Humans; Genomics/methods; Computational Biology/methods
6.
Bioinformatics ; 39(1), 2023 Jan 01.
Article in English | MEDLINE | ID: mdl-36610708

ABSTRACT

SUMMARY: Recent innovations in single-cell chromatin accessibility sequencing (scCAS) have revolutionized the characterization of epigenomic heterogeneity. Estimation of the number of cell types is a crucial step for downstream analyses and biological implications. However, efforts to perform estimation specifically for scCAS data are limited. Here, we propose ASTER, an ensemble learning-based tool for accurately estimating the number of cell types in scCAS data. ASTER outperformed baseline methods in systematic evaluation on 27 datasets of various protocols, sizes, numbers of cell types, degrees of cell-type imbalance, cell states and qualities, providing valuable guidance for scCAS data analysis. AVAILABILITY AND IMPLEMENTATION: ASTER along with detailed documentation is freely accessible at https://aster.readthedocs.io/ under the MIT License. It can be seamlessly integrated into existing scCAS analysis workflows. The source code is available at https://github.com/biox-nku/aster. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Chromatin; Software; Epigenomics; Documentation; Workflow
7.
Bioinformatics ; 39(8), 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37494428

ABSTRACT

MOTIVATION: Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective to characterize gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework is desirable for evaluation and validation of these methods. However, existing simulators generate synthetic data by sampling reads from real data or mimicking existing cell states, which is inadequate to provide credible ground-truth labels for method evaluation. RESULTS: We present simCAS, an embedding-based simulator, for generating high-fidelity scCAS data from both cell- and peak-wise embeddings. We demonstrate simCAS outperforms existing simulators in resembling real data and show that simCAS can generate cells of different states with user-defined cell populations and differentiation trajectories. Additionally, simCAS can simulate data from different batches and encode user-specified interactions of chromatin regions in the synthetic data, which provides ground-truth labels more than cell states. We systematically demonstrate that simCAS facilitates the benchmarking of four core tasks in downstream analysis: cell clustering, trajectory inference, data integration, and cis-regulatory interaction inference. We anticipate simCAS will be a reliable and flexible simulator for evaluating the ongoing computational methods applied on scCAS data. AVAILABILITY AND IMPLEMENTATION: simCAS is freely available at https://github.com/Chen-Li-17/simCAS.


Subjects
Chromatin; Gene Expression Regulation; Computer Simulation; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Single-Cell Analysis/methods
8.
Nucleic Acids Res ; 49(D1): D221-D228, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33045745

ABSTRACT

Gene regulatory elements, including promoters, enhancers, silencers, etc., control transcriptional programs in a spatiotemporal manner. Though these elements are known to be able to induce either positive or negative transcriptional control, the community has been mostly studying enhancers which amplify transcription initiation, with less emphasis given to silencers which repress gene expression. To facilitate the study of silencers and the investigation of their potential roles in transcriptional control, we developed SilencerDB (http://health.tsinghua.edu.cn/silencerdb/), a comprehensive database of silencers by manually curating silencers from 2300 published articles. The current version, SilencerDB 1.0, contains (i) 33 060 validated silencers from experimental methods, and (ii) 5 045 547 predicted silencers from state-of-the-art machine learning methods. The functionality of SilencerDB includes (a) standardized categorization of silencers in a tree-structured class hierarchy based on species, organ, tissue and cell line and (b) comprehensive annotations of silencers with the nearest gene and potential regulatory genes. SilencerDB, to the best of our knowledge, is the first comprehensive database at this scale dedicated to silencers, with reliable annotations and user-friendly interactive database features. We believe this database has the potential to enable advanced understanding of silencers in regulatory mechanisms and to empower researchers to devise diverse applications of silencers in disease development.


Subjects
Databases, Nucleic Acid; Machine Learning; Silencer Elements, Transcriptional; Transcription, Genetic; User-Computer Interface; Animals; Buffaloes/genetics; Cell Line; Chickens/genetics; Drosophila melanogaster/genetics; Humans; Internet; Mice; Molecular Sequence Annotation; Organ Specificity; Rats; Sus scrofa/genetics
9.
Nucleic Acids Res ; 49(W1): W483-W490, 2021 Jul 02.
Article in English | MEDLINE | ID: mdl-33999180

ABSTRACT

Chromatin accessibility, as a powerful marker of active DNA regulatory elements, provides valuable information for understanding regulatory mechanisms. The revolution in high-throughput methods has accumulated massive chromatin accessibility profiles in public repositories. Nevertheless, utilization of these data is hampered by cumbersome collection, time-consuming processing, and manual chromatin accessibility (openness) annotation of genomic regions. To fill this gap, we developed OpenAnnotate (http://health.tsinghua.edu.cn/openannotate/) as the first web server for efficiently annotating openness of massive genomic regions across various biosample types, tissues, and biological systems. In addition to the annotation resource from 2729 comprehensive profiles of 614 biosample types of human and mouse, OpenAnnotate provides user-friendly functionalities, ultra-efficient calculation, real-time browsing, intuitive visualization, and elaborate application notebooks. We show its unique advantages compared to existing databases and toolkits by effectively revealing cell type-specificity, identifying regulatory elements and 3D chromatin contacts, deciphering gene functional relationships, inferring functions of transcription factors, and unprecedentedly promoting single-cell data analyses. We anticipate OpenAnnotate will provide a promising avenue for researchers to construct a more holistic perspective to understand regulatory mechanisms.


Subjects
Chromatin/metabolism; Genomics/methods; Molecular Sequence Annotation/methods; Software; Internet; Regulatory Sequences, Nucleic Acid; Single-Cell Analysis; Transcription Factors/metabolism
10.
Stress ; 24(5): 612-620, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34184955

ABSTRACT

Prenatal stress (PS) affects neurodevelopment and increases the risk for anxiety in adolescence in male offspring, but the mechanism is still unclear. N-Cadherin regulates the expression of AMPA receptors (AMPARs), which mediate anxiety by modulating network excitability in the prefrontal cortex (PFC). Our results revealed that in adolescent male, but not female, offspring rats, PS induced anxiety-like behavior, as assessed by the open field test (OFT). Furthermore, N-cadherin and AMPAR subunit GluA1 were colocalized in the PFC, and the expression of the N-cadherin and the GluA1 decreased following PS exposure in male offspring rats. We also found that the AMPAR agonist CX546 did not alleviate anxiety-like behavior in adolescent male offspring rats; however, it increased the expression of GluA1 in the PFC but did not alter the expression of N-cadherin. In conclusion, our study suggested that the N-cadherin-GluA1 pathway in the PFC mediates anxiety-like behavior in adolescent male offspring rats and that N-cadherin might be required for sex differences in the effect of PS on adolescent offspring.


Subjects
Cadherins; Prenatal Exposure Delayed Effects; Animals; Anxiety; Cadherins/genetics; Female; Male; Prefrontal Cortex; Pregnancy; Rats; Rats, Sprague-Dawley; Stress, Psychological
11.
BMC Bioinformatics ; 21(Suppl 13): 392, 2020 Sep 17.
Article in English | MEDLINE | ID: mdl-32938367

ABSTRACT

BACKGROUND: In recent years, the rapid development of single-cell RNA-sequencing (scRNA-seq) techniques has enabled the quantitative characterization of cell types at single-cell resolution. With the explosive growth of the number of cells profiled in individual scRNA-seq experiments, there is a demand for novel computational methods for classifying newly generated scRNA-seq data onto annotated labels. Although several methods have recently been proposed for the cell-type classification of single-cell transcriptomic data, limitations such as inadequate accuracy, inferior robustness, and low stability greatly restrict their wide application. RESULTS: We propose a novel ensemble approach, named EnClaSC, for accurate and robust cell-type classification of single-cell transcriptomic data. Through comprehensive validation experiments, we demonstrate that EnClaSC can not only be applied to the self-projection within a specific dataset and the cell-type classification across different datasets, but also scale up well to various data dimensionality and different data sparsity. We further illustrate the ability of EnClaSC to effectively make cross-species classification, which may shed light on the studies in correlation of different species. EnClaSC is freely available at https://github.com/xy-chen16/EnClaSC. CONCLUSIONS: EnClaSC enables highly accurate and robust cell-type classification of single-cell transcriptomic data via an ensemble learning method. We expect to see wide applications of our method to not only transcriptome studies, but also the classification of more general data.


Subjects
Single-Cell Analysis/methods; Transcriptome/genetics; Humans; Research Design
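
The EnClaSC abstract above describes an ensemble approach to cell-type classification but does not say how individual predictions are combined. One common combination rule is majority voting, sketched below; whether EnClaSC uses this rule is an assumption, and `majority_vote` is a hypothetical helper rather than part of the EnClaSC package.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per cell.

    predictions[i][j] is the label that classifier i assigns to cell j.
    When several labels tie, the label seen first wins.
    """
    n_cells = len(predictions[0])
    combined = []
    for j in range(n_cells):
        votes = Counter(p[j] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined
```

Voting across several base classifiers tends to smooth out the errors of any single one, which is the usual motivation for ensemble learning.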
12.
Cell Mol Biol (Noisy-le-grand) ; 66(5): 15-19, 2020 Jul 31.
Article in English | MEDLINE | ID: mdl-33040806

ABSTRACT

The purpose of this study was to evaluate the efficacy of co-prescribing esomeprazole and flupenthixol/melitracen relative to esomeprazole alone for erosive gastritis complicated with negative feelings. A total of 140 patients with erosive gastritis complicated with negative feelings were enrolled in the present study. Seventy patients in the control group took esomeprazole, and 70 patients in the observation group received esomeprazole plus flupenthixol/melitracen, both for 4 weeks. We assessed clinical symptoms, mucosal erosion (by gastroscopy), PGE2 and MDA levels in gastric mucosa, anxiety, depression, and recurrence before and after treatment in both groups. After treatment, the observation group had lower scores for clinical symptoms, mucosal erosions, the Hamilton Depression Rating Scale (HAMD), and the Hamilton Anxiety Rating Scale (HAMA) than the control group (p<0.05); the observation group also showed higher PGE2 and lower MDA levels than the control group (p<0.05). During six months of follow-up (100% follow-up rate), 16 and 34 recurrent cases occurred in the observation and control groups, respectively (p<0.05). Co-prescription of esomeprazole and flupenthixol/melitracen improved the clinical symptoms and mucosal erosions, relieved negative feelings, and reduced the recurrence rate. The co-prescription was more efficacious than esomeprazole alone.


Subjects
Anthracenes/therapeutic use; Emotions/drug effects; Esomeprazole/adverse effects; Esomeprazole/therapeutic use; Flupenthixol/therapeutic use; Gastritis/drug therapy; Aged; Anxiety/chemically induced; Combined Modality Therapy/methods; Depression/chemically induced; Female; Gastric Mucosa/drug effects; Humans; Male; Middle Aged; Recurrence; Stomach Ulcer/chemically induced; Treatment Outcome
13.
BMC Bioinformatics ; 20(Suppl 7): 0, 2019 May 01.
Article in English | MEDLINE | ID: mdl-31074382

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) technologies have advanced rapidly in recent years and enabled quantitative characterization at a microscopic resolution. With the exponential growth of the number of cells profiled in individual scRNA-seq experiments, identifying putative cell types from the data has become a great challenge that calls for novel computational methods. Although a variety of algorithms have recently been proposed for single-cell clustering, limitations such as low accuracy, inferior robustness, and inadequate stability greatly restrict the scope of application of these methods. RESULTS: We propose a novel model-based algorithm, named VPAC, for accurate clustering of single-cell transcriptomic data through variational projection, which assumes that single-cell samples follow a Gaussian mixture distribution in a latent space. Through comprehensive validation experiments, we demonstrate that VPAC can not only be applied to datasets of discrete counts and normalized continuous data, but also scale up well to various data dimensionality, different dataset size and different data sparsity. We further illustrate the ability of VPAC to detect genes with strong unique signatures of a specific cell type, which may shed light on studies in systems biology. We have released a user-friendly Python package of VPAC on GitHub (https://github.com/ShengquanChen/VPAC). Users can directly import our VPAC class and conduct clustering without tedious installation of dependency packages. CONCLUSIONS: VPAC enables highly accurate clustering of single-cell transcriptomic data via a statistical model. We expect to see wide applications of our method to not only transcriptome studies for fully understanding the cell identity and functionality, but also the clustering of more general data.


Subjects
Algorithms; Gene Expression Profiling/methods; Single-Cell Analysis/methods; T-Lymphocytes/metabolism; Transcriptome; Cluster Analysis; Humans; Sequence Analysis, RNA/methods; T-Lymphocytes/cytology
14.
BMC Bioinformatics ; 18(Suppl 13): 478, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-29219068

ABSTRACT

BACKGROUND: With the rapid development of deep sequencing techniques in recent years, enhancers have been systematically identified in such projects as FANTOM and ENCODE, forming genome-wide landscapes in a series of human cell lines. Nevertheless, experimental approaches are still costly and time-consuming for large-scale identification of enhancers across a variety of tissues under different disease statuses, making computational identification of enhancers indispensable. RESULTS: To facilitate the identification of enhancers, we propose a computational framework, named DeepEnhancer, to distinguish enhancers from background genomic sequences. Our method relies purely on DNA sequences to predict enhancers in an end-to-end manner by using a deep convolutional neural network (CNN). We train our deep learning model on permissive enhancers and then adopt a transfer learning strategy to fine-tune the model on enhancers specific to a cell line. Results demonstrate the effectiveness and efficiency of our method in the classification of enhancers against random sequences, exhibiting advantages of deep learning over traditional sequence-based classifiers. We then construct a variety of neural networks with different architectures and show the usefulness of such techniques as max-pooling and batch normalization in our method. To gain the interpretability of our approach, we further visualize convolutional kernels as sequence logos and successfully identify similar motifs in the JASPAR database. CONCLUSIONS: DeepEnhancer enables the identification of novel enhancers using only DNA sequences via a highly accurate deep learning model. The proposed computational framework can also be applied to similar problems, thereby prompting the use of machine learning methods in life sciences.


Subjects
Algorithms; DNA/chemistry; Enhancer Elements, Genetic; Models, Genetic; Neural Networks, Computer; Computational Biology; DNA/genetics; Databases, Factual; Genome, Human; Genomics; Humans; Machine Learning
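
The DeepEnhancer abstract above mentions a CNN that works directly on DNA sequences, max-pooling, and convolutional kernels that can be read as motifs. The sketch below shows those three ideas in their simplest form; it is not the DeepEnhancer architecture, and the `tata_kernel` filter is a hand-made, hypothetical example rather than a learned one.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT, such as N, become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d_max(encoded, kernel):
    """Slide one kernel (width x 4 weights) along the sequence, apply ReLU,
    and return the largest activation (global max-pooling)."""
    width = len(kernel)
    best = 0.0  # ReLU floor: negative activations are clipped to zero
    for start in range(len(encoded) - width + 1):
        act = sum(
            encoded[start + i][c] * kernel[i][c]
            for i in range(width)
            for c in range(4)
        )
        best = max(best, act)
    return best

# Hypothetical kernel that responds most strongly to the motif "TATA".
tata_kernel = one_hot("TATA")
```

Because the kernel has one weight per base per position, its weights can be drawn as a sequence logo, which is how a learned filter can be compared with known motifs.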
15.
Bioinform Adv ; 4(1): vbae055, 2024.
Article in English | MEDLINE | ID: mdl-38645715

ABSTRACT

Summary: Chromatin accessibility serves as a critical measurement of physical contact between nuclear macromolecules and DNA sequence, providing valuable insights into the comprehensive landscape of regulatory mechanisms; we therefore previously developed the OpenAnnotate web server. However, as an increasing number of epigenomic analysis software tools have emerged, web-based annotation has often faced limitations and inconveniences when integrated into these software pipelines. To address these issues, we here develop two software packages named OpenAnnotatePy and OpenAnnotateR. In addition to web-based functionalities, these packages encompass supplementary features, including the capability for simultaneous annotation across multiple cell types, advanced searching of systems, tissues and cell types, and converting the result to the data structure of mainstream tools. Moreover, we applied the packages to various scenarios, including cell type revealing, regulatory element prediction, and integration into mainstream single-cell ATAC-seq analysis pipelines including EpiScanpy, Signac, and ArchR. We anticipate that OpenAnnotateApi will significantly facilitate the deciphering of gene regulatory mechanisms, and offer crucial assistance in the field of epigenomic studies. Availability and implementation: OpenAnnotateApi for R is available at https://github.com/ZjGaothu/OpenAnnotateR and for Python is available at https://github.com/ZjGaothu/OpenAnnotatePy.

16.
Article in English | MEDLINE | ID: mdl-38442065

ABSTRACT

Rapid advances in single-cell chromatin accessibility sequencing (scCAS) technologies have enabled the characterization of epigenomic heterogeneity and increased the demand for automatic annotation of cell types. However, there are few computational methods tailored for cell type annotation in scCAS data and the existing methods perform poorly for differentiating and imbalanced cell types. Here, we propose CASCADE, a novel annotation method based on simulation- and denoising-based strategies. With comprehensive experiments on a number of scCAS datasets, we showed that CASCADE can effectively distinguish the patterns of different cell types and mitigate the effect of high noise levels, and thus achieve significantly better annotation performance for differentiating and imbalanced cell types. Besides, we performed model ablation experiments to show the contribution of modules in CASCADE and conducted extensive experiments to demonstrate the robustness of CASCADE to batch effect, imbalance degree, data sparsity, and number of cell types. Moreover, CASCADE significantly outperformed baseline methods for accurately annotating the cell types in newly sequenced data. We anticipate that CASCADE will greatly assist with characterizing cell heterogeneity in scCAS data analysis.


Subjects
Chromatin; Computational Biology; Single-Cell Analysis; Chromatin/genetics; Chromatin/metabolism; Chromatin/chemistry; Single-Cell Analysis/methods; Humans; Computational Biology/methods; Algorithms; Molecular Sequence Annotation/methods; Sequence Analysis, DNA/methods
17.
Nat Comput Sci ; 4(5): 346-359, 2024 May.
Article in English | MEDLINE | ID: mdl-38730185

ABSTRACT

Single-cell epigenomic data have been growing continuously at an unprecedented pace, but their characteristics such as high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models, especially variational autoencoders, have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption somewhat disagrees with real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework to extract discrete latent embeddings that interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization compared with state-of-the-art methods. We demonstrate the advantages of CASTLE for effective incorporation of existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE's capacity for intuitively distilling cell-type-specific feature spectra that unveil cell heterogeneity and biological implications quantitatively.


Subjects
Chromatin; Single-Cell Analysis; Single-Cell Analysis/methods; Chromatin/genetics; Chromatin/metabolism; Humans; Epigenomics/methods; Deep Learning; Algorithms; Genetic Heterogeneity
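
The CASTLE abstract above builds on the vector-quantized variational autoencoder framework, whose defining step replaces a continuous encoder output with the nearest entry of a learned codebook. The sketch below shows only that lookup step under simplifying assumptions: the codebook values are made up, and `quantize` is a hypothetical helper, not CASTLE's code.

```python
def quantize(embedding, codebook):
    """Return (index, code vector) of the codebook entry nearest to
    `embedding` under squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(embedding, codebook[i]))
    return index, codebook[index]

# Hypothetical 2-dimensional codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
```

The returned index is a discrete latent code, which is what makes the embedding discrete rather than Gaussian-distributed.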
18.
Nat Commun ; 15(1): 2973, 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38582890

ABSTRACT

Recent advancements for simultaneously profiling multi-omics modalities within individual cells have enabled the interrogation of cellular heterogeneity and molecular hierarchy. However, technical limitations lead to highly noisy multi-modal data and substantial costs. Although computational methods have been proposed to translate single-cell data across modalities, broad applications of the methods still remain impeded by formidable challenges. Here, we propose scButterfly, a versatile single-cell cross-modality translation method based on dual-aligned variational autoencoders and data augmentation schemes. With comprehensive experiments on multiple datasets, we provide compelling evidence of scButterfly's superiority over baseline methods in preserving cellular heterogeneity while translating datasets of various contexts and in revealing cell type-specific biological insights. Besides, we demonstrate the extensive applications of scButterfly for integrative multi-omics analysis of single-modality data, data enhancement of poor-quality single-cell multi-omics, and automatic cell type annotation of scATAC-seq data. Moreover, scButterfly can be generalized to unpaired data training, perturbation-response analysis, and consecutive translation.

19.
Nat Commun ; 15(1): 1629, 2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38388573

ABSTRACT

Single-cell chromatin accessibility sequencing (scCAS) has emerged as a valuable tool for interrogating and elucidating epigenomic heterogeneity and gene regulation. However, scCAS data inherently suffers from limitations such as high sparsity and dimensionality, which pose significant challenges for downstream analyses. Although several methods are proposed to enhance scCAS data, there are still challenges and limitations that hinder the effectiveness of these methods. Here, we propose scCASE, a scCAS data enhancement method based on non-negative matrix factorization which incorporates an iteratively updating cell-to-cell similarity matrix. Through comprehensive experiments on multiple datasets, we demonstrate the advantages of scCASE over existing methods for scCAS data enhancement. The interpretable cell type-specific peaks identified by scCASE can provide valuable biological insights into cell subpopulations. Moreover, to leverage the large compendia of available omics data as a reference, we further expand scCASE to scCASER, which enables the incorporation of external reference data to improve enhancement performance.


Subjects
Algorithms; Chromatin; Chromatin/genetics; Epigenomics/methods; Gene Expression Regulation; Single-Cell Analysis
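
The scCASE abstract above is based on non-negative matrix factorization (NMF). The sketch below implements plain NMF with the standard Lee-Seung multiplicative updates for the squared-error objective; it deliberately omits the iteratively updated cell-to-cell similarity matrix that the abstract describes, and all function names and defaults are assumptions for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h
```

For a cell-by-peak matrix, the product WH is a dense, low-rank reconstruction, which is the sense in which a factorization can enhance sparse data.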
20.
Genome Biol ; 24(1): 225, 2023 Oct 09.
Article in English | MEDLINE | ID: mdl-37814314

ABSTRACT

Application of the widely used droplet-based microfluidic technologies in single-cell sequencing often yields doublets, introducing bias into downstream analyses. In particular, doublet-detection methods for single-cell chromatin accessibility sequencing (scCAS) data face multiple assay-specific challenges. Therefore, we propose scIBD, a self-supervised iterative-optimizing model for boosting heterotypic doublet detection in scCAS data. scIBD introduces an adaptive strategy to simulate high-confidence heterotypic doublets and self-supervises doublet detection in an iteratively optimizing manner. Comprehensive benchmarking on various simulated and real datasets demonstrates the superior performance and robustness of scIBD. Moreover, the downstream biological analyses suggest the efficacy of doublet removal by scIBD.


Subjects
Chromatin; Single-Cell Analysis; Single-Cell Analysis/methods
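
The scIBD abstract above relies on simulated heterotypic doublets, that is, artificial profiles made from two cells of different types. The sketch below shows one simple way to build such profiles from binary accessibility vectors; it does not reproduce scIBD's adaptive strategy, and the function name, the use of preliminary cluster labels, and the union rule are all assumptions.

```python
import random

def simulate_heterotypic_doublets(cells, labels, n_doublets, seed=0):
    """Create artificial doublets by merging two cells from different clusters.

    cells  : list of binary accessibility vectors (1 = peak open)
    labels : cluster label per cell (hypothetical preliminary clustering)
    """
    if len(set(labels)) < 2:
        raise ValueError("need at least two clusters to form heterotypic pairs")
    rng = random.Random(seed)
    doublets = []
    while len(doublets) < n_doublets:
        i, j = rng.sample(range(len(cells)), 2)
        if labels[i] == labels[j]:
            continue  # homotypic pair: skip it
        # A peak is open in the doublet if it is open in either parent cell.
        doublets.append([max(a, b) for a, b in zip(cells[i], cells[j])])
    return doublets
```

Simulated doublets of this kind can serve as positive training examples for a classifier that then scores the real cells.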