Results 1 - 20 of 120
1.
PLoS Comput Biol ; 20(10): e1012403, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39356722

ABSTRACT

A recent paper claimed that t-SNE and UMAP embeddings of single-cell datasets are "specious" and fail to capture true biological structure. The authors argued that such embeddings are as arbitrary and as misleading as forcing the data into an elephant shape. Here we show that this conclusion was based on inadequate and limited metrics of embedding quality. More appropriate metrics quantifying neighborhood and class preservation reveal the elephant in the room: while t-SNE and UMAP embeddings of single-cell data do not preserve high-dimensional distances, they can nevertheless provide biologically relevant information.
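
The abstract above refers to metrics that quantify neighborhood preservation. One common metric of this kind is the fraction of each point's k nearest neighbours in the original space that remain among its k nearest neighbours in the embedding. The sketch below is an illustration under that assumption, not the paper's own implementation; the function names and the toy data are hypothetical.

```python
import math

def knn_indices(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(set(others[:k]))
    return neighbours

def knn_preservation(high_dim, low_dim, k):
    """Average fraction of each point's k nearest neighbours kept by the embedding."""
    high = knn_indices(high_dim, k)
    low = knn_indices(low_dim, k)
    overlaps = [len(h & l) / k for h, l in zip(high, low)]
    return sum(overlaps) / len(overlaps)

# Toy data (hypothetical): two well-separated groups in 3-D and a 2-D
# "embedding" that distorts distances but keeps each group together.
high_dim = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (10, 10, 10), (10, 11, 10), (11, 10, 10)]
low_dim = [(0, 0), (0, 5), (5, 0), (100, 100), (100, 105), (105, 100)]
score = knn_preservation(high_dim, low_dim, k=2)
```

In this toy case the embedding changes every pairwise distance yet keeps every neighbourhood intact, which is the distinction the abstract draws between distance preservation and neighborhood preservation.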


Subject(s)
Computational Biology , Single-Cell Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Computational Biology/methods , Algorithms , Humans , Animals
2.
Stat Med ; 43(25): 4836-4849, 2024 Nov 10.
Article in English | MEDLINE | ID: mdl-39237124

ABSTRACT

The current high-dimensional linear factor models fail to account for the different types of variables, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. However, overdispersion is prevalent in practical applications, particularly in fields like biomedical and genomics studies. To address this practical demand, we propose an overdispersed generalized factor model (OverGFM) for performing high-dimensional nonlinear factor analysis on overdispersed mixed-type data. Our approach incorporates an additional error term to capture the overdispersion that cannot be accounted for by factors alone. However, this introduces significant computational challenges due to the involvement of two high-dimensional latent random matrices in the nonlinear model. To overcome these challenges, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. This algorithm provides iterative explicit solutions for the complex variational parameters and is proven to possess excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the optimal number of factors. Numerical results demonstrate the effectiveness of this criterion. Through comprehensive simulation studies, we show that OverGFM outperforms state-of-the-art methods in terms of estimation accuracy and computational efficiency. Furthermore, we demonstrate the practical merit of our method through its application to two datasets from genomics. To facilitate its usage, we have integrated the implementation of OverGFM into the R package GFM.
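
The abstract mentions a criterion based on the singular value ratio for choosing the number of factors, but does not say which matrix is decomposed or how the ratio is formed. The sketch below shows one common form of such a criterion, assumed here for illustration only: pick the index at which the ratio of consecutive singular values is largest. The function name and the example values are hypothetical and are not taken from the GFM package.

```python
def choose_num_factors(singular_values, max_factors):
    """Pick k maximising the ratio of the k-th to the (k+1)-th singular value.

    `singular_values` must be sorted in decreasing order and contain at
    least max_factors + 1 positive entries.
    """
    ratios = [
        singular_values[k - 1] / singular_values[k]
        for k in range(1, max_factors + 1)
    ]
    return ratios.index(max(ratios)) + 1

# Hypothetical spectrum: three large values followed by a sharp drop.
values = [50.0, 40.0, 30.0, 2.0, 1.5, 1.2]
q = choose_num_factors(values, max_factors=5)
```

The largest gap falls between the third and fourth values, so this rule selects three factors for the example spectrum.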


Subject(s)
Algorithms , Computer Simulation , Models, Statistical , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Factor Analysis, Statistical , Nonlinear Dynamics
3.
PLoS Comput Biol ; 20(8): e1011854, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39093856

ABSTRACT

Single-cell ATAC-seq (scATAC-seq) data have been widely used to investigate chromatin accessibility at the single-cell level. One important application of scATAC-seq data analysis is differential chromatin accessibility (DA) analysis. However, the data characteristics of scATAC-seq, such as excessive zeros and large variability of chromatin accessibility across cells, impose a unique challenge for DA analysis. Existing statistical methods focus on detecting the mean difference of the chromatin accessible regions while overlooking the distribution difference. Motivated by real data exploration showing that distribution differences exist among cell types, we introduce a novel composite statistical test named "scaDA", which is based on a zero-inflated negative binomial (ZINB) model, for performing differential distribution analysis of chromatin accessibility by jointly testing abundance, prevalence, and dispersion. Benefiting from both dispersion shrinkage and iterative refinement of mean and prevalence parameter estimates, scaDA demonstrates its superiority to both ZINB-based likelihood ratio tests and published methods by achieving the highest power and best FDR control in a comprehensive simulation study. In addition to demonstrating the highest power in three real sc-multiome data analyses, scaDA successfully identifies differentially accessible regions in microglia from sc-multiome data for an Alzheimer's disease (AD) study that are most enriched in GO terms related to neurogenesis and the clinical phenotype of AD, and AD-associated GWAS SNPs.
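
For readers unfamiliar with the zero-inflated negative binomial model named in the abstract, the sketch below writes out its probability mass function in the standard mean/size parameterization. It is a generic textbook form, not scaDA's code. The parameter names are assumptions: `mean`, `zero_prob`, and `size` correspond only loosely to the abstract's abundance, prevalence, and dispersion, and the abstract does not give the parameterization that scaDA uses.

```python
import math

def nb_pmf(k, mean, size):
    """Negative binomial pmf with the given mean and dispersion parameter `size`."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mean))
        + k * math.log(mean / (size + mean))
    )
    return math.exp(log_p)

def zinb_pmf(k, mean, size, zero_prob):
    """Zero-inflated NB: a point mass at zero mixed with an NB count component."""
    nb = (1.0 - zero_prob) * nb_pmf(k, mean, size)
    return zero_prob + nb if k == 0 else nb

# Hypothetical parameter values, chosen only to exercise the functions.
total = sum(zinb_pmf(k, mean=2.0, size=1.5, zero_prob=0.3) for k in range(500))
p_zero = zinb_pmf(0, mean=2.0, size=1.5, zero_prob=0.3)
```

Two groups of cells can share the same mean yet differ in `zero_prob` or `size`, which is the kind of distribution difference a mean-only test would miss.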


Subject(s)
Chromatin , Single-Cell Analysis , Chromatin/genetics , Chromatin/metabolism , Chromatin/chemistry , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Humans , Computational Biology/methods , Alzheimer Disease/genetics , Models, Statistical , Chromatin Immunoprecipitation Sequencing/methods , Computer Simulation , Animals , Sequence Analysis, DNA/methods , Algorithms
4.
PLoS Comput Biol ; 20(8): e1012339, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39116191

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool in genomics research, enabling the analysis of gene expression at the individual cell level. However, scRNA-seq data often suffer from a high rate of dropouts, where certain genes fail to be detected in specific cells due to technical limitations. This missing data can introduce biases and hinder downstream analysis. To overcome this challenge, the development of effective imputation methods has become crucial in the field of scRNA-seq data analysis. Here, we propose an imputation method based on robust and non-negative matrix factorization (scRNMF). Unlike other matrix factorization algorithms, scRNMF integrates two loss functions: L2 loss and C-loss. The L2 loss function is highly sensitive to outliers, which can introduce substantial errors. We utilize the C-loss function when dealing with zero values in the raw data. The primary advantage of the C-loss function is that it imposes a smaller punishment for larger errors, which results in more robust factorization when handling outliers. Various datasets of different sizes and zero rates are used to evaluate the performance of scRNMF against other state-of-the-art methods. Our method demonstrates its power and stability as a tool for imputation of scRNA-seq data.
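
The contrast between the two losses can be made concrete with a small numerical comparison. The abstract does not give the formula for the C-loss, so the sketch below assumes a commonly used correntropy-induced form, 1 - exp(-e^2 / (2 sigma^2)). The kernel width `sigma` and the sample errors are hypothetical, and scRNMF's actual definition may differ.

```python
import math

def l2_loss(error):
    """Squared error: grows without bound as the error grows."""
    return error ** 2

def c_loss(error, sigma=1.0):
    """Assumed correntropy-induced loss: bounded above by 1, so huge errors saturate."""
    return 1.0 - math.exp(-(error ** 2) / (2.0 * sigma ** 2))

# Small, moderate, and outlier-sized residuals.
errors = [0.1, 1.0, 10.0]
l2_values = [l2_loss(e) for e in errors]
c_values = [c_loss(e) for e in errors]
```

Under this assumed form, a residual of 10 costs 100 under the L2 loss but at most 1 under the C-loss, so a single outlier cannot dominate the objective.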


Subject(s)
Algorithms , Computational Biology , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , RNA-Seq/methods , RNA-Seq/statistics & numerical data , Computational Biology/methods , Humans , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Software , Single-Cell Gene Expression Analysis
6.
J Math Biol ; 89(3): 34, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39162836

ABSTRACT

Tumor is a complex and aggressive type of disease that poses significant health challenges. Understanding the cellular mechanisms underlying its progression is crucial for developing effective treatments. In this study, we develop a novel mathematical framework to investigate the role of cellular plasticity and heterogeneity in tumor progression. By leveraging temporal single-cell data, we propose a reaction-convection-diffusion model that effectively captures the spatiotemporal dynamics of tumor cells and macrophages within the tumor microenvironment. Through theoretical analysis, we obtain the estimate of the pulse wave speed and analyze the stability of the homogeneous steady state solutions. Notably, we employ the AddModuleScore function to quantify cellular plasticity. One of the highlights of our approach is the introduction of pulse wave speed as a quantitative measure to precisely gauge the rate of cell phenotype transitions, as well as the novel implementation of the high-plasticity cell state/low-plasticity cell state ratio as an indicator of tumor malignancy. Furthermore, the bifurcation analysis reveals the complex dynamics of tumor cell populations. Our extensive analysis demonstrates that an increased rate of phenotype transition is associated with heightened malignancy, attributable to the tumor's ability to explore a wider phenotypic space. The study also investigates how the proliferation rate and the death rate of tumor cells, phenotypic convection velocity, and the midpoint of the phenotype transition stage affect the speed of tumor cell phenotype transitions and the progression to adenocarcinoma. These insights and quantitative measures can help guide the development of targeted therapeutic strategies to regulate cellular plasticity and control tumor progression effectively.


Subject(s)
Cell Plasticity , Mathematical Concepts , Models, Biological , Neoplasms , Phenotype , Single-Cell Analysis , Tumor Microenvironment , Humans , Tumor Microenvironment/physiology , Neoplasms/pathology , Neoplasms/physiopathology , Single-Cell Analysis/statistics & numerical data , Disease Progression , Cell Proliferation , Computer Simulation
7.
PLoS Comput Biol ; 20(7): e1012241, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38985831

ABSTRACT

Dimension reduction tools preserving similarity and graph structure such as t-SNE and UMAP can capture complex biological patterns in high-dimensional data. However, these tools typically are not designed to separate effects of interest from unwanted effects due to confounders. We introduce the partial embedding (PARE) framework, which enables removal of confounders from any distance-based dimension reduction method. We then develop partial t-SNE and partial UMAP and apply these methods to genomic and neuroimaging data. For lower-dimensional visualization, our results show that the PARE framework can remove batch effects in single-cell sequencing data as well as separate clinical and technical variability in neuroimaging measures. We demonstrate that the PARE framework extends dimension reduction methods to highlight biological patterns of interest while effectively removing confounding effects.


Subject(s)
Algorithms , Computational Biology , Neuroimaging , Humans , Neuroimaging/methods , Computational Biology/methods , Genomics/methods , Genomics/statistics & numerical data , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data
8.
J Bioinform Comput Biol ; 22(3): 2450015, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39036845

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology has generated vast amounts of data. However, these data often exhibit batch effects due to various factors such as different time points, experimental personnel, and instruments used, which can obscure the biological differences in the data itself. Based on the characteristics of scRNA-seq data, we designed a dense deep residual network model, referred to as NDnetwork. Subsequently, we combined the NDnetwork model with the MNN method to correct batch effects in scRNA-seq data, and named it the NDMNN method. Comprehensive experimental results demonstrate that the NDMNN method outperforms existing commonly used methods for correcting batch effects in scRNA-seq data. As the scale of single-cell sequencing continues to expand, we believe that NDMNN will be a valuable tool for researchers in the biological community for correcting batch effects in their studies. The source code and experimental results of the NDMNN method can be found at https://github.com/mustang-hub/NDMNN.
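
The MNN method mentioned in the abstract rests on finding mutual nearest neighbours between two batches: pairs of cells that each list the other among their k nearest neighbours in the opposite batch. The sketch below illustrates only that pairing step on hypothetical 2-D points. It does not reproduce NDMNN, its network, or the subsequent correction, and the function names are invented for illustration.

```python
import math

def nearest(query, reference, k):
    """Indices of the k points in `reference` closest to `query`."""
    order = sorted(range(len(reference)), key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])

def mutual_nearest_neighbours(batch_a, batch_b, k):
    """Pairs (i, j) where a_i and b_j are in each other's k nearest neighbours."""
    a_to_b = [nearest(a, batch_b, k) for a in batch_a]
    b_to_a = [nearest(b, batch_a, k) for b in batch_b]
    return sorted(
        (i, j)
        for i, js in enumerate(a_to_b)
        for j in js
        if i in b_to_a[j]
    )

# Hypothetical batches: batch_b is a slightly shifted copy of batch_a
# plus one cell with no counterpart in batch_a.
batch_a = [(0.0, 0.0), (5.0, 5.0)]
batch_b = [(0.5, 0.2), (5.4, 5.1), (20.0, 20.0)]
pairs = mutual_nearest_neighbours(batch_a, batch_b, k=1)
```

The third cell of `batch_b` has a nearest neighbour in `batch_a`, but the relation is not mutual, so it forms no pair; this is how the mutual requirement avoids matching cell populations that are present in only one batch.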


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , RNA-Seq/methods , Computational Biology/methods , Software , Humans , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Algorithms , Deep Learning , Single-Cell Gene Expression Analysis
9.
PLoS Comput Biol ; 20(7): e1011620, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38976751

ABSTRACT

Boolean networks are largely employed to model the qualitative dynamics of cell fate processes by describing the change of binary activation states of genes and transcription factors with time. Being able to bridge such qualitative states with quantitative measurements of gene expression in cells, such as scRNA-seq, is a cornerstone for data-driven model construction and validation. On one hand, scRNA-seq binarisation is a key step for inferring and validating Boolean models. On the other hand, the generation of synthetic scRNA-seq data from baseline Boolean models provides an important asset to benchmark inference methods. However, linking characteristics of scRNA-seq datasets, including dropout events, with Boolean states is a challenging task. We present scBoolSeq, a method for the bidirectional linking of scRNA-seq data and Boolean activation state of genes. Given a reference scRNA-seq dataset, scBoolSeq computes statistical criteria to classify the empirical gene pseudocount distributions as either unimodal, bimodal, or zero-inflated, and fits a probabilistic model of dropouts, with gene-dependent parameters. From these learnt distributions, scBoolSeq can both binarise scRNA-seq datasets and generate synthetic scRNA-seq datasets from Boolean traces, as issued from Boolean networks, using biased sampling and dropout simulation. We present a case study demonstrating the application of scBoolSeq's binarisation scheme in data-driven model inference. Furthermore, we compare synthetic scRNA-seq data generated by scBoolSeq with BoolODE's data for the same Boolean network model. The comparison shows that our method better reproduces the statistics of real scRNA-seq datasets, such as the mean-variance and mean-dropout relationships, while exhibiting clearly defined trajectories in two-dimensional projections of the data.


Subject(s)
Computational Biology , Single-Cell Analysis , Computational Biology/methods , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Humans , RNA-Seq/methods , RNA-Seq/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Algorithms , Gene Regulatory Networks/genetics , Models, Statistical , Software , Single-Cell Gene Expression Analysis
10.
PLoS Comput Biol ; 20(5): e1012014, 2024 May.
Article in English | MEDLINE | ID: mdl-38809943

ABSTRACT

Recent advances in single-cell technologies have enabled high-resolution characterization of tissue and cancer compositions. Although numerous tools for dimension reduction and clustering are available for single-cell data analyses, these methods often fail to simultaneously preserve local cluster structure and global data geometry. To address these challenges, we developed a novel analysis framework, Single-Cell Path Metrics Profiling (scPMP), using power-weighted path metrics, which measure distances between cells in a data-driven way. Unlike Euclidean distance and other commonly used distance metrics, path metrics are density sensitive and respect the underlying data geometry. By combining path metrics with multidimensional scaling, a low dimensional embedding of the data is obtained which preserves both the global data geometry and cluster structure. We evaluate the method both for clustering quality and geometric fidelity, and it outperforms current scRNAseq clustering algorithms on a wide range of benchmarking data sets.
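
To illustrate why a path metric is density sensitive, the sketch below assumes the usual definition of a power-weighted path distance: the minimum over all paths through the data of the sum of edge lengths raised to a power p, followed by a p-th root. This definition is an assumption, since the abstract does not state it, and the brute-force Floyd-Warshall computation on toy points is for illustration only; it is not the scPMP implementation.

```python
import math

def path_metric(points, p):
    """All-pairs power-weighted path distances via Floyd-Warshall."""
    n = len(points)
    # Edge cost is the Euclidean distance raised to the power p.
    cost = [[math.dist(points[i], points[j]) ** p for j in range(n)] for i in range(n)]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = cost[i][m] + cost[m][j]
                if via < cost[i][j]:
                    cost[i][j] = via
    # Take the p-th root so the result is in the original distance units.
    return [[c ** (1.0 / p) for c in row] for row in cost]

# Hypothetical points on a line: a dense chain from 0 to 3, then an
# isolated point at 6.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (6.0,)]
d = path_metric(points, p=2)
```

Points 0 and 3 are three units apart in Euclidean terms, but the dense chain between them shortens the path distance to the square root of 3. The gap from 3 to 6 has no intermediate points, so its path distance stays at 3. Distances within dense regions shrink relative to distances across sparse gaps.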


Subject(s)
Algorithms , Computational Biology , Single-Cell Analysis , Cluster Analysis , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Humans , Computational Biology/methods , RNA-Seq/methods , RNA-Seq/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Single-Cell Gene Expression Analysis
11.
Comput Math Methods Med ; 2022: 6534126, 2022.
Article in English | MEDLINE | ID: mdl-35317194

ABSTRACT

Objectives: Myocardial infarction (MI) is a common cardiovascular disease. Histopathology is a main molecular characteristic of MI, but often, differences between various cell subsets have been neglected. Under this premise, MI-related molecular biomarkers were screened using single-cell sequencing. Methods: This work examined immune cell abundance in normal and MI samples from GSE109048 and determined differences in activated mast cells, activated CD4 memory T cells, and resting mast cells. Weighted gene coexpression network analysis (WGCNA) demonstrated that activated CD4 memory T cells were the most closely related to the turquoise module, and 10 hub genes were screened. Single-cell sequencing data (scRNA-seq) of MI were examined. We used t-distributed stochastic neighbor embedding (t-SNE) for cell clustering. Results: We obtained 8 cell subpopulations, each of which had different marker genes. Seven of the 10 hub genes were detected by single-cell sequencing analysis. The expression quantity and proportion of the 7 genes differed among the 8 cell clusters. Conclusion: In general, our study revealed the immune characteristics and determined 7 prognostic markers for MI at the single-cell level, providing a new understanding of the molecular characteristics and mechanism of MI.


Subject(s)
Gene Regulatory Networks , Genetic Markers , Myocardial Infarction/genetics , Myocardial Infarction/immunology , Single-Cell Analysis/methods , CD4-Positive T-Lymphocytes/immunology , Chemokines/genetics , Computational Biology , Gene Expression Profiling , Gene Ontology , Genetic Markers/immunology , Humans , Immunologic Memory/genetics , Mast Cells/immunology , Prognosis , RNA-Seq/methods , RNA-Seq/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Stochastic Processes
13.
Clin Transl Med ; 12(2): e723, 2022 02.
Article in English | MEDLINE | ID: mdl-35184398

ABSTRACT

BACKGROUND: Early-stage lung adenocarcinoma that radiologically manifests as part-solid nodules, consisting of both ground-glass and solid components, has distinctive growth patterns and prognosis. The characteristics of the tumour microenvironment and transcriptional features of the malignant cells of different radiological phenotypes remain poorly understood. METHODS: Twelve treatment-naive patients with radiological part-solid nodules were enrolled. After frozen pathology was confirmed as lung adenocarcinoma, two regions (ground-glass and solid) from each of the 12 part-solid nodules and 5 normal lung tissues from 5 of the 12 patients were subjected to single-cell sequencing by 10x Genomics. We used Seurat v3.1.5 for data integration and analysis. RESULTS: We comprehensively dissected the multicellular ecosystem of the ground-glass and solid components of part-solid nodules at the single-cell resolution. In tumours, these components had comparable proportions of malignant cells. However, the angiogenesis, epithelial-to-mesenchymal transition, KRAS, p53, and cell-cycle signalling pathways were significantly up-regulated in malignant cells within solid components compared to those within ground-glass components. For the tumour microenvironment, the relative abundance of myeloid and NK cells tended to be higher in solid components than in ground-glass components. Slight subtype composition differences existed between the ground-glass and solid components. The T/NK cell subsets' cytotoxic function and the macrophages' pro-inflammation function were suppressed in solid components. Moreover, pericytes in solid components had a stronger communication related to angiogenesis promotion with endothelial cells and tumour cells. CONCLUSION: The cellular landscape of ground-glass components is significantly different from that of normal tissue and similar to that of solid components. However, transcriptional differences exist in the vital signalling pathways of malignant and immune cells within these components.


Subject(s)
Adenocarcinoma of Lung/radiotherapy , Single-Cell Analysis/statistics & numerical data , Solitary Pulmonary Nodule/genetics , Adenocarcinoma of Lung/physiopathology , Humans , Single-Cell Analysis/methods , Solitary Pulmonary Nodule/radiotherapy , Tumor Microenvironment/genetics
14.
Clin Transl Med ; 12(2): e730, 2022 02.
Article in English | MEDLINE | ID: mdl-35184420

ABSTRACT

BACKGROUND: Deciphering intra- and inter-tumoural heterogeneity is essential for understanding the biology of gastric cancer (GC) and its metastasis and identifying effective therapeutic targets. However, the characteristics of different organ-tropism metastases of GC are largely unknown. METHODS: Ten fresh human tissue samples from six patients, including primary tumour and adjacent non-tumoural samples and six metastases from different organs or tissues (liver, peritoneum, ovary, lymph node) were evaluated using single-cell RNA sequencing. Validation experiments were performed using histological assays and bulk transcriptomic datasets. RESULTS: Malignant epithelial subclusters associated with invasion features, intraperitoneal metastasis propensity, epithelial-mesenchymal transition-induced tumour stem cell phenotypes, or dormancy-like characteristics were discovered. High expression of the genes associated with the first three subclusters was linked to worse overall survival than low expression in a GC cohort containing 407 samples. Immune and stromal cells exhibited cellular heterogeneity and created a pro-tumoural and immunosuppressive microenvironment. Furthermore, a 20-gene signature of lymph node-derived exhausted CD8+ T cells was acquired to forecast lymph node metastasis and validated in GC cohorts. Additionally, although anti-NKG2A (KLRC1) antibodies have not been used to treat GC patients, even in clinical trials, we found that not only malignant tumour cells but also one endothelial subcluster, mucosal-associated invariant T cells, T cell-like B cells, plasmacytoid dendritic cells, macrophages, monocytes, and neutrophils may contribute to HLA-E-KLRC1/KLRC2 interaction with cytotoxic/exhausted CD8+ T cells and/or natural killer (NK) cells, suggesting novel clinical therapeutic opportunities in GC. Our findings also suggested that PD-1 expression in CD8+ T cells might predict clinical responses to PD-1 blockade therapy in GC. CONCLUSIONS: This study provides insights into the heterogeneous microenvironment of GC primary tumours and organ-specific metastases and provides support for precise diagnosis and treatment.


Subject(s)
Genetic Heterogeneity , Neoplasm Metastasis/genetics , Stomach Neoplasms/genetics , Humans , Neoplasm Metastasis/physiopathology , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Tumor Microenvironment/genetics
15.
Clin Transl Med ; 12(1): e689, 2022 01.
Article in English | MEDLINE | ID: mdl-35092700

ABSTRACT

BACKGROUND: Immune cells play important roles in mediating immune response and host defense against invading pathogens. However, insights into the molecular mechanisms governing circulating immune cell diversity among multiple species are limited. METHODS: In this study, we compared the single-cell transcriptomes of immune cells from 12 species. Distinct molecular profiles were characterized for different immune cell types, including T cells, B cells, natural killer cells, monocytes, and dendritic cells. RESULTS: Our data revealed the heterogeneity and compositions of circulating immune cells among 12 different species. Additionally, we explored the conserved and divergent cellular crosstalks and genetic regulatory networks among vertebrate immune cells. Notably, the ligand and receptor pair VIM-CD44 was highly conserved among the immune cells. CONCLUSIONS: This study is the first to provide a comprehensive analysis of the cross-species single-cell transcriptome atlas for peripheral blood mononuclear cells (PBMCs). This research should advance our understanding of the cellular taxonomy and fundamental functions of PBMCs, with important implications in evolutionary biology, developmental biology, and immune system disorders.


Subject(s)
Genetic Heterogeneity , Leukocytes, Mononuclear/cytology , Single-Cell Analysis/statistics & numerical data , Animals , Cats , Columbidae/genetics , Deer/genetics , Goats/genetics , Haplorhini/genetics , Humans , Mesocricetus/genetics , Mice/genetics , Rabbits , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Single-Cell Analysis/instrumentation , Single-Cell Analysis/methods , Species Specificity , Tigers/genetics , Wolves/genetics , Zebrafish/genetics
16.
J Comput Biol ; 29(1): 23-26, 2022 01.
Article in English | MEDLINE | ID: mdl-35020490

ABSTRACT

scDesign2 is a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. This article shows how to download and install the scDesign2 R package, how to fit probabilistic models (one per cell type) to real data and simulate synthetic data from the fitted models, and how to use scDesign2 to guide experimental design and benchmark computational methods. Finally, a note is given about cell clustering as a preprocessing step before model fitting and data simulation.


Subject(s)
Gene Expression Profiling/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Software , Algorithms , Animals , Cluster Analysis , Computational Biology , Computer Simulation , Databases, Nucleic Acid/statistics & numerical data , Gene Expression , Mice , Models, Statistical , RNA-Seq/statistics & numerical data
17.
J Comput Biol ; 29(1): 19-22, 2022 01.
Article in English | MEDLINE | ID: mdl-34985990

ABSTRACT

Although the availability of various sequencing technologies allows us to capture different genome properties at single-cell resolution, with the exception of a few co-assaying technologies, applying different sequencing assays on the same single cell is impossible. Single-cell alignment using optimal transport (SCOT) is an unsupervised algorithm that addresses this limitation by using optimal transport to align single-cell multiomics data. First, it preserves the local geometry by constructing a k-nearest neighbor (k-NN) graph for each data set (or domain) to capture the intra-domain distances. SCOT then finds a probabilistic coupling matrix that minimizes the discrepancy between the intra-domain distance matrices. Finally, it uses the coupling matrix to project one single-cell data set onto another through barycentric projection, thus aligning them. SCOT requires tuning only two hyperparameters and is robust to the choice of one. Furthermore, the Gromov-Wasserstein distance in the algorithm can guide SCOT's hyperparameter tuning in a fully unsupervised setting when no orthogonal alignment information is available. Thus, SCOT is a fast and accurate alignment method that provides a heuristic for hyperparameter selection in a real-world unsupervised single-cell data alignment scenario. We provide a tutorial for SCOT and make its source code publicly available on GitHub.
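
The final step described above, barycentric projection, can be sketched independently of how the coupling matrix is obtained. Each source cell is mapped to the weighted average of the target cells, with weights taken from its row of the coupling matrix. The coupling values and coordinates below are hypothetical, the Gromov-Wasserstein optimization that would produce a real coupling is not shown, and this is not SCOT's own code.

```python
def barycentric_projection(coupling, target):
    """Map each source sample to the coupling-weighted average of target samples."""
    dims = len(target[0])
    projected = []
    for row in coupling:
        mass = sum(row)  # total coupling mass assigned to this source sample
        projected.append(tuple(
            sum(w * y[d] for w, y in zip(row, target)) / mass
            for d in range(dims)
        ))
    return projected

# Hypothetical coupling between 2 source cells (rows) and 3 target cells (columns).
coupling = [
    [0.5, 0.0, 0.0],
    [0.0, 0.25, 0.25],
]
target = [(0.0, 0.0), (2.0, 0.0), (4.0, 2.0)]
aligned = barycentric_projection(coupling, target)
```

The first source cell is coupled to a single target cell and lands exactly on it. The second splits its mass evenly between two target cells and lands at their midpoint, so the projected cells share the target domain's coordinate system.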


Subject(s)
Algorithms , Sequence Alignment/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Computational Biology , Databases, Genetic/statistics & numerical data , Genomics/statistics & numerical data , Heuristics , Humans , Neural Networks, Computer , Sequence Analysis/statistics & numerical data , Software , Unsupervised Machine Learning
18.
J Comput Biol ; 29(1): 27-44, 2022 01.
Article in English | MEDLINE | ID: mdl-35050715

ABSTRACT

We propose GRNUlar, a novel deep learning framework for supervised learning of gene regulatory networks (GRNs) from single-cell RNA-Sequencing (scRNA-Seq) data. Our framework incorporates two intertwined models. First, we leverage the expressive ability of neural networks to capture complex dependencies between transcription factors and the corresponding genes they regulate, by developing a multitask learning framework. Second, to capture sparsity of GRNs observed in the real world, we design an unrolled algorithm technique for our framework. Our deep architecture requires supervision for training, for which we repurpose existing synthetic data simulators that generate scRNA-Seq data guided by an underlying GRN. Experimental results demonstrate that GRNUlar outperforms state-of-the-art methods on both synthetic and real data sets. Our study also demonstrates the novel and successful use of expression data simulators for supervised learning of GRN inference.


Subject(s)
Deep Learning , Gene Regulatory Networks , Single-Cell Analysis/statistics & numerical data , Algorithms , Animals , Bias , Computational Biology , Computer Simulation , Databases, Nucleic Acid/statistics & numerical data , Escherichia coli/genetics , Humans , Mice , Neural Networks, Computer , RNA-Seq/statistics & numerical data , Saccharomyces cerevisiae/genetics , Supervised Machine Learning
19.
J Comput Biol ; 29(1): 3-18, 2022 01.
Article in English | MEDLINE | ID: mdl-35050714

ABSTRACT

Recent advances in sequencing technologies have allowed us to capture various aspects of the genome at single-cell resolution. However, with the exception of a few co-assaying technologies, it is not possible to simultaneously apply different sequencing assays on the same single cell. In this scenario, computational integration of multi-omic measurements is crucial to enable joint analyses. This integration task is particularly challenging due to the lack of sample-wise or feature-wise correspondences. We present single-cell alignment with optimal transport (SCOT), an unsupervised algorithm that uses the Gromov-Wasserstein optimal transport to align single-cell multi-omics data sets. SCOT performs on par with the current state-of-the-art unsupervised alignment methods, is faster, and requires tuning of fewer hyperparameters. More importantly, SCOT uses a self-tuning heuristic to guide hyperparameter selection based on the Gromov-Wasserstein distance. Thus, in the fully unsupervised setting, SCOT aligns single-cell data sets better than the existing methods without requiring any orthogonal correspondence information.


Subject(s)
Algorithms , Genomics/statistics & numerical data , Sequence Alignment/statistics & numerical data , Single-Cell Analysis/statistics & numerical data , Computational Biology , Computer Simulation , Databases, Genetic/statistics & numerical data , Humans , Models, Statistical , Unsupervised Machine Learning
20.
Clin Transl Med ; 12(1): e700, 2022 01.
Article in English | MEDLINE | ID: mdl-35051311

ABSTRACT

BACKGROUND: Neurotropic virus infection can cause serious damage to the central nervous system (CNS) in both humans and animals. The complexity of the CNS poses unique challenges to investigating the infection of these viruses in the brain using traditional techniques. METHODS: In this study, we explore the use of fluorescence micro-optical sectioning tomography (fMOST) and single-cell RNA sequencing (scRNA-seq) to map the spatial and cellular distribution of a representative neurotropic virus, rabies virus (RABV), in the whole brain. Mice were inoculated with a lethal dose of a recombinant RABV encoding enhanced green fluorescent protein (EGFP) via different infection routes, and a three-dimensional (3D) view of RABV distribution in the whole mouse brain was obtained using fMOST. Meanwhile, we pinpointed the cellular distribution of RABV by utilizing scRNA-seq. RESULTS: Our fMOST data provided the 3D view of a neurotropic virus in the whole mouse brain, which indicated that the spatial distribution of RABV in the brain was influenced by the infection route. Interestingly, we provided evidence that RABV could infect multiple nuclei related to fear regardless of the infection route. More surprisingly, our scRNA-seq data revealed that, besides neurons, RABV could infect macrophages, and that the infiltrating macrophages played at least three different antiviral roles during RABV infection. CONCLUSION: This study draws a comprehensive spatial and cellular map of typical neurotropic virus infection in the mouse brain, providing a novel and insightful strategy to investigate the pathogenesis of RABV and other neurotropic viruses.


Subject(s)
Brain/cytology , Rabies virus/pathogenicity , Rabies/complications , Animals , Brain/abnormalities , Disease Models, Animal , Mice , Rabies/physiopathology , Rabies virus/metabolism , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Tomography, Optical/methods , Tomography, Optical/statistics & numerical data