Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 37
Filtrar
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38628114

RESUMEN

Spatial transcriptomics (ST) has become a powerful tool for exploring the spatial organization of gene expression in tissues. Imaging-based methods, though offering superior spatial resolutions at the single-cell level, are limited in either the number of imaged genes or the sensitivity of gene detection. Existing approaches for enhancing ST rely on the similarity between ST cells and reference single-cell RNA sequencing (scRNA-seq) cells. In contrast, we introduce stDiff, which leverages relationships between gene expression abundance in scRNA-seq data to enhance ST. stDiff employs a conditional diffusion model, capturing gene expression abundance relationships in scRNA-seq data through two Markov processes: one introducing noise to transcriptomics data and the other denoising to recover them. The missing portion of ST is predicted by incorporating the original ST data into the denoising process. In our comprehensive performance evaluation across 16 datasets, utilizing multiple clustering and similarity metrics, stDiff stands out for its exceptional ability to preserve topological structures among cells, positioning itself as a robust solution for cell population identification. Moreover, stDiff's enhancement outcomes closely mirror the actual ST data within the batch space. Across diverse spatial expression patterns, our model accurately reconstructs them, delineating distinct spatial boundaries. This highlights stDiff's capability to unify the observed and predicted segments of ST data for subsequent analysis. We anticipate that stDiff, with its innovative approach, will contribute to advancing ST imputation methodologies.


Asunto(s)
Benchmarking , Perfilación de la Expresión Génica , Análisis por Conglomerados , Difusión , Cadenas de Markov , Análisis de Secuencia de ARN , Transcriptoma
2.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38349062

RESUMEN

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool to gain biological insights at the cellular level. However, due to technical limitations of the existing sequencing technologies, low gene expression values are often omitted, leading to inaccurate gene counts. Existing methods, including advanced deep learning techniques, struggle to reliably impute gene expressions due to a lack of mechanisms that explicitly consider the underlying biological knowledge of the system. In reality, it has long been recognized that gene-gene interactions may serve as reflective indicators of underlying biology processes, presenting discriminative signatures of the cells. A genomic data analysis framework that is capable of leveraging the underlying gene-gene interactions is thus highly desirable and could allow for more reliable identification of distinctive patterns of the genomic data through extraction and integration of intricate biological characteristics of the genomic data. Here we tackle the problem in two steps to exploit the gene-gene interactions of the system. We first reposition the genes into a 2D grid such that their spatial configuration reflects their interactive relationships. To alleviate the need for labeled ground truth gene expression datasets, a self-supervised 2D convolutional neural network is employed to extract the contextual features of the interactions from the spatially configured genes and impute the omitted values. Extensive experiments with both simulated and experimental scRNA-seq datasets are carried out to demonstrate the superior performance of the proposed strategy against the existing imputation methods.


Asunto(s)
Aprendizaje Profundo , Epistasis Genética , Análisis de Datos , Genómica , Expresión Génica , Perfilación de la Expresión Génica , Análisis de Secuencia de ARN
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Artículo en Inglés | MEDLINE | ID: mdl-38701413

RESUMEN

With the emergence of large amount of single-cell RNA sequencing (scRNA-seq) data, the exploration of computational methods has become critical in revealing biological mechanisms. Clustering is a representative for deciphering cellular heterogeneity embedded in scRNA-seq data. However, due to the diversity of datasets, none of the existing single-cell clustering methods shows overwhelming performance on all datasets. Weighted ensemble methods are proposed to integrate multiple results to improve heterogeneity analysis performance. These methods are usually weighted by considering the reliability of the base clustering results, ignoring the performance difference of the same base clustering on different cells. In this paper, we propose a high-order element-wise weighting strategy based self-representative ensemble learning framework: scEWE. By assigning different base clustering weights to individual cells, we construct and optimize the consensus matrix in a careful and exquisite way. In addition, we extracted the high-order information between cells, which enhanced the ability to represent the similarity relationship between cells. scEWE is experimentally shown to significantly outperform the state-of-the-art methods, which strongly demonstrates the effectiveness of the method and supports the potential applications in complex single-cell data analytical problems.


Asunto(s)
Análisis de Secuencia de ARN , Análisis de la Célula Individual , Análisis de la Célula Individual/métodos , Análisis por Conglomerados , Análisis de Secuencia de ARN/métodos , Algoritmos , Biología Computacional/métodos , Humanos , RNA-Seq/métodos
4.
Brief Bioinform ; 25(2)2024 Jan 22.
Artículo en Inglés | MEDLINE | ID: mdl-38366803

RESUMEN

The evolution in single-cell RNA sequencing (scRNA-seq) technology has opened a new avenue for researchers to inspect cellular heterogeneity with single-cell precision. One crucial aspect of this technology is cell-type annotation, which is fundamental for any subsequent analysis in single-cell data mining. Recently, the scientific community has seen a surge in the development of automatic annotation methods aimed at this task. However, these methods generally operate at a steady-state total cell-type capacity, significantly restricting the cell annotation systems'capacity for continuous knowledge acquisition. Furthermore, creating a unified scRNA-seq annotation system remains challenged by the need to progressively expand its understanding of ever-increasing cell-type concepts derived from a continuous data stream. In response to these challenges, this paper presents a novel and challenging setting for annotation, namely cell-type incremental annotation. This concept is designed to perpetually enhance cell-type knowledge, gleaned from continuously incoming data. This task encounters difficulty with data stream samples that can only be observed once, leading to catastrophic forgetting. To address this problem, we introduce our breakthrough methodology termed scEVOLVE, an incremental annotation method. This innovative approach is built upon the methodology of contrastive sample replay combined with the fundamental principle of partition confidence maximization. Specifically, we initially retain and replay sections of the old data in each subsequent training phase, then establish a unique prototypical learning objective to mitigate the cell-type imbalance problem, as an alternative to using cross-entropy. To effectively emulate a model that trains concurrently with complete data, we introduce a cell-type decorrelation strategy that efficiently scatters feature representations of each cell type uniformly. We constructed the scEVOLVE framework with simplicity and ease of integration into most deep softmax-based single-cell annotation methods. Thorough experiments conducted on a range of meticulously constructed benchmarks consistently prove that our methodology can incrementally learn numerous cell types over an extended period, outperforming other strategies that fail quickly. As far as our knowledge extends, this is the first attempt to propose and formulate an end-to-end algorithm framework to address this new, practical task. Additionally, scEVOLVE, coded in Python using the Pytorch machine-learning library, is freely accessible at https://github.com/aimeeyaoyao/scEVOLVE.


Asunto(s)
Algoritmos , Análisis de Expresión Génica de una Sola Célula , Benchmarking , Entropía , Biblioteca de Genes , Análisis de Secuencia de ARN , Perfilación de la Expresión Génica , Análisis por Conglomerados
5.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36631401

RESUMEN

The advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at cell resolution. Cell clustering is a prerequisite in scRNA-seq analysis since it can recognize cell identities. However, the high dimensionality, noises and significant sparsity of scRNA-seq data have made it a big challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationship among cells, which seriously affects the downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and realize cell clustering. Specifically, to better characterize and learn data representations robustly, scDCCA utilizes a denoising Zero-Inflated Negative Binomial model-based auto-encoder to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarities between positive pairs and the differences between negative ones, the contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA joins feature learning with clustering, which realizes representation learning and cell clustering in an end-to-end manner. Experimental results of 14 real datasets validate that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis for scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.


Asunto(s)
Perfilación de la Expresión Génica , Análisis de Expresión Génica de una Sola Célula , Humanos , Perfilación de la Expresión Génica/métodos , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Algoritmos , Análisis por Conglomerados
6.
Brief Bioinform ; 25(1)2023 11 22.
Artículo en Inglés | MEDLINE | ID: mdl-38061196

RESUMEN

Identifying cell types is crucial for understanding the functional units of an organism. Machine learning has shown promising performance in identifying cell types, but many existing methods lack biological significance due to poor interpretability. However, it is of the utmost importance to understand what makes cells share the same function and form a specific cell type, motivating us to propose a biologically interpretable method. CellTICS prioritizes marker genes with cell-type-specific expression, using a hierarchy of biological pathways for neural network construction, and applying a multi-predictive-layer strategy to predict cell and sub-cell types. CellTICS usually outperforms existing methods in prediction accuracy. Moreover, CellTICS can reveal pathways that define a cell type or a cell type under specific physiological conditions, such as disease or aging. The nonlinear nature of neural networks enables us to identify many novel pathways. Interestingly, some of the pathways identified by CellTICS exhibit differential expression "variability" rather than differential expression across cell types, indicating that expression stochasticity within a pathway could be an important feature characteristic of a cell type. Overall, CellTICS provides a biologically interpretable method for identifying and characterizing cell types, shedding light on the underlying pathways that define cellular heterogeneity and its role in organismal function. CellTICS is available at https://github.com/qyyin0516/CellTICS.


Asunto(s)
Análisis de la Célula Individual , Análisis de Expresión Génica de una Sola Célula , Análisis de la Célula Individual/métodos , Redes Neurales de la Computación , Aprendizaje Automático , Análisis de Secuencia de ARN/métodos , Perfilación de la Expresión Génica/métodos , Análisis por Conglomerados
7.
Brief Bioinform ; 24(3)2023 05 19.
Artículo en Inglés | MEDLINE | ID: mdl-37183449

RESUMEN

Undoubtedly, single-cell RNA sequencing (scRNA-seq) has changed the research landscape by providing insights into heterogeneous, complex and rare cell populations. Given that more such data sets will become available in the near future, their accurate assessment with compatible and robust models for cell type annotation is a prerequisite. Considering this, herein, we developed scAnno (scRNA-seq data annotation), an automated annotation tool for scRNA-seq data sets primarily based on the single-cell cluster levels, using a joint deconvolution strategy and logistic regression. We explicitly constructed a reference profile for human (30 cell types and 50 human tissues) and a reference profile for mouse (26 cell types and 50 mouse tissues) to support this novel methodology (scAnno). scAnno offers a possibility to obtain genes with high expression and specificity in a given cell type as cell type-specific genes (marker genes) by combining co-expression genes with seed genes as a core. Of importance, scAnno can accurately identify cell type-specific genes based on cell type reference expression profiles without any prior information. Particularly, in the peripheral blood mononuclear cell data set, the marker genes identified by scAnno showed cell type-specific expression, and the majority of marker genes matched exactly with those included in the CellMarker database. Besides validating the flexibility and interpretability of scAnno in identifying marker genes, we also proved its superiority in cell type annotation over other cell type annotation tools (SingleR, scPred, CHETAH and scmap-cluster) through internal validation of data sets (average annotation accuracy: 99.05%) and cross-platform data sets (average annotation accuracy: 95.56%). Taken together, we established the first novel methodology that utilizes a deconvolution strategy for automated cell typing and is capable of being a significant application in broader scRNA-seq analysis. scAnno is available at https://github.com/liuhong-jia/scAnno.


Asunto(s)
Algoritmos , Programas Informáticos , Animales , Ratones , Humanos , Perfilación de la Expresión Génica/métodos , Leucocitos Mononucleares , Análisis de la Célula Individual/métodos , ARN/genética , Análisis de Secuencia de ARN/métodos
8.
Brief Bioinform ; 24(3)2023 05 19.
Artículo en Inglés | MEDLINE | ID: mdl-37080771

RESUMEN

Single-cell RNA sequencing (scRNA-seq) has significantly accelerated the experimental characterization of distinct cell lineages and types in complex tissues and organisms. Cell-type annotation is of great importance in most of the scRNA-seq analysis pipelines. However, manual cell-type annotation heavily relies on the quality of scRNA-seq data and marker genes, and therefore can be laborious and time-consuming. Furthermore, the heterogeneity of scRNA-seq datasets poses another challenge for accurate cell-type annotation, such as the batch effect induced by different scRNA-seq protocols and samples. To overcome these limitations, here we propose a novel pipeline, termed TripletCell, for cross-species, cross-protocol and cross-sample cell-type annotation. We developed a cell embedding and dimension-reduction module for the feature extraction (FE) in TripletCell, namely TripletCell-FE, to leverage the deep metric learning-based algorithm for the relationships between the reference gene expression matrix and the query cells. Our experimental studies on 21 datasets (covering nine scRNA-seq protocols, two species and three tissues) demonstrate that TripletCell outperformed state-of-the-art approaches for cell-type annotation. More importantly, regardless of protocols or species, TripletCell can deliver outstanding and robust performance in annotating different types of cells. TripletCell is freely available at https://github.com/liuyan3056/TripletCell. We believe that TripletCell is a reliable computational tool for accurately annotating various cell types using scRNA-seq data and will be instrumental in assisting the generation of novel biological hypotheses in cell biology.


Asunto(s)
Algoritmos , Análisis de la Célula Individual , Análisis de la Célula Individual/métodos , Análisis de Secuencia de ARN/métodos , Perfilación de la Expresión Génica/métodos , Análisis por Conglomerados
9.
Brief Bioinform ; 23(5)2022 09 20.
Artículo en Inglés | MEDLINE | ID: mdl-35598331

RESUMEN

Differential expression (DE) gene detection in single-cell ribonucleic acid (RNA)-sequencing (scRNA-seq) data is a key step to understand the biological question investigated. Filtering genes is suggested to improve the performance of DE methods, but the influence of filtering genes has not been demonstrated. Furthermore, the optimal methods for different scRNA-seq datasets are divergent, and different datasets should benefit from data-specific DE gene detection strategies. However, existing tools did not take gene filtering into consideration. There is a lack of metrics for evaluating the optimal method on experimental datasets. Based on two new metrics, we propose single-cell Consensus Optimization of Differentially Expressed gene detection, an R package to automatically optimize DE gene detection for each experimental scRNA-seq dataset.


Asunto(s)
ARN , Análisis de la Célula Individual , Perfilación de la Expresión Génica/métodos , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Programas Informáticos
10.
Brief Bioinform ; 23(1)2022 01 17.
Artículo en Inglés | MEDLINE | ID: mdl-34864871

RESUMEN

Advances in high-throughput experimental technologies promote the accumulation of vast number of biomedical data. Biomedical link prediction and single-cell RNA-sequencing (scRNA-seq) data imputation are two essential tasks in biomedical data analyses, which can facilitate various downstream studies and gain insights into the mechanisms of complex diseases. Both tasks can be transformed into matrix completion problems. For a variety of matrix completion tasks, matrix factorization has shown promising performance. However, the sparseness and high dimensionality of biomedical networks and scRNA-seq data have raised new challenges. To resolve these issues, various matrix factorization methods have emerged recently. In this paper, we present a comprehensive review on such matrix factorization methods and their usage in biomedical link prediction and scRNA-seq data imputation. Moreover, we select representative matrix factorization methods and conduct a systematic empirical comparison on 15 real data sets to evaluate their performance under different scenarios. By summarizing the experimental results, we provide general guidelines for selecting matrix factorization methods for different biomedical matrix completion tasks and point out some future directions to further improve the performance for biomedical link prediction and scRNA-seq data imputation.


Asunto(s)
Análisis de Datos , Análisis de la Célula Individual , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Secuenciación del Exoma
11.
Brief Bioinform ; 23(1)2022 01 17.
Artículo en Inglés | MEDLINE | ID: mdl-34913057

RESUMEN

Single-cell RNA sequencing (scRNA-seq) allows quantitative analysis of gene expression at the level of single cells, beneficial to study cell heterogeneity. The recognition of cell types facilitates the construction of cell atlas in complex tissues or organisms, which is the basis of almost all downstream scRNA-seq data analyses. Using disease-related scRNA-seq data to perform the prediction of disease status can facilitate the specific diagnosis and personalized treatment of disease. Since single-cell gene expression data are high-dimensional and sparse with dropouts, we propose scIAE, an integrative autoencoder-based ensemble classification framework, to firstly perform multiple random projections and apply integrative and devisable autoencoders (integrating stacked, denoising and sparse autoencoders) to obtain compressed representations. Then base classifiers are built on the lower-dimensional representations and the predictions from all base models are integrated. The comparison of scIAE and common feature extraction methods shows that scIAE is effective and robust, independent of the choice of dimension, which is beneficial to subsequent cell classification. By testing scIAE on different types of data and comparing it with existing general and single-cell-specific classification methods, it is proven that scIAE has a great classification power in cell type annotation intradataset, across batches, across platforms and across species, and also disease status prediction. The architecture of scIAE is flexible and devisable, and it is available at https://github.com/JGuan-lab/scIAE.


Asunto(s)
Análisis de Datos , Análisis de la Célula Individual , Perfilación de la Expresión Génica , RNA-Seq , Análisis de Secuencia de ARN , Análisis de la Célula Individual/métodos , Secuenciación del Exoma
12.
Methods ; 220: 90-97, 2023 12.
Artículo en Inglés | MEDLINE | ID: mdl-37952704

RESUMEN

For a given single cell RNA-seq data, it is critical to pinpoint key cellular stages and quantify cells' differentiation potency along a differentiation pathway in a time course manner. Currently, several methods based on the entropy of gene functions or PPI network have been proposed to solve the problem. Nevertheless, these methods still suffer from the inaccurate interactions and noises originating from scRNA-seq profile. In this study, we proposed a cell potency inference method based on cell-specific network entropy, called SPIDE. SPIDE introduces the local weighted cell-specific network for each cell to maintain cell heterogeneity and calculates the entropy by incorporating gene expression with network structure. In this study, we compared three cell entropy estimation models on eight scRNA-Seq datasets. The results show that SPIDE obtains consistent conclusions with real cell differentiation potency on most datasets. Moreover, SPIDE accurately recovers the continuous changes of potency during cell differentiation and significantly correlates with the stemness of tumor cells in Colorectal cancer. To conclude, our study provides a universal and accurate framework for cell entropy estimation, which deepens our understanding of cell differentiation, the development of diseases and other related biological research.


Asunto(s)
Perfilación de la Expresión Génica , Análisis de la Célula Individual , Entropía , Diferenciación Celular/genética , Análisis de la Célula Individual/métodos , Perfilación de la Expresión Génica/métodos , Análisis de Secuencia de ARN/métodos
13.
Int J Mol Sci ; 25(11)2024 May 29.
Artículo en Inglés | MEDLINE | ID: mdl-38892162

RESUMEN

Single-cell RNA sequencing (scRNA-seq) is widely used to interpret cellular states, detect cell subpopulations, and study disease mechanisms. In scRNA-seq data analysis, cell clustering is a key step that can identify cell types. However, scRNA-seq data are characterized by high dimensionality and significant sparsity, presenting considerable challenges for clustering. In the high-dimensional gene expression space, cells may form complex topological structures. Many conventional scRNA-seq data analysis methods focus on identifying cell subgroups rather than exploring these potential high-dimensional structures in detail. Although some methods have begun to consider the topological structures within the data, many still overlook the continuity and complex topology present in single-cell data. We propose a deep learning framework that begins by employing a zero-inflated negative binomial (ZINB) model to denoise the highly sparse and over-dispersed scRNA-seq data. Next, scZAG uses an adaptive graph contrastive representation learning approach that combines approximate personalized propagation of neural predictions graph convolution (APPNPGCN) with graph contrastive learning methods. By using APPNPGCN as the encoder for graph contrastive learning, we ensure that each cell's representation reflects not only its own features but also its position in the graph and its relationships with other cells. Graph contrastive learning exploits the relationships between nodes to capture the similarity among cells, better representing the data's underlying continuity and complex topology. Finally, the learned low-dimensional latent representations are clustered using Kullback-Leibler divergence. We validated the superior clustering performance of scZAG on 10 common scRNA-seq datasets in comparison to existing state-of-the-art clustering methods.


Asunto(s)
Análisis de la Célula Individual , Análisis de la Célula Individual/métodos , Análisis por Conglomerados , Humanos , RNA-Seq/métodos , Análisis de Secuencia de ARN/métodos , Algoritmos , Programas Informáticos , Aprendizaje Profundo , Biología Computacional/métodos , Análisis de Expresión Génica de una Sola Célula
14.
Brief Bioinform ; 22(3)2021 05 20.
Artículo en Inglés | MEDLINE | ID: mdl-32422654

RESUMEN

A single-sample network (SSN) is a biological molecular network constructed from single-sample data given a reference dataset and can provide insights into the mechanisms of individual diseases and aid in the development of personalized medicine. In this study, we proposed a computational method, a partial correlation-based single-sample network (P-SSN), which not only infers a network from each single-sample data given a reference dataset but also retains the direct interactions by excluding indirect interactions (https://github.com/hyhRise/P-SSN). By applying P-SSN to analyze tumor data from the Cancer Genome Atlas and single cell data, we validated the effectiveness of P-SSN in predicting driver mutation genes (DMGs), producing network distance, identifying subtypes and further classifying single cells. In particular, P-SSN is highly effective in predicting DMGs based on single-sample data. P-SSN is also efficient for subtyping complex diseases and for clustering single cells by introducing network distance between any two samples.


Asunto(s)
Biología Computacional/métodos , Algoritmos , Conjuntos de Datos como Asunto , Perfilación de la Expresión Génica/métodos , Humanos , Medicina de Precisión , Análisis de la Célula Individual/métodos
15.
BMC Genomics ; 23(1): 851, 2022 Dec 23.
Artículo en Inglés | MEDLINE | ID: mdl-36564711

RESUMEN

In the analysis of single-cell RNA-sequencing (scRNA-seq) data, how to effectively and accurately identify cell clusters from a large number of cell mixtures is still a challenge. Low-rank representation (LRR) method has achieved excellent results in subspace clustering. But in previous studies, most LRR-based methods usually choose the original data matrix as the dictionary. In addition, the methods based on LRR usually use spectral clustering algorithm to complete cell clustering. Therefore, there is a matching problem between the spectral clustering method and the affinity matrix, which is difficult to ensure the optimal effect of clustering. Considering the above two points, we propose the DLNLRR method to better identify the cell type. First, DLNLRR can update the dictionary during the optimization process instead of using the predefined fixed dictionary, so it can realize dictionary learning and LRR learning at the same time. Second, DLNLRR can realize subspace clustering without relying on spectral clustering algorithm, that is, we can perform clustering directly based on the low-rank matrix. Finally, we carry out a large number of experiments on real single-cell datasets and experimental results show that DLNLRR is superior to other scRNA-seq data analysis algorithms in cell type identification.


Asunto(s)
Algoritmos , Aprendizaje , Análisis por Conglomerados , Análisis de Datos , ARN/genética , Análisis de la Célula Individual , Análisis de Secuencia de ARN
16.
Adv Exp Med Biol ; 1338: 199-208, 2021.
Artículo en Inglés | MEDLINE | ID: mdl-34973026

RESUMEN

We live in the big data era in the biomedical field, where machine learning has a very important contribution to the interpretation of complex biological processes and diseases, since it has the potential to create predictive models from multidimensional data sets. Part of the application of machine learning in biomedical science is to study and model complex cellular systems such as biological networks. In this context, the study of complex diseases, such as Alzheimer's diseases (AD), benefits from established methodologies of network science and machine learning as they offer algorithmic tools and techniques that can address the limitations and challenges of modeling and studying cellular AD-related networks. In this paper we analyze the opportunities and challenges at the intersection of machine learning and network biology and whether this can affect the biological interpretation and clarification of diseases. Specifically, we focus on GRN techniques which through omics data and the use of machine learning techniques can construct a network that captures all the information at the molecular level for the disease under study. We record the emerging machine learning techniques that are focus on ensemble tree-based techniques in the area of classification and regression. Their potential for unraveling the complexity of model cellular systems in complex diseases, such as AD, offers the opportunity for novel machine learning methodologies to decipher the mechanisms of the various AD processes.


Asunto(s)
Enfermedad de Alzheimer , Humanos , Aprendizaje Automático , Modelos Biológicos
17.
Interdiscip Sci ; 16(1): 1-15, 2024 Mar.
Artículo en Inglés | MEDLINE | ID: mdl-37815679

RESUMEN

Single-cell RNA sequencing technology is one of the most cost-effective ways to uncover transcriptomic heterogeneity. With the rapid rise of this technology, enormous amounts of scRNA-seq data have been produced. Due to the high dimensionality, noise, sparsity and missing features of the available scRNA-seq data, accurately clustering the scRNA-seq data for downstream analysis is a significant challenge. Many computational methods have been designed to address this issue; nevertheless, the efficacy of the available methods is still inadequate. In addition, most similarity-based methods require a number of clusters as input, which is difficult to achieve in real applications. In this study, we developed a novel computational method for clustering scRNA-seq data by considering both global and local information, named GCFG. This method characterizes the global properties of data by applying concept factorization, and the regularized Gaussian graphical model is utilized to evaluate the local embedding relationship of data. To learn the cell-cell similarity matrix, we integrated the two components, and an iterative optimization algorithm was developed. The categorization of single cells is obtained by applying Louvain, a modularity-based community discovery algorithm, to the similarity matrix. The behavior of the GCFG approach is assessed on 14 real scRNA-seq datasets in terms of ACC and ARI, and comparison results with 17 other competitive methods suggest that GCFG is effective and robust.


Asunto(s)
Análisis de la Célula Individual , Análisis de Expresión Génica de una Sola Célula , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Perfilación de la Expresión Génica/métodos , Algoritmos , Análisis por Conglomerados
18.
Interdiscip Sci ; 16(2): 304-317, 2024 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-38368575

RESUMEN

With the advent of single-cell RNA sequencing (scRNA-seq) technology, many scRNA-seq data have become available, providing an unprecedented opportunity to explore cellular composition and heterogeneity. Recently, many computational algorithms for predicting cell type composition have been developed, and these methods are typically evaluated on different datasets and performance metrics using diverse techniques. Consequently, the lack of comprehensive and standardized comparative analysis makes it difficult to gain a clear understanding of the strengths and weaknesses of these methods. To address this gap, we reviewed 20 cutting-edge unsupervised cell type identification methods and evaluated these methods comprehensively using 24 real scRNA-seq datasets of varying scales. In addition, we proposed a new ensemble cell-type identification method, named scEM, which learns the consensus similarity matrix by applying the entropy weight method to the four representative methods are selected. The Louvain algorithm is adopted to obtain the final classification of individual cells based on the consensus matrix. Extensive evaluation and comparison with 11 other similarity-based methods under real scRNA-seq datasets demonstrate that the newly developed ensemble algorithm scEM is effective in predicting cellular type composition.


Asunto(s)
Algoritmos , Análisis de la Célula Individual , Análisis de la Célula Individual/métodos , Humanos , Análisis de Secuencia de ARN/métodos , RNA-Seq/métodos , Biología Computacional/métodos , Análisis de Expresión Génica de una Sola Célula
19.
Brief Funct Genomics ; 23(4): 295-302, 2024 Jul 19.
Artículo en Inglés | MEDLINE | ID: mdl-38267084

RESUMEN

Numerous methods have been developed to integrate spatial transcriptomics sequencing data with single-cell RNA sequencing (scRNA-seq) data. Continuous development and improvement of these methods offer multiple options for integrating and analyzing scRNA-seq and spatial transcriptomics data based on diverse research inquiries. However, each method has its own advantages, limitations and scope of application. Researchers need to select the most suitable method for their research purposes based on the actual situation. This review article presents a compilation of 19 integration methods sourced from a wide range of available approaches, serving as a comprehensive reference for researchers to select the suitable integration method for their specific research inquiries. By understanding the principles of these methods, we can identify their similarities and differences, comprehend their applicability and potential complementarity, and lay the foundation for future method development and understanding. This review article presents 19 methods that aim to integrate scRNA-seq data and spatial transcriptomics data. The methods are classified into two main groups and described accordingly. The article also emphasizes the incorporation of High Variance Genes in annotating various technologies, aiming to obtain biologically relevant information aligned with the intended purpose.


Asunto(s)
Análisis de la Célula Individual , Transcriptoma , Análisis de la Célula Individual/métodos , Humanos , Transcriptoma/genética , Perfilación de la Expresión Génica/métodos , Análisis de Secuencia de ARN/métodos , RNA-Seq/métodos , Programas Informáticos , Animales , Análisis de Expresión Génica de una Sola Célula
20.
Genome Biol ; 25(1): 96, 2024 04 15.
Artículo en Inglés | MEDLINE | ID: mdl-38622747

RESUMEN

We present a non-parametric statistical method called TDEseq that takes full advantage of smoothing splines basis functions to account for the dependence of multiple time points in scRNA-seq studies, and uses hierarchical structure linear additive mixed models to model the correlated cells within an individual. As a result, TDEseq demonstrates powerful performance in identifying four potential temporal expression patterns within a specific cell type. Extensive simulation studies and the analysis of four published scRNA-seq datasets show that TDEseq can produce well-calibrated p-values and up to 20% power gain over the existing methods for detecting temporal gene expression patterns.


Asunto(s)
Perfilación de la Expresión Génica , Análisis de la Célula Individual , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Perfilación de la Expresión Génica/métodos , Simulación por Computador , Expresión Génica
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA