Results 1 - 20 of 24
1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493338

ABSTRACT

In recent years, there has been a growing trend toward parallel clustering analysis of single-cell RNA-seq (scRNA) and single-cell Assay of Transposase Accessible Chromatin (scATAC) data. However, prevailing methods often treat these two data modalities as equals, neglecting the fact that the scRNA modality holds significantly richer information than the scATAC modality. This disregard prevents the model from benefiting from the insights derived from multiple modalities, compromising the overall clustering performance. To this end, we propose scEMC, an effective multi-modal clustering model for parallel scRNA and scATAC data. Concretely, we have devised a skip aggregation network to simultaneously learn global structural information among cells and integrate data from diverse modalities. To safeguard the quality of the integrated cell representation against the influence of sparse scATAC data, we connect the scRNA data with the aggregated representation via a skip connection. Moreover, to effectively fit the real distribution of cells, we introduce a Zero-Inflated Negative Binomial-based denoising autoencoder that accommodates corrupted data containing synthetic noise, together with a joint optimization module that employs multiple losses. Extensive experiments underscore the effectiveness of our model. This work contributes significantly to the ongoing exploration of cell subpopulations and tumor microenvironments, and the code of our work will be public at https://github.com/DayuHuu/scEMC.
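
Note: the abstract above refers to a Zero-Inflated Negative Binomial (ZINB) based autoencoder. As a minimal sketch, the per-count ZINB negative log-likelihood that such a model would typically minimize can be written as follows. The function names and the parameterization (mean `mu`, dispersion `theta`, dropout probability `pi`) are assumptions made for illustration; they are not taken from the scEMC code.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )

def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under a zero-inflated NB.
    pi in [0, 1) is the probability of an excess (dropout) zero."""
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        p_zero = pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta))
        return -math.log(p_zero)
    # A positive count can only come from the NB component.
    return -(math.log(1.0 - pi) + nb_log_pmf(x, mu, theta))
```

In an autoencoder setting, `mu`, `theta`, and `pi` would be produced per gene and per cell by the decoder, and the loss would be the sum of `zinb_nll` over all entries of the count matrix.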


Subjects
Chromatin, Small Cytoplasmic RNA, Single-Cell Gene Expression Analysis, Cluster Analysis, Learning, Small Cytoplasmic RNA/genetics, Transposases, RNA Sequence Analysis, Gene Expression Profiling
2.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39356327

ABSTRACT

Single-cell cross-modal joint clustering has been extensively utilized to investigate the tumor microenvironment. Although numerous approaches have been suggested, accurate clustering remains the main challenge. First, the gene expression matrix frequently contains numerous missing values due to measurement limitations. The majority of existing clustering methods treat it as a typical multi-modal dataset without further processing. Few methods conduct recovery before clustering and do not sufficiently engage with the underlying research, leading to suboptimal outcomes. Additionally, the existing cross-modal information fusion strategy does not ensure consistency of representations across different modes, potentially leading to the integration of conflicting information, which could degrade performance. To address these challenges, we propose the 'Recover then Aggregate' strategy and introduce the Unified Cross-Modal Deep Clustering model. Specifically, we have developed a data augmentation technique based on neighborhood similarity, iteratively imposing rank constraints on the Laplacian matrix, thus updating the similarity matrix and recovering dropout events. Concurrently, we integrate cross-modal features and employ contrastive learning to align modality-specific representations with consistent ones, enhancing the effective integration of diverse modal information. Comprehensive experiments on five real-world multi-modal datasets have demonstrated this method's superior effectiveness in single-cell clustering tasks.


Subjects
Single-Cell Analysis, Cluster Analysis, Single-Cell Analysis/methods, Humans, Algorithms, Tumor Microenvironment, Computational Biology/methods
3.
Mol Cell ; 72(6): 1035-1049.e5, 2018 12 20.
Article in English | MEDLINE | ID: mdl-30503769

ABSTRACT

Membrane-less organelles (MLOs) are liquid-like subcellular compartments that form through phase separation of proteins and RNA. While their biophysical properties are increasingly understood, their regulation and the consequences of perturbed MLO states for cell physiology are less clear. To study the regulatory networks, we targeted 1,354 human genes and screened for morphological changes of nucleoli, Cajal bodies, splicing speckles, PML nuclear bodies (PML-NBs), cytoplasmic processing bodies, and stress granules. By multivariate analysis of MLO features we identified hundreds of genes that control MLO homeostasis. We discovered regulatory crosstalk between MLOs, and mapped hierarchical interactions between aberrant MLO states and cellular properties. We provide evidence that perturbation of pre-mRNA splicing results in stress granule formation and reveal that PML-NB abundance influences DNA replication rates and that PML-NBs are in turn controlled by HIP kinases. Together, our comprehensive dataset is an unprecedented resource for deciphering the regulation and biological functions of MLOs.


Subjects
Organelles/genetics, Physiological Stress/genetics, Systems Biology/methods, Transcriptome, DNA Replication, Gene Expression Profiling, Gene Expression Regulation, Gene Regulatory Networks, HeLa Cells, Humans, Organelles/metabolism, Phase Transition, RNA Interference, RNA Precursors/genetics, Messenger RNA/genetics, Signal Transduction/genetics, Single-Cell Analysis
4.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37935617

ABSTRACT

Single-cell clustering is a critical step in biological downstream analysis. The clustering performance could be effectively improved by extracting cell-type-specific genes. The state-of-the-art feature selection methods usually calculate the importance of a single gene without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance to obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED on various scRNA-seq datasets including large datasets followed by different common clustering algorithms results in significant improvements in the accuracy of cell-type identification. The source codes for FEED are freely available at https://github.com/genemine/FEED.


Subjects
Gene Expression Profiling, Single-Cell Analysis, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Algorithms, Cluster Analysis, Gene Expression
5.
BMC Bioinformatics ; 25(Suppl 2): 292, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237886

ABSTRACT

BACKGROUND: With the advance in single-cell RNA sequencing (scRNA-seq) technology, deriving inherent biological system information from expression profiles at a single-cell resolution has become possible. It has been known that network modeling by estimating the associations between genes could better reveal dynamic changes in biological systems. However, accurately constructing a single-cell network (SCN) to capture the network architecture of each cell and further explore cell-to-cell heterogeneity remains challenging. RESULTS: We introduce SINUM, a method for constructing the SIngle-cell Network Using Mutual information, which estimates mutual information between any two genes from scRNA-seq data to determine whether they are dependent or independent in a specific cell. Experiments on various scRNA-seq datasets with different cell numbers based on eight performance indexes (e.g., adjusted rand index and F-measure index) validated the accuracy and robustness of SINUM in cell type identification, superior to the state-of-the-art SCN inference method. Additionally, the SINUM SCNs exhibit high overlap with the human interactome and possess the scale-free property. CONCLUSIONS: SINUM presents a view of biological systems at the network level to detect cell-type marker genes/gene pairs and investigate time-dependent changes in gene associations during embryo development. Codes for SINUM are freely available at https://github.com/SysMednet/SINUM .
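
Note: the abstract above builds networks from mutual information between gene pairs. As a rough illustration only, the standard plug-in estimator of mutual information between two discretized expression vectors is sketched below. This is the ordinary population-level estimator, not SINUM's cell-specific estimation; the function names and the simple binarization rule are assumptions.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two equally long
    sequences of discrete labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def binarize(values, threshold=0.0):
    """Hypothetical discretization: 1 if the gene is expressed above threshold."""
    return [1 if v > threshold else 0 for v in values]

# Two genes that are always co-expressed share log(2) nats of information.
gene_a = binarize([0.0, 0.0, 3.1, 2.4])
gene_b = binarize([0.0, 0.0, 1.7, 5.0])
shared = mutual_information(gene_a, gene_b)
```

A value near zero suggests the two genes are independent across the cells considered; larger values suggest dependence.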


Subjects
Single-Cell Analysis, Single-Cell Analysis/methods, Humans, RNA Sequence Analysis/methods, Gene Regulatory Networks, RNA-Seq/methods, Algorithms, Gene Expression Profiling/methods, Single-Cell Gene Expression Analysis
6.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34553226

ABSTRACT

The development of single-cell ribonucleic acid (RNA) sequencing (scRNA-seq) technology has led to great opportunities for the identification of heterogeneous cell types in complex tissues. Clustering algorithms are of great importance for effectively identifying different cell types. In addition, the definition of the distance between every pair of cells is a critical step for most clustering algorithms. In this study, we found that different distance measures have considerably different effects on clustering algorithms. Moreover, there is no specific distance measure that is applicable to all datasets. We therefore introduce a new single-cell clustering method called SD-h, which generates an applicable distance measure for different kinds of datasets by optimally synthesizing commonly used distance measures. Then, hierarchical clustering is performed based on the new distance measure for more accurate cell-type clustering. SD-h was tested on nine frequently used scRNA-seq datasets and showed great superiority over almost all the compared leading single-cell clustering algorithms.


Subjects
Algorithms, RNA, Cluster Analysis, Consensus, RNA Sequence Analysis/methods
7.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35524494

ABSTRACT

Clustering analysis is widely used on single-cell ribonucleic acid (RNA)-sequencing (scRNA-seq) data to discover cell heterogeneity and cell states. While many clustering methods have been developed for scRNA-seq analysis, most of these methods require the number of clusters to be provided. However, it is not easy to know the exact number of cell types in advance, and determination based on experience is not always reliable. Here, we have developed ADClust, an automatic deep embedding clustering method for scRNA-seq data, which can accurately cluster cells without requiring a predefined number of clusters. Specifically, ADClust first obtains low-dimensional representations through a pre-trained autoencoder and uses the representations to cluster cells into initial micro-clusters. The micro-clusters are then compared with one another by a statistical test, and similar micro-clusters are merged into larger clusters. According to the clustering, cell representations are updated so that each cell is pulled toward the centers of its assigned cluster and similar clusters, while cells are separated to keep distances between clusters. This is accomplished by jointly optimizing the carefully designed clustering and autoencoder loss functions. This merging process continues until convergence. ADClust was tested on 11 real scRNA-seq datasets and was shown to outperform existing methods in terms of both clustering performance and the accuracy of the determined number of clusters. More importantly, our model provides high speed and scalability for large datasets.


Subjects
RNA, Single-Cell Analysis, Algorithms, Cluster Analysis, Gene Expression Profiling/methods, RNA/genetics, RNA-Seq, RNA Sequence Analysis/methods, Single-Cell Analysis/methods
8.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35151228

ABSTRACT

Identifying differential genes across conditions provides insights into the mechanisms of biological processes and disease progression. Here we present an approach, the Kullback-Leibler divergence-based differential distribution (klDD), which provides a flexible framework for quantifying changes in higher-order statistical information of genes, including mean and variance/covariation. The method can detect subtle differences in gene expression distributions, in contrast to existing methods that detect mean or variance shifts. In addition to effectively identifying informative genes in terms of differential distribution, klDD can be directly applied to cancer subtyping, single-cell clustering and disease early-warning detection, all of which were validated on various benchmark datasets.
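
Note: the abstract above scores genes by the Kullback-Leibler divergence between expression distributions. The abstract does not state the estimator used, so the sketch below makes a simplifying assumption: each condition is modeled as a univariate Gaussian, for which the divergence has a closed form. The function names are hypothetical.

```python
import math
import statistics

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) in nats; sigmas must be > 0."""
    return (
        math.log(sigma2 / sigma1)
        + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
        - 0.5
    )

def kl_between_conditions(values_a, values_b):
    """Fit a Gaussian to each condition (assumption) and compare them."""
    return kl_gaussian(
        statistics.mean(values_a), statistics.pstdev(values_a),
        statistics.mean(values_b), statistics.pstdev(values_b),
    )

# Same mean, different spread: a mean-based test sees nothing,
# while the divergence is clearly positive.
variance_only = kl_gaussian(0.0, 1.0, 0.0, 2.0)
```

Because the divergence is asymmetric, a symmetric score such as the sum of both directions may be preferable when neither condition is a natural reference.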


Subjects
Gene Expression Profiling, Transcriptome, Cluster Analysis, Disease Progression, Gene Expression Profiling/methods, Humans
9.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36151725

ABSTRACT

Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify the highest-performing metric. However, the 'best-performing' metric varies between studies, with the performance differing significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies have not factored these structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by the selection of an appropriate proximity metric and neighbourhood size for the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several lesser-applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical nature of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data.
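
Note: to illustrate why the choice of proximity metric matters, the sketch below compares four metrics on toy expression vectors. Euclidean and Manhattan distance are named in the abstract; cosine and correlation distance are included here as assumed examples of alternatives and are not confirmed to be among the 17 metrics evaluated. All names and data are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # Undefined for all-zero vectors (division by zero).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def correlation_distance(a, b):
    # One minus the Pearson correlation: cosine distance of centered vectors.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_distance([x - mean_a for x in a], [y - mean_b for y in b])

cell_a = [1, 0, 2, 0]
cell_b = [10, 0, 20, 0]  # same profile as cell_a at ten times the depth
cell_c = [0, 2, 0, 1]    # different genes expressed, similar depth to cell_a
```

On these toy vectors, Euclidean distance places `cell_a` closer to `cell_c` than to `cell_b`, whereas cosine distance does the opposite, because it ignores the overall scale of each profile.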


Subjects
Benchmarking, Single-Cell Analysis, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, RNA-Seq, Cluster Analysis, Algorithms
10.
BMC Genomics ; 24(1): 725, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38036964

ABSTRACT

In recent single-cell -omics studies, both the differential activity of transcription factors regulating cell fate determination and differential genome activation have been tested for utility as descriptors of cell types. Naturally, genome accessibility and gene expression are interlinked. To understand the variability in genomic feature activation in the GABAergic neurons of different spatial origins, we have mapped accessible chromatin regions and mRNA expression in single cells derived from the developing mouse central nervous system (CNS). We first defined a reference set of open chromatin regions for scATAC-seq read quantitation across samples, allowing direct comparison of chromatin accessibility between brain regions and cell types. Second, we integrated the scATAC-seq and scRNA-seq data to form a unified resource of the transcriptome and chromatin accessibility landscape for the cell types in the di- and telencephalon, midbrain and anterior hindbrain of the E14.5 mouse embryo. Importantly, we implemented resolution optimization at the clustering step and automated the cell typing step. We show a high level of concordance between the cell clustering based on chromatin accessibility and that based on the transcriptome in the analyzed neuronal lineages, indicating that both genome and transcriptome features can be used for cell type definition. Hierarchical clustering by similarity in accessible chromatin reveals that genomic feature activation correlates with neurotransmitter phenotype, selector gene expression, cell differentiation stage and neuromere origins.


Subjects
Chromatin, Transcription Factors, Animals, Mice, Chromatin/genetics, Cell Differentiation/genetics, Transcription Factors/genetics, Transcription Factors/metabolism, Genome, Brain/metabolism, Single-Cell Analysis
11.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33940590

ABSTRACT

Single-cell clustering is an important part of analyzing single-cell RNA-sequencing data. However, the accuracy and robustness of existing methods are disturbed by noise. One promising approach for addressing this challenge is integrating pathway information, which can alleviate noise and improve performance. In this work, we studied the impact on accuracy and robustness of existing single-cell clustering methods by integrating pathways. We collected 10 state-of-the-art single-cell clustering methods, 26 scRNA-seq datasets and four pathway databases, combined the AUCell method and the similarity network fusion to integrate pathway data and scRNA-seq data, and introduced three accuracy indicators, three noise generation strategies and robustness indicators. Experiments on this framework showed that integrating pathways can significantly improve the accuracy and robustness of most single-cell clustering methods.


Subjects
Algorithms, Nucleic Acid Databases, Exome Sequencing, RNA-Seq, Single-Cell Analysis, Cluster Analysis
12.
BMC Bioinformatics ; 22(1): 578, 2021 Dec 02.
Article in English | MEDLINE | ID: mdl-34856921

ABSTRACT

BACKGROUND: Existing computational methods for studying miRNA regulation are mostly based on bulk miRNA and mRNA expression data. However, bulk data only allow the analysis of miRNA regulation across a group of cells, rather than the miRNA regulation unique to individual cells. Recent advances in single-cell miRNA-mRNA co-sequencing technology have opened a way to investigate miRNA regulation at the single-cell level. However, as single-cell miRNA-mRNA co-sequencing data are only just emerging and are available only at small scale, there is a strong need for novel methods that exploit existing single-cell data to study cell-specific miRNA regulation. RESULTS: In this work, we propose a new method, CSmiR (Cell-Specific miRNA regulation), which combines single-cell miRNA-mRNA co-sequencing data and putative miRNA-mRNA binding information to identify miRNA regulatory networks at the resolution of individual cells. We apply CSmiR to the miRNA-mRNA co-sequencing data from 19 K562 single cells to identify cell-specific miRNA-mRNA regulatory networks for understanding miRNA regulation in each K562 cell. By analyzing the obtained cell-specific miRNA-mRNA regulatory networks, we observe that the miRNA regulation in each K562 cell is unique. Moreover, we conduct a detailed analysis of the cell-specific miRNA regulation associated with the miR-17/92 family as a case study. The comparison results indicate that CSmiR is effective in predicting cell-specific miRNA targets. Finally, by exploring the cell-cell similarity matrix characterized by cell-specific miRNA regulation, CSmiR provides a novel strategy for clustering single cells and helps to understand cell-cell crosstalk. CONCLUSIONS: To the best of our knowledge, CSmiR is the first method to explore miRNA regulation at single-cell resolution, and we believe that it can be a useful method to enhance the understanding of cell-specific miRNA regulation.


Subjects
MicroRNAs, Cluster Analysis, Gene Expression Profiling, Gene Regulatory Networks, MicroRNAs/genetics, Messenger RNA/genetics
13.
BMC Bioinformatics ; 21(1): 440, 2020 Oct 07.
Article in English | MEDLINE | ID: mdl-33028196

ABSTRACT

BACKGROUND: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. RESULTS: We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. CONCLUSION: The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.


Subjects
RNA-Seq/methods, Algorithms, Cluster Analysis, Gene Expression, Humans, RNA Sequence Analysis
14.
Methods Mol Biol ; 2757: 383-445, 2024.
Article in English | MEDLINE | ID: mdl-38668977

ABSTRACT

The emergence and development of single-cell RNA sequencing (scRNA-seq) techniques enable researchers to perform large-scale analysis of the transcriptomic profiling at cell-specific resolution. Unsupervised clustering of scRNA-seq data is central for most studies, which is essential to identify novel cell types and their gene expression logics. Although an increasing number of algorithms and tools are available for scRNA-seq analysis, a practical guide for users to navigate the landscape remains underrepresented. This chapter presents an overview of the scRNA-seq data analysis pipeline, quality control, batch effect correction, data standardization, cell clustering and visualization, cluster correlation analysis, and marker gene identification. Taking the two broadly used analysis packages, i.e., Scanpy and MetaCell, as examples, we provide a hands-on guideline and comparison regarding the best practices for the above essential analysis steps and data visualization. Additionally, we compare both packages and algorithms using a scRNA-seq dataset of the ctenophore Mnemiopsis leidyi, which is representative of one of the earliest animal lineages, critical to understanding the origin and evolution of animal novelties. This pipeline can also be helpful for analyses of other taxa, especially prebilaterian animals, where these tools are under development (e.g., placozoan and Porifera).


Subjects
Algorithms, Gene Expression Profiling, Single-Cell Analysis, Software, Single-Cell Analysis/methods, Animals, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Computational Biology/methods, Cluster Analysis, Transcriptome/genetics
15.
Methods Mol Biol ; 2812: 155-168, 2024.
Article in English | MEDLINE | ID: mdl-39068361

ABSTRACT

This chapter shows how to apply the Asymmetric Within-Sample Transformation to single-cell RNA-Seq data, combined with a prior dropout imputation step. The asymmetric transformation is a special winsorization that flattens low-expressed intensities and preserves highly expressed gene levels. Before a standard hierarchical clustering algorithm is run, an intermediate step removes noninformative genes according to a threshold applied to a per-gene entropy estimate. Following the clustering, a time-intensive algorithm is shown to uncover the molecular features associated with each cluster. This step implements a resampling algorithm to generate a random baseline against which significantly up- or downregulated genes are measured. To this end, we adopt a GLM as implemented in the DESeq2 package. We render the results graphically. While the tools are standard heat maps, we introduce some data scaling to clarify the results' reliability.
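
Note: the entropy-based gene filter mentioned in the abstract above can be sketched roughly as follows. The abstract does not specify the entropy estimator or the direction of the threshold, so this sketch assumes a simple histogram estimate of Shannon entropy and assumes that low-entropy (near-constant) genes are the noninformative ones. All names and parameters are hypothetical.

```python
import math
from collections import Counter

def gene_entropy(values, n_bins=4):
    """Histogram estimate of Shannon entropy (bits) of one gene across cells."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant gene carries no information
    width = (hi - lo) / n_bins
    # Assign each value to an equal-width bin; the maximum goes in the last bin.
    bins = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())

def filter_genes(expression, threshold):
    """Keep genes whose entropy estimate reaches the threshold (assumed rule)."""
    return {g: v for g, v in expression.items() if gene_entropy(v) >= threshold}

expression = {"flat": [5, 5, 5, 5], "varied": [0, 1, 2, 3]}
kept = filter_genes(expression, threshold=1.0)
```

The retained genes would then be passed to the hierarchical clustering step.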


Subjects
Algorithms, Single-Cell Analysis, Single-Cell Analysis/methods, Cluster Analysis, Humans, Gene Expression Profiling/methods, Software, Computational Biology/methods, RNA-Seq/methods
16.
Brief Funct Genomics ; 22(4): 329-340, 2023 07 17.
Article in English | MEDLINE | ID: mdl-36848584

ABSTRACT

Single-cell clustering is the most significant part of single-cell RNA sequencing (scRNA-seq) data analysis. One main issue facing scRNA-seq data is noise and sparsity, which poses a great challenge for the advance of high-precision clustering algorithms. This study adopts cellular markers to identify differences between cells, which contributes to the feature extraction of single cells. In this work, we propose a high-precision single-cell clustering algorithm, SCMcluster (single-cell cluster using marker genes). This algorithm integrates two cell marker databases (the CellMarker database and the PanglaoDB database) with scRNA-seq data for feature extraction and constructs an ensemble clustering model based on the consensus matrix. We test the efficiency of this algorithm and compare it with eight other popular clustering algorithms on two scRNA-seq datasets derived from human and mouse tissues, respectively. The experimental results show that SCMcluster outperforms the existing methods in both feature extraction and clustering performance. The source code of SCMcluster is available for free at https://github.com/HaoWuLab-Bioinformatics/SCMcluster.


Subjects
Gene Expression Profiling, Single-Cell Analysis, Animals, Humans, Mice, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Algorithms, Cluster Analysis
17.
Front Genet ; 14: 1183099, 2023.
Article in English | MEDLINE | ID: mdl-37091787

ABSTRACT

Identifying different types of cells in scRNA-seq data is a critical task in single-cell data analysis. In this paper, we propose a method called ProgClust for the decomposition of cell populations and detection of rare cells. ProgClust represents the single-cell data with clustering trees where a progressive searching method is designed to select cell population-specific genes and cluster cells. The obtained trees reveal the structure of both abundant cell populations and rare cell populations. Additionally, it can automatically determine the number of clusters. Experimental results show that ProgClust outperforms the baseline method and is capable of accurately identifying both common and rare cells. Moreover, when applied to real unlabeled data, it reveals potential cell subpopulations which provides clues for further exploration. In summary, ProgClust shows potential in identifying subpopulations of complex single-cell data.

18.
Cell Metab ; 34(8): 1214-1225.e6, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35858629

ABSTRACT

Cells often adopt different phenotypes, dictated by tissue-specific or local signals such as cell-cell and cell-matrix contacts or molecular micro-environment. This holds in extremis for macrophages with their high phenotypic plasticity. Their broad range of functions, some even opposing, reflects their heterogeneity, and a multitude of subsets has been described in different tissues and diseases. Such micro-environmental imprint cannot be adequately studied by single-cell applications, as cells are detached from their context, while histology-based assessment lacks the phenotypic depth due to limitations in marker combination. Here, we present a novel, integrative approach in which 15-color multispectral imaging allows comprehensive cell classification based on multi-marker expression patterns, followed by downstream analysis pipelines to link their phenotypes to contextual, micro-environmental cues, such as their cellular ("community") and metabolic ("local lipidome") niches in complex tissue. The power of this approach is illustrated for myeloid subsets and associated lipid signatures in murine atherosclerotic plaque.


Subjects
Atherosclerosis, Atherosclerotic Plaque, Animals, Atherosclerosis/metabolism, Biomarkers/metabolism, Macrophages/metabolism, Mass Spectrometry, Mice, Phenotype
19.
Interdiscip Sci ; 14(2): 394-408, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35028910

ABSTRACT

Cell type determination based on transcriptome profiles is a key application of single-cell RNA sequencing (scRNA-seq). It is usually achieved through unsupervised clustering. Good feature selection can improve clustering accuracy and is a crucial component of single-cell clustering pipelines. However, most current single-cell feature selection methods are univariable filter methods that ignore gene dependency. Even the multivariable filter methods developed in recent years only consider "one-to-many" relationships between genes. In this paper, a novel single-cell feature selection method based on convex analysis of mixtures (FSCAM) is proposed, which takes into account "many-to-many" relationships. Compared to the previous "one-to-many" methods, FSCAM selects genes with a combination of relevancy, redundancy and completeness. Pertinent benchmarking is conducted on real datasets to validate the superiority of FSCAM. By plugging FSCAM into the framework of partition around medoids (PAM) clustering, a single-cell clustering algorithm based on the FSCAM method (SCC_FSCAM) is further developed. Comparing SCC_FSCAM with existing advanced clustering algorithms, the results show that our algorithm has advantages in both internal criteria (clustering number) and external criteria (adjusted Rand index) and has good stability.
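
Note: the adjusted Rand index (ARI) used as the external criterion above compares a predicted clustering with reference labels while correcting for chance agreement. A minimal standard-library sketch of the usual contingency-table formula follows; the function name is an assumption, and at least two cells are required.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same n >= 2 items."""
    n = len(labels_true)
    # Pairs of items that fall in the same cell of the contingency table.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)

# Cluster names do not matter, only the grouping does.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

A value of 1 indicates identical partitions, values near 0 indicate chance-level agreement, and negative values indicate agreement worse than chance.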


Subjects
Algorithms, Single-Cell Analysis, Cluster Analysis, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Transcriptome
20.
Genome Biol ; 22(1): 21, 2021 01 07.
Article in English | MEDLINE | ID: mdl-33413539

ABSTRACT

In any 'omics study, the scale of analysis can dramatically affect the outcome. For instance, when clustering single-cell transcriptomes, is the analysis tuned to discover broad or specific cell types? Likewise, protein communities revealed from protein networks can vary widely in size depending on the method. Here, we use the concept of persistent homology, drawn from mathematical topology, to identify robust structures in data at all scales simultaneously. Application to mouse single-cell transcriptomes significantly expands the catalog of identified cell types, while analysis of SARS-CoV-2 protein interactions suggests hijacking of WNT. The method, HiDeF, is available via Python and Cytoscape.


Subjects
Computational Biology/methods, SARS-CoV-2/metabolism, Algorithms, Animals, Humans, Mice, Viral Proteins/metabolism