Results 1 - 20 of 9,028
1.
Cell ; 186(26): 5876-5891.e20, 2023 12 21.
Article in English | MEDLINE | ID: mdl-38134877

ABSTRACT

Harmonizing cell types across the single-cell community and assembling them into a common framework is central to building a standardized Human Cell Atlas. Here, we present CellHint, a predictive clustering tree-based tool to resolve cell-type differences in annotation resolution and technical biases across datasets. CellHint accurately quantifies cell-cell transcriptomic similarities and places cell types into a relationship graph that hierarchically defines shared and unique cell subtypes. Application to multiple immune datasets recapitulates expert-curated annotations. CellHint also reveals underexplored relationships between healthy and diseased lung cell states in eight diseases. Furthermore, we present a workflow for fast cross-dataset integration guided by harmonized cell types and cell hierarchy, which uncovers underappreciated cell types in adult human hippocampus. Finally, we apply CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database with ∼3.7 million cells and various machine learning models for automatic cell annotation across human tissues.


Subject(s)
Gene Expression Profiling , Transcriptome , Humans , Databases, Factual , Single-Cell Analysis
2.
Cell ; 185(1): 184-203.e19, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34963056

ABSTRACT

Cancers display significant heterogeneity with respect to tissue of origin, driver mutations, and other features of the surrounding tissue. It is likely that individual tumors engage common patterns of the immune system (here termed "archetypes"), creating prototypical non-destructive tumor immune microenvironments (TMEs) and modulating tumor-targeting. To discover the dominant immune system archetypes, the University of California, San Francisco (UCSF) Immunoprofiler Initiative (IPI) processed 364 individual tumors across 12 cancer types using standardized protocols. Computational clustering of flow cytometry and transcriptomic data obtained from cell sub-compartments uncovered dominant patterns of immune composition across cancers. These archetypes were profound insofar as they also differentiated tumors based upon unique immune and tumor gene-expression patterns. They also partitioned well-established classifications of tumor biology. The IPI resource provides a template for understanding cancer immunity as a collection of dominant patterns of immune organization and provides a rational path forward to learn how to modulate these to improve therapy.


Subject(s)
Censuses , Neoplasms/genetics , Neoplasms/immunology , Transcriptome/genetics , Tumor Microenvironment/immunology , Biomarkers, Tumor , Cluster Analysis , Cohort Studies , Computational Biology/methods , Flow Cytometry/methods , Gene Expression Regulation, Neoplastic , Humans , Neoplasms/classification , Neoplasms/pathology , RNA-Seq/methods , San Francisco , Universities
3.
Cell ; 184(11): 2988-3005.e16, 2021 05 27.
Article in English | MEDLINE | ID: mdl-34019793

ABSTRACT

Clear cell renal carcinoma (ccRCC) is a heterogeneous disease with a variable post-surgical course. To assemble a comprehensive ccRCC tumor microenvironment (TME) atlas, we performed single-cell RNA sequencing (scRNA-seq) of hematopoietic and non-hematopoietic subpopulations from tumor and tumor-adjacent tissue of treatment-naive ccRCC resections. We leveraged the VIPER algorithm to quantitate single-cell protein activity and validated this approach by comparison to flow cytometry. The analysis identified key TME subpopulations, as well as their master regulators and candidate cell-cell interactions, revealing clinically relevant populations, undetectable by gene-expression analysis. Specifically, we uncovered a tumor-specific macrophage subpopulation characterized by upregulation of TREM2/APOE/C1Q, validated by spatially resolved, quantitative multispectral immunofluorescence. In a large clinical validation cohort, these markers were significantly enriched in tumors from patients who recurred following surgery. The study thus identifies TREM2/APOE/C1Q-positive macrophage infiltration as a potential prognostic biomarker for ccRCC recurrence, as well as a candidate therapeutic target.


Subject(s)
Carcinoma, Renal Cell/metabolism , Neoplasm Recurrence, Local/genetics , Tumor-Associated Macrophages/metabolism , Adult , Apolipoproteins E/genetics , Apolipoproteins E/metabolism , Biomarkers, Tumor/genetics , Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Cohort Studies , Female , Gene Expression/genetics , Gene Expression Regulation, Neoplastic/genetics , Humans , Kidney/metabolism , Kidney Neoplasms/pathology , Lymphocytes, Tumor-Infiltrating/pathology , Macrophages/metabolism , Male , Membrane Glycoproteins/genetics , Membrane Glycoproteins/metabolism , Middle Aged , Neoplasm Recurrence, Local/metabolism , Prognosis , Receptors, Complement/genetics , Receptors, Complement/metabolism , Receptors, Immunologic/genetics , Receptors, Immunologic/metabolism , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Tumor Microenvironment , Tumor-Associated Macrophages/physiology
4.
Cell ; 173(6): 1343-1355.e24, 2018 05 31.
Article in English | MEDLINE | ID: mdl-29856953

ABSTRACT

Numerous well-defined classes of retinal ganglion cells innervate the thalamus to guide image-forming vision, yet the rules governing their convergence and divergence remain unknown. Using two-photon calcium imaging in awake mouse thalamus, we observed a functional arrangement of retinal ganglion cell axonal boutons in which coarse-scale retinotopic ordering gives way to fine-scale organization based on shared preferences for other visual features. Specifically, at the ∼6 µm scale, clusters of boutons from different axons often showed similar preferences for either one or multiple features, including axis and direction of motion, spatial frequency, and changes in luminance. Conversely, individual axons could "de-multiplex" information channels by participating in multiple, functionally distinct bouton clusters. Finally, ultrastructural analyses demonstrated that retinal axonal boutons in a local cluster often target the same dendritic domain. These data suggest that functionally specific convergence and divergence of retinal axons may impart diverse, robust, and often novel feature selectivity to visual thalamus.


Subject(s)
Axons/physiology , Retina/physiology , Retinal Ganglion Cells/physiology , Thalamus/physiology , Animals , Cluster Analysis , Dendrites/physiology , Fuzzy Logic , Geniculate Bodies/physiology , Male , Mice , Mice, Inbred C57BL , Motion , Neurons/physiology , Presynaptic Terminals/physiology , Vision, Ocular , Visual Pathways
5.
Mol Cell ; 82(16): 3015-3029.e6, 2022 08 18.
Article in English | MEDLINE | ID: mdl-35728588

ABSTRACT

Light and temperature in plants are perceived by a common receptor, phytochrome B (phyB). How phyB distinguishes these signals remains elusive. Here, we report that phyB spontaneously undergoes phase separation to assemble liquid-like droplets. This capacity is driven by its C terminus through self-association, whereas the intrinsically disordered N-terminal extension (NTE) functions as a biophysical modulator of phase separation. Light exposure triggers a conformational change to subsequently alter phyB condensate assembly, while temperature sensation is directly mediated by the NTE to modulate the phase behavior of phyB droplets. Multiple signaling components are selectively incorporated into phyB droplets to form concentrated microreactors, allowing switch-like control of phyB signaling activity through phase transitions. Therefore, light and temperature cues are separately read out by phyB via allosteric changes and spontaneous phase separation, respectively. We provide a conceptual framework showing how the distinct but highly correlated physical signals are interpreted and sorted by one receptor.


Subject(s)
Arabidopsis Proteins , Arabidopsis , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Phytochrome B/genetics , Phytochrome B/metabolism , Signal Transduction , Temperature
6.
Physiol Rev ; 102(3): 1159-1210, 2022 07 01.
Article in English | MEDLINE | ID: mdl-34927454

ABSTRACT

Ion channels play a central role in the regulation of nearly every cellular process. Dating back to the classic 1952 Hodgkin-Huxley model of the generation of the action potential, ion channels have always been thought of as independent agents. A myriad of recent experimental findings exploiting advances in electrophysiology, structural biology, and imaging techniques, however, have posed a serious challenge to this long-held axiom, as several classes of ion channels appear to open and close in a coordinated, cooperative manner. Ion channel cooperativity ranges from variable-sized oligomeric cooperative gating in voltage-gated, dihydropyridine-sensitive CaV1.2 and CaV1.3 channels to obligatory dimeric assembly and gating of voltage-gated NaV1.5 channels. Potassium channels, transient receptor potential channels, hyperpolarization cyclic nucleotide-activated channels, ryanodine receptors (RyRs), and inositol trisphosphate receptors (IP3Rs) have also been shown to gate cooperatively. The implications of cooperative gating of these ion channels range from fine-tuning excitation-contraction coupling in muscle cells to regulating cardiac function and vascular tone, to modulation of action potential and conduction velocity in neurons and cardiac cells, and to control of pacemaking activity in the heart. In this review, we discuss the mechanisms leading to cooperative gating of ion channels, their physiological consequences, and how alterations in cooperative gating of ion channels may induce a range of clinically significant pathologies.


Subject(s)
Ion Channel Gating , Ryanodine Receptor Calcium Release Channel , Action Potentials , Humans , Ion Channel Gating/physiology , Neurons
7.
Mol Cell ; 78(1): 96-111.e6, 2020 04 02.
Article in English | MEDLINE | ID: mdl-32105612

ABSTRACT

Current models suggest that chromosome domains segregate into either an active (A) or inactive (B) compartment. B-compartment chromatin is physically separated from the A compartment and compacted by the nuclear lamina. To examine these models in the developmental context of C. elegans embryogenesis, we undertook chromosome tracing to map the trajectories of entire autosomes. Early embryonic chromosomes organized into an unconventional barbell-like configuration, with two densely folded B compartments separated by a central A compartment. Upon gastrulation, this conformation matured into conventional A/B compartments. We used unsupervised clustering to uncover subpopulations with differing folding properties and variable positioning of compartment boundaries. These conformations relied on tethering to the lamina to stretch the chromosome; detachment from the lamina compacted, and allowed intermingling between, A/B compartments. These findings reveal the diverse conformations of early embryonic chromosomes and uncover a previously unappreciated role for the lamina in systemic chromosome stretching.


Subject(s)
Caenorhabditis elegans/genetics , Chromosomes/chemistry , Nuclear Lamina/physiology , Animals , Caenorhabditis elegans/embryology , Chromosomes/ultrastructure , Embryo, Nonmammalian/ultrastructure , Gastrulation/genetics , In Situ Hybridization, Fluorescence , Molecular Conformation
8.
Trends Genet ; 40(2): 160-174, 2024 02.
Article in English | MEDLINE | ID: mdl-38216391

ABSTRACT

Recent imaging studies have captured the dynamics of regulatory events of transcription inside living cells. These events include transcription factor (TF) DNA binding, chromatin remodeling and modification, enhancer-promoter (E-P) proximity, cluster formation, and preinitiation complex (PIC) assembly. Together, these molecular events culminate in stochastic bursts of RNA synthesis, but their kinetic relationship remains largely unclear. In this review, we compare the timescales of upstream regulatory steps (input) with the kinetics of transcriptional bursting (output) to generate mechanistic models of transcription dynamics in single cells. We highlight open questions and potential technical advances to guide future endeavors toward a quantitative and kinetic understanding of transcription regulation.


Subject(s)
Gene Expression Regulation , Transcription, Genetic , Promoter Regions, Genetic , Chromatin Assembly and Disassembly
9.
EMBO J ; 42(17): e109738, 2023 09 04.
Article in English | MEDLINE | ID: mdl-37401899

ABSTRACT

The centrosome linker joins the two interphase centrosomes of a cell into one microtubule organizing center. Despite increasing knowledge of linker components, linker diversity in different cell types and the role of the linker in cells with supernumerary centrosomes remained unexplored. Here, we identified Ninein as a C-Nap1-anchored centrosome linker component that provides linker function in RPE1 cells, whereas in HCT116 and U2OS cells Ninein and Rootletin link centrosomes together. In interphase, overamplified centrosomes use the linker for centrosome clustering, where Rootletin gains centrosome linker function in RPE1 cells. Surprisingly, in cells with centrosome overamplification, C-Nap1 loss prolongs metaphase through persistent activation of the spindle assembly checkpoint, indicated by BUB1 and MAD1 accumulation at kinetochores. In cells lacking C-Nap1, the reduction of microtubule nucleation at centrosomes and the delay in nuclear envelope rupture in prophase probably cause mitotic defects like multipolar spindle formation and chromosome mis-segregation. These defects are enhanced when the kinesin HSET, which normally clusters multiple centrosomes in mitosis, is partially inhibited, indicating a functional interplay between C-Nap1 and centrosome clustering in mitosis.


Subject(s)
Cell Cycle Proteins , Centrosome , Centrosome/metabolism , Cell Cycle , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Interphase/physiology , Mitosis , Spindle Apparatus/genetics , Spindle Apparatus/metabolism
10.
Proc Natl Acad Sci U S A ; 121(33): e2403771121, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39110730

ABSTRACT

Complex systems are typically characterized by intricate internal dynamics that are often hard to elucidate. Ideally, this requires methods that can detect and classify, in an unsupervised way, the microscopic dynamical events occurring in the system. However, decoupling statistically relevant fluctuations from the internal noise is most often nontrivial. Here, we describe "Onion Clustering": a simple, iterative unsupervised clustering method that efficiently detects and classifies statistically relevant fluctuations in noisy time-series data. We demonstrate its efficiency by analyzing simulation and experimental trajectories of various systems with complex internal dynamics, ranging from the atomic to the microscopic scale, in and out of equilibrium. The method is based on an iterative detect-classify-archive approach. Just as peeling the external (evident) layer of an onion reveals the hidden internal ones, the method performs a first detection/classification of the most populated dynamical environment in the system and of its characteristic noise. The signal of this dynamical cluster is then removed from the time-series data and the remaining part, cleared of its noise, is analyzed again. At every iteration, the detection of hidden dynamical subdomains is facilitated by an increasing (and adaptive) relevance-to-noise ratio. The process iterates until no new dynamical domains can be uncovered, revealing, as an output, the number of clusters that can be effectively distinguished/classified in a statistically robust way as a function of the time-resolution of the analysis. Onion Clustering is general and benefits from clear-cut physical interpretability. We expect that it will help analyze a variety of complex dynamical systems and time-series data.
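
The detect-classify-archive loop described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: it works on a plain 1-D list of values rather than on time-series windows, and the function name `peel_clusters`, the histogram-based peak detection, the Gaussian full-width-at-half-maximum estimate of the noise, and all parameter defaults are assumptions made for illustration only.

```python
def peel_clusters(values, n_bins=30, width=3.0, min_size=20):
    """Toy iterative 'peeling' of 1-D data (illustrative, hypothetical API).

    Each pass finds the most populated histogram peak, estimates its centre
    and spread, archives every value within `width` spreads of the centre,
    and repeats on what is left.
    Returns (clusters, leftover), where each cluster is (mu, sigma, members).
    """
    remaining = list(values)
    clusters = []
    while len(remaining) >= min_size:
        lo, hi = min(remaining), max(remaining)
        if hi == lo:
            # All remaining values are identical: archive them as one cluster.
            clusters.append((lo, 0.0, remaining))
            remaining = []
            break
        step = (hi - lo) / n_bins
        counts = [0] * n_bins
        for v in remaining:
            counts[min(int((v - lo) / step), n_bins - 1)] += 1

        # Detect: locate the tallest bin and walk outwards to half maximum.
        peak = max(range(n_bins), key=counts.__getitem__)
        left = right = peak
        while left > 0 and counts[left - 1] >= counts[peak] / 2:
            left -= 1
        while right < n_bins - 1 and counts[right + 1] >= counts[peak] / 2:
            right += 1
        a, b = lo + left * step, lo + (right + 1) * step
        core = [v for v in remaining if a <= v <= b]
        mu = sum(core) / len(core)
        # Assumption: Gaussian noise, for which FWHM = 2*sqrt(2*ln 2)*sigma.
        sigma = (b - a) / 2.3548

        # Classify: values close enough to the peak belong to this cluster.
        members = [v for v in remaining if abs(v - mu) <= width * sigma]
        if len(members) < min_size:
            break  # nothing statistically meaningful left to peel

        # Archive: store the cluster and remove it before the next pass.
        clusters.append((mu, sigma, members))
        remaining = [v for v in remaining if abs(v - mu) > width * sigma]
    return clusters, remaining
```

In this sketch `width` and `min_size` are tuning knobs: a larger `width` archives more of each peak's tails, and `min_size` decides when the leftover signal is too sparse to count as a new cluster.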

11.
Proc Natl Acad Sci U S A ; 121(12): e2317284121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38478692

ABSTRACT

Since its emergence in late 2019, SARS-CoV-2 has diversified into a large number of lineages and caused multiple waves of infection globally. Novel lineages have the potential to spread rapidly and internationally if they have higher intrinsic transmissibility and/or can evade host immune responses, as has been seen with the Alpha, Delta, and Omicron variants of concern. They can also cause increased mortality and morbidity if they have increased virulence, as was seen for Alpha and Delta. Phylogenetic methods provide the "gold standard" for representing the global diversity of SARS-CoV-2 and to identify newly emerging lineages. However, these methods are computationally expensive, struggle when datasets get too large, and require manual curation to designate new lineages. These challenges provide a motivation to develop complementary methods that can incorporate all of the genetic data available without down-sampling to extract meaningful information rapidly and with minimal curation. In this paper, we demonstrate the utility of using algorithmic approaches based on word-statistics to represent whole sequences, bringing speed, scalability, and interpretability to the construction of genetic topologies. While not serving as a substitute for current phylogenetic analyses, the proposed methods can be used as a complementary, and fully automatable, approach to identify and confirm new emerging variants.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , Phylogeny , Machine Learning
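
As a rough illustration of representing whole sequences by word statistics, the sketch below counts overlapping k-mers ("words") in a sequence and compares two sequences by the cosine similarity of their count vectors. The abstract does not specify the word length, weighting, or distance used, so the choice of k-mer counts, the cosine measure, and the function names `kmer_counts` and `cosine_similarity` are assumptions, not the published method.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count every overlapping word of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as sharing nothing
    return dot / (norm_a * norm_b)
```

Because each sequence is reduced to a fixed set of counts, no alignment or tree building is needed, which is the kind of speed and scalability argument the abstract makes for such approaches.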
12.
Proc Natl Acad Sci U S A ; 121(37): e2400002121, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39226348

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) data, susceptible to noise arising from biological variability and technical errors, can distort gene expression analysis and impact cell similarity assessments, particularly in heterogeneous populations. Current methods, including deep learning approaches, often struggle to accurately characterize cell relationships due to this inherent noise. To address these challenges, we introduce scAMF (Single-cell Analysis via Manifold Fitting), a framework designed to enhance clustering accuracy and data visualization in scRNA-seq studies. At the heart of scAMF lies the manifold fitting module, which effectively denoises scRNA-seq data by unfolding their distribution in the ambient space. This unfolding aligns the gene expression vector of each cell more closely with its underlying structure, bringing it spatially closer to other cells of the same cell type. To comprehensively assess the impact of scAMF, we compile a collection of 25 publicly available scRNA-seq datasets spanning various sequencing platforms, species, and organ types, forming an extensive RNA data bank. In our comparative studies, benchmarking scAMF against existing scRNA-seq analysis algorithms in this data bank, we consistently observe that scAMF outperforms in terms of clustering efficiency and data visualization clarity. Further experimental analysis reveals that this enhanced performance stems from scAMF's ability to improve the spatial distribution of the data and capture class-consistent neighborhoods. These findings underscore the promising application potential of manifold fitting as a tool in scRNA-seq analysis, signaling a significant enhancement in the precision and reliability of data interpretation in this critical field of study.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , Sequence Analysis, RNA/methods , Animals , Algorithms , RNA/genetics , Gene Expression Profiling/methods , RNA-Seq/methods
13.
Am J Hum Genet ; 110(2): 314-325, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36610401

ABSTRACT

Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg scenario. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and available for free.


Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Likelihood Functions , Population Groups , Software , Genetics, Population
14.
Development ; 150(23)2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37997741

ABSTRACT

Adaptation to dehydration stress requires plants to coordinate environmental and endogenous signals to inhibit stomatal proliferation and modulate their patterning. The stress hormone abscisic acid (ABA) induces stomatal closure and restricts stomatal lineage to promote stress tolerance. Here, we report that mutants with reduced ABA levels, xer-1, xer-2 and aba2-2, developed stomatal clusters. Similarly, the ABA signaling mutant snrk2.2/2.3/2.6, which lacks core ABA signaling kinases, also displayed stomatal clusters. Exposure to ABA or inhibition of ABA catabolism rescued the increased stomatal density and spacing defects observed in xer and aba2-2, suggesting that basal ABA is required for correct stomatal density and spacing. xer-1 and aba2-2 displayed reduced expression of EPF1 and EPF2, and enhanced expression of SPCH and MUTE. Furthermore, ABA suppressed elevated SPCH and MUTE expression in epf2-1 and epf1-1, and partially rescued epf2-1 stomatal index and epf1-1 clustering defects. Genetic analysis demonstrated that XER acts upstream of the EPF2-SPCH pathway to suppress stomatal proliferation, and in parallel with EPF1 to ensure correct stomatal spacing. These results show that basal ABA and functional ABA signaling are required to fine-tune stomatal density and patterning.


Subject(s)
Arabidopsis Proteins , Arabidopsis , Arabidopsis/metabolism , Abscisic Acid/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Plant Stomata/metabolism , Signal Transduction/genetics , Cell Proliferation/genetics , Gene Expression Regulation, Plant
15.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38855914

ABSTRACT

Cluster analysis, a pivotal step in single-cell sequencing data analysis, presents substantial opportunities to effectively unveil the molecular mechanisms underlying cellular heterogeneity and intercellular phenotypic variations. However, the inherent imperfections arise as different clustering algorithms yield diverse estimates of cluster numbers and cluster assignments. This study introduces Single Cell Consistent Clustering based on Spectral Matrix Decomposition (SCSMD), a comprehensive clustering approach that integrates the strengths of multiple methods to determine the optimal clustering scheme. Testing the performance of SCSMD across different distances and employing the bespoke evaluation metric, the methodological selection undergoes validation to ensure the optimal efficacy of the SCSMD. A consistent clustering test is conducted on 15 authentic scRNA-seq datasets. The application of SCSMD to human embryonic stem cell scRNA-seq data successfully identifies known cell types and delineates their developmental trajectories. Similarly, when applied to glioblastoma cells, SCSMD accurately detects pre-existing cell types and provides finer sub-division within one of the original clusters. The results affirm the robust performance of our SCSMD method in terms of both the number of clusters and cluster assignments. Moreover, we have broadened the application scope of SCSMD to encompass larger datasets, thereby furnishing additional evidence of its superiority. The findings suggest that SCSMD is poised for application to additional scRNA-seq datasets and for further downstream analyses.


Subject(s)
Algorithms , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Cluster Analysis , Computational Biology/methods , Glioblastoma/genetics , Glioblastoma/pathology , Glioblastoma/metabolism
16.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426327

ABSTRACT

Cluster assignment is vital to analyzing single-cell RNA sequencing (scRNA-seq) data to understand high-level biological processes. Deep learning-based clustering methods have recently been widely used in scRNA-seq data analysis. However, existing deep models often overlook the interconnections and interactions among network layers, leading to the loss of structural information within the network layers. Herein, we develop a new self-supervised clustering method based on an adaptive multi-scale autoencoder, called scAMAC. The self-supervised clustering network utilizes the Multi-Scale Attention mechanism to fuse the feature information from the encoder, hidden and decoder layers of the multi-scale autoencoder, which enables the exploration of cellular correlations within the same scale and captures deep features across different scales. The self-supervised clustering network calculates the membership matrix using the fused latent features and optimizes the clustering network based on the membership matrix. scAMAC employs an adaptive feedback mechanism to supervise the parameter updates of the multi-scale autoencoder, obtaining a more effective representation of cell features. scAMAC not only enables cell clustering but also performs data reconstruction through the decoding layer. Through extensive experiments, we demonstrate that scAMAC is superior to several advanced clustering and imputation methods in both data clustering and reconstruction. In addition, scAMAC is beneficial for downstream analysis, such as cell trajectory inference. Our scAMAC model codes are freely available at https://github.com/yancy2024/scAMAC.


Subject(s)
Data Analysis , Single-Cell Gene Expression Analysis , Cluster Analysis , Sequence Analysis, RNA , Gene Expression Profiling , Algorithms
17.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39373051

ABSTRACT

Single-cell ribonucleic acid sequencing (scRNA-seq) technology can be used to perform high-resolution analysis of the transcriptomes of individual cells. Therefore, its application has gained popularity for accurately analyzing the ever-increasing content of heterogeneous single-cell datasets. Central to interpreting scRNA-seq data is the clustering of cells to decipher transcriptomic diversity and infer cell behavior patterns. However, its complexity necessitates the application of advanced methodologies capable of resolving the inherent heterogeneity and limited gene expression characteristics of single-cell data. Herein, we introduce a novel deep learning-based algorithm for single-cell clustering, designated scDFN, which can significantly enhance the clustering of scRNA-seq data through a fusion network strategy. The scDFN algorithm applies a dual mechanism involving an autoencoder to extract attribute information and an improved graph autoencoder to capture topological nuances, integrated via a cross-network information fusion mechanism complemented by a triple self-supervision strategy. This fusion is optimized through a holistic consideration of four distinct loss functions. A comparative analysis with five leading scRNA-seq clustering methodologies across multiple datasets revealed the superiority of scDFN, as determined by better Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) scores. Additionally, scDFN demonstrated robust multi-cluster dataset performance and exceptional resilience to batch effects. Ablation studies highlighted the key roles of the autoencoder and the improved graph autoencoder components, along with the critical contribution of the four joint loss functions to the overall efficacy of the algorithm. Through these advancements, scDFN set a new benchmark in single-cell clustering and can be used as an effective tool for the nuanced analysis of single-cell transcriptomics.


Subject(s)
Algorithms , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Cluster Analysis , Humans , Deep Learning , Sequence Analysis, RNA/methods , Transcriptome , Gene Expression Profiling/methods , Computational Biology/methods , Animals , Single-Cell Gene Expression Analysis
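
For readers unfamiliar with the Adjusted Rand Index mentioned in the abstract above, the following minimal sketch computes it from two label lists using the standard pair-counting formula. It is a generic illustration, independent of scDFN; the function name `adjusted_rand_index` is chosen here for illustration, and degenerate cases (for example, both labelings placing every cell in a single cluster) are assumed to score 1.0.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree about

    # Contingency table and its row/column totals, stored as Counters.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together in both, in truth, in prediction.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_true = sum(comb(c, 2) for c in rows.values())
    together_pred = sum(comb(c, 2) for c in cols.values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (together_both - expected) / (maximum - expected)
```

A score of 1 means the two partitions agree up to a renaming of cluster labels, while values near 0 indicate agreement no better than chance.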
18.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38975891

ABSTRACT

Unsupervised feature selection is a critical step for efficient and accurate analysis of single-cell RNA-seq data. Previous benchmarks used two different criteria to compare feature selection methods: (i) proportion of ground-truth marker genes included in the selected features and (ii) accuracy of cell clustering using ground-truth cell types. Here, we systematically compare the performance of 11 feature selection methods for both criteria. We first demonstrate the discordance between these criteria and suggest using the latter. We then compare the distribution of selected genes in their means between feature selection methods. We show that lowly expressed genes exhibit seriously high coefficients of variation and are mostly excluded by high-performance methods. In particular, high-deviation- and high-expression-based methods outperform the method widely used in the Seurat package in clustering cells and data visualization. We further show they also enable a clear separation of the same cell type from different tissues as well as accurate estimation of cell trajectories.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , Gene Expression Profiling/methods , Algorithms , Computational Biology/methods , Sequence Analysis, RNA/methods , RNA-Seq/methods
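
The observation that lowly expressed genes show very high coefficients of variation can be seen in a small worked example. The sketch below defines the coefficient of variation (standard deviation divided by mean) and ranks genes by it; the gene names and counts in the usage note are made up, and the helper names `coefficient_of_variation` and `rank_by_cv` are hypothetical, not taken from any of the benchmarked methods.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean (0.0 if mean is 0)."""
    mean = fmean(counts)
    if mean == 0:
        return 0.0
    return pstdev(counts) / mean


def rank_by_cv(expression):
    """Return gene names sorted from highest to lowest coefficient of variation.

    `expression` maps a gene name to its list of per-cell counts.
    """
    return sorted(
        expression,
        key=lambda gene: coefficient_of_variation(expression[gene]),
        reverse=True,
    )
```

With made-up counts such as `{"low": [0, 0, 0, 1], "high": [10, 12, 9, 11]}`, the sparsely expressed gene ranks first because a single nonzero count dominates its tiny mean, which is one way a naive variability ranking can favor uninformative low-count genes.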
19.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279653

ABSTRACT

Cluster analysis is one of the most widely used exploratory methods for visualization and grouping of gene expression patterns across multiple samples or treatment groups. Although several existing online tools can annotate clusters with functional terms, there is no all-in-one webserver to effectively prioritize genes/clusters using gene essentiality as well as congruency of mRNA-protein expression. Hence, we developed CAP-RNAseq that makes possible (1) upload and clustering of bulk RNA-seq data followed by identification, annotation and network visualization of all or selected clusters; and (2) prioritization using DepMap gene essentiality and/or dependency scores as well as the degree of correlation between mRNA and protein levels of genes within an expression cluster. In addition, CAP-RNAseq has an integrated primer design tool for the prioritized genes. Herein, we showed using comparisons with the existing tools and multiple case studies that CAP-RNAseq can uniquely aid in the discovery of co-expression clusters enriched with essential genes and prioritization of novel biomarker genes that exhibit high correlations between their mRNA and protein expression levels. CAP-RNAseq is applicable to RNA-seq data from different contexts including cancer and available at http://konulabapps.bilkent.edu.tr:3838/CAPRNAseq/ and the docker image is downloadable from https://hub.docker.com/r/konulab/caprnaseq.


Subject(s)
Proteomics , Sequence Analysis, RNA/methods , RNA-Seq , RNA, Messenger/genetics
20.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38300514

ABSTRACT

Somatic copy number alterations (SCNAs) are a predominant type of oncogenomic alterations that affect a large proportion of the genome in the majority of cancer samples. Current technologies allow high-throughput measurement of such copy number aberrations, generating results consisting of frequently large sets of SCNA segments. However, the automated annotation and integration of such data are particularly challenging because the measured signals reflect biased, relative copy number ratios. In this study, we introduce labelSeg, an algorithm designed for rapid and accurate annotation of CNA segments, with the aim of enhancing the interpretation of tumor SCNA profiles. Leveraging density-based clustering and exploiting the length-amplitude relationships of SCNA, our algorithm proficiently identifies distinct relative copy number states from individual segment profiles. Its compatibility with most CNA measurement platforms makes it suitable for large-scale integrative data analysis. We confirmed its performance on both simulated and sample-derived data from The Cancer Genome Atlas reference dataset, and we demonstrated its utility in integrating heterogeneous segment profiles from different data sources and measurement platforms. Our comparative and integrative analysis revealed common SCNA patterns in cancer and protein-coding genes with a strong correlation between SCNA and messenger RNA expression, promoting the investigation into the role of SCNA in cancer development.


Subject(s)
DNA Copy Number Variations , Neoplasms , Humans , Neoplasms/genetics , Algorithms , Cluster Analysis , Data Analysis